Browsing by Subject "Computational Biology"


Biochemical Analysis of the Drosophila RNAi Pathway
(2009-01-14) Jiang, Feng; Liu, Qinghua
RNA interference is post-transcriptional gene silencing mediated by miRNAs and siRNAs (21-26 nt). In Drosophila, the RNase III enzymes Dicer-1 and Dicer-2 generate miRNAs and siRNAs, respectively. Nascent miRNA and siRNA duplexes are assembled into distinct RNA-induced silencing complexes termed miRISC and siRISC, of which AGO1 and AGO2 are the respective catalytic subunits. My dissertation project is focused on identifying new RNAi components and understanding mechanisms of RISC assembly by biochemical reconstitution. Our group previously identified a novel dsRNA-binding protein named R2D2, which functions in complex with Dicer-2 to process dsRNA into siRNA. Only the Dicer-2/R2D2 complex, but neither Dicer-2 nor R2D2 alone, efficiently interacts with duplex siRNA. Furthermore, the tandem dsRNA-binding domains of R2D2 are required for siRNA binding. Therefore, although R2D2 is dispensable for siRNA production, it is required for incorporating siRNA into the siRISC complex. Generation of recombinant AGO2 protein is essential for in vitro reconstitution of the RNAi pathway. We believe that the unique polyglutamine repeat region of fly AGO2 may be problematic for expression. Thus, a series of truncated AGO2 baculoviruses that remove some or all polyQ repeats of AGO2 were generated. Co-expression with AGO1 increases the expression level of AGO2 by at least 10-fold. Affinity-purified full-length AGO2 and one truncated form show minimal RISC activity, i.e., they can be programmed with single-stranded siRNA and perform sequence-specific cleavage of mRNA. Most interestingly, adding purified recombinant Dicer-2/R2D2 complex to recombinant AGO2 generated dsRNA- and siRNA-initiated RISC activity. A catalytic mutant of AGO2 is unable to reconstitute RISC activity with recombinant Dicer-2/R2D2 complex, showing that the RISC activity is specific. Therefore, the three-component system of Dicer-2, R2D2, and AGO2 can reconstitute the RNAi pathway of Drosophila.
By a bioinformatics approach, a novel protein named Loquacious (Loqs) was identified with considerable sequence homology to R2D2. Loqs and Dicer-1 interact with each other, as shown by co-immunoprecipitation in S2 cell extract. Recombinant Loqs could enhance miRNA production by Dicer-1 by increasing its affinity for the pre-miRNA substrate. Furthermore, depleting Loqs or Dicer-1 by dsRNA knockdown resulted in reduction of the miRNA-generating activity and accumulation of pre-miRNA in S2 cells. To study the physiological function of loqs in flies, we obtained a piggyBac (PB) fly strain in which the PB transposon was inserted into the first exon, before the translation start site of the loqs gene. Pre-miRNAs accumulate in the loqs PB flies, indicating they are defective for miRNA biogenesis. However, while both siRISC and miRISC activities are greatly reduced in dcr-1 null extract, these activities are not affected in loqs null extract, indicating that loqs is not essential for miRISC assembly. To test whether the known components are sufficient to reconstitute the miRNA pathway, recombinant AGO1 protein was expressed using the insect cell expression system. It is generally believed that siRISC slices, whereas miRISC represses translation of, cognate mRNA in animals. However, recombinant AGO1 can be programmed by single-stranded miRNA into a minimal miRISC that cleaves complementary mRNA in a sequence-specific manner in vitro. Furthermore, the catalytic activity of AGO1 is dependent on the consensus catalytic "DDH" motif. My present studies suggest that recombinant Dicer-1, Loqs and AGO1 are not sufficient to reconstitute the miRNA pathway, indicating that there are other unknown components to be discovered.
Building a Methodological Framework for Cell Fate Engineering
(August 2021) Li, Boxun; Xu, Jian; Hon, Gary C.; Banaszynski, Laura; Cleaver, Ondine; Munshi, Nikhil
Cell fate engineering has become an area of intense research in the last fifteen years. A useful framework of cell fate engineering should include three pillars: the discovery of new cell fate-reprogramming cocktails of factors, the evaluation of engineered cells, and the revelation of underlying molecular mechanisms. One major challenge has been the lack of a scalable screening approach in vitro for the performance of reprogramming cocktails. This limits the speed of discovering new cocktails that can efficiently reprogram diverse cell types. Such new cocktails are needed to unleash the full applicational potential of engineered cells in regenerative medicine, disease modeling, and drug discovery. Another challenge is that despite the advantages of in vivo reprogramming, such as more efficient and mature fate conversion, the underlying gene programs, and thereby the molecular mechanisms, have been largely unknown. This is in large part due to the difficulty of specifically isolating and analyzing reprogrammed cells, without contamination from their endogenous counterparts. To address these, in this thesis, I first develop Reprogram-seq, a method that screens thousands of transcription factor cocktails for their reprogramming performance by single-cell perturbation screens. Reprogram-seq found a cocktail of three factors that efficiently and functionally reprograms fibroblasts to epicardial-like cells. Thus, Reprogram-seq accelerates rational cell fate engineering. Next, I performed single-cell transcriptomic analysis of in vivo neurogenesis induced in astrocytes by a novel reprogramming factor, DLX2. This is enabled by a lineage tracer that highly specifically tracks all cells reprogrammed from astrocytes. My analysis reveals that DLX2 induces a neural stem cell-like behavior, transitioning from quiescence to activation, proliferation, and neurogenesis. 
Gene regulatory network analysis and mouse genetics identify and confirm key nodes mediating DLX2-dependent fate reprogramming. Therefore, this study dissects the gene programs of in vivo reprogramming with single-cell transcriptomics and paves the way for applying Reprogram-seq in vivo. Together, my thesis research has demonstrated that single-cell omic technologies accelerate the discovery of new reprogramming cocktails, streamline the transcriptional evaluation of engineered cells, and dissect gene programs that underlie reprogramming, contributing to all three pillars of the framework. I expect these methodologies to be generalizable to and useful for other cell fate engineering scenarios.
Cardiovascular Risk Factors Predict the Spatial Distribution of White Matter Hyperintensity
(2015-03-24) Banerjee, Soham; McColl, Roderick W.; Whittemore, Anthony W.; Hulsey, Keith M.
OBJECTIVES: To identify the different spatial distributions of white matter hyperintensity (WMH) associated with specific risk factors and use these distributions to estimate the extent of risk factor-associated WMH in an individual. MATERIALS AND METHODS: MRI brain images were obtained from 2066 healthy adult participants (858 males, 1208 females; mean age: 50) from a population-based sample. An automated algorithm generated each participant’s WMH distribution, registered onto the MNI-152 standard template. For univariate analysis, each risk factor group was compared to the non-risk factor group. Voxels in which WMH frequency was significantly higher (p<0.05) in the risk factor group were mapped. Multivariate analysis consisted of subgroup analysis to minimize confounding of one risk factor by the others. RESULTS: 431891 MNI-space voxels comprised the WMH distribution of the entire population. For univariate analysis, 23697 (5.5%) of these voxels were exclusively associated with hypertension and were prevalent in the anterior frontal lobe. Similarly, 24637 voxels (5.7%) were exclusively associated with diabetes and were prevalent at the callososeptal interface. 7315 voxels (1.7%) were only associated with hypercholesterolemia and did not form a discrete spatial distribution. 282115 voxels (65.3%) were not associated with any of the specified risk factors. Multivariate results corroborated the univariate findings. CONCLUSIONS: Each risk factor was associated with a different spatial distribution of WMH. Hypertension was associated with WMH in the anterior frontal lobe and diabetes was associated with WMH at the callososeptal interface.
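The percentages in the RESULTS above follow directly from the reported voxel counts. The short Python sketch below recomputes them as a sanity check; the variable and function names are illustrative, and only the counts come from the abstract. It also reports how many voxels fall outside the four listed categories, a remainder the abstract does not break down.

```python
# Voxel counts exactly as reported in the abstract.
TOTAL_WMH_VOXELS = 431891

exclusive_voxels = {
    "hypertension": 23697,
    "diabetes": 24637,
    "hypercholesterolemia": 7315,
    "no specified risk factor": 282115,
}


def percent_of_total(count, total=TOTAL_WMH_VOXELS):
    """Share of the population-wide WMH voxels, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


percentages = {name: percent_of_total(n) for name, n in exclusive_voxels.items()}

# Voxels not covered by any of the four categories listed in the abstract.
unaccounted = TOTAL_WMH_VOXELS - sum(exclusive_voxels.values())

for name, pct in percentages.items():
    print(f"{name}: {pct}%")
print(f"voxels outside the listed categories: {unaccounted}")
```

The recomputed values agree with the published 5.5%, 5.7%, 1.7% and 65.3%.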
Classification and Differentiation of Homologs and Structural Analogs
(2007-08-08) Cheng, Hua; Grishin, Nick V.
It is both meaningful and useful to study protein sequence, structure, and function in the context of evolution. In divergent evolution, homologs, or proteins having descended from a common ancestor, usually share sequence, structure, and functional properties, and an unknown protein's structure and function can be hypothesized from its experimentally characterized homologs. In convergent evolution, proteins from distinct evolutionary lineages converge to similar structures or functions, and these proteins are called "analogs". To classify proteins into evolutionary families, it is necessary to differentiate these two opposite scenarios. Statistically significant sequence similarity is commonly accepted as adequate evidence for homology. Yet in the absence of significant sequence similarity, discrimination between homology and analogy frequently requires manual work. This dissertation describes an effort in developing an automatic tool to differentiate remote homologs and structural analogs.
Computer Vision to Characterize Protein Interactions at the Cell Membrane
(2018-12-26) Vega, Anthony Raphael; Yu, Hongtao; Jaqaman, Khuloud; Schmid, Sandra; Grishin, Nick V.
Protein interactions at the cell membrane provide critical insight into how cells respond to and interact with their environment. Technological advances in light microscopy have allowed an unprecedented perspective into these interactions; however, manual analysis of the data has become increasingly insufficient to characterize interactions as these advances progress. Computer vision tools offer a powerful approach to automate analysis, overcoming limitations of manual analysis to optimize the discovery of novel interactions and their underlying mechanisms. In this thesis, I develop novel computer vision tools to probe the intensity and mobility properties of proteins on the cell membrane, and demonstrate how these can be used to provide insight into membrane protein interactions and organization.
Doctors need evolution the way engineers need physics, but they don't get it because of politics
(2018-09-11) Nesse, Randolph M.
[Note: The slides are not available from this event.] The past 25 years have seen many new applications of evolutionary biology in medicine. Some investigate why natural selection has left systems vulnerable, expanding medicine's perspective from that of a mechanic to that of an engineer. Others use phylogenetic methods to trace relationships among organisms, especially pathogens. These advances have inspired a score of books, new journals, a new scientific society, and undergraduate courses in most universities. However, no medical school teaches evolutionary biology the way other basic sciences are taught, so most doctors have misconceptions that are the equivalent of engineers believing in perpetual motion. Historical, practical, political and religious factors conspire to keep evolutionary biology separate from medicine. Clinical mistakes and slowed research progress result. Recognition of the problem and the opportunity is growing, but solutions are likely to be piecemeal until a new generation of doctors assumes leadership positions.
Effects of Size on Ovoid Anterior Septal Perforations: Physiologic Modeling With Computational Fluid Dynamics
(2016-04-04) Farzal, Zainab; Shah, Gopi; Mitchell, Ron; Ryan, Matthew
BACKGROUND: Nasal septal perforations (NSPs) often cause bleeding, crusting, obstruction, and/or whistling. Exposed cartilage along the perimeter of the perforation prolongs healing time. The perforation perimeter lies in the path of nasal airflow which could exacerbate these effects. Understanding the interaction of airflow with perforation edges can lead to better treatments for perforation symptoms. OBJECTIVE: To analyze the impact of NSP size on nasal physiology including its effects on airflow, heat and water vapor transport, wall shear, resistance, and humidification using computational fluid dynamics (CFD). METHODS: A 3-dimensional model of the nasal cavity was constructed from a radiologically normal CT scan using Mimics™ 17.0 imaging software (Materialise, Plymouth, MI). Ovoid anterior NSPs that were 0.5, 1, 2, and 3 cm long anterior-to-posteriorly were virtually created in the septum of the model. Perforation walls were divided into ventral, dorsal, anterior, and posterior regions in ICEM-CFD™ 15.0 (ANSYS, Canonsburg, PA). Planar surfaces at the nostrils and trachea were constructed for specifying inlet and outlet conditions on simulated airflow. Computational meshes of the airspaces, consisting of approximately 4 million unstructured, graded tetrahedral elements, were created. Steady-state inspiratory airflow, heat, and water vapor transport were simulated using Fluent™ CFD software 15.0 (ANSYS, Inc., Canonsburg, PA). Air crossover through the perforation, wall shear, heat flux, water vapor flux, resistance, and humidification were analyzed. RESULTS: Air crossover increased with perforation size with the highest crossover rate of 12.2% through the 3 cm NSP. Regionally, wall shear and heat and water vapor flux were highest along the posterior region and lowest anteriorly (p<0.05). Wall shear stress averaged over the entire perforation increased with NSP size.
The highest heat and water vapor flux averaged over the entire perforation occurred in the 2 cm NSP. Dorsal and ventral values for wall shear stress and heat and water vapor flux did not correlate with size. Resistance decreased by 5% or more from normal only in the 3 cm perforation case. No change in humidification with perforation size was evident. CONCLUSION: High wall shear and heat and water vapor flux in posterior perforation regions may explain the crusting most commonly noted on posterior edges of NSPs. This study suggests that smaller NSPs may not grossly affect nasal resistance or humidification, and that perforation size effects on individual airflow patterns may be important in dorsal and ventral perforation regions. Further studies will correlate these findings with clinical implications.
Establishing a Trustworthy First Approximation for Evolutionary Distances
(2016-04-18) Bromberg, Raquel; Chook, Yuh Min; Otwinowski, Zbyszek; Grishin, Nick V.; Hooper, Lora V.; Jaqaman, Khuloud
Advances in sequencing have generated a large number of complete genomes. Traditionally, phylogenetic analysis relies on alignments of orthologs, but defining orthologs and separating them from paralogs is a complex task that may not always be suited to the large datasets of the future. An alternative to traditional, alignment-based approaches is the use of whole-genome, alignment-free methods. These methods are scalable and require minimal manual intervention. I developed SlopeTree, a new alignment-free method that estimates evolutionary distances by measuring the decay of exact sub-sequence matches as a function of match length. SlopeTree corrects for horizontal gene transfer, for composition variation and low complexity sequences, and for branch-length nonlinearity caused by multiple mutations at the same site. SlopeTree also includes several optional features for removing mobile elements from proteomes, for reducing proteomes to their conserved core, for automatically identifying poor quality proteomes in large inputs, and for explicitly identifying pairs of organisms that have horizontally transferred genes and then identifying those genes. I tested SlopeTree on large and diverse sets of bacteria and archaea, and I also applied it at the strain level. I compared the SlopeTree trees to the NCBI taxonomy, to trees based on concatenated alignments, and to trees produced by other alignment-free methods. The results were consistent with current knowledge about prokaryotic evolution. I assessed differences in tree topology over different methods and settings and found that the majority of bacteria and archaea have a core set of proteins that evolves by descent. In trees built from complete genomes rather than from sets of core genes, I observed some grouping by phenotype rather than phylogeny.
In general, SlopeTree generates sensible topologies which are relatively stable between whole proteome and reduced proteome inputs, which validates the concept of species and phyla as having a core proteome evolving by descent, but not necessarily coevolving with the ribosome and its proteins.
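The central idea in this abstract, measuring how quickly exact sub-sequence matches disappear as the match length grows, can be illustrated with a toy Python sketch. This is not SlopeTree: it applies none of the corrections described above (horizontal gene transfer, composition, low complexity, branch-length nonlinearity), and the function names, the use of distinct shared k-mers, and the log-linear fit are all assumptions made for illustration.

```python
import math


def shared_kmer_count(a, b, k):
    """Number of distinct length-k substrings that occur in both sequences."""
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    return len(kmers_a & kmers_b)


def match_decay_slope(a, b, k_values):
    """Least-squares slope of log(shared k-mer count) against match length k.

    A more negative slope means exact matches vanish faster as k grows,
    which in this toy setting indicates more diverged sequences.
    """
    points = []
    for k in k_values:
        count = shared_kmer_count(a, b, k)
        if count > 0:  # log is undefined for zero shared matches
            points.append((k, math.log(count)))
    if len(points) < 2:
        raise ValueError("need at least two match lengths with shared k-mers")
    n = len(points)
    mean_k = sum(k for k, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    numerator = sum((k - mean_k) * (y - mean_y) for k, y in points)
    denominator = sum((k - mean_k) ** 2 for k, _ in points)
    return numerator / denominator


# Hypothetical toy peptides: the second differs from the first at two positions.
reference = "ACDEFGHIKL"
diverged = "ACDWFGHYKL"

slope_identical = match_decay_slope(reference, reference, [1, 2, 3])
slope_diverged = match_decay_slope(reference, diverged, [1, 2, 3])
print(f"identical pair slope: {slope_identical:.3f}")
print(f"diverged pair slope:  {slope_diverged:.3f}")
```

In this example the diverged pair shares 8, 5 and 2 distinct k-mers at k = 1, 2 and 3, so its slope is markedly steeper than that of the identical pair. A distance estimate would then be derived from such slopes; how SlopeTree does this is not specified in the abstract.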
Evolutionary Classification of Protein Domains: From Remote Homology to Family
(2017-11-20) Liao, Yuxing; Rizo-Rey, José; Grishin, Nick V.; Rice, Luke M.; Tomchick, Diana R.
Understanding the evolution of a protein, including both close and distant relationships, often reveals insight into its structure and function. A protein domain classification splits proteins into domains and organizes them according to their evolutionary history. Existing classification databases lag behind the pace of protein structure determination and do not include some known homologous relationships. I have participated in creating a hierarchical evolutionary classification of all proteins with experimentally determined spatial structures and developed a website for easy access and searches with keyword, sequence or structure (http://prodata.swmed.edu/ecod). ECOD (Evolutionary Classification Of protein Domains) is distinct from other structural classifications in that it groups domains primarily by evolutionary relationships (homology), rather than topology (or fold). Our database uniquely emphasizes distantly related homologs that are difficult to detect, and thus catalogs the largest number of evolutionary relationships among structural domain classifications. Placing distant homologs together underscores the ancestral similarities of these proteins and draws attention to the most important regions of sequence and structure, as well as conserved functional sites. The classification is assisted by an automated pipeline that classifies most new structures in the Protein Data Bank weekly. This synchronization uniquely distinguishes ECOD among all protein classifications. For proteins that lack confident results from the automatic pipeline, I rely on information from literature, sequence and structure similarity scores, visual comparison and experience to classify them manually. I document the manual curation process in detail with an example of the remote homology between an autoproteolytic domain found in GPCR-Autoproteolysis Inducing domain, ZU5 and nucleoporin98. ECOD also recognizes closer relationships at the family level, initially with Pfam families.
However, existing family databases do not cover all structures and disagree with ECOD in terms of domain definition and boundary. I generate multiple sequence alignments and profiles for domains in the same family using structural information and demonstrate that the alignment quality is similar to that of manually checked Pfam seed alignments. I compare ECOD family profiles with Pfam and the Conserved Domain Database and discuss the improvement of domain boundaries over known families and the dominance of small families among new families.
Toward Structural and Functional Predictions from Biological Sequences
(2018-05-25) Li, Wenlin; Otwinowski, Zbyszek; Grishin, Nick V.; Thomas, Philip J.; Rosenbaum, Daniel M.
Biological sequences, including DNA and protein sequences, are believed to encode sufficient information to determine the structure and function of biological molecules, which in turn decide the phenotypic traits of animals. Deciphering biological sequences is an important and multiscale problem that connects the information flow from genotypes to phenotypes. Recent advances in next-generation sequencing technology have provided vast amounts of sequencing data, demanding innovations in computational algorithms for better interpretation. I developed computational methodologies to understand biological sequences at various levels. At the primary sequence level, I analyzed the evolutionary information encoded in protein families and predicted the function (and active sites) of the proteins. To aid my sequence analysis, I developed a set of computational methodologies and deployed them as public web servers. At the protein structure level, I studied the plasticity of 3D structures and demonstrated its effect on the uncertainty of computational scoring algorithms. At the organism level, I devised computational methodology to assemble and analyze complete genomes of butterflies and discovered convergent evolution in butterfly wing patterns. In conclusion, I advanced the knowledge of biological sequences at multiple levels by computational approaches.
Using Molecular and Clinical Data to Stratify Cancer Patients for Precision Medicine
(2019-04-15) Ci, Bo; Skapek, Stephen X.; Minna, John D.; Zhan, Xiaowei; Xie, Yang; Xiao, Guanghua
Cancers are heterogeneous across different individuals. Insights derived from clinical and/or molecular data could be used to develop robust patient stratification models to tailor treatments for each individual patient, in order to improve patient outcome and reduce deleterious side effects. My thesis research mainly focused on using computational methods to understand the biological/clinical differences between disease subgroups and their clinical implications in two diseases, lung cancer and germ cell tumor. A comprehensive analysis using The Cancer Genome Atlas (TCGA) lung adenocarcinoma datasets showed that FOXM1 was likely to play an important role in the morphological differences among subgroups of invasive lung adenocarcinoma. In collaboration with the Malignant Germ Cell International Consortium (MaGIC), I developed the MaGIC data dictionary as a uniform data standard to build a germ cell tumor data commons. The MaGIC data commons was then used to harmonize and integrate the patient and genomic data from both MaGIC and the public domain. Concurrently, I also developed a prognostic model for pediatric extracranial germ cell tumor using the integrated dataset, identifying older age, higher disease stage and extragonadal site as adverse prognostic factors. The model was evaluated in an independent dataset combining data from Brazilian and French clinical trials.
Using Multiple Screening Strategies to Biologically and Chemically Characterize Natural Products
(2016-06-10) Oswald, Nathaniel Walter; De Brabander, Jef K.; MacMillan, John; Corey, David R.; White, Michael A.
Natural products play an important role in the discovery and development of therapeutics and biological probes. However, in recent decades therapeutic screening efforts have moved away from using natural product libraries, instead opting for large synthetic molecule libraries. This move has corresponded with a move from phenotypic screens to target-based screening approaches. The perceived incompatibility of natural product libraries with target-based screening efforts is often cited for these shifts. Herein we discuss the benefits of phenotypic screens and advances in bioinformatic approaches to improve natural product discovery using phenotypic screens. We describe the development and implementation of a screen using a natural product fraction library of ~9000 fractions to screen 26 non-small cell lung cancer cell lines for selective cytotoxic compounds. A screen of this magnitude is unprecedented in academia; therefore, we developed a process to rapidly identify and characterize natural products of interest. Natural product fractions with selective toxicity were filtered to ~1000 high-priority natural product fractions. Using LC-MS analysis and bioinformatic approaches, Elastic Net (EN) and Functional Signatures of Ontology (FuSiOn), we further prioritized these 1000 natural product fractions based on sensitivity and mechanism of action predictions, and chemical complexity. The implementation of this prioritization process has resulted in an effective discovery pipeline. Ikarugamycin, a selective cytotoxin and endocytosis inhibitor, has been characterized for cytotoxicity and as a chemical tool for inhibiting endocytosis. Piericidin A, a known complex I inhibitor, displays extreme selective toxicity to a subset of cancer cell lines independent of its complex I inhibition; however, we can predict this sensitivity using a common genetic biomarker. Other natural products have also been identified, although their biological characterization is ongoing.
Using FuSiOn we effectively identified a minor metabolite (N6,N6-dimethyladenosine) responsible for AKT inhibition. FuSiOn was implemented in conjunction with our non-small cell lung cancer cell line screen to characterize the mechanisms of action of natural product fractions. This correlation led to the rapid identification of bafilomycin among our prioritized 1000 natural product fractions. The process we outline herein effectively uses bioinformatics (EN and FuSiOn) in concert with primary screening data to select for those natural product fractions of greatest biologic and chemical significance.