Browsing by Subject "Sequence Analysis, Protein"

Algorithmic Developments for Sequence Analysis, Structure Modeling and Functional Prediction of Proteins
(2006-12-20) Qi, Yuan; Grishin, Nick V.
Sequence, structure and function, being the three most important properties of proteins, are interrelated through homology relationships. In this post-genome era, we are equipped with abundant sequence information. Homology inference is thus of great practical importance because of its ability to make structural and functional predictions through sequence analysis. In an effort to explore and utilize the protein sequence-structure-function relationships, with homology detection and utilization as the central scheme, this work concentrates on the algorithmic development of methods and systems for sequence similarity search, structure modeling and functional prediction, and also performs structure prediction and classification for specific protein families. Three algorithmic developments are described in this dissertation. First, to facilitate identification of structurally or functionally important interactions between positions in a protein family, a program has been developed to perform positional correlation analysis of multiple sequence alignments using different methods. The program has been shown to be useful for identifying functionally important position pairs or networks of correlated positions. Second, to further increase the sensitivity of sequence similarity search methods in terms of homology detection and structure modeling ability, a method has been developed by incorporating predicted secondary structure information with sequence profiles. Evaluation on a PFAM-based system shows that this method provides improved structure template detection ability and generates alignments of better quality. Third, in order to systematically assess the structure modeling abilities of different sequence similarity search programs, a comprehensive evaluation system has been developed.
This large-scale automatic evaluation system assesses the fold recognition ability and alignment quality of different programs from global and local perspectives using both reference-dependent and reference-independent approaches, which provides an instrument to understand the progress and limitations of the field. Two structure prediction and classification projects using manual analysis and existing tools are also described in this dissertation. First, the structure of the C-terminal domain of Gyrase A is predicted through an inferred homology relationship with the regulator of chromosome condensation (RCC1). This prediction has been validated by experimental data. Second, a hierarchical structure classification of thioredoxin-like fold proteins has been carried out, which promotes understanding of fold definitions and sequence-structure-function relationships.
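As a rough illustration of what positional correlation analysis of a multiple sequence alignment can look like, the sketch below scores every pair of alignment columns by mutual information. The abstract does not say which correlation measures the program implements, so mutual information is an assumption made here for illustration, and the function names and toy alignment are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def column(alignment, i):
    """Return the residues found at position i across all aligned sequences."""
    return [seq[i] for seq in alignment]


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * log2(p_ab * n * n / (freq_a[a] * freq_b[b]))
    return mi


def correlated_pairs(alignment):
    """Rank all column pairs from most to least correlated."""
    length = len(alignment[0])
    scores = {
        (i, j): mutual_information(column(alignment, i), column(alignment, j))
        for i, j in combinations(range(length), 2)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy alignment: columns 0 and 1 vary together,
# column 2 is invariant, column 3 varies independently of the rest.
toy_alignment = [
    "AKLE",
    "AKLD",
    "GRLE",
    "GRLD",
]
ranking = correlated_pairs(toy_alignment)
```

On this toy input only the column pair (0, 1) receives a non-zero score. Real alignments would additionally need gap handling, sequence weighting, and a correction for background correlation, none of which is attempted here.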
Distinct Functional Phases in Proteins: A Test by Large-Scale Protein Design
(2017-03-24) Subramanian, Subramanian Kanagarajan; Yu, Hongtao; Ranganathan, Rama; Sternweis, Paul C.; Rice, Luke M.
The biological properties of proteins - folding, biochemical functions, and evolvability - originate from the global pattern of interactions between amino acids. Coevolution studies suggest a model for this pattern in which the essential constraints are loaded in sparse networks of cooperative residues (termed sectors), embedded within an environment of weakly coupled residues. Here, we test this biphasic model for proteins using a protein design approach in the SHO1-mediated yeast osmo-sensing pathway. We computationally designed libraries of synthetic SHO1 SH3 domains in which the hierarchy of coevolution that defines sectors and their environment is gradually varied. We tested the designed sequences in a quantitative high-throughput assay for SHO1 function in vivo. The data show that sector amino acids contribute in an all-or-nothing fashion while surrounding amino acids have a more graded, near-independent contribution to function. These results support the biphasic model for the information content of protein sequences.
Exploring Sequence-Structure-Function Relationships in Proteins Using Classification Schemes
(2005-12-19) Cheek, Sara Anne; Grishin, Nick V.
With the rapid growth in the number of available protein sequences and structures, the necessity of interpreting this data in comprehensive and meaningful ways becomes increasingly apparent. Identifying and categorizing the functional, structural, and evolutionary relationships between proteins is a key step in understanding protein evolution. Protein classification is a useful means of organizing biological data for the purpose of exploring these sequence-structure-function relationships in proteins. In this work, two-tier classification schemes are constructed for the organization of large protein classes. One level of this hierarchy reflects structural similarity ("fold groups"), while the second level indicates an evolutionary relationship between members ("families"). Kinases are a ubiquitous group of enzymes that participate in a variety of cellular pathways. Although all kinases catalyze similar phosphoryl transfer reactions, they display remarkable diversity in structural fold and substrate specificity. All available kinase sequences and structures have been classified into fold groups and families. This classification presents the first comprehensive structural annotation of a large functional class of proteins. The question of how different structural folds accomplish the same fundamental elements of the kinase reaction is investigated. Disulfide-rich domains are small protein domains whose global folds are stabilized predominantly by disulfide bonds. In order to understand the structural and functional diversity among available disulfide-rich proteins, a comprehensive classification of these domains has been performed. The resulting fold groups and families describe more distant structural and evolutionary relationships than previously acknowledged among disulfide-rich domains. Variations in disulfide bonding patterns of these domains are also evaluated.
Several existing classification databases have been developed for the purpose of cataloguing all available protein structures. Because such databases are often manually curated, recently solved structures are not included and useful information regarding their relatedness to other proteins is not immediately available. To address this limitation, an algorithm has been developed to make classification assignments with evolutionary relevance for domains in newly solved structures, with the objective of reliably reproducing assignments to an existing classification scheme in an automatic manner.
Toward Structural and Functional Predictions from Biological Sequences
(2018-05-25) Li, Wenlin; Otwinowski, Zbyszek; Grishin, Nick V.; Thomas, Philip J.; Rosenbaum, Daniel M.
Biological sequences, including DNA and protein sequences, are believed to encode sufficient information to determine the structure and function of biological molecules, which in turn determine the phenotypic traits of animals. Deciphering biological sequences is an important, multiscale problem that connects the information flow from genotypes to phenotypes. Recent advances in next-generation sequencing technology have provided vast amounts of sequencing data, demanding innovations in computational algorithms for better interpretation. I developed computational methodologies to understand biological sequences at various levels. At the primary sequence level, I analyzed the evolutionary information encoded in protein families and predicted the function (and active sites) of the proteins. To aid my sequence analysis, I developed a set of computational methodologies and deployed them as public web-servers. At the protein structure level, I studied the plasticity of 3D structures and demonstrated its effect on the uncertainty of computational scoring algorithms. At the organism level, I developed new computational methodology to assemble and analyze complete genomes of butterflies and discovered convergent evolution in butterfly wing patterns. In conclusion, I advanced the knowledge of biological sequences at multiple levels through computational approaches.
Towards Prediction of Phenotype from Genotype
(2017-04-14) Cong, Qian; Otwinowski, Zbyszek; Grishin, Nick V.; Hobbs, Helen H.; Deisenhofer, Johann
Predicting phenotype from genotype represents the epitome of biological questions. As a multiscale problem, it starts from predicting exons and culminates with modeling of whole organisms. Focusing on the molecular level, I studied the relationship between sequences and protein spatial structures and analyzed proteins with similar sequences but different structures. To aid the assessment of structure prediction, I developed a method to rank the predictions of proteins with new folds, a very challenging problem that was previously addressed by expert inspection. Then, I developed a set of computer programs and scripts to predict various structural and functional properties of proteins from their sequences and implemented them as a public web-server. I applied these methods to important agricultural (citrus disease) and medical (Ebolavirus) problems. Moving on to organismal-level predictions, I sequenced, annotated and analyzed complete genomes of butterflies and suggested hypotheses about genetic determinants of their behavior and other phenotypic traits. Taken together, these applications highlight the achievements possible today and the challenges that lie ahead.