Browsing by Subject "Databases, Protein"

Classification and Differentiation of Homologs and Structural Analogs
(2007-08-08) Cheng, Hua; Grishin, Nick V.
It is both meaningful and useful to study protein sequence, structure, and function in the context of evolution. In divergent evolution, homologs, or proteins that have descended from a common ancestor, usually share sequence, structure, and functional properties, and an unknown protein's structure and function can be hypothesized from its experimentally characterized homologs. In convergent evolution, proteins from distinct evolutionary lineages converge to similar structures or functions, and these proteins are called "analogs". To classify proteins into evolutionary families, it is necessary to differentiate these two opposite scenarios. Statistically significant sequence similarity is commonly accepted as adequate evidence for homology. Yet in the absence of significant sequence similarity, discrimination between homology and analogy frequently requires manual work. This dissertation describes an effort to develop an automatic tool that differentiates remote homologs from structural analogs.
Evolutionary Classification of Protein Domains: From Remote Homology to Family
(2017-11-20) Liao, Yuxing; Rizo-Rey, José; Grishin, Nick V.; Rice, Luke M.; Tomchick, Diana R.
Understanding the evolution of a protein, including both close and distant relationships, often reveals insight into its structure and function. A protein domain classification splits proteins into domains and organizes them according to their evolutionary history. Existing classification databases lag behind the pace of protein structure determination and do not include some known homologous relationships. I have participated in creating a hierarchical evolutionary classification of all proteins with experimentally determined spatial structures and developed a website for easy access and searches by keyword, sequence, or structure (http://prodata.swmed.edu/ecod). ECOD (Evolutionary Classification Of protein Domains) is distinct from other structural classifications in that it groups domains primarily by evolutionary relationships (homology), rather than topology (or fold). Our database uniquely emphasizes distantly related homologs that are difficult to detect, and thus catalogs the largest number of evolutionary relationships among structural domain classifications. Placing distant homologs together underscores the ancestral similarities of these proteins and draws attention to the most important regions of sequence and structure, as well as conserved functional sites. The classification is assisted by an automated pipeline that classifies most new structures in the Protein Data Bank weekly. This synchronization uniquely distinguishes ECOD among all protein classifications. For proteins that lack confident results from the automatic pipeline, I rely on information from the literature, sequence and structure similarity scores, visual comparison, and experience to classify them manually. I document the manual curation process in detail with an example of the remote homology among autoproteolytic domains found in the GPCR-Autoproteolysis Inducing domain, ZU5, and nucleoporin98. ECOD also recognizes closer relationships at the family level, initially with Pfam families.
However, existing family databases do not cover all structures and disagree with ECOD in terms of domain definitions and boundaries. I generate multiple sequence alignments and profiles for domains in the same family using structural information and demonstrate that the alignment quality is similar to that of manually checked Pfam seed alignments. I compare ECOD family profiles with Pfam and the Conserved Domain Database and discuss the improvement in domain boundaries over known families and the dominance of small families among new families.
Exploring Sequence-Structure-Function Relationships in Proteins Using Classification Schemes
(2005-12-19) Cheek, Sara Anne; Grishin, Nick V.
With the rapid growth in the number of available protein sequences and structures, the necessity of interpreting these data in comprehensive and meaningful ways becomes increasingly apparent. Identifying and categorizing the functional, structural, and evolutionary relationships between proteins is a key step in understanding protein evolution. Protein classification is a useful means of organizing biological data for the purpose of exploring these sequence-structure-function relationships in proteins. In this work, two-tier classification schemes are constructed for the organization of large protein classes. One level of this hierarchy reflects structural similarity ("fold groups"), while the second level indicates an evolutionary relationship between members ("families"). Kinases are a ubiquitous group of enzymes that participate in a variety of cellular pathways. Although all kinases catalyze similar phosphoryl transfer reactions, they display remarkable diversity in structural fold and substrate specificity. All available kinase sequences and structures have been classified into fold groups and families. This classification presents the first comprehensive structural annotation of a large functional class of proteins. The question of how different structural folds accomplish the same fundamental elements of the kinase reaction is investigated. Disulfide-rich domains are small protein domains whose global folds are stabilized predominantly by disulfide bonds. In order to understand the structural and functional diversity among available disulfide-rich proteins, a comprehensive classification of these domains has been performed. The resulting fold groups and families describe more distant structural and evolutionary relationships than previously acknowledged among disulfide-rich domains. Variations in disulfide bonding patterns of these domains are also evaluated.
Several existing classification databases have been developed for the purpose of cataloguing all available protein structures. Because such databases are often manually curated, recently solved structures are not included and useful information regarding their relatedness to other proteins is not immediately available. To address this limitation, an algorithm has been developed to make evolutionarily relevant classification assignments for domains in newly solved structures, with the objective of automatically and reliably reproducing assignments to an existing classification scheme.