Exploring Sequence-Structure-Function Relationships in Proteins Using Classification Schemes
With the rapid growth in the number of available protein sequences and structures, the need to interpret these data in comprehensive and meaningful ways becomes increasingly apparent. Identifying and categorizing the functional, structural, and evolutionary relationships between proteins is a key step in understanding protein evolution. Protein classification is a useful means of organizing biological data for exploring these sequence-structure-function relationships. In this work, two-tier classification schemes are constructed for the organization of large protein classes. One level of this hierarchy reflects structural similarity ("fold groups"), while the second level indicates an evolutionary relationship between members ("families").
Kinases are a ubiquitous group of enzymes that participate in a variety of cellular pathways. Although all kinases catalyze similar phosphoryl transfer reactions, they display remarkable diversity in structural fold and substrate specificity. All available kinase sequences and structures have been classified into fold groups and families. This classification presents the first comprehensive structural annotation of a large functional class of proteins. The question of how different structural folds accomplish the same fundamental elements of the kinase reaction is also investigated.
Disulfide-rich domains are small protein domains whose global folds are stabilized predominantly by disulfide bonds. To understand the structural and functional diversity among available disulfide-rich proteins, a comprehensive classification of these domains has been performed. The resulting fold groups and families describe more distant structural and evolutionary relationships among disulfide-rich domains than were previously acknowledged. Variations in the disulfide bonding patterns of these domains are also evaluated.
Several existing classification databases have been developed to catalogue all available protein structures. Because such databases are often manually curated, recently solved structures are not included, and useful information about their relatedness to other proteins is not immediately available. To address this limitation, an algorithm has been developed that makes evolutionarily relevant classification assignments for domains in newly solved structures, with the objective of automatically and reliably reproducing assignments to an existing classification scheme.