Evolutionary Classification of Protein Domains: From Remote Homology to Family
Understanding the evolution of a protein, including both its close and distant relationships, often yields insight into its structure and function. A protein domain classification splits proteins into domains and organizes them according to their evolutionary history. Existing classification databases lag behind the pace of protein structure determination and do not include some known homologous relationships. I have participated in creating a hierarchical evolutionary classification of all proteins with experimentally determined spatial structures and developed a website for easy access and searches by keyword, sequence, or structure (http://prodata.swmed.edu/ecod). ECOD (Evolutionary Classification Of protein Domains) is distinct from other structural classifications in that it groups domains primarily by evolutionary relationships (homology), rather than topology (or fold). Our database uniquely emphasizes distantly related homologs that are difficult to detect, and thus catalogs the largest number of evolutionary relationships among structural domain classifications. Placing distant homologs together underscores the ancestral similarities of these proteins and draws attention to the most important regions of sequence and structure, as well as conserved functional sites. The classification is assisted by an automated pipeline that classifies most new structures in the Protein Data Bank each week. This synchronization uniquely distinguishes ECOD among all protein classifications. For proteins that lack confident results from the automatic pipeline, I rely on information from the literature, sequence and structure similarity scores, visual comparison, and experience to classify them manually. I document the manual curation process in detail with an example: the remote homology of an autoproteolytic domain found in the GPCR-Autoproteolysis Inducing domain, ZU5, and nucleoporin98. ECOD also recognizes closer relationships at the family level, initially with Pfam families.
However, existing family databases do not cover all structures and disagree with ECOD on domain definitions and boundaries. Using structural information, I generate a multiple sequence alignment and profile for the domains in each family and demonstrate that the alignment quality is similar to that of manually checked Pfam seed alignments. I compare ECOD family profiles with Pfam and the Conserved Domain Database, and discuss the improved domain boundaries relative to known families and the predominance of small families among the new families.
Protein Structure, Tertiary
Terminology as Topic