Evolutionary Classification of Protein Domains: From Remote Homology to Family

dc.contributor.advisor: Rizo-Rey, José [en]
dc.contributor.committeeMember: Grishin, Nick V. [en]
dc.contributor.committeeMember: Rice, Luke M. [en]
dc.contributor.committeeMember: Tomchick, Diana R. [en]
dc.creator: Liao, Yuxing [en]
dc.date.accessioned: 2020-01-02T18:22:37Z
dc.date.available: 2020-01-02T18:22:37Z
dc.date.created: 2017-12
dc.date.issued: 2017-11-20
dc.date.submitted: December 2017
dc.date.updated: 2020-01-02T18:22:37Z
dc.description.abstract: Understanding the evolution of a protein, including both its close and distant relationships, often reveals insight into its structure and function. A protein domain classification splits proteins into domains and organizes them according to their evolutionary history. Existing classification databases lag behind the pace of protein structure determination and omit some known homologous relationships. I have participated in creating a hierarchical evolutionary classification of all proteins with experimentally determined spatial structures, and I developed a website for easy access and for searches by keyword, sequence, or structure (http://prodata.swmed.edu/ecod). ECOD (Evolutionary Classification Of protein Domains) is distinct from other structural classifications in that it groups domains primarily by evolutionary relationship (homology) rather than by topology (or fold). Our database uniquely emphasizes distantly related homologs that are difficult to detect, and it therefore catalogs more evolutionary relationships than any other structural domain classification. Placing distant homologs together underscores the ancestral similarities of these proteins and draws attention to the most important regions of sequence and structure, as well as to conserved functional sites. The classification is assisted by an automated pipeline that classifies most new structures in the Protein Data Bank (PDB) every week. This weekly synchronization with the PDB distinguishes ECOD from all other protein classifications. For proteins that lack confident results from the automated pipeline, I classify them manually, relying on the literature, sequence and structure similarity scores, visual comparison, and experience. I document the manual curation process in detail, using as an example the remote homology among the autoproteolytic domains found in the GPCR-Autoproteolysis Inducing (GAIN) domain, ZU5, and nucleoporin 98 (Nup98).
ECOD also recognizes closer relationships at the family level, initially using Pfam families. However, existing family databases do not cover all structures, and they disagree with ECOD on domain definitions and boundaries. I generate multiple sequence alignments and profiles for domains in the same family using structural information, and I demonstrate that the alignment quality is comparable to that of manually checked Pfam seed alignments. I compare ECOD family profiles with Pfam and the Conserved Domain Database (CDD), and I discuss the improvement in domain boundaries over known families and the predominance of small families among the new families. [en]
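The abstract describes two family-level computations: building a profile from a family alignment and comparing domain boundaries across databases. The following minimal Python sketch illustrates the general shape of both. It is not ECOD's pipeline. The toy alignment, the uniform pseudocount, the absence of sequence weighting, the Jaccard boundary measure, and the helper names `column_profile`, `column_entropy`, and `boundary_overlap` are all hypothetical choices made for illustration.

```python
from collections import Counter
from math import log2

# The 20 standard amino acids; gaps ('-') and other symbols are skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_profile(msa, pseudocount=1.0):
    """Per-column amino-acid frequencies for one aligned family.

    Hypothetical helper: no sequence weighting is applied, and a uniform
    pseudocount keeps residues unseen in a column from getting probability 0.
    """
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("aligned sequences must all have the same length")
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in msa if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile


def column_entropy(freqs):
    """Shannon entropy (bits) of one profile column; lower means more conserved."""
    return -sum(p * log2(p) for p in freqs.values() if p > 0)


def boundary_overlap(a, b):
    """Jaccard overlap of two inclusive residue ranges given as (start, end).

    Hypothetical measure for comparing one domain's boundaries in two
    databases (e.g. an ECOD domain vs. a Pfam or CDD match on the same chain).
    """
    residues_a = set(range(a[0], a[1] + 1))
    residues_b = set(range(b[0], b[1] + 1))
    return len(residues_a & residues_b) / len(residues_a | residues_b)


if __name__ == "__main__":
    # Toy alignment, invented for illustration only.
    family = ["MKV-LA", "MRVALA", "MKIALG"]
    profile = column_profile(family)
    for i, column in enumerate(profile):
        print(f"column {i}: entropy {column_entropy(column):.2f} bits")
    # Hypothetical boundaries for one domain as defined in two databases.
    print(f"boundary overlap: {boundary_overlap((10, 120), (15, 130)):.3f}")
```

The Jaccard ratio penalizes N- or C-terminal extensions and truncations symmetrically. That makes it convenient for a sketch, but it may not match how boundary improvements are actually scored in the thesis.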
dc.format.mimetype: application/pdf [en]
dc.identifier.oclc: 1134689293
dc.identifier.uri: https://hdl.handle.net/2152.5/7740
dc.language.iso: en [en]
dc.subject: Computational Biology [en]
dc.subject: Databases, Protein [en]
dc.subject: Protein Structure, Tertiary [en]
dc.subject: Proteins [en]
dc.subject: Receptors, G-Protein-Coupled [en]
dc.subject: Terminology as Topic [en]
dc.title: Evolutionary Classification of Protein Domains: From Remote Homology to Family [en]
dc.type: Thesis [en]
dc.type.material: text [en]
thesis.degree.department: Graduate School of Biomedical Sciences [en]
thesis.degree.discipline: Molecular Biophysics [en]
thesis.degree.grantor: UT Southwestern Medical Center [en]
thesis.degree.level: Doctoral [en]
thesis.degree.name: Doctor of Philosophy [en]

Files

Original bundle

Name: LIAO-DISSERTATION-2017.pdf
Size: 2.87 MB
Format: Adobe Portable Document Format

License bundle

Name: LICENSE.txt
Size: 1.84 KB
Format: Plain Text