Browsing by Subject "Sequence Alignment"

Classification and Differentiation of Homologs and Structural Analogs
(2007-08-08) Cheng, Hua; Grishin, Nick V.
It is both meaningful and useful to study protein sequence, structure, and function in the context of evolution. In divergent evolution, homologs, or proteins descended from a common ancestor, usually share sequence, structure, and functional properties, and an unknown protein's structure and function can be hypothesized from its experimentally characterized homologs. In convergent evolution, proteins from distinct evolutionary lineages converge to similar structures or functions, and these proteins are called "analogs". To classify proteins into evolutionary families, it is necessary to differentiate these two opposite scenarios. Statistically significant sequence similarity is commonly accepted as adequate evidence for homology. Yet in the absence of significant sequence similarity, discrimination between homology and analogy frequently requires manual work. This dissertation describes an effort to develop an automatic tool that differentiates remote homologs from structural analogs.
High-Performance Software Development for Genomic Sequence Alignment and Analysis
(2023-05-01) Zhang, Yun; Zhan, Xiaowei; Kim, Daehwan; Li, Bo; Wang, Tao; Hon, Gary C.
Nucleic acid sequencing technology is a powerful tool for understanding genetic information. Genomic data analysis software is critical for transforming complex sequencing results into meaningful biological information. Emerging sequencing technologies help scientists to understand biological processes from multiple angles, but they also raise the challenge of developing new sequence analysis tools, especially new alignment methods, to support these techniques. In this dissertation, I developed a rapid and accurate sequence alignment program, HISAT-3N, to solve the alignment problem of nucleotide conversion (NC) sequencing technologies. NC technologies, such as BS-seq and SLAM-seq, involve converting one type of nucleotide to another, which allows researchers to identify specific chemical modifications in DNA or RNA molecules. However, the conversions generated in these NC technologies make it difficult to align the reads back to the reference genome. To solve this issue, I implemented the 3-letter alignment algorithm in HISAT2, which our lab developed previously, to create HISAT-3N. I thoroughly tested HISAT-3N and demonstrated that it is more than seven times faster than, and more accurate than, widely used sequence aligners, and that it can support all types of nucleotide conversion sequencing technologies, including those that have not yet been developed. Additionally, to generalize the process of developing new alignment methods to support new sequencing technologies, I created a platform that allows for the modularized design of sequence alignment software. This platform incorporates algorithms from HISAT2, STAR, and BWA, providing greater efficiency for developers to create novel sequence alignment software and more flexibility for users to analyze different types of data in a variety of computational environments.
Finally, I developed a metagenomics analysis pipeline that effectively organizes and manages multiple well-known sequence analysis programs for rapid and accurate soil microbial analysis. The successful development and implementation of these tools demonstrate the robustness of well-designed bioinformatics software and pipeline frameworks. Overall, my work emphasizes the significance of continuously improving genomics data analysis tools. This is important to support emerging sequencing technologies and deliver more precise results, which assist researchers in revealing valuable genetic information.
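The 3-letter idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not the HISAT-3N implementation, which builds on the HISAT2 index; it is a naive forward-strand scan under stated assumptions: a C-to-T conversion (as in bisulfite-style data), exact matching only, and hypothetical function names (`to_three_letter`, `find_three_letter_hits`) introduced purely for illustration. Both the read and the reference are collapsed onto a 3-letter alphabet so that converted bases no longer count as mismatches, and the original sequences are then compared to count converted and unconverted positions.

```python
def to_three_letter(seq, src="C", dst="T"):
    """Collapse nucleotide `src` onto `dst`, leaving a 3-letter alphabet."""
    return seq.upper().replace(src, dst)


def find_three_letter_hits(read, reference, src="C", dst="T"):
    """Naive forward-strand search in 3-letter space (illustrative only).

    Returns a list of (start, converted, unconverted) tuples, where
    `converted` counts reference `src` bases read as `dst`, and
    `unconverted` counts reference `src` bases still read as `src`.
    """
    read, reference = read.upper(), reference.upper()
    read3 = to_three_letter(read, src, dst)
    ref3 = to_three_letter(reference, src, dst)
    hits = []
    for start in range(len(reference) - len(read) + 1):
        # Compare in the collapsed alphabet, where conversions are invisible.
        if ref3[start:start + len(read)] != read3:
            continue
        pairs = list(zip(reference[start:start + len(read)], read))
        # A read showing `src` where the reference has `dst` cannot be
        # explained by a src-to-dst conversion, so reject that placement.
        if any(r == dst and q == src for r, q in pairs):
            continue
        converted = sum(r == src and q == dst for r, q in pairs)
        unconverted = sum(r == src and q == src for r, q in pairs)
        hits.append((start, converted, unconverted))
    return hits


if __name__ == "__main__":
    reference = "AACGTCCGA"
    read = "ATGTCT"  # reference[1:7] with two of its three C bases read as T
    print(find_three_letter_hits(read, reference))
```

A real aligner would also search the reverse strand (where the same conversion appears as G-to-A), tolerate mismatches and indels, and use an index rather than a linear scan.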
Improving Profile Similarity Search and Alignment of Protein Sequences
(2015-11-20) Tong, Jing; Ranganathan, Rama; Otwinowski, Zbyszek; Borek, Dominika; Grishin, Nick V.
Protein function prediction is one of the most important problems in the field of computational biology. The most reliable method to predict protein function is to detect homologs. Homologous proteins tend to possess conserved sequence motifs, the same structural folds, and similar functional sites. Current sequence-based homology search methods are still unable to detect many similarities evident from protein spatial structures. We present a new method, COMPADRE, to assess the relationship between the query sequence and a hit in the database by considering the similarity between the query and the hit's known homologs. This method markedly boosts the homology detection precision rate. Successful homology-based protein function prediction also depends on accurate alignment between a protein sequence and its homolog. Alignment errors are the main bottleneck for homology modeling when the query is distantly related to the template. Alignment methods often misalign secondary structural elements by a few residues. We present a refinement method, SFESA, to improve pairwise sequence alignments by evaluating alignment variants generated by local shifts of template-defined secondary structures. The potential value of these methods for structure/function predictions is illustrated by the detection of homology between evolutionarily distant yet structurally similar protein domains.
PROCAIN: Protein Profile Comparison with Assisting Information
(2009-06-19) Wang, Yong; Grishin, Nick V.
Detection of remote sequence homology is essential for the accurate inference of protein structure, function, and evolution. The most sensitive detection methods involve the comparison of evolutionary patterns reflected in multiple sequence alignments (MSAs) of protein families. We present PROCAIN, a new method for MSA comparison based on the combination of 'vertical' MSA context (substitution constraints at individual sequence positions) and 'horizontal' context (patterns of residue content at multiple positions). Based on a simple and tractable profile methodology and primitive measures for the similarity of horizontal MSA patterns, the method achieves a quality of homology detection comparable to that of a more complex, advanced method employing hidden Markov models and secondary structure prediction. Adding secondary structure information further improves PROCAIN performance beyond the capabilities of current state-of-the-art tools. The potential value of the method for structure/function predictions is illustrated by the detection of subtle homology between evolutionarily distant yet structurally similar protein domains. ProCAIn, relevant databases and tools can be downloaded from http://prodata.swmed.edu/procain/download. The web server can be accessed at http://prodata.swmed.edu/procain/procain.php.