Browsing by Author "Wang, Tao"
Item: High-Performance Software Development for Genomic Sequence Alignment and Analysis (2023-05-01)
Zhang, Yun; Zhan, Xiaowei; Kim, Daehwan; Li, Bo; Wang, Tao; Hon, Gary C.
Nucleic acid sequencing technology is a powerful tool for understanding genetic information. Genomic data analysis software is critical for transforming complex sequencing results into meaningful biological information. Emerging sequencing technologies help scientists understand biological processes from multiple angles, but they also raise the challenge of developing new sequence analysis tools, especially new alignment methods, to support these techniques. In this dissertation, I developed a rapid and accurate sequence alignment program, HISAT-3N, to solve the alignment problem of nucleotide conversion (NC) sequencing technologies. NC technologies, such as BS-seq and SLAM-seq, convert one type of nucleotide to another, which allows researchers to identify specific chemical modifications in DNA or RNA molecules. However, the conversions introduced by these technologies make it difficult to align the reads back to the reference genome. To solve this issue, I implemented the 3-letter alignment algorithm in HISAT2, which was previously developed by our lab, to create HISAT-3N. I thoroughly tested HISAT-3N and demonstrated that it is more than seven times faster than widely used sequence aligners, is more accurate, and can support all types of nucleotide conversion sequencing technologies, including those that have not yet been developed. Additionally, to generalize the process of developing new alignment methods for new sequencing technologies, I created a platform that allows for the modularized design of sequence alignment software.
This platform incorporates algorithms from HISAT2, STAR, and BWA, making it more efficient for developers to create novel sequence alignment software and giving users more flexibility to analyze different types of data in a variety of computational environments. Finally, I developed a metagenomics analysis pipeline that organizes and manages multiple well-known sequence analysis programs for rapid and accurate soil microbial analysis. The successful development and implementation of these tools demonstrate the robustness of a well-designed software and pipeline framework in bioinformatics analysis. Overall, my work emphasizes the significance of continuously improving genomics data analysis tools. This is important for supporting emerging sequencing technologies and delivering more precise results, which help researchers reveal valuable genetic information.

Item: Modeling Tumor Neoantigens for Predicting Patients' Clinical Outcomes (December 2021)
Lu, Tianshi; Hoshida, Yujin; Wang, Tao; Xiao, Guanghua; Ahn, Chul; Aguilera, Todd A.
Tumor neoantigens are critical targets of the host antitumor immune response, and their presence plays an important role in tumor progression and immunotherapy treatment response. Neoantigens have shown considerable potential for application in clinical treatment. However, systematic study of neoantigens' impact on tumors and patients is still challenging due to the huge diversity of neoantigens, heterogeneity within tumors, and the difficulty of modeling the pairing between neoantigen-MHC and T cells to identify the neoantigens that truly elicit a T cell response.
To study the impact of neoantigen-T cell interaction on tumorigenesis, I developed a Bayesian hierarchical model to infer the history of neoantigen-cytotoxic T cell interactions in tumors.

Item: Understanding RNA Regulation Through Analysis of CLIP-Seq Data (2015-11-18)
Wang, Tao; Mendell, Joshua T.; Xie, Yang; Xiao, Guanghua; Mangelsdorf, David J.; Zhang, Michael Q.
The past decades have witnessed a surge of discoveries revealing RNA regulation as a central player in cellular processes. The advent of cross-linking immunoprecipitation coupled with high-throughput sequencing (CLIP-Seq) has recently enabled the genome-wide investigation of RNA binding protein-RNA interactions, which are a very important component of RNA regulation. However, proper and systematic bioinformatics analysis of CLIP-Seq data remains challenging, and established methods are still lacking. For the past few years, I have devoted my research to methodological developments in CLIP-Seq data analysis, and developed MiClip and dCLIP for peak calling and differential analysis of CLIP-Seq data, respectively. I have also applied my CLIP-Seq analysis pipelines in on-campus collaborative projects, in which I identified ORF57 and nuclear AGO2 binding sites. Finally, I analyzed public CLIP-Seq datasets to systematically characterize RNA binding protein targeting sites on circular RNAs.