Optimal DNA-protein alignments with application to large-scale genome analysis

Narayanan, Mahesh
Computer Science

Computer Science (the theory, representation, processing, communication and use of information) is fundamentally transforming every aspect of human endeavor. The Department of Computer Science at Iowa State University advances computational and information sciences through: 1. educational and research programs within and beyond the university; 2. active engagement to help define national and international research and educational agendas; and 3. sustained commitment to graduating leaders for academia, industry and government.

The Computer Science Department was officially established in 1969, with Robert Stewart serving as the founding Department Chair. The faculty consisted of joint appointments with Mathematics, Statistics, and Electrical Engineering. In 1969, the building that now houses the department, then simply called the Computer Science building, was completed. It was later named Atanasoff Hall. From the 1980s to the present, the department has expanded and developed its teaching and research agendas to cover many areas of computing.

DNA-protein alignment algorithms can be used to discover coding sequences in a genomic sequence when the corresponding protein products are known. They can also be used to identify potential coding sequences in a newly sequenced genome by using proteins from related species. Previously known algorithms for computing DNA-protein alignments have one or more of the following drawbacks: they omit some aspects of the problem in their formulation, they provide optimal solutions that are expensive in run time or memory, or they sacrifice optimality to achieve a practical implementation. In this thesis, we present a comprehensive formulation of the DNA-protein alignment problem that includes indels, substitutions, frameshift errors, and intronic insertions between and within codons. We then provide an algorithm that computes an optimal alignment in O(mn) time using only four dynamic programming tables of size (m+1)×(n+1), where m and n are the lengths of the DNA and protein sequences, respectively. We developed a Protein and DNA Alignment program (PanDA) that implements the proposed solution. Experimental results indicate that our algorithm provides alignments that accurately reproduce GenBank annotation in nearly all cases when tested on gene and protein sequences from the same organism. We also present experimental evidence that our algorithm produces high-quality alignments and exon-intron predictions when aligning DNA sequences with proteins corresponding to orthologous genes from other species.

We also present parallel software that can be used to annotate, validate, and improve the quality of a genome assembly at large scale. It does so by computing spliced alignments between DNA sequences of the assembly and protein sequences from other organisms. Experimental results indicate that our software can produce putative annotations while detecting candidate contigs for improving the quality of an assembly.
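To give a feel for the kind of O(mn) dynamic program the abstract describes, here is a minimal sketch in Python. It is not the thesis algorithm or PanDA: it uses a single table, it handles only codon matches and mismatches, whole-codon and amino-acid gaps, and one- or two-nucleotide frameshifts, and it leaves out introns entirely. The function name `align_score` and all scoring values are assumptions chosen for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def align_score(dna, protein, match=2, mismatch=-1, gap=-2, frameshift=-3):
    """Best global score for aligning dna against protein.

    Hypothetical, simplified scoring (illustration only): no introns,
    linear gap costs, one flat penalty per frameshift event.
    """
    m, n = len(dna), len(protein)
    neg = float("-inf")
    # score[i][j] = best score aligning dna[:i] with protein[:j]
    score = [[neg] * (n + 1) for _ in range(m + 1)]
    score[0][0] = 0
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            best = neg
            if j >= 1:  # amino acid aligned to a gap
                best = max(best, score[i][j - 1] + gap)
            if i >= 1:  # frameshift: skip one nucleotide
                best = max(best, score[i - 1][j] + frameshift)
            if i >= 2:  # frameshift: skip two nucleotides
                best = max(best, score[i - 2][j] + frameshift)
            if i >= 3:
                # whole codon aligned to a gap
                best = max(best, score[i - 3][j] + gap)
                if j >= 1:  # codon aligned to an amino acid
                    aa = CODON_TABLE.get(dna[i - 3:i], "X")
                    s = match if aa == protein[j - 1] else mismatch
                    best = max(best, score[i - 3][j - 1] + s)
            score[i][j] = best
    return score[m][n]


if __name__ == "__main__":
    print(align_score("ATGGCCTGG", "MAW"))   # exact coding sequence
    print(align_score("ATGGGCCTGG", "MAW"))  # one extra nucleotide inserted
```

Each cell looks at a constant number of predecessors, so the running time is proportional to (m+1)(n+1). Modelling introns with length-independent or affine costs typically needs additional states, which is one plausible reason a full formulation would use several tables rather than one.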

2004