Pairwise statistical significance of local sequence alignment using multiple parameter sets and empirical justification of parameter set change penalty

Thumbnail Image
Agrawal, Ankit
Major Professor
Committee Member
Journal Title
Journal ISSN
Volume Title
Huang, Xiaoqiu
Research Projects
Organizational Units
Organizational Unit
Computer Science

Computer Science—the theory, representation, processing, communication and use of information—is fundamentally transforming every aspect of human endeavor. The Department of Computer Science at Iowa State University advances computational and information sciences through; 1. educational and research programs within and beyond the university; 2. active engagement to help define national and international research, and 3. educational agendas, and sustained commitment to graduating leaders for academia, industry and government.

The Computer Science Department was officially established in 1969, with Robert Stewart serving as the founding Department Chair. Faculty were composed of joint appointments with Mathematics, Statistics, and Electrical Engineering. In 1969, the building which now houses the Computer Science department, then simply called the Computer Science building, was completed. Later it was named Atanasoff Hall. Throughout the 1980s to present, the department expanded and developed its teaching and research agendas to cover many areas of computing.

Dates of Existence

Related Units

Journal Issue
Is Version Of

Background: Accurate estimation of statistical significance of a pairwise alignment is an important problem in sequence comparison. Recently, a comparative study of pairwise statistical significance with database statistical significance was conducted. In this paper, we extend the earlier work on pairwise statistical significance by incorporating with it the use of multiple parameter sets.

Results: Results for a knowledge discovery application of homology detection reveal that using multiple parameter sets for pairwise statistical significance estimates gives better coverage than using a single parameter set, at least at some error levels. Further, the results of pairwise statistical significance using multiple parameter sets are shown to be significantly better than database statistical significance estimates reported by BLAST and PSI-BLAST, and comparable and at times significantly better than SSEARCH. Using non-zero parameter set change penalty values give better performance than zero penalty.

Conclusion: The fact that the homology detection performance does not degrade when using multiple parameter sets is a strong evidence for the validity of the assumption that the alignment score distribution follows an extreme value distribution even when using multiple parameter sets. Parameter set change penalty is a useful parameter for alignment using multiple parameter sets. Pairwise statistical significance using multiple parameter sets can be effectively used to determine the relatedness of a (or a few) pair(s) of sequences without performing a time-consuming database search.


This proceeding was published as Agrawal, Ankit, and Xiaoqiu Huang. "Pairwise statistical significance of local sequence alignment using multiple parameter sets and empirical justification of parameter set change penalty." In BMC Bioinformatics 10 (2009): S1, doi: 10.1186/1471-2105-10-S3-S1. From Second International Workshop on Data and Text Mining in Bioinformatics (DTMBio) 2008 Napa Valley, CA, USA. 30 October 2008.

Thu Jan 01 00:00:00 UTC 2009