Using Global Sequence Similarity to Enhance Biological Sequence Labeling

Date
2008-01-01
Authors
Caragea, Cornelia
Sinapov, Jivko
Honavar, Vasant
Abstract

Identifying functionally important sites from biological sequences, formulated as a biological sequence labeling problem, has broad applications ranging from rational drug design to the analysis of metabolic and signal transduction networks. In this paper, we present an approach to biological sequence labeling that takes into account the global similarity between biological sequences. Our approach combines unsupervised and supervised learning techniques. Given a set of sequences and a similarity measure defined on pairs of sequences, we learn a mixture of experts model by using spectral clustering to learn the hierarchical structure of the model and by using Bayesian approaches to combine the predictions of the experts. We evaluate our approach on two important biological sequence labeling problems: RNA-protein and DNA-protein interface prediction. The results of our experiments show that global sequence similarity can be exploited to improve the performance of classifiers trained to label biological sequence data.
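
The pipeline outlined in the abstract (cluster sequences by global similarity, train one expert per cluster, then combine the experts' predictions) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the k-mer Jaccard similarity, the single two-way spectral split (the paper learns a hierarchy), the per-residue frequency experts, and the similarity-normalized weights that stand in for the Bayesian combination are all hypothetical choices, as are the function names and the toy sequences.

```python
from collections import defaultdict

def kmer_similarity(a, b, k=2):
    """Toy global similarity (assumed): Jaccard overlap of the two k-mer sets."""
    ka = {a[i:i + k] for i in range(len(a) - k + 1)}
    kb = {b[i:i + k] for i in range(len(b) - k + 1)}
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0

def spectral_bipartition(W, iters=200):
    """Split items in two by the sign of the Fiedler vector of L = D - W.

    W must be a symmetric, non-negative similarity matrix. The vector is
    found by power iteration on (c*I - L), with the constant component
    removed at every step.
    """
    n = len(W)
    if n < 2:
        return [0] * n
    deg = [sum(row) for row in W]
    c = 2.0 * max(deg)  # upper bound on the largest eigenvalue of L
    v = [float(i) for i in range(n)]
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]
        # (c*I - L) v = (c - deg_i) * v_i + sum_j W_ij * v_j
        v = [(c - deg[i]) * v[i] + sum(W[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = sum(x * x for x in v) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in v]
    return [0 if x < 0 else 1 for x in v]

def train_expert(examples):
    """Expert (assumed form): Laplace-smoothed P(label = 1 | residue)."""
    pos, tot = defaultdict(int), defaultdict(int)
    for seq, labels in examples:
        for residue, y in zip(seq, labels):
            pos[residue] += y
            tot[residue] += 1
    return lambda residue: (pos[residue] + 1) / (tot[residue] + 2)

def predict(query, clusters, experts):
    """Weight each expert by the query's mean similarity to its cluster."""
    weights = [sum(kmer_similarity(query, s) for s, _ in c) / len(c)
               for c in clusters]
    total = sum(weights)
    if total == 0:
        weights = [1.0 / len(clusters)] * len(clusters)
    else:
        weights = [w / total for w in weights]
    return [sum(w * e(r) for w, e in zip(weights, experts)) for r in query]

# Hypothetical training data: (sequence, per-residue 0/1 interface labels).
train = [
    ("AAKKAA", [0, 0, 1, 1, 0, 0]),
    ("AKKAAA", [0, 1, 1, 0, 0, 0]),
    ("GGDDGG", [0, 0, 0, 0, 0, 0]),
    ("GDDGGG", [0, 0, 0, 0, 0, 0]),
]
W = [[kmer_similarity(a, b) for b, _ in train] for a, _ in train]
side = spectral_bipartition(W)
clusters = [[ex for ex, s in zip(train, side) if s == g] for g in (0, 1)]
clusters = [c for c in clusters if c]  # drop an empty side, if any
experts = [train_expert(c) for c in clusters]
probs = predict("AKKA", clusters, experts)  # one probability per residue
```

In this sketch a query sequence is labeled mostly by the expert whose training cluster it resembles, which is one simple way to let global sequence similarity influence residue-level predictions.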

Type
article
Comments

This is a proceeding from the IEEE International Conference on Bioinformatics and Biomedicine (2008): 104, doi: 10.1109/BIBM.2008.54. Posted with permission.

Copyright
2008