Predicting DNA-binding sites of proteins from amino acid sequence

dc.contributor.author Yan, Changhui
dc.contributor.author Terribilini, Michael
dc.contributor.author Wu, Feihong
dc.contributor.author Dobbs, Drena
dc.contributor.author Jernigan, Robert
dc.contributor.author Honavar, Vasant
dc.contributor.department Department of Computer Science
dc.contributor.department Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology (LAS)
dc.contributor.department Department of Genetics, Development, and Cell Biology (LAS)
dc.contributor.department Bioinformatics and Computational Biology Program
dc.date 2018-02-18T05:05:11.000
dc.date.accessioned 2020-06-30T04:01:05Z
dc.date.available 2020-06-30T04:01:05Z
dc.date.copyright Sun Jan 01 00:00:00 UTC 2006
dc.date.issued 2006-01-01
dc.description.abstract <h3>Background</h3> <p>Understanding the molecular details of protein-DNA interactions is critical for deciphering the mechanisms of gene regulation. We present a machine learning approach for the identification of amino acid residues involved in protein-DNA interactions.</p> <h3>Results</h3> <p>We start with a Naïve Bayes classifier trained to predict whether a given amino acid residue is a DNA-binding residue based on its identity and the identities of its sequence neighbors. The input to the classifier consists of the identities of the target residue and 4 sequence neighbors on each side of the target residue. The classifier is trained and evaluated (using leave-one-out cross-validation) on a non-redundant set of 171 proteins. Our results indicate the feasibility of identifying interface residues based on local sequence information. The classifier achieves 71% overall accuracy with a correlation coefficient of 0.24, 35% specificity and 53% sensitivity in identifying interface residues as evaluated by leave-one-out cross-validation. We show that the performance of the classifier is improved by using sequence entropy of the target residue (the entropy of the corresponding column in multiple alignment obtained by aligning the target sequence with its sequence homologs) as additional input. The classifier achieves 78% overall accuracy with a correlation coefficient of 0.28, 44% specificity and 41% sensitivity in identifying interface residues. Examination of the predictions in the context of 3-dimensional structures of proteins demonstrates the effectiveness of this method in identifying DNA-binding sites from sequence information. In 33% (56 out of 171) of the proteins, the classifier identifies the interaction sites by correctly recognizing at least half of the interface residues. In 87% (149 out of 171) of the proteins, the classifier correctly identifies at least 20% of the interface residues. This suggests the possibility of using such classifiers to identify potential DNA-binding motifs and to gain potentially useful insights into sequence correlates of protein-DNA interactions.</p> <h3>Conclusion</h3> <p>Naïve Bayes classifiers trained to identify DNA-binding residues using sequence information offer a computationally efficient approach to identifying putative DNA-binding sites in DNA-binding proteins and recognizing potential DNA-binding motifs.</p>
dc.description.comments <p>This article is from <em>BMC Bioinformatics </em>7 (2006): 262, doi: <a href="http://dx.doi.org/10.1186/1471-2105-7-262" target="_blank">10.1186/1471-2105-7-262</a>. Posted with permission.</p>
dc.format.mimetype application/pdf
dc.identifier archive/lib.dr.iastate.edu/gdcb_las_pubs/108/
dc.identifier.articleid 1111
dc.identifier.contextkey 9760168
dc.identifier.s3bucket isulib-bepress-aws-west
dc.identifier.submissionpath gdcb_las_pubs/108
dc.identifier.uri https://dr.lib.iastate.edu/handle/20.500.12876/37771
dc.language.iso en
dc.source.bitstream archive/lib.dr.iastate.edu/gdcb_las_pubs/108/2006_Dobbs_PredictingDNA.pdf|||Fri Jan 14 18:28:23 UTC 2022
dc.source.uri 10.1186/1471-2105-7-262
dc.subject.disciplines Bioinformatics
dc.subject.disciplines Cell and Developmental Biology
dc.subject.disciplines Computational Biology
dc.subject.disciplines Genetics and Genomics
dc.subject.disciplines Molecular Biology
dc.title Predicting DNA-binding sites of proteins from amino acid sequence
dc.type article
dc.type.genre article
dspace.entity.type Publication
relation.isAuthorOfPublication 7e096c4f-9007-41e4-9414-989c3ea9bc88
relation.isAuthorOfPublication 50d10ea7-68f5-4cc5-8858-375cef177ed2
relation.isOrgUnitOfPublication f7be4eb9-d1d0-4081-859b-b15cee251456
relation.isOrgUnitOfPublication faf0a6cb-16ca-421c-8f48-9fbbd7bc3747
relation.isOrgUnitOfPublication 9e603b30-6443-4b8e-aff5-57de4a7e4cb2
relation.isOrgUnitOfPublication c331f825-0643-499a-9eeb-592c7b43b1f5
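Illustrative note on the abstract: the input encoding it describes (the target residue plus 4 sequence neighbors on each side) and the column entropy it mentions can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the authors' implementation. The padding symbol, the Laplace smoothing, the class and function names, and the toy sequences and labels are all hypothetical and do not come from the article. The abstract does not say how the entropy value is fed to the classifier, so the helper only computes it.

```python
import math
from collections import Counter, defaultdict

WINDOW = 4   # neighbors on each side of the target residue (from the abstract)
PAD = "-"    # assumption: symbol used for positions beyond the sequence ends
ALPHABET = "ACDEFGHIKLMNPQRSTVWY" + PAD


def windows(seq, k=WINDOW):
    """Return one (2k+1)-residue window per position, padded at both ends."""
    padded = PAD * k + seq + PAD * k
    return [padded[i:i + 2 * k + 1] for i in range(len(seq))]


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column given as a string.

    Assumption: every symbol in the column, gaps included, is counted as is.
    """
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


class WindowNaiveBayes:
    """Naive Bayes over residue identities at each window position.

    Assumption: Laplace smoothing with pseudocount alpha; the abstract does
    not state how probabilities were estimated. Both classes must occur in
    the training data.
    """

    def __init__(self, k=WINDOW, alpha=1.0):
        self.k = k
        self.alpha = alpha
        self.class_counts = defaultdict(int)
        self.feature_counts = defaultdict(int)  # (label, position, residue)

    def fit(self, sequences, labels):
        for seq, labs in zip(sequences, labels):
            for win, y in zip(windows(seq, self.k), labs):
                self.class_counts[y] += 1
                for pos, aa in enumerate(win):
                    self.feature_counts[(y, pos, aa)] += 1
        return self

    def log_score(self, win, y):
        """Log prior plus summed log likelihoods for class y."""
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[y] / total)
        denom = self.class_counts[y] + self.alpha * len(ALPHABET)
        for pos, aa in enumerate(win):
            count = self.feature_counts[(y, pos, aa)]
            score += math.log((count + self.alpha) / denom)
        return score

    def predict(self, seq):
        """Label each residue 1 (binding) or 0 (non-binding)."""
        return [int(self.log_score(w, 1) > self.log_score(w, 0))
                for w in windows(seq, self.k)]


# Toy data invented for illustration only; not taken from the 171-protein set.
seqs = ["MKRRKAAAL", "GGSAKRRKS"]
labels = [[0, 1, 1, 1, 1, 0, 0, 0, 0],
          [0, 0, 0, 0, 1, 1, 1, 1, 0]]
model = WindowNaiveBayes().fit(seqs, labels)
predictions = model.predict("AKRRKA")
```

Each residue yields one 9-symbol window, and the classifier treats the 9 positions as conditionally independent given the class. That independence assumption is what keeps training and prediction to simple counting.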
File
Name: 2006_Dobbs_PredictingDNA.pdf
Size: 1.48 MB
Format: Adobe Portable Document Format