Use of machine learning algorithms to classify binary protein sequences as highly-designable or poorly-designable
Peto, Myron; Kloczkowski, Andrzej; Honavar, Vasant; Jernigan, Robert
dc.contributor.department Biochemistry, Biophysics and Molecular Biology
dc.contributor.department Computer Science
dc.contributor.department Baker Center for Bioinformatics and Biological Statistics
2018-02-19T01:20:45.000 2020-06-29T23:46:04Z 2020-06-29T23:46:04Z Tue Jan 01 00:00:00 UTC 2008 2008-01-01
dc.description.abstract <h3>Background</h3> <p>Using a standard Support Vector Machine (SVM) trained with the Sequential Minimal Optimization (SMO) method, Naïve Bayes, and other machine learning algorithms, we are able to distinguish between two classes of protein sequences: those folding to highly-designable conformations and those folding to poorly- or non-designable conformations.</p> <h3>Results</h3> <p>First, we generate all possible compact lattice conformations for the specified shape (a hexagon or a triangle) on the 2D triangular lattice. Then we generate all possible binary hydrophobic/polar (H/P) sequences and, using a specified energy function, thread them through all of these compact conformations. If, for a given sequence, the lowest energy is obtained for a particular lattice conformation, we assume that this sequence folds to that conformation. Highly-designable conformations have many H/P sequences folding to them, while poorly-designable conformations have few or no H/P sequences folding to them. We classify sequences as folding to either highly- or poorly-designable conformations. We randomly selected subsets of the sequences belonging to highly-designable and poorly-designable conformations and used them to train several different standard machine learning algorithms.</p> <h3>Conclusion</h3> <p>Using these machine learning algorithms with ten-fold cross-validation, we are able to classify the two classes of sequences with high accuracy, in some cases exceeding 95%.</p>
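The enumerate-thread-count procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not state the energy function, so the sketch assumes a simple H/P contact energy of -1 per non-bonded H-H contact, and it uses a toy triangle with three sites per edge (six residues) rather than the shapes studied in the paper. The function names, the `threshold` parameter, and the choice to treat conformations with identical contact maps as one conformation are all illustrative assumptions.

```python
from collections import Counter
from itertools import product

# The six neighbour offsets of a 2D triangular lattice in axial coordinates.
NEIGHBOURS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def triangle_sites(side):
    """Sites of a triangular shape with `side` sites along each edge."""
    return {(i, j) for i in range(side) for j in range(side - i)}


def compact_paths(sites):
    """Enumerate every self-avoiding walk that visits all sites exactly once."""
    paths = []

    def extend(path, visited):
        if len(path) == len(sites):
            paths.append(tuple(path))
            return
        x, y = path[-1]
        for dx, dy in NEIGHBOURS:
            nxt = (x + dx, y + dy)
            if nxt in sites and nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                extend(path, visited)
                path.pop()
                visited.remove(nxt)

    for start in sorted(sites):
        extend([start], {start})
    return paths


def contact_map(path):
    """Pairs of chain positions that are lattice neighbours but not bonded."""
    index = {site: k for k, site in enumerate(path)}
    contacts = set()
    for (x, y), k in index.items():
        for dx, dy in NEIGHBOURS:
            m = index.get((x + dx, y + dy))
            if m is not None and m > k + 1:
                contacts.add((k, m))
    return frozenset(contacts)


def energy(seq, contacts):
    """Assumed energy function: -1 for every non-bonded H-H contact."""
    return -sum(1 for i, j in contacts if seq[i] == "H" and seq[j] == "H")


def designability(side=3):
    """Thread every H/P sequence through every conformation and count folds."""
    sites = triangle_sites(side)
    # Assumption: paths with identical contact maps count as one conformation.
    conformations = sorted({contact_map(p) for p in compact_paths(sites)}, key=sorted)
    folds = {}
    for seq in product("HP", repeat=len(sites)):
        energies = [energy(seq, c) for c in conformations]
        best = min(energies)
        if energies.count(best) == 1:  # keep only unique ground states
            folds["".join(seq)] = energies.index(best)
    return conformations, folds, Counter(folds.values())


def label_sequences(folds, counts, threshold):
    """Label each folding sequence by the designability of its conformation."""
    return {
        seq: "highly-designable" if counts[conf] >= threshold else "poorly-designable"
        for seq, conf in folds.items()
    }
```

Calling `designability(3)` and then `label_sequences(folds, counts, threshold=2)` yields a labeled set of sequences of the kind that could be passed to a classifier. Sequences with degenerate ground states are simply omitted here, which is a simplification of the paper's poorly- or non-designable class.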
dc.description.comments <p>This article is published as Peto, Myron, Andrzej Kloczkowski, Vasant Honavar, and Robert L. Jernigan. "Use of machine learning algorithms to classify binary protein sequences as highly-designable or poorly-designable." BMC Bioinformatics 9, no. 1 (2008): 487. doi: <a href="" target="_blank">10.1186/1471-2105-9-487</a>. Posted with permission.</p>
dc.format.mimetype application/pdf
dc.identifier archive/
dc.identifier.articleid 1168
dc.identifier.contextkey 10987059
dc.identifier.s3bucket isulib-bepress-aws-west
dc.identifier.submissionpath bbmb_ag_pubs/160
dc.language.iso en
dc.source.bitstream archive/|||Fri Jan 14 20:53:38 UTC 2022
dc.source.uri 10.1186/1471-2105-9-487
dc.subject.disciplines Biochemistry, Biophysics, and Structural Biology
dc.subject.disciplines Bioinformatics
dc.subject.disciplines Computer Sciences
dc.title Use of machine learning algorithms to classify binary protein sequences as highly-designable or poorly-designable
dc.type article
dc.type.genre article
dspace.entity.type Publication
relation.isAuthorOfPublication 50d10ea7-68f5-4cc5-8858-375cef177ed2
relation.isOrgUnitOfPublication c70f85ae-e0cd-4dce-96b5-4388aac08b3f
relation.isOrgUnitOfPublication f7be4eb9-d1d0-4081-859b-b15cee251456
Original bundle
1.81 MB
Adobe Portable Document Format