Use of machine learning algorithms to classify binary protein sequences as highly-designable or poorly-designable
dc.contributor.author | Peto, Myron | |
dc.contributor.author | Jernigan, Robert | |
dc.contributor.author | Kloczkowski, Andrzej | |
dc.contributor.author | Honavar, Vasant | |
dc.contributor.department | Biochemistry, Biophysics and Molecular Biology | |
dc.contributor.department | Computer Science | |
dc.contributor.department | Baker Center for Bioinformatics and Biological Statistics | |
dc.date | 2018-02-19T01:20:45.000 | |
dc.date.accessioned | 2020-06-29T23:46:04Z | |
dc.date.available | 2020-06-29T23:46:04Z | |
dc.date.copyright | Tue Jan 01 00:00:00 UTC 2008 | |
dc.date.issued | 2008-01-01 | |
dc.description.abstract | <h3>Background</h3> <p>By using a standard Support Vector Machine (SVM) trained with Sequential Minimal Optimization (SMO), Naïve Bayes, and other machine learning algorithms, we are able to distinguish between two classes of protein sequences: those folding to highly-designable conformations and those folding to poorly- or non-designable conformations.</p> <h3>Results</h3> <p>First, we generate all possible compact lattice conformations for the specified shape (a hexagon or a triangle) on the 2D triangular lattice. Then we generate all possible binary hydrophobic/polar (H/P) sequences and, using a specified energy function, thread them through all of these compact conformations. If a given sequence attains its lowest energy in a particular lattice conformation, we assume that the sequence folds to that conformation. Highly-designable conformations have many H/P sequences folding to them, while poorly-designable conformations have few or none. We classify sequences as folding to either highly- or poorly-designable conformations. We randomly selected subsets of the sequences belonging to highly-designable and poorly-designable conformations and used them to train several standard machine learning algorithms.</p> <h3>Conclusion</h3> <p>Using these machine learning algorithms with ten-fold cross-validation, we are able to classify the two classes of sequences with high accuracy, in some cases exceeding 95%.</p> | |
dc.description.comments | <p>This article is published as Peto, Myron, Andrzej Kloczkowski, Vasant Honavar, and Robert L. Jernigan. "Use of machine learning algorithms to classify binary protein sequences as highly-designable or poorly-designable." BMC Bioinformatics 9, no. 1 (2008): 487. doi: <a href="http://dx.doi.org/10.1186" target="_blank">10.1186/1471-2105-9-487</a>. Posted with permission.</p> | |
dc.format.mimetype | application/pdf | |
dc.identifier | archive/lib.dr.iastate.edu/bbmb_ag_pubs/160/ | |
dc.identifier.articleid | 1168 | |
dc.identifier.contextkey | 10987059 | |
dc.identifier.s3bucket | isulib-bepress-aws-west | |
dc.identifier.submissionpath | bbmb_ag_pubs/160 | |
dc.identifier.uri | https://dr.lib.iastate.edu/handle/20.500.12876/10622 | |
dc.language.iso | en | |
dc.source.bitstream | archive/lib.dr.iastate.edu/bbmb_ag_pubs/160/2008_Jernigan_UseMachine.pdf|||Fri Jan 14 20:53:38 UTC 2022 | |
dc.source.uri | 10.1186/1471-2105-9-487 | |
dc.subject.disciplines | Biochemistry, Biophysics, and Structural Biology | |
dc.subject.disciplines | Bioinformatics | |
dc.subject.disciplines | Computer Sciences | |
dc.title | Use of machine learning algorithms to classify binary protein sequences as highly-designable or poorly-designable | |
dc.type | article | |
dc.type.genre | article | |
dspace.entity.type | Publication | |
relation.isAuthorOfPublication | 50d10ea7-68f5-4cc5-8858-375cef177ed2 | |
relation.isOrgUnitOfPublication | c70f85ae-e0cd-4dce-96b5-4388aac08b3f | |
relation.isOrgUnitOfPublication | f7be4eb9-d1d0-4081-859b-b15cee251456 |
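The enumeration and threading procedure summarized in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the 6-site triangle, the energy function (an assumed -1 per H-H contact), the requirement of a unique lowest-energy conformation, and the half-of-maximum cutoff for labeling are all illustrative assumptions, and every function name is hypothetical.

```python
from collections import Counter
from itertools import product

# Six unit steps of the 2D triangular lattice in axial coordinates.
STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]

# Toy shape (assumption): a 6-site triangle, much smaller than the paper's shapes.
SITES = {(x, y) for x in range(3) for y in range(3) if x + y <= 2}


def adjacent(a, b):
    """True if lattice sites a and b are nearest neighbors."""
    return (b[0] - a[0], b[1] - a[1]) in STEPS


def compact_walks(sites):
    """Yield every self-avoiding walk that visits each site exactly once."""
    def extend(walk, used):
        if len(walk) == len(sites):
            yield tuple(walk)
            return
        x, y = walk[-1]
        for dx, dy in STEPS:
            nxt = (x + dx, y + dy)
            if nxt in sites and nxt not in used:
                walk.append(nxt)
                used.add(nxt)
                yield from extend(walk, used)
                walk.pop()
                used.remove(nxt)

    for start in sorted(sites):
        yield from extend([start], {start})


def contact_map(walk):
    """Pairs of chain positions that touch on the lattice but are not bonded."""
    n = len(walk)
    return frozenset((i, j) for i in range(n) for j in range(i + 2, n)
                     if adjacent(walk[i], walk[j]))


def energy(seq, contacts):
    """Assumed energy function: -1 for every H-H contact, 0 otherwise."""
    return -sum(1 for i, j in contacts if seq[i] == "H" and seq[j] == "H")


def fold_all(conformations, length):
    """Map each H/P sequence with a unique lowest-energy conformation to its index."""
    folds = {}
    for letters in product("HP", repeat=length):
        seq = "".join(letters)
        energies = [energy(seq, c) for c in conformations]
        best = min(energies)
        if energies.count(best) == 1:  # skip sequences with tied ground states
            folds[seq] = energies.index(best)
    return folds


# Walks related by symmetry share a contact map, so conformations are deduplicated.
conformations = sorted({contact_map(w) for w in compact_walks(SITES)}, key=sorted)
folds = fold_all(conformations, len(SITES))
designability = Counter(folds.values())  # number of sequences folding to each conformation

# Arbitrary illustrative cutoff separating "high" from "poor" designability.
cutoff = max(designability.values(), default=0) / 2
labels = {seq: ("high" if designability[c] > cutoff else "poor")
          for seq, c in folds.items()}
```

The resulting `labels` dictionary pairs each sequence with a class, which is the kind of labeled data the abstract describes passing to standard classifiers under ten-fold cross-validation.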
File
Original bundle
- Name: 2008_Jernigan_UseMachine.pdf
- Size: 1.81 MB
- Format: Adobe Portable Document Format