Use of machine learning algorithms to classify binary protein sequences as highly-designable or poorly-designable

Peto, Myron; Kloczkowski, Andrzej; Honavar, Vasant; Jernigan, Robert

dc.contributor.author	Peto, Myron
dc.contributor.author	Kloczkowski, Andrzej
dc.contributor.author	Honavar, Vasant
dc.contributor.author	Jernigan, Robert
dc.contributor.department	Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology (CALS)
dc.contributor.department	Department of Computer Science
dc.contributor.department	Biochemistry, Biophysics and Molecular Biology, Roy J. Carver Department of
dc.contributor.department	Baker Center for Bioinformatics and Biological Statistics
dc.date	2018-02-19T01:20:45.000
dc.date.accessioned	2020-06-29T23:46:04Z
dc.date.available	2020-06-29T23:46:04Z
dc.date.copyright	Tue Jan 01 00:00:00 UTC 2008
dc.date.issued	2008-01-01
dc.description.abstract	<h3>Background</h3> <p>Using a standard Support Vector Machine (SVM) trained with Sequential Minimal Optimization (SMO), Naïve Bayes, and other machine learning algorithms, we are able to distinguish between two classes of protein sequences: those folding to highly-designable conformations and those folding to poorly- or non-designable conformations.</p> <h3>Results</h3> <p>First, we generate all possible compact lattice conformations for the specified shape (a hexagon or a triangle) on the 2D triangular lattice. Then we generate all possible binary hydrophobic/polar (H/P) sequences and, using a specified energy function, thread them through all of these compact conformations. If, for a given sequence, the lowest energy is obtained for a particular lattice conformation, we assume that this sequence folds to that conformation. Highly-designable conformations have many H/P sequences folding to them, while poorly-designable conformations have few or none. We classify sequences as folding to either highly- or poorly-designable conformations. We randomly selected subsets of the sequences belonging to highly-designable and poorly-designable conformations and used them to train several different standard machine learning algorithms.</p> <h3>Conclusion</h3> <p>Using these machine learning algorithms with ten-fold cross-validation, we are able to classify the two classes of sequences with high accuracy, in some cases exceeding 95%.</p>
dc.description.comments	<p>This article is published as Peto, Myron, Andrzej Kloczkowski, Vasant Honavar, and Robert L. Jernigan. "Use of machine learning algorithms to classify binary protein sequences as highly-designable or poorly-designable." BMC Bioinformatics 9, no. 1 (2008): 487. doi: <a href="http://dx.doi.org/10.1186" target="_blank">10.1186/1471-2105-9-487</a>. Posted with permission.</p>
dc.format.mimetype	application/pdf
dc.identifier	archive/lib.dr.iastate.edu/bbmb_ag_pubs/160/
dc.identifier.articleid	1168
dc.identifier.contextkey	10987059
dc.identifier.s3bucket	isulib-bepress-aws-west
dc.identifier.submissionpath	bbmb_ag_pubs/160
dc.identifier.uri	https://dr.lib.iastate.edu/handle/20.500.12876/10622
dc.language.iso	en
dc.source.bitstream	archive/lib.dr.iastate.edu/bbmb_ag_pubs/160/2008_Jernigan_UseMachine.pdf\|\|\|Fri Jan 14 20:53:38 UTC 2022
dc.source.uri	10.1186/1471-2105-9-487
dc.subject.disciplines	Biochemistry, Biophysics, and Structural Biology
dc.subject.disciplines	Bioinformatics
dc.subject.disciplines	Computer Sciences
dc.title	Use of machine learning algorithms to classify binary protein sequences as highly-designable or poorly-designable
dc.type	article
dc.type.genre	article
dspace.entity.type	Publication
relation.isAuthorOfPublication	50d10ea7-68f5-4cc5-8858-375cef177ed2
relation.isOrgUnitOfPublication	c70f85ae-e0cd-4dce-96b5-4388aac08b3f
relation.isOrgUnitOfPublication	f7be4eb9-d1d0-4081-859b-b15cee251456
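
Illustrative sketch (not part of the repository record or the article's code). The abstract describes enumerating compact conformations on the 2D triangular lattice, threading every H/P sequence through them, and counting how many sequences fold to each conformation. The Python below is a minimal sketch of that procedure under several assumptions that the abstract does not confirm: a toy energy of -1 per non-bonded H-H contact (the article's "specified energy function" is not given here), a 6-site triangle as the shape, conformations identified by their contact map, and a sequence counted as folding only when its lowest energy is reached by exactly one contact map. All function names are hypothetical.

```python
from itertools import product

# Axial coordinates for the 2D triangular lattice: each site has six neighbours.
NEIGHBOURS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def triangle_sites(side):
    """Sites of a triangular region with `side` sites along each edge."""
    return frozenset((q, r) for q in range(side) for r in range(side - q))


def adjacent(a, b):
    """True if lattice sites a and b are nearest neighbours."""
    return (b[0] - a[0], b[1] - a[1]) in NEIGHBOURS


def compact_paths(sites):
    """Yield every self-avoiding walk that visits all sites exactly once."""
    def extend(path, visited):
        if len(path) == len(sites):
            yield tuple(path)
            return
        q, r = path[-1]
        for dq, dr in NEIGHBOURS:
            nxt = (q + dq, r + dr)
            if nxt in sites and nxt not in visited:
                path.append(nxt)
                visited.add(nxt)
                yield from extend(path, visited)
                path.pop()
                visited.remove(nxt)

    for start in sites:
        yield from extend([start], {start})


def contact_map(path):
    """Pairs of chain positions (i, j), not bonded, that occupy neighbouring sites."""
    n = len(path)
    return frozenset(
        (i, j)
        for i in range(n)
        for j in range(i + 2, n)
        if adjacent(path[i], path[j])
    )


def energy(seq, contacts):
    """Assumed toy energy: -1 for every non-bonded H-H contact."""
    return -sum(1 for i, j in contacts if seq[i] == "H" and seq[j] == "H")


def designability(sites):
    """Thread all H/P sequences through all distinct contact maps.

    Returns (counts, folds): counts maps each contact map to the number of
    sequences folding to it; folds maps each folding sequence to its map.
    """
    maps = sorted({contact_map(p) for p in compact_paths(sites)}, key=sorted)
    counts = {m: 0 for m in maps}
    folds = {}
    for seq in product("HP", repeat=len(sites)):
        energies = [energy(seq, m) for m in maps]
        best = min(energies)
        if energies.count(best) == 1:  # unique ground state only
            winner = maps[energies.index(best)]
            counts[winner] += 1
            folds["".join(seq)] = winner
    return counts, folds


counts, folds = designability(triangle_sites(3))
for contacts, n_seqs in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(n_seqs, sorted(contacts))
```

Sequences in `folds` could then be labelled highly- or poorly-designable by thresholding `counts` for their conformation, and those labels used as training targets for a classifier. The threshold and the choice of features are left open here because the abstract does not specify them.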

File

Original bundle

Name: 2008_Jernigan_UseMachine.pdf
Size: 1.81 MB
Format: Adobe Portable Document Format
Collections

Publications