Use of machine learning algorithms to classify binary protein sequences as highly-designable or poorly-designable

Peto, Myron
Kloczkowski, Andrzej
Honavar, Vasant
Jernigan, Robert
Distinguished Professor
Computer Science

Computer Science (the theory, representation, processing, communication and use of information) is fundamentally transforming every aspect of human endeavor. The Department of Computer Science at Iowa State University advances computational and information sciences through: 1. educational and research programs within and beyond the university; 2. active engagement to help define national and international research and educational agendas; and 3. sustained commitment to graduating leaders for academia, industry and government.

The Computer Science Department was officially established in 1969, with Robert Stewart serving as the founding Department Chair. The faculty consisted of joint appointments with Mathematics, Statistics, and Electrical Engineering. In 1969, the building that now houses the Computer Science department, then simply called the Computer Science building, was completed. It was later named Atanasoff Hall. From the 1980s to the present, the department has expanded and developed its teaching and research agendas to cover many areas of computing.

Biochemistry, Biophysics and Molecular Biology, Roy J. Carver Department of


Using a standard Support Vector Machine (SVM) trained with Sequential Minimal Optimization (SMO), Naïve Bayes, and other machine learning algorithms, we are able to distinguish between two classes of protein sequences: those folding to highly-designable conformations and those folding to poorly- or non-designable conformations.


First, we generate all possible compact lattice conformations for the specified shape (a hexagon or a triangle) on the 2D triangular lattice. Then we generate all possible binary hydrophobic/polar (H/P) sequences and, using a specified energy function, thread them through all of these compact conformations. If a given sequence attains its lowest energy in a particular lattice conformation, we assume that the sequence folds to that conformation. Highly-designable conformations have many H/P sequences folding to them, while poorly-designable conformations have few or none. We classify sequences as folding to either highly- or poorly-designable conformations. We then randomly select subsets of the sequences belonging to highly-designable and poorly-designable conformations and use them to train several standard machine learning algorithms.
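The threading procedure described above can be illustrated with a minimal Python sketch on the smallest hexagonal patch of the triangular lattice (7 sites). Several choices here are assumptions made for illustration, not details taken from the article: the energy function (one unit of favorable energy per non-bonded H-H contact), the requirement that the lowest-energy conformation be unique, the treatment of conformations with identical contact maps as one conformation, and all function names.

```python
from collections import Counter
from itertools import product

# Axial coordinates: every site on the triangular lattice has six neighbors.
NEIGHBORS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]

def hexagon_sites(radius=1):
    """Sites of a hexagonal patch (7 sites when radius is 1)."""
    return {(q, r)
            for q in range(-radius, radius + 1)
            for r in range(-radius, radius + 1)
            if abs(q + r) <= radius}

def compact_paths(sites):
    """Yield every self-avoiding walk that visits each site exactly once."""
    def extend(path, visited):
        if len(path) == len(sites):
            yield tuple(path)
            return
        q, r = path[-1]
        for dq, dr in NEIGHBORS:
            nxt = (q + dq, r + dr)
            if nxt in sites and nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                yield from extend(path, visited)
                path.pop()
                visited.remove(nxt)
    for start in sorted(sites):
        yield from extend([start], {start})

def contact_map(path):
    """Pairs (i, j) of chain positions that are lattice neighbors but not bonded."""
    index = {site: i for i, site in enumerate(path)}
    contacts = set()
    for (q, r), i in index.items():
        for dq, dr in NEIGHBORS:
            j = index.get((q + dq, r + dr))
            if j is not None and j > i + 1:
                contacts.add((i, j))
    return frozenset(contacts)

def energy(seq, contacts):
    """Assumed energy: -1 for each non-bonded H-H contact."""
    return -sum(1 for i, j in contacts if seq[i] == "H" and seq[j] == "H")

def designability(sites):
    """Count the sequences whose unique lowest-energy conformation is each one."""
    # Rotated or mirrored walks share a contact map, so they collapse to one entry.
    conformations = list({contact_map(p) for p in compact_paths(sites)})
    counts = Counter()
    for seq in product("HP", repeat=len(sites)):
        energies = [energy(seq, c) for c in conformations]
        best = min(energies)
        if energies.count(best) == 1:  # assumption: ground state must be unique
            counts[conformations[energies.index(best)]] += 1
    return conformations, counts
```

Calling `designability(hexagon_sites())` returns the distinct conformations and, for each, the number of H/P sequences that fold to it under these assumptions; conformations with high counts play the role of the highly-designable class. Exhaustive enumeration grows exponentially with chain length, so this sketch is practical only for very small shapes.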


Using these machine learning algorithms with ten-fold cross-validation, we are able to classify the two classes of sequences with high accuracy, in some cases exceeding 95%.
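As a rough illustration of the evaluation protocol, the sketch below runs ten-fold cross-validation with a small Naïve Bayes classifier written in plain Python. It is not the authors' implementation: the Bernoulli feature model, the Laplace smoothing, the 0/1 encoding of H/P positions, and every function name are assumptions chosen to keep the example self-contained.

```python
import math
import random

def encode(seq):
    """Encode an H/P string as a 0/1 feature vector (1 means H)."""
    return [1 if ch == "H" else 0 for ch in seq]

def train_nb(X, y):
    """Fit a Bernoulli Naive Bayes model with Laplace smoothing."""
    model = {}
    n_features = len(X[0])
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        log_prior = math.log(len(rows) / len(X))
        # Smoothed probability that each position is H within this class.
        p_h = [(sum(r[k] for r in rows) + 1) / (len(rows) + 2)
               for k in range(n_features)]
        model[label] = (log_prior, p_h)
    return model

def predict_nb(model, x):
    """Return the label with the highest log posterior for feature vector x."""
    def score(label):
        log_prior, p_h = model[label]
        return log_prior + sum(math.log(p if v else 1 - p)
                               for v, p in zip(x, p_h))
    return max(model, key=score)

def cross_validate(X, y, k=10, seed=0):
    """Return the overall accuracy from k-fold cross-validation."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]  # k disjoint held-out sets
    correct = 0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in order if i not in held]
        model = train_nb([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict_nb(model, X[i]) == y[i] for i in held_out)
    return correct / len(X)
```

Given a list of H/P strings `seqs` and matching class labels `labels`, `cross_validate([encode(s) for s in seqs], labels)` returns the fraction of sequences classified correctly when each one is predicted by a model trained without its fold.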


This article is published as Peto, Myron, Andrzej Kloczkowski, Vasant Honavar, and Robert L. Jernigan. "Use of machine learning algorithms to classify binary protein sequences as highly-designable or poorly-designable." BMC Bioinformatics 9, no. 1 (2008): 487. doi: 10.1186/1471-2105-9-487. Posted with permission.

Tue Jan 01 00:00:00 UTC 2008