Study of secondary structure of protein sequences by linear algebra

Hsieh, Wei-hua
Major Professor
James L. Cornette

Using encoding schemes, we study the relation between amino acids and protein structure in linear spaces.

Local protein-structure prediction schemes use the amino acid sequence in a short subchain of a protein to predict the secondary structure of the middle residue of that subchain.

Two relatively reliable, objective and quantitative local prediction schemes are Robson et al.'s information theory method and Qian & Sejnowski's neural network models. The latter achieved 64% accuracy for three-state predictions in 1989, the best prediction performance yet recorded.

We assign to the 20 amino acids, through a one-to-one correspondence, the 20 columns (unit vectors) of the 20 x 20 identity matrix. Any chain of k amino acids may then be represented as a sequence of k unit vectors or as a single composite vector of length 20k. The representation of protein subchains by vectors is called a local encoding scheme.

In the language of local encoding schemes, both the information theory method and the two-layer neural network model, which are 3-state predictors, partition 20k-dimensional space with three planes to distinguish the predicted secondary structures.

We use linear programming to construct the partition planes. Our prediction schemes are objective and quantitative, and we make no artificial modifications of the training scheme outputs. For 3-state prediction, we obtain accuracy above 90% on the training set and about 66% on the testing set when predicting about 1/3 of the points in the testing set, in contrast to Qian & Sejnowski, who obtain about 64% accuracy on both sets. For 2-state prediction, we obtain 91% and 86% accuracy on the training and testing sets, respectively, but the accuracy for predicting an alpha-helical residue correctly is only about 50%. According to our experiments, the distribution of points in the space is ambiguous, and it is difficult to find planes that perform well on both the training and testing sets.

In addition to the local encoding scheme, a less complicated general scheme is used to study the same problem.
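
The local encoding scheme and the idea of separating states with three planes can be illustrated with a minimal Python sketch. The alphabet ordering, the names `encode_subchain` and `predict_state`, the state labels, and the largest-score decision rule are assumptions made for illustration; the abstract fixes only the one-to-one assignment of amino acids to unit vectors. The toy planes below are hand-written and are not obtained by linear programming.

```python
# Hypothetical ordering of the 20 amino acids (one-letter codes); the abstract
# only requires some one-to-one correspondence with the identity columns.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode_subchain(subchain):
    """Return the composite 0/1 vector of length 20k for a chain of k residues."""
    vec = [0] * (20 * len(subchain))
    for pos, aa in enumerate(subchain):
        # Residue at position pos contributes one unit vector of length 20.
        vec[20 * pos + INDEX[aa]] = 1
    return vec


def predict_state(vec, planes, labels=("helix", "sheet", "coil")):
    """Score vec against each (weights, offset) pair and return the label
    with the largest score (an assumed decision rule, for illustration)."""
    scores = [
        sum(w * x for w, x in zip(weights, vec)) + offset
        for weights, offset in planes
    ]
    return labels[scores.index(max(scores))]


# Toy usage: a subchain of k = 3 residues gives a vector of length 60.
x = encode_subchain("ACD")
toy_planes = [([1] * 60, 0.0), ([0] * 60, 1.0), ([-1] * 60, 0.0)]
state = predict_state(x, toy_planes)
```

With this encoding, every subchain of length k maps to a vector containing exactly k ones, so each plane's score is simply a sum of k weights plus an offset.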

1991