Using evolutionary covariance to infer protein sequence-structure relationships

Jia, Kejue
Major Professor
Robert L. Jernigan
Drena L. Dobbs
Organizational Unit
Biochemistry, Biophysics and Molecular Biology

The Department of Biochemistry, Biophysics, and Molecular Biology was founded to give students an understanding of life principles through the understanding of chemical and physical principles. Among these principles are frontiers of biotechnology such as metabolic networking, the structure of hormones and proteins, genomics, and the like.

The Department of Biochemistry and Biophysics was founded in 1959, and was administered by the College of Sciences and Humanities (later, College of Liberal Arts & Sciences). In 1979 it became co-administered by the Department of Agriculture (later, College of Agriculture and Life Sciences). In 1998 its name changed to the Department of Biochemistry, Biophysics, and Molecular Biology.

Historical Names

  • Department of Biochemistry and Biophysics (1959–1998)

During the last half century, a deep knowledge of the actions of proteins has emerged from a broad range of experimental and computational methods. As a result, there are now many opportunities to understand how the variety of proteins affects larger-scale behaviors of organisms, in terms of phenotypes and diseases. It is broadly acknowledged that sequence, structure and dynamics are the three essential components for understanding proteins, so learning the relationships among them is one of the most important steps toward understanding protein mechanisms. Together with the rapid growth in the efficiency of computers, there has been a commensurate growth in the sizes of the public protein databases. The field of computational biology has undergone a paradigm shift from investigating single proteins to looking collectively at sets of related proteins and broadly across all proteins. We develop a novel approach that combines structural knowledge from the PDB and the CATH database with sequence information from the Pfam database, using co-evolution in sequences to achieve the following goals: (a) large-scale collection of co-evolution information from protein domain family data; (b) development of novel amino acid substitution matrices that incorporate structural information; (c) detection of higher-order co-evolution correlations.
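
As a rough illustration of what "co-evolution in sequences" can mean in practice, the sketch below scores how strongly two columns of a multiple sequence alignment vary together, using mutual information. This is a generic textbook measure chosen for illustration and is not stated to be the method used in the thesis; the function name `mutual_information`, the gap symbol, and the four-sequence toy alignment are all hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(msa, i, j, gap="-"):
    """Mutual information (in bits) between alignment columns i and j.

    Illustrative assumption: rows with a gap in either column are skipped,
    and no sequence weighting or pseudocounts are applied.
    """
    pairs = [(seq[i], seq[j]) for seq in msa if seq[i] != gap and seq[j] != gap]
    n = len(pairs)
    if n == 0:
        return 0.0

    pair_counts = Counter(pairs)
    left_counts = Counter(a for a, _ in pairs)
    right_counts = Counter(b for _, b in pairs)

    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts to avoid extra divisions
        ratio = p_ab * n * n / (left_counts[a] * right_counts[b])
        mi += p_ab * log2(ratio)
    return mi


# Hypothetical toy alignment: columns 0 and 1 change together (A with K, G with R),
# column 2 is fully conserved, and column 3 varies independently of column 0.
toy_msa = [
    "AKLE",
    "AKLD",
    "GRLE",
    "GRLD",
]

if __name__ == "__main__":
    for i, j in [(0, 1), (0, 2), (0, 3)]:
        print(f"columns {i},{j}: MI = {mutual_information(toy_msa, i, j):.3f} bits")
```

On real domain families such a raw score would normally need corrections (for example for phylogenetic bias and small sample sizes), which this sketch deliberately leaves out.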

The results presented here show that important gains can come from improvements to sequence matching. The approach taken here is simple: the pair correlations in sequence have been decomposed into singlet terms, which amounts to discarding much of the correlation information itself. The gains shown here are encouraging, and we would like to develop a sequence matching method that retains the pair and higher-order correlation information directly; this should be possible by developing the sequence matching separately for different domain structures.

The many-body correlations in particular have the potential to shift the common perception in biology away from pairs, which are not actually very informative, and toward higher-order interactions. Fully understanding cellular processes will require a large body of higher-order correlation information, such as has been initiated here for single proteins.

Sat Dec 01 00:00:00 UTC 2018