Using evolutionary covariance to infer protein sequence-structure relationships

dc.contributor.advisor Robert L. Jernigan
dc.contributor.advisor Drena L. Dobbs
dc.contributor.author Jia, Kejue
dc.contributor.department Biochemistry, Biophysics and Molecular Biology
2019-03-26T18:00:04.000 2020-06-30T03:13:52Z 2020-06-30T03:13:52Z Sat Dec 01 00:00:00 UTC 2018 2001-01-01 2018-01-01
dc.description.abstract <p>During the last half century, a deep knowledge of how proteins act has emerged from a broad range of experimental and computational methods. This creates many opportunities to understand how the variety of proteins affects larger-scale behaviors of organisms, in terms of phenotypes and diseases. It is broadly acknowledged that sequence, structure and dynamics are the three essential components for understanding proteins, so learning the relationships among them is one of the most important steps toward understanding protein mechanisms. Alongside the rapid growth in computing power, there has been a commensurate growth in the sizes of public protein databases, and the field of computational biology has undergone a paradigm shift from investigating single proteins to looking collectively at sets of related proteins and broadly across all proteins. We develop a novel approach that combines structural knowledge from the PDB and the CATH database with sequence information from the Pfam database, using co-evolution in sequences to achieve the following goals: (a) collection of co-evolution information on a large scale using protein domain family data; (b) development of novel amino acid substitution matrices that incorporate structural information; (c) detection of higher-order co-evolutionary correlations.</p> <p>The results presented here show that important gains can come from improvements to sequence matching. The approach taken here is simple: the pair correlations in sequence have been decomposed into singlet terms, which amounts to discarding much of the correlation information itself.
The gains shown here are encouraging. We would like to develop a sequence matching method that retains the pair (or even higher-order) correlation information directly, which should be possible by developing the sequence matching separately for different domain structures.</p> <p>Many-body correlations in particular have the potential to shift common perceptions in biology away from pairwise interactions, which are not actually very informative, toward higher-order interactions. Fully understanding cellular processes will require a large body of higher-order correlation information, of the kind initiated here for single proteins.</p>
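The abstract refers to collecting pairwise co-evolution information from aligned domain-family sequences but does not state the exact measure used. As an illustration only, the sketch below computes mutual information between two columns of a multiple sequence alignment, one common way to quantify pairwise co-evolution. It assumes raw frequencies with no pseudocounts, gap handling, or sequence weighting; the function name and the toy alignment are hypothetical and are not taken from the dissertation.

```python
import math
from collections import Counter


def column_mutual_information(col_i, col_j):
    """Mutual information (in nats) between two aligned MSA columns.

    col_i, col_j: equal-length strings, one residue per sequence.
    Hypothetical helper: uses raw frequencies only (no pseudocounts,
    gap filtering, or sequence weighting).
    """
    if len(col_i) != len(col_j) or not col_i:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_i)
    count_i = Counter(col_i)               # singlet counts, column i
    count_j = Counter(col_j)               # singlet counts, column j
    count_ij = Counter(zip(col_i, col_j))  # pair counts
    mi = 0.0
    for (a, b), c in count_ij.items():
        f_ab = c / n
        f_a = count_i[a] / n
        f_b = count_j[b] / n
        mi += f_ab * math.log(f_ab / (f_a * f_b))
    return mi


# Toy alignment of four sequences (hypothetical data).
msa = ["ACDK", "ACEK", "GCDR", "GCER"]
cols = ["".join(seq[k] for seq in msa) for k in range(len(msa[0]))]
print(column_mutual_information(cols[0], cols[3]))  # columns covary perfectly: ln 2
print(column_mutual_information(cols[0], cols[2]))  # columns are independent: 0.0
```

Columns 0 and 3 change together (A pairs with K, G pairs with R), so knowing one column fully determines the other and the score reaches ln 2 for this two-state case. Columns 0 and 2 vary independently and score zero. A production analysis would normally add pseudocounts and down-weight redundant sequences before trusting such scores.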
dc.format.mimetype application/pdf
dc.identifier archive/
dc.identifier.articleid 7832
dc.identifier.contextkey 14007274
dc.identifier.s3bucket isulib-bepress-aws-west
dc.identifier.submissionpath etd/16825
dc.language.iso en
dc.source.bitstream archive/|||Fri Jan 14 21:06:38 UTC 2022
dc.subject.disciplines Bioinformatics
dc.subject.keywords co-evolution
dc.subject.keywords high order dependence
dc.subject.keywords protein sequence
dc.subject.keywords protein structure
dc.subject.keywords sequence matching
dc.title Using evolutionary covariance to infer protein sequence-structure relationships
dc.type article
dc.type.genre dissertation
dspace.entity.type Publication
relation.isOrgUnitOfPublication faf0a6cb-16ca-421c-8f48-9fbbd7bc3747 Bioinformatics and Computational Biology dissertation Doctor of Philosophy
Original bundle
3.54 MB
Adobe Portable Document Format