Exploring the utility and effectiveness of protein language models on protein relations

Date
2024-08
Authors
Kilinc, Mesih
Major Professor
Advisor
Jernigan, Robert L
Huang, Xiaoqiu
Macintosh, Gustavo
Walley, Justin
Wu, Zhijun
Committee Member
Abstract
Proteins are key macromolecules in the life cycle of all organisms. Understanding the relationships between proteins improves our understanding of their functions, because functions can be inferred from close relatives. A better understanding of protein functions can benefit many industrial processes and the study of many diseases. Because of this importance, traditional sequence-based protein similarity tools such as BLAST are widely used. It is a widely accepted paradigm that sequence defines structure, which in turn defines dynamics and function. However, a single amino acid mutation can sometimes change the entire structural topology, while similar structures can be found with as little as 10% sequence similarity. This shows the need to understand complex protein relations beyond local amino acid sequence dependencies. Recently, transformer-based language models for both natural languages and proteins have shown astonishing advances. They can capture complex relationships between language and its environment even though they are trained only on the sequences themselves. In this work, we harness the contextual understanding of protein language models and optimize it for high accuracy and efficiency in identifying protein relations.
Type
dissertation