Computational approaches to characterize bacterial protein function and activity in their environments
Date
2024-05
Authors
Chung, Henri Christopher
Major Professor
Advisor
Friedberg, Iddo
Lawrence-Dill, Carolyn
Beattie, Gwyn
Schmitz-Esser, Stephen
Ganapathysubramanian, Baskar
Committee Member
Abstract
In this thesis, I present three studies on novel improvements to the elucidation of gene and protein function in different systems: antimicrobial resistance, bacterial metabolic pathways, and protein tagging. Each study aims to improve functional inference in its specific context, with the overall expectation that improvements in functional annotation will come from advances in multiple methods rather than from a single approach.
To understand the microbial genomic contributors to antimicrobial resistance (AMR), I used elastic net logistic regression to model and predict AMR phenotypes for 24 antibiotics in Escherichia coli isolates from veterinary animals. We compared the fit and performance of multiple models designed to evaluate different potential modes of AMR genotype translation into resistance phenotypes. Our results show that a model that considers both the presence of individual AMR genes and the total number of AMR genes present, drawn from a set of genes known to confer resistance, predicted isolate resistance accurately on average. We concluded that an interpretable AMR prediction model can accurately predict resistance phenotypes across multiple host species. The model also revealed testable hypotheses about how the mechanism of resistance may vary across antibiotics within the same class and across animal hosts for the same antibiotic.
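As a rough illustration of the feature design described above (per-gene presence plus a total gene count), the following minimal sketch builds one feature row per isolate. The gene names, isolate names, and the function `build_features` are hypothetical and are not taken from the thesis; the resulting rows are the kind of input one might pass to an elastic net logistic regression.

```python
from typing import Dict, List, Set


def build_features(
    isolate_genes: Dict[str, Set[str]], known_genes: List[str]
) -> Dict[str, List[int]]:
    """Return one row per isolate: a 0/1 indicator for each known AMR gene,
    followed by the total number of known AMR genes the isolate carries."""
    features = {}
    for isolate, genes in isolate_genes.items():
        indicators = [1 if gene in genes else 0 for gene in known_genes]
        # The final column is the count of known genes present in this isolate.
        features[isolate] = indicators + [sum(indicators)]
    return features


# Hypothetical gene and isolate names, for illustration only.
known = ["geneA", "geneB", "geneC"]
isolates = {
    "iso1": {"geneA", "geneC"},
    "iso2": set(),
    "iso3": {"geneB", "geneX"},  # geneX is not in the known set, so it is ignored
}
rows = build_features(isolates, known)
print(rows["iso1"])
```

Keeping the count as a separate column lets a sparse, interpretable model weigh individual genes and overall gene burden independently.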
To predict microbial proteins that contribute to the same biological pathway, we used a binary phylogenetic profiling method with fusions, a novel definition of protein function. A protein's phylogenetic profile indicates the presence or absence of similar proteins across a set of organisms. In our study, we compared the performance of profiles created using fusion against profiles created using sequence similarity as measured by MMseqs2, an ultra-fast sequence alignment tool. We predicted that proteins with similar phylogenetic profiles participate in the same pathway. Our results demonstrated improved predictions over the compared sequence-similarity method. We then applied our method to marine metagenome data and identified new putative cross-organism pathways, generating testable hypotheses for future analysis. These emergent pathways are a novel starting point for characterizing microbiomes by their functional activity, an essential step toward understanding microbial communities.
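The idea of comparing binary phylogenetic profiles can be sketched as follows. This is only an illustration: the abstract does not state which similarity measure was used, so the Jaccard index, the threshold, and the protein names and profiles below are all assumptions.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def jaccard(a: List[int], b: List[int]) -> float:
    """Jaccard similarity of two binary presence/absence profiles."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def similar_pairs(
    profiles: Dict[str, List[int]], threshold: float
) -> List[Tuple[str, str, float]]:
    """Return protein pairs whose profiles meet the similarity threshold."""
    pairs = []
    for p, q in combinations(sorted(profiles), 2):
        score = jaccard(profiles[p], profiles[q])
        if score >= threshold:
            pairs.append((p, q, score))
    return pairs


# Hypothetical profiles: each column is one organism (1 = present, 0 = absent).
profiles = {
    "protA": [1, 1, 0, 1, 0],
    "protB": [1, 1, 0, 1, 0],
    "protC": [0, 0, 1, 0, 1],
    "protD": [1, 1, 0, 0, 0],
}
pairs = similar_pairs(profiles, threshold=0.6)
for p, q, score in pairs:
    print(p, q, round(score, 3))
```

Pairs that pass the threshold are candidates for membership in the same pathway; in this toy example protC shares no organisms with the others and is excluded.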
To improve the detection and functional analysis of proteins in vivo, we piloted EpicTope, a method that predicts optimal sites for internally inserted epitope tags without disrupting protein function. Internal integration of the epitope tag offers several important advantages over external or end tags: frameshift mutations can be isolated in parallel with epitope tagging, and tags can be inserted into proteins that would otherwise be disrupted by end tags or whose ends are inaccessible from the surface of the protein upon folding. For a target protein, we integrated predictions of tertiary structure, secondary structure, solvent accessibility, disordered binding regions, and sequence conservation to score sequence positions for their suitability for tag insertion. We empirically evaluated our predictions on Smad5, an essential downstream transcription factor of the TGF-β/BMP signaling pathway in zebrafish, and performed a preliminary computational assessment by comparing EpicTope predictions on pathogenic non-frameshifting insertion mutations. We demonstrated that internally tagged Smad5 created with EpicTope can rescue zebrafish Smad5 mutant embryos, while N- and C-terminally tagged Smad5 cannot. We then showed that the internally tagged Smad5 localized in the presumptive ventral-lateral region of the zebrafish gastrula, correlating linearly with phosphorylated Smad5 in that region. Our computational assessment of EpicTope showed that our simple algorithm performed within error margins of other state-of-the-art pathogenic mutation predictors, despite not being designed for that explicit task. We intend for EpicTope to be used by molecular biologists to easily rank and identify positions in a protein of interest that are suitable for internal tagging, and to improve on this developing area of research. EpicTope is available as an R package at https://github.com/FriedbergLab/EpicTope.
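The general idea of combining several per-position predictions into one ranking can be sketched as below. This is not the EpicTope implementation (which is an R package) and does not reproduce its scoring: the equal-weight average, the feature names, the assumption that every feature is already rescaled to [0, 1] with higher values meaning "more suitable", and the example values are all hypothetical.

```python
from typing import Dict, List


def rank_positions(features: Dict[str, List[float]], top_n: int = 3) -> List[int]:
    """Rank sequence positions by the unweighted mean of their feature scores.

    Each feature is a list with one value per residue, assumed to lie in
    [0, 1] with higher values meaning more suitable for tag insertion.
    Returns 1-based positions, best first; ties keep sequence order.
    """
    names = sorted(features)
    length = len(features[names[0]])
    if any(len(features[name]) != length for name in names):
        raise ValueError("all feature tracks must cover the same positions")
    scores = [
        sum(features[name][i] for name in names) / len(names)
        for i in range(length)
    ]
    order = sorted(range(length), key=lambda i: scores[i], reverse=True)
    return [i + 1 for i in order[:top_n]]


# Hypothetical feature tracks for a five-residue toy sequence.
example_features = {
    "accessibility": [0.0, 1.0, 0.5, 1.0, 0.0],
    "variability": [0.5, 1.0, 0.0, 0.5, 0.0],
    "unstructured": [0.0, 1.0, 0.5, 0.5, 0.5],
}
best = rank_positions(example_features, top_n=3)
print(best)
```

An unweighted mean is the simplest possible combination; a real tool may weight or transform individual features differently.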
Type
dissertation