Data-driven approaches to heterogeneous datasets

Date
2023-08
Authors
Villanueva, Paul
Major Professor
Advisor
Howe, Adina
Dickerson, Julie
Ikuma, Kaoru
Lee, Jaejin
Dorman, Karin
Committee Member
Department
Agricultural and Biosystems Engineering
Abstract
Recent technological advances in both experimental assay platforms and computational resources have enabled the generation of massive amounts of raw data that are ripe for bioinformatics research. These datasets tend to be large and heterogeneous, enabling researchers to approach studies from an interdisciplinary perspective. Additionally, AI-driven coding assistants have lowered the barrier to entry for implementing bioinformatics pipelines and machine learning workflows, enabling a wider range of people to participate in this research without formal training. Rather than eliminating the need for human expertise, however, these datasets require more interdisciplinary knowledge and more thoughtful methodologies if these powerful computational resources are to yield insights into the systems the data describe. This dissertation collects three studies in which this approach was practiced. The first project concerns MetaFunPrimer, a high-throughput qPCR primer design pipeline that uses publicly available metagenomes to design environment-specific qPCR primers. Primers designed by this program are used in the second project, which examines the effects of nutrient additions on greenhouse gas emissions and mineralization rates, as well as on the nitrification and denitrification functional communities in the soil. Finally, the third project concerns harmful algal bloom prediction using a heterogeneous, high-dimensional dataset, for which a novel feature selection method is implemented to identify key features. These studies show that while bioinformatics is heading toward more cross-disciplinary, data-driven research, careful consideration of the relationships between the data and the methods employed to analyze them remains vitally important.