Statistical methods for ChIP-seq and microbiome studies using next-generation DNA sequencing data

Goren, Emily

dc.contributor.advisor	Peng Liu
dc.contributor.advisor	Chong Wang
dc.contributor.author	Goren, Emily
dc.contributor.department	Statistics (LAS)
dc.date	2020-02-12T22:55:16.000
dc.date.accessioned	2020-06-30T03:20:12Z
dc.date.available	2020-06-30T03:20:12Z
dc.date.copyright	Sun Dec 01 00:00:00 UTC 2019
dc.date.embargo	2021-11-09
dc.date.issued	2019-01-01
dc.description.abstract	<p>In this dissertation, we studied two types of data generated by next-generation sequencing technologies. Chapter 2 addresses the analysis of ChIP-seq data with biological replicates to identify protein-binding sites. Chapters 3 and 4 address the analysis of microbiome data to estimate the causal effects of microbiome features on outcomes of interest in the presence of confounding variables.</p> <p>ChIP-seq experiments aim to detect DNA-protein binding sites and require biological replication to draw inferential conclusions. However, there is currently no consensus on how to analyze ChIP-seq data with biological replicates. Very few methodologies exist for the joint analysis of replicated ChIP-seq data, with approaches ranging from combining the results of analyzing replicates individually to joint modeling of all replicates. Combining the results of replicates analyzed separately can reduce peak classification performance compared to joint modeling, while currently available methods for joint analysis may fail to control the false discovery rate at the nominal level. In Chapter 2, we propose BinQuasi, a peak caller for replicated ChIP-seq data that jointly models biological replicates using a generalized linear model framework and employs a one-sided quasi-likelihood ratio test to detect peaks. When applied to simulated and real data, BinQuasi performs favorably compared to existing methods, including better control of the false discovery rate than existing joint modeling approaches. BinQuasi offers a flexible approach to joint modeling of replicated ChIP-seq data, which is preferable to combining the results of replicates analyzed individually. We created an R package called BinQuasi that is available at https://cran.r-project.org/package=BinQuasi.</p> <p>Microbiome studies have uncovered associations between microbes and human, animal, and plant health outcomes. 
This has led to interest in identifying microbial interventions for the treatment of disease and the optimization of crop yields, which will require identifying the individual relevant microbiome features. That task is challenging because of the high dimensionality of microbiome data and the confounding that results from the complex and dynamic interactions among host, environment, and microbiome. The performance of variable selection and estimation procedures may be unsatisfactory when there are differentially abundant features resulting from a categorical confounding variable. For microbiome studies with such a confounding structure, we propose in Chapter 3 a standardization approach to estimating the population effects of individual microbiome features. Due to the high dimensionality and the confounding-induced correlation between features, we propose feature screening, selection, and estimation conditional on each stratum of the confounder. Comprehensive simulation studies demonstrate the advantages of our approach in recovering relevant features. Utilizing a potential-outcomes framework, we outline the assumptions required to ascribe causal, rather than associational, interpretations to the identified microbiome effects. We applied the proposed approach to an agricultural study of the rhizosphere microbiome of sorghum in which nitrogen fertilizer application is a confounding variable. We identified microbial taxa that are consistent with biological understanding of potential plant-microbe interactions.</p> <p>In Chapter 4, we present an inverse probability weighting approach to causal analysis of the effects of individual microbiome features in the presence of continuous confounding variables. In simulated microbiome data, we show that inverse probability weighting in marginal models provides microbiome effect estimates with lower bias and mean squared error than conditional regression adjustment for confounding. 
We demonstrate our approach on an agricultural data set to identify soil microbes with the potential to modulate biomass production in sorghum.</p>
dc.format.mimetype	application/pdf
dc.identifier	archive/lib.dr.iastate.edu/etd/17686/
dc.identifier.articleid	8693
dc.identifier.contextkey	16524763
dc.identifier.s3bucket	isulib-bepress-aws-west
dc.identifier.submissionpath	etd/17686
dc.identifier.uri	https://dr.lib.iastate.edu/handle/20.500.12876/31869
dc.language.iso	en
dc.source.bitstream	archive/lib.dr.iastate.edu/etd/17686/Goren_iastate_0097E_18453.pdf|||Fri Jan 14 21:27:33 UTC 2022
dc.subject.disciplines	Statistics and Probability
dc.subject.keywords	bioinformatics
dc.subject.keywords	biostatistics
dc.subject.keywords	causal inference
dc.subject.keywords	generalized linear models
dc.subject.keywords	high-dimensional inference
dc.subject.keywords	next-generation sequencing
dc.title	Statistical methods for ChIP-seq and microbiome studies using next-generation DNA sequencing data
dc.type	dissertation	en_US
dc.type.genre	dissertation	en_US
dspace.entity.type	Publication
relation.isOrgUnitOfPublication	264904d9-9e66-4169-8e11-034e537ddbca
thesis.degree.discipline	Statistics
thesis.degree.level	dissertation
thesis.degree.name	Doctor of Philosophy

File

Original bundle

Name: Goren_iastate_0097E_18453.pdf
Size: 1.66 MB
Format: Adobe Portable Document Format

Collections

Theses and Dissertations