Statistical methods for identifying differentially expressed genes using hierarchical models
The work presented in this dissertation focuses on identifying differentially expressed genes using data from microarray or RNA sequencing (RNA-seq) experiments. RNA-seq and microarray data sets frequently contain few observations for each of several thousand genes. Many statistical methods use hierarchical models to share information across genes when estimating model parameters, which improves accuracy and reduces the variability of parameter estimators. However, even after information is shared across genes, estimators remain subject to a non-negligible combination of bias and variability. Some methods reduce estimator variability by assuming that model parameters are constant across genes, an assumption that can be shown to be inaccurate in nearly all cases and that adversely affects model performance. The flexibility of gene-specific parameter estimates comes at the cost of increased estimator variance, which existing methods for detecting differential expression often ignore. We describe novel methods for analyzing microarray and RNA-seq data that allow for gene-specific parameter estimates and account for estimator uncertainty. We also demonstrate the detrimental effects of assuming that differences due to differential expression follow the same distribution as differences across genes. This assumption is common in microarray models, and we show how it can be relaxed. Additionally, we present an approach for modeling a portion of RNA-seq data that is often simply discarded. In general, our suggested methods offer improved power to detect differential expression and/or better control of false discovery rates when compared to competing methods.
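The trade-off between a common parameter and fully gene-specific estimates can be illustrated with a generic empirical Bayes shrinkage of gene-wise variances. The sketch below uses the well-known limma-style posterior variance, not the methods developed in this dissertation. It assumes the prior variance (prior_var) and prior degrees of freedom (prior_df) are already known, although in practice they would be estimated from all genes. The function name and the example values are hypothetical.

```python
def moderated_variances(sample_vars, resid_df, prior_var, prior_df):
    """Shrink gene-wise sample variances toward a common prior variance.

    Uses the limma-style posterior (moderated) variance
        s_tilde_g^2 = (d0 * s0^2 + d_g * s_g^2) / (d0 + d_g),
    with s0^2 = prior_var and d0 = prior_df. Assumption: both prior
    quantities are treated as known here rather than estimated from data.
    """
    if prior_df < 0 or resid_df <= 0:
        raise ValueError("prior_df must be >= 0 and resid_df must be > 0")
    total_df = prior_df + resid_df
    # Each gene's variance becomes a df-weighted average of its own
    # estimate and the prior variance shared across all genes.
    return [(prior_df * prior_var + resid_df * s2) / total_df for s2 in sample_vars]


# Hypothetical sample variances for three genes, each with 2 residual df.
raw = [0.1, 1.0, 4.0]
shrunk = moderated_variances(raw, resid_df=2, prior_var=1.0, prior_df=4)
print(shrunk)
```

Setting prior_df to 0 leaves each gene's own variance estimate unchanged (no information sharing), while letting prior_df grow without bound gives every gene the common value prior_var, which corresponds to the constant-parameter assumption. Intermediate values reduce variability at the cost of some bias toward the prior.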