Power enhanced gene differential expression analysis by incorporating gene network information

Shirazi, Amin
Major Professor
Liu, Peng
Qiu, Yumou
Sabzikar, Farzad
Wang, Chong
Chu, Lynna
Committee Member
Detecting differentially expressed (DE) genes is a fundamental step in gene expression data analysis. In this dissertation, we propose a new method for differential expression analysis (DEA) that exploits the dependence among genes to boost the power of detecting DE genes. We first develop our dependence boosted differential expression (DBDE) procedure for microarray data, which are treated as continuous measurements. The proposed statistic leverages the dependence between the target gene and other genes by regressing the target gene on its neighborhood genes, defined as the genes associated with the target gene; this neighborhood can be obtained from prior information on gene dependence networks or from biological knowledge. The DBDE statistic is then built on the residuals of the linear model. Because the regression removes the part of the target gene's variation that is explained by its neighborhood, the proposed test statistic has a smaller variance, which yields a higher signal-to-noise ratio (SNR) and, accordingly, higher power. We assess the performance of the proposed method through a series of simulation studies and show that, in many cases, DBDE achieves power superior or at least comparable to that of existing alternative approaches while controlling the false discovery rate (FDR) at the nominal level. Second, we incorporate the LIMMA procedure into the DBDE method to address the small sample size problem in the DEA of microarray data. LIMMA provides an integrated solution for analyzing complex experimental designs with small sample sizes, but it does not account for dependence among genes. Simulation results indicate that combining LIMMA with DBDE significantly improves the performance of DEA when sample sizes are small and outperforms other procedures in a wide range of settings. Third, we extend our method to DEA of RNA-seq count data: we transform the read counts to log counts per million (log-CPM) and then apply the DBDE procedure.
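The regress-then-use-residuals idea can be illustrated with a minimal Python sketch (NumPy and SciPy). It is not the dissertation's DBDE statistic, whose exact form the abstract does not give. As an assumption, it fits an ordinary least-squares model of one target gene on an intercept, a two-group condition indicator, and the neighborhood genes, then forms a t-statistic for the condition coefficient from the residual variance. The function name `neighborhood_adjusted_t` and the simulated data are hypothetical.

```python
import numpy as np
from scipy import stats

def neighborhood_adjusted_t(y, group, neighbors):
    """Hypothetical sketch: t-statistic for the condition effect on one target gene,
    adjusted for its neighborhood genes via least squares.

    y         : (n,) continuous expression of the target gene
    group     : (n,) 0/1 condition labels
    neighbors : (n, k) expression of the k neighborhood genes
    Returns (t, two-sided p-value, residual degrees of freedom).
    """
    n = y.shape[0]
    # Design matrix: intercept, condition indicator, neighborhood genes as covariates.
    X = np.column_stack([np.ones(n), group, neighbors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    df = n - X.shape[1]
    # Residual variance: smaller when the neighbors explain part of the target's variation.
    sigma2 = resid @ resid / df
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    t = beta[1] / se
    return t, 2 * stats.t.sf(abs(t), df), df

# Simulated example: a shared latent factor z drives the target and 3 non-DE neighbors.
rng = np.random.default_rng(0)
n_per_group = 10
group = np.repeat([0, 1], n_per_group)
z = rng.normal(size=2 * n_per_group)
neighbors = z[:, None] + 0.1 * rng.normal(size=(2 * n_per_group, 3))
y = 1.0 * group + 2.0 * z + 0.3 * rng.normal(size=2 * n_per_group)

t_adj, p_adj, df = neighborhood_adjusted_t(y, group, neighbors)
# Ordinary two-sample t-test that ignores the neighborhood, for comparison.
t_plain = stats.ttest_ind(y[group == 1], y[group == 0]).statistic
```

Comparing `t_adj` with the ordinary `t_plain` shows the mechanism described above: the shared factor inflates the within-group variance of the target gene, and conditioning on the neighbors removes most of it. One design caveat: if neighborhood genes are themselves DE, conditioning on them changes what the condition coefficient estimates; the abstract does not say how DBDE handles that case.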
To account for the mean-variance relationship in count data, we adopt the LIMMA-trend idea, in which the prior for the gene-level variances depends on each gene's average log-CPM expression. Numerical results show that our proposed test has higher power than the competing methods considered. Finally, we construct a computational pipeline for implementing the proposed methods in real data analysis.
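The two ingredients of the RNA-seq extension can be sketched in Python for a two-group design: a log-CPM transform and a limma-trend-style shrinkage of gene-wise variances toward a prior that depends on average log-CPM. This is a simplified stand-in, not limma's implementation. The 0.5 and 1 offsets in the transform are the ones voom uses (an assumption here, since the abstract does not give them), the trend is a quadratic fitted to log variances, and the prior degrees of freedom `d0` are fixed rather than estimated from the data. The function names are hypothetical. In a DBDE-style analysis the gene-wise variances would come from the neighborhood-adjusted residuals; a plain group-means model keeps this example short.

```python
import numpy as np

def log_cpm(counts):
    """log2 counts per million with a 0.5 offset per count and 1 per library size.
    counts: (genes, samples) array of read counts."""
    lib_size = counts.sum(axis=0)
    return np.log2((counts + 0.5) / (lib_size + 1.0) * 1e6)

def trend_moderated_variance(s2, avg_expr, d, d0=4.0, degree=2):
    """Hypothetical sketch: shrink gene-wise variances toward a mean-dependent prior.

    s2       : (G,) residual variances from per-gene linear models
    avg_expr : (G,) average log-CPM of each gene
    d        : residual degrees of freedom of each gene's model
    d0       : prior degrees of freedom (fixed here, not estimated)
    Returns (posterior variances, trended prior variances).
    """
    # Polynomial trend of log variance on average expression gives the prior s0^2(mean).
    coef = np.polyfit(avg_expr, np.log(s2), deg=degree)
    s0_sq = np.exp(np.polyval(coef, avg_expr))
    # Posterior variance: d-weighted average of the gene's own and the trended prior variance.
    s2_post = (d0 * s0_sq + d * s2) / (d0 + d)
    return s2_post, s0_sq

# Simulated counts: 200 genes, 2 groups x 3 samples, no true DE.
rng = np.random.default_rng(1)
gene_means = rng.uniform(20, 2000, size=200)
counts = rng.poisson(lam=np.outer(gene_means, np.ones(6)))
group = np.array([0, 0, 0, 1, 1, 1])

y = log_cpm(counts)
mean0 = y[:, group == 0].mean(axis=1)
mean1 = y[:, group == 1].mean(axis=1)
# Pooled within-group residuals; residual df = 6 samples - 2 group means = 4.
resid = y - np.where(group == 1, mean1[:, None], mean0[:, None])
d = 4
s2 = (resid ** 2).sum(axis=1) / d
s2_post, s0_sq = trend_moderated_variance(s2, y.mean(axis=1), d)
# Moderated t-statistics (reference distribution: t with d + d0 degrees of freedom).
t_mod = (mean1 - mean0) / np.sqrt(s2_post * (1 / 3 + 1 / 3))
```

Because each posterior variance is a weighted average of the gene's own estimate and the trended prior, genes with few residual degrees of freedom borrow strength from genes at a similar expression level, which is how a trend captures the mean-variance relationship of count data.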