Statistical methods in detecting differential expressed genes, analyzing insertion tolerance for genes and group selection for survival data

dc.contributor.advisor Peng Liu
dc.contributor.advisor Chong Wang
dc.contributor.author Liu, Fangfang
dc.contributor.department Statistics (LAS)
dc.date 2018-08-11T19:08:57.000
dc.date.accessioned 2020-06-30T02:57:15Z
dc.date.available 2020-06-30T02:57:15Z
dc.date.copyright Thu Jan 01 00:00:00 UTC 2015
dc.date.embargo 2015-10-09
dc.date.issued 2015-01-01
dc.description.abstract <p>The thesis is composed of three independent projects: (i) analyzing transposon-sequencing data to infer the functions of genes in bacterial growth (chapter 2), (ii) developing a semi-parametric Bayesian method for differential gene expression analysis with RNA-sequencing data (chapter 3), and (iii) solving the group selection problem for survival data (chapter 4). All projects are motivated by statistical challenges raised in biological research.</p> <p>The first project is motivated by the need to develop statistical models that accommodate transposon insertion sequencing (Tn-Seq) data. Tn-Seq data consist of sequence reads around each transposon insertion site. The detection of a transposon insertion at a given site indicates that disruption of the genomic sequence at this site does not cause loss of an essential function and the bacteria can still grow. Hence, such measurements have been used to infer the function of each gene in bacterial growth. We propose a zero-inflated Poisson regression method for analyzing Tn-Seq count data and derive an Expectation-Maximization (EM) algorithm to obtain parameter estimates. We also propose a multiple testing procedure that categorizes each gene into one of three states (hypo-tolerant, tolerant, or hyper-tolerant) while controlling the false discovery rate. Simulation studies show that our method provides good estimation of model parameters and inference on gene functions.</p> <p>In the second project, we model the count data from an RNA-sequencing experiment for each gene using a Poisson-Gamma hierarchical model or, equivalently, a negative binomial (NB) model. We derive a full semi-parametric Bayesian approach with a Dirichlet process as the prior for the fold changes between two treatment means. An inference strategy using a Gibbs algorithm is developed for differential expression analysis. We evaluate our method with several simulation studies, and the results demonstrate that it outperforms other methods, including widely used ones such as edgeR and DESeq.</p> <p>In the third project, we develop a new semi-parametric Bayesian method to address the group variable selection problem and study the dependence of survival outcomes on grouped predictors using the Cox proportional hazards model. We use indicators for groups to induce sparseness and obtain the posterior inclusion probability for each group. Bayes factors are used to evaluate whether each group should be selected. We compare our method with one frequentist method (HPCox) in several simulation studies and show that our method performs better than HPCox.</p> <p>In summary, this dissertation tackles several statistical problems raised in biological research, including high-dimensional genomic data analysis and survival analysis. All proposed methods are evaluated with simulation studies and show satisfactory performance. We also apply the proposed methods to real data.</p>
dc.format.mimetype application/pdf
dc.identifier archive/lib.dr.iastate.edu/etd/14514/
dc.identifier.articleid 5521
dc.identifier.contextkey 7986485
dc.identifier.doi https://doi.org/10.31274/etd-180810-4063
dc.identifier.s3bucket isulib-bepress-aws-west
dc.identifier.submissionpath etd/14514
dc.identifier.uri https://dr.lib.iastate.edu/handle/20.500.12876/28699
dc.language.iso en
dc.source.bitstream archive/lib.dr.iastate.edu/etd/14514/Liu_iastate_0097E_14762.pdf|||Fri Jan 14 20:21:43 UTC 2022
dc.subject.disciplines Statistics and Probability
dc.subject.keywords Statistics
dc.title Statistical methods in detecting differential expressed genes, analyzing insertion tolerance for genes and group selection for survival data
dc.type dissertation
dc.type.genre dissertation
dspace.entity.type Publication
relation.isOrgUnitOfPublication 264904d9-9e66-4169-8e11-034e537ddbca
thesis.degree.level dissertation
thesis.degree.name Doctor of Philosophy
File
Original bundle
Name:
Liu_iastate_0097E_14762.pdf
Size:
1.3 MB
Format:
Adobe Portable Document Format