Bayesian variable selection in ultra-high dimensional settings
dc.contributor.advisor | Dutta, Somak | |
dc.contributor.advisor | Roy, Vivekananda | |
dc.contributor.advisor | Carriquiry, Alicia | |
dc.contributor.advisor | Dorman, Karin | |
dc.contributor.advisor | Kaiser, Mark | |
dc.contributor.author | Li, Dongjin | |
dc.contributor.department | Department of Statistics (LAS) | |
dc.date.accessioned | 2022-11-08T23:52:03Z | |
dc.date.available | 2022-11-08T23:52:03Z | |
dc.date.issued | 2021-08 | |
dc.date.updated | 2022-11-08T23:52:03Z | |
dc.description.abstract | This thesis is a collection of three papers on Bayesian variable selection for ultra-high dimensional problems. In many disciplines of modern scientific research, datasets commonly contain tens of thousands of predictors but only a limited number of observations. Nevertheless, only a very small number of these predictors are believed to be associated with the response, making variable selection in the ultra-high dimensional setting an important and challenging problem. In this thesis, we present novel Bayesian methods for variable selection in ultra-high dimensional settings. In Chapter \ref{ch.sven}, we propose a Bayesian variable selection method, called SVEN, for Gaussian linear regression models. The method is based on a hierarchical Gaussian linear model with priors placed on the regression coefficients as well as on the model space. The use of degenerate {\it spike} priors on the inactive variables and Gaussian {\it slab} priors on the important predictors induces sparsity in the regression coefficients and yields an analytically available form for the posterior probability of a model. Strong model selection consistency is shown to be attained when the number of predictors grows nearly exponentially with the sample size, and even when the norm of the mean effects due solely to the unimportant variables diverges, which is a novel and attractive feature. We develop a scalable variable selection algorithm with an inbuilt screening method that efficiently explores the enormous model space, rapidly identifies the regions of high posterior probability, and enables fast inference and prediction. To further mitigate the multimodality of the posterior distribution, we use a temperature schedule whose values are guided by our model selection consistency results. An appealing byproduct of SVEN is the construction of novel model-weight-adjusted prediction intervals. To implement SVEN, we develop an R package called ``Bravo'', which is now available on the Comprehensive R Archive Network (CRAN). In Chapter \ref{ch.bravo}, we describe the major features of the software and conduct step-by-step analyses of several examples to illustrate the use of the functions in the package. In Chapter \ref{ch.spsven}, we extend the SVEN model to include spatial random effects and call the resulting Bayesian variable selection method SP-SVEN. The SP-SVEN model is based on a hierarchical Gaussian linear mixed model in which the well-known spike-and-slab priors are placed on the regression coefficients to achieve sparsity, and a Gaussian intrinsic autoregression prior is assigned to the spatial random effects. The use of conjugate Gaussian priors ensures that the posterior distribution is available in explicit form conditional on the two spatial parameters involved in the intrinsic autoregression, and numerical integration is used to integrate out these two parameters to obtain the posterior probability of a given model. For the priors on these two spatial parameters, we propose using data from unsequenced varieties, if available, to build a hierarchical mixture model, and we use the corresponding posterior distribution of the spatial parameters as their prior in the variable selection model. We also develop a scalable algorithm that embeds model-based screening and uses fast Cholesky updates to compute the posterior probabilities, thereby achieving fast exploration of the gigantic model space and rapid discovery of the high posterior probability regions.
The outstanding performance of SVEN and SP-SVEN is demonstrated through a number of simulation studies and real data examples from genome-wide association studies in field trial experiments. | |
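The abstract describes the spike-and-slab hierarchy only in words. The display below is a minimal sketch of that class of model; the notation ($\gamma$, $w$, $\lambda$, $\sigma^2$) is chosen here for illustration and is not taken from the thesis itself.
\[
\begin{aligned}
y \mid \beta, \gamma, \sigma^2 &\sim N\!\left(X_\gamma \beta_\gamma,\ \sigma^2 I_n\right),\\
\beta_j \mid \gamma_j, \sigma^2 &\sim (1-\gamma_j)\,\delta_0 + \gamma_j\, N\!\left(0,\ \sigma^2/\lambda\right), \quad j = 1,\dots,p,\\
\gamma_j &\overset{\mathrm{iid}}{\sim} \mathrm{Bernoulli}(w),
\end{aligned}
\]
where $\gamma \in \{0,1\}^p$ indexes a model, $\delta_0$ is the degenerate spike at zero, and the conjugate Gaussian slab is what makes the posterior probability of $\gamma$ available in closed form.
Since the abstract notes that SVEN is implemented in an R package distributed on CRAN, a short usage sketch in R may help orient readers. The function name sven(), its arguments, and the lowercase package name are assumptions about the package interface; the data are simulated purely for illustration.
# Minimal usage sketch (assumed interface of the CRAN package bravo).
# install.packages("bravo")
library(bravo)

set.seed(1)
n <- 100; p <- 5000                     # far more predictors than observations
X <- matrix(rnorm(n * p), n, p)
beta <- c(2, -1.5, 1, rep(0, p - 3))    # only three truly active predictors
y <- drop(X %*% beta + rnorm(n))

# Fit SVEN with its default prior hyperparameters; this assumes the package
# exposes a sven() function taking a design matrix X and a response vector y.
fit <- sven(X = X, y = y)

# Inspect the returned object to locate the highest-probability models found.
str(fit)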
dc.format.mimetype | | |
dc.identifier.doi | https://doi.org/10.31274/td-20240329-105 | |
dc.identifier.orcid | 0000-0002-0928-0307 | |
dc.identifier.uri | https://dr.lib.iastate.edu/handle/20.500.12876/qzoD43ew | |
dc.language.iso | en | |
dc.language.rfc3066 | en | |
dc.subject.disciplines | Statistics | en_US |
dc.subject.keywords | GWAS | en_US |
dc.subject.keywords | hierarchical model | en_US |
dc.subject.keywords | posterior prediction | en_US |
dc.subject.keywords | shrinkage | en_US |
dc.subject.keywords | stochastic search | en_US |
dc.subject.keywords | subset selection | en_US |
dc.title | Bayesian variable selection in ultra-high dimensional settings | |
dc.type | dissertation | en_US |
dc.type.genre | dissertation | en_US |
dspace.entity.type | Publication | |
relation.isOrgUnitOfPublication | 264904d9-9e66-4169-8e11-034e537ddbca | |
thesis.degree.discipline | Statistics | en_US |
thesis.degree.grantor | Iowa State University | en_US |
thesis.degree.level | dissertation | |
thesis.degree.name | Doctor of Philosophy | en_US |
File
Original bundle (1 of 1)
- Name: Li_iastate_0097E_19611.pdf
- Size: 1.25 MB
- Format: Adobe Portable Document Format
License bundle (1 of 1)
- Name: license.txt
- Size: 0 B
- Format: Item-specific license agreed upon to submission