Small area estimation and graphical model for complex surveys
dc.contributor.advisor | Zhu, Zhengyuan | |
dc.contributor.advisor | Berg, Emily | |
dc.contributor.advisor | Kim, Jae Kwang | |
dc.contributor.advisor | Niemi, Jarad | |
dc.contributor.advisor | Dai, Xiongtao | |
dc.contributor.author | Sun, Hao | |
dc.contributor.department | Statistics (LAS) | |
dc.date.accessioned | 2022-11-09T05:47:12Z | |
dc.date.available | 2022-11-09T05:47:12Z | |
dc.date.embargo | 2023-09-28T00:00:00Z | |
dc.date.issued | 2022-08 | |
dc.date.updated | 2022-11-09T05:47:12Z | |
dc.description.abstract | A large-scale survey typically contains more than one response variable of interest, each of which can be either continuous or discrete. Answering all research questions by estimating each parameter of interest separately is not only inefficient but also problematic, because it ignores the conditional correlation among the response variables. To address this issue, we develop a valid joint model for mixed-type response variables and auxiliary variables by first specifying the conditional distribution of each response variable given all other information. In this dissertation, we apply this multivariate model to several statistical problems arising in complex surveys, including small area estimation, graphical estimation, and an extension for item non-response. The first project is about bivariate small area estimation. Small area estimation is a popular approach that provides model-based estimators for surveys in which the sample sizes are considered too small for direct estimators to be reliable. In this project, we consider one continuous response variable with a conditional Gaussian distribution and one binary response variable with a conditional Bernoulli distribution. We introduce a conditionally specified bivariate mixed-effects model and provide a necessary and sufficient condition that guarantees the joint model is valid. To estimate the parameters of this mixed-effects model, we develop a Monte Carlo EM (MCEM) algorithm that updates the model parameters iteratively. We further calculate an empirical Bayes predictor (EBP) for each small area parameter and apply the parametric bootstrap to estimate the mean squared error. We apply the method to data from the Conservation Effects Assessment Project (CEAP), a two-phase survey that measures the environmental impacts of agriculture and conservation. We select one continuous and one binary variable from CEAP. 
The continuous variable is sediment loss and the binary variable is the proportion of land where the soil loss is exceeded. We apply our bivariate mixed-effects model to estimate the marginal means and domain means in different watersheds. Compared with traditional univariate models, our approach provides more convincing estimates of domain means with better scientific interpretation. The second project is about understanding the multivariate relationships among high-dimensional response variables for a complex survey under an informative design. We assume the super-population model is a pairwise graphical model in which each node can be either continuous or discrete, so that the conditional distribution is Gaussian for a continuous node and Bernoulli or multinomial for a discrete node. The complex sample design can cause the distribution in the sample to differ from the distribution in the population. To address this issue, we apply a penalized weighted estimating equation for nodewise neighborhood selection and use a weighted Bayesian information criterion (WBIC) to select the tuning parameter. We provide theoretical results for neighborhood recovery and convergence speed under suitable conditions. We evaluate the selection approach through a model-based simulation and a design-based simulation using Academic Performance Index (API) data from California in 2000. The third project is about multivariate small area estimation for mixed-type response variables with item non-response. We provide a valid joint multivariate mixed-effects model for all the response variables. Item non-response means that some respondents do not answer all the questions, so the sample data contain missing response values. We develop a novel Monte Carlo EM (MCEM) algorithm to address item non-response, applying Gibbs sampling along with a sampling importance re-sampling (SIR) algorithm to generate the missing response values during the MCEM algorithm. 
We further calculate the empirical Bayes predictors (EBP) to estimate the small area parameters. We conduct two data analyses. The first uses the Academic Performance Index (API) data from California. The second uses the Pet Demographic Survey (PDS) data. Both analyses show the superiority of the multivariate mixed-effects model over the univariate mixed-effects model. Our approach has an important application to split questionnaire design (SQD), which splits a questionnaire into subsets of questions and assigns one subset to each respondent. SQD is a popular approach for handling the low response rates and low response quality caused by a lengthy questionnaire. With our model, SQD is demonstrated to perform better than the traditional full questionnaire design (FQD). | |
dc.format.mimetype | ||
dc.identifier.doi | https://doi.org/10.31274/td-20240329-674 | |
dc.identifier.orcid | 0000-0001-7995-7101 | |
dc.identifier.uri | https://dr.lib.iastate.edu/handle/20.500.12876/nrQBLYAz | |
dc.language.iso | en | |
dc.language.rfc3066 | en | |
dc.subject.disciplines | Statistics | en_US |
dc.subject.keywords | Complex survey design | en_US |
dc.subject.keywords | Graphical model | en_US |
dc.subject.keywords | Multivariate mixed-effects model | en_US |
dc.subject.keywords | Small area estimation | en_US |
dc.subject.keywords | Split Questionnaire Design | en_US |
dc.title | Small area estimation and graphical model for complex surveys | |
dc.type | dissertation | en_US |
dc.type.genre | dissertation | en_US |
dspace.entity.type | Publication | |
relation.isOrgUnitOfPublication | 264904d9-9e66-4169-8e11-034e537ddbca | |
thesis.degree.discipline | Statistics | en_US |
thesis.degree.grantor | Iowa State University | en_US |
thesis.degree.level | dissertation | |
thesis.degree.name | Doctor of Philosophy | en_US |
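The abstract mentions a sampling importance re-sampling (SIR) step for generating missing response values inside the MCEM algorithm. As a generic illustration of SIR only, and not the dissertation's implementation, the sketch below draws from a proposal distribution, weights each draw by the ratio of target to proposal density, and resamples in proportion to those weights. The function name `sir_sample`, the toy Gaussian target and proposal, and the sample sizes are all assumptions made for this example.

```python
import math
import random


def sir_sample(log_target, draw_proposal, log_proposal, n_draws, n_keep, rng):
    """Return n_keep approximate draws from an unnormalized target via SIR."""
    # Step 1: draw candidates from the proposal distribution.
    draws = [draw_proposal(rng) for _ in range(n_draws)]
    # Step 2: importance weights on the log scale (target / proposal).
    log_w = [log_target(x) - log_proposal(x) for x in draws]
    # Subtract the maximum before exponentiating for numerical stability.
    max_log_w = max(log_w)
    weights = [math.exp(lw - max_log_w) for lw in log_w]
    # Step 3: resample with replacement, proportional to the weights.
    return rng.choices(draws, weights=weights, k=n_keep)


# Hypothetical toy setup: target N(1, 1) known up to a constant,
# proposal N(0, 3^2), which is wider than the target.
rng = random.Random(0)
log_target = lambda x: -0.5 * (x - 1.0) ** 2
draw_proposal = lambda r: r.gauss(0.0, 3.0)
log_proposal = lambda x: -0.5 * (x / 3.0) ** 2

resampled = sir_sample(log_target, draw_proposal, log_proposal, 20000, 2000, rng)
sample_mean = sum(resampled) / len(resampled)
```

In this toy case the resampled values should be centered near the target mean of 1. Normalizing constants are omitted from both log densities because they cancel when the weights are normalized during resampling.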
File
Original bundle
- Name:
- Sun_iastate_0097E_20338.pdf
- Size:
- 1.16 MB
- Format:
- Adobe Portable Document Format