Small area estimation and graphical model for complex surveys

dc.contributor.advisor Zhu, Zhengyuan
dc.contributor.advisor Berg, Emily
dc.contributor.advisor Kim, Jae Kwang
dc.contributor.advisor Niemi, Jarad
dc.contributor.advisor Dai, Xiongtao
dc.contributor.author Sun, Hao
dc.contributor.department Statistics (LAS)
dc.date.accessioned 2022-11-09T05:47:12Z
dc.date.available 2022-11-09T05:47:12Z
dc.date.embargo 2023-09-28T00:00:00Z
dc.date.issued 2022-08
dc.date.updated 2022-11-09T05:47:12Z
dc.description.abstract A large-scale survey typically contains more than one response variable of interest, each of which can be either continuous or discrete. Answering all research questions by estimating each parameter of interest separately is not only inefficient but also problematic, because it ignores the conditional correlation among the response variables. To address this issue, we develop a valid joint model for mixed-type response variables and auxiliary variables by first specifying the conditional distribution of each response variable given all other information. In this dissertation, we apply this multivariate model to several statistical problems arising in complex surveys, including small area estimation, graphical estimation, and an extension for item non-response. The first project is about bivariate small area estimation. Small area estimation is a popular approach that provides model-based estimators for a survey where the sample sizes are considered too small for direct estimators to be reliable. In this project, we consider one continuous response variable with a conditional Gaussian distribution and one binary response variable with a conditional Bernoulli distribution. We introduce a conditionally specified bivariate mixed-effects model and provide a necessary and sufficient condition to guarantee that the joint model is valid. To estimate the parameters of this mixed-effects model, we develop a Monte Carlo EM (MCEM) algorithm that updates them iteratively. We further calculate an empirical Bayes predictor (EBP) for each small area parameter and apply the parametric bootstrap to estimate the mean squared error. We apply the method to data from the Conservation Effects Assessment Project (CEAP), a two-phase survey that measures the environmental impacts of agriculture and conservation. We select one continuous and one binary variable from CEAP. 
The continuous variable is sediment loss and the binary variable is the proportion of land where the soil loss is exceeded. We apply our bivariate mixed-effects model to estimate the marginal means and domain means in different watersheds. Compared with traditional univariate models, our approach provided more convincing estimates of domain means with better scientific interpretation. The second project is about understanding the multivariate relationships among high-dimensional response variables for a complex survey under an informative design. We assume the super-population model is a pairwise graphical model in which each node can be either continuous or discrete, so that the conditional distribution is Gaussian for a continuous node and Bernoulli or multinomial for a discrete node. The complex sample design can cause the distribution in the sample to differ from the distribution in the population. To address this issue, we apply a penalized weighted estimating equation for nodewise neighborhood selection and use a weighted Bayesian information criterion (WBIC) to select the tuning parameter. We provide theoretical results for neighborhood recovery and convergence rate under suitable conditions. We evaluate the selection approach through a model-based simulation and a design-based simulation using Academic Performance Index (API) data from California in 2000. The third project is about multivariate small area estimation for mixed-type response variables with item non-response. We provide a valid joint multivariate mixed-effects model for all the response variables. Item non-response means that some respondents do not answer all the questions, so the sample data contain missing response values. We develop a novel Monte Carlo EM (MCEM) algorithm that addresses item non-response by applying Gibbs sampling together with a sampling importance re-sampling (SIR) algorithm to generate the missing response values within the MCEM iterations. 
We further calculate the empirical Bayes predictors (EBP) to estimate the small area parameters. We conduct two data analyses. The first uses the Academic Performance Index (API) data from California. The second uses the Pet Demographic Survey (PDS) data. Both analyses show the superiority of the multivariate mixed-effects model over the univariate mixed-effects model. Our approach has an important application to split questionnaire design (SQD), which splits a questionnaire into subsets of questions and assigns one subset to each respondent. SQD is a popular approach for handling the low response rates and low response quality caused by a lengthy questionnaire. With our model, SQD is shown to perform better than the traditional full questionnaire design (FQD).
dc.format.mimetype PDF
dc.identifier.doi https://doi.org/10.31274/td-20240329-674
dc.identifier.orcid 0000-0001-7995-7101
dc.identifier.uri https://dr.lib.iastate.edu/handle/20.500.12876/nrQBLYAz
dc.language.iso en
dc.language.rfc3066 en
dc.subject.disciplines Statistics en_US
dc.subject.keywords Complex survey design en_US
dc.subject.keywords Graphical model en_US
dc.subject.keywords Multivariate mixed-effects model en_US
dc.subject.keywords Small area estimation en_US
dc.subject.keywords Split Questionnaire Design en_US
dc.title Small area estimation and graphical model for complex surveys
dc.type dissertation en_US
dc.type.genre dissertation en_US
dspace.entity.type Publication
relation.isOrgUnitOfPublication 264904d9-9e66-4169-8e11-034e537ddbca
thesis.degree.discipline Statistics en_US
thesis.degree.grantor Iowa State University en_US
thesis.degree.level dissertation
thesis.degree.name Doctor of Philosophy en_US
File
Original bundle
Name:
Sun_iastate_0097E_20338.pdf
Size:
1.16 MB
Format:
Adobe Portable Document Format