## Small area prediction based on unit level models when the covariate mean is measured with error

##### Abstract

Agencies and policy makers are interested in constructing reliable estimates for areas with small sample sizes, where an area is typically a geographic region or a demographic group. Estimation for such areas is known as small area estimation. Model-based procedures construct estimates of the small area means by exploiting auxiliary information. Mixed models are suitable small area models because they combine different sources of information and contain different sources of error. The models studied in this dissertation are unit level generalized linear mixed models in situations where the mean of an auxiliary variable is subject to estimation error. Different cases of auxiliary information are considered. Prediction methods for the small area mean, estimation of the prediction mean squared error (MSE), and confidence intervals (CIs) for the small area means are presented for the case when the response variable is nonnormal. In the simulation studies, the response variable is binary.

In the first study, two methods for constructing small area mean predictions are considered. The first method is based on the conditional distribution of the random area effects given the response variables. The second method, called the 'plug-in' method, is based on the direct substitution of the predicted random area effects into the small area mean expression. Using a simulation study, we show that the 'plug-in' predictor for the small area mean can have sizeable bias.
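The difference between the two predictors can be illustrated with a small sketch. It is not the dissertation's procedure: it assumes a single area, a random-intercept logistic model with known parameters, and a grid approximation to the conditional distribution of the area effect. All names (`b0`, `b1`, `sigma_u`, `pop_x`, `predictors`) and all numeric values are hypothetical, and the covariate measurement error studied in the dissertation is left out for brevity.

```python
import math

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def area_mean(u, pop_x, b0, b1):
    """Small area mean of a binary response for a given area effect u."""
    return sum(expit(b0 + b1 * x + u) for x in pop_x) / len(pop_x)

def posterior_on_grid(y, x, b0, b1, sigma_u, grid):
    """Normalized weights approximating the distribution of u given y."""
    log_w = []
    for u in grid:
        ll = -0.5 * (u / sigma_u) ** 2          # N(0, sigma_u^2) prior kernel
        for yj, xj in zip(y, x):
            p = expit(b0 + b1 * xj + u)
            ll += math.log(p) if yj == 1 else math.log(1.0 - p)
        log_w.append(ll)
    m = max(log_w)                               # stabilize before exponentiating
    w = [math.exp(v - m) for v in log_w]
    total = sum(w)
    return [wi / total for wi in w]

def predictors(y, x, pop_x, b0, b1, sigma_u, n_grid=801, width=8.0):
    """Return (conditional-expectation predictor, plug-in predictor, predicted u)."""
    step = 2.0 * width * sigma_u / (n_grid - 1)
    grid = [-width * sigma_u + k * step for k in range(n_grid)]
    w = posterior_on_grid(y, x, b0, b1, sigma_u, grid)
    u_hat = sum(wi * ui for wi, ui in zip(w, grid))
    # Average the nonlinear area mean over the conditional distribution of u.
    conditional = sum(wi * area_mean(ui, pop_x, b0, b1) for wi, ui in zip(w, grid))
    # Substitute the predicted area effect directly into the area mean.
    plug_in = area_mean(u_hat, pop_x, b0, b1)
    return conditional, plug_in, u_hat

# Hypothetical sample of five units, all with response 1.
y = [1, 1, 1, 1, 1]
x = [0.2, 0.5, 0.1, 0.9, 0.4]
pop_x = [0.0, 0.25, 0.5, 0.75, 1.0]
conditional, plug_in, u_hat = predictors(y, x, pop_x, b0=0.5, b1=1.0, sigma_u=1.0)
print(conditional, plug_in, u_hat)
```

Because the area mean is a nonlinear function of the area effect, averaging the function over the conditional distribution and evaluating the function at the conditional mean generally give different values. That gap is the source of the plug-in bias described above.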

The estimation of prediction MSE for small area models is complicated, particularly in a nonlinear model setting. In the second study, the efficiency gains associated with the random specification for the auxiliary variable measured with error are demonstrated. The prediction MSE is smaller when additional auxiliary information is available and included in the estimation. The effect of including such auxiliary information is smaller for the random mean model than for the fixed mean model for the covariates. A parametric fast double bootstrap procedure is proposed for the estimation of the MSE of the predictor. The proposed procedure has smaller bootstrap error than a classical fast double bootstrap procedure with the same number of samples. We call the proposed procedure the telescoping fast double bootstrap.

Most small area studies, including the first two studies in this dissertation, focus on constructing predictors for the area means and on estimating the variance of the prediction errors. The ultimate goal of this dissertation is to construct CIs for the small area means. The most common CI is based on the estimated prediction MSE and approximates the distribution of the parameter estimates with a normal distribution. The coverage error for such an interval can be large when the distribution of the parameter estimate is skewed or when the standard error is poorly estimated. We present two-sided CIs for the small area means of a binary response variable. The estimation of the prediction error variance and the estimation of the cutoff points are key components in the construction of confidence intervals for the small area means. A linear approximation of the model is considered and a Taylor variance approximation is presented for the prediction error variance. We compare the normal approximation method, the percentile bootstrap method, and the pivot-like bootstrap method for estimating the cutoff points using a simulation study. Level one bootstrap and telescoping fast double bootstrap methods are used to construct CIs for the small area means. Pivot-like bootstrap CIs perform better than the percentile bootstrap CIs with respect to coverage error. Double bootstrap CIs perform well, but do not improve the coverage accuracy compared to the level one bootstrap CIs. A method for constructing bootstrap CIs for a general level is proposed. The user is given the degrees of freedom for the Student-t distribution and a standard error of the small area mean prediction. The CI for the small area mean can be constructed in the common form $(\hat{\theta}_i \pm \zeta_{1-\alpha/2,i,df_i} se(\hat{\theta}_i)),$ where $i$ denotes the area, $1-\alpha$ is the desired level, $\hat{\theta}_i$ is the predicted small area mean, $\zeta_{1-\alpha/2,i,df_i}$ is the $100(1-\alpha/2)$th percentile of the Student-t distribution with the given degrees of freedom $df_i$, and $se(\hat{\theta}_i)$ is the given standard error of $\hat{\theta}_i$. The coverage of the general bootstrap CI is comparable to the coverage of the level specific bootstrap CI.
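As a minimal sketch of how a user might apply the interval formula above, the following code computes the Student-t cutoff numerically and forms the interval. It covers only this final step: the degrees of freedom and standard error are taken as given inputs, and the bootstrap procedure that produces them is not reproduced. The function names and the example inputs (a predicted mean of 0.30, a standard error of 0.05, and 10 degrees of freedom) are hypothetical. The quantile is obtained by Simpson integration and bisection so that the example depends only on the standard library.

```python
import math

def t_pdf(x, df):
    """Density of the Student-t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2.0 * math.log1p(x * x / df))

def t_cdf(x, df, n=2000):
    """CDF via Simpson's rule on [0, |x|], using symmetry about zero."""
    a = abs(x)
    if a == 0.0:
        return 0.5
    h = a / n                                    # n is even, as Simpson's rule requires
    s = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, n):
        s += (4.0 if k % 2 else 2.0) * t_pdf(k * h, df)
    area = s * h / 3.0
    return 0.5 + area if x > 0 else 0.5 - area

def t_quantile(p, df, lo=-200.0, hi=200.0):
    """Quantile by bisection on the CDF."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def t_interval(theta_hat, se, df, alpha=0.05):
    """Two-sided interval theta_hat +/- t_{1-alpha/2, df} * se."""
    q = t_quantile(1.0 - alpha / 2.0, df)
    return theta_hat - q * se, theta_hat + q * se

# Hypothetical inputs for one area.
lower, upper = t_interval(theta_hat=0.30, se=0.05, df=10, alpha=0.05)
print(round(lower, 4), round(upper, 4))
```

With these inputs the cutoff is approximately 2.228, so the interval is roughly 0.30 ± 0.111. A smaller $df_i$ gives a larger cutoff and a wider interval for the same standard error.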