Variance estimation under random imputation

Author: Tollefson, Margot
Major Professor: Wayne A. Fuller

The properties of the usual estimator of the population mean or total, calculated from a sample that includes randomly imputed values, are derived. Simple random sampling without replacement, stratified sampling, and a general sampling plan are considered. It is assumed that imputation is done within imputation classes and that the missing values are missing at random within the imputation classes. For stratified sampling and the general sampling plan, an underlying superpopulation is assumed in which the variables of interest are independently and identically distributed within the imputation classes. Elements within one imputation class are assumed to be independent of elements in other imputation classes.

For imputation of a single variable, the set of respondents is replicated as many times as it fits into the set of missing values, and the remaining missing values are filled in by respondents chosen by simple random sampling without replacement from the set of respondents. The donors are assigned to the missing values at random. Three methods of imputation applicable to two variables with missing values are considered. The imputation schemes for two variables are variations on the imputation scheme for a single variable.

For stratified random sampling and the general sampling plan, the expected values of the estimator of the population mean are given both conditional on the finite population and unconditional with respect to the superpopulation. Conditional on the finite population, the usual estimator of the total is biased. Under the model, the mean of a simple random sample with imputed values is an unbiased estimator of the population mean.

Three estimators of the variance of the estimated mean with imputation are given for simple random sampling and stratified sampling. For the general sampling plan, a variance estimator is given, along with a general form for the estimated covariance between estimators of population totals. It is demonstrated that these estimators are suitable for implementation in survey sampling software.

Results for the general model are applied to an imputation problem posed by the Soil Conservation Service's 1987 National Resources Inventory.
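
The single-variable imputation scheme described in the abstract can be illustrated with a short sketch. This is only one reading of the abstract's wording, not the thesis's own code: the function name `impute_within_class`, the use of `None` to mark missing values, and the choice of Python's `random` module are all assumptions made for illustration. With r respondents and m missing values in an imputation class, each respondent serves as a donor floor(m / r) times, and the remaining m mod r donors are drawn by simple random sampling without replacement.

```python
import random


def impute_within_class(values, rng=None):
    """Fill missing entries (None) in one imputation class.

    Hypothetical helper: a sketch of the scheme as described in the
    abstract, not the author's implementation.
    """
    rng = rng or random.Random()
    respondents = [v for v in values if v is not None]
    missing_idx = [i for i, v in enumerate(values) if v is None]
    if not missing_idx:
        return list(values)
    if not respondents:
        raise ValueError("imputation class has no respondents")

    # Replicate the respondent set as many whole times as it fits ...
    full_copies, remainder = divmod(len(missing_idx), len(respondents))
    donors = respondents * full_copies
    # ... then draw the leftover donors without replacement.
    donors += rng.sample(respondents, remainder)

    # Assign donors to the missing positions at random.
    rng.shuffle(donors)
    completed = list(values)
    for i, donor in zip(missing_idx, donors):
        completed[i] = donor
    return completed


# Example: 3 respondents and 5 missing values, so every respondent
# donates once and 2 of them donate a second time.
sample = [1.0, None, 2.0, None, None, None, None, 3.0]
completed = impute_within_class(sample, random.Random(0))
sample_mean = sum(completed) / len(completed)
```

Under this scheme no respondent is used as a donor more than one time more often than any other respondent in the same class, which keeps donor usage as even as possible.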