Topics on nonparametric calibration, kernel ridge regression imputation and nonparametric propensity score estimation

Wang, Hengfang
Major Professor
Jae Kwang Kim

This dissertation addresses statistical issues arising in survey data and item nonresponse. In particular, it covers nonparametric calibration in survey sampling, kernel ridge regression imputation, and density ratio estimation in the propensity score approach.

The first project concerns nonparametric calibration in survey sampling. Estimating a finite population mean or total is a central problem in survey sampling. Calibration estimation is a popular approach that adjusts the sampling weights so that the weighted sample totals of the auxiliary variables match their known population totals. When the auxiliary variables are observed for all units in the finite population, one can apply model calibration using a working outcome model. The traditional parametric calibration approach might not be robust in practice when the working model is misspecified. We develop a nonparametric calibration method that employs an infinite-dimensional reproducing kernel Hilbert space (RKHS) and does not require an explicit outcome model. Under mild assumptions, the proposed calibration estimator attains the Godambe-Joshi lower bound asymptotically.
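To make the calibration idea concrete, the following is a minimal sketch of classical linear (chi-square distance) calibration, which is the parametric baseline and not the RKHS method developed in the dissertation. The function name `linear_calibration_weights` and the simulated population are illustrative assumptions.

```python
import numpy as np

def linear_calibration_weights(d, X_sample, pop_totals):
    """Linear calibration: w_i = d_i * (1 + x_i' lam), with lam chosen so that
    the calibrated totals of the auxiliary variables equal the known totals."""
    d = np.asarray(d, dtype=float)
    X = np.asarray(X_sample, dtype=float)
    T = np.asarray(pop_totals, dtype=float)
    # Design-weighted cross-product matrix: sum_i d_i x_i x_i'
    M = X.T @ (d[:, None] * X)
    # Gap between the known totals and the design-weighted estimates
    gap = T - X.T @ d
    lam = np.linalg.solve(M, gap)
    return d * (1.0 + X @ lam)

# Hypothetical example: simple random sample of n units from N units
rng = np.random.default_rng(0)
N, n = 1000, 100
x_pop = rng.normal(2.0, 1.0, size=N)
X_pop = np.column_stack([np.ones(N), x_pop])   # intercept + one auxiliary variable
idx = rng.choice(N, size=n, replace=False)
d = np.full(n, N / n)                          # design weights under SRS
pop_totals = X_pop.sum(axis=0)                 # assumed known from a frame
w = linear_calibration_weights(d, X_pop[idx], pop_totals)
calibrated_totals = X_pop[idx].T @ w
```

By construction, `calibrated_totals` reproduces `pop_totals`; a calibration estimator of the total of a study variable would then be the `w`-weighted sample sum of that variable.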

The second project concerns handling missing data using kernel ridge regression. Missing data are frequently encountered in practice. In some cases, missingness is planned in order to reduce cost or response burden. Ignoring the cases with missing values can lead to misleading results. To avoid the potential problems caused by missing data, imputation is commonly used. Kernel ridge regression (KRR) is a modern nonparametric regression technique based on the theory of reproducing kernel Hilbert spaces, and it is robust to model misspecification. We apply this method to imputation. Specifically, we establish the root-n consistency of the KRR imputation estimator and show that it is optimal in the sense that it achieves the semiparametric lower bound for the asymptotic variance. We further consider a propensity score weighting method based on kernel ridge regression and discuss its asymptotic properties.
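As a rough illustration of imputation by kernel ridge regression, the sketch below fits KRR on the respondents and fills in predicted values for the nonrespondents before averaging. It is a sketch under stated assumptions, not the dissertation's estimator: the Gaussian kernel, the fixed `bandwidth` and `lam` values, the function names, and the simulated data are all illustrative choices.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def krr_impute_mean(X, y, observed, lam=1e-3, bandwidth=1.0):
    """Estimate the mean of y: keep observed values, impute the rest by KRR."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    Xr, yr = X[observed], y[observed]
    n_r = Xr.shape[0]
    # KRR coefficients from the respondents: (K + n_r * lam * I) alpha = y
    K = gaussian_kernel(Xr, Xr, bandwidth)
    alpha = np.linalg.solve(K + n_r * lam * np.eye(n_r), yr)
    # Predicted regression function at every unit
    m_hat = gaussian_kernel(X, Xr, bandwidth) @ alpha
    # Entries of y where observed is False are never used
    return np.where(observed, y, m_hat).mean()

# Hypothetical example with response depending on x (missing at random)
rng = np.random.default_rng(1)
n = 500
x = rng.uniform(-2.0, 2.0, size=n)
y_full = np.sin(x) + rng.normal(0.0, 0.3, size=n)
observed = rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-x))
y = np.where(observed, y_full, np.nan)

full_mean = y_full.mean()           # unavailable in practice
cc_mean = y_full[observed].mean()   # complete-case mean
krr_mean = krr_impute_mean(x[:, None], y, observed)
```

In this simulated setting the complete-case mean over-represents units with large `x`, whereas the imputation estimator uses the fitted regression function to correct for that. In practice the tuning parameters would be chosen by a data-driven rule such as cross-validation.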

The third project concerns propensity score estimation using a density ratio function approach. The propensity score approach is another popular tool for handling item nonresponse. The propensity score is usually obtained from a model for the response probability. In practice, regression models for a binary response, e.g., logistic regression, can be used to model the response probability given the observed auxiliary information. An inverse probability weighting estimator can then be constructed to obtain an unbiased estimator of the target parameter. We consider an alternative approach that estimates the inverse of the propensity score using a density ratio function. The density ratio can be estimated by the maximum entropy method, which uses the Kullback-Leibler divergence as the distance measure. By including only the covariates of the outcome regression model in the density ratio model, we can achieve efficient propensity score estimation. We further extend the proposed approach to the case of multivariate missing data.
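The link between the density ratio and the inverse propensity score follows from Bayes' rule: if r(x) = f0(x)/f1(x) is the ratio of the covariate densities among nonrespondents and respondents, then 1/pi(x) = 1 + {(1 - p)/p} r(x), where p is the marginal response rate. The sketch below assumes a log-linear model for r(x), estimates it by maximizing an empirical Kullback-Leibler criterion with a damped Newton iteration, and replaces (1 - p)/p by n0/n1. The model form, the optimizer, the function name `density_ratio_weights`, and the simulated data are illustrative assumptions rather than the dissertation's implementation.

```python
import numpy as np

def density_ratio_weights(X, observed, n_iter=50, tol=1e-8):
    """Inverse propensity weights 1 + (n0/n1) * r(x_i) for the respondents,
    with log r(x) linear in x and r normalized to average 1 over respondents.
    X must not contain a constant column (the normalization absorbs it)."""
    X = np.asarray(X, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    X1, X0 = X[observed], X[~observed]
    n1, n0 = len(X1), len(X0)
    target = X0.mean(axis=0)          # covariate mean among nonrespondents

    def objective(phi):
        # mean_0(phi'x) - log sum_1 exp(phi'x), concave in phi
        s = X1 @ phi
        m = s.max()
        return target @ phi - (m + np.log(np.exp(s - m).sum()))

    def softmax_weights(phi):
        s = X1 @ phi
        p = np.exp(s - s.max())
        return p / p.sum()

    phi = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = softmax_weights(phi)
        mean1 = p @ X1
        grad = target - mean1
        if np.max(np.abs(grad)) < tol:
            break
        Xc = X1 - mean1
        H = Xc.T @ (p[:, None] * Xc)  # negative Hessian of the objective
        step = np.linalg.solve(H + 1e-10 * np.eye(H.shape[0]), grad)
        t = 1.0                       # halve the step until it does not decrease the objective
        while objective(phi + t * step) < objective(phi) - 1e-12 and t > 1e-8:
            t *= 0.5
        phi = phi + t * step

    r = n1 * softmax_weights(phi)     # estimated density ratio at respondents
    return 1.0 + (n0 / n1) * r

# Hypothetical example: response probability increases with x
rng = np.random.default_rng(2)
n = 400
x = rng.normal(size=n)
observed = rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-(0.5 + x)))
X = x[:, None]
w = density_ratio_weights(X, observed)
weighted_x_total = w @ X[observed]
```

A useful by-product of this criterion is a balancing property: the weights sum to the full sample size, and the weighted covariate total over the respondents matches the full-sample covariate total. An inverse probability weighting estimator of a mean would then average the respondents' outcomes using `w / n`.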