Studies on semiparametric spatial regression models

Wang, Jue
Major Professor
Li Wang
Steve (Lisheng) Hou
Committee Member

In this thesis, I study estimation and inference for semiparametric spatial regression models and generalized geoadditive models (GgAMs). I use the bivariate penalized spline over triangulation (BPST) method in these models to incorporate spatial information when it is available. The thesis covers three topics.

In the first topic, we develop a sparse-partially linear spatial regression model ($\mathcal{S}$-PLSM) that uses a doubly penalized estimator to select and estimate the most significant linear covariates. We apply BPST to approximate a bivariate function over the spatial domain. A standard error formula is constructed to estimate the standard deviation of the estimators, and its accuracy is assessed in simulation studies. We establish the consistency and asymptotic normality of our sparse estimator. An application to United States mortality data illustrates the improvements in estimation and prediction that our estimator offers relative to other methods.

In the second topic, a generalized version of the PLSM (GPLSM) is developed to allow a nonlinear link function relating the covariates to the mean of the response variable. This extension allows our method to handle non-continuous response variables, such as counts and binary outcomes. The iteratively reweighted least squares (IRLS) algorithm makes the estimator computationally efficient. We prove the consistency of the proposed estimator and derive its convergence rate. A standard error formula is developed to construct confidence intervals for the estimator of the linear part. An analysis of real crash frequency data demonstrates the accuracy of GPLSM in estimation and prediction.
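To make the role of IRLS concrete, the following is a minimal sketch for an ordinary, unpenalized Poisson regression with a log link and a single covariate. It is not the GPLSM estimator: the spline terms and roughness penalty are omitted, and the function name `irls_poisson` and the toy data are assumptions made purely for illustration. Each iteration forms a working response and working weights from the current fit and then solves a weighted least squares problem.

```python
import math

def irls_poisson(x, y, n_iter=25, tol=1e-10):
    """Fit log E[y] = b0 + b1 * x by iteratively reweighted least squares.

    Hypothetical helper for illustration only (no penalty, no spline basis).
    """
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(n_iter):
        # Current linear predictor and mean under the log link.
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        # Working response and weights for the Poisson, log-link case.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        w = mu
        # Weighted sums that define the 2x2 normal equations.
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        # Solve the weighted least squares problem in closed form.
        det = sw * swxx - swx * swx
        new_b0 = (swxx * swz - swx * swxz) / det
        new_b1 = (sw * swxz - swx * swz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1

# Made-up count data, used only to exercise the sketch.
x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y = [1, 1, 2, 4, 5, 9, 14]
b0, b1 = irls_poisson(x, y)
```

At convergence the fitted means satisfy the Poisson score equations, so the residuals `y - mu` sum to zero and are orthogonal to the covariate. A penalized version would add a penalty matrix to the weighted least squares step in each iteration.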

In the last topic, I build an \textsf{R} package, \textbf{GgAM}, which integrates the model structure identification process, estimation methods, and statistical inference tools for GgAMs. We develop a semiparametric version of the GgAM by adding a linear part to the nonparametric GgAM. This model combines the benefits of univariate splines, bivariate splines, and local polynomials. A penalized quasi-likelihood estimator is first derived through the IRLS algorithm, and a spline-backfitted local polynomial estimator is then obtained.

We also propose a standard error formula for the parametric estimator in the model. Simultaneous confidence bands are developed to measure the accuracy of the univariate spline estimators.

A model structure identification process is carried out before model fitting to identify the functional form (linear or nonlinear) of each continuous covariate.

Simulation studies are conducted to show the estimation accuracy and predictive power of our GgAM. Datasets on Georgia educational attainment, Sydney housing prices, and Florida crash frequency are included to illustrate the convenient and flexible use of the functions in the \textbf{GgAM} package.

In this thesis, I aim to develop computational algorithms that yield accurate estimators and to propose efficient inference tools that aid interpretation of the results for GgAMs. These tools can be widely used in social, economic, and geographic applications with spatial data to draw insightful conclusions.