Model estimation, identification and inference for next-generation functional data and spatial data
This dissertation is composed of three research projects focused on model estimation, identification, and inference for next-generation functional data and spatial data.
The first project considers data consisting of a count or binary response observed together with spatial covariate information. In this project, we introduce a new class of generalized geoadditive models (GGAMs) for spatial data distributed over complex domains. Through a link function, the proposed GGAM assumes that the mean of the discrete response variable depends on additive univariate functions of the explanatory variables and a bivariate function that adjusts for the spatial effect. We propose a two-stage approach for estimating, and making inferences about, the components of the GGAM. In the first stage, the univariate components and the geographical component of the model are approximated by univariate polynomial splines and bivariate penalized splines over triangulation, respectively. In the second stage, local polynomial smoothing is applied to the cleaned univariate data to average out the variation of the first-stage estimators. We investigate the consistency of the proposed estimators and the asymptotic normality of the estimators of the univariate components. We also construct simultaneous confidence bands for each of the univariate components. The performance of the proposed method is evaluated through two simulation studies and an analysis of crash count data from the Tampa-St. Petersburg urbanized area in Florida.
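The second-stage idea can be illustrated, in a deliberately simplified form, by a local linear smoother applied to one univariate component after the other components' fitted contributions have been removed. This is a minimal sketch under stated assumptions: an identity link with Gaussian noise, an Epanechnikov kernel, and a fixed bandwidth. The function name `local_linear`, the simulated components `m1` and `m2`, and the stand-in first-stage estimate `m2_hat` are all hypothetical. For a count or binary response, the GGAM's second stage would presumably work through the link function, which this sketch does not attempt.

```python
import numpy as np

def local_linear(x, y, x0, h):
    """Local linear estimate of E[y | x = x0] using an Epanechnikov kernel with bandwidth h."""
    u = (x - x0) / h
    w = np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)  # kernel weights, zero outside [x0-h, x0+h]
    X = np.column_stack([np.ones_like(x), x - x0])         # local design: intercept and slope
    WX = X * w[:, None]
    beta = np.linalg.solve(X.T @ WX, WX.T @ y)             # weighted least squares
    return beta[0]                                          # the intercept is the fit at x0

# Hypothetical identity-link additive model: y = m1(x1) + m2(x2) + noise
rng = np.random.default_rng(0)
n = 2000
x1, x2 = rng.uniform(0, 1, n), rng.uniform(0, 1, n)
m1 = lambda t: np.sin(2 * np.pi * t)
m2 = lambda t: t**2 - 1 / 3
y = m1(x1) + m2(x2) + rng.normal(0, 0.3, n)

# Stand-in for a first-stage spline estimate of m2 (assumed: the truth plus a small shift)
m2_hat = m2(x2) + 0.01
y_clean = y - m2_hat            # "cleaned" univariate data for the first component

grid = np.linspace(0.1, 0.9, 9)
m1_hat = np.array([local_linear(x1, y_clean, g, h=0.08) for g in grid])
```

In this simplified setting, removing the other components' fitted contributions leaves a one-dimensional smoothing problem, so a kernel smoother can average out the first-stage spline variation one component at a time.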
In the second project, motivated by recent work on analyzing data from biomedical imaging studies, we consider a class of image-on-scalar regression models with imaging responses and scalar predictors. We propose using flexible multivariate splines over triangulations to handle the irregular domain of the objects of interest in the images, as well as other image characteristics. The proposed estimators of the coefficient functions are proved to be root-$n$ consistent and asymptotically normal under some regularity conditions. We also provide a consistent and computationally efficient estimator of the covariance function. Asymptotic pointwise confidence intervals (PCIs) and data-driven simultaneous confidence corridors (SCCs) for the coefficient functions are constructed. A highly efficient and scalable estimation algorithm is developed. Monte Carlo simulation studies are conducted to examine the finite-sample performance of the proposed method. The proposed method is applied to the spatially normalized Positron Emission Tomography (PET) data from the Alzheimer's Disease Neuroimaging Initiative (ADNI).
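As a rough illustration of the least-squares idea behind basis-expanded coefficient functions, the sketch below fits $Y_i(z) \approx X_i^\top \beta(z)$ with every $\beta_l(z)$ expanded in a common basis, written in matrix form as $Y \approx X G B^\top$. It is a minimal sketch under strong simplifying assumptions: a global quadratic polynomial basis on a square pixel grid stands in for the multivariate splines over triangulations, there is no roughness penalty and no irregular domain, and the function name `fit_image_on_scalar` is hypothetical.

```python
import numpy as np

def fit_image_on_scalar(Y, X, B):
    """Unpenalized least squares for Y (n x N) ~ X (n x p) @ G (p x K) @ B.T (K x N).

    Closed form: G = (X'X)^{-1} X'Y B (B'B)^{-1}."""
    pixelwise = np.linalg.solve(X.T @ X, X.T @ Y)          # p x N: OLS at every pixel
    G = np.linalg.solve(B.T @ B, (pixelwise @ B).T).T      # p x K: project onto span(B)
    return G

rng = np.random.default_rng(1)
side = 20
z1, z2 = np.meshgrid(np.linspace(0, 1, side), np.linspace(0, 1, side))
z1, z2 = z1.ravel(), z2.ravel()                             # N = 400 pixel locations
# Stand-in basis (global quadratic), NOT splines over a triangulation
B = np.column_stack([np.ones_like(z1), z1, z2, z1**2, z1 * z2, z2**2])

n, p = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])      # intercept + one scalar predictor
G_true = rng.normal(size=(p, B.shape[1]))
Y = X @ G_true @ B.T + rng.normal(0, 0.1, size=(n, B.shape[0]))

G_hat = fit_image_on_scalar(Y, X, B)
beta_hat = G_hat @ B.T                                      # estimated coefficient images, p x N
```

Without a penalty, the closed form shows that the fit equals pixelwise ordinary least squares projected onto the span of the basis; once a roughness penalty is added, this simple factorization generally no longer holds.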
In the third project, we propose a heterogeneous functional linear model to simultaneously estimate multiple coefficient functions and identify groups, such that coefficient functions are identical within groups and distinct across groups. By borrowing information from relevant subgroups, our method improves estimation efficiency while preserving heterogeneity. We use an adaptive fused lasso penalty to shrink subgroup coefficients toward shared common values within each group, and we establish the theoretical properties of the adaptive fused lasso estimators. To improve computational efficiency and incorporate neighborhood information, we propose a graph-constrained adaptive lasso. A highly efficient and scalable estimation algorithm is developed. Monte Carlo simulation studies are conducted to examine the finite-sample performance of the proposed method. The proposed method is applied to hybrid maize grain yield data from the Genomes to Fields consortium.
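To make the fusion penalty concrete, the sketch below computes adaptive fusion weights from initial estimates, evaluates a weighted fused penalty over a set of edges, and reads off groups from fitted coefficients. It assumes that each subgroup's coefficient function is represented by a vector of basis coefficients, that the penalty uses the Euclidean norm of pairwise differences, and that the adaptive weights are inverse powers of the initial differences. All function names and numbers are hypothetical, and the optimization step that produces the fitted coefficients is omitted. Passing all pairs as edges corresponds to a fully fused penalty, while a sparse neighborhood graph gives a graph-constrained version.

```python
import numpy as np
from itertools import combinations

def adaptive_weights(beta_init, edges, gamma=1.0, eps=1e-8):
    """w_kl = 1 / ||b_k - b_l||^gamma from initial (unpenalized) estimates b_k."""
    return {(k, l): 1.0 / (np.linalg.norm(beta_init[k] - beta_init[l]) + eps) ** gamma
            for k, l in edges}

def fused_penalty(beta, weights, lam):
    """lam * sum over edges (k, l) of w_kl * ||beta_k - beta_l||_2."""
    return lam * sum(w * np.linalg.norm(beta[k] - beta[l]) for (k, l), w in weights.items())

def recover_groups(beta, edges, tol=1e-6):
    """Label subgroups joined by fused edges (difference below tol) as one group (union-find)."""
    parent = list(range(len(beta)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for k, l in edges:
        if np.linalg.norm(beta[k] - beta[l]) < tol:
            parent[find(k)] = find(l)
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(beta))]

# Four subgroups, each coefficient function summarized by two basis coefficients
beta_init = np.array([[1.0, 2.0], [1.1, 2.0], [0.0, -1.0], [0.0, -0.9]])
beta_fit = np.array([[1.0, 2.0], [1.0, 2.0], [0.0, -1.0], [0.0, -1.0]])  # hypothetical fitted values

all_pairs = list(combinations(range(4), 2))   # fully fused penalty
chain = [(0, 1), (1, 2), (2, 3)]              # graph-constrained: neighbors only

w_chain = adaptive_weights(beta_init, chain)
pen = fused_penalty(beta_fit, w_chain, lam=0.5)
groups = recover_groups(beta_fit, chain)       # [0, 0, 1, 1]
```

Pairs that start out close receive large weights and are pushed hardest toward fusion, while restricting the edges to a neighborhood graph reduces the number of penalty terms from all $K(K-1)/2$ pairs to the graph's edges.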