Kernel smoothing for spatially correlated data

Date
2001-01-01
Authors
Liu, Xiao-Hu
Major Professor
Advisor
Jean Opsomer
Kenneth Koehler
Organizational Unit
Statistics
Abstract

Kernel smoothing is a nonparametric approach for estimating the relationship between a response variable and a set of predictors (or design variables). A major problem in kernel smoothing is the selection of the bandwidth, which controls the amount of smoothing. When data are correlated, previous studies on kernel smoothing have been essentially limited to the case of a univariate predictor with an equally spaced design. In this dissertation, we discuss a more general case for correlated data: multivariate predictors with a random design. Three types of estimators, the Priestley-Chao estimator, the Nadaraya-Watson estimator, and the local linear estimator, are addressed, with emphasis on the local linear estimator. We derive formulas for the asymptotic mean squared errors of these kernel smoothing estimators and for the asymptotically optimal bandwidths. In the presence of spatially correlated errors, we show that traditional data-driven bandwidth selection methods, such as cross-validation and generalized cross-validation, fail to provide good bandwidth values. We propose several data-driven bandwidth selection methods that account for the presence of spatial correlation. Simulation studies show that these methods are effective when the covariances between the errors are completely known. When the covariances need to be estimated from data, we consider two special cases: spatial data with repeated measurements, and spatial data collected on a grid (with only one realization). For data with repeated measurements, we propose an estimation method based on semi-variogram fitting. For data on a grid, we propose a method based on differencing, with the application of approximate Whittle likelihood estimation. Simulation studies show that these methods can provide reasonably good estimates of the covariances for the purpose of bandwidth selection.
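
As a rough illustration of two of the estimators named in the abstract, the following Python sketch evaluates a Nadaraya-Watson fit and a local linear fit at a single point for a bivariate predictor with a random design. It is not taken from the dissertation: the Gaussian product kernel, the single scalar bandwidth h, the simulated data, and all function names are assumptions made here for the example, and no correlation adjustment or bandwidth selection is attempted.

```python
import numpy as np


def gaussian_weights(X, x0, h):
    """Kernel weights for each row of X relative to x0.

    Assumption for this sketch: a Gaussian product kernel with one
    scalar bandwidth h shared by all predictors.
    """
    d = (X - x0) / h
    return np.exp(-0.5 * np.sum(d * d, axis=1))


def nadaraya_watson(X, y, x0, h):
    """Locally weighted average of y (local constant fit) at x0."""
    w = gaussian_weights(X, x0, h)
    return np.sum(w * y) / np.sum(w)


def local_linear(X, y, x0, h):
    """Local linear fit at x0: weighted least squares of y on
    (1, X - x0); the fitted intercept is the estimate at x0."""
    w = gaussian_weights(X, x0, h)
    Z = np.hstack([np.ones((X.shape[0], 1)), X - x0])
    sw = np.sqrt(w)
    # Scaling rows by sqrt(w) turns weighted LS into ordinary LS.
    beta, *_ = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)
    return beta[0]


# Hypothetical noise-free data on a random design in the unit square.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))
y_lin = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

x0 = np.array([0.3, 0.6])
fit_ll = local_linear(X, y_lin, x0, h=0.2)
fit_nw = nadaraya_watson(X, y_lin, x0, h=0.2)
```

With a noise-free linear response, the local linear fit recovers the true value at x0 up to rounding error, while the Nadaraya-Watson fit is only guaranteed to lie within the range of the observed responses. The bandwidth h is fixed by hand here; choosing it from correlated data is the problem the dissertation addresses.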

Copyright
Mon Jan 01 00:00:00 UTC 2001