Functional principal component analysis (FPCA) on remote sensing data and longitudinal studies

Chang, Xinyue
Major Professor
Zhu, Zhengyuan
Dai, Xiongtao
Li, Yehua
Nettleton, Dan
Kaiser, Mark
Wang, Lily
Committee Member
This dissertation consists of three projects from my Ph.D. research, covering multivariate functional data, change-point detection, asynchronous longitudinal data, and remote sensing applications. The first project is motivated by NASA's first dedicated CO2 monitoring satellite, the Orbiting Carbon Observatory-2 (OCO-2). The satellite data processing includes a retrieval algorithm that estimates CO2 concentration from high-resolution spectra of reflected sunlight. However, because of factors such as cloud cover and cosmic rays, a large number of radiance observations are missing, so the spatial coverage of the retrieval algorithm is limited in some areas of critical importance. A second issue is that mixed land/water pixels along coastlines are excluded from retrieval processing because valid ancillary variables, including land fraction, are lacking. We propose an approach that models the spatial spectral data and addresses these two problems through radiance imputation and land fraction estimation. Based on specific features of the OCO-2 instrument, we propose a functional model that uses separate mean models and measurement error variance models for different spatial footprints. The principal component scores are modeled as random fields to account for spatial dependence and can be imputed by ordinary kriging. The proposed method is shown to impute spectral radiance with high accuracy when tested on observations over the Pacific Ocean. We also develop an unmixing approach based on this model, which provides much more accurate land fraction estimates in a case study of mixed land/water pixels along the coastlines of Greece. The second project is motivated by the urban dynamics problem, which is important for understanding urban system growth and its environmental impacts.
One major practical challenge is to accurately detect and pinpoint the temporal change point in urbanization from heterogeneous remote sensing data. In our application, the data are surface reflectance records from the Landsat program, which form sparsely observed multivariate functional data. We follow a functional approach and perform change-point detection and estimation using sparse functional principal component analysis and cumulative sum (CUSUM) statistics. Theoretical results are derived to show the asymptotic validity of the proposal, which overcomes the sparse data issue using theory from functional data analysis and empirical processes. A multivariate approach and a univariate ensemble approach are proposed and compared; they are shown to have different strengths under varying data scenarios, and a bootstrap procedure is designed to improve change-point detection under limited sample sizes. Applications to Landsat data show favorable performance of the proposed multivariate method in detecting urbanized areas and estimating turning years. The third project considers asynchronous longitudinal data, where the response and covariates are observed at different time points. A naive last-observation-carried-forward method suffers from estimation bias, and existing kernel-based methods suffer from slow convergence rates and large variation. We model the longitudinal covariate process as sparse functional data, propose a functional calibration approach based on functional principal component analysis, and apply it to asynchronous regression with either time-invariant or time-varying coefficients. For regression with time-invariant coefficients, our estimator is asymptotically unbiased, root-n consistent, and asymptotically normal; for time-varying coefficient models, our estimator attains the nonparametric convergence rate, with asymptotic variance inflated by the calibration.
In both cases, our estimator has a faster convergence rate than existing methods. The proposed methods are illustrated by simulation studies and a real-data application to the Study of Women's Health Across the Nation. Future work will extend the proposed method to multiple time-varying covariates and study variable selection for asynchronous longitudinal data.
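
As a rough illustration of the cumulative sum (CUSUM) idea mentioned in the second project, the Python sketch below locates a single mean shift in a one-dimensional sequence. It is not the dissertation's procedure: the function name `cusum_change_point`, the simulated scores, and the reduction to a fully observed univariate sequence are all assumptions made for illustration, whereas the dissertation works with sparsely observed multivariate functional data.

```python
import numpy as np


def cusum_change_point(scores):
    """Estimate a single mean-shift location with a CUSUM statistic.

    Hypothetical helper for illustration only. Returns (k_hat, stat), where
    k_hat is the number of observations before the estimated change and
    stat is the maximum of the scaled absolute CUSUM process.
    """
    x = np.asarray(scores, dtype=float)
    n = x.size
    k = np.arange(1, n)              # candidate split points 1..n-1
    partial = np.cumsum(x)[:-1]      # partial sums S_1, ..., S_{n-1}
    # |S_k - (k/n) * S_n| / sqrt(n): large when the mean differs before and after k
    cusum = np.abs(partial - k / n * x.sum()) / np.sqrt(n)
    idx = int(np.argmax(cusum))
    return int(k[idx]), float(cusum[idx])


# Simulated (not real) scores: mean 0 for 60 time points, then mean 3 for 40
rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(0.0, 1.0, 60), rng.normal(3.0, 1.0, 40)])
k_hat, stat = cusum_change_point(scores)
print(f"estimated change after observation {k_hat}, statistic {stat:.2f}")
```

In practice the statistic would be compared against a critical value, for example one obtained from a bootstrap as the abstract describes; that calibration step is omitted here.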
Subject Categories