Automatic relevance determination for Gaussian process regression with functional inputs
Date: 2023-05
Author: Damiano, Luis
Advisor: Niemi, Jarad; Caragea, Petruţa; Morris, Max D; Dutta, Somak; Qiu, Yumou
Department: Statistics
Abstract
We introduce automatic dynamic relevance determination (ADRD), a novel
framework for Gaussian process regression with functional inputs that adapts
automatic relevance determination (ARD) priors from the vector-input setting.
In this framework, relevance varies smoothly over the input index space,
resulting in smooth and parsimonious relevance profiles that are learned from
data and whose posterior can be inspected for scientific interpretation and
used in downstream analyses.
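As a minimal sketch of the idea, assume each functional input is observed on a common, evenly spaced grid t_1, ..., t_P and that the covariance is squared exponential. ARD then assigns a free weight (inverse squared lengthscale) to every grid point, whereas an ADRD-style kernel derives each weight from a smooth weight function w(t). The kernel form, the bump-shaped weight function, and all function names below are illustrative assumptions, not the thesis's exact specification.

```python
import math

def ard_kernel(x1, x2, inv_sq_lengthscales, variance=1.0):
    """Squared exponential ARD kernel with one free weight per coordinate."""
    d2 = sum(w * (a - b) ** 2 for w, a, b in zip(inv_sq_lengthscales, x1, x2))
    return variance * math.exp(-d2)

def adrd_kernel(x1, x2, grid, weight_fn, variance=1.0):
    """Hypothetical ADRD-style kernel: per-point weights come from a smooth
    weight function evaluated on the grid and scaled by the grid spacing, so
    the weighted sum approximates the integral of w(t) * (x1(t) - x2(t))**2."""
    dt = grid[1] - grid[0]  # assumes an evenly spaced grid
    weights = [weight_fn(t) * dt for t in grid]
    return ard_kernel(x1, x2, weights, variance)

def bump(t, peak=0.5, width=0.1, height=50.0):
    """Illustrative smooth, unimodal weight function (assumed form)."""
    return height * math.exp(-0.5 * ((t - peak) / width) ** 2)

grid = [i / 49 for i in range(50)]
f = [math.sin(2 * math.pi * t) for t in grid]
g = [v + 0.1 for v in f]
print(adrd_kernel(f, f, grid, bump))  # 1.0: identical inputs
print(adrd_kernel(f, g, grid, bump))  # below 1.0: the inputs differ by 0.1 everywhere
```

Here the relevance profile over all 50 grid points is governed by the three parameters of `bump`, whereas the ARD kernel would need 50 separate weights.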
An ADRD model requires specifying a weight function form that is appropriate
for the application at hand.
We explore two strategies for designing the weights: specifying a parametric
form and generating them via a basis expansion. First, we introduce the
asymmetric double and squared exponential weight functions for unimodal,
smoothly decaying predictive relevance profiles. Second, we present a general
form for the basis expansion of the weights and explore, specifically, the
Fourier, B-spline, and adaptive spline expansions.
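The two design strategies can be sketched as follows. The parametrizations are hypothetical: each asymmetric weight peaks at `mode` and decays with a different scale on each side, and the basis expansion shown is a truncated Fourier series on the log scale (an assumed log link that keeps the weights positive). B-spline and adaptive spline expansions are not shown.

```python
import math

def asym_double_exp(t, amplitude, mode, scale_left, scale_right):
    """Unimodal weight with exponential decay away from `mode`,
    using a different decay scale on each side (hypothetical form)."""
    scale = scale_left if t <= mode else scale_right
    return amplitude * math.exp(-abs(t - mode) / scale)

def asym_squared_exp(t, amplitude, mode, scale_left, scale_right):
    """Unimodal weight with Gaussian-shaped decay on each side of `mode`."""
    scale = scale_left if t <= mode else scale_right
    return amplitude * math.exp(-0.5 * ((t - mode) / scale) ** 2)

def fourier_log_weight(t, coefs):
    """Basis-expansion weight for t in [0, 1]: log w(t) is a truncated Fourier
    series, so w(t) > 0. coefs = [c0, a1, b1, a2, b2, ...]."""
    log_w = coefs[0]
    n_pairs = (len(coefs) - 1) // 2
    for k in range(1, n_pairs + 1):
        a, b = coefs[2 * k - 1], coefs[2 * k]
        log_w += a * math.cos(2 * math.pi * k * t) + b * math.sin(2 * math.pi * k * t)
    return math.exp(log_w)

grid = [i / 10 for i in range(11)]
# Sharper decay to the left of the mode than to the right.
print([round(asym_double_exp(t, 1.0, 0.3, 0.05, 0.2), 3) for t in grid])
print([round(fourier_log_weight(t, [0.0, 1.0, 0.5]), 3) for t in grid])
```

The parametric forms fix the shape (unimodal, smoothly decaying) with four parameters, while the basis expansion trades a larger parameter count for flexibility in shape.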
We establish an equivalence between the ADRD and ARD weights
and propose an adaptation of permutation feature importance. Each motivates
a different exploratory tool for eliciting a weight function form from data.
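Both exploratory tools can be sketched under stated assumptions. First, on a discretized grid an ADRD kernel of the form sum_p w(t_p) * dt * (x_p - x'_p)^2 coincides with an ARD kernel whose inverse squared lengthscales equal w(t_p) * dt, so fitted ARD lengthscales can be mapped to pointwise weight estimates and plotted against t. Second, one plausible functional adaptation of permutation importance (assumed here, not necessarily the thesis's version) shuffles a segment of the curves across observations and measures the increase in error. Both function names are hypothetical.

```python
import random

def ard_to_adrd_weights(lengthscales, dt):
    """Map ARD lengthscales on an evenly spaced grid to pointwise weights,
    assuming ARD uses sum_p (x_p - x'_p)^2 / l_p^2 and ADRD uses
    sum_p w(t_p) * dt * (x_p - x'_p)^2."""
    return [1.0 / (l * l * dt) for l in lengthscales]

def window_permutation_importance(predict, X, y, window, n_repeats=10, seed=0):
    """Hypothetical functional permutation importance: shuffle the curve
    segment at grid indices `window` across observations and return the
    mean increase in mean squared error."""
    rng = random.Random(seed)

    def mse(preds):
        return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

    baseline = mse(predict(X))
    increases = []
    for _ in range(n_repeats):
        order = list(range(len(X)))
        rng.shuffle(order)
        X_perm = [list(row) for row in X]
        for i, j in enumerate(order):
            for p in window:
                X_perm[i][p] = X[j][p]  # curve i borrows curve j's segment
        increases.append(mse(predict(X_perm)) - baseline)
    return sum(increases) / n_repeats

# Toy check: a "model" that only looks at grid index 2.
X = [[float(i), -float(i), float(i * i), 1.0] for i in range(20)]
y = [row[2] for row in X]

def predict(curves):
    return [row[2] for row in curves]

print(ard_to_adrd_weights([1.0, 0.5], dt=0.25))  # [4.0, 16.0]
print(window_permutation_importance(predict, X, y, window=[0, 1]))  # 0.0
print(window_permutation_importance(predict, X, y, window=[2]) > 0)  # True
```

Sliding the window along the grid yields an importance profile over t, whose shape can then suggest a parametric or basis-expansion weight form.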
We also discuss a fully Bayesian estimation framework via Markov chain Monte
Carlo (MCMC), including a set of weakly informative priors for the model
parameters, as well as statistics for model validation.
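The following is a minimal sketch of MCMC for an ADRD-style Gaussian process, assuming a single unknown parameter theta (the log amplitude of a weight function whose shape is fixed at a bump centered at t = 0.5), a known noise standard deviation, and an N(0, 2^2) prior on theta standing in for a weakly informative prior. Random-walk Metropolis is used only because it fits in a few lines; it is not necessarily the sampler or the priors used in the thesis, and the simulated data are illustrative.

```python
import math
import random

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def gp_log_marginal(y, K):
    """log N(y | 0, K) computed via the Cholesky factor of K."""
    L = cholesky(K)
    z = []
    for i in range(len(y)):  # forward substitution: L z = y
        z.append((y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return (-0.5 * sum(v * v for v in z)
            - sum(math.log(L[i][i]) for i in range(len(y)))
            - 0.5 * len(y) * math.log(2 * math.pi))

def log_posterior(theta, X, y, grid, noise_sd=0.1):
    """Hypothetical target: theta is the log amplitude of a fixed-shape
    weight function, with an assumed N(0, 2^2) prior."""
    dt = grid[1] - grid[0]
    w = [math.exp(theta - 0.5 * ((t - 0.5) / 0.1) ** 2) * dt for t in grid]
    n = len(X)
    K = [[math.exp(-sum(wp * (a - b) ** 2 for wp, a, b in zip(w, X[i], X[j])))
          + (noise_sd ** 2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return gp_log_marginal(y, K) - 0.5 * (theta / 2.0) ** 2

def random_walk_metropolis(log_target, init, n_draws, step=0.5, seed=0):
    """Random-walk Metropolis for a scalar parameter with Gaussian proposals."""
    rng = random.Random(seed)
    current, current_lp = init, log_target(init)
    draws, accepted = [], 0
    for _ in range(n_draws):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, exp(proposal_lp - current_lp)).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
            accepted += 1
        draws.append(current)
    return draws, accepted / n_draws

grid = [i / 20 for i in range(21)]
X = [[math.sin(2 * math.pi * t + 0.8 * i) for t in grid] for i in range(8)]
y = [row[10] for row in X]  # response: each curve's value at t = 0.5
draws, accept_rate = random_walk_metropolis(
    lambda th: log_posterior(th, X, y, grid), init=0.0, n_draws=200)
print(accept_rate, sum(draws[100:]) / 100)  # acceptance rate, posterior mean after burn-in
```

A full ADRD fit would sample all weight-function parameters, the signal variance, and the noise variance jointly, typically with a more efficient sampler.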
In two simulation studies, we show that a well-specified model can recover
the true weight function.
Moreover, we present two applications to scientific data generated by an
atmospheric radiative transfer computer model and a soil erosion computer model.
We show empirically that, compared to ARD, ADRD generates
smoother weight patterns and produces information useful for scientific
interpretation and downstream analyses, while drastically reducing the number
of model parameters and without compromising prediction accuracy.
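To make the scale of the parameter reduction concrete, the sketch below counts kernel hyperparameters for a functional input observed on P grid points, assuming a squared exponential kernel with one signal variance and one noise variance. The grid size of 200 and the four-parameter weight function are hypothetical and do not reproduce the applications described above.

```python
def n_params_ard(n_grid_points: int) -> int:
    """ARD: one relevance weight per grid point, plus an assumed
    signal variance and noise variance."""
    return n_grid_points + 2

def n_params_adrd(n_weight_params: int) -> int:
    """ADRD: relevance comes from a weight function with a fixed number of
    parameters, independent of grid size; same two variances assumed."""
    return n_weight_params + 2

# Hypothetical example: 200 grid points versus an asymmetric weight function
# with amplitude, mode, and left/right scales (4 parameters, assumed form).
print(n_params_ard(200), n_params_adrd(4))  # 202 6
```

The ADRD count stays fixed as the grid is refined, whereas the ARD count grows linearly with P.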