Small area prediction and big data visualization: Analysis of soil losses from sheet and rill erosion on cropland
Assessment of soil erosion benefits both the well-being of people and agricultural production. Sustainable and environmentally friendly agriculture needs to balance short-term production, long-term capabilities, and environmental quality. The work in this dissertation is motivated by applications to the National Resources Inventory (NRI) program. The ongoing NRI surveys collect a wealth of sample data describing natural resource conditions and trends to support national policy-making and enterprise-level landowner decision making on resource conservation practices. Among these natural resource issues, soil erosion assessment is of primary interest for prioritizing future soil conservation needs and measuring the impact of past soil conservation. Our effort is aimed at estimating land use and soil erosion rates, especially sheet and rill erosion, by combining small area estimation with "big" data visualization.
Small area estimation (SAE) techniques are used to construct model-based estimators when direct survey estimators cannot achieve the desired statistical reliability. To account for the zero-contamination and right-skew of the sheet and rill erosion data in our case study, we consider a zero-inflated log-normal model framework and extend the two-part model of Chandra and Chambers (2016) by including an additional parameter to account for significant correlation between the pair of random effects for an area. We develop an empirical Bayes predictor of the area mean by replacing the unknown model parameters in the best predictor, which is unbiased and has minimum mean squared error, with consistent parameter estimates. We estimate the parameters under this model framework by maximum likelihood. Maximum likelihood estimation is challenging because it requires integrating over the bivariate distribution of the pair of random effects for a county. We transform the bivariate integral into a univariate integral, which can then be evaluated numerically with a computationally efficient Gauss-Hermite approximation. Computational efficiency in assessing the statistical uncertainty of the estimates is further enhanced by the "one-step" MSE estimator, an estimator we propose that does not require resampling. Reliable county-level erosion estimates, which cannot be obtained directly from the NRI sample data, can be used to prioritize conservation resource allocation at a more granular level. To help practitioners implement our SAE methodology, we develop an R package saezero, available at https://github.com/XiaodanLyu/saezero.
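As a generic illustration of the Gauss-Hermite step, the following minimal Python sketch approximates an expectation over a single normal random effect. It is not the saezero implementation: the function name `gauss_hermite_normal_expectation`, the value of `sigma`, and the logistic example are assumptions introduced here only for illustration.

```python
import numpy as np


def gauss_hermite_normal_expectation(f, sigma, n_nodes=20):
    """Approximate E[f(u)] for u ~ N(0, sigma^2) with Gauss-Hermite quadrature.

    Hypothetical helper for illustration; not part of any package.
    """
    # Nodes and weights for the weight function exp(-x^2).
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    # The substitution u = sqrt(2) * sigma * x turns the normal density
    # into the exp(-x^2) weight, leaving a factor of 1 / sqrt(pi).
    u = np.sqrt(2.0) * sigma * nodes
    return float(np.sum(weights * f(u)) / np.sqrt(np.pi))


def expit(z):
    """Logistic function, used here as a toy probability of a nonzero response."""
    return 1.0 / (1.0 + np.exp(-z))


sigma = 0.8  # assumed random-effect standard deviation

# Check against a case with a closed form: E[exp(u)] = exp(sigma^2 / 2).
approx = gauss_hermite_normal_expectation(np.exp, sigma)
exact = float(np.exp(sigma**2 / 2))

# Toy marginal probability of a positive response with a zero fixed intercept.
p_pos = gauss_hermite_normal_expectation(lambda u: expit(0.0 + u), sigma)

print(approx, exact, p_pos)
```

A modest number of nodes is typically enough for smooth integrands of this kind, which is what makes reducing the bivariate integral to a univariate one attractive computationally.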
Besides reliability, there are many other dimensions of data quality, such as accuracy, consistency, timeliness, usability, accessibility, and relevance, which are featured in the quality assurance (QA) process of NRI. The QA process is operationally complex because the databases involved are large in scale. Effective visualization techniques, with the help of well-managed databases, can facilitate the QA process by reducing cognitive load and enabling user-data interactions. Using the reactive framework of R shiny, we built three web-based graphical tools intended for use by NRI. The first tool, "iNtr", whose public version is available at https://lyux.shinyapps.io/table_review/, is designed to help with the labor-intensive NRI table review process so that data accuracy and consistency can be checked as thoroughly as possible without sacrificing the timeliness of the NRI releases. The second tool, "VISCOVER", available at https://lyux.shinyapps.io/viscover/, is developed to check the accuracy of the auxiliary variables, i.e., public soil and crop-cover data, used in the case study of our SAE methodology. We also developed an R package viscover, available at https://github.com/XiaodanLyu/viscover, so that practitioners can query the two databases easily. The third tool, "SREM", available at https://lyux.shinyapps.io/srem/, presents an interactive sheet and rill erosion map at a 30-meter spatial resolution. It enhances the usability and accessibility of NRI, because NRI erosion estimates were previously available only at the national and state levels in the form of printed figures and tables. "SREM" is built upon five databases: one sheet-and-rill-erosion database and four soil-erosion-factor databases that we created by combining the NRI Database with several other public databases through data linkage and statistical modeling.