Estimation of a distribution function from survey data

Dodd, Kevin

Dietary intake data collected in nationwide food consumption surveys are used for various policy, educational, and research purposes. Of interest is the distribution of the usual intake of a dietary component, where usual intake is an individual's long-run average intake of the component. The data available for analysis are a few daily intakes collected on each individual in a sample selected from a population according to a complex survey design. It is reasonable to treat a daily intake as the usual intake plus a measurement error. An estimation approach based on the analysis of transformed data is discussed. First, a least-squares regression spline is used to estimate a transformation that carries observed daily intakes into approximate normality. Then a measurement error model is fit to the transformed data, resulting in an estimated usual intake distribution in the transformed scale. Finally, the original transformation is used to develop an inverse transformation that maps the estimated transformed usual intake distribution back to the original scale.

The regression spline model may be applied to the general problem of quantile estimation, where the sampled data are assumed to be realizations of a random variable Y with a smooth quantile function. The normal quantile-quantile plot of the data is a mapping between estimated quantiles of Y and quantiles of the standard normal distribution. Fitting a regression spline to the quantile-quantile plot allows estimated quantiles of Y to be expressed as smooth functions of standard normal quantiles and the estimated spline parameters.

The asymptotic properties of the least-squares regression spline are discussed by formulating the least squares normal equations as a multivariate L-statistic.
It is shown that, under certain conditions on the sampling design, the estimated spline parameters are asymptotically normal and unbiased, and that jackknife variance estimation can be used to estimate the covariance matrix of the spline parameters. The jackknife can also be used to estimate the variances of smooth functions of the spline parameters, such as the quantiles of Y.
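
The quantile-estimation idea described above can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes a simple random sample (no survey weights or complex design), a cubic truncated-power basis, hand-picked knots, plotting positions (i - 0.5)/n, and a delete-one jackknife rather than a design-based one. All function names (`spline_basis`, `fit_qq_spline`, `quantile_from_spline`, `jackknife_variance`) are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def spline_basis(z, knots):
    """Cubic truncated-power basis: 1, z, z^2, z^3, (z - k)_+^3 for each knot."""
    z = np.asarray(z, dtype=float)
    cols = [np.ones_like(z), z, z**2, z**3]
    cols += [np.clip(z - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)


def fit_qq_spline(y, knots):
    """Least-squares spline fit to the normal quantile-quantile plot of y."""
    y_sorted = np.sort(np.asarray(y, dtype=float))
    n = y_sorted.size
    # Assumed plotting positions (i - 0.5) / n; other conventions exist.
    probs = (np.arange(1, n + 1) - 0.5) / n
    z = np.array([NormalDist().inv_cdf(p) for p in probs])
    beta, *_ = np.linalg.lstsq(spline_basis(z, knots), y_sorted, rcond=None)
    return beta


def quantile_from_spline(beta, knots, prob):
    """Estimated quantile of Y as a smooth function of the normal quantile."""
    z = NormalDist().inv_cdf(prob)
    return float((spline_basis([z], knots) @ beta)[0])


def jackknife_variance(y, knots, prob):
    """Delete-one jackknife variance of the spline-based quantile estimate.

    Assumes independent observations; a complex survey design would
    delete sampling units within strata instead of single observations.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    reps = np.array([
        quantile_from_spline(fit_qq_spline(np.delete(y, i), knots), knots, prob)
        for i in range(n)
    ])
    return (n - 1) / n * np.sum((reps - reps.mean()) ** 2)


# Illustrative data only: a skewed (lognormal) sample standing in for intakes.
rng = np.random.default_rng(0)
y = np.exp(rng.normal(size=200))
knots = [-1.0, 0.0, 1.0]  # assumed knot locations on the normal-quantile scale

beta = fit_qq_spline(y, knots)
q50 = quantile_from_spline(beta, knots, 0.5)
var_q50 = jackknife_variance(y, knots, 0.5)
print(q50, var_q50 ** 0.5)
```

Because the fitted spline maps standard normal quantiles to the scale of Y, any quantile of Y is a smooth function of the spline parameters, which is what makes a jackknife variance estimate applicable to it.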