Inference in structural equation models with missing data
Abstract
Missing data can lead to bias and inefficiency when estimating the quantities of interest in scientific studies. This can be especially problematic in longitudinal studies, which measure the same subjects at several points in time. It is not uncommon for individuals to be unavailable at one or more time points, and even when available, an individual may fail to respond to one or more items. To indicate the scale of the problem, in one example considered here only 295 of 451 cases would be considered complete. A common approach for dealing with missing values in current practice is to restrict attention to those individuals or cases whose data are completely observed. At best this procedure is inefficient, since some observed information (belonging to incomplete cases) is ignored, but in some situations it can also be badly biased.

Rubin (1976) defines three mechanisms by which values may become missing. Values are missing completely at random (MCAR) if the fact that they are missing is unrelated to the problem at hand (e.g., a typographical error). Values are missing at random (MAR) if whether they are missing depends only on information that is observed, so that the missingness does not affect our ability to draw conclusions (e.g., people may not answer a question about their date of birth while supplying correlated information such as graduation dates). Finally, values are nonignorably missing (NI) if the very fact that they are missing carries important information about the values that belong there (e.g., people with high incomes tend not to answer questions about income). The naive approach of relying only on complete observations is guaranteed to be unbiased only in the nicest of these cases (MCAR) and may be substantially biased in the others. The last case, nonignorably missing data, cannot easily be addressed with statistical methods, since by definition the information that matters most is exactly what is missing.
This thesis explores approaches that are valid when the missing-data mechanism is MAR, in which case the observed values can be used to learn about what the missing values might be. We consider three approaches to drawing inferences in structural equation models (SEM) with missing data: a likelihood-based approach, Bayesian inference, and inference based on multiple imputation (fill-ins) of the missing values. Many sociology and psychology survey problems that rely on SEM involve two kinds of variables: item responses, and composites obtained as linear combinations of items. If one of the item responses that define a composite variable is missing, the composite variable may be declared missing. We therefore also consider ways of taking advantage of the item responses in missing-data problems. Each approach is described and then illustrated with an example and a simulation study, in which it is used to analyze data on the psychological development of adolescents with structural equation models. The data are from the Iowa Youth and Family Project (IYFP), being carried out at the Center for Rural Health at Iowa State University.
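Declaring a composite missing whenever any of its items is missing can discard far more information than the item-level missingness suggests. The sketch below is hypothetical (it does not use the IYFP data): it assumes each composite is the plain sum of K items and that every item is missing independently with probability p. Under those assumptions the share of observed composites is (1 - p)^K, which is much smaller than the share of observed items, 1 - p. The helper names `composite` and `composite_vs_item_loss` are illustrative only.

```python
import random

def composite(items):
    """Sum of item responses, or None if any item is missing (None)."""
    return None if any(v is None for v in items) else sum(items)

def composite_vs_item_loss(n_cases, n_items, p_missing, seed=7):
    # Hypothetical setting: each item response is missing independently
    # with probability p_missing. Returns (share of item responses observed,
    # share of composites observed when any missing item voids the composite).
    rng = random.Random(seed)
    items_observed = 0
    composites_observed = 0
    for _ in range(n_cases):
        observed = [rng.random() >= p_missing for _ in range(n_items)]
        items_observed += sum(observed)
        composites_observed += all(observed)
    return items_observed / (n_cases * n_items), composites_observed / n_cases

item_rate, comp_rate = composite_vs_item_loss(50_000, 6, 0.05)
print(f"items observed: {item_rate:.3f}, composites observed: {comp_rate:.3f}")
```

With six items and a 5% chance of missingness per item, about 95% of the item responses are observed, yet only about 1 - 0.95**6 ≈ 0.735 of the composites survive. Working with the item responses directly keeps the information that the composite-level analysis would throw away.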