Statistical analysis using finite mixtures of normal linear models

Date
1999
Authors
Cheng, Jianlin
Major Professor
Advisor
Hal Stern
Abstract

Finite mixture models are often used in statistical applications when the population under study is believed to consist of a number of heterogeneous subpopulations, but it is not possible to identify the subpopulation to which an individual belongs. In this thesis, finite mixtures of normal linear regression models are explored as a class of models for relating a response variable to a set of predictor variables. We consider two classes of mixture models: one in which the proportion of the population in each subpopulation is independent of the measured predictor variables, and a second in which the mixture proportions are allowed to depend on the predictor variables.

Conditions are determined under which the parameters of the finite mixture model are identifiable. Two approaches to statistical inference for the model parameters are reviewed: maximum likelihood estimation and the associated large sample theory, and Bayesian inference. There are several complications that arise in practice when analyzing data with finite mixture models, including multiple modes of the likelihood function, degenerate modes corresponding to small subpopulations with apparently zero variance, and the failure of traditional large sample results. Simulations are used to investigate the performance of the two approaches to inference. It is important that a statistical analysis go beyond just fitting a model to data and include some model assessment. This thesis explores the use of posterior predictive model checks for this purpose. In particular, a posterior predictive method is proposed for comparing the mixture of regressions with constant proportions to the mixture of regressions with nonconstant proportions.

The various approaches to inference and model assessment are applied to an example concerning household expenditures in Bangladesh. An economic hypothesis there suggests that more resources are spent ensuring the health of male rather than female children. A simple linear regression explaining the difference between male and female child health finds no significant predictors. One plausible explanation is that the population consists of two types of households, those that do not discriminate based on gender and those that do. The finite mixture of regressions allows us to address this hypothesis.
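
As a rough illustration of the constant-proportion model the abstract describes, the sketch below fits a mixture of normal linear regressions, with density f(y | x) = sum over k of pi_k * N(y; x'beta_k, sigma_k^2), by maximum likelihood using the EM algorithm. It is not taken from the thesis: the function name `fit_mixture_of_regressions`, the residual-quantile initialization, the variance floor (one simple guard against the degenerate zero-variance modes mentioned above), and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def fit_mixture_of_regressions(X, y, n_components=2, n_iter=200, var_floor=1e-6):
    """EM sketch for a mixture of normal linear regressions with constant
    mixing proportions. Hypothetical helper, not the thesis's code."""
    n, p = X.shape

    # Assumed initialization: split pooled-OLS residuals into quantile bands
    # and give each band to one component (hard 0/1 responsibilities).
    pooled_resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    ranks = np.argsort(np.argsort(pooled_resid))
    labels = (ranks * n_components) // n
    resp = np.eye(n_components)[labels]

    betas = np.zeros((n_components, p))
    sigma2 = np.ones(n_components)

    for _ in range(n_iter):
        # M-step: weighted least squares and weighted variance per component.
        for k in range(n_components):
            w = resp[:, k]
            Xw = X * w[:, None]
            betas[k] = np.linalg.solve(Xw.T @ X + 1e-8 * np.eye(p), Xw.T @ y)
            r = y - X @ betas[k]
            # Floor the variance so a component cannot collapse onto a few points.
            sigma2[k] = max((w @ r**2) / max(w.sum(), 1e-12), var_floor)
        weights = resp.mean(axis=0)

        # E-step: posterior membership probabilities, computed in log space.
        resid = y[:, None] - X @ betas.T
        log_dens = -0.5 * (np.log(2 * np.pi * sigma2) + resid**2 / sigma2)
        log_joint = np.log(np.clip(weights, 1e-300, None)) + log_dens
        log_norm = np.logaddexp.reduce(log_joint, axis=1)
        resp = np.exp(log_joint - log_norm[:, None])

    # log_norm.sum() is the log-likelihood at the returned parameters.
    return weights, betas, sigma2, resp, log_norm.sum()


# Simulated two-subpopulation data (illustrative only, not the Bangladesh data).
rng = np.random.default_rng(0)
n_obs = 400
x = rng.uniform(0, 1, n_obs)
X_demo = np.column_stack([np.ones(n_obs), x])
group = rng.random(n_obs) < 0.5
y_demo = np.where(group, 3 + 2 * x, -3 - 2 * x) + 0.2 * rng.normal(size=n_obs)

weights, betas, sigma2, resp, loglik = fit_mixture_of_regressions(X_demo, y_demo)
print("mixing proportions:", weights)
print("coefficients per component:", betas)
print("variances per component:", sigma2)
```

Because the likelihood can have several modes, a single EM run such as this one only finds a local maximum; in practice one would compare the returned log-likelihood across several different starting points. The nonconstant-proportion model would additionally require modeling the mixing proportions as a function of the predictors, which this sketch does not attempt.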

Type
dissertation
Copyright
1999