A Covariate-Balanced, Survey-Weighted GLM Framework for Causal Effect Estimation in Observational Studies
Date
2025-08
Authors
Hariharan, Yamini
Major Professor
Shelley, Mack
Genschel, Ulrike
Committee Member
Ommen, Danica
Abstract
substance use disorder (SUD) treatment using data from the 2019 National Survey on Drug Use and
Health (NSDUH). It is guided by two central questions: (1) How can we improve the estimation of causal
effects in observational health data when randomization is not feasible? (2) How do statistical modeling
choices influence the interpretability and generalizability of health-related findings? Using a combination
of survey-weighted logistic regression and propensity score matching, the study estimates the impact of
private insurance on the likelihood of receiving treatment, while accounting for the complex survey design
and potential confounding factors. Covariate balance was improved using weight trimming techniques, and
model performance was evaluated through weighted confusion matrices.
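The weight trimming and weighted confusion matrix steps mentioned above can be illustrated with a minimal sketch. The function names, the quantile cutoffs, and the toy data are assumptions made for illustration; the abstract does not state which trimming rule or software the thesis used.

```python
def trim_weights(weights, lower_q=0.01, upper_q=0.99):
    """Clip weights to empirical quantile bounds.

    The quantile cutoffs are hypothetical defaults, not values from the thesis.
    """
    ordered = sorted(weights)
    n = len(ordered)
    low = ordered[int(lower_q * (n - 1))]   # lower bound (nearest rank below)
    high = ordered[int(upper_q * (n - 1))]  # upper bound (nearest rank below)
    return [min(max(w, low), high) for w in weights]


def weighted_confusion(y_true, y_pred, weights):
    """Sum weights into cells keyed by (true label, predicted label)."""
    table = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.0}
    for t, p, w in zip(y_true, y_pred, weights):
        table[(t, p)] += w
    return table


# Toy example: one extreme weight is capped before evaluation.
raw_weights = [1, 2, 3, 100]
trimmed = trim_weights(raw_weights, lower_q=0.01, upper_q=0.75)
cells = weighted_confusion([1, 0, 1, 0], [1, 0, 0, 1], trimmed)
```

Capping extreme weights limits the influence of a few respondents on balance diagnostics and on the weighted cell totals, at the cost of some bias relative to the untrimmed weights.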
Special attention was given to the treatment of race as a covariate. By comparing models using categorical
versus dichotomous race encodings, the study explores how these choices affect the interpretability, predictive
performance, and fairness of models estimating disparities in treatment access. Results from the logistic
regression models indicate that individuals with higher income and education were significantly more likely to
have private insurance, while those who had received alcohol-related treatment were less likely to be privately
insured, suggesting patterns of underinsurance or dependence on public coverage.
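The difference between a categorical and a dichotomous encoding of a covariate can be sketched as follows. The level labels "A", "B", and "C" are placeholders, and the choice of reference level is an assumption; the abstract does not list the categories or the reference group used in the thesis.

```python
def one_hot(values, reference):
    """Categorical encoding: one indicator per non-reference level."""
    levels = sorted(set(values) - {reference})
    return [{f"race_{lvl}": int(v == lvl) for lvl in levels} for v in values]


def dichotomize(values, reference):
    """Dichotomous encoding: 1 for any non-reference level, 0 otherwise."""
    return [int(v != reference) for v in values]


# Placeholder levels, not NSDUH categories.
observed = ["A", "B", "C", "A"]
categorical = one_hot(observed, reference="A")
binary = dichotomize(observed, reference="A")
```

The categorical form yields one coefficient per non-reference level, while the dichotomous form pools all non-reference levels into a single coefficient, which is simpler to report but cannot distinguish among the pooled groups.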
In the second phase of analysis, binomial regression models were used to examine how specific insurance
combinations (e.g., Medicare + Private) were associated with treatment access. Findings revealed that individuals
with dual coverage were significantly less likely to receive treatment compared to those with other forms of
coverage, even after controlling for demographic and socioeconomic variables. These results suggest that
disparities in treatment utilization are not solely a function of being insured, but also of the structure and
type of insurance.
Altogether, this research underscores the need for methodologically robust approaches to causal inference in
public health data and highlights the broader implications of modeling decisions in shaping our understanding
of health disparities and informing equitable policy interventions.
Academic or Administrative Unit
Department of Statistics
Type
Thesis
Rights Statement
Attribution-NonCommercial-NoDerivs 3.0 United States
Copyright
2025