A Covariate-Balanced, Survey-Weighted GLM Framework for Causal Effect Estimation in Observational Studies

Date
2025-08
Authors
Hariharan, Yamini
Major Professor
Shelley, Mack
Genschel, Ulrike
Committee Member
Ommen, Danica
Abstract
substance use disorder (SUD) treatment using data from the 2019 National Survey on Drug Use and Health (NSDUH). It is guided by two central questions: how can we improve the estimation of causal effects in observational health data when randomization is not feasible, and how do statistical modeling choices influence the interpretability and generalizability of health-related findings? Using a combination of survey-weighted logistic regression and propensity score matching, the study estimates the impact of private insurance on the likelihood of receiving treatment, while accounting for the complex survey design and potential confounding factors. Covariate balance was improved using weight trimming techniques, and model performance was evaluated through weighted confusion matrices. Special attention was given to the treatment of race as a covariate. By comparing models using categorical versus dichotomous race encodings, the study explores how these choices affect the interpretability, predictive performance, and fairness of models estimating disparities in treatment access. Results from the logistic regression models indicate that individuals with higher income and education were significantly more likely to have private insurance, while those who had received alcohol-related treatment were less likely to be privately insured, suggesting patterns of underinsurance or dependence on public coverage. In the second phase of analysis, binomial regression models were used to examine how specific insurance combinations (e.g., Medicare + Private) were associated with treatment access. Findings revealed that individuals with dual coverage were significantly less likely to receive treatment compared to those with other forms of coverage, even after controlling for demographic and socioeconomic variables. These results suggest that disparities in treatment utilization are not solely a function of being insured, but also of the structure and type of insurance.
Altogether, this research underscores the need for methodologically robust approaches to causal inference in public health data and highlights the broader implications of modeling decisions in shaping our understanding of health disparities and informing equitable policy interventions.
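As a rough illustration of three steps the abstract names (weight trimming, covariate balance checking, and weighted confusion matrices), the following is a minimal sketch and not the thesis's actual code. The abstract does not specify the trimming rule or the balance metric, so capping weights at an upper quantile and using a weighted standardized mean difference are assumptions made here for illustration. All function names and the 0.99 default are hypothetical.

```python
import numpy as np


def trim_weights(weights, upper_quantile=0.99):
    """Cap weights at an upper quantile (assumed trimming rule, not from the thesis)."""
    w = np.asarray(weights, dtype=float)
    cap = np.quantile(w, upper_quantile)
    return np.minimum(w, cap)


def weighted_smd(x, treated, weights):
    """Weighted standardized mean difference of covariate x between two groups.

    Assumed balance metric; values near zero indicate similar weighted means.
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(treated, dtype=bool)
    w = np.asarray(weights, dtype=float)
    mean_t = np.average(x[t], weights=w[t])
    mean_c = np.average(x[~t], weights=w[~t])
    var_t = np.average((x[t] - mean_t) ** 2, weights=w[t])
    var_c = np.average((x[~t] - mean_c) ** 2, weights=w[~t])
    return (mean_t - mean_c) / np.sqrt((var_t + var_c) / 2.0)


def weighted_confusion_matrix(y_true, y_pred, weights):
    """2x2 confusion matrix whose cells sum weights instead of counting rows.

    Rows index the true class (0, 1); columns index the predicted class (0, 1).
    """
    yt = np.asarray(y_true, dtype=int)
    yp = np.asarray(y_pred, dtype=int)
    w = np.asarray(weights, dtype=float)
    cm = np.zeros((2, 2))
    np.add.at(cm, (yt, yp), w)  # accumulate each row's weight into its cell
    return cm
```

In use, one would pass the survey (or combined survey and propensity) weights through `trim_weights`, compare `weighted_smd` for each covariate before and after trimming, and then feed model predictions and the trimmed weights to `weighted_confusion_matrix`. Variance estimation under the complex survey design is outside the scope of this sketch.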
Academic or Administrative Unit
Department of Statistics
Type
Thesis
Rights Statement
Attribution-NonCommercial-NoDerivs 3.0 United States
Copyright
2025