A Covariate-Balanced, Survey-Weighted GLM Framework for Causal Effect Estimation in Observational Studies

dc.contributor.author Hariharan, Yamini
dc.contributor.committeeMember Ommen, Danica
dc.contributor.department Department of Statistics
dc.contributor.majorProfessor Shelley, Mack
dc.contributor.majorProfessor Genschel, Ulrike
dc.date.accessioned 2025-08-20T18:36:59Z
dc.date.available 2025-08-20T18:36:59Z
dc.date.copyright 2025
dc.date.issued 2025-08
dc.description.abstract substance use disorder (SUD) treatment using data from the 2019 National Survey on Drug Use and Health (NSDUH). It is guided by two central questions: How can we improve the estimation of causal effects in observational health data when randomization is not feasible? And how do statistical modeling choices influence the interpretability and generalizability of health-related findings? Using a combination of survey-weighted logistic regression and propensity score matching, the study estimates the impact of private insurance on the likelihood of receiving treatment, while accounting for the complex survey design and potential confounding factors. Covariate balance was improved using weight trimming techniques, and model performance was evaluated through weighted confusion matrices. Special attention was given to the treatment of race as a covariate. By comparing models using categorical versus dichotomous race encodings, the study explores how these choices affect the interpretability, predictive performance, and fairness of models estimating disparities in treatment access. Results from the logistic regression models indicate that individuals with higher income and education were significantly more likely to have private insurance, while those who had received alcohol-related treatment were less likely to be privately insured, suggesting patterns of underinsurance or dependence on public coverage. In the second phase of analysis, binomial regression models were used to examine how specific insurance combinations (e.g., Medicare + Private) were associated with treatment access. Findings revealed that individuals with dual coverage were significantly less likely to receive treatment compared to those with other forms of coverage, even after controlling for demographic and socioeconomic variables.
These results suggest that disparities in treatment utilization are not solely a function of being insured, but also of the structure and type of insurance. Altogether, this research underscores the need for methodologically robust approaches to causal inference in public health data and highlights the broader implications of modeling decisions in shaping our understanding of health disparities and informing equitable policy interventions.
dc.identifier.uri https://dr.lib.iastate.edu/handle/20.500.12876/106151
dc.language.iso en_US
dc.rights Attribution-NonCommercial-NoDerivs 3.0 United States
dc.rights.holder Yamini Hariharan
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/3.0/us/
dc.subject.disciplines DegreeDisciplines::Physical Sciences and Mathematics::Statistics and Probability
dc.subject.keywords Statistics, Causal Inference, Observational Study, Public Health, Public Policy, Propensity Score Matching, GLM, Logistic Regression, SMD
dc.title A Covariate-Balanced, Survey-Weighted GLM Framework for Causal Effect Estimation in Observational Studies
dc.type Thesis
dc.type.genre creativecomponent
dspace.entity.type Publication
thesis.degree.discipline Statistics
thesis.degree.level Masters
thesis.degree.name Master of Science
File
Original bundle
Name: YaminiHariharan_CC.pdf
Size: 908.67 KB
Format: Adobe Portable Document Format