Three essays on regression discontinuity design and partial identification
This dissertation consists of three chapters on regression discontinuity (RD) design and partial identification, which are widely used techniques in program evaluation.
The first and second chapters discuss statistical inference for the treatment effect estimator in fuzzy RD designs. Fuzzy RD designs and instrumental variables (IV) regressions share similar identification strategies and, under certain conditions, yield numerically identical results. While the weak identification problem is widely recognized in IV regressions, it has drawn much less attention in fuzzy RD designs, where the standard t-test can also suffer from asymptotic size distortions and the confidence interval obtained by inverting such a test becomes invalid. I explicitly model fuzzy RD designs in parallel with IV regressions and, building on the extensive literature on the latter, develop tests that are robust to weak identification in fuzzy RD designs, including the Anderson-Rubin (AR) test, the Lagrange multiplier (LM) test, and the conditional likelihood ratio (CLR) test. These tests have correct size regardless of the strength of identification, and their power properties are similar to those in IV regressions. Because of the similarities between a fuzzy RD design and an IV regression, one can choose either method for estimation and inference. However, it is shown that a fuzzy RD design combined with the newly proposed tests has the potential to achieve more power without introducing size distortions in hypothesis testing, and it is therefore recommended. An extension to testing for quantile treatment effects in fuzzy RD designs is also discussed.
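To make the parallel with IV concrete, the following is a sketch of the standard fuzzy RD estimand and of an AR-type statistic, written in notation assumed here for illustration (outcome Y, treatment D, running variable X, cutoff c); it is not necessarily the exact formulation used in the chapters.

```latex
% Fuzzy RD estimand: ratio of the jump in the outcome to the jump in treatment take-up.
\[
  \tau \;=\; \frac{\Delta_Y}{\Delta_D}
  \;=\; \frac{\lim_{x \downarrow c} E[Y \mid X = x] - \lim_{x \uparrow c} E[Y \mid X = x]}
             {\lim_{x \downarrow c} E[D \mid X = x] - \lim_{x \uparrow c} E[D \mid X = x]}
\]
% Under H_0 : \tau = \tau_0, the variable Y - \tau_0 D has no jump at the cutoff,
% so an AR-type statistic tests a zero discontinuity without dividing by \hat{\Delta}_D.
\[
  T(\tau_0) \;=\; \frac{\hat{\Delta}_Y - \tau_0 \hat{\Delta}_D}
                       {\widehat{\mathrm{se}}\bigl(\hat{\Delta}_Y - \tau_0 \hat{\Delta}_D\bigr)},
  \qquad
  \mathrm{CS}_{1-\alpha} \;=\; \bigl\{ \tau_0 : |T(\tau_0)| \le z_{1-\alpha/2} \bigr\}
\]
```

When the jump in take-up is close to zero, the ratio estimator is poorly behaved, whereas the statistic above remains well defined for every hypothesized value; this is the sense in which identification can be weak in a fuzzy RD design just as in an IV regression.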
RD treatment effects are usually estimated with nonparametric methods, and the resulting estimators are biased. A new wild bootstrap procedure is proposed to correct this bias and construct valid confidence intervals in fuzzy regression discontinuity designs. The procedure uses a wild bootstrap based on second-order local polynomials to estimate and remove the bias from linear models. The bias-corrected estimator is then itself bootstrapped to generate valid confidence intervals. While the conventional confidence intervals based on the MSE-optimal bandwidth are not asymptotically valid, the confidence intervals generated by this procedure have correct coverage under conditions similar to those of Calonico, Cattaneo and Titiunik's (2014, Econometrica) analytical correction. Simulation studies provide evidence that the new method is as accurate as the analytical correction when applied to a variety of data generating processes featuring heteroskedasticity, endogeneity and clustering. As an example, its use is demonstrated through a reanalysis of the scholastic achievement data used by Angrist and Lavy (1999).
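The bias-estimation step can be illustrated with a deliberately simplified sketch. It treats only one side of the cutoff (placed at zero), uses a triangular kernel and Rademacher bootstrap weights, and omits both the fuzzy (ratio) structure and the second bootstrap layer that produces confidence intervals. The function names, kernel, bandwidths and data are assumptions made for illustration, not the procedure as specified in the chapter.

```python
import numpy as np


def local_poly_at_zero(x, y, h, degree):
    """Triangular-kernel local polynomial fit around 0; returns coefficients, intercept first."""
    w = np.clip(1.0 - np.abs(x) / h, 0.0, None)      # kernel weights, zero outside bandwidth h
    X = np.vander(x, degree + 1, increasing=True)    # columns 1, x, x^2, ...
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef


def bias_corrected_boundary(x, y, h, b, n_boot=200, seed=0):
    """Local linear boundary estimate minus a wild-bootstrap bias estimate (assumes h <= b)."""
    rng = np.random.default_rng(seed)
    m_hat = local_poly_at_zero(x, y, h, degree=1)[0]

    # Pilot second-order fit: plays the role of the "true" regression function in the bootstrap.
    q = local_poly_at_zero(x, y, b, degree=2)
    fitted = np.vander(x, 3, increasing=True) @ q
    resid = y - fitted

    boot = np.empty(n_boot)
    for r in range(n_boot):
        signs = rng.choice([-1.0, 1.0], size=x.size)  # Rademacher wild-bootstrap weights
        y_star = fitted + resid * signs
        boot[r] = local_poly_at_zero(x, y_star, h, degree=1)[0]

    # In the bootstrap world the true boundary value is q[0], so the bias is observable.
    bias_hat = boot.mean() - q[0]
    return m_hat - bias_hat


# Noise-free quadratic example: the local linear fit is biased, the corrected value is not.
x = np.linspace(0.0, 1.0, 201)
y = 1.0 + 2.0 * x + 3.0 * x ** 2
naive = local_poly_at_zero(x, y, 0.5, degree=1)[0]
corrected = bias_corrected_boundary(x, y, h=0.5, b=1.0)
```

In this toy example the quadratic pilot fit reproduces the regression function exactly, so the bootstrap recovers the full bias of the local linear estimate at the boundary; with noisy data the correction is only approximate and adds variability that the confidence interval must account for.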
In the third chapter, a novel numerical approach is proposed to partially identify treatment effects. Endogenous treatment and measurement error are very common in survey data and threaten the reliable estimation of treatment effects. The new approach addresses these two issues simultaneously and provides bounds for treatment effects. Conceptually, treatment effects and model assumptions are formulated as linear restrictions on a large set of probability masses. One can then check whether any given treatment effect is consistent with the model assumptions and the observed data. Compared with previous methods, the proposed numerical approach is general enough to be applied to a variety of problems and guarantees sharp bounds. An example is provided to show how the distribution of a treatment effect and the averages of multiple treatment effects can be partially identified through this approach.
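The idea of treating probability masses as unknowns subject to linear restrictions can be sketched with a toy linear program. The example below assumes a binary outcome and a binary treatment, imposes no restrictions beyond consistency with the observed joint distribution, and ignores measurement error entirely, so it is far simpler than the setting of the chapter. The function name, the data and the use of `scipy.optimize.linprog` are choices made here for illustration.

```python
from itertools import product

import numpy as np
from scipy.optimize import linprog

# Latent types (y0, y1, d): two binary potential outcomes and a binary treatment.
TYPES = list(product([0, 1], repeat=3))


def ate_bounds(p_obs):
    """Bounds on E[Y1 - Y0] given observed P(Y=y, D=d), passed as a dict keyed by (y, d)."""
    # The average treatment effect is linear in the latent probability masses.
    c = np.array([y1 - y0 for (y0, y1, d) in TYPES], dtype=float)

    # Each observed cell equals the total mass of the latent types that generate it:
    # the observed outcome is y1 for treated units and y0 for untreated units.
    A_eq, b_eq = [], []
    for (y, d), prob in p_obs.items():
        row = [1.0 if (dd == d and (y1 if dd == 1 else y0) == y) else 0.0
               for (y0, y1, dd) in TYPES]
        A_eq.append(row)
        b_eq.append(prob)

    bounds = [(0.0, 1.0)] * len(TYPES)
    lo = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
    hi = linprog(-c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
    if not (lo.success and hi.success):
        raise ValueError("observed probabilities are inconsistent with the model")
    return lo.fun, -hi.fun


# Hypothetical observed distribution P(Y=y, D=d).
p_obs = {(0, 0): 0.3, (1, 0): 0.2, (0, 1): 0.1, (1, 1): 0.4}
lower, upper = ate_bounds(p_obs)
```

With no further assumptions the program reproduces the familiar worst-case bounds of width one. Additional model assumptions enter as extra rows of linear equalities or inequalities, and checking whether a particular treatment effect value is admissible amounts to adding one more equality and testing feasibility.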