Chapter 9: Causal Inference

Difference-in-Differences, RDD, Matching, Entropy Balancing, and Synthetic Control

9.1 Difference-in-Differences (DID)

DID compares the change in outcomes over time between a treatment group and a control group. The key assumption is parallel trends: absent treatment, the two groups' outcomes would have evolved along parallel paths (the same trend, though possibly different levels).

* Basic 2x2 DID setup
* treatment: 1 if treated group
* post: 1 if after treatment
gen treat_post = treatment * post

regress outcome treatment post treat_post controls, vce(cluster state)

* The DID estimate is the coefficient on treat_post
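As a sanity check, the 2x2 DID coefficient (without controls) equals the double difference of the four group-period means. A minimal sketch using the same variable names as above:

* ─── Manual 2x2 DID (no controls) ───
quietly summarize outcome if treatment == 1 & post == 1
local y11 = r(mean)
quietly summarize outcome if treatment == 1 & post == 0
local y10 = r(mean)
quietly summarize outcome if treatment == 0 & post == 1
local y01 = r(mean)
quietly summarize outcome if treatment == 0 & post == 0
local y00 = r(mean)
display "DID = " %8.3f ((`y11' - `y10') - (`y01' - `y00'))
* Matches the treat_post coefficient from a regression without controls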

9.2 Event Study Design

Event study plots are the standard way to assess parallel trends before treatment and to visualize dynamic treatment effects.

* Generate event-time indicators
* Assume treat_year is the year of treatment for treated units
gen event_time = year - treat_year

* Create dummies, omitting t=-1 as reference
forvalues k = -5/5 {
    if `k' < 0 {
        gen pre`=abs(`k')' = (event_time == `k')
    }
    else {
        gen post`k' = (event_time == `k')
    }
}
drop pre1  // reference period

* Estimate event study
reghdfe outcome pre5 pre4 pre3 pre2 post0-post5 controls, ///
    absorb(unit_id year) vce(cluster unit_id)

* Plot coefficients (install: ssc install coefplot)
coefplot, keep(pre* post*) vertical ///
    yline(0, lcolor(red) lpattern(dash)) ///
    xline(4.5, lcolor(gray) lpattern(dash)) ///
    title("Event Study: Dynamic Treatment Effects") ///
    ytitle("Coefficient") xtitle("Periods Relative to Treatment")

9.2.1 Publication-Quality Event Study Plots with coefplot

* ─── Using factor variables for cleaner event study syntax ───
* Binning endpoints to avoid extrapolation bias
gen event_time_binned = event_time
replace event_time_binned = -5 if event_time < -5
* Note: missing counts as larger than any number in Stata, so exclude
* never-treated units (missing event_time) from the upper bin
replace event_time_binned = 5 if event_time > 5 & !missing(event_time)

* Factor variables require nonnegative integers, so shift event time:
* -5 maps to 0 and the reference period t = -1 maps to 4
gen et_shift = event_time_binned + 5
* Keep never-treated units in the sample as part of the reference group
replace et_shift = 4 if missing(et_shift)

* Estimate with factor variables (base category 4 = event time -1)
reghdfe outcome ib4.et_shift controls, ///
    absorb(unit_id year) vce(cluster unit_id)

* Publication-ready coefplot
coefplot, keep(*.et_shift) vertical ///
    rename(0.et_shift = "-5" ///
           1.et_shift = "-4" ///
           2.et_shift = "-3" ///
           3.et_shift = "-2" ///
           5.et_shift = "0"  ///
           6.et_shift = "1"  ///
           7.et_shift = "2"  ///
           8.et_shift = "3"  ///
           9.et_shift = "4"  ///
           10.et_shift = "5") ///
    yline(0, lcolor(cranberry) lpattern(dash)) ///
    xline(4.5, lcolor(gs10) lpattern(dash)) ///
    ciopts(lwidth(thin) lcolor(navy%60)) ///
    mcolor(navy) msymbol(circle) ///
    graphregion(color(white)) ///
    title("Dynamic Treatment Effects", size(medium)) ///
    ytitle("Coefficient Estimate") xtitle("Event Time") ///
    note("Reference period: t = -1. Bars show 95% CIs.")

graph export "event_study.png", replace width(1200)
Event Study Best Practices for Reviewers

(1) Always bin the endpoints (e.g., aggregate all pre-treatment periods before t = -5 into a single bin) to avoid sparse-data distortions.
(2) Show at least 3 pre-treatment periods to assess parallel trends.
(3) Include confidence intervals, not just point estimates.
(4) If pre-treatment coefficients show a trending pattern (even if individually insignificant), the parallel trends assumption may be questionable.
(5) Cluster standard errors at the level of treatment assignment, e.g., vce(cluster unit_id).

9.3 Modern DID with Staggered Treatment

When treatment rolls out at different times across units, the classic two-way FE estimator can be biased. Modern DID estimators address this issue.

* Install modern DID packages
ssc install did_multiplegt, replace
ssc install drdid, replace           // required by csdid
ssc install csdid, replace
ssc install eventstudyinteract, replace

* de Chaisemartin and D'Haultfoeuille (2020)
did_multiplegt outcome unit_id year treatment, ///
    robust_dynamic dynamic(5) placebo(5) breps(100) cluster(state)

* Callaway and Sant'Anna (2021)
* gvar = first treatment year, coded 0 for never-treated units
csdid outcome controls, ivar(unit_id) time(year) gvar(treat_year) ///
    method(dripw)
csdid_plot, title("Callaway-Sant'Anna: Group-Time ATT")
TWFE Bias with Staggered Treatment

The standard two-way FE estimator uses already-treated units as controls for newly treated units, which introduces bias when treatment effects vary over time. If your treatment rolls out at different dates across units, use one of the modern estimators: Callaway-Sant'Anna, de Chaisemartin-D'Haultfoeuille, or Sun-Abraham.
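One way to quantify how much of a TWFE estimate comes from the problematic "already-treated as control" comparisons is the Goodman-Bacon (2021) decomposition. A sketch using the community-contributed bacondecomp package (it assumes a balanced panel and a binary, absorbing treatment):

* Install: ssc install bacondecomp, replace
xtset unit_id year
bacondecomp outcome treatment, ddetail
* Reports the weight and estimate of each 2x2 comparison type;
* a large weight on "later- vs. earlier-treated" comparisons
* signals potential TWFE bias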

9.3.1 Borusyak, Jaravel, and Spiess (2024): Imputation DID

The imputation approach to staggered DID fits the two-way fixed-effects model on untreated observations only, imputes each treated observation's counterfactual, and averages the differences. It is the efficient estimator under homoskedastic errors and yields a clean event-study decomposition.

* Install: ssc install did_imputation, replace

* ─── did_imputation (Borusyak, Jaravel, Spiess) ───
* Requires: outcome, unit_id, time, first_treat (0 or . for never-treated)
did_imputation outcome unit_id year treat_year, ///
    horizons(0/5) pretrends(5)

* horizons(0/5): estimate treatment effects for t=0,1,...,5
* pretrends(5): estimate placebo effects for t=-5,...,-1

* The output includes:
* - Pre-trend coefficients (should be near zero)
* - Horizon-specific ATT estimates
* - An aggregate ATT across all horizons

* Event study plot (install: ssc install event_plot, replace)
event_plot, default_look ///
    graph_opt(title("BJS Imputation DID") ///
    ytitle("ATT") xtitle("Periods Since Treatment"))
Choosing Among Modern DID Estimators

All modern estimators address the same TWFE bias, but they differ in assumptions and efficiency. Callaway-Sant'Anna (csdid) is semiparametric and allows for doubly-robust estimation; it requires a "never treated" or "not yet treated" comparison group. did_imputation (BJS) is most efficient under homogeneous treatment effects and provides a clean event study plot. did_multiplegt (dCDH) is the most agnostic but can be noisy with many time periods. For most applications, start with csdid and check robustness with did_imputation.
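csdid reports group-time ATTs; aggregated summaries come from its postestimation commands. A sketch, assuming csdid has just been run as in Section 9.3:

* Aggregate group-time ATTs after csdid
estat simple     // overall ATT
estat event      // event-study (dynamic) aggregation
estat group      // ATT by treatment cohort
estat calendar   // ATT by calendar year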

9.4 Regression Discontinuity Design (RDD)

RDD exploits a threshold rule to identify causal effects. Units just above and just below the threshold are comparable, so the comparison approximates a local randomized experiment.

* Install: ssc install rdrobust, replace
*          ssc install rddensity, replace

* Sharp RDD estimation
rdrobust outcome running_var, c(0)

* With controls and different kernel
rdrobust outcome running_var, c(0) covs(x1 x2) kernel(triangular)

* RDD plot
rdplot outcome running_var, c(0) ///
    graph_options(title("Regression Discontinuity") ///
    xtitle("Running Variable") ytitle("Outcome"))

* Manipulation test (McCrary density test)
rddensity running_var, c(0) plot

9.4.1 Bandwidth Selection and Sensitivity

The bandwidth choice determines which observations near the cutoff are used. Narrower bandwidths reduce bias (more local comparison) but increase variance (fewer observations). Robustness to bandwidth choice is expected by reviewers.

* ─── Bandwidth Selection Methods ───
* Default: MSE-optimal bandwidth (Calonico, Cattaneo, Titiunik 2014)
rdrobust outcome running_var, c(0)
* The output reports the optimal bandwidth (h) and the
* bias-corrected robust confidence interval

* CER-optimal bandwidth (smaller, better coverage)
rdrobust outcome running_var, c(0) bwselect(cerrd)

* ─── Bandwidth Sensitivity Analysis ───
* Estimate at 50%, 75%, 100%, 125%, 150%, 200% of optimal
rdrobust outcome running_var, c(0)
local h_opt = e(h_l)  // store optimal bandwidth

foreach mult in 0.5 0.75 1 1.25 1.5 2 {
    local h_use = `h_opt' * `mult'
    quietly rdrobust outcome running_var, c(0) h(`h_use')
    display "h = " %6.2f `h_use' ///
        " | Estimate = " %8.3f e(tau_cl) ///
        " | p-value = " %6.3f e(pv_rb)
}

* ─── Placebo Tests at Non-Cutoff Points ───
* Estimate at artificial cutoffs where no effect should exist, using
* only observations on one side of the true cutoff (e.g., each side's
* median) so the true discontinuity cannot contaminate the estimate
quietly summarize running_var if running_var < 0, detail
rdrobust outcome running_var if running_var < 0, c(`r(p50)')
quietly summarize running_var if running_var > 0, detail
rdrobust outcome running_var if running_var > 0, c(`r(p50)')
* A significant result at a placebo cutoff suggests confounding
RDD Validity Checks for Reviewers

Reviewers expect several validity tests:
(1) McCrary density test (rddensity): no manipulation of the running variable at the cutoff.
(2) Covariate balance: run rdrobust with pre-determined covariates as the outcome; no jumps should appear.
(3) Bandwidth sensitivity: results should be stable across a range of bandwidths.
(4) Placebo cutoffs: no treatment effect at non-cutoff points.
(5) Donut hole test: exclude observations very close to the cutoff to check whether a few influential observations drive the result.
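The covariate balance and donut checks in the list above can be sketched as follows (the donut widths 0.05 and 0.10 are illustrative and should be scaled to your running variable):

* ─── Covariate Balance at the Cutoff ───
* Pre-determined covariates should show no jump
foreach v in x1 x2 {
    rdrobust `v' running_var, c(0)
}

* ─── Donut Hole Test ───
* Exclude observations very close to the cutoff
foreach donut in 0.05 0.10 {
    quietly rdrobust outcome running_var if abs(running_var) > `donut', c(0)
    display "Donut = " %5.2f `donut' ///
        " | Estimate = " %8.3f e(tau_cl) ///
        " | p-value = " %6.3f e(pv_rb)
}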

9.5 Fuzzy RDD

In a fuzzy RDD, crossing the threshold increases the probability of treatment but does not determine it perfectly. This is estimated via a local IV approach.

* Fuzzy RDD: treatment is the endogenous variable
rdrobust outcome running_var, c(0) fuzzy(treatment)

* This estimates the LATE at the cutoff

* ─── Verify the First Stage in Fuzzy RDD ───
* There must be a visible jump in treatment probability at the cutoff
rdplot treatment running_var, c(0) ///
    graph_options(title("First Stage: Treatment Take-Up at Cutoff") ///
    xtitle("Running Variable") ytitle("Pr(Treatment)"))

* Estimate the first-stage jump
rdrobust treatment running_var, c(0)
* If the jump is small (weak first stage), fuzzy RDD is imprecise

9.6 Propensity Score Matching

Matching pairs treated and control units based on observed characteristics. The assumption is that, conditional on these characteristics, treatment assignment is as good as random (selection on observables).

* Propensity score matching (nearest neighbor)
teffects psmatch (outcome) (treatment x1 x2 x3, logit), ///
    nn(1) atet

* Check covariate balance after matching
tebalance summarize

* Balance plot
tebalance density x1

9.6.1 Propensity Score Overlap Diagnostics

Matching only works when treated and control units share a common region of the propensity score distribution. Lack of overlap means some treated units have no comparable controls, and results rely on extrapolation.

* ─── Estimate Propensity Score Manually ───
logit treatment x1 x2 x3
predict pscore, pr

* ─── Overlap (Common Support) Assessment ───
* Visual: overlapping density plots
twoway (kdensity pscore if treatment == 1, lcolor(cranberry)) ///
       (kdensity pscore if treatment == 0, lcolor(navy)), ///
    legend(order(1 "Treated" 2 "Control")) ///
    title("Propensity Score Overlap") ///
    xtitle("Propensity Score") ytitle("Density")

* Summary statistics by group
tabstat pscore, by(treatment) stats(min max mean p25 p75)

* ─── Trimming for Common Support ───
* Drop observations outside the overlap region
summarize pscore if treatment == 1
local min_treated = r(min)
local max_treated = r(max)
summarize pscore if treatment == 0
local min_control = r(min)
local max_control = r(max)

gen common_support = (pscore >= max(`min_treated', `min_control') & ///
                      pscore <= min(`max_treated', `max_control'))
tab common_support treatment

* Re-estimate on common support only
teffects psmatch (outcome) (treatment x1 x2 x3, logit) ///
    if common_support == 1, nn(1) atet
Standardized Mean Differences (SMD) for Balance

After matching, report standardized mean differences (SMD) rather than t-tests to assess covariate balance. The SMD is the difference in means divided by the pooled standard deviation. An SMD below 0.1 indicates good balance, while an SMD above 0.25 is concerning. Use tebalance summarize after teffects or compute manually. Reviewers at top journals expect a balance table showing pre-match and post-match SMDs for all covariates.
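The SMD described above can be computed by hand for any covariate. A minimal sketch for x1, taking the "pooled" variance as the simple average of the two group variances (one common convention):

* ─── Manual SMD for one covariate ───
quietly summarize x1 if treatment == 1
local m1 = r(mean)
local v1 = r(Var)
quietly summarize x1 if treatment == 0
local m0 = r(mean)
local v0 = r(Var)
display "SMD for x1 = " %6.3f ((`m1' - `m0') / sqrt((`v1' + `v0') / 2))
* Run once on the raw sample and once with matching weights applied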

9.7 Other Matching Methods

* Coarsened Exact Matching (CEM)
* Install: ssc install cem, replace
cem age (#5) education (#3) income (#4), treatment(treatment)
regress outcome treatment [iweight=cem_weights], vce(robust)

* Inverse probability weighting (IPW)
teffects ipw (outcome) (treatment x1 x2 x3, logit), atet

* Doubly robust: IPW + regression adjustment
teffects ipwra (outcome x1 x2 x3) (treatment x1 x2 x3, logit), atet
Choosing a Matching Estimator

PSM is common but sensitive to propensity score model specification. CEM avoids the propensity score entirely by coarsening covariates and matching exactly within bins. Doubly-robust methods (AIPW/IPWRA) are consistent if either the outcome model or the treatment model is correctly specified.
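The AIPW variant mentioned above is available as teffects aipw. A short sketch; note that, unlike ipwra, Stata's aipw estimator reports the ATE rather than the ATET:

* Augmented IPW (doubly robust); estimates the ATE
teffects aipw (outcome x1 x2 x3) (treatment x1 x2 x3, logit)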

9.7.1 Entropy Balancing

Entropy balancing (Hainmueller, 2012) reweights control observations to match the covariate moments (mean, variance, skewness) of the treated group. It achieves exact balance without discarding observations or relying on a propensity score model.

* Install: ssc install ebalance, replace

* Entropy balancing on first and second moments
ebalance treatment x1 x2 x3 x4, targets(2)
* targets(1) = match means only
* targets(2) = match means and variances
* targets(3) = match means, variances, and skewness

* The command creates _webal as the balancing weight
* Use it in the outcome regression
regress outcome treatment x1 x2 x3 [pw=_webal], vce(robust)

* Verify balance achieved
tabstat x1 x2 x3 [aw=_webal], by(treatment) stats(mean sd)
Entropy Balancing vs. PSM

Entropy balancing has several advantages over PSM:
(1) it guarantees exact covariate balance by construction, so no iterative checking and re-specification of the propensity score model is needed;
(2) it retains all observations (no units are discarded);
(3) it is less sensitive to model specification.
The downside is that extreme weights can arise if the covariate distributions are very different, reducing the effective sample size. Always check the weight distribution: summarize _webal, detail. Entropy balancing has become increasingly popular in management and strategy journals.
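Extreme weights shrink the effective sample size. A sketch of the check, using the Kish approximation ESS = (Σw)² / Σw²:

* ─── Weight Diagnostics After ebalance ───
summarize _webal if treatment == 0, detail   // look for extreme weights

* Effective control-group sample size (Kish approximation)
quietly summarize _webal if treatment == 0
local sum_w = r(sum)
gen double _webal2 = _webal^2
quietly summarize _webal2 if treatment == 0
display "Effective control N = " %8.1f (`sum_w'^2 / r(sum))
drop _webal2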

9.7.2 Coarsened Exact Matching: Detailed Implementation

* ─── CEM with Automatic Coarsening ───
cem age education income, treatment(treatment)

* Check the matching result
* CEM reports: matched observations, L1 distance before/after

* ─── CEM with Manual Bin Specification ───
* (#k) means k equal-width bins
* (10 20 30 40) means use these breakpoints
cem age (20 30 40 50 60) education (#4) income (#10), ///
    treatment(treatment)

* Use CEM weights in analysis
regress outcome treatment x1 x2 [iweight=cem_weights], vce(robust)

* CEM + panel regression
reghdfe outcome treatment [aw=cem_weights], ///
    absorb(unit_id year) vce(cluster unit_id)

9.8 Synthetic Control Method

The synthetic control method builds a weighted combination of control units whose pre-treatment outcomes and predictors match those of the treated unit; the treatment effect is the post-treatment gap between the treated unit and its synthetic counterpart.

* Install: ssc install synth, replace

* synth requires a declared panel
tsset unit_id year

* Synthetic control estimation
synth outcome x1 x2 x3 outcome(2000) outcome(2001) outcome(2002), ///
    trunit(1) trperiod(2003) ///
    figure
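Inference for synthetic control is commonly based on in-space placebos: re-estimate the model treating each donor unit as if it were treated, then compare the treated unit's gap to the placebo gaps. A sketch, assuming unit 1 is the treated unit and that saving results to placebo_*.dta files is acceptable:

* ─── In-Space Placebo Tests ───
levelsof unit_id if unit_id != 1, local(donors)
foreach u of local donors {
    quietly synth outcome x1 x2 x3 outcome(2000) outcome(2001) outcome(2002), ///
        trunit(`u') trperiod(2003) keep(placebo_`u') replace
}
* Each placebo_`u'.dta holds the unit's actual and synthetic outcome
* paths; plot all placebo gaps against the treated unit's gap to judge
* whether the estimated effect is unusually large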

Exercise 9.1

Simulate a simple 2x2 DID. Create a dataset with 200 units, 2 periods, half treated in period 2. Set the true treatment effect to 5. Estimate the DID coefficient and verify it recovers the true effect. Plot group means over time.

Exercise 9.2

Using rdrobust, simulate a sharp RDD with a known cutoff. Generate a running variable from a uniform distribution, treatment as running_var >= 0, and an outcome with a jump of 3 at the cutoff. Estimate the RDD treatment effect and create an rdplot.

Exercise 9.3

Continuing Exercise 9.2, perform a bandwidth sensitivity analysis for your RDD. Estimate the treatment effect at 50%, 75%, 100%, 125%, 150%, and 200% of the MSE-optimal bandwidth. Do the results remain stable? Run a placebo test at the median of the running variable. Also run rddensity to check for manipulation. Present your findings in a summary table.

Exercise 9.4

Simulate a staggered DID setting: create 100 units observed for 10 years, where treatment is adopted at different times (years 4, 6, and 8) for different groups, with some units never treated. Set the true treatment effect to 3. Estimate the effect using (a) standard TWFE (reghdfe outcome treat, absorb(unit year)), (b) Callaway-Sant'Anna (csdid), and (c) imputation DID (did_imputation). Compare the estimates. Does TWFE recover the true effect?
