GMM, Stochastic Frontier, Heckman, Control Function, Bootstrap, and Permutation Tests
GMM is a flexible estimation framework that nests OLS, IV, and many other estimators as special cases. In panel data, dynamic GMM estimators (Arellano-Bond, Arellano-Bover/Blundell-Bond) handle lagged dependent variables and endogenous regressors using internal instruments.
* --- Stata's built-in gmm command ---
* Define moment conditions explicitly
webuse abdata, clear

* Arellano-Bond (difference GMM) with xtabond
xtabond n L.n w k, lags(1) vce(robust)

* --- xtabond2 (Roodman 2009) - the gold standard ---
* Install: ssc install xtabond2, replace

* Difference GMM
xtabond2 n L.n w k yr1980-yr1984, ///
    gmm(L.n, lag(2 5)) iv(w k yr1980-yr1984) ///
    robust noleveleq small

* System GMM (Blundell-Bond): uses both level and differenced equations
xtabond2 n L.n w k yr1980-yr1984, ///
    gmm(L.n, lag(2 5)) iv(w k yr1980-yr1984) ///
    robust small
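The built-in gmm command lets you write the moment conditions out yourself. As a minimal sketch using the abdata variables already loaded: with the regressors serving as their own instruments, the just-identified moment conditions E[z_i(n_i - b0 - b1*w_i - b2*k_i)] = 0 reproduce OLS, which illustrates how GMM nests OLS as a special case.

```stata
* Sketch: OLS recovered as just-identified GMM with explicit moments
* (assumes webuse abdata is already loaded, as above)
gmm (n - {b0} - {b1}*w - {b2}*k), instruments(w k) onestep

* The coefficients should match plain OLS
regress n w k
```

Replacing instruments(w k) with excluded instruments turns the same syntax into linear IV; overidentified cases (more instruments than parameters) are where the Hansen J test discussed below applies.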
Use the collapse option in xtabond2, or limit lag depth with lag(2 4), to keep the instrument count manageable. A rule of thumb: the number of instruments should not exceed the number of groups.
Proper GMM estimation requires passing several diagnostic tests. Reviewers at top journals will closely scrutinize these diagnostics, and failure to report them is a common reason for desk rejection.
* After xtabond2, check:

* 1. Hansen J test (overidentification)
*    H0: instruments are valid. Do NOT reject.
*    p-value too close to 1.0 => suspect instrument proliferation

* 2. Arellano-Bond AR(1) and AR(2) tests
*    Expect AR(1) to reject (by construction)
*    AR(2) should NOT reject (validates moment conditions)

* 3. Number of instruments vs. number of groups
*    Instruments should be fewer than groups

* Collapsed instruments to reduce count
xtabond2 n L.n w k yr1980-yr1984, ///
    gmm(L.n, lag(2 5) collapse) iv(w k yr1980-yr1984) ///
    robust small
* ─── Understanding the AR Tests ───
* The AR tests examine serial correlation in the first-differenced errors

* AR(1): tests for first-order serial correlation in D.e
*   Always expected to reject (because D.e_it and D.e_{it-1} share e_{it-1})
*   Not informative for instrument validity

* AR(2): tests for second-order serial correlation in D.e
*   If AR(2) rejects: e_it is serially correlated at order >= 1
*   This means lag-2 instruments are invalid
*   Solution: use deeper lags, e.g., gmm(L.n, lag(3 5))

* ─── Sargan vs. Hansen J Test ───
* Sargan: assumes homoskedasticity (not robust)
* Hansen: robust to heteroskedasticity and autocorrelation
* Use Hansen when you specify "robust" in xtabond2
* CAUTION: Hansen test is weakened by instrument proliferation
*   p-values close to 1.0 are suspicious (too many instruments)

* ─── Difference-in-Hansen Test ───
* Tests the validity of subsets of instruments
* xtabond2 reports this for the level equation instruments
* If the level instruments fail the diff-Hansen test,
*   system GMM is invalid; fall back to difference GMM

* ─── A Complete GMM Reporting Template ───
xtabond2 n L.n w k yr1980-yr1984, ///
    gmm(L.n, lag(2 4) collapse) iv(w k yr1980-yr1984) ///
    robust small twostep

* Report in your paper:
* - N (observations), N_g (groups), N_i (instruments)
* - AR(1) p-value, AR(2) p-value
* - Hansen J p-value
* - Diff-in-Hansen p-value for level equation
* - Whether two-step or one-step, collapsed or not
If the instrument count grows large relative to the number of groups, restrict it with the collapse option. Reviewers at journals like Management Science and JOM will check these diagnostics systematically.
SFA estimates a production or cost frontier and decomposes the error into a symmetric noise term and a one-sided inefficiency term. This is widely used in healthcare operations, banking, and agriculture to measure productive efficiency.
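The composed-error model behind Stata's frontier command can be written out explicitly; this is the standard textbook formulation of the decomposition just described, not output from any command here.

```latex
% Stochastic production frontier with composed error
y_i = x_i'\beta + \varepsilon_i, \qquad \varepsilon_i = v_i - u_i
% v_i \sim N(0, \sigma_v^2): symmetric two-sided noise
% u_i \ge 0: one-sided inefficiency, e.g. half-normal u_i \sim |N(0, \sigma_u^2)|
% Technical efficiency: TE_i = \exp(-u_i) \in (0, 1]
% For a cost frontier the inefficiency enters with the opposite sign:
% \varepsilon_i = v_i + u_i (inefficiency raises cost above the frontier)
```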
* Cross-sectional SFA (production frontier)
frontier ln_output ln_labor ln_capital, dist(hnormal)

* Cost frontier (use the cost option)
frontier ln_cost ln_output ln_wage ln_rental, cost dist(exponential)

* Technical efficiency scores
predict te, te
summarize te, detail

* Histogram of efficiency
histogram te, frequency ///
    title("Distribution of Technical Efficiency") ///
    xtitle("Efficiency Score")
* ─── Predicting Technical Efficiency ───
frontier ln_output ln_labor ln_capital, dist(hnormal)

* Jondrow et al. (1982) predictor: conditional mean E(u|e)
predict u_jlms, u
generate te_jlms = exp(-u_jlms)

* Battese-Coelli (1988) predictor: E{exp(-u)|e}
predict te_bc, te

* Summary and distribution
summarize te_jlms, detail
histogram te_jlms, freq normal ///
    title("Technical Efficiency Distribution") ///
    note("Mean TE = `=string(r(mean), "%5.3f")'")

* ─── Second-Stage: What Explains Efficiency? ───

* Option A: Regress TE on determinants (common but flawed)
regress te_jlms firm_age ownership_type export_status, vce(robust)

* Option B: Single-stage SFA with inefficiency determinants (preferred)
* Battese-Coelli (1995) model includes Z variables in the
* inefficiency distribution directly
frontier ln_output ln_labor ln_capital, ///
    dist(hnormal) uhet(firm_age ownership_type export_status)

* The uhet() coefficients show how Z affects INEFFICIENCY
*   Positive coefficient => increases inefficiency (reduces TE)
*   Negative coefficient => reduces inefficiency (increases TE)
Prefer the single-stage uhet() specification to include efficiency determinants in the inefficiency distribution directly. If a two-stage approach is required (e.g., with panel SFA), bootstrap the entire procedure for correct standard errors.
* Time-invariant inefficiency (Battese-Coelli 1988)
xtfrontier ln_output ln_labor ln_capital, ti

* Time-varying decay model (Battese-Coelli 1992)
xtfrontier ln_output ln_labor ln_capital, tvd

* Predict efficiency
predict te_panel, te

* ─── Choosing Between SFA Models ───
* ti (time-invariant): assumes inefficiency is constant over time
*   Appropriate for short panels where TE is stable
* tvd (time-varying decay): inefficiency follows exp(-eta*(t-T))
*   Allows TE to systematically improve or deteriorate over time
*   eta > 0 => efficiency improves over time
*   eta < 0 => efficiency declines over time

* Compare using LR test or AIC/BIC
quietly xtfrontier ln_output ln_labor ln_capital, ti
estimates store sfa_ti
quietly xtfrontier ln_output ln_labor ln_capital, tvd
estimates store sfa_tvd
lrtest sfa_ti sfa_tvd
The Heckman model corrects for sample selection bias. It applies when the dependent variable is observed only for a non-random subsample (e.g., wages observed only for employed individuals).
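The mechanics behind the correction can be sketched in two equations (standard notation; rho and lambda correspond to the quantities heckman reports):

```latex
% Selection equation (probit): s_i = \mathbf{1}[z_i'\gamma + u_i > 0]
% Outcome, observed only when s_i = 1: y_i = x_i'\beta + e_i
% With (u_i, e_i) jointly normal and \mathrm{Corr}(u_i, e_i) = \rho:
E[y_i \mid x_i, s_i = 1] = x_i'\beta + \rho\,\sigma_e\,\lambda(z_i'\gamma),
\qquad \lambda(c) = \frac{\phi(c)}{\Phi(c)}
% Two-step estimator: probit for \gamma, then OLS of y on x and the
% inverse Mills ratio \lambda. A significant coefficient on \lambda
% (equivalently, \rho \ne 0) signals selection bias.
```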
webuse womenwk, clear

* Two-step Heckman (Heckit)
heckman wage education age, ///
    select(married children education age) twostep

* Full MLE estimation (more efficient)
heckman wage education age, ///
    select(married children education age)

* Key output: rho (correlation between selection and outcome errors)
* If rho is insignificant, selection bias may not be a concern
Credible identification requires an exclusion restriction: at least one variable that affects selection but not the outcome. Here, married and children serve this role.
* ─── Comparing Two-Step and MLE ───
webuse womenwk, clear

* Two-step Heckman
heckman wage education age, ///
    select(married children education age) twostep
estimates store heck_2step

* MLE Heckman
heckman wage education age, ///
    select(married children education age)
estimates store heck_mle

* Side-by-side comparison
estimates table heck_2step heck_mle, se stats(N ll)

* Key differences:
* - Two-step: consistent but less efficient; SEs may be inaccurate
*   Advantage: does not require joint normality of errors
* - MLE: efficient under correct specification (joint normality)
*   Advantage: more precise estimates and correct SEs
*   Disadvantage: sensitive to normality assumption

* ─── Testing for Selection Bias ───
* In two-step: test significance of lambda (inverse Mills ratio)
* In MLE: test significance of rho or use LR test
*   (compare MLE Heckman vs. simple regression)
For panel data with selection, consider xtheckman or the Wooldridge (1995) correlated random effects approach.
* When the outcome is also binary
heckprobit promoted education tenure, ///
    select(employed = education experience married children)
The control function is a two-step method related to both Heckman and IV. In the first step, you estimate the endogenous variable's relationship with instruments and save the residuals. In the second step, you include these residuals as a control in the outcome equation.
* Step 1: First stage (reduced form for endogenous X)
regress rn_pp instrument_z controls i.year
predict cf_resid, residuals

* Step 2: Include residuals in structural equation
regress qip rn_pp sdi cf_resid controls i.year, vce(cluster facility_id)

* Test endogeneity: t-test on cf_resid
*   If significant => endogeneity present, CF is needed
*   If not => OLS is consistent

* NOTE: Standard errors in step 2 need bootstrapping
* because cf_resid is a generated regressor
The control function approach produces incorrect standard errors in the second stage because the residual is a generated regressor. Bootstrap the entire two-step procedure to obtain valid inference.
* ─── Complete CF with Correct Bootstrap SEs ───
capture program drop cf_bootstrap
program define cf_bootstrap, eclass
    syntax [, *]
    * Step 1: First stage
    quietly regress rn_pp instrument_z sdi controls i.year
    quietly predict double _cf_resid, residuals
    * Step 2: Structural equation with CF residual
    regress qip rn_pp sdi _cf_resid controls i.year
    * Clean up
    drop _cf_resid
end

* Bootstrap with cluster resampling
bootstrap, reps(500) seed(42) cluster(facility_id): cf_bootstrap

* ─── CF for Nonlinear Models (Probit/Poisson) ───
capture program drop cf_probit_boot
program define cf_probit_boot, eclass
    * Step 1: Linear first stage for continuous endogenous variable
    quietly regress endogenous_x instrument_z controls
    quietly predict double _cf_v, residuals
    * Step 2: Probit with CF residual
    probit binary_y endogenous_x _cf_v controls
    drop _cf_v
end

bootstrap, reps(500) seed(42) cluster(unit_id): cf_probit_boot

* ─── CF for Panel Data ───
capture program drop cf_panel_boot
program define cf_panel_boot, eclass
    quietly reghdfe rn_pp instrument_z controls, absorb(facility_id year)
    quietly predict double _cf_r, residuals
    reghdfe qip rn_pp sdi _cf_r controls, absorb(facility_id year)
    drop _cf_r
end

bootstrap, reps(500) seed(42) cluster(facility_id): cf_panel_boot
Bootstrap resamples the data with replacement and re-estimates the model many times to obtain standard errors that do not rely on distributional assumptions. This is particularly useful for two-step estimators where analytical standard errors are complex.
* Bootstrap a simple regression
bootstrap, reps(1000) seed(12345) cluster(facility_id): ///
    regress qip rn_pp sdi controls

* Bootstrap a two-step procedure
capture program drop cf_boot
program define cf_boot, eclass
    * Step 1
    regress rn_pp instrument_z controls
    predict double cf_r, residuals
    * Step 2
    regress qip rn_pp sdi cf_r controls
    drop cf_r
end

bootstrap, reps(500) seed(42) cluster(facility_id): cf_boot
* ─── Types of Bootstrap Confidence Intervals ───
* Note: the bca option must be specified at estimation time
* for estat bootstrap, bca to be available afterward
bootstrap, reps(1000) seed(42) bca: regress qip rn_pp sdi controls

* Normal approximation (default)
estat bootstrap, normal

* Percentile method (distribution-free)
estat bootstrap, percentile

* Bias-corrected (BC) interval
estat bootstrap, bc

* Bias-corrected accelerated (BCa) interval (most reliable)
estat bootstrap, bca

* Compare all CI methods
estat bootstrap, all
Always set a seed for reproducibility (e.g., seed(12345)). If your data is clustered, use cluster() in the bootstrap call; this resamples entire clusters rather than individual observations. BCa confidence intervals are generally preferred over normal or percentile intervals because they correct for both bias and skewness. If bootstrap replications fail frequently (e.g., matrix not positive definite), your model may be fragile or overparameterized.
When the number of clusters is small (fewer than 30-50), conventional cluster-robust standard errors can be severely biased. The wild cluster bootstrap provides more reliable inference.
* Install: ssc install boottest, replace
regress outcome treatment controls, vce(cluster state)

* Wild cluster bootstrap p-value for treatment
boottest treatment, reps(9999) seed(42) cluster(state)

* Webb weights (recommended for very few clusters, G < 11)
boottest treatment, reps(9999) seed(42) cluster(state) weighttype(webb)
Permutation tests (also called randomization inference) provide exact p-values by comparing the observed test statistic to its distribution under random reassignment of treatment. They require no distributional assumptions and are particularly useful with small samples or when standard asymptotic inference is suspect.
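Conceptually, the reported p-value compares the observed statistic to its distribution across random reassignments; a common exact-level definition (some implementations report the plain proportion without the +1 correction) is:

```latex
% Two-sided permutation p-value from R random reassignments:
p = \frac{1 + \#\{\, r : |T^{(r)}| \ge |T^{\mathrm{obs}}| \,\}}{1 + R}
% T^{(r)}: the test statistic (e.g., the treatment coefficient)
% re-estimated under the r-th random reassignment of treatment.
% Adding 1 to numerator and denominator keeps p strictly positive
% and counts the observed assignment as one of the permutations.
```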
* ─── Permutation Test for a Treatment Effect ───
permute treatment _b[treatment], reps(5000) seed(42): ///
    regress outcome treatment controls

* This randomly reassigns the treatment variable 5000 times,
* re-estimates the model each time, and computes the proportion
* of permuted coefficients as extreme as the observed one.

* ─── Fisher Exact Test for 2x2 Tables ───
* Built-in exact test for small-sample binary outcomes
tabi 10 5 \ 3 12, exact

* ─── Randomization Inference for DID ───
* Randomly reassign treatment status and re-estimate DID
* Particularly useful when the number of treated units is small
* Use factor notation so the interaction is rebuilt from the
* permuted treatment in every replication (a pregenerated
* treat_post variable would NOT be updated by permute)
permute treatment _b[1.treatment#1.post], reps(5000) seed(42): ///
    regress outcome i.treatment##i.post controls, vce(cluster state)
Even with a careful identification strategy, reviewers may question whether unobserved confounders could explain your results. Sensitivity analysis quantifies how strong confounding would need to be to overturn your conclusions.
* ─── Oster (2019) Proportional Selection ───
* Install: ssc install psacalc, replace

* Step 1: Run baseline without controls
regress outcome treatment
local b_base = _b[treatment]
local r2_base = e(r2)

* Step 2: Run with observable controls
regress outcome treatment controls
local b_full = _b[treatment]
local r2_full = e(r2)

* Step 3: Compute delta (how much unobservables would need to
* matter relative to observables to drive beta to zero)
* rmax() expects a number, so compute it first
local rmax = min(1.3 * `r2_full', 1)
psacalc delta treatment, rmax(`rmax') mcontrol(controls)

* delta > 1 => unobservables would need to be more important
* than all observables combined to explain away the result
* This is considered strong evidence of a robust result
Using webuse abdata, estimate a system GMM model of employment (n) on lagged employment, wages (w), and capital (k) with xtabond2. Use the collapse option. Check the AR(2) test and Hansen J statistic. Are the moment conditions valid?
Using webuse womenwk, estimate a Heckman selection model of wage on education and age, with the selection equation including married, children, education, and age. Compare the two-step and MLE estimates. Is the inverse Mills ratio (lambda) significant?
Using webuse abdata, estimate both difference GMM (noleveleq) and system GMM for employment on lagged employment, wages, and capital. Compare the AR(2) and Hansen J p-values. Vary the lag depth from lag(2 3) to lag(2 5) to lag(2 .) (all available lags). How do the results change? What happens to the Hansen J p-value as you add more instruments? Report the number of instruments for each specification and explain which you would present to a reviewer.
Write a bootstrap program for a two-step control function procedure: (1) regress wage on education, age, and an instrument of your choice, saving the residual; (2) include the residual in a Heckman selection model or a simple outcome regression. Run 500 bootstrap replications with cluster resampling. Compare the bootstrap standard errors to the naive (non-bootstrapped) standard errors from the second stage. How much do they differ?
Key takeaways:
- Use xtabond2 for dynamic panels; always check AR(2) and Hansen J, and avoid instrument proliferation.
- Keep the instrument count below the number of groups with the collapse option.
- For SFA efficiency determinants, prefer single-stage uhet() over two-stage approaches.
- The wild cluster bootstrap (boottest) provides reliable inference with few clusters.