GMM, Stochastic Frontier, Heckman, Control Function, Bootstrap, and Permutation Tests
GMM is a flexible estimation framework that nests OLS, IV, and many other estimators as special cases. In panel data, dynamic GMM estimators (Arellano-Bond, Arellano-Bover/Blundell-Bond) handle lagged dependent variables and endogenous regressors using internal instruments.
* --- Stata's built-in gmm command ---
* Define moment conditions explicitly
webuse abdata, clear

* Arellano-Bond (difference GMM) with xtabond
xtabond n L.n w k, lags(1) vce(robust)

* --- xtabond2 (Roodman 2009) - the gold standard ---
* Install: ssc install xtabond2, replace

* Difference GMM
xtabond2 n L.n w k yr1980-yr1984, ///
    gmm(L.n, lag(2 5)) iv(w k yr1980-yr1984) ///
    robust noleveleq small

* System GMM (Blundell-Bond): uses both level and differenced equations
xtabond2 n L.n w k yr1980-yr1984, ///
    gmm(L.n, lag(2 5)) iv(w k yr1980-yr1984) ///
    robust small
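The built-in gmm command lets you write the moment conditions out yourself. As a minimal sketch using the abdata variables already loaded: with the regressors serving as their own instruments, the just-identified moment conditions E[z_i(n_i - b0 - b1*w_i - b2*k_i)] = 0 reproduce OLS, which illustrates how GMM nests OLS as a special case.

```stata
* Sketch: OLS recovered as just-identified GMM with explicit moments
* (assumes webuse abdata is already loaded, as above)
gmm (n - {b0} - {b1}*w - {b2}*k), instruments(w k) onestep

* The coefficients should match plain OLS
regress n w k
```

Replacing instruments(w k) with excluded instruments turns the same syntax into linear IV; overidentified cases (more instruments than parameters) are where the Hansen J test discussed below applies.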
Use the collapse option in xtabond2, or limit lag depth with lag(2 4), to keep the instrument count manageable. A rule of thumb: the number of instruments should not exceed the number of groups.
Proper GMM estimation requires passing several diagnostic tests. Reviewers at top journals will closely scrutinize these diagnostics, and failure to report them is a common reason for desk rejection.
* After xtabond2, check:

* 1. Hansen J test (overidentification)
*    H0: instruments are valid. Do NOT reject.
*    p-value too close to 1.0 => suspect instrument proliferation

* 2. Arellano-Bond AR(1) and AR(2) tests
*    Expect AR(1) to reject (by construction)
*    AR(2) should NOT reject (validates moment conditions)

* 3. Number of instruments vs. number of groups
*    Instruments should be fewer than groups

* Collapsed instruments to reduce count
xtabond2 n L.n w k yr1980-yr1984, ///
    gmm(L.n, lag(2 5) collapse) iv(w k yr1980-yr1984) ///
    robust small
* ─── Understanding the AR Tests ───
* The AR tests examine serial correlation in the first-differenced errors

* AR(1): tests for first-order serial correlation in D.e
*   Always expected to reject (because D.e_it and D.e_{it-1} share e_{it-1})
*   Not informative for instrument validity

* AR(2): tests for second-order serial correlation in D.e
*   If AR(2) rejects: e_it is serially correlated at order >= 1
*   This means lag-2 instruments are invalid
*   Solution: use deeper lags, e.g., gmm(L.n, lag(3 5))

* ─── Sargan vs. Hansen J Test ───
* Sargan: assumes homoskedasticity (not robust)
* Hansen: robust to heteroskedasticity and autocorrelation
* Use Hansen when you specify "robust" in xtabond2
* CAUTION: Hansen test is weakened by instrument proliferation
*   p-values close to 1.0 are suspicious (too many instruments)

* ─── Difference-in-Hansen Test ───
* Tests the validity of subsets of instruments
* xtabond2 reports this for the level equation instruments
* If the level instruments fail the diff-Hansen test,
*   system GMM is invalid; fall back to difference GMM

* ─── A Complete GMM Reporting Template ───
xtabond2 n L.n w k yr1980-yr1984, ///
    gmm(L.n, lag(2 4) collapse) iv(w k yr1980-yr1984) ///
    robust small twostep

* Report in your paper:
* - N (observations), N_g (groups), N_i (instruments)
* - AR(1) p-value, AR(2) p-value
* - Hansen J p-value
* - Diff-in-Hansen p-value for level equation
* - Whether two-step or one-step, collapsed or not
If the instrument count grows large relative to the number of groups, restrict it with the collapse option. Reviewers at journals like Management Science and JOM will check these diagnostics systematically.
SFA estimates a production or cost frontier and decomposes the error into a symmetric noise term and a one-sided inefficiency term. This is widely used in healthcare operations, banking, and agriculture to measure productive efficiency.
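The composed-error model behind Stata's frontier command can be written out explicitly; this is the standard textbook formulation of the decomposition just described, not output from any command here.

```latex
% Stochastic production frontier with composed error
y_i = x_i'\beta + \varepsilon_i, \qquad \varepsilon_i = v_i - u_i
% v_i \sim N(0, \sigma_v^2): symmetric two-sided noise
% u_i \ge 0: one-sided inefficiency, e.g. half-normal u_i \sim |N(0, \sigma_u^2)|
% Technical efficiency: TE_i = \exp(-u_i) \in (0, 1]
% For a cost frontier the inefficiency enters with the opposite sign:
% \varepsilon_i = v_i + u_i (inefficiency raises cost above the frontier)
```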
* Cross-sectional SFA (production frontier)
frontier ln_output ln_labor ln_capital, dist(hnormal)

* Cost frontier (use the cost option)
frontier ln_cost ln_output ln_wage ln_rental, cost dist(exponential)

* Technical efficiency scores
predict te, te
summarize te, detail

* Histogram of efficiency
histogram te, frequency ///
    title("Distribution of Technical Efficiency") ///
    xtitle("Efficiency Score")
* ─── Predicting Technical Efficiency ───
frontier ln_output ln_labor ln_capital, dist(hnormal)

* Jondrow et al. (1982) predictor: conditional mean E(u|e)
predict u_jlms, u
generate te_jlms = exp(-u_jlms)

* Battese-Coelli (1988) predictor: E{exp(-u)|e}
predict te_bc, te

* Summary and distribution
summarize te_jlms, detail
histogram te_jlms, freq normal ///
    title("Technical Efficiency Distribution") ///
    note("Mean TE = `=string(r(mean), "%5.3f")'")

* ─── Second-Stage: What Explains Efficiency? ───

* Option A: Regress TE on determinants (common but flawed)
regress te_jlms firm_age ownership_type export_status, vce(robust)

* Option B: Single-stage SFA with inefficiency determinants (preferred)
* Battese-Coelli (1995) model includes Z variables in the
* inefficiency distribution directly
frontier ln_output ln_labor ln_capital, ///
    dist(hnormal) uhet(firm_age ownership_type export_status)

* The uhet() coefficients show how Z affects INEFFICIENCY
*   Positive coefficient => increases inefficiency (reduces TE)
*   Negative coefficient => reduces inefficiency (increases TE)
Prefer the single-stage uhet() specification to include efficiency determinants in the inefficiency distribution directly. If a two-stage approach is required (e.g., with panel SFA), bootstrap the entire procedure for correct standard errors.
* Time-invariant inefficiency (Battese-Coelli 1988)
xtfrontier ln_output ln_labor ln_capital, ti

* Time-varying decay model (Battese-Coelli 1992)
xtfrontier ln_output ln_labor ln_capital, tvd

* Predict efficiency
predict te_panel, te

* ─── Choosing Between SFA Models ───
* ti (time-invariant): assumes inefficiency is constant over time
*   Appropriate for short panels where TE is stable
* tvd (time-varying decay): inefficiency follows exp(-eta*(t-T))
*   Allows TE to systematically improve or deteriorate over time
*   eta > 0 => efficiency improves over time
*   eta < 0 => efficiency declines over time

* Compare using LR test or AIC/BIC
quietly xtfrontier ln_output ln_labor ln_capital, ti
estimates store sfa_ti
quietly xtfrontier ln_output ln_labor ln_capital, tvd
estimates store sfa_tvd
lrtest sfa_ti sfa_tvd
The Heckman model corrects for sample selection bias. It applies when the dependent variable is observed only for a non-random subsample (e.g., wages observed only for employed individuals).
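The mechanics behind the correction can be sketched in two equations (standard notation; rho and lambda correspond to the quantities heckman reports):

```latex
% Selection equation (probit): s_i = \mathbf{1}[z_i'\gamma + u_i > 0]
% Outcome, observed only when s_i = 1: y_i = x_i'\beta + e_i
% With (u_i, e_i) jointly normal and \mathrm{Corr}(u_i, e_i) = \rho:
E[y_i \mid x_i, s_i = 1] = x_i'\beta + \rho\,\sigma_e\,\lambda(z_i'\gamma),
\qquad \lambda(c) = \frac{\phi(c)}{\Phi(c)}
% Two-step estimator: probit for \gamma, then OLS of y on x and the
% inverse Mills ratio \lambda. A significant coefficient on \lambda
% (equivalently, \rho \ne 0) signals selection bias.
```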
webuse womenwk, clear

* Two-step Heckman (Heckit)
heckman wage education age, ///
    select(married children education age) twostep

* Full MLE estimation (more efficient)
heckman wage education age, ///
    select(married children education age)

* Key output: rho (correlation between selection and outcome errors)
* If rho is insignificant, selection bias may not be a concern
Credible identification requires an exclusion restriction: at least one variable that affects selection but not the outcome. Here, married and children serve this role.
* ─── Comparing Two-Step and MLE ───
webuse womenwk, clear

* Two-step Heckman
heckman wage education age, ///
    select(married children education age) twostep
estimates store heck_2step

* MLE Heckman
heckman wage education age, ///
    select(married children education age)
estimates store heck_mle

* Side-by-side comparison
estimates table heck_2step heck_mle, se stats(N ll)

* Key differences:
* - Two-step: consistent but less efficient; SEs may be inaccurate
*   Advantage: does not require joint normality of errors
* - MLE: efficient under correct specification (joint normality)
*   Advantage: more precise estimates and correct SEs
*   Disadvantage: sensitive to normality assumption

* ─── Testing for Selection Bias ───
* In two-step: test significance of lambda (inverse Mills ratio)
* In MLE: test significance of rho or use LR test
*   (compare MLE Heckman vs. simple regression)
For panel data with selection, consider xtheckman or the Wooldridge (1995) correlated random effects approach.
* When the outcome is also binary
heckprobit promoted education tenure, ///
    select(employed = education experience married children)
The control function is a two-step method related to both Heckman and IV. In the first step, you estimate the endogenous variable's relationship with instruments and save the residuals. In the second step, you include these residuals as a control in the outcome equation.
* Step 1: First stage (reduced form for endogenous X)
regress rn_pp instrument_z controls i.year
predict cf_resid, residuals

* Step 2: Include residuals in structural equation
regress qip rn_pp sdi cf_resid controls i.year, vce(cluster facility_id)

* Test endogeneity: t-test on cf_resid
*   If significant => endogeneity present, CF is needed
*   If not => OLS is consistent

* NOTE: Standard errors in step 2 need bootstrapping
* because cf_resid is a generated regressor
The control function approach produces incorrect standard errors in the second stage because the residual is a generated regressor. Bootstrap the entire two-step procedure to obtain valid inference.
* ─── Complete CF with Correct Bootstrap SEs ───
capture program drop cf_bootstrap
program define cf_bootstrap, eclass
    syntax [, *]
    * Step 1: First stage
    quietly regress rn_pp instrument_z sdi controls i.year
    quietly predict double _cf_resid, residuals
    * Step 2: Structural equation with CF residual
    regress qip rn_pp sdi _cf_resid controls i.year
    * Clean up
    drop _cf_resid
end

* Bootstrap with cluster resampling
bootstrap, reps(500) seed(42) cluster(facility_id): cf_bootstrap

* ─── CF for Nonlinear Models (Probit/Poisson) ───
capture program drop cf_probit_boot
program define cf_probit_boot, eclass
    * Step 1: Linear first stage for continuous endogenous variable
    quietly regress endogenous_x instrument_z controls
    quietly predict double _cf_v, residuals
    * Step 2: Probit with CF residual
    probit binary_y endogenous_x _cf_v controls
    drop _cf_v
end

bootstrap, reps(500) seed(42) cluster(unit_id): cf_probit_boot

* ─── CF for Panel Data ───
capture program drop cf_panel_boot
program define cf_panel_boot, eclass
    quietly reghdfe rn_pp instrument_z controls, absorb(facility_id year)
    quietly predict double _cf_r, residuals
    reghdfe qip rn_pp sdi _cf_r controls, absorb(facility_id year)
    drop _cf_r
end

bootstrap, reps(500) seed(42) cluster(facility_id): cf_panel_boot
Bootstrap resamples the data with replacement and re-estimates the model many times to obtain standard errors that do not rely on distributional assumptions. This is particularly useful for two-step estimators where analytical standard errors are complex.
* Bootstrap a simple regression
bootstrap, reps(1000) seed(12345) cluster(facility_id): ///
    regress qip rn_pp sdi controls

* Bootstrap a two-step procedure
capture program drop cf_boot
program define cf_boot, eclass
    * Step 1
    regress rn_pp instrument_z controls
    predict double cf_r, residuals
    * Step 2
    regress qip rn_pp sdi cf_r controls
    drop cf_r
end

bootstrap, reps(500) seed(42) cluster(facility_id): cf_boot
* ─── Types of Bootstrap Confidence Intervals ───
* Note: the bca option must be specified at estimation time
* for estat bootstrap, bca to be available afterward
bootstrap, reps(1000) seed(42) bca: regress qip rn_pp sdi controls

* Normal approximation (default)
estat bootstrap, normal

* Percentile method (distribution-free)
estat bootstrap, percentile

* Bias-corrected (BC) interval
estat bootstrap, bc

* Bias-corrected accelerated (BCa) interval (most reliable)
estat bootstrap, bca

* Compare all CI methods
estat bootstrap, all
Always set a seed for reproducibility (e.g., seed(12345)). If your data is clustered, use cluster() in the bootstrap call; this resamples entire clusters rather than individual observations. BCa confidence intervals are generally preferred over normal or percentile intervals because they correct for both bias and skewness. If bootstrap replications fail frequently (e.g., matrix not positive definite), your model may be fragile or overparameterized.
When the number of clusters is small (fewer than 30-50), conventional cluster-robust standard errors can be severely biased. The wild cluster bootstrap provides more reliable inference.
* Install: ssc install boottest, replace
regress outcome treatment controls, vce(cluster state)

* Wild cluster bootstrap p-value for treatment
boottest treatment, reps(9999) seed(42) cluster(state)

* Webb weights (recommended for very few clusters, G < 11)
boottest treatment, reps(9999) seed(42) cluster(state) weighttype(webb)
Permutation tests (also called randomization inference) provide exact p-values by comparing the observed test statistic to its distribution under random reassignment of treatment. They require no distributional assumptions and are particularly useful with small samples or when standard asymptotic inference is suspect.
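Conceptually, the reported p-value compares the observed statistic to its distribution across random reassignments; a common exact-level definition (some implementations report the plain proportion without the +1 correction) is:

```latex
% Two-sided permutation p-value from R random reassignments:
p = \frac{1 + \#\{\, r : |T^{(r)}| \ge |T^{\mathrm{obs}}| \,\}}{1 + R}
% T^{(r)}: the test statistic (e.g., the treatment coefficient)
% re-estimated under the r-th random reassignment of treatment.
% Adding 1 to numerator and denominator keeps p strictly positive
% and counts the observed assignment as one of the permutations.
```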
* ─── Permutation Test for a Treatment Effect ───
permute treatment _b[treatment], reps(5000) seed(42): ///
    regress outcome treatment controls

* This randomly reassigns the treatment variable 5000 times,
* re-estimates the model each time, and computes the proportion
* of permuted coefficients as extreme as the observed one.

* ─── Fisher Exact Test for 2x2 Tables ───
* Built-in exact test for small-sample binary outcomes
tabi 10 5 \ 3 12, exact

* ─── Randomization Inference for DID ───
* Randomly reassign treatment status and re-estimate DID
* Particularly useful when the number of treated units is small
* Use factor notation so the interaction is rebuilt from the
* permuted treatment in every replication (a pregenerated
* treat_post variable would NOT be updated by permute)
permute treatment _b[1.treatment#1.post], reps(5000) seed(42): ///
    regress outcome i.treatment##i.post controls, vce(cluster state)
Even with a careful identification strategy, reviewers may question whether unobserved confounders could explain your results. Sensitivity analysis quantifies how strong confounding would need to be to overturn your conclusions.
* ─── Oster (2019) Proportional Selection ───
* Install: ssc install psacalc, replace

* Step 1: Run baseline without controls
regress outcome treatment
local b_base = _b[treatment]
local r2_base = e(r2)

* Step 2: Run with observable controls
regress outcome treatment controls
local b_full = _b[treatment]
local r2_full = e(r2)

* Step 3: Compute delta (how much unobservables would need to
* matter relative to observables to drive beta to zero)
* rmax() expects a number, so compute it first
local rmax = min(1.3 * `r2_full', 1)
psacalc delta treatment, rmax(`rmax') mcontrol(controls)

* delta > 1 => unobservables would need to be more important
* than all observables combined to explain away the result
* This is considered strong evidence of a robust result
Using webuse abdata, estimate a system GMM model of employment (n) on lagged employment, wages (w), and capital (k) with xtabond2. Use the collapse option. Check the AR(2) test and Hansen J statistic. Are the moment conditions valid?
Using webuse womenwk, estimate a Heckman selection model of wage on education and age, with the selection equation including married, children, education, and age. Compare the two-step and MLE estimates. Is the inverse Mills ratio (lambda) significant?
Using webuse abdata, estimate both difference GMM (noleveleq) and system GMM for employment on lagged employment, wages, and capital. Compare the AR(2) and Hansen J p-values. Vary the lag depth from lag(2 3) to lag(2 5) to lag(2 .) (all available lags). How do the results change? What happens to the Hansen J p-value as you add more instruments? Report the number of instruments for each specification and explain which you would present to a reviewer.
Write a bootstrap program for a two-step control function procedure: (1) regress wage on education, age, and an instrument of your choice, saving the residual; (2) include the residual in a Heckman selection model or a simple outcome regression. Run 500 bootstrap replications with cluster resampling. Compare the bootstrap standard errors to the naive (non-bootstrapped) standard errors from the second stage. How much do they differ?
Key takeaways:
- Use xtabond2 for dynamic panels; always check AR(2) and Hansen J, and avoid instrument proliferation.
- Keep the instrument count below the number of groups with the collapse option.
- For SFA efficiency determinants, prefer single-stage uhet() over two-stage approaches.
- The wild cluster bootstrap (boottest) provides reliable inference with few clusters.