2SLS, Weak Instruments, Control Functions, Lewbel HBIV, and Overidentification Tests
Endogeneity arises when a regressor is correlated with the error term, which can occur through omitted variables, simultaneity, or measurement error. OLS estimates are biased and inconsistent in this case. Instrumental variables (IV) estimation provides a solution when you can find a variable (the instrument) that is correlated with the endogenous regressor but uncorrelated with the error.
Formally, suppose the structural equation is Y = Xβ + Zγ + u, where X is endogenous (Cov(X, u) ≠ 0) and Z contains exogenous controls. The OLS estimator of β is inconsistent because E[X'u] ≠ 0. IV recovers a consistent estimate of β by using the instrument to isolate the part of the variation in X that is uncorrelated with u.
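In matrix form, the 2SLS estimator makes "isolating variation" explicit. Write W for the excluded instruments (a symbol introduced here only for this sketch), stack the regressors as X̃ = [X Z] and the full instrument set as W̃ = [W Z], and let θ = (β', γ')'. Then:

$$
\hat{\theta}_{2SLS} = \left(\tilde{X}' P_{\tilde{W}} \tilde{X}\right)^{-1} \tilde{X}' P_{\tilde{W}} Y,
\qquad
P_{\tilde{W}} = \tilde{W}\left(\tilde{W}'\tilde{W}\right)^{-1}\tilde{W}'
$$

Because the projection matrix is symmetric and idempotent, this is the same as regressing Y on the first-stage fitted values X̂ = P_W̃ X̃, which is where the name "two-stage" comes from. Consistency requires E[W̃'u] = 0 (exogeneity) and that E[W̃'X̃] has full column rank (relevance), which needs at least as many excluded instruments as endogenous regressors.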
Stata's ivregress 2sls command estimates 2SLS models. Exogenous regressors are listed outside the parentheses; the endogenous regressor and its excluded instruments go inside, as (endogenous = instruments).
```stata
webuse hsng2, clear

* OLS baseline (likely biased)
regress rent hsngval pcturban

* 2SLS: hsngval is endogenous, instrumented by faminc and region
ivregress 2sls rent pcturban (hsngval = faminc i.region), vce(robust)

* Same model; the first option also displays the first-stage regression
ivregress 2sls rent pcturban (hsngval = faminc i.region), first vce(robust)
```
If you run the two stages by hand instead of using ivregress 2sls, the standard errors from the manual second-stage regression are wrong because they treat the fitted values as data rather than estimates: the residual variance is computed from residuals based on fitted X rather than actual X, so the SEs are incorrect. Always use ivregress, ivreg2, or ivreghdfe to get valid standard errors in a single estimation step.
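To see the problem concretely, the sketch below runs both versions on the hsng2 data: the manual two-step regression reproduces the 2SLS coefficient but not its standard error. The variable hsngval_hat and the scalar names are introduced here for illustration only.

```stata
* Sketch: manual two-step 2SLS vs. ivregress on hsng2
webuse hsng2, clear

* One-step 2SLS (conventional VCE, so SEs are directly comparable)
ivregress 2sls rent pcturban (hsngval = faminc i.region)
scalar b_iv  = _b[hsngval]
scalar se_iv = _se[hsngval]

* Manual stage 1: endogenous regressor on ALL exogenous variables
regress hsngval faminc i.region pcturban
predict double hsngval_hat, xb

* Manual stage 2: outcome on fitted values (hsngval_hat is a generated regressor)
regress rent hsngval_hat pcturban
scalar b_manual  = _b[hsngval_hat]
scalar se_manual = _se[hsngval_hat]

display "ivregress: b = " b_iv     "  se = " se_iv
display "manual:    b = " b_manual "  se = " se_manual
```

The coefficients agree to numerical precision. The standard errors differ because regress builds its residuals with hsngval_hat in place of hsngval (and, by default, ivregress also uses an N rather than N - k divisor).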
A weak instrument, one only weakly correlated with the endogenous regressor, produces IV estimates that are biased toward OLS and have unreliable inference. The first-stage F-statistic on the excluded instruments is the primary diagnostic.
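```stata
* After ivregress, test first-stage strength
estat firststage

* Rule of thumb: F-statistic > 10 (Staiger-Stock)
* For a formal test, compare the Cragg-Donald minimum eigenvalue statistic
* with the Stock-Yogo critical values that estat firststage reports
* (both assume i.i.d. errors; after vce(robust), add the forcenonrobust
* option to display them). ivreg2 also reports Stock-Yogo critical values.
```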
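The user-written ivreg2 package reports these critical values automatically, alongside heteroskedasticity- and cluster-robust Kleibergen-Paap statistics.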
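For a single endogenous regressor, the first-stage F is simply the joint F-test that the excluded instruments have zero coefficients in the first-stage regression. A minimal sketch computing it by hand with testparm, so it can be checked against estat firststage (the scalar names are illustrative):

```stata
* Sketch: first-stage F on the excluded instruments, computed by hand
webuse hsng2, clear

* First stage: endogenous regressor on excluded instruments and exogenous regressor
regress hsngval faminc i.region pcturban

* Joint F-test that faminc and the region dummies are all zero
testparm faminc i.region
scalar F_first  = r(F)
scalar df_first = r(df)
display "First-stage F(" df_first ", " r(df_r) ") = " F_first

* Built-in version for comparison (conventional VCE)
quietly ivregress 2sls rent pcturban (hsngval = faminc i.region)
estat firststage
```

With four excluded instruments (faminc plus three region dummies), the numerator degrees of freedom are 4; under the conventional VCE this F should match the one estat firststage prints.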
The Staiger-Stock F > 10 rule applies only to the case of a single endogenous regressor and i.i.d. errors. With heteroskedastic or clustered errors, or with several endogenous regressors, use the diagnostics below.
```stata
* ─── Stock-Yogo Critical Values ───
* ivreg2 reports these automatically after estimation
ivreg2 rent pcturban (hsngval = faminc i.region), robust first
* The output table shows critical values for:
*   - 2SLS relative bias: 5%, 10%, 20%, 30% of OLS bias
*   - 2SLS size of a nominal 5% Wald test: 10%, 15%, 20%, 25% actual size
* Compare your Kleibergen-Paap rk Wald F to these thresholds (common
* practice, although Stock-Yogo values were derived for i.i.d. errors)

* ─── Kleibergen-Paap Statistics ───
* With heteroskedastic or clustered errors, use the KP statistics
* instead of Cragg-Donald (which assumes i.i.d. errors)
*   - KP rk LM stat: tests underidentification (H0: rank deficient)
*   - KP rk Wald F stat: tests weak identification

* ─── Anderson-Rubin Test for Weak-Instrument-Robust Inference ───
* When instruments are weak, use the Anderson-Rubin (AR) test:
* it has correct size regardless of instrument strength.
* The ivreg2 ..., first output above already includes the AR test
* of H0: coefficient on hsngval = 0. For AR confidence sets, run
* weakiv as a postestimation command after ivreg2.
* Install: ssc install weakiv, replace
ivreg2 rent pcturban (hsngval = faminc i.region), robust
weakiv

* ─── Montiel Olea-Pflueger Effective F ───
* weakivtest reports the effective F, a heteroskedasticity-robust
* weak-instrument test (it is not an AR test)
* Install: ssc install weakivtest, replace
ivregress 2sls rent pcturban (hsngval = faminc i.region), vce(robust)
weakivtest
```
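When you have more instruments than endogenous variables (overidentification), you can test whether the overidentifying restrictions hold, that is, whether the instruments are jointly consistent with the exclusion restriction. The Sargan/Hansen J-test is the standard approach. The test maintains that enough instruments are valid to identify the model, so it has little power when all instruments are invalid in the same way.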
```stata
* After ivregress
estat overid
* H0: all instruments are valid (exogenous)
* Reject => at least one instrument may be invalid

* IMPORTANT: the Sargan and Basmann tests assume homoskedasticity.
* After ivregress with vce(robust), estat overid reports Wooldridge's
* robust score test instead. With ivreg2, use the Hansen J:
ivreg2 rent pcturban (hsngval = faminc i.region), robust
* The Hansen J statistic is reported at the bottom of the ivreg2 output
```
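The Sargan statistic has a simple regression form: N times the R² from regressing the 2SLS residuals on all exogenous variables, compared with a χ² distribution whose degrees of freedom equal the number of overidentifying restrictions (here 4 excluded instruments minus 1 endogenous regressor = 3). The sketch below computes it by hand, assuming estat overid stores the statistic in r(sargan) when the conventional VCE is used; uhat and the scalar names are illustrative.

```stata
* Sketch: Sargan overidentification statistic computed by hand
webuse hsng2, clear

* 2SLS with the conventional VCE so that estat overid reports Sargan
ivregress 2sls rent pcturban (hsngval = faminc i.region)
predict double uhat, residuals
estat overid
scalar sargan_stata = r(sargan)   // assumed stored-result name

* Regress the 2SLS residuals on all exogenous variables (instruments + controls)
regress uhat faminc i.region pcturban
scalar sargan_manual = e(N) * e(r2)

* Degrees of freedom: 4 excluded instruments - 1 endogenous regressor
scalar df_overid = 3
display "Sargan = " sargan_manual "  p-value = " chi2tail(df_overid, sargan_manual)
```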
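The user-written ivreg2 package provides more comprehensive IV diagnostics than the built-in ivregress, including Kleibergen-Paap statistics that remain valid with heteroskedastic or clustered errors, an underidentification test, and Stock-Yogo critical values reported alongside the robust statistics.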
```stata
* Install:
*   ssc install ivreg2, replace
*   ssc install ranktest, replace

* Estimate with ivreg2
ivreg2 rent pcturban (hsngval = faminc i.region), robust first

* Key diagnostics reported automatically:
*   - Kleibergen-Paap rk Wald F (robust first-stage F)
*   - Kleibergen-Paap rk LM (underidentification test)
*   - Hansen J statistic (overidentification test with robust SEs)
```
The choice between ivreg2 and ivregress depends on your estimation context. Here is a practical comparison:
| Feature | ivregress (built-in) | ivreg2 (user-written) |
|---|---|---|
| Estimation methods | 2SLS, LIML, GMM | 2SLS, LIML, GMM, CUE, Fuller, k-class |
| Heteroskedasticity-robust weak-ID statistic | Robust first-stage F; Cragg-Donald minimum eigenvalue is i.i.d. only | Kleibergen-Paap rk Wald F (robust/cluster) |
| Stock-Yogo critical values | Via estat firststage (i.i.d. only) | Reported automatically |
| Overidentification test | Sargan/Basmann (i.i.d.), or robust score test after vce(robust), via estat overid | Hansen J (robust to heteroskedasticity) |
| Underidentification test | Not available | Kleibergen-Paap rk LM |
| Cluster-robust SEs | Yes | Yes (including two-way clustering) |
| LIML/Fuller k-class | LIML only | LIML, Fuller(α), user-specified k via kclass() |
Use ivreg2 as your default for published research. The Kleibergen-Paap F-statistic is valid under heteroskedasticity and clustering, while the Cragg-Donald statistic from ivregress assumes i.i.d. errors. When your first-stage F is borderline (between 10 and 20), consider LIML or Fuller estimation, which are less biased than 2SLS with weak instruments: ivreg2 y (x = z1 z2), liml robust. In a just-identified model (one instrument per endogenous regressor), LIML and 2SLS coincide, so this choice matters only with overidentification.
Before running IV, you may want to test whether the suspected endogenous variable is actually endogenous. If it is not, OLS is preferred because it is more efficient.
```stata
* After ivregress 2sls
estat endogenous
* H0: variable is exogenous. Reject => endogeneity present, IV justified

* Manual Durbin-Wu-Hausman (regression-based) approach:
* 1. Run the first stage and save the residuals
regress hsngval faminc i.region pcturban
predict v_hat, residuals
* 2. Include the residuals in the structural equation
*    (add vce(robust) for a heteroskedasticity-robust version of the test)
regress rent hsngval pcturban v_hat
* If v_hat is significant => evidence of endogeneity
```
For panel data IV estimation, use xtivreg (built-in) with the fe option, or ivreghdfe (user-written) to absorb one or more sets of fixed effects.
```stata
* Panel IV with fixed effects (the data must be xtset first)
xtset facility_id year
xtivreg qip sdi i.year (rn_pp = instrument_z), fe vce(cluster facility_id)

* Or with reghdfe-style syntax
* Install: ssc install ivreghdfe (also needs reghdfe, ftools, ivreg2, ranktest)
ivreghdfe qip (rn_pp = instrument_z) sdi, ///
    absorb(facility_id year) cluster(facility_id)
```
When external instruments are unavailable, Lewbel (2012) instruments exploit heteroskedasticity in the first-stage residuals to generate internal instruments. This approach requires that the error terms exhibit heteroskedasticity related to some exogenous variables.
The key idea: let e_hat be the residuals from regressing the endogenous regressor on the exogenous regressors Z (no external instruments). The generated instruments (Z - E[Z]) * e_hat, with E[Z] estimated by the sample mean, are valid if Cov(Z, u * e) = 0 and relevant if Cov(Z, e^2) ≠ 0, that is, if the first-stage error is heteroskedastic in Z. Heteroskedasticity therefore supplies instrument strength, while the covariance restriction supplies validity.
```stata
* Install: ssc install ivreg2h, replace

* Lewbel HBIV estimation (no external instruments: empty right-hand side)
ivreg2h qip (rn_pp = ) sdi controls i.year, robust

* Combine Lewbel instruments with external instruments
ivreg2h qip (rn_pp = external_iv) sdi controls i.year, robust

* ─── Verifying the Heteroskedasticity Assumption ───
* Step 1: Run the first-stage regression
regress rn_pp sdi controls i.year
predict resid_fs, residuals
* Step 2: Test for heteroskedasticity using Breusch-Pagan
estat hettest sdi controls
* Reject H0 (homoskedasticity) => evidence that Lewbel instruments are relevant
* Step 3: Verify instrument strength in the augmented first stage
* ivreg2h reports the first-stage F and Hansen J automatically
```
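The generated instrument is easy to build by hand, which makes the identifying assumptions concrete. The sketch below applies the recipe to hsng2 with pcturban as the only exogenous regressor Z; e2 and lewbel_z are hypothetical names, and nothing here claims that hsng2 actually satisfies Lewbel's assumptions. ivreg2h automates these steps.

```stata
* Sketch: constructing one Lewbel (2012) instrument manually
webuse hsng2, clear

* Step 1: endogenous regressor on the exogenous regressor(s) only
regress hsngval pcturban
predict double e2, residuals

* Step 2: relevance needs Cov(Z, e2^2) != 0, so check heteroskedasticity in Z
estat hettest pcturban

* Step 3: generated instrument (Z - mean(Z)) * e2
summarize pcturban, meanonly
generate double lewbel_z = (pcturban - r(mean)) * e2

* Step 4: use it as an internal instrument for hsngval
ivregress 2sls rent pcturban (hsngval = lewbel_z), vce(robust)
estat firststage
```

With one generated instrument for one endogenous regressor the model is just-identified; ivreg2h builds one instrument per exogenous variable, which typically yields an overidentified model that can also be tested with the Hansen J.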
The control function (CF) is a two-step method closely related to 2SLS, with important advantages when the endogenous variable enters a nonlinear model (e.g., logit, probit, or Poisson). The CF approach includes first-stage residuals directly in the outcome equation. In the linear model, CF and 2SLS produce numerically identical point estimates when the first stage includes all exogenous variables, but CF extends to nonlinear models, where plugging first-stage fitted values into the nonlinear model does not give consistent estimates.
```stata
* ─── Control Function: Linear Case ───
* Step 1: First stage
regress rn_pp instrument_z sdi controls i.year
predict cf_resid, residuals

* Step 2: Include residuals in the structural equation
regress qip rn_pp sdi cf_resid controls i.year, vce(cluster facility_id)
* The t-test on cf_resid is a test of endogeneity
*   If significant     => endogeneity present, keep cf_resid
*   If not significant => no evidence of endogeneity; OLS may be used
* Note: the SE on rn_pp here ignores first-stage estimation error;
* use ivregress or bootstrap both steps for inference on rn_pp

* ─── Control Function: Nonlinear Case (Probit) ───
* Step 1: First stage (still linear for a continuous endogenous variable)
regress endogenous_x instrument_z controls
predict cf_v, residuals

* Step 2: Probit with residual control
probit binary_y endogenous_x cf_v controls

* IMPORTANT: SEs must be bootstrapped because cf_v is generated
capture program drop cf_probit
program define cf_probit, eclass
    capture drop _cf_v
    regress endogenous_x instrument_z controls
    predict double _cf_v, residuals
    probit binary_y endogenous_x _cf_v controls
    drop _cf_v
end
bootstrap, reps(500) seed(12345) cluster(unit_id): cf_probit
```
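The linear-case equivalence with 2SLS can be checked directly on hsng2. This sketch assumes the first stage uses every exogenous variable (excluded instruments plus pcturban); cf_v and the scalar names are illustrative.

```stata
* Sketch: linear control function reproduces the 2SLS coefficient
webuse hsng2, clear

* 2SLS benchmark
ivregress 2sls rent pcturban (hsngval = faminc i.region)
scalar b_2sls = _b[hsngval]

* CF step 1: first stage on ALL exogenous variables, keep residuals
regress hsngval faminc i.region pcturban
predict double cf_v, residuals

* CF step 2: add the residual as a control in the structural equation
regress rent hsngval pcturban cf_v
scalar b_cf = _b[hsngval]

display "2SLS: " b_2sls "   Control function: " b_cf
```

Only the point estimate carries over: the standard error that regress reports for hsngval in step 2 is not the 2SLS standard error.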
Limited Information Maximum Likelihood (LIML) and Fuller estimators are alternatives to 2SLS that perform better with weak instruments. LIML is approximately median-unbiased even when instruments are moderately weak, while 2SLS can have substantial bias.
```stata
* LIML estimation
ivregress liml rent pcturban (hsngval = faminc i.region), vce(robust)

* LIML with ivreg2
ivreg2 rent pcturban (hsngval = faminc i.region), liml robust

* Fuller's modified LIML: fuller(1) is approximately unbiased,
* fuller(4) approximately minimizes mean squared error
ivreg2 rent pcturban (hsngval = faminc i.region), fuller(1) robust

* Compare 2SLS and LIML estimates
* If they differ substantially, instruments may be weak
quietly ivreg2 rent pcturban (hsngval = faminc i.region), robust
estimates store iv_2sls
quietly ivreg2 rent pcturban (hsngval = faminc i.region), liml robust
estimates store iv_liml
estimates table iv_2sls iv_liml, se stats(N r2)
```
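One caveat for comparisons like this: LIML and 2SLS can diverge only in overidentified models. With a single excluded instrument, the LIML k-class parameter equals 1 and the two estimators are numerically identical, as this sketch on hsng2 with faminc as the only instrument shows (estimate and scalar names are illustrative).

```stata
* Sketch: in a just-identified model, LIML and 2SLS coincide
webuse hsng2, clear

* One excluded instrument (faminc) for one endogenous regressor (hsngval)
quietly ivregress 2sls rent pcturban (hsngval = faminc)
scalar b_2sls_ji = _b[hsngval]
estimates store ji_2sls

quietly ivregress liml rent pcturban (hsngval = faminc)
scalar b_liml_ji = _b[hsngval]
estimates store ji_liml

* The coefficients should agree to numerical precision
estimates table ji_2sls ji_liml, b se
```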
Top journals in economics, management, and operations research have converging expectations for how IV results should be reported. Below is a checklist for a complete IV presentation.
Using webuse hsng2, estimate the effect of hsngval on rent using 2SLS with faminc as the instrument. Report the first-stage F-statistic. Is faminc a strong instrument by the Staiger-Stock criterion?
Re-estimate the model from Exercise 6.1 using ivreg2 with the robust and first options. Compare the Kleibergen-Paap F-statistic to the conventional F-statistic. Then test whether IV is necessary, either with estat endogenous after ivregress (it is not available after ivreg2) or with ivreg2's endog(hsngval) option.
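Using webuse hsng2, estimate the overidentified model with faminc and i.region as instruments using both 2SLS and LIML (with faminc alone, as in Exercise 6.1, the model is just-identified and the two estimators coincide). Use estimates store and estimates table to compare the point estimates side by side. If the LIML and 2SLS estimates diverge, what does that tell you about instrument strength?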
Implement a control function approach manually: (1) regress hsngval on faminc, i.region, and pcturban, and save the residuals; (2) include those residuals in a regression of rent on hsngval and pcturban. Is the residual significant (endogeneity test)? Compare the coefficient on hsngval to your 2SLS estimate. Then write a bootstrap program to obtain correct standard errors for the two-step procedure.
Prefer ivreg2 over ivregress for richer diagnostics with heteroskedastic or clustered data.