Chapter 6: Instrumental Variables & Endogeneity

2SLS, Weak Instruments, Control Functions, Lewbel HBIV, and Overidentification Tests

6.1 The Endogeneity Problem

Endogeneity arises when a regressor is correlated with the error term, which can occur through omitted variables, simultaneity, or measurement error. OLS estimates are biased and inconsistent in this case. Instrumental variables (IV) estimation provides a solution when you can find a variable (the instrument) that is correlated with the endogenous regressor but uncorrelated with the error.

Formally, suppose the structural equation is Y = Xβ + Wγ + u, where X is endogenous (Cov(X, u) ≠ 0) and W contains exogenous controls. The OLS estimator of β is inconsistent because E[X'u] ≠ 0. IV recovers a consistent estimate by using only the variation in X that is induced by the instrument and is therefore uncorrelated with u.

Two IV Conditions: An instrument Z must satisfy (1) Relevance: Cov(Z, X) ≠ 0, and (2) Exclusion: Cov(Z, u) = 0. Condition (1) is testable from the first-stage regression; condition (2) is not directly testable and rests on economic reasoning.

Three Sources of Endogeneity: Reviewers expect you to state clearly which source motivates your IV strategy. (1) Omitted variable bias: an unobserved confounder affects both X and Y. (2) Simultaneity: X causes Y, but Y also causes X. (3) Measurement error: X is measured with classical (random) error, which attenuates the OLS coefficient toward zero. The instrument you choose should address the specific source you identify.
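
As a worked illustration of why both conditions matter, consider the simplest case: one regressor X, one instrument Z, and no controls. The simplification is an assumption made here for exposition only.

```latex
% Simplified model (assumption for exposition): scalar X and Z, no controls
\[
Y = X\beta + u
\]
% OLS converges to beta plus a bias term driven by Cov(X, u)
\[
\operatorname{plim}\,\hat\beta_{OLS}
  = \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)}
  = \beta + \frac{\operatorname{Cov}(X, u)}{\operatorname{Var}(X)}
\]
% IV replaces X with Z in the covariances
\[
\operatorname{plim}\,\hat\beta_{IV}
  = \frac{\operatorname{Cov}(Z, Y)}{\operatorname{Cov}(Z, X)}
  = \beta + \frac{\operatorname{Cov}(Z, u)}{\operatorname{Cov}(Z, X)}
\]
```

Exclusion sets the numerator of the last term to zero, and relevance keeps the denominator away from zero. When Cov(Z, X) is small, even a slight violation of exclusion is magnified, which is one reason weak instruments are dangerous.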

6.2 Two-Stage Least Squares (2SLS)

Stata's ivregress 2sls command estimates 2SLS models. The syntax separates exogenous variables from the endogenous regressor and its instruments.

webuse hsng2, clear

* OLS baseline (likely biased)
regress rent hsngval pcturban

* 2SLS: hsngval is endogenous, instrumented by faminc and region
ivregress 2sls rent pcturban (hsngval = faminc i.region), vce(robust)

* Same model; the first option also displays the first-stage regression
ivregress 2sls rent pcturban (hsngval = faminc i.region), first vce(robust)
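
The Forbidden Regression: Never run 2SLS "by hand" by regressing Y on the fitted values from a separately estimated first stage. If the first stage is linear and includes every exogenous control from the second stage, the point estimates match ivregress 2sls, but the standard errors from the manual second stage are wrong: the residual variance is computed from fitted X rather than actual X, and the regression treats the fitted values as data rather than estimates. (Strictly, the term "forbidden regression" refers to plugging fitted values from a nonlinear first stage, or nonlinear functions of fitted values, into the second stage, in which case the point estimates are inconsistent as well.) Always use ivregress, ivreg2, or ivreghdfe so that the estimates and their standard errors come from a single estimation step.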
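
The discrepancy in standard errors can be seen directly with the housing data used above. This is a minimal sketch; hsngval_hat is a variable name introduced here for illustration.

```stata
webuse hsng2, clear

* Manual first stage: endogenous regressor on instruments and exogenous controls
regress hsngval faminc i.region pcturban
predict double hsngval_hat, xb

* Manual second stage: the coefficient on hsngval_hat matches 2SLS,
* but the reported standard errors are not valid
regress rent hsngval_hat pcturban

* One-step 2SLS for comparison: same coefficients, correct standard errors
ivregress 2sls rent pcturban (hsngval = faminc i.region)
```

Comparing the two outputs, the slope estimates should agree while the standard errors differ.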

6.3 First-Stage Diagnostics

Weak instruments bias 2SLS toward the OLS estimate in finite samples and make conventional standard errors and tests unreliable. The first-stage F-statistic on the excluded instruments is the primary diagnostic.

* After ivregress, test first-stage strength
estat firststage

* Rule of thumb: F-statistic > 10 (Staiger-Stock)
* For rigorous testing, use Stock-Yogo critical values
* reported by ivreg2

The Stock-Yogo Threshold: An F-statistic above 10 is a rough heuristic. The more rigorous approach uses Stock and Yogo (2005) critical values, which depend on the number of endogenous regressors, the number of instruments, and the bias (relative to OLS) or test-size distortion you are willing to tolerate. The ivreg2 package reports these automatically.
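
To make clear what the first-stage F measures, it can be computed by hand as a joint test on the excluded instruments only. The sketch below assumes the same hsng2 specification used above.

```stata
webuse hsng2, clear

* First stage: endogenous regressor on excluded instruments plus exogenous controls
regress hsngval faminc i.region pcturban, vce(robust)

* Joint test of the excluded instruments only
* (pcturban is a control, not an instrument, so it is left out of the test)
testparm faminc i.region
```

The F-statistic from testparm should be close to the robust F reported by estat firststage after the corresponding ivregress command; any small difference would come from degrees-of-freedom conventions.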

6.4 Weak Instrument Testing: Beyond F > 10

The Staiger-Stock F > 10 rule was developed for a single endogenous regressor with i.i.d. errors. With heteroskedastic or clustered errors, or with several endogenous regressors, applied work needs more careful diagnostics.

* ─── Stock-Yogo Critical Values ───
* ivreg2 reports these automatically after estimation
ivreg2 rent pcturban (hsngval = faminc i.region), robust first

* The output table shows critical values for:
*   - 2SLS relative bias: 5%, 10%, 20%, 30% of OLS bias
*   - 2SLS size of 5% Wald test: 10%, 15%, 20%, 25% actual size
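* Compare your first-stage F to these thresholds. They were derived for the
* Cragg-Donald statistic under i.i.d. errors, so with robust or clustered SEs
* the comparison with the Kleibergen-Paap rk Wald F is only a rough guide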

* ─── Kleibergen-Paap Statistics ───
* With heteroskedastic or clustered errors, use the KP statistic
* instead of the Cragg-Donald (which assumes i.i.d. errors)
*   - KP rk LM stat: tests underidentification (H0: rank deficient)
*   - KP rk Wald F stat: tests weak identification

* ─── Weak-Instrument-Robust Diagnostics and Inference ───
* Montiel Olea-Pflueger effective F: a first-stage strength test that is
* robust to heteroskedasticity and clustering (one endogenous regressor)
* Install: ssc install weakivtest, replace
*          ssc install avar, replace
ivregress 2sls rent pcturban (hsngval = faminc i.region), vce(robust)
weakivtest

* Anderson-Rubin (AR) test and confidence set: valid inference on the
* coefficient of hsngval regardless of instrument strength
* Install: ssc install weakiv, replace
ivreg2 rent pcturban (hsngval = faminc i.region), robust
weakiv

Anderson-Rubin Test: The Anderson-Rubin (AR) test is valid regardless of instrument strength. To test H0: β = β0, it regresses Y − Xβ0 on the instruments and exogenous controls and tests whether the instruments are jointly significant; under H0 and instrument exogeneity they should not be. The AR confidence set collects every value β0 that is not rejected. With weak instruments this set can be very wide or even unbounded, so if it is much wider than the conventional Wald interval, your instruments may be weak despite passing the F > 10 screen. Reviewers at top journals increasingly expect weak-instrument-robust inference when the first-stage F is below 20.
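
The logic of the AR test can be sketched by hand for a single hypothesized value. The value 0 for β0 and the variable name y_ar are illustrative choices made here, not part of any package.

```stata
webuse hsng2, clear

* Hypothesized coefficient on hsngval under H0 (illustrative value)
local b0 = 0

* Remove the hypothesized effect of the endogenous regressor from the outcome
generate double y_ar = rent - `b0' * hsngval

* Regress the adjusted outcome on instruments and exogenous controls
regress y_ar faminc i.region pcturban, vce(robust)

* AR-type test: joint significance of the excluded instruments
* Rejection means b0 lies outside the AR confidence set
testparm faminc i.region
```

Repeating this over a grid of values for b0 and keeping those that are not rejected traces out the AR confidence set, which is what weakiv automates.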

6.5 Overidentification Tests

When you have more instruments than endogenous variables (overidentification), you can test the overidentifying restrictions, that is, whether the instruments are jointly consistent with exogeneity. The Sargan/Hansen J-test is the standard approach.

* After ivregress
estat overid
* H0: all instruments are valid (exogenous)
* Reject => at least one instrument may be invalid

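* IMPORTANT: with the default VCE, estat overid reports the Sargan and
* Basmann tests, which assume homoskedasticity. After vce(robust) it
* reports Wooldridge's robust score test instead.
* For the Hansen J statistic, use ivreg2 with robust: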
ivreg2 rent pcturban (hsngval = faminc i.region), robust
* Hansen J statistic is reported at the bottom of ivreg2 output

Limitations of Overidentification Tests: The J-test only detects whether different instruments imply different estimates. If all instruments are invalid in the same way (e.g., all correlated with the error through the same omitted variable), the test has little or no power. Failing to reject is therefore necessary but not sufficient for instrument validity, and the test is unavailable when the model is exactly identified. Your exclusion restriction argument must rest on economic reasoning and institutional knowledge.
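
The mechanics of the homoskedastic (Sargan) version of the test can be reproduced by hand, which helps show what the statistic measures. This is a sketch under the i.i.d. assumption; u_hat and sargan are names introduced here for illustration.

```stata
webuse hsng2, clear

* 2SLS with the default (non-robust) VCE so that estat overid reports Sargan
ivregress 2sls rent pcturban (hsngval = faminc i.region)
estat overid

* Manual version: regress the 2SLS residuals on all exogenous variables
predict double u_hat, residuals
regress u_hat faminc i.region pcturban

* Sargan statistic = N * R-squared, chi-squared with
* (excluded instruments - endogenous regressors) degrees of freedom:
* here 4 instruments (faminc plus 3 region dummies) - 1 = 3
scalar sargan = e(N) * e(r2)
display "Sargan statistic = " %6.3f sargan
display "p-value          = " %6.4f chi2tail(3, sargan)
```

If the instruments explain the 2SLS residuals, the R-squared of the auxiliary regression is large and the test rejects.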

6.6 The ivreg2 Package

The user-written ivreg2 package provides more comprehensive IV diagnostics than the built-in ivregress, including Kleibergen-Paap statistics for heteroskedastic data and Stock-Yogo critical values.

* Install: ssc install ivreg2, replace
*          ssc install ranktest, replace

* Estimate with ivreg2
ivreg2 rent pcturban (hsngval = faminc i.region), robust first

* Key diagnostics reported automatically:
* - Kleibergen-Paap rk Wald F (robust first-stage F)
* - Kleibergen-Paap rk LM (underidentification test)
* - Hansen J statistic (overidentification test with robust SEs)

6.6.1 ivreg2 vs. ivregress: Which to Use?

The choice between ivreg2 and ivregress depends on your estimation context. Here is a practical comparison:

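| Feature | ivregress (built-in) | ivreg2 (user-written) |
|---|---|---|
| Estimation methods | 2SLS, LIML, GMM | 2SLS, LIML, GMM, CUE, Fuller |
| First-stage strength statistic | First-stage F and minimum eigenvalue (Cragg-Donald) statistic via estat firststage | Cragg-Donald F, plus Kleibergen-Paap rk Wald F with robust/cluster |
| Stock-Yogo critical values | Via estat firststage (for the i.i.d. minimum eigenvalue statistic) | Reported automatically |
| Overidentification test | estat overid: Sargan/Basmann (default VCE) or Wooldridge score test (robust VCE) | Sargan (default VCE) or Hansen J (robust/cluster) |
| Underidentification test | Not available | Anderson LM (default VCE) or Kleibergen-Paap rk LM (robust/cluster) |
| Cluster-robust SEs | Yes | Yes (including two-way clustering) |
| LIML/Fuller k-class | LIML only | LIML, Fuller via fuller(#), general k-class via kclass(#) |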

Practical Recommendation: Use ivreg2 as your default for published research. The Kleibergen-Paap F-statistic is robust to heteroskedasticity and clustering, while the Cragg-Donald (minimum eigenvalue) statistic reported by ivregress assumes i.i.d. errors. When your first-stage F is borderline (between 10 and 20), consider LIML or Fuller estimation, which are less biased than 2SLS with weak instruments, for example: ivreg2 y (x = z1 z2), liml robust.

6.7 Endogeneity Test (Durbin-Wu-Hausman)

Before running IV, you may want to test whether the suspected endogenous variable is actually endogenous. If it is not, OLS is preferred because it is more efficient.

* After ivregress 2sls
estat endogenous
* H0: variable is exogenous. Reject => endogeneity present, IV justified

* Manual Durbin-Wu-Hausman approach:
* 1. Run first stage, save residuals
regress hsngval faminc i.region pcturban
predict v_hat, residuals

* 2. Include residuals in structural equation
regress rent hsngval pcturban v_hat
* A significant coefficient on v_hat is evidence of endogeneity

6.8 IV with Panel Data

For panel data IV estimation, use xtivreg (built-in) or ivreghdfe (user-written), both of which combine IV with fixed effects.

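* Panel IV with fixed effects
* xtivreg needs the panel declared first (assuming year is the time variable)
xtset facility_id year
xtivreg qip sdi i.year (rn_pp = instrument_z), fe vce(cluster facility_id)

* Or with reghdfe-style syntax
* Install: ssc install ivreghdfe (also requires reghdfe, ftools, and ivreg2)
ivreghdfe qip sdi (rn_pp = instrument_z), ///
    absorb(facility_id year) cluster(facility_id)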

6.9 Heteroskedasticity-Based IV (Lewbel)

When external instruments are unavailable, Lewbel (2012) instruments exploit heteroskedasticity in the first-stage residuals to generate internal instruments. This approach requires that the error terms exhibit heteroskedasticity related to some exogenous variables.

The key idea: if the first-stage errors are heteroskedastic, the generated instruments (Z - E[Z]) * e_hat (where e_hat are first-stage residuals and Z is a set of exogenous regressors) satisfy the moment conditions required for identification. The identifying assumptions are Cov(Z, u * e) = 0 and Cov(Z, e^2) ≠ 0, where u is the structural error and e is the first-stage error. In practice E[Z] is replaced by the sample mean of Z.
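
The construction of a generated instrument can be sketched by hand for a single exogenous variable. The example reuses the qip, rn_pp, and sdi variables from the panel example; e_hat and lewbel_iv are names introduced here for illustration, and using sdi alone as Z is a simplifying assumption.

```stata
* First stage on the exogenous regressors only (no external instrument)
regress rn_pp sdi i.year
predict double e_hat, residuals

* Generated instrument: demeaned exogenous variable times first-stage residual
summarize sdi, meanonly
generate double lewbel_iv = (sdi - r(mean)) * e_hat

* 2SLS using the generated instrument
* (exactly identified here, so no overidentification test is available)
ivregress 2sls qip sdi i.year (rn_pp = lewbel_iv), vce(robust)
estat firststage
```

In practice ivreg2h automates this step and can build one generated instrument per exogenous regressor, which also makes an overidentification test possible.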

* Install: ssc install ivreg2h, replace

* Lewbel HBIV estimation
ivreg2h qip (rn_pp = ) sdi controls i.year, robust

* Combine Lewbel instruments with external instruments
ivreg2h qip (rn_pp = external_iv) sdi controls i.year, robust

* ─── Verifying the Heteroskedasticity Assumption ───
* Step 1: Run the first-stage regression
regress rn_pp sdi controls i.year
predict resid_fs, residuals

* Step 2: Test for heteroskedasticity using Breusch-Pagan
estat hettest sdi controls
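* Rejecting H0 (homoskedasticity) indicates that the heteroskedasticity
* Lewbel identification relies on is present; it does not by itself
* guarantee that the generated instruments are strong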

* Step 3: Verify instrument strength in the augmented first stage
* ivreg2h reports the first-stage F and Hansen J automatically

Lewbel HBIV Limitations: Lewbel instruments have been proven valid for only one endogenous regressor (Baum and Lewbel, 2019). With multiple endogenous variables, the identification conditions become much stronger and are generally not satisfied. Additionally, the generated instruments can be weak if heteroskedasticity is mild. Always report the first-stage F-statistic for the Lewbel specification and compare estimates to OLS and any available external-IV specification as a sensitivity check.

6.10 Control Function Approach

The control function (CF) approach is a two-step method closely related to 2SLS, with important advantages when the endogenous variable enters a nonlinear model (e.g., logit, probit, or Poisson). It includes the first-stage residuals directly in the outcome equation. In a linear model, CF and 2SLS produce identical point estimates, but CF generalizes to nonlinear models where standard IV does not.

* ─── Control Function: Linear Case ───
* Step 1: First stage
regress rn_pp instrument_z sdi controls i.year
predict cf_resid, residuals

* Step 2: Include residuals in structural equation
regress qip rn_pp sdi cf_resid controls i.year, vce(cluster facility_id)

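* The t-test on cf_resid is a test of endogeneity
* If significant => evidence of endogeneity; keep cf_resid
* If not significant => no evidence against exogeneity; OLS may be preferred
* Note: the coefficient on rn_pp equals the 2SLS estimate, but these SEs
* ignore that cf_resid is estimated; report 2SLS or bootstrapped SEs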

* ─── Control Function: Nonlinear Case (Probit) ───
* Step 1: First stage (still linear for continuous endogenous var)
regress endogenous_x instrument_z controls
predict cf_v, residuals

* Step 2: Probit with residual control
probit binary_y endogenous_x cf_v controls

* IMPORTANT: cf_v is a generated regressor, so bootstrap both steps together
capture program drop cf_probit
program define cf_probit, eclass
    * Re-estimate the first stage within each bootstrap sample
    regress endogenous_x instrument_z controls
    predict double _cf_v, residuals
    * The probit results stay in e() for bootstrap to collect
    probit binary_y endogenous_x _cf_v controls
    drop _cf_v
end

bootstrap, reps(500) seed(12345) cluster(unit_id): cf_probit

CF vs. 2SLS: When Does It Matter? In linear models, the control function and 2SLS are algebraically equivalent (the "augmented regression" representation of 2SLS). The CF approach becomes essential when: (1) the outcome model is nonlinear (logit, probit, Poisson), where substituting first-stage fitted values for the endogenous regressor is inconsistent; (2) you want a direct test of endogeneity via the significance of the residual; (3) you have a two-part model or other structure where standard IV cannot be applied. Published work in operations management and health economics increasingly uses the CF approach for these reasons.
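
The reasoning behind the control function can be written out for the linear case. In this sketch Z collects the instruments and the exogenous controls, and the linear relation between the two errors is an assumption of the approach; the symbols π, v, ρ, and ε are introduced here.

```latex
% First stage: X is split into an exogenous part and an error v
\[
X = Z\pi + v
\]
% Endogeneity is modeled as dependence between the structural error u and v
\[
u = \rho v + \varepsilon, \qquad E[\varepsilon \mid Z, v] = 0
\]
% Substituting into Y = X\beta + u gives an equation with an exogenous error
\[
Y = X\beta + \rho v + \varepsilon
\]
% Feasible version: replace v with the first-stage residual \hat v
\[
Y = X\beta + \rho \hat v + \text{error}
\]
```

Conditioning on v removes the part of u that is correlated with X, and a test of ρ = 0 is a test of endogeneity. In nonlinear models the same idea applies, but it requires stronger assumptions about the joint distribution of the errors.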

6.11 LIML and Fuller Estimators

Limited Information Maximum Likelihood (LIML) and Fuller estimators are alternatives to 2SLS that perform better with weak instruments. LIML is approximately median-unbiased even when instruments are moderately weak, while 2SLS can have substantial bias.
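
All of these estimators belong to the k-class family, which makes their relationship easy to see. In the sketch below, X stacks all regressors (endogenous and included exogenous), Z stacks all exogenous variables (excluded instruments and controls), N is the sample size, and L is the number of columns of Z; this notation is introduced here.

```latex
% Residual-maker matrix of the instrument set
\[
M_Z = I - Z(Z'Z)^{-1}Z'
\]
% k-class estimator
\[
\hat\beta(k) = \left[ X'(I - k M_Z) X \right]^{-1} X'(I - k M_Z) Y
\]
% Special cases
\[
k = 0:\ \text{OLS}, \qquad
k = 1:\ \text{2SLS}, \qquad
k = \hat\kappa:\ \text{LIML}, \qquad
k = \hat\kappa - \frac{a}{N - L}:\ \text{Fuller}(a)
\]
```

Here κ̂ is the LIML eigenvalue, which is at least 1 and equals 1 in exactly identified models, so LIML and 2SLS then coincide. Fuller's constant a is usually set to 1 (approximately unbiased) or 4 (lower mean squared error).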

* LIML estimation
ivregress liml rent pcturban (hsngval = faminc i.region), vce(robust)

* LIML with ivreg2
ivreg2 rent pcturban (hsngval = faminc i.region), liml robust

* Fuller estimator (modified LIML; the Fuller constant is typically 1 or 4)
ivreg2 rent pcturban (hsngval = faminc i.region), fuller(1) robust

* Compare 2SLS and LIML estimates
* If they differ substantially, instruments may be weak
quietly ivreg2 rent pcturban (hsngval = faminc i.region), robust
estimates store iv_2sls
quietly ivreg2 rent pcturban (hsngval = faminc i.region), liml robust
estimates store iv_liml
estimates table iv_2sls iv_liml, se stats(N r2)

6.12 Reporting IV Results: What Reviewers Expect

Top journals in economics, management, and operations research have converging expectations for how IV results should be reported. Below is a checklist for a complete IV presentation.

IV Reporting Checklist for Publication
(1) Economic argument for the exclusion restriction, not just statistical tests.
(2) First-stage results in a table (coefficient, SE, F-statistic on excluded instruments).
(3) Kleibergen-Paap F with Stock-Yogo critical values (if heteroskedastic or clustered SEs).
(4) Underidentification test (KP LM statistic).
(5) Overidentification test (Hansen J) if overidentified.
(6) Endogeneity test (Durbin-Wu-Hausman).
(7) Comparison with OLS to discuss the direction and magnitude of bias.
(8) Sensitivity: LIML or Fuller estimates alongside 2SLS if first-stage F is below 20.
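
Several checklist items can be collected from a single ivreg2 run. The sketch below assumes the e() names from ivreg2's stored results (widstat, idstat, idp, j, jp, estat, estatp); confirm them with ereturn list for your installed version before relying on them.

```stata
webuse hsng2, clear

* robust requests heteroskedasticity-robust statistics;
* endog() adds an endogeneity test for the listed regressor
ivreg2 rent pcturban (hsngval = faminc i.region), robust endog(hsngval)

* Items (3) to (6) of the checklist, read from stored results
display "KP rk Wald F (weak identification): " %8.3f e(widstat)
display "KP rk LM (underidentification):     " %8.3f e(idstat) "  p = " %5.3f e(idp)
display "Hansen J (overidentification):      " %8.3f e(j) "  p = " %5.3f e(jp)
display "Endogeneity test of hsngval:        " %8.3f e(estat) "  p = " %5.3f e(estatp)
```

Storing these values as scalars makes it straightforward to add them as extra rows beneath the coefficient table.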

Exercise 6.1

Using webuse hsng2, estimate the effect of hsngval on rent using 2SLS with faminc as the instrument. Report the first-stage F-statistic. Is faminc a strong instrument by the Staiger-Stock criterion?

Exercise 6.2

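Re-estimate the model from Exercise 6.1 using ivreg2 with the robust and first options. Compare the Kleibergen-Paap F-statistic to the conventional F-statistic. To test whether IV is necessary, add the endog(hsngval) option to ivreg2, or run estat endogenous after the equivalent ivregress 2sls command (estat endogenous does not work after ivreg2).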

Exercise 6.3

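Using webuse hsng2, estimate the overidentified model from Section 6.2 (hsngval instrumented by faminc and i.region) with both 2SLS and LIML. (With faminc as the only instrument, as in Exercise 6.1, the model is exactly identified and the two estimators coincide.) Use estimates store and estimates table to compare the point estimates side by side. If the LIML and 2SLS estimates diverge, what does that tell you about instrument strength?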

Exercise 6.4

Implement a control function approach manually: (1) regress hsngval on faminc, i.region, and pcturban, and save the residuals; (2) include those residuals in a regression of rent on hsngval and pcturban. Is the residual significant (endogeneity test)? Compare the coefficient on hsngval to your 2SLS estimate. Then write a bootstrap program to obtain correct standard errors for the two-step procedure.
