Chapter 8: Time Series & Forecasting

ARIMA, VAR, Unit Roots, Cointegration, ARCH/GARCH, and Forecast Evaluation

8.1 Declaring Time Series Data

Before using time-series operators and estimation commands, declare the time variable with tsset.

webuse air2, clear

* Declare time variable
tsset t

* For calendar data (monthly, quarterly)
webuse gnp96, clear
tsset date, quarterly

* View time-series structure
tsreport, list

8.2 Time-Series Operators

Stata provides operators for lags, leads, and differences that work directly in estimation commands.

webuse gnp96, clear
tsset date, quarterly

* L. = lag operator
gen gnp_lag1 = L.gnp96       // one-quarter lag
gen gnp_lag4 = L4.gnp96      // four-quarter lag

* D. = first difference operator
gen d_gnp = D.gnp96          // gnp96 - L.gnp96

* F. = lead (forward) operator
gen gnp_lead = F.gnp96

* Use operators directly in regressions
regress gnp96 L.gnp96 L2.gnp96

* Percentage change (growth rate)
gen gnp_growth = D.gnp96 / L.gnp96 * 100

8.3 Unit Root Tests

Many time series are nonstationary (they have trends or unit roots). Regressing one nonstationary series on another can produce spurious results. Unit root tests determine whether differencing is needed.
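The spurious-regression problem is easy to demonstrate by simulation. The following sketch (illustrative only, with an arbitrary seed; it does not use the chapter's datasets) generates two independent random walks and regresses one on the other:

```stata
* Simulation sketch: two independent random walks look "related"
clear
set obs 200
set seed 12345
gen t = _n
tsset t
gen x = sum(rnormal())   // pure random walk
gen y = sum(rnormal())   // an independent random walk
regress y x              // often a large t-statistic despite no true relationship
regress D.y D.x          // in first differences, the apparent relationship vanishes
```

The levels regression tends to report a "significant" slope far more often than the nominal 5 percent, which is exactly the spurious-regression phenomenon the unit root tests below guard against.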

* Augmented Dickey-Fuller test
dfuller gnp96, trend lags(4)
* H0: unit root (nonstationary)
* Reject => series is stationary

* Phillips-Perron test (robust to serial correlation)
pperron gnp96, trend lags(4)

* Test in first differences
dfuller D.gnp96, lags(4)
* If level is nonstationary but first difference is stationary
* => the series is integrated of order 1, I(1)
Stationarity Matters
Use the series in levels if stationary (I(0)). If the series is I(1), work with first differences. If two I(1) series are cointegrated, you can estimate a Vector Error Correction Model (VECM) instead of differencing.

8.3.1 KPSS Test: Reversing the Null

The ADF and Phillips-Perron tests have stationarity as the alternative hypothesis, meaning a failure to reject leaves you uncertain. The KPSS test reverses this: H0 is stationarity. Using both tests together provides a more complete picture.

* Install KPSS test: ssc install kpss, replace
kpss gnp96, maxlag(4)
* H0: series is stationary
* Reject => series is nonstationary

* ─── Confirmatory Testing Strategy ───
* Run both ADF and KPSS:
*   ADF: fails to reject (unit root) + KPSS: rejects (nonstationary)
*        => Strong evidence for I(1)
*   ADF: rejects (stationary) + KPSS: fails to reject (stationary)
*        => Strong evidence for I(0)
*   Both reject or both fail to reject => ambiguous, need more data

* ADF with lag selection via information criteria
dfuller gnp96, trend lags(4)
dfuller gnp96, trend lags(8)
* Choose lag length where further lags do not change the conclusion
Unit Root Tests Have Low Power
ADF and Phillips-Perron tests have notoriously low power against near-unit-root alternatives. A series with an AR coefficient of 0.95 (stationary but persistent) will often fail to reject the unit root null. With short samples (T < 100), treat unit root test results with caution. The KPSS test has its own size distortions. Always complement statistical tests with visual inspection of the time series plot and ACF/PACF.
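The low-power point can be seen directly by simulation. This sketch (illustrative, with an arbitrary seed) generates a stationary but highly persistent AR(1) and applies dfuller to it:

```stata
* Simulation sketch: stationary AR(1) with rho = 0.95, T = 100
clear
set obs 100
set seed 202
gen t = _n
tsset t
gen y = rnormal() in 1
replace y = 0.95*L.y + rnormal() in 2/l
dfuller y, lags(4)
* The test will frequently fail to reject the unit root null
* even though y is stationary by construction
```

Repeating this many times shows rejection rates well below what a powerful test would deliver at this sample size.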

8.4 ARIMA Models

ARIMA(p,d,q) combines autoregressive (AR) terms, differencing (I), and moving average (MA) terms. It is the standard univariate forecasting model.

webuse air2, clear
tsset t

* Examine autocorrelation structure
ac air, lags(24) title("Autocorrelation: Air Passengers")
pac air, lags(24) title("Partial Autocorrelation")

* ARIMA(1,1,1) estimation
arima air, arima(1,1,1)

* ARIMA with seasonal component
arima air, arima(1,1,1) sarima(1,1,1,12)

* In-sample predictions in levels
* (the y option returns predictions of air itself; xb would
*  return predictions of the differenced series, which cannot
*  be plotted against the level of air)
predict air_hat, y
tsline air air_hat, title("Actual vs. Fitted") ///
    legend(order(1 "Actual" 2 "Fitted"))
Box-Jenkins Model Identification
The classic procedure for identifying ARIMA order:
(1) Plot the ACF and PACF of the (possibly differenced) series.
(2) A sharp cutoff in the PACF at lag p with a slowly decaying ACF suggests AR(p).
(3) A sharp cutoff in the ACF at lag q with a slowly decaying PACF suggests MA(q).
(4) Both decaying gradually suggests ARMA(p,q).
(5) Compare candidate models using AIC/BIC.
In practice, try several specifications and let information criteria guide the choice.
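The final stage of the Box-Jenkins cycle, diagnostic checking, can be sketched with a portmanteau test on the residuals of a candidate model (the variable name resid_bj is illustrative):

```stata
* Sketch: diagnostic check on a candidate model's residuals
arima air, arima(1,1,1)
predict resid_bj, residuals
wntestq resid_bj, lags(24)
* H0: residuals are white noise
* Failing to reject supports the fitted specification;
* rejecting suggests adding AR or MA terms
```

A model whose residuals still show autocorrelation has left predictable structure on the table, whatever its AIC.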

8.5 Cointegration

When two or more I(1) series share a common stochastic trend, they are cointegrated. A linear combination of them is stationary, representing a long-run equilibrium. Differencing cointegrated series discards this long-run information.

8.5.1 Engle-Granger Two-Step Test

webuse wpi1, clear
tsset t

* Step 1: Regress one I(1) variable on the other(s)
* (wpi1 contains a single series, so the time trend t stands in
*  below purely to illustrate the mechanics; in practice the
*  regressor is another I(1) series, e.g. regress y x)
regress wpi t
predict resid_eg, residuals

* Step 2: Test the residuals for a unit root
* (no constant: the OLS residuals have mean zero by construction;
*  dfuller has no notrend option -- the default omits the trend)
dfuller resid_eg, noconstant lags(4)
* IMPORTANT: Use Engle-Granger critical values, not standard ADF
* (critical values are more negative because residuals are estimated)
* Install: ssc install egranger, replace
egranger wpi t, lags(4)

8.5.2 Johansen Cointegration Test

* Johansen test works with multiple variables simultaneously
* and can detect multiple cointegrating relationships

webuse lutkepohl2, clear
tsset qtr

* Johansen trace and max-eigenvalue tests
vecrank ln_inv ln_inc ln_consump, trend(constant) lags(2)
* Trace stat: tests H0: r <= k cointegrating relationships
* Max-eigen stat: tests H0: r = k vs. r = k+1
* Choose the rank where you first fail to reject
Cointegration and Error Correction
If you find r cointegrating relationships among k variables, estimate a VECM with rank r. The VECM captures both the short-run dynamics (through lagged differences) and the long-run equilibrium (through the error correction term). The adjustment coefficient (alpha) measures how quickly deviations from equilibrium are corrected. A negative and significant alpha means the variable adjusts back toward equilibrium.

8.5.3 Vector Error Correction Model (VECM)

* Estimate VECM with 1 cointegrating relationship
vec ln_inv ln_inc ln_consump, rank(1) lags(2) trend(constant)

* Key output:
* - Cointegrating equation: the long-run relationship
* - Adjustment parameters (alpha): speed of adjustment
* - Short-run dynamics: coefficients on lagged differences

* Test for weak exogeneity (alpha = 0 for a variable)
* If alpha is insignificant, that variable does not adjust
* to the long-run equilibrium (it is the "driver")
vec ln_inv ln_inc ln_consump, rank(1) lags(2)
* e.g., test whether income adjusts to the equilibrium
* (the adjustment coefficient in the D_ln_inc equation):
test [D_ln_inc]L._ce1 = 0

8.6 VAR Models

Vector Autoregression (VAR) models capture the dynamic interdependencies among multiple time series. Each variable is regressed on its own lags and the lags of all other variables in the system.

webuse lutkepohl2, clear
tsset qtr

* Lag order selection
varsoc dln_inv dln_inc dln_consump, maxlag(8)

* Estimate VAR(2)
var dln_inv dln_inc dln_consump, lags(1/2)

* VAR stability condition
varstable, graph

8.7 Impulse Response Functions

IRFs trace the effect of a one-standard-deviation shock to one variable on the other variables over time.

* Create IRF file
irf create results, set(myirf) replace

* Plot orthogonalized IRF
irf graph oirf, impulse(dln_inc) response(dln_inv) ///
    title("Response of Investment to Income Shock")

* Forecast error variance decomposition
irf graph fevd, impulse(dln_inc) response(dln_inv)

8.7.1 Interpreting and Customizing IRFs

* ─── Multiple IRFs in One Graph ───
irf graph oirf, impulse(dln_inc) ///
    response(dln_inv dln_consump) ///
    title("Responses to Income Shock")

* ─── IRF Tables (for precise values) ───
irf table oirf, impulse(dln_inc) response(dln_inv) ustep(12)

* ─── Cumulative IRFs ───
* Shows total accumulated effect over time
irf graph coirf, impulse(dln_inc) response(dln_inv) ///
    title("Cumulative Response of Investment to Income Shock")
Ordering Matters for Orthogonalized IRFs
Orthogonalized IRFs use a Cholesky decomposition, which imposes a recursive causal ordering among the variables. The first variable in the VAR can affect all others contemporaneously, the second can affect the third but not the first, and so on. This means your results are sensitive to variable ordering. Either justify your ordering with economic theory, or use structural VARs (SVAR) with theory-based restrictions. Generalized IRFs (irf graph girf) are invariant to ordering but have a different interpretation.
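One informal robustness check is to re-estimate the VAR with the ordering reversed and compare the resulting OIRFs; a sketch follows (the IRF set name reversed is illustrative):

```stata
* Sketch: sensitivity of OIRFs to the Cholesky ordering
var dln_consump dln_inc dln_inv, lags(1/2)   // reversed ordering
irf create reversed, set(myirf) replace
irf graph oirf, impulse(dln_inc) response(dln_inv)
* Material changes in the OIRF indicate the ordering is doing
* real identifying work; generalized IRFs are invariant to it
irf graph girf, impulse(dln_inc) response(dln_inv)
```

If the two orderings tell similar stories, the recursive identification is at least not driving your conclusions.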

8.8 Granger Causality

Granger causality tests whether lags of one variable help predict another variable, beyond what the variable's own lags can predict. It is a test of predictive content, not true causality.

* After estimating VAR
vargranger

* Or test specific relationships
var dln_inv dln_inc, lags(1/2)
vargranger
* H0: dln_inc does not Granger-cause dln_inv
* Reject => lagged income helps predict investment

8.9 Structural Breaks

Structural breaks occur when the data-generating process changes at some point in time (e.g., a policy change, financial crisis, or regime shift). Ignoring structural breaks can lead to misleading unit root tests, biased parameter estimates, and poor forecasts.

webuse gnp96, clear
tsset date, quarterly

* ─── Known Break Date ───
* Chow test for a known break point: interact a post-break
* dummy with the regressors and test the interactions jointly
gen post = (date > tq(1980q4))
gen post_Lgnp = post * L.gnp96
regress gnp96 L.gnp96 post post_Lgnp
test post post_Lgnp
* H0: no break (coefficients equal across subsamples)

* ─── Unknown Break Date ───
* Quandt-Andrews / sup-Wald test for unknown break
regress gnp96 L.gnp96
estat sbsingle
* Reports the most likely break date and a p-value
* for the null of no structural break

* ─── Testing a Prespecified Break Date ───
* estat sbknown performs a Wald test at one or more known dates
regress gnp96 L.gnp96
estat sbknown, break(tq(1980q4))
* For multiple unknown breaks (Bai-Perron), see community-
* contributed commands such as xtbreak (ssc install xtbreak)
Structural Breaks and Unit Root Tests
A structural break in the trend or intercept can cause ADF tests to fail to reject the unit root null even when the series is trend-stationary with a break. The Zivot-Andrews test (not built into Stata but available via ssc install zandrews) allows for an endogenously determined break point under the alternative hypothesis of stationarity with a break. Always plot your data before running unit root tests to look for obvious regime changes.
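A usage sketch follows, assuming the community-contributed zandrews package (its break() and graph options are taken from that package's documented syntax; verify against help zandrews after installing):

```stata
* Sketch: Zivot-Andrews test with an endogenous break
* ssc install zandrews, replace
webuse gnp96, clear
tsset date, quarterly
zandrews gnp96, break(intercept) graph
* H0: unit root; the alternative is stationarity around a
* one-time break in the intercept at an estimated date
```

The reported minimum t-statistic and its break date should be compared against the Zivot-Andrews critical values the command prints, not standard ADF tables.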

8.10 ARCH/GARCH Models for Volatility

Financial and economic time series often exhibit volatility clustering: periods of high volatility tend to be followed by more high volatility. ARCH/GARCH models capture this time-varying conditional variance.

* ─── Testing for ARCH Effects ───
webuse gnp96, clear
tsset date, quarterly

regress D.gnp96 L.D.gnp96
estat archlm, lags(1/4)
* H0: no ARCH effects. Reject => ARCH/GARCH needed

* ─── ARCH(1) Model ───
arch D.gnp96 L.D.gnp96, arch(1)

* ─── GARCH(1,1) Model ───
* The workhorse volatility model
arch D.gnp96 L.D.gnp96, arch(1) garch(1)

* Key output:
*   Mean equation: coefficients on regressors
*   Variance equation: alpha (ARCH) + beta (GARCH)
*   alpha + beta < 1 => volatility is mean-reverting
*   alpha + beta close to 1 => high volatility persistence
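As a quick check on persistence, alpha + beta can be computed directly from the stored coefficients. The equation and term names below follow arch's output layout (variance equation labeled ARCH, terms L.arch and L.garch); verify them against your own estimation output:

```stata
* Sketch: volatility persistence after the GARCH(1,1) fit
display "alpha + beta = " _b[ARCH:L.arch] + _b[ARCH:L.garch]
```

Values very close to 1 suggest an integrated GARCH process, for which long-horizon variance forecasts do not mean-revert.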

* ─── Extract Conditional Variance ───
predict cond_var, variance
gen cond_sd = sqrt(cond_var)
tsline cond_sd, title("Conditional Standard Deviation") ///
    ytitle("Conditional SD")
When to Use GARCH
GARCH models are essential for financial returns data and risk management, but they also appear in operations management when modeling demand variability, service time variability, or quality metric fluctuations over time. The GARCH(1,1) model captures most of the volatility dynamics in practice; higher-order GARCH models rarely improve fit. For asymmetric volatility effects (bad news increases volatility more than good news), consider EGARCH (arch y x, earch(1) egarch(1)) or GJR threshold ARCH (arch y x, arch(1) tarch(1) garch(1)).

8.11 Forecast Evaluation

Building a time series model without evaluating its out-of-sample forecast accuracy is incomplete. Forecast evaluation compares predicted values against actual outcomes using loss functions.

* ─── In-Sample Fit Measures ───
arima air, arima(1,1,1)
estat ic

* Compare models
quietly arima air, arima(1,1,0)
estimates store ar1

quietly arima air, arima(1,1,1)
estimates store arma11

estimates table ar1 arma11, stats(aic bic ll)

* ─── Out-of-Sample Forecast Evaluation ───
* Hold out the last 12 periods for evaluation
webuse air2, clear
tsset t

local T_total = _N
local T_train = `T_total' - 12

* Estimate on training sample
arima air if t <= `T_train', arima(1,1,1)

* Dynamic forecast from end of training sample, in levels
* (t is a plain integer here, so no tm()/tq() wrapper is needed;
*  the y option puts the forecast on the level scale of air)
local T_fc = `T_train' + 1
predict air_fc, dynamic(`T_fc') y

* ─── Forecast Error Metrics ───
gen fc_error = air - air_fc if t > `T_train'
gen fc_error_sq = fc_error^2
gen fc_error_abs = abs(fc_error)
gen fc_error_pct = abs(fc_error / air) * 100

* RMSE: Root Mean Squared Error
summarize fc_error_sq if t > `T_train'
display "RMSE = " sqrt(r(mean))

* MAE: Mean Absolute Error
summarize fc_error_abs if t > `T_train'
display "MAE  = " r(mean)

* MAPE: Mean Absolute Percentage Error
summarize fc_error_pct if t > `T_train'
display "MAPE = " r(mean) "%"
Forecast Evaluation Best Practices
(1) Always evaluate on a held-out test set, not in-sample. In-sample fit measures (R-squared, AIC) reward complexity and can overfit.
(2) RMSE penalizes large errors more heavily than MAE, so the choice between them depends on whether you care about average errors or worst-case errors.
(3) MAPE is scale-free and easy to communicate ("our forecast is off by 5% on average"), but it is undefined when the actual value is zero and penalizes overprediction more heavily than underprediction.
(4) For rolling or expanding window forecasts, compare your model against a naive benchmark (e.g., random walk: forecast = last observed value).
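For point (4), a random-walk benchmark over the same 12-period holdout can be sketched as follows (this reuses the T_train local defined in the holdout example above):

```stata
* Sketch: naive random-walk benchmark
* (naive forecast = last observed training value, held flat)
quietly summarize air if t == `T_train'
gen air_naive = r(mean) if t > `T_train'
gen naive_err_sq = (air - air_naive)^2
quietly summarize naive_err_sq if t > `T_train'
display "Naive RMSE = " sqrt(r(mean))
* A model that cannot beat this benchmark adds little
* forecasting value, whatever its in-sample fit
```

Reporting the ratio of model RMSE to naive RMSE (values below 1 favor the model) makes the comparison easy to communicate.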

Exercise 8.1

Load webuse gnp96 and set the time variable. Run a Dickey-Fuller test on gnp96 in levels and first differences. Is the series I(0) or I(1)? Estimate an ARIMA(1,1,0) model and examine the residual autocorrelation with predict resid, residuals followed by ac resid.

Exercise 8.2

Using webuse lutkepohl2, estimate a VAR(2) with dln_inv, dln_inc, and dln_consump. Run vargranger. Which variables Granger-cause which? Generate impulse response functions and interpret the response of consumption to an income shock.

Exercise 8.3

Using webuse gnp96, run both an ADF test (dfuller gnp96, trend lags(4)) and a KPSS test (kpss gnp96, maxlag(4)) on the level of gnp96. Do both tests agree on the integration order? Now run estat sbsingle after a regression of gnp96 on L.gnp96 to check for a structural break. If a break is found, how might it affect your unit root test conclusion?

Exercise 8.4

Using webuse lutkepohl2, test whether ln_inv, ln_inc, and ln_consump (in levels) are cointegrated using vecrank. How many cointegrating relationships do you find? Estimate a VECM with the appropriate rank using vec. Interpret the adjustment coefficient: which variable(s) adjust to restore the long-run equilibrium after a shock?
