Logit, Probit, Tobit, Count Models, Selection Models, and Multinomial Models
When the dependent variable is binary (0/1), linear probability models (LPM) are simple but can produce fitted probabilities outside [0,1]. Logit and probit address this by using nonlinear link functions.
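As a quick reference, the two link functions in standard notation (with Lambda the logistic CDF and Phi the standard normal CDF):

```latex
% Logit: probability is the logistic CDF of the linear index
\Pr(y = 1 \mid x) = \Lambda(x\beta) = \frac{e^{x\beta}}{1 + e^{x\beta}}

% Probit: probability is the standard normal CDF of the linear index
\Pr(y = 1 \mid x) = \Phi(x\beta) = \int_{-\infty}^{x\beta} \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\, dt
```

Both CDFs map the linear index into (0,1), which is exactly what keeps fitted probabilities in bounds.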
webuse lbw, clear
describe

* Linear Probability Model (for comparison)
regress low age lwt i.race smoke, vce(robust)

* Logit model
logit low age lwt i.race smoke, vce(robust)

* Probit model
probit low age lwt i.race smoke, vce(robust)
Raw logit/probit coefficients give the change in log-odds (logit) or the z-score (probit) for a unit change in X. These are difficult to interpret directly. Use marginal effects instead.
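In symbols, these quantities are (standard results for a continuous regressor x_k, not specific to this dataset):

```latex
% Logit: odds ratio for a one-unit change in x_k
\text{OR}_k = e^{\beta_k}

% Marginal effect of x_k on the probability (depends on where x is evaluated)
\frac{\partial \Pr(y=1 \mid x)}{\partial x_k}
  = \begin{cases}
      \beta_k \, \Lambda(x\beta)\bigl(1 - \Lambda(x\beta)\bigr) & \text{(logit)} \\
      \beta_k \, \phi(x\beta) & \text{(probit)}
    \end{cases}
```

The AME averages this derivative over the estimation sample; the MEM evaluates it once at the sample means of the covariates.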
* Odds ratios (logit only)
logit low age lwt i.race smoke, or

* Average marginal effects (AME)
logit low age lwt i.race smoke
margins, dydx(*) post

* Marginal effects at the mean (MEM)
logit low age lwt i.race smoke
margins, dydx(*) atmeans
To evaluate marginal effects at specific covariate values rather than averaging over the sample or using sample means, use margins, dydx(*) at(smoke=1 age=30) instead.
Interpreting interactions in nonlinear models is more complex than in linear models. The cross-partial derivative is not simply the coefficient on the interaction term. Use margins to compute the correct interaction effect.
* Logit with interaction
logit low c.age##i.smoke lwt i.race

* Wrong: just looking at the interaction coefficient
* Right: compute the AME of smoke at different ages
margins, dydx(smoke) at(age=(20(5)40))
marginsplot, title("Marginal Effect of Smoking by Age") ///
    ytitle("dPr(Low Birth Weight)/dSmoke") ///
    yline(0, lcolor(red) lpattern(dash))

* Second differences: the true "interaction effect" in nonlinear models
margins, dydx(smoke) at(age=(20 40)) contrast(atcontrast(r))
Always compute interaction effects with margins rather than interpreting interaction-term coefficients directly. This remains one of the most common errors in applied health and management research.
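In this example's notation, the quantity that the contrast(atcontrast(r)) call reports is a second difference:

```latex
% Interaction effect of smoke and age as a second difference
\Delta\Delta =
  \bigl[\Pr(y{=}1 \mid \text{smoke}{=}1, \text{age}{=}40) - \Pr(y{=}1 \mid \text{smoke}{=}0, \text{age}{=}40)\bigr]
- \bigl[\Pr(y{=}1 \mid \text{smoke}{=}1, \text{age}{=}20) - \Pr(y{=}1 \mid \text{smoke}{=}0, \text{age}{=}20)\bigr]
```

In a linear model this collapses to the interaction coefficient; in logit or probit it does not, which is why the coefficient alone can even have the wrong sign relative to the true interaction effect.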
You can compute predicted probabilities at specific covariate values to make results more tangible for your audience. Margins plots are the standard visualization for nonlinear model results in published papers.
logit low age lwt i.race smoke

* Predicted probability for each observation
predict phat, pr

* Predicted probability at specific values
margins, at(smoke=(0 1) age=(20 30 40)) vsquish

* Plot predicted probabilities
margins, at(age=(15(5)45)) over(smoke)
marginsplot, title("Predicted Probability of Low Birth Weight") ///
    ytitle("Pr(Low Birth Weight)") xtitle("Mother's Age")
* ─── Publication-Quality Margins Plot ───
logit low c.age##i.race lwt smoke

* Predicted probabilities across age for each race category
margins race, at(age=(18(2)42)) vsquish

* Customized marginsplot
marginsplot, ///
    title("Predicted Probability of Low Birth Weight", size(medium)) ///
    ytitle("Pr(Low Birth Weight)") xtitle("Mother's Age") ///
    legend(order(1 "White" 2 "Black" 3 "Other") rows(1) pos(6)) ///
    plot1opts(lcolor(navy) mcolor(navy)) ///
    plot2opts(lcolor(cranberry) mcolor(cranberry)) ///
    plot3opts(lcolor(forest_green) mcolor(forest_green)) ///
    ci1opts(color(navy%20)) ci2opts(color(cranberry%20)) ///
    ci3opts(color(forest_green%20)) ///
    graphregion(color(white)) scheme(s2color)

* Export for manuscript
graph export "margins_plot.png", replace width(1200)
When the dependent variable is censored (e.g., hours worked cannot be negative, expenditure data piles up at zero), the tobit model accounts for this censoring.
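The model can be written as (standard tobit setup with left-censoring at zero):

```latex
% Latent variable model
y_i^* = x_i\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)

% Observation rule: we see zero whenever the latent value is negative
y_i = \max(0, \; y_i^*)
```

OLS on the observed y is inconsistent because the censoring mechanically flattens the relationship near the limit; tobit estimates beta and sigma by maximum likelihood using both the censored and uncensored observations.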
webuse womenwk, clear

* Tobit with left-censoring at zero
tobit hours age education married children, ll(0)

* Marginal effects on the latent variable y* (default: linear prediction)
margins, dydx(*)

* Marginal effects on the unconditional observed outcome E[max(0,y)]
margins, dydx(*) predict(ystar(0,.))

* Marginal effects on E[y | y > 0] (conditional on being uncensored)
margins, dydx(*) predict(e(0,.))
Tobit marginal effects come in several flavors: (1) the default linear prediction gives the effect on the latent (uncensored) variable y*; (2) predict(e(0,.)) gives the effect on E[y | y > 0], the expected value conditional on being uncensored; (3) predict(pr(0,.)) gives the effect on the probability of being uncensored; and predict(ystar(0,.)) gives the effect on the unconditional expected observed outcome E[max(0, y)]. In most applications, the conditional expectation e(0,.) is the quantity of interest. Reporting the wrong one is a common error.
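The quantities behind these predict() options, in standard notation (with lambda(z) = phi(z)/Phi(z) the inverse Mills ratio):

```latex
% Latent variable: marginal effect of x_k is just the coefficient
E[y^* \mid x] = x\beta

% Probability of being uncensored
\Pr(y > 0 \mid x) = \Phi(x\beta/\sigma)

% Expected value conditional on being uncensored
E[y \mid x, \, y > 0] = x\beta + \sigma\,\lambda(x\beta/\sigma)

% Unconditional observed outcome (the ystar() quantity)
E[\max(0, y) \mid x] = \Phi(x\beta/\sigma)\,x\beta + \sigma\,\phi(x\beta/\sigma)
```

Only the latent-variable effect equals the raw coefficient; the other marginal effects are attenuated toward zero by the probability of censoring.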
When the dependent variable is a non-negative count (e.g., number of hospital admissions, number of defects, number of patents), use Poisson or negative binomial regression rather than OLS. These models use a log link function and ensure predicted counts are non-negative.
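With an exposure variable, the conditional mean takes this form (standard Poisson-with-offset setup, as used in the code below):

```latex
% Log link; ln(exposure) enters as an offset with coefficient fixed at 1
\ln E[y \mid x] = x\beta + \ln(\text{exposure})
\quad\Longleftrightarrow\quad
E[y \mid x] = \text{exposure} \cdot e^{x\beta}

% Incidence rate ratio for a one-unit change in x_k
\text{IRR}_k = e^{\beta_k}
```

The offset converts the model from counts to rates, so the IRRs compare event rates per unit of exposure (here, person-years).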
* ─── Poisson Regression ───
webuse dollhill3, clear

* Basic Poisson
poisson deaths smokes i.agecat, exposure(pyears) irr

* Incidence Rate Ratios (IRR): exponentiated coefficients
* An IRR of 1.5 means the rate is 50% higher for a unit increase in X

* Robust SEs (quasi-Poisson: relaxes the mean=variance assumption)
poisson deaths smokes i.agecat, exposure(pyears) irr vce(robust)

* ─── Negative Binomial (for overdispersion) ───
* When Var(Y) > E(Y), Poisson is too restrictive
nbreg deaths smokes i.agecat, exposure(pyears) irr

* Test for overdispersion (alpha = 0 => Poisson is adequate)
* The LR test at the bottom of nbreg output tests alpha = 0

* ─── Marginal Effects for Count Models ───
poisson deaths smokes i.agecat, exposure(pyears)
margins, dydx(*) predict(n)    // marginal effect on predicted count
When the data has excess zeros beyond what Poisson or negative binomial can explain (e.g., many people report zero doctor visits, but among those who visit, the count follows a standard distribution), zero-inflated models are appropriate.
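The zero-inflated Poisson mixes a point mass at zero with a Poisson count (standard ZIP setup, with pi the inflation probability from the inflate() equation and mu = e^{x beta} the Poisson mean):

```latex
% Zeros come from two sources: the inflation process and the count process
\Pr(y = 0 \mid x, z) = \pi(z\gamma) + \bigl(1 - \pi(z\gamma)\bigr)\, e^{-\mu}

% Positive counts come only from the count process
\Pr(y = k \mid x, z) = \bigl(1 - \pi(z\gamma)\bigr)\, \frac{e^{-\mu}\mu^{k}}{k!}, \qquad k \ge 1

% Implied mean
E[y \mid x, z] = \bigl(1 - \pi(z\gamma)\bigr)\,\mu
```

The inflate() equation models pi (the chance of being a structural zero), while the main equation models mu among potential users.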
* (Illustrative variable names -- no dataset is loaded here;
*  substitute your own outcome and covariates)

* Zero-inflated Poisson
zip doctor_visits age income chronic, ///
    inflate(age income distance_to_clinic) vuong

* Zero-inflated Negative Binomial
zinb doctor_visits age income chronic, ///
    inflate(age income distance_to_clinic)

* Vuong test: ZIP vs. standard Poisson
* Significant positive z => ZIP preferred over Poisson

* Marginal effects
zip doctor_visits age income chronic, inflate(age income)
margins, dydx(*) predict(n)
margins, dydx(*) predict(pr(0))    // effect on Pr(Y=0)
If the overdispersion parameter alpha reported by nbreg is significantly different from zero, consider negative binomial. If there are excess zeros driven by a qualitatively different process (e.g., "never users" vs. "potential users who happen to have zero counts"), consider zero-inflated models. Wooldridge (2010) recommends Poisson with robust SEs as a default because it only requires correct specification of the conditional mean (not the full distribution).
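The overdispersion comparison in symbols (standard NB2 parameterization, with mu = E[y | x]):

```latex
% Poisson imposes equidispersion
\text{Var}(y \mid x) = \mu

% Negative binomial (NB2) adds a quadratic overdispersion term
\text{Var}(y \mid x) = \mu + \alpha\,\mu^{2}
```

As alpha goes to zero, NB2 collapses to Poisson, which is exactly the boundary hypothesis the LR test at the bottom of the nbreg output evaluates.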
When the dependent variable has ordered categories (e.g., satisfaction: low/medium/high), use ordered logit or probit.
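Ordered logit models a single linear index crossed against estimated cutpoints (standard parameterization, with kappa_0 = -infinity and kappa_J = +infinity):

```latex
\Pr(y = j \mid x) = \Lambda(\kappa_j - x\beta) - \Lambda(\kappa_{j-1} - x\beta),
\qquad j = 1, \dots, J
```

A single beta vector shifts all category probabilities together; this is the proportional odds (parallel regression) assumption that distinguishes ordered logit from multinomial logit.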
webuse fullauto, clear

* Ordered logit
ologit rep77 foreign length mpg

* Ordered probit
oprobit rep77 foreign length mpg

* Predicted probabilities for each category
margins, predict(outcome(1)) predict(outcome(2)) predict(outcome(3)) ///
    predict(outcome(4)) predict(outcome(5))
Ordered logit relies on the proportional odds (parallel regression) assumption. Test it with ssc install brant, replace followed by brant after ologit. If the assumption is violated, consider generalized ordered logit (gologit2) or multinomial logit instead.
When the dependent variable is categorical without a natural ordering (e.g., mode of transportation: car, bus, train), use multinomial logit. The model estimates separate coefficients for each category relative to a base category.
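The probabilities take the familiar softmax form (standard multinomial logit, with the base category's coefficients normalized to zero):

```latex
\Pr(y = j \mid x) = \frac{e^{x\beta_j}}{\sum_{k=1}^{J} e^{x\beta_k}},
\qquad \beta_{\text{base}} = 0

% Relative risk ratio: e^{beta_jk} scales the relative risk of j vs. the base
\frac{\Pr(y = j \mid x)}{\Pr(y = \text{base} \mid x)} = e^{x\beta_j}
```

Each non-base category gets its own coefficient vector, so a model with J categories and K regressors estimates (J-1) x K slope parameters.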
webuse sysdsn1, clear

* Multinomial logit
mlogit insure age male nonwhite, baseoutcome(1)

* Relative risk ratios (exponentiated coefficients)
mlogit insure age male nonwhite, baseoutcome(1) rrr

* Average marginal effects
margins, dydx(*) predict(outcome(1))
margins, dydx(*) predict(outcome(2))
margins, dydx(*) predict(outcome(3))
Raw multinomial logit coefficients are relative to a base category and difficult to interpret. Margins provide the change in predicted probability for each category, which is what readers actually want to know.
* ─── Complete Marginal Effects Table ───
mlogit insure age male nonwhite, baseoutcome(1)

* Marginal effects for all outcomes in one table
margins, dydx(*) predict(outcome(1)) post
estimates store me_1

quietly mlogit insure age male nonwhite, baseoutcome(1)
margins, dydx(*) predict(outcome(2)) post
estimates store me_2

quietly mlogit insure age male nonwhite, baseoutcome(1)
margins, dydx(*) predict(outcome(3)) post
estimates store me_3

estimates table me_1 me_2 me_3, se

* Note: Marginal effects across all categories sum to zero
* (a unit increase in X redistributes probability among categories)

* ─── IIA Test (Hausman-McFadden) ───
* Tests whether removing a category changes the remaining coefficients
mlogit insure age male nonwhite, baseoutcome(1)
estimates store full_model

* Re-estimate excluding one category
mlogit insure age male nonwhite if insure != 3, baseoutcome(1)
estimates store restricted

hausman restricted full_model, alleqs constant
When the outcome variable is observed only for a non-random subsample, standard regression suffers from selection bias. The Heckman selection model (also called the Heckit) corrects this by modeling the selection process jointly with the outcome.
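The two-equation setup in standard notation (with lambda(.) = phi(.)/Phi(.) the inverse Mills ratio):

```latex
% Selection equation: y is observed only when s = 1
s_i = \mathbb{1}[\, z_i\gamma + u_i > 0 \,]

% Outcome equation, with correlated errors
y_i = x_i\beta + \varepsilon_i, \qquad
\text{Corr}(u_i, \varepsilon_i) = \rho, \quad \text{SD}(\varepsilon_i) = \sigma

% Conditional on selection, OLS omits the Mills-ratio term
E[y_i \mid x_i, s_i = 1] = x_i\beta + \rho\sigma\,\lambda(z_i\gamma)
```

If rho is nonzero, OLS on the selected sample suffers omitted-variable bias from the missing lambda term; the two-step Heckit fixes this by adding the estimated inverse Mills ratio as a regressor.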
webuse womenwk, clear

* Two-step Heckman (Heckit)
heckman wage education age, ///
    select(married children education age) twostep

* Full MLE estimation (more efficient if correctly specified)
heckman wage education age, ///
    select(married children education age)

* Key output: lambda (inverse Mills ratio)
* If lambda is significant => selection bias present
* rho    = correlation between selection and outcome errors
* sigma  = SD of the outcome error
* lambda = rho * sigma

* Predicted wages (corrected for selection)
predict wage_hat, ycond        // E[wage | selected]
predict wage_uncon, yexpected  // E[wage] (unconditional)
predict psel, psel             // Pr(selected)
predict mills, mills           // inverse Mills ratio
Identification hinges on an exclusion restriction: here, married and children affect labor force participation but are excluded from the wage equation. Top journals require you to justify this exclusion economically.
logit low age lwt i.race smoke

* Classification table
estat classification

* ROC curve and AUC
lroc, title("ROC Curve")

* Goodness-of-fit (Hosmer-Lemeshow)
estat gof, group(10)

* Information criteria for model comparison
estat ic
* ─── Pseudo R-squared ───
* McFadden's R2 = 1 - (ll_full / ll_null)
* Reported automatically by logit/probit
* Typical values are lower than OLS R2 (0.2-0.4 is often "good")

* ─── Likelihood Ratio Test (nested models) ───
quietly logit low age lwt smoke
estimates store m_restricted
quietly logit low age lwt i.race smoke
estimates store m_full
lrtest m_restricted m_full

* ─── AIC/BIC for Non-Nested Comparison ───
quietly logit low age lwt i.race smoke
estimates store logit_model
quietly probit low age lwt i.race smoke
estimates store probit_model
estimates table logit_model probit_model, stats(aic bic ll N)

* ─── Percent Correctly Predicted ───
logit low age lwt i.race smoke
predict phat, pr
gen y_hat = (phat >= 0.5)
tab low y_hat

* Be careful: with unbalanced data (e.g., 90% zeros),
* always predicting 0 gives 90% accuracy but is useless
Using webuse lbw, estimate a logit model of low on age, lwt, smoke, and i.race. Compute average marginal effects with margins, dydx(*). What is the marginal effect of smoking on the probability of low birth weight?
Using webuse fullauto, estimate an ordered logit of rep77 on foreign, length, and mpg. Use margins to predict the probability of each outcome category for foreign vs. domestic cars. Create a marginsplot.
Using webuse lbw, estimate a logit model of low on c.age##i.smoke, lwt, and i.race. Use margins, dydx(smoke) at(age=(20(5)40)) to compute the marginal effect of smoking at different ages. Create a marginsplot. Does the effect of smoking change with age? Now verify your interpretation by computing the "second difference" using margins, dydx(smoke) at(age=(20 40)) contrast(atcontrast(r)).
Using webuse womenwk, estimate a Heckman selection model of wage on education and age, with the selection equation including married, children, education, and age. Compare the two-step (twostep) and MLE estimates. Is lambda (the inverse Mills ratio) significant? What does the sign of rho tell you about the selection process?
Remember: interaction effects in nonlinear models require a margins computation; the interaction coefficient alone is misleading.