feat: add LWDiD estimator (Lee & Wooldridge 2025, 2026) #588
gorgeousfish wants to merge 1 commit into
Native implementation of rolling-transformation DiD:
- Transformations: demean, detrend, demeanq, detrendq
- Estimation: RA, IPW, IPWRA, PSM
- Inference: classical, HC0-HC4, cluster-robust, t-distribution
- Designs: common timing + staggered adoption
- Advanced: wild cluster bootstrap, randomization inference
- Diagnostics: parallel trends, sensitivity, clustering
- Visualization: cohort trends, event study, sensitivity plots

Numerical equivalence validated against lwdid-py (tolerance ≤1e-10 for RA paths) and paper Tables 3-4 (LW 2026) reproduced exactly. Zero new runtime dependencies (uses existing numpy/pandas/scipy).

References:
- Lee, S.J. & Wooldridge, J.M. (2025). SSRN 4516518.
- Lee, S.J. & Wooldridge, J.M. (2026). SSRN 5325686.
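To make the first two transformations concrete, here is a minimal sketch for a balanced panel with common timing. It assumes a NumPy array with units in rows and periods in columns, where the first `n_pre` columns are pre-treatment. The helper names `demean`, `detrend`, and `post_average` are hypothetical illustrations, not the diff_diff API, and demeanq/detrendq are not covered.

```python
import numpy as np


def demean(y: np.ndarray, n_pre: int) -> np.ndarray:
    """Subtract each unit's pre-treatment mean from all of its outcomes.

    y: (n_units, n_periods) balanced panel; the first n_pre columns are pre-treatment.
    """
    pre_mean = y[:, :n_pre].mean(axis=1, keepdims=True)
    return y - pre_mean


def detrend(y: np.ndarray, n_pre: int) -> np.ndarray:
    """Subtract each unit's pre-treatment linear trend, extrapolated to every period.

    Needs n_pre >= 2 so that an intercept and a slope can be fitted per unit.
    """
    n_units, n_periods = y.shape
    t = np.arange(1, n_periods + 1, dtype=float)
    # Pre-period design matrix: intercept + linear time
    x_pre = np.column_stack([np.ones(n_pre), t[:n_pre]])
    # Fit all units at once: coefs has shape (2, n_units)
    coefs, *_ = np.linalg.lstsq(x_pre, y[:, :n_pre].T, rcond=None)
    # Extrapolate each unit's fitted trend to all periods
    x_all = np.column_stack([np.ones(n_periods), t])
    fitted = (x_all @ coefs).T  # (n_units, n_periods)
    return y - fitted


def post_average(y_transformed: np.ndarray, n_pre: int) -> np.ndarray:
    """Collapse the transformed post-treatment periods to one value per unit."""
    return y_transformed[:, n_pre:].mean(axis=1)
```

The result of `post_average` is the cross-section that the estimators listed above would operate on; for example, a unit with pre-period outcomes 1, 2, 3 and a post-period outcome of 10 has a detrended post-period value of 6.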
Force-pushed from 9712477 to 8c5ccce.
@gorgeousfish Thanks for this; it's a serious, well-prepared contribution, and the method is a great … That said, before I do a full review, there's one thing to clear first: licensing. This implementation …
Thanks for the quick response and the positive signal.

On licensing: yes, I hold the copyright to all code in this PR. It's my own independent implementation, not derived from Lee & Wooldridge's Stata lwdid package or any third-party source. I'm contributing it under diff-diff's MIT license.

On datasets:
- smoking.csv: Abadie, Diamond & Hainmueller (2010), California tobacco control program. Publicly available, widely redistributed in academic packages.
- castle.csv: Cheng & Hoekstra (2013) / Cunningham (2021), Castle Doctrine laws. Publicly available.
- walmart.csv: county-level panel from Brown & Butts (2025, Journal of Econometrics), constructed from County Business Patterns (CBP) data. Publicly available government statistical data.
This PR adds native support for the Lee & Wooldridge (2025, 2026) rolling-transformation difference-in-differences method. The approach applies a unit-specific time-series transformation to the panel outcomes: each unit's pre-treatment mean (demeaning) or pre-treatment linear trend (detrending) is removed from its outcomes, which converts the panel DiD problem into a standard cross-sectional one. Once transformed, any treatment-effect estimator (regression adjustment, inverse probability weighting, doubly robust IPWRA, or propensity-score matching) can be applied directly to the cross-section. The method handles both common-timing and staggered-adoption designs with flexible control-group selection.
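Once the panel is collapsed to one transformed outcome per unit, regression adjustment is an ordinary cross-sectional regression. The sketch below shows one standard RA form: OLS of the transformed outcome on an intercept, the treatment dummy, covariates centered at the treated-group mean, and their interaction with treatment, so that the treatment coefficient is the ATT. The function `ra_att` is hypothetical (not the diff_diff API), uses plain `numpy.linalg.lstsq` instead of the PR's `solve_ols`, and returns only the point estimate, without standard errors.

```python
import numpy as np


def ra_att(dy, d, x=None):
    """Regression-adjustment ATT on a transformed cross-section (point estimate only).

    dy: (n,) transformed outcome per unit, e.g. the post-period average after demeaning
    d:  (n,) 0/1 treatment indicator
    x:  optional (n,) or (n, k) time-invariant covariates
    """
    dy = np.asarray(dy, dtype=float)
    d = np.asarray(d, dtype=float)
    cols = [np.ones_like(d), d]
    if x is not None:
        x = np.asarray(x, dtype=float).reshape(len(d), -1)
        # Center covariates at the treated-group mean so the D coefficient is the ATT
        xc = x - x[d == 1].mean(axis=0)
        # Full interaction lets the outcome model differ between treated and controls
        cols += [xc, d[:, None] * xc]
    design = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(design, dy, rcond=None)
    return beta[1]
```

Without covariates this reduces to the difference in mean transformed outcomes between treated and control units; for instance, `ra_att([1, 2, 3, 10, 12], [0, 0, 0, 1, 1])` gives 9.0.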
The implementation lives entirely within `diff_diff/lwdid*.py` (9 modules) and introduces zero new runtime dependencies: it reuses the existing `solve_ols`, `solve_logit`, and `safe_inference` infrastructure. The IPW and IPWRA standard errors use the full semiparametric influence function with propensity-score and outcome-model correction terms, matching the variance formula in the authors' Stata package.

Numerical correctness has been validated against the reference lwdid Python package across all supported configurations. The RA path achieves machine-precision agreement (≤1e-10), and the IPW/IPWRA paths agree to within 1%. The California Proposition 99 results from Table 3 of LW (2026) are reproduced exactly: demeaning ATT = −0.422, detrending ATT = −0.227.
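For readers less familiar with the IPW path, here is a minimal sketch of one common form of the IPW point estimate for the ATT on the transformed cross-section: control units are reweighted by the propensity odds p(x) / (1 − p(x)) and the weights are normalized (Hájek-style). It assumes the propensity scores are already fitted and passed in (the PR fits them with `solve_logit`; this sketch does not call it). The function `ipw_att` is hypothetical, is not claimed to be the exact weighting used in the PR, and omits the influence-function standard errors described above.

```python
import numpy as np


def ipw_att(dy, d, ps):
    """Normalized IPW estimate of the ATT (point estimate only).

    dy: (n,) transformed outcome per unit
    d:  (n,) 0/1 treatment indicator
    ps: (n,) estimated propensity scores P(D = 1 | X), strictly between 0 and 1
    """
    dy, d, ps = (np.asarray(a, dtype=float) for a in (dy, d, ps))
    treated = d == 1
    # Reweight controls by the odds p / (1 - p) so they resemble the treated group
    w = ps[~treated] / (1.0 - ps[~treated])
    control_mean = np.sum(w * dy[~treated]) / np.sum(w)
    return dy[treated].mean() - control_mean
```

With constant propensity scores every control gets the same weight, so the estimate collapses to the simple difference in means, which is a quick sanity check against the RA path without covariates.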
Beyond core estimation, the PR includes wild cluster bootstrap inference (Rademacher/Mammen/Webb), Fisher randomization inference, parallel-trends pre-testing, sensitivity analysis, clustering-level diagnostics, and visualization methods. A tutorial notebook walks through the full workflow on the papers' empirical datasets (California smoking data, Castle Doctrine laws, Walmart county-level entry).
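Randomization inference is easy to illustrate on the transformed cross-section: reshuffle the treatment labels across units, recompute the statistic, and compare. The sketch below is a generic Fisher-style permutation test for a difference-in-means ATT, assuming one transformed outcome per unit. The function `ri_pvalue` and its `n_perm`/`seed` parameters are hypothetical and not the PR's interface; it uses random permutations rather than full enumeration of assignments.

```python
import numpy as np


def ri_pvalue(dy, d, n_perm=999, seed=0):
    """Two-sided randomization-inference p-value for a difference-in-means ATT.

    Returns (observed ATT, p-value).
    """
    dy = np.asarray(dy, dtype=float)
    d = np.asarray(d).astype(bool)
    rng = np.random.default_rng(seed)

    def att(mask):
        return dy[mask].mean() - dy[~mask].mean()

    observed = att(d)
    # Re-estimate the ATT under random reassignments of the treatment labels
    perm_stats = np.array([att(rng.permutation(d)) for _ in range(n_perm)])
    # Small tolerance so ties with the observed statistic count as "at least as extreme"
    extreme = np.sum(np.abs(perm_stats) >= np.abs(observed) - 1e-12)
    # Count the observed assignment as one draw, so the p-value is never exactly 0
    return observed, (extreme + 1) / (n_perm + 1)
```

Because the test conditions only on the observed outcomes and the number of treated units, it remains usable with very few treated units, which is the setting the transformation approach targets.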
Methodology references (required if estimator / math changes)
Validation
docs/tutorials/26_lwdid.ipynb) reproduces Tables 3–4 from LW (2026) on California Proposition 99 data. ATT values match published results to within 0.04%.

Security / privacy