SpilloverDiD: ring-indicator spillover-aware DiD (Butts 2021) by igerber · Pull Request #446 · igerber/diff-diff

igerber · 2026-05-15T21:21:28Z

Summary

New standalone estimator SpilloverDiD (diff_diff/spillover.py) implementing two-stage Gardner DiD with ring-indicator covariates that identify both the direct effect on treated units (tau_total) and per-ring spillover effects on near-control units (delta_j). Handles non-staggered and Section 5 staggered timing in a single estimator.

Methodology references

Method: Butts (2021/2023) ring-indicator spillover-aware DiD; Gardner (2022) two-stage residualize-then-fit; reuses the Conley (1999) spatial-HAC vcov shipped in 3.3.3.
Paper / source links:
- Butts, K. (2023). Difference-in-Differences with Spatial Spillovers. arXiv:2105.03737v3 — https://arxiv.org/abs/2105.03737
- Gardner, J. (2022). Two-stage differences in differences. arXiv:2207.05943 — https://arxiv.org/abs/2207.05943
- Conley, T. G. (1999). GMM Estimation with Cross-Sectional Dependence. Journal of Econometrics 92(1), 1-45.
Intentional deviations from source:
- Stage-2 regressor uses the time-varying (1 - D_it) * Ring_{it,j} form (paper page 12's S_it = S_i * 1{t >= t_treat} notation; Section 5 Table 2's S^k_{it} / Ring^k_{it,j}). The literal unit-static reading of Equation 5 is algebraically rank-deficient under TWFE; only the time-varying form supports the paper's identification (Proposition 2.3). Documented in docs/methodology/REGISTRY.md § SpilloverDiD.
- Stage-1 subsample uses Butts' stricter Omega_0 = {D_it = 0 AND S_it = 0} (untreated AND unexposed) rather than TwoStageDiD's {D_it = 0} (untreated only). Prevents spillover-contaminated near-controls from biasing the time FE.
- Omega_0 row-level identification check: period strict (hard ValueError — dropping a period removes all units' cross-time identification), unit warn-and-drop (mirrors TwoStageDiD's always-treated convention; the downstream finite-mask path excludes the affected rows from stage 2).
- Gardner GMM first-stage uncertainty correction is NOT applied at stage 2 in this PR (planned follow-up extends two_stage.py::_compute_gmm_variance). Documented in REGISTRY + TODO.md.
- Far-away control identification uses current-period untreated status (D_it = 0) rather than never-treated-only, so all-eventually-treated staggered designs can identify the counterfactual via not-yet-treated far-away rows.
No R parity benchmark: did2s implements Gardner two-stage without rings; no published R/Stata software implements the Butts ring estimator. Correctness anchored on (a) 20-seed deterministic regression test pinning SpilloverDiD.att against direct single-stage TWFE ring regression at atol=1e-10 (the Gardner identity equivalence for non-staggered timing — empirically bit-identical, so the reported non-staggered tau_total IS the Butts Eqs. 4-6 estimator), (b) 50-seed Monte Carlo identification recovery on synthetic Butts-Assumption-satisfying DGPs (+ 200-seed @pytest.mark.slow variant), and (c) Conley sparse-vs-dense parity inherited from the 3.3.3 release.

Validation

Tests added (139 total in tests/test_spillover.py):
- Ring-construction primitives (pairwise haversine/euclidean, callable metrics, static + staggered nearest-treated distance, sparse kd-tree path)
- Validators (rings sort/non-negative/start-at-0, d_bar consistency, treatment XOR first_treat, exact {0,1} treatment, NaN rejection on cluster/unit/time/first_treat/treatment, balanced panel, duplicate cells, non-absorbing treatment, conley_coords within-unit-constant, callable metric self-distance contract, hc2/hc2_bm rejected, NaN in outcome rejected, mixed-encoding time collapse caught)
- Fit integration (non-staggered + staggered, raw-data invariant, Conley wiring, baseline-treated unit recognition + warn-and-drop, partial-unsupported-units warn-and-drop, period-level hard ValueError)
- Identification (50-seed MC recovery of tau_total and delta_j; 200-seed slow variant)
- Gardner identity (20-seed bit-identity vs single-stage TWFE ring regression at atol=1e-10)
- Coefficients-vs-vcov column alignment + ATT alias
DGP factories (tests/_dgp_utils.py): generate_butts_nonstaggered_dgp / generate_butts_staggered_dgp satisfy Butts Assumptions 1/3/5/7 by construction.
No tutorial in this PR: T22 (Butts TVA tutorial) is deferred to a follow-up notebook PR per user-confirmed Wave B scope.
Documentation: REGISTRY section adjacent to ConleySpatialHAC, docs/api/spillover.rst, diff_diff/guides/llms.txt + llms-full.txt, docs/references.rst, docs/doc-deps.yaml, README catalog entry, TODO.md rows for deferred follow-ups.

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

igerber · 2026-05-15T21:44:59Z

/ai-review

igerber · 2026-05-15T21:51:17Z

/ai-review

github-actions · 2026-05-15T21:57:35Z

🔁 AI review rerun (requested by @igerber)

Head SHA: fc48d69a654d4206da0cf3e5312d2a61ad67d321

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

Cross-checking against docs/methodology/REGISTRY.md did not surface an undocumented methodology deviation. The PR’s main departures from a literal Butts/Gardner implementation are explicitly documented in docs/methodology/REGISTRY.md:L2952-L2985.
The new estimator avoids the project’s inference anti-patterns: ATT and per-ring outputs both route through safe_inference() rather than inline t_stat = coef / se logic (diff_diff/spillover.py:L1745-L1757).
Input validation is strong on the edge cases called out in prior reviews: duplicate cells, unbalanced panels, NaN treatment/IDs/clusters, reversible treatment, unsupported hc2/hc2_bm, and missing far-away controls (diff_diff/spillover.py:L947-L1284).
Minor follow-ups only: SE extraction should clamp negative vcov diagonals for consistency, and the staggered-distance helper still has a small avoidable performance hotspot.
I could not execute the tests in this sandbox because the available Python environment does not have numpy/pandas; this is a static review only.

Methodology

P3 — Informational: the material methodology deviations are documented rather than hidden: time-varying ring regressor, stricter Omega_0, unit warn-and-drop, and omission of Gardner’s stage-1 GMM correction (docs/methodology/REGISTRY.md:L2952-L2985, TODO.md:L121-L127). Impact: no approval blocker under the repo policy; users still need to treat current analytical SEs as downward-biased until the tracked follow-up lands. Concrete fix: none required for this PR.

Code Quality

P3 — SpilloverDiD.fit() takes np.sqrt(vcov[i, i]) directly when building ATT and per-ring SEs (diff_diff/spillover.py:L1740-L1757). Sibling estimators usually clamp max(var, 0.0) first (diff_diff/two_stage.py:L1183-L1183, diff_diff/estimators.py:L606-L606, diff_diff/stacked_did.py:L515-L515). Impact: a numerically tiny negative diagonal from an indefinite Conley or near-singular sandwich can blank out the entire inference row with NaN instead of preserving a stable zero-floor SE. Concrete fix: wrap each diagonal in max(v, 0.0) before sqrt.

Performance

P3 — The staggered distance helper recomputes row_pos inside every cohort loop (diff_diff/spillover.py:L400-L425). Impact: avoidable extra O(n_rows × n_cohorts) Python work on already-dense staggered fits. Concrete fix: precompute row_pos once before the for onset in unique_onsets loop and reuse it.

Maintainability

No findings.

Tech Debt

P3 — The staggered helper says the sparse kd-tree branch is “queued as a follow-up — see TODO.md” (diff_diff/spillover.py:L365-L369), but the new spillover TODO block does not actually track that item (TODO.md:L121-L127). Impact: the deferred scaling work is easy to lose from the project’s review backlog. Concrete fix: add a TODO.md row for the sparse staggered ring-distance path, or remove the explicit TODO cross-reference from the code/docs.

Security

No findings.

Documentation/Tests

No findings on the diff itself. The new registry entry and the spillover test surface are both thorough (docs/methodology/REGISTRY.md:L2940-L3012, tests/test_spillover.py). Review limitation: I was not able to run the tests in this sandbox because the Python environment here is missing numpy/pandas.

github-actions · 2026-05-15T22:13:42Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 6f3ade490449ab5cec7ae2250ad0e6249a526ea4

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

Previous AI re-review items were addressed: the stage-2 SE path now clamps negative vcov diagonals before sqrt, the staggered helper precomputes row_pos once, and the sparse staggered path is now explicitly tracked in TODO.md (diff_diff/spillover.py:L400-L402, diff_diff/spillover.py:L1742-L1764, TODO.md:L121-L128).
At a high level, the estimator matches the cited Butts + Gardner synthesis: Butts frames the DiD target as jointly identifying direct and spillover effects under spatial spillovers, and Gardner supplies the two-stage DiD backbone reused here. (arxiv.org)
The major implementation departures from a literal paper reading are documented in the Methodology Registry, so under repo policy they are informational rather than blocking (docs/methodology/REGISTRY.md:L2952-L3010, TODO.md:L121-L128).
I did not find a new P0/P1 issue in control-group composition, NaN-safe inference, or public-parameter propagation on the shipped surface.
Remaining issues are minor: one registry gap for the public anticipation extension, one constructor-validation consistency issue, and one validation overclaim in REGISTRY.md.
Review limitation: I could not execute tests/test_spillover.py here because the sandbox Python environment lacks both numpy and pytest.

Methodology

No unmitigated source-material mismatch found. The time-varying ring regressor, stricter Omega_0, and omission of Gardner’s stage-1 GMM correction are explicitly recorded in the registry and TODO, so they are non-blocking per repo policy. At a high level, that is consistent with the cited Butts/Gardner setup. (arxiv.org)

P3 Impact: anticipation is a public methodology-affecting parameter, and the implementation clearly shifts both treatment timing and spillover timing with it, but the SpilloverDiD section of the Methodology Registry does not currently document that non-default estimand. Concrete fix: add a short **Note:** to docs/methodology/REGISTRY.md:L2940-L3010 mirroring the semantics already implemented in diff_diff/spillover.py:L850-L854 and diff_diff/spillover.py:L1441-L1544.

Code Quality

P3 Impact: SpilloverDiD.__init__ does not validate rank_deficient_action, unlike peer estimators, so a bad value fails later inside solve_ols instead of at the API boundary. Concrete fix: mirror the constructor guard used in diff_diff/two_stage.py:L137-L153 and diff_diff/stacked_did.py:L130-L148 inside diff_diff/spillover.py:L885-L916.

Performance

No findings. The remaining spillover-specific hotspot, the dense staggered nearest-treated-distance path, is now properly tracked in TODO.md:L128-L128.

Maintainability

No findings.

Tech Debt

P3 Impact: remaining deferrals for Gardner GMM correction, event-study mode, survey integration, and the sparse staggered path are properly tracked, so they are acceptable deferred work rather than blockers for this PR. Concrete fix: none required in this PR (TODO.md:L121-L128).

Security

No findings.

Documentation/Tests

P3 Impact: docs/methodology/REGISTRY.md:L2972-L2972 says correctness is anchored on a “reduce-to-TWFE / reduce-to-TwoStageDiD” limit, but the changed spillover tests only pin Monte Carlo recovery and single-stage FE identity; I do not see a TwoStageDiD reduction test in tests/test_spillover.py:L757-L806 or tests/test_spillover.py:L2271-L2420. Concrete fix: either add the missing TwoStageDiD reduction regression test or narrow the registry wording to the validations that actually shipped.

github-actions · 2026-05-15T22:40:43Z

🔁 AI review rerun (requested by @igerber)

Head SHA: b23aca4075acfc947ea3b0496521f190bca980c0

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

Previous re-review items are addressed: anticipation is now documented and serialized on results, rank_deficient_action is validated in __init__, and the registry no longer claims a shipped reduce-to-TwoStageDiD test (docs/methodology/REGISTRY.md:2972-2974, diff_diff/spillover.py:885-906, diff_diff/results.py:445-462).
Methodology alignment is acceptable: Butts’ spillover setup uses untreated-and-unexposed rows for the first-stage residualization in the staggered case, and Gardner’s two-stage framework requires inference that accounts for the generated first-stage regressor. The PR’s omission of that GMM correction is explicitly documented and tracked, so it is informational rather than blocking. citeturn3view0turn3view1turn5view0
I did not find a new control-group, NaN-inference, or parameter-propagation issue in the changed spillover code.
Remaining issues are documentation-only: one registry note still overstates the shipped validation surface, and one public guide still advertises covariates without warning that the estimator rejects it.
Review limitation: I could not execute tests/test_spillover.py here because the sandbox lacks both pytest and numpy.

Methodology

P3 Impact: Stage-2 SEs still omit Gardner’s first-stage uncertainty correction, which would be a methodology defect if undocumented. Here it is explicitly disclosed in the registry and tracked in TODO.md, so under repo policy it is non-blocking informational debt. Concrete fix: none required in this PR; keep the existing follow-up entry at TODO.md:121. citeturn3view0turn5view0
No unmitigated P0/P1 findings.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

P3 Impact: the deferred spillover items that remain after this PR are properly tracked in TODO.md (GMM correction, event_study, survey_design, count rings, data-driven d_bar, sparse staggered distance), so they are acceptable deferred work rather than blockers. Concrete fix: none required in this PR (TODO.md:121-128).

Security

No findings.

Documentation/Tests

P3 Impact: REGISTRY.md says the synthetic-DGP Monte Carlo anchor includes both 50-seed and 200-seed recovery of tau_total and delta_1, but the shipped tests only pin delta_1 on the non-staggered DGP; the staggered MC currently checks tau_total only. That overstates the validation actually merged. Concrete fix: either add staggered delta_1 recovery tests (and a slow variant) or narrow the registry note to the validation that is present (docs/methodology/REGISTRY.md:2974, tests/test_spillover.py:764-810, tests/test_spillover.py:812-822).
P3 Impact: the LLM guide exposes fit(..., covariates=...) in the public signature but does not list that covariates currently raises NotImplementedError, even though the estimator rejects it in code. That makes the documented surface broader than the real one. Concrete fix: add an explicit covariates= limitation bullet in the spillover guide section, at minimum in diff_diff/guides/llms-full.txt (diff_diff/guides/llms-full.txt:488-507, diff_diff/spillover.py:1353-1363).

github-actions · 2026-05-15T22:55:16Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 9581b21cb586ab8a9e9a108d75baa2db4f4d7b51

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

Previous re-review items are mostly addressed: anticipation is now carried on results serialization, rank_deficient_action is validated at construction, and the public LLM guide now explicitly says covariates= is unsupported. See diff_diff/results.py:L402-L462, diff_diff/spillover.py:L885-L906, and diff_diff/guides/llms-full.txt:L501-L508.
Methodology alignment is acceptable against the registry: the time-varying ring regressor, stricter Omega_0, current-period untreated far-away logic, and omitted Gardner GMM correction are all explicitly documented or tracked, so they are non-blocking in this PR. See docs/methodology/REGISTRY.md:L2952-L3012 and TODO.md:L121-L128.
I did not find a new control-group, NaN-inference, parameter-propagation, or undocumented methodology defect in the changed estimator code.
One previous P3 documentation issue remains in narrowed form: the registry/changelog still overstate the merged staggered Monte Carlo coverage.
Review limitation: I could not execute tests/test_spillover.py here because this sandbox does not have numpy, pandas, scipy, or pytest.

Methodology

P3 Impact: Stage-2 SEs still omit Gardner’s first-stage GMM uncertainty correction, but that deviation is explicitly documented in the methodology registry and tracked in TODO.md, so under repo policy it is informational rather than blocking. Concrete fix: none required in this PR; keep the existing follow-up tracked in docs/methodology/REGISTRY.md:L2985-L3012 and TODO.md:L121-L128.
No unmitigated P0/P1 findings.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

P3 Impact: The main deferred spillover items (event_study, survey_design, ring_method="count", data-driven d_bar, sparse staggered distance, GMM correction) are properly tracked in TODO.md, so they are acceptable deferrals rather than blockers. Concrete fix: none required in this PR. See TODO.md:L121-L128.

Security

No findings.

Documentation/Tests

P3 Impact: The validation narrative still overstates the shipped staggered Monte Carlo coverage. REGISTRY.md says the staggered anchor is a 50-seed default plus a 200-seed slow variant, and the changelog summarizes the identification MC in the same way, but the merged test file only contains a 30-seed staggered tau_total test and no slow staggered companion. Concrete fix: either add the promised staggered 50-seed and 200-seed tests, or narrow the docs/changelog to the 30-seed staggered tau_total test that actually exists. See docs/methodology/REGISTRY.md:L2974-L2974, CHANGELOG.md:L11-L11, tests/test_spillover.py:L796-L810, and tests/test_spillover.py:L812-L825.

github-actions · 2026-05-15T23:17:24Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 23cdf0ef20c06416e512132595d9b2be6c53b485

Overall Assessment

⚠️ Needs changes — one unmitigated P1 remains in the new SpilloverDiD identification path.

Executive Summary

Previous re-review items look addressed: anticipation now round-trips on results serialization, rank_deficient_action is validated at construction, and the docs now match the shipped staggered MC coverage.
The estimator matches the documented registry deviations on the time-varying ring regressor, stricter Omega_0, current-period untreated far-away controls, and the omitted Gardner GMM correction.
P1: SpilloverDiD checks balancedness plus per-unit/per-period Omega_0 coverage, but it never checks whether the Omega_0 = {D_it = 0, S_it = 0} unit-time graph is actually connected. Some balanced panels will therefore pass validation even though stage-1 FE are not jointly identified, and tau_total / delta_j can be silently distorted.
The missing Gardner GMM correction remains P3 informational because it is explicitly documented in the methodology registry and tracked in TODO.md.
Review limitation: I could not execute the test suite here because pytest, numpy, pandas, and scipy are not installed in this environment.

Methodology

Severity P1
Impact: SpilloverDiD.fit() treats a balanced panel plus per-unit/per-period Omega_0 presence as sufficient support for the stage-1 FE model (diff_diff/spillover.py:1104, diff_diff/spillover.py:1563). That proxy is not enough. A balanced panel can still have a disconnected Omega_0 bipartite graph, in which case _iterative_fe_subset() only identifies FE up to component-specific constants while _residualize_butts() later combines unit FE from one component with time FE from another. That silently changes the residualized outcome used to estimate both tau_total and delta_j. The public docs currently describe exact graph-connectivity as a future relaxation of the balanced-panel gate (docs/api/spillover.rst:204, docs/methodology/REGISTRY.md:3009), but for this Gardner/Butts stage-1 design it is the actual live identification condition.
Concrete fix: after building omega_0_mask, construct the unit-time support graph on Omega_0 rows and raise ValueError unless all stage-2-supported rows lie in a single connected component. Add a regression test for a balanced-but-disconnected Omega_0 panel.
Severity P3
Impact: Stage-2 SEs still omit Gardner’s first-stage GMM uncertainty correction, but this deviation is explicitly documented and tracked (docs/methodology/REGISTRY.md:2987, TODO.md:121). Under repo policy this is informational, not blocking.
Concrete fix: none required in this PR.

Code Quality

No additional findings.

Performance

No additional findings.

Maintainability

No additional findings.

Tech Debt

Severity P3
Impact: The remaining spillover follow-ups (event_study, survey_design, ring_method="count", data-driven d_bar, sparse staggered distance path, GMM correction) are properly tracked in TODO.md (TODO.md:121).
Concrete fix: none required in this PR.

Security

No findings.

Documentation/Tests

Severity P2
Impact: The current tests cover missing-period Omega_0 support and unbalanced panels (tests/test_spillover.py:1371, tests/test_spillover.py:2004), but they do not cover the balanced-but-disconnected Omega_0 case behind the P1 above. That leaves the identification bug unguarded in CI.
Concrete fix: add a fixture where every unit and every period has at least one Omega_0 row, yet the Omega_0 graph splits into two disconnected components, and assert that fit() raises with a connectivity-specific error.

Path to Approval

Add an explicit connected-component check for the Omega_0 unit-time graph in SpilloverDiD.fit() and fail fast on disconnected support.
Add a regression test for a balanced panel that passes the current unit/period/balanced-panel guards but still has disconnected Omega_0 support.
Update docs/api/spillover.rst, docs/methodology/REGISTRY.md, and the [Unreleased] changelog entry so they state the actual identification requirement: balancedness alone is not sufficient; connected Omega_0 support is required.

github-actions · 2026-05-15T23:39:00Z

🔁 AI review rerun (requested by @igerber)

Head SHA: c1ba56427cfc4592db51e087862356a13d6e5b8a

Overall Assessment

✅ Looks good — the prior P1 on disconnected Omega_0 support is resolved, and I did not find any new unmitigated P0/P1 issues in the changed spillover estimator path.

Executive Summary

The previous blocker is addressed: SpilloverDiD.fit() now explicitly checks connectedness of the supported Omega_0 unit-period graph before stage 1, and the failure mode is documented and tested (diff_diff/spillover.py:L682-L778, diff_diff/spillover.py:L1660-L1739, docs/methodology/REGISTRY.md:L2966-L2968, tests/test_spillover.py:L2101-L2208).
The implemented estimator structure matches the cited methodology: Butts’ staggered procedure estimates unit/time effects on untreated-and-unexposed observations and then regresses the residualized outcome on treatment and spillover dummies; Gardner likewise treats second-stage inference as generated-regressor inference. (kylebutts.com)
The remaining omission of Gardner’s first-stage GMM variance correction is still present, but it is explicitly documented in the Methodology Registry/API docs and tracked in TODO.md, so under the project rubric it is P3 informational rather than blocking (docs/methodology/REGISTRY.md:L2985-L3012, docs/api/spillover.rst:L177-L203, TODO.md:L121-L128). (kylebutts.com)
I did not identify a new silent correctness bug in control-group construction, ring weighting, NaN/inference handling, or parameter propagation in the changed files.
Review limitation: I could not execute pytest here because pytest is not installed in this environment.

Methodology

Severity P3 | Impact: Stage-2 SEs still omit Gardner’s first-stage GMM correction. That matters for inference in principle, but this deviation is explicitly documented and tracked, so it is mitigated under the review rubric rather than blocking (docs/methodology/REGISTRY.md:L2985-L3012, docs/api/spillover.rst:L177-L203, TODO.md:L121-L128). (kylebutts.com) Concrete fix: None required in this PR; follow-up already tracked.
No unmitigated methodology findings. The prior connectivity issue is now handled explicitly in code and covered by direct tests (diff_diff/spillover.py:L682-L778, diff_diff/spillover.py:L1660-L1739, tests/test_spillover.py:L2101-L2208).

Code Quality

No findings.

Performance

Severity P3 | Impact: _compute_nearest_treated_distance_staggered() still uses the dense per-cohort distance path, so large staggered panels may pay avoidable memory/runtime cost; this is already tracked in TODO.md:L128-L128. Concrete fix: None required in this PR; follow-up already tracked.

Maintainability

No findings.

Tech Debt

Severity P3 | Impact: The remaining Wave B spillover follow-ups (event_study, survey_design, count-based rings, data-driven d_bar, GMM correction, sparse staggered distance path, first-class Conley in TwoStageDiD) are explicitly tracked in TODO.md:L121-L128, so they are deferred work rather than blockers. Concrete fix: None required in this PR.

Security

No findings.

Documentation/Tests

No unmitigated findings. The public docs now describe the connectivity requirement and the period-strict/unit-warn-drop behavior, and the new tests cover the connectivity helper directly (docs/api/spillover.rst:L204-L230, docs/methodology/REGISTRY.md:L2966-L2974, tests/test_spillover.py:L2101-L2208). Review limitation: I could not run the suite locally because pytest is unavailable.

New standalone estimator at `diff_diff/spillover.py` implementing two-stage Gardner (2022) DiD with ring-indicator covariates that identify, alongside the direct effect on treated (`tau_total`), per-ring spillover effects on near-control units (`delta_j`). Reference: Butts, K. (2023, originally 2021) "Difference-in-Differences with Spatial Spillovers" arXiv:2105.03737v3; Gardner, J. (2022) "Two-stage differences in differences" arXiv:2207.05943. Handles panel non-staggered (paper Eqs 5/6/8) and Section 5 staggered timing in one estimator. Stage-2 regressor uses the time-varying `(1 - D_it) * Ring_{it,j}` form. Stage-1 subsample is Butts' STRICTER `Omega_0 = {D_it = 0 AND S_it = 0}` (untreated AND unexposed). Identification-check policy: period strict, unit warn-and-drop, plus connected-component check on the Omega_0 bipartite graph. SE clamps negative vcov diagonals before sqrt (sibling-estimator convention). `coefficients` exposes all (1+K) stage-2 entries keyed to vcov columns plus an "ATT" alias. `rank_deficient_action` validated at `__init__`. Variance: stage-2 OLS via `solve_ols` (HC1 / Conley / cluster). Gardner GMM first-stage uncertainty correction NOT applied in this PR (documented limitation; tracked in TODO). Deferred features (planned follow-ups, all in TODO): `event_study=True` per-event-time × ring coefficients, `survey_design=` integration, `ring_method="count"`, data-driven `d_bar`, GMM correction at stage 2, sparse staggered ring-distance path, TwoStageDiD-Conley first-class. Tests: `tests/test_spillover.py` (153 tests) + DGP factories at `tests/_dgp_utils.py`. Includes 20-seed Gardner identity bit-identity test (`SpilloverDiD.att` matches single-stage TWFE ring regression at `atol=1e-10` on non-staggered DGPs — the reported non-staggered `tau_total` IS the Butts Eqs. 4-6 estimator). Non-staggered MC at 50 seeds + 200-seed slow variant recovers both `tau_total` and `delta_1`; staggered MC at 30 seeds anchors `tau_total` only. Docs: REGISTRY section, API rst, `llms.txt` + `llms-full.txt`, README catalog entry, references, `doc-deps.yaml`, TODO follow-up rows. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-16T10:42:10Z

🔁 AI review rerun (requested by @igerber)

Head SHA: ced54b3d8dae0c9a5c0945c66ba8c9efce93a727

Overall Assessment

⚠️ Needs changes — [Newly identified] the new anticipation parameter changes identification-relevant timing in SpilloverDiD, but the added test suite never checks its behavior on either fit path. Under the stated review rubric, that missing parameter-interaction coverage is a P1.

Executive Summary

The previous blocker remains resolved: the Omega_0 connectivity check is still enforced in diff_diff/spillover.py:L1721-L1739 and covered by dedicated tests in tests/test_spillover.py:L2106-L2213.
I did not find an undocumented methodology mismatch in the estimator itself; the remaining first-stage GMM variance omission is explicitly documented and tracked (docs/methodology/REGISTRY.md:L2985-L3012, docs/api/spillover.rst:L171-L188, TODO.md:L135-L142).
Severity P1 | [Newly identified] anticipation is estimand-defining in both code and docs (diff_diff/spillover.py:L1545-L1626, docs/methodology/REGISTRY.md:L2972-L2974), but tests only cover round-tripping and invalid-value rejection (tests/test_spillover.py:L1837-L1855, tests/test_spillover.py:L2221-L2235).
Severity P2 | The staggered identification coverage only anchors tau_total, not the staggered spillover coefficients (tests/test_spillover.py:L813-L826 vs. the non-staggered tau_total/delta_1 coverage at tests/test_spillover.py:L765-L811).
Review limitation: I could not run pytest or import the package here because this environment does not have pytest or numpy installed.

Methodology

No unmitigated P0/P1 methodology findings in the estimator logic itself. The implemented shape is consistent with the cited two-stage setup: Gardner’s first stage identifies unit/period effects from untreated observations, and Butts’ spillover estimator uses untreated-and-unexposed observations before second-stage spillover regression. (arxiv.org)
Severity P3 | Impact: Stage-2 SEs still omit Gardner’s first-stage GMM correction, so reported SEs/CIs/p-values can be too optimistic. This is explicitly documented and TODO-tracked, so it is mitigated under the rubric. Concrete fix: none required in this PR; keep the tracked follow-up as-is (docs/methodology/REGISTRY.md:L2985-L3012, docs/api/spillover.rst:L171-L188, TODO.md:L135-L142).

Code Quality

No findings.

Performance

Severity P3 | Impact: _compute_nearest_treated_distance_staggered() still always uses dense per-cohort distance matrices, so large staggered panels will pay avoidable memory/runtime cost. This is already tracked. Concrete fix: none required in this PR; follow the existing sparse-helper TODO (diff_diff/spillover.py:L365-L369, TODO.md:L142-L142).

Maintainability

No findings.

Tech Debt

No new blocker beyond the spillover follow-ups already tracked in TODO.md:L135-L142.

Security

No findings.

Documentation/Tests

Severity P1 | Impact: [Newly identified] anticipation shifts both the treatment clock and the ring-exposure clock (diff_diff/spillover.py:L1545-L1626, docs/methodology/REGISTRY.md:L2972-L2974), which changes Omega_0 membership and therefore the estimand. The added tests never assert that behavior on either treatment= or first_treat= paths; they only check serialization and invalid-value rejection (tests/test_spillover.py:L1837-L1855, tests/test_spillover.py:L2221-L2235). A regression here could silently move rows in or out of Omega_0 and change tau_total/delta_j without failing CI. Concrete fix: add behavior-level anticipation tests on both fit entry paths that assert the shifted D_it/S_it/Omega_0 membership on a hand-built panel.
Severity P2 | Impact: The new staggered spillover path is only numerically anchored on tau_total; there is no staggered recovery test for any delta_j. Current coverage at tests/test_spillover.py:L813-L826 would not catch a staggered-only ring-assignment or spillover-coefficient regression that leaves tau_total roughly unchanged. Concrete fix: add at least one staggered DGP test that asserts delta_1 recovery, ideally with a slower higher-seed variant analogous to tests/test_spillover.py:L765-L811.

Path to Approval

Add a non-staggered anticipation=1 regression test on the treatment= path that explicitly checks the anticipation window changes D_it and Omega_0 membership as documented.
Add a staggered anticipation=1 regression test on the first_treat= path that checks the ring-exposure clock shifts with the treatment clock and preserves the intended far-away controls.
Add a staggered spillover-coefficient accuracy anchor for at least delta_1, so the staggered ring path is directly regression-tested, not only tau_total.

…overy Two test-coverage gaps surfaced on the rebased SHA: 1. `anticipation` was previously only round-tripped on results and rejected for invalid values, with no behavior-level test asserting that it actually shifts D_it / Omega_0 membership / the ring-exposure clock. A regression there could silently move rows in/out of stage 1 and change tau_total / delta_j without failing CI. 2. Staggered MC anchored only `tau_total`, not `delta_1`. A staggered- only ring-assignment or spillover-coefficient regression with roughly- unchanged tau_total wouldn't be caught. Fixes: - tests/test_spillover.py: new `TestSpilloverDiDAnticipationBehavior` class (3 tests) with a hand-built 4-period panel (1 treated @ t=2, 1 near-control, 2 far-controls). Asserts on both `treatment=` and `first_treat=` paths that anticipation=1 (vs 0) yields n_treated+1, smaller stage1_n_obs, and a different att. Plus a parity check that the two fit paths produce identical results under the same anticipation setting (entry points are internally unified). - tests/test_spillover.py: `TestSpilloverDiDIdentification::test_staggered_recovers_tau_total` renamed to `..._tau_total_and_delta_1` and extended to also assert staggered delta_1 recovery within 0.03 absolute tolerance (mean over 30 seeds: -0.0398, std 0.0083). Per-event-time `delta_jk` decomposition on staggered DGPs is still queued alongside event_study=True support. - docs/methodology/REGISTRY.md, CHANGELOG.md: narrow the staggered MC anchor language to match (tau_total + delta_1, not tau_total only); bump test count 153 -> 156. All 156 tests pass; black + ruff clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-16T11:00:20Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 1a71eaf59506e2551d4ae19b7dc4ac557a1eb942

Overall Assessment

✅ Looks good — the prior P1 on anticipation interaction coverage is resolved, the prior staggered-delta_1 test gap is resolved, and I did not find any new unmitigated P0/P1 issues in the changed spillover estimator.

Executive Summary

Previous P1 resolved: anticipation now shifts both the treatment clock and the exposure clock in diff_diff/spillover.py lines 1545-1657, and that behavior is tested on both treatment= and first_treat= paths in tests/test_spillover.py lines 1907-2001.
Previous P2 resolved: staggered Monte Carlo coverage now anchors both tau_total and delta_1 in tests/test_spillover.py lines 813-839.
I did not find an undocumented methodology mismatch in the estimator shape itself. The ring-based spillover setup, staggered extension, and Gardner-style untreated first stage are coherent with the cited papers’ framing. (arxiv.org)
Severity P3 | mitigated: stage-2 SEs still omit the Gardner first-stage GMM correction, but that limitation is explicitly documented in docs/methodology/REGISTRY.md lines 2985-2987 and tracked in TODO.md lines 135-142.
Severity P3: the Conley test remains only a smoke test; tests/test_spillover.py lines 869-894 never assert a Conley-specific numeric invariant.
Review limitation: I could not run pytest or import the package here because this environment is missing pytest and core runtime deps like pandas.

Methodology

No unmitigated P0/P1 findings. The implemented estimator follows the registry’s committed contract: time-varying ring regressors, a spillover-clean Omega_0, and Gardner-style first-stage residualization on untreated observations; that is consistent with Butts’s spillover-aware DiD framing and Gardner’s two-stage setup. diff_diff/spillover.py lines 17-25, diff_diff/spillover.py lines 1651-1657, docs/methodology/REGISTRY.md lines 2952-2974. (arxiv.org)
Severity P3 | Impact: stage-2 inference is still optimistic relative to the full Butts+Gardner variance composition because the first-stage GMM correction is omitted. This is mitigated by explicit registry/TODO tracking, so it is not a blocker. Concrete fix: none required in this PR; keep the existing follow-up to extend two_stage.py::_compute_gmm_variance. docs/methodology/REGISTRY.md, TODO.md

Code Quality

No findings.

Performance

Severity P3 | Impact: _compute_nearest_treated_distance_staggered() still uses dense per-cohort pairwise distance matrices, so large staggered panels will incur avoidable memory/runtime cost. This is already documented and tracked, so it is mitigated. Concrete fix: none required in this PR; implement the queued sparse cKDTree branch. diff_diff/spillover.py lines 333-369, TODO.md

Maintainability

No findings.

Tech Debt

No new untracked tech debt findings. The remaining spillover follow-ups are properly registered in TODO.md lines 135-142.

Security

No findings.

Documentation/Tests

Severity P3 | Impact: test_conley_se_differs_from_hc1 does not actually verify a Conley-specific result; it only checks finite output and ATT invariance. A regression that silently fell back to HC1 could still pass. Concrete fix: compare result_conley.se/vcov against a direct solve_ols(..., vcov_type="conley") reference on a fixed fixture, or use a spatially correlated fixture where HC1 and Conley must differ. tests/test_spillover.py lines 869-894
No remaining P1-level test gaps from the prior review: anticipation behavior is now covered on both fit paths, and staggered delta_1 recovery is now pinned. tests/test_spillover.py lines 813-839, tests/test_spillover.py lines 1907-2001

Round-8 CI review flagged `test_conley_se_differs_from_hc1` as a test- coverage gap: the name promises a Conley-specific assertion but the body only checked finite SE + ATT invariance. A silent fallback to HC1 (e.g., if SpilloverDiD ever stopped threading the Conley kwargs through to `solve_ols`) would have passed. Replace with: - `test_conley_kwargs_threaded_to_solve_ols`: patches `solve_ols` at the import site, captures the kwargs of the stage-2 invocation, and asserts they include `vcov_type="conley"`, `conley_cutoff_km=200.0`, `conley_metric="haversine"`, `conley_lag_cutoff=0`, plus fit-time-derived `conley_coords` / `conley_time` / `conley_unit` arrays of the right shape. Any silent fallback to HC1 fails this. - `test_conley_att_invariant_vs_hc1`: extracted from the old test — vcov choice does not change ATT (residualization + OLS fit are independent of variance). Now stands as a clean invariant rather than pretending to verify Conley-specific output. Also bumps CHANGELOG test count 156 -> 157 and updates the Conley description to "plumbing (verifies solve_ols is called with vcov_type= 'conley' + Conley kwargs, no silent HC1 fallback)". All 157 tests pass; black + ruff clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-16T11:13:53Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 2b86d8c051ecec73c43ae601e601fc213bec3b93

Overall Assessment

✅ Looks good — the prior re-review gaps are addressed, and I did not find any new unmitigated P0/P1 issues in the changed spillover estimator.

Executive Summary

The prior anticipation-propagation issue is resolved: anticipation now shifts both the treatment clock and the exposure clock through effective_onsets in diff_diff/spillover.py:L1545-L1657, and both treatment= and first_treat= entry paths are covered in tests/test_spillover.py:L1923-L2045.
The prior staggered-coverage gap is resolved: the new staggered Monte Carlo test now anchors both tau_total and delta_1 in tests/test_spillover.py:L813-L839.
The prior Conley test gap is resolved: the test now verifies that stage 2 actually calls solve_ols(..., vcov_type="conley", ...) with the Conley kwargs, preventing silent HC1 fallback (tests/test_spillover.py:L869-L925).
I did not find an undocumented methodology mismatch in the committed estimator contract. The implemented time-varying ring regressor, stricter Omega_0, period-strict/unit-warn-drop identification policy, and connectivity check are consistent with the contract documented in docs/methodology/REGISTRY.md:L2952-L2987 and implemented in diff_diff/spillover.py:L1545-L1739.
Severity P3 | informational: stage-2 SEs still omit the Gardner first-stage GMM correction, but that limitation is explicitly documented in docs/methodology/REGISTRY.md:L2985-L2987 and tracked in TODO.md:L135-L142, so it is mitigated and not blocking.

Methodology

Severity P3 | Impact: stage-2 inference remains optimistic relative to the full Butts+Gardner variance composition because the stage-1 GMM uncertainty correction is not yet applied. This is a documented limitation, not a defect, because it is explicitly called out in diff_diff/spillover.py:L1419-L1423, docs/methodology/REGISTRY.md:L2985-L2987, and TODO.md:L135-L142. Concrete fix: none required in this PR; keep the planned follow-up to extend two_stage.py::_compute_gmm_variance.
No unmitigated P0/P1 findings.

Code Quality

No findings.

Performance

Severity P3 | Impact: _compute_nearest_treated_distance_staggered() still uses dense per-cohort pairwise distance matrices, so large staggered panels will pay avoidable memory/runtime cost. This is already tracked in TODO.md:L142-L142, so it is mitigated. Concrete fix: add the queued sparse cKDTree branch for diff_diff/spillover.py:L333-L429.

Maintainability

No findings.

Tech Debt

No new untracked tech debt findings. The spillover follow-ups added in TODO.md:L135-L142 appropriately track the deferred variance, event-study, survey, ring-count, and sparse-distance work.

Security

No findings.

Documentation/Tests

No findings. The re-review gaps from the prior pass are now covered by explicit regression tests for staggered delta_1, anticipation on both fit paths, Conley plumbing, non-staggered FE identity, and the warn/drop vs hard-error identification split in tests/test_spillover.py:L813-L925, tests/test_spillover.py:L1732-L2045, and tests/test_spillover.py:L2712-L2761.

Review limitation: I could not execute the test suite in this environment because Python runtime dependencies such as numpy are unavailable, so the assessment is based on diff inspection and the added regression coverage.

igerber added ready-for-ci Triggers CI test workflows and removed ready-for-ci Triggers CI test workflows labels May 16, 2026

igerber force-pushed the spillover-conley-butts-wave-b branch from 4fda83c to ced54b3 Compare May 16, 2026 10:32

igerber added the ready-for-ci Triggers CI test workflows label May 16, 2026

igerber merged commit 71f84d0 into main May 16, 2026
33 of 34 checks passed

igerber deleted the spillover-conley-butts-wave-b branch May 16, 2026 12:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SpilloverDiD: ring-indicator spillover-aware DiD (Butts 2021)#446

SpilloverDiD: ring-indicator spillover-aware DiD (Butts 2021)#446
igerber merged 3 commits into
mainfrom
spillover-conley-butts-wave-b

igerber commented May 15, 2026

Uh oh!

igerber commented May 15, 2026

Uh oh!

igerber commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

github-actions Bot commented May 16, 2026

Uh oh!

github-actions Bot commented May 16, 2026

Uh oh!

github-actions Bot commented May 16, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented May 15, 2026

Summary

Methodology references

Validation

Security / privacy

Uh oh!

igerber commented May 15, 2026

Uh oh!

igerber commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

github-actions Bot commented May 16, 2026

Uh oh!

github-actions Bot commented May 16, 2026

Uh oh!

github-actions Bot commented May 16, 2026

Overall Assessment

Executive Summary

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant