Skip to content

SpilloverDiD: ring-indicator spillover-aware DiD (Butts 2021)#446

Merged
igerber merged 3 commits into
mainfrom
spillover-conley-butts-wave-b
May 16, 2026
Merged

SpilloverDiD: ring-indicator spillover-aware DiD (Butts 2021)#446
igerber merged 3 commits into
mainfrom
spillover-conley-butts-wave-b

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented May 15, 2026

Summary

  • New standalone estimator SpilloverDiD (diff_diff/spillover.py) implementing two-stage Gardner DiD with ring-indicator covariates that identify both the direct effect on treated units (tau_total) and per-ring spillover effects on near-control units (delta_j). Handles non-staggered and Section 5 staggered timing in a single estimator.

Methodology references

  • Method: Butts (2021/2023) ring-indicator spillover-aware DiD; Gardner (2022) two-stage residualize-then-fit; reuses the Conley (1999) spatial-HAC vcov shipped in 3.3.3.
  • Paper / source links:
    • Butts, K. (2023). Difference-in-Differences with Spatial Spillovers. arXiv:2105.03737v3https://arxiv.org/abs/2105.03737
    • Gardner, J. (2022). Two-stage differences in differences. arXiv:2207.05943https://arxiv.org/abs/2207.05943
    • Conley, T. G. (1999). GMM Estimation with Cross-Sectional Dependence. Journal of Econometrics 92(1), 1-45.
  • Intentional deviations from source:
    • Stage-2 regressor uses the time-varying (1 - D_it) * Ring_{it,j} form (paper page 12's S_it = S_i * 1{t >= t_treat} notation; Section 5 Table 2's S^k_{it} / Ring^k_{it,j}). The literal unit-static reading of Equation 5 is algebraically rank-deficient under TWFE; only the time-varying form supports the paper's identification (Proposition 2.3). Documented in docs/methodology/REGISTRY.md § SpilloverDiD.
    • Stage-1 subsample uses Butts' stricter Omega_0 = {D_it = 0 AND S_it = 0} (untreated AND unexposed) rather than TwoStageDiD's {D_it = 0} (untreated only). Prevents spillover-contaminated near-controls from biasing the time FE.
    • Omega_0 row-level identification check: period strict (hard ValueError — dropping a period removes all units' cross-time identification), unit warn-and-drop (mirrors TwoStageDiD's always-treated convention; the downstream finite-mask path excludes the affected rows from stage 2).
    • Gardner GMM first-stage uncertainty correction is NOT applied at stage 2 in this PR (planned follow-up extends two_stage.py::_compute_gmm_variance). Documented in REGISTRY + TODO.md.
    • Far-away control identification uses current-period untreated status (D_it = 0) rather than never-treated-only, so all-eventually-treated staggered designs can identify the counterfactual via not-yet-treated far-away rows.
  • No R parity benchmark: did2s implements Gardner two-stage without rings; no published R/Stata software implements the Butts ring estimator. Correctness anchored on (a) 20-seed deterministic regression test pinning SpilloverDiD.att against direct single-stage TWFE ring regression at atol=1e-10 (the Gardner identity equivalence for non-staggered timing — empirically bit-identical, so the reported non-staggered tau_total IS the Butts Eqs. 4-6 estimator), (b) 50-seed Monte Carlo identification recovery on synthetic Butts-Assumption-satisfying DGPs (+ 200-seed @pytest.mark.slow variant), and (c) Conley sparse-vs-dense parity inherited from the 3.3.3 release.

Validation

  • Tests added (139 total in tests/test_spillover.py):
    • Ring-construction primitives (pairwise haversine/euclidean, callable metrics, static + staggered nearest-treated distance, sparse kd-tree path)
    • Validators (rings sort/non-negative/start-at-0, d_bar consistency, treatment XOR first_treat, exact {0,1} treatment, NaN rejection on cluster/unit/time/first_treat/treatment, balanced panel, duplicate cells, non-absorbing treatment, conley_coords within-unit-constant, callable metric self-distance contract, hc2/hc2_bm rejected, NaN in outcome rejected, mixed-encoding time collapse caught)
    • Fit integration (non-staggered + staggered, raw-data invariant, Conley wiring, baseline-treated unit recognition + warn-and-drop, partial-unsupported-units warn-and-drop, period-level hard ValueError)
    • Identification (50-seed MC recovery of tau_total and delta_j; 200-seed slow variant)
    • Gardner identity (20-seed bit-identity vs single-stage TWFE ring regression at atol=1e-10)
    • Coefficients-vs-vcov column alignment + ATT alias
  • DGP factories (tests/_dgp_utils.py): generate_butts_nonstaggered_dgp / generate_butts_staggered_dgp satisfy Butts Assumptions 1/3/5/7 by construction.
  • No tutorial in this PR: T22 (Butts TVA tutorial) is deferred to a follow-up notebook PR per user-confirmed Wave B scope.
  • Documentation: REGISTRY section adjacent to ConleySpatialHAC, docs/api/spillover.rst, diff_diff/guides/llms.txt + llms-full.txt, docs/references.rst, docs/doc-deps.yaml, README catalog entry, TODO.md rows for deferred follow-ups.

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

@igerber
Copy link
Copy Markdown
Owner Author

igerber commented May 15, 2026

/ai-review

1 similar comment
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented May 15, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: fc48d69a654d4206da0cf3e5312d2a61ad67d321


Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

  • Cross-checking against docs/methodology/REGISTRY.md did not surface an undocumented methodology deviation. The PR’s main departures from a literal Butts/Gardner implementation are explicitly documented in docs/methodology/REGISTRY.md:L2952-L2985.
  • The new estimator avoids the project’s inference anti-patterns: ATT and per-ring outputs both route through safe_inference() rather than inline t_stat = coef / se logic (diff_diff/spillover.py:L1745-L1757).
  • Input validation is strong on the edge cases called out in prior reviews: duplicate cells, unbalanced panels, NaN treatment/IDs/clusters, reversible treatment, unsupported hc2/hc2_bm, and missing far-away controls (diff_diff/spillover.py:L947-L1284).
  • Minor follow-ups only: SE extraction should clamp negative vcov diagonals for consistency, and the staggered-distance helper still has a small avoidable performance hotspot.
  • I could not execute the tests in this sandbox because the available Python environment does not have numpy/pandas; this is a static review only.

Methodology

  • P3 — Informational: the material methodology deviations are documented rather than hidden: time-varying ring regressor, stricter Omega_0, unit warn-and-drop, and omission of Gardner’s stage-1 GMM correction (docs/methodology/REGISTRY.md:L2952-L2985, TODO.md:L121-L127). Impact: no approval blocker under the repo policy; users still need to treat current analytical SEs as downward-biased until the tracked follow-up lands. Concrete fix: none required for this PR.

Code Quality

  • P3SpilloverDiD.fit() takes np.sqrt(vcov[i, i]) directly when building ATT and per-ring SEs (diff_diff/spillover.py:L1740-L1757). Sibling estimators usually clamp max(var, 0.0) first (diff_diff/two_stage.py:L1183-L1183, diff_diff/estimators.py:L606-L606, diff_diff/stacked_did.py:L515-L515). Impact: a numerically tiny negative diagonal from an indefinite Conley or near-singular sandwich can blank out the entire inference row with NaN instead of preserving a stable zero-floor SE. Concrete fix: wrap each diagonal in max(v, 0.0) before sqrt.

Performance

  • P3 — The staggered distance helper recomputes row_pos inside every cohort loop (diff_diff/spillover.py:L400-L425). Impact: avoidable extra O(n_rows × n_cohorts) Python work on already-dense staggered fits. Concrete fix: precompute row_pos once before the for onset in unique_onsets loop and reuse it.

Maintainability

  • No findings.

Tech Debt

  • P3 — The staggered helper says the sparse kd-tree branch is “queued as a follow-up — see TODO.md” (diff_diff/spillover.py:L365-L369), but the new spillover TODO block does not actually track that item (TODO.md:L121-L127). Impact: the deferred scaling work is easy to lose from the project’s review backlog. Concrete fix: add a TODO.md row for the sparse staggered ring-distance path, or remove the explicit TODO cross-reference from the code/docs.

Security

  • No findings.

Documentation/Tests

  • No findings on the diff itself. The new registry entry and the spillover test surface are both thorough (docs/methodology/REGISTRY.md:L2940-L3012, tests/test_spillover.py). Review limitation: I was not able to run the tests in this sandbox because the Python environment here is missing numpy/pandas.

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 6f3ade490449ab5cec7ae2250ad0e6249a526ea4


Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

  • Previous AI re-review items were addressed: the stage-2 SE path now clamps negative vcov diagonals before sqrt, the staggered helper precomputes row_pos once, and the sparse staggered path is now explicitly tracked in TODO.md (diff_diff/spillover.py:L400-L402, diff_diff/spillover.py:L1742-L1764, TODO.md:L121-L128).
  • At a high level, the estimator matches the cited Butts + Gardner synthesis: Butts frames the DiD target as jointly identifying direct and spillover effects under spatial spillovers, and Gardner supplies the two-stage DiD backbone reused here. (arxiv.org)
  • The major implementation departures from a literal paper reading are documented in the Methodology Registry, so under repo policy they are informational rather than blocking (docs/methodology/REGISTRY.md:L2952-L3010, TODO.md:L121-L128).
  • I did not find a new P0/P1 issue in control-group composition, NaN-safe inference, or public-parameter propagation on the shipped surface.
  • Remaining issues are minor: one registry gap for the public anticipation extension, one constructor-validation consistency issue, and one validation overclaim in REGISTRY.md.
  • Review limitation: I could not execute tests/test_spillover.py here because the sandbox Python environment lacks both numpy and pytest.

Methodology

No unmitigated source-material mismatch found. The time-varying ring regressor, stricter Omega_0, and omission of Gardner’s stage-1 GMM correction are explicitly recorded in the registry and TODO, so they are non-blocking per repo policy. At a high level, that is consistent with the cited Butts/Gardner setup. (arxiv.org)

Code Quality

Performance

No findings. The remaining spillover-specific hotspot, the dense staggered nearest-treated-distance path, is now properly tracked in TODO.md:L128-L128.

Maintainability

No findings.

Tech Debt

  • P3 Impact: remaining deferrals for Gardner GMM correction, event-study mode, survey integration, and the sparse staggered path are properly tracked, so they are acceptable deferred work rather than blockers for this PR. Concrete fix: none required in this PR (TODO.md:L121-L128).

Security

No findings.

Documentation/Tests

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: b23aca4075acfc947ea3b0496521f190bca980c0


Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

  • Previous re-review items are addressed: anticipation is now documented and serialized on results, rank_deficient_action is validated in __init__, and the registry no longer claims a shipped reduce-to-TwoStageDiD test (docs/methodology/REGISTRY.md:2972-2974, diff_diff/spillover.py:885-906, diff_diff/results.py:445-462).
  • Methodology alignment is acceptable: Butts’ spillover setup uses untreated-and-unexposed rows for the first-stage residualization in the staggered case, and Gardner’s two-stage framework requires inference that accounts for the generated first-stage regressor. The PR’s omission of that GMM correction is explicitly documented and tracked, so it is informational rather than blocking. citeturn3view0turn3view1turn5view0
  • I did not find a new control-group, NaN-inference, or parameter-propagation issue in the changed spillover code.
  • Remaining issues are documentation-only: one registry note still overstates the shipped validation surface, and one public guide still advertises covariates without warning that the estimator rejects it.
  • Review limitation: I could not execute tests/test_spillover.py here because the sandbox lacks both pytest and numpy.

Methodology

  • P3 Impact: Stage-2 SEs still omit Gardner’s first-stage uncertainty correction, which would be a methodology defect if undocumented. Here it is explicitly disclosed in the registry and tracked in TODO.md, so under repo policy it is non-blocking informational debt. Concrete fix: none required in this PR; keep the existing follow-up entry at TODO.md:121. citeturn3view0turn5view0
  • No unmitigated P0/P1 findings.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • P3 Impact: the deferred spillover items that remain after this PR are properly tracked in TODO.md (GMM correction, event_study, survey_design, count rings, data-driven d_bar, sparse staggered distance), so they are acceptable deferred work rather than blockers. Concrete fix: none required in this PR (TODO.md:121-128).

Security

  • No findings.

Documentation/Tests

  • P3 Impact: REGISTRY.md says the synthetic-DGP Monte Carlo anchor includes both 50-seed and 200-seed recovery of tau_total and delta_1, but the shipped tests only pin delta_1 on the non-staggered DGP; the staggered MC currently checks tau_total only. That overstates the validation actually merged. Concrete fix: either add staggered delta_1 recovery tests (and a slow variant) or narrow the registry note to the validation that is present (docs/methodology/REGISTRY.md:2974, tests/test_spillover.py:764-810, tests/test_spillover.py:812-822).
  • P3 Impact: the LLM guide exposes fit(..., covariates=...) in the public signature but does not list that covariates currently raises NotImplementedError, even though the estimator rejects it in code. That makes the documented surface broader than the real one. Concrete fix: add an explicit covariates= limitation bullet in the spillover guide section, at minimum in diff_diff/guides/llms-full.txt (diff_diff/guides/llms-full.txt:488-507, diff_diff/spillover.py:1353-1363).

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 9581b21cb586ab8a9e9a108d75baa2db4f4d7b51


Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

  • Previous re-review items are mostly addressed: anticipation is now carried on results serialization, rank_deficient_action is validated at construction, and the public LLM guide now explicitly says covariates= is unsupported. See diff_diff/results.py:L402-L462, diff_diff/spillover.py:L885-L906, and diff_diff/guides/llms-full.txt:L501-L508.
  • Methodology alignment is acceptable against the registry: the time-varying ring regressor, stricter Omega_0, current-period untreated far-away logic, and omitted Gardner GMM correction are all explicitly documented or tracked, so they are non-blocking in this PR. See docs/methodology/REGISTRY.md:L2952-L3012 and TODO.md:L121-L128.
  • I did not find a new control-group, NaN-inference, parameter-propagation, or undocumented methodology defect in the changed estimator code.
  • One previous P3 documentation issue remains in narrowed form: the registry/changelog still overstate the merged staggered Monte Carlo coverage.
  • Review limitation: I could not execute tests/test_spillover.py here because this sandbox does not have numpy, pandas, scipy, or pytest.

Methodology

  • P3 Impact: Stage-2 SEs still omit Gardner’s first-stage GMM uncertainty correction, but that deviation is explicitly documented in the methodology registry and tracked in TODO.md, so under repo policy it is informational rather than blocking. Concrete fix: none required in this PR; keep the existing follow-up tracked in docs/methodology/REGISTRY.md:L2985-L3012 and TODO.md:L121-L128.
  • No unmitigated P0/P1 findings.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

  • P3 Impact: The main deferred spillover items (event_study, survey_design, ring_method="count", data-driven d_bar, sparse staggered distance, GMM correction) are properly tracked in TODO.md, so they are acceptable deferrals rather than blockers. Concrete fix: none required in this PR. See TODO.md:L121-L128.

Security

No findings.

Documentation/Tests

  • P3 Impact: The validation narrative still overstates the shipped staggered Monte Carlo coverage. REGISTRY.md says the staggered anchor is a 50-seed default plus a 200-seed slow variant, and the changelog summarizes the identification MC in the same way, but the merged test file only contains a 30-seed staggered tau_total test and no slow staggered companion. Concrete fix: either add the promised staggered 50-seed and 200-seed tests, or narrow the docs/changelog to the 30-seed staggered tau_total test that actually exists. See docs/methodology/REGISTRY.md:L2974-L2974, CHANGELOG.md:L11-L11, tests/test_spillover.py:L796-L810, and tests/test_spillover.py:L812-L825.

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 23cdf0ef20c06416e512132595d9b2be6c53b485


Overall Assessment

⚠️ Needs changes — one unmitigated P1 remains in the new SpilloverDiD identification path.

Executive Summary

  • Previous re-review items look addressed: anticipation now round-trips on results serialization, rank_deficient_action is validated at construction, and the docs now match the shipped staggered MC coverage.
  • The estimator matches the documented registry deviations on the time-varying ring regressor, stricter Omega_0, current-period untreated far-away controls, and the omitted Gardner GMM correction.
  • P1: SpilloverDiD checks balancedness plus per-unit/per-period Omega_0 coverage, but it never checks whether the Omega_0 = {D_it = 0, S_it = 0} unit-time graph is actually connected. Some balanced panels will therefore pass validation even though stage-1 FE are not jointly identified, and tau_total / delta_j can be silently distorted.
  • The missing Gardner GMM correction remains P3 informational because it is explicitly documented in the methodology registry and tracked in TODO.md.
  • Review limitation: I could not execute the test suite here because pytest, numpy, pandas, and scipy are not installed in this environment.

Methodology

  • Severity P1
    Impact: SpilloverDiD.fit() treats a balanced panel plus per-unit/per-period Omega_0 presence as sufficient support for the stage-1 FE model (diff_diff/spillover.py:1104, diff_diff/spillover.py:1563). That proxy is not enough. A balanced panel can still have a disconnected Omega_0 bipartite graph, in which case _iterative_fe_subset() only identifies FE up to component-specific constants while _residualize_butts() later combines unit FE from one component with time FE from another. That silently changes the residualized outcome used to estimate both tau_total and delta_j. The public docs currently describe exact graph-connectivity as a future relaxation of the balanced-panel gate (docs/api/spillover.rst:204, docs/methodology/REGISTRY.md:3009), but for this Gardner/Butts stage-1 design it is the actual live identification condition.
    Concrete fix: after building omega_0_mask, construct the unit-time support graph on Omega_0 rows and raise ValueError unless all stage-2-supported rows lie in a single connected component. Add a regression test for a balanced-but-disconnected Omega_0 panel.

  • Severity P3
    Impact: Stage-2 SEs still omit Gardner’s first-stage GMM uncertainty correction, but this deviation is explicitly documented and tracked (docs/methodology/REGISTRY.md:2987, TODO.md:121). Under repo policy this is informational, not blocking.
    Concrete fix: none required in this PR.

Code Quality

No additional findings.

Performance

No additional findings.

Maintainability

No additional findings.

Tech Debt

  • Severity P3
    Impact: The remaining spillover follow-ups (event_study, survey_design, ring_method="count", data-driven d_bar, sparse staggered distance path, GMM correction) are properly tracked in TODO.md (TODO.md:121).
    Concrete fix: none required in this PR.

Security

No findings.

Documentation/Tests

  • Severity P2
    Impact: The current tests cover missing-period Omega_0 support and unbalanced panels (tests/test_spillover.py:1371, tests/test_spillover.py:2004), but they do not cover the balanced-but-disconnected Omega_0 case behind the P1 above. That leaves the identification bug unguarded in CI.
    Concrete fix: add a fixture where every unit and every period has at least one Omega_0 row, yet the Omega_0 graph splits into two disconnected components, and assert that fit() raises with a connectivity-specific error.

Path to Approval

  1. Add an explicit connected-component check for the Omega_0 unit-time graph in SpilloverDiD.fit() and fail fast on disconnected support.
  2. Add a regression test for a balanced panel that passes the current unit/period/balanced-panel guards but still has disconnected Omega_0 support.
  3. Update docs/api/spillover.rst, docs/methodology/REGISTRY.md, and the [Unreleased] changelog entry so they state the actual identification requirement: balancedness alone is not sufficient; connected Omega_0 support is required.

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: c1ba56427cfc4592db51e087862356a13d6e5b8a


Overall Assessment

Looks good — the prior P1 on disconnected Omega_0 support is resolved, and I did not find any new unmitigated P0/P1 issues in the changed spillover estimator path.

Executive Summary

  • The previous blocker is addressed: SpilloverDiD.fit() now explicitly checks connectedness of the supported Omega_0 unit-period graph before stage 1, and the failure mode is documented and tested (diff_diff/spillover.py:L682-L778, diff_diff/spillover.py:L1660-L1739, docs/methodology/REGISTRY.md:L2966-L2968, tests/test_spillover.py:L2101-L2208).
  • The implemented estimator structure matches the cited methodology: Butts’ staggered procedure estimates unit/time effects on untreated-and-unexposed observations and then regresses the residualized outcome on treatment and spillover dummies; Gardner likewise treats second-stage inference as generated-regressor inference. (kylebutts.com)
  • The remaining omission of Gardner’s first-stage GMM variance correction is still present, but it is explicitly documented in the Methodology Registry/API docs and tracked in TODO.md, so under the project rubric it is P3 informational rather than blocking (docs/methodology/REGISTRY.md:L2985-L3012, docs/api/spillover.rst:L177-L203, TODO.md:L121-L128). (kylebutts.com)
  • I did not identify a new silent correctness bug in control-group construction, ring weighting, NaN/inference handling, or parameter propagation in the changed files.
  • Review limitation: I could not execute pytest here because pytest is not installed in this environment.

Methodology

  • Severity P3 | Impact: Stage-2 SEs still omit Gardner’s first-stage GMM correction. That matters for inference in principle, but this deviation is explicitly documented and tracked, so it is mitigated under the review rubric rather than blocking (docs/methodology/REGISTRY.md:L2985-L3012, docs/api/spillover.rst:L177-L203, TODO.md:L121-L128). (kylebutts.com) Concrete fix: None required in this PR; follow-up already tracked.
  • No unmitigated methodology findings. The prior connectivity issue is now handled explicitly in code and covered by direct tests (diff_diff/spillover.py:L682-L778, diff_diff/spillover.py:L1660-L1739, tests/test_spillover.py:L2101-L2208).

Code Quality

No findings.

Performance

  • Severity P3 | Impact: _compute_nearest_treated_distance_staggered() still uses the dense per-cohort distance path, so large staggered panels may pay avoidable memory/runtime cost; this is already tracked in TODO.md:L128-L128. Concrete fix: None required in this PR; follow-up already tracked.

Maintainability

No findings.

Tech Debt

  • Severity P3 | Impact: The remaining Wave B spillover follow-ups (event_study, survey_design, count-based rings, data-driven d_bar, GMM correction, sparse staggered distance path, first-class Conley in TwoStageDiD) are explicitly tracked in TODO.md:L121-L128, so they are deferred work rather than blockers. Concrete fix: None required in this PR.

Security

No findings.

Documentation/Tests

No unmitigated findings. The public docs now describe the connectivity requirement and the period-strict/unit-warn-drop behavior, and the new tests cover the connectivity helper directly (docs/api/spillover.rst:L204-L230, docs/methodology/REGISTRY.md:L2966-L2974, tests/test_spillover.py:L2101-L2208). Review limitation: I could not run the suite locally because pytest is unavailable.

@igerber igerber added ready-for-ci Triggers CI test workflows and removed ready-for-ci Triggers CI test workflows labels May 16, 2026
New standalone estimator at `diff_diff/spillover.py` implementing
two-stage Gardner (2022) DiD with ring-indicator covariates that
identify, alongside the direct effect on treated (`tau_total`), per-ring
spillover effects on near-control units (`delta_j`). Reference:
Butts, K. (2023, originally 2021) "Difference-in-Differences with
Spatial Spillovers" arXiv:2105.03737v3; Gardner, J. (2022) "Two-stage
differences in differences" arXiv:2207.05943.

Handles panel non-staggered (paper Eqs 5/6/8) and Section 5 staggered
timing in one estimator. Stage-2 regressor uses the time-varying
`(1 - D_it) * Ring_{it,j}` form. Stage-1 subsample is Butts' STRICTER
`Omega_0 = {D_it = 0 AND S_it = 0}` (untreated AND unexposed).

Identification-check policy: period strict, unit warn-and-drop, plus
connected-component check on the Omega_0 bipartite graph. SE clamps
negative vcov diagonals before sqrt (sibling-estimator convention).
`coefficients` exposes all (1+K) stage-2 entries keyed to vcov columns
plus an "ATT" alias. `rank_deficient_action` validated at `__init__`.

Variance: stage-2 OLS via `solve_ols` (HC1 / Conley / cluster). Gardner
GMM first-stage uncertainty correction NOT applied in this PR (documented
limitation; tracked in TODO).

Deferred features (planned follow-ups, all in TODO): `event_study=True`
per-event-time × ring coefficients, `survey_design=` integration,
`ring_method="count"`, data-driven `d_bar`, GMM correction at stage 2,
sparse staggered ring-distance path, TwoStageDiD-Conley first-class.

Tests: `tests/test_spillover.py` (153 tests) + DGP factories at
`tests/_dgp_utils.py`. Includes 20-seed Gardner identity bit-identity
test (`SpilloverDiD.att` matches single-stage TWFE ring regression at
`atol=1e-10` on non-staggered DGPs — the reported non-staggered
`tau_total` IS the Butts Eqs. 4-6 estimator). Non-staggered MC at 50
seeds + 200-seed slow variant recovers both `tau_total` and `delta_1`;
staggered MC at 30 seeds anchors `tau_total` only.

Docs: REGISTRY section, API rst, `llms.txt` + `llms-full.txt`, README
catalog entry, references, `doc-deps.yaml`, TODO follow-up rows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber igerber force-pushed the spillover-conley-butts-wave-b branch from 4fda83c to ced54b3 Compare May 16, 2026 10:32
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: ced54b3d8dae0c9a5c0945c66ba8c9efce93a727


Overall Assessment

⚠️ Needs changes — [Newly identified] the new anticipation parameter changes identification-relevant timing in SpilloverDiD, but the added test suite never checks its behavior on either fit path. Under the stated review rubric, that missing parameter-interaction coverage is a P1.

Executive Summary

  • The previous blocker remains resolved: the Omega_0 connectivity check is still enforced in diff_diff/spillover.py:L1721-L1739 and covered by dedicated tests in tests/test_spillover.py:L2106-L2213.
  • I did not find an undocumented methodology mismatch in the estimator itself; the remaining first-stage GMM variance omission is explicitly documented and tracked (docs/methodology/REGISTRY.md:L2985-L3012, docs/api/spillover.rst:L171-L188, TODO.md:L135-L142).
  • Severity P1 | [Newly identified] anticipation is estimand-defining in both code and docs (diff_diff/spillover.py:L1545-L1626, docs/methodology/REGISTRY.md:L2972-L2974), but tests only cover round-tripping and invalid-value rejection (tests/test_spillover.py:L1837-L1855, tests/test_spillover.py:L2221-L2235).
  • Severity P2 | The staggered identification coverage only anchors tau_total, not the staggered spillover coefficients (tests/test_spillover.py:L813-L826 vs. the non-staggered tau_total/delta_1 coverage at tests/test_spillover.py:L765-L811).
  • Review limitation: I could not run pytest or import the package here because this environment does not have pytest or numpy installed.

Methodology

  • No unmitigated P0/P1 methodology findings in the estimator logic itself. The implemented shape is consistent with the cited two-stage setup: Gardner’s first stage identifies unit/period effects from untreated observations, and Butts’ spillover estimator uses untreated-and-unexposed observations before second-stage spillover regression. (arxiv.org)
  • Severity P3 | Impact: Stage-2 SEs still omit Gardner’s first-stage GMM correction, so reported SEs/CIs/p-values can be too optimistic. This is explicitly documented and TODO-tracked, so it is mitigated under the rubric. Concrete fix: none required in this PR; keep the tracked follow-up as-is (docs/methodology/REGISTRY.md:L2985-L3012, docs/api/spillover.rst:L171-L188, TODO.md:L135-L142).

Code Quality

  • No findings.

Performance

  • Severity P3 | Impact: _compute_nearest_treated_distance_staggered() still always uses dense per-cohort distance matrices, so large staggered panels will pay avoidable memory/runtime cost. This is already tracked. Concrete fix: none required in this PR; follow the existing sparse-helper TODO (diff_diff/spillover.py:L365-L369, TODO.md:L142-L142).

Maintainability

  • No findings.

Tech Debt

  • No new blocker beyond the spillover follow-ups already tracked in TODO.md:L135-L142.

Security

  • No findings.

Documentation/Tests

  • Severity P1 | Impact: [Newly identified] anticipation shifts both the treatment clock and the ring-exposure clock (diff_diff/spillover.py:L1545-L1626, docs/methodology/REGISTRY.md:L2972-L2974), which changes Omega_0 membership and therefore the estimand. The added tests never assert that behavior on either treatment= or first_treat= paths; they only check serialization and invalid-value rejection (tests/test_spillover.py:L1837-L1855, tests/test_spillover.py:L2221-L2235). A regression here could silently move rows in or out of Omega_0 and change tau_total/delta_j without failing CI. Concrete fix: add behavior-level anticipation tests on both fit entry paths that assert the shifted D_it/S_it/Omega_0 membership on a hand-built panel.
  • Severity P2 | Impact: The new staggered spillover path is only numerically anchored on tau_total; there is no staggered recovery test for any delta_j. Current coverage at tests/test_spillover.py:L813-L826 would not catch a staggered-only ring-assignment or spillover-coefficient regression that leaves tau_total roughly unchanged. Concrete fix: add at least one staggered DGP test that asserts delta_1 recovery, ideally with a slower higher-seed variant analogous to tests/test_spillover.py:L765-L811.

Path to Approval

  1. Add a non-staggered anticipation=1 regression test on the treatment= path that explicitly checks the anticipation window changes D_it and Omega_0 membership as documented.
  2. Add a staggered anticipation=1 regression test on the first_treat= path that checks the ring-exposure clock shifts with the treatment clock and preserves the intended far-away controls.
  3. Add a staggered spillover-coefficient accuracy anchor for at least delta_1, so the staggered ring path is directly regression-tested, not only tau_total.

…overy

Two test-coverage gaps surfaced on the rebased SHA:

1. `anticipation` was previously only round-tripped on results and
   rejected for invalid values, with no behavior-level test asserting
   that it actually shifts D_it / Omega_0 membership / the ring-exposure
   clock. A regression there could silently move rows in/out of stage 1
   and change tau_total / delta_j without failing CI.
2. Staggered MC anchored only `tau_total`, not `delta_1`. A staggered-
   only ring-assignment or spillover-coefficient regression with roughly-
   unchanged tau_total wouldn't be caught.

Fixes:

- tests/test_spillover.py: new `TestSpilloverDiDAnticipationBehavior`
  class (3 tests) with a hand-built 4-period panel (1 treated @ t=2,
  1 near-control, 2 far-controls). Asserts on both `treatment=` and
  `first_treat=` paths that anticipation=1 (vs 0) yields n_treated+1,
  smaller stage1_n_obs, and a different att. Plus a parity check that
  the two fit paths produce identical results under the same anticipation
  setting (entry points are internally unified).
- tests/test_spillover.py:
  `TestSpilloverDiDIdentification::test_staggered_recovers_tau_total`
  renamed to `..._tau_total_and_delta_1` and extended to also assert
  staggered delta_1 recovery within 0.03 absolute tolerance (mean over
  30 seeds: -0.0398, std 0.0083). Per-event-time `delta_jk`
  decomposition on staggered DGPs is still queued alongside
  event_study=True support.
- docs/methodology/REGISTRY.md, CHANGELOG.md: narrow the staggered MC
  anchor language to match (tau_total + delta_1, not tau_total only);
  bump test count 153 -> 156.

All 156 tests pass; black + ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 1a71eaf59506e2551d4ae19b7dc4ac557a1eb942


Overall Assessment

Looks good — the prior P1 on anticipation interaction coverage is resolved, the prior staggered-delta_1 test gap is resolved, and I did not find any new unmitigated P0/P1 issues in the changed spillover estimator.

Executive Summary

  • Previous P1 resolved: anticipation now shifts both the treatment clock and the exposure clock in diff_diff/spillover.py lines 1545-1657, and that behavior is tested on both treatment= and first_treat= paths in tests/test_spillover.py lines 1907-2001.
  • Previous P2 resolved: staggered Monte Carlo coverage now anchors both tau_total and delta_1 in tests/test_spillover.py lines 813-839.
  • I did not find an undocumented methodology mismatch in the estimator shape itself. The ring-based spillover setup, staggered extension, and Gardner-style untreated first stage are coherent with the cited papers’ framing. (arxiv.org)
  • Severity P3 | mitigated: stage-2 SEs still omit the Gardner first-stage GMM correction, but that limitation is explicitly documented in docs/methodology/REGISTRY.md lines 2985-2987 and tracked in TODO.md lines 135-142.
  • Severity P3: the Conley test remains only a smoke test; tests/test_spillover.py lines 869-894 never assert a Conley-specific numeric invariant.
  • Review limitation: I could not run pytest or import the package here because this environment is missing pytest and core runtime deps like pandas.

Methodology

  • No unmitigated P0/P1 findings. The implemented estimator follows the registry’s committed contract: time-varying ring regressors, a spillover-clean Omega_0, and Gardner-style first-stage residualization on untreated observations; that is consistent with Butts’s spillover-aware DiD framing and Gardner’s two-stage setup. diff_diff/spillover.py lines 17-25, diff_diff/spillover.py lines 1651-1657, docs/methodology/REGISTRY.md lines 2952-2974. (arxiv.org)
  • Severity P3 | Impact: stage-2 inference is still optimistic relative to the full Butts+Gardner variance composition because the first-stage GMM correction is omitted. This is mitigated by explicit registry/TODO tracking, so it is not a blocker. Concrete fix: none required in this PR; keep the existing follow-up to extend two_stage.py::_compute_gmm_variance. docs/methodology/REGISTRY.md, TODO.md

Code Quality

  • No findings.

Performance

  • Severity P3 | Impact: _compute_nearest_treated_distance_staggered() still uses dense per-cohort pairwise distance matrices, so large staggered panels will incur avoidable memory/runtime cost. This is already documented and tracked, so it is mitigated. Concrete fix: none required in this PR; implement the queued sparse cKDTree branch. diff_diff/spillover.py lines 333-369, TODO.md

Maintainability

  • No findings.

Tech Debt

  • No new untracked tech debt findings. The remaining spillover follow-ups are properly registered in TODO.md lines 135-142.

Security

  • No findings.

Documentation/Tests

  • Severity P3 | Impact: test_conley_se_differs_from_hc1 does not actually verify a Conley-specific result; it only checks finite output and ATT invariance. A regression that silently fell back to HC1 could still pass. Concrete fix: compare result_conley.se/vcov against a direct solve_ols(..., vcov_type="conley") reference on a fixed fixture, or use a spatially correlated fixture where HC1 and Conley must differ. tests/test_spillover.py lines 869-894
  • No remaining P1-level test gaps from the prior review: anticipation behavior is now covered on both fit paths, and staggered delta_1 recovery is now pinned. tests/test_spillover.py lines 813-839, tests/test_spillover.py lines 1907-2001

Round-8 CI review flagged `test_conley_se_differs_from_hc1` as a test-
coverage gap: the name promises a Conley-specific assertion but the body
only checked finite SE + ATT invariance. A silent fallback to HC1 (e.g.,
if SpilloverDiD ever stopped threading the Conley kwargs through to
`solve_ols`) would have passed.

Replace with:

- `test_conley_kwargs_threaded_to_solve_ols`: patches `solve_ols` at the
  import site, captures the kwargs of the stage-2 invocation, and
  asserts they include `vcov_type="conley"`, `conley_cutoff_km=200.0`,
  `conley_metric="haversine"`, `conley_lag_cutoff=0`, plus
  fit-time-derived `conley_coords` / `conley_time` / `conley_unit`
  arrays of the right shape. Any silent fallback to HC1 fails this.
- `test_conley_att_invariant_vs_hc1`: extracted from the old test —
  vcov choice does not change ATT (residualization + OLS fit are
  independent of variance). Now stands as a clean invariant rather than
  pretending to verify Conley-specific output.

Also bumps CHANGELOG test count 156 -> 157 and updates the Conley
description to "plumbing (verifies solve_ols is called with vcov_type=
'conley' + Conley kwargs, no silent HC1 fallback)".

All 157 tests pass; black + ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 2b86d8c051ecec73c43ae601e601fc213bec3b93


Overall Assessment

Looks good — the prior re-review gaps are addressed, and I did not find any new unmitigated P0/P1 issues in the changed spillover estimator.

Executive Summary

  • The prior anticipation-propagation issue is resolved: anticipation now shifts both the treatment clock and the exposure clock through effective_onsets in diff_diff/spillover.py:L1545-L1657, and both treatment= and first_treat= entry paths are covered in tests/test_spillover.py:L1923-L2045.
  • The prior staggered-coverage gap is resolved: the new staggered Monte Carlo test now anchors both tau_total and delta_1 in tests/test_spillover.py:L813-L839.
  • The prior Conley test gap is resolved: the test now verifies that stage 2 actually calls solve_ols(..., vcov_type="conley", ...) with the Conley kwargs, preventing silent HC1 fallback (tests/test_spillover.py:L869-L925).
  • I did not find an undocumented methodology mismatch in the committed estimator contract. The implemented time-varying ring regressor, stricter Omega_0, period-strict/unit-warn-drop identification policy, and connectivity check are consistent with the contract documented in docs/methodology/REGISTRY.md:L2952-L2987 and implemented in diff_diff/spillover.py:L1545-L1739.
  • Severity P3 | informational: stage-2 SEs still omit the Gardner first-stage GMM correction, but that limitation is explicitly documented in docs/methodology/REGISTRY.md:L2985-L2987 and tracked in TODO.md:L135-L142, so it is mitigated and not blocking.

Methodology

  • Severity P3 | Impact: stage-2 inference remains optimistic relative to the full Butts+Gardner variance composition because the stage-1 GMM uncertainty correction is not yet applied. This is a documented limitation, not a defect, because it is explicitly called out in diff_diff/spillover.py:L1419-L1423, docs/methodology/REGISTRY.md:L2985-L2987, and TODO.md:L135-L142. Concrete fix: none required in this PR; keep the planned follow-up to extend two_stage.py::_compute_gmm_variance.
  • No unmitigated P0/P1 findings.

Code Quality

  • No findings.

Performance

  • Severity P3 | Impact: _compute_nearest_treated_distance_staggered() still uses dense per-cohort pairwise distance matrices, so large staggered panels will pay avoidable memory/runtime cost. This is already tracked in TODO.md:L142-L142, so it is mitigated. Concrete fix: add the queued sparse cKDTree branch for diff_diff/spillover.py:L333-L429.

Maintainability

  • No findings.

Tech Debt

  • No new untracked tech debt findings. The spillover follow-ups added in TODO.md:L135-L142 appropriately track the deferred variance, event-study, survey, ring-count, and sparse-distance work.

Security

  • No findings.

Documentation/Tests

  • No findings. The re-review gaps from the prior pass are now covered by explicit regression tests for staggered delta_1, anticipation on both fit paths, Conley plumbing, non-staggered FE identity, and the warn/drop vs hard-error identification split in tests/test_spillover.py:L813-L925, tests/test_spillover.py:L1732-L2045, and tests/test_spillover.py:L2712-L2761.

Review limitation: I could not execute the test suite in this environment because Python runtime dependencies such as numpy are unavailable, so the assessment is based on diff inspection and the added regression coverage.

@igerber igerber added the ready-for-ci Triggers CI test workflows label May 16, 2026
@igerber igerber merged commit 71f84d0 into main May 16, 2026
33 of 34 checks passed
@igerber igerber deleted the spillover-conley-butts-wave-b branch May 16, 2026 12:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant