Skip to content

Lift Gate 6: cluster-aware CR2 Bell-McCaffrey contrast DOF for MultiPeriodDiD avg_att#465

Merged
igerber merged 4 commits into
mainfrom
mpd-cluster-hc2-bm-contrast-dof
May 18, 2026
Merged

Lift Gate 6: cluster-aware CR2 Bell-McCaffrey contrast DOF for MultiPeriodDiD avg_att#465
igerber merged 4 commits into
mainfrom
mpd-cluster-hc2-bm-contrast-dof

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented May 18, 2026

Summary

  • Lift MultiPeriodDiD(cluster=..., vcov_type="hc2_bm") NotImplementedError gate at estimators.py:1657. The post-period-average ATT (avg_att = (1/n_post) Σ_{t ≥ t_treat} β_t) is a compound contrast; pre-PR the cluster-aware CR2 Bell-McCaffrey Satterthwaite DOF was only implemented for per-coefficient contrasts.
  • New _compute_cr2_bm_contrast_dof helper in diff_diff/linalg.py generalizes the per-coefficient loop in _compute_cr2_bm to arbitrary (k, m) contrast matrices via the identical Pustejovsky-Tipton 2018 Section 4 algebra. _compute_cr2_bm is refactored to call the new helper with contrasts=eye(k) (per-coefficient case bit-equivalent at atol=1e-10).
  • MultiPeriodDiD.fit() extends its existing avg_att DOF block (PR Lift MultiPeriodDiD-absorb HC2/HC2-BM gate via auto-route #459) to branch on effective_cluster_ids: one-way _compute_bm_dof_from_contrasts when None, cluster-aware _compute_cr2_bm_contrast_dof otherwise. Cluster IDs are passed unmodified (per-observation, not subscripted by the rank-deficient column-drop mask).
  • After this PR, 3 of 6 HC2/HC2-BM gates lifted (DiD-absorb DiD-absorb HC2/HC2-BM: auto-route to fixed_effects internally #458, MPD-absorb Lift MultiPeriodDiD-absorb HC2/HC2-BM gate via auto-route #459, MPD-cluster-contrast-DOF this PR). Remaining: TWFE absorb (Gate 1), weighted CR2-BM (Gates 4-5).

Methodology references

  • Method name(s): CR2 Bell-McCaffrey cluster-robust variance + Satterthwaite DOF for compound contrasts
  • Paper / source link(s): Pustejovsky & Tipton (2018) "Small-sample methods for cluster-robust variance estimation and hypothesis testing in fixed effects models" §4 / Appendix A. clubSandwich R package Wald_test(constraints=matrix(c, 1), test="HTZ")$df_denom is the R parity target (on a 1-row constraint matrix, HTZ reduces to a Satterthwaite t-test). Bell & McCaffrey (2002), Imbens & Kolesar (2016) for the underlying BM Satterthwaite framework. See docs/methodology/REGISTRY.md § HC2 + Bell-McCaffrey scope-limitation block (MPD cluster+hc2_bm status flipped from REJECT → SUPPORTED).
  • Any intentional deviations from the source: None. The contrast-DOF algebra is identical to clubSandwich's; the new helper matches Wald_test(test="HTZ")$df_denom at atol=1e-10 on the new mpd_clustered_avg_att_dof fixture (smoke test passed at atol=1e-13 before any source edits).

Validation

  • Tests added/updated:
    • tests/test_linalg_hc2_bm.py::TestCR2BMContrastDOF — 4 new tests: refactor regression (helper with eye(k) matches _compute_cr2_bm at atol=1e-10), R-parity for compound contrast vs clubSandwich (atol=1e-10), ndim+shape validation, cluster-count validation.
    • tests/test_estimators_vcov_type.py::TestFitBehavior::test_multi_period_cluster_plus_hc2_bm_produces_finite_inference — existing rejection test flipped to behavioral; asserts finite avg_att + period_effects inference under the lifted gate.
    • tests/test_estimators_vcov_type.py::TestFitBehavior::test_multi_period_cluster_hc2_bm_avg_att_uses_clubsandwich_dof — NEW end-to-end estimator-level parity test: fits MPD on the R mpd_clustered_avg_att_dof golden fixture, recovers the implied Satterthwaite DOF from avg_p_value, asserts it matches the R Wald_test target at atol=1e-6. Derives post_periods from the golden's post_interaction_names so the Python and R contrasts are bit-equivalent (local R2 found that MPD's default [3,4] post-period rule diverged from the R fixture's [2,3,4]).
  • New R golden scenario mpd_clustered_avg_att_dof in benchmarks/R/generate_clubsandwich_golden.R + regenerated benchmarks/data/clubsandwich_cr2_golden.json. 15-unit × 4-period staggered panel with cluster=unit.
  • Local Codex review: R1 (3 P3) → R2 (1 P3) → R3 (✅ zero findings). All 347 tests in linalg/estimators/methodology suites pass; lint clean.
  • Backtest / simulation / notebook evidence: N/A (analytical-sandwich methodology; no tutorials touched).

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

igerber and others added 3 commits May 17, 2026 21:55
…vg_att inference

Closes Gate 6 of the six HC2/HC2-BM NotImplementedError gates:
MultiPeriodDiD(cluster=..., vcov_type="hc2_bm") at estimators.py:1657
previously raised NotImplementedError because _compute_cr2_bm returns
per-coefficient Satterthwaite DOF only — the post-period-average ATT
(`avg_att = (1/n_post) Sum_{t >= t_treat} beta_t`) is a compound
contrast that needed a cluster-aware contrast DOF helper.

New _compute_cr2_bm_contrast_dof in diff_diff/linalg.py generalizes the
per-coefficient loop in _compute_cr2_bm to arbitrary (k, m) contrast
matrices using the identical Pustejovsky-Tipton 2018 Section 4 algebra
(`q = X bread_inv c`, `omega_g = A_g X_g bread_inv c`,
`DOF = trace(B)^2 / trace(B^2)`). _compute_cr2_bm is refactored to
call the new helper via a private _cr2_bm_dof_inner with
`contrasts=eye(k)`; refactor regression at atol=1e-10 confirms the
per-coefficient DOFs are preserved (matmul ordering differs slightly
from the prior inline loop).

MultiPeriodDiD.fit() extends its existing avg_att DOF block (introduced
in PR #459) to branch on effective_cluster_ids: one-way
_compute_bm_dof_from_contrasts when None, cluster-aware
_compute_cr2_bm_contrast_dof otherwise. Cluster IDs are per-observation
length n and are NOT subscripted by the rank-deficient column-drop
mask `_kept` (which indexes coefficients, not observations).

R parity verified at atol=1e-10 against clubSandwich's
Wald_test(constraints=matrix(c, 1), test="HTZ")$df_denom on a new
mpd_clustered_avg_att_dof fixture in
benchmarks/data/clubsandwich_cr2_golden.json. On a 1-row constraint
matrix, HTZ reduces to a Satterthwaite t-test and its df_denom IS the
BM Satterthwaite DOF. The pre-flight smoke test against this same R
target passed at atol=1e-13 before any source edits.

Tests:
- TestCR2BMContrastDOF (4 new tests): refactor regression vs library,
  R-parity for compound contrast, shape validation, cluster-count
  validation.
- test_multi_period_cluster_plus_hc2_bm_rejected flipped to
  test_multi_period_cluster_plus_hc2_bm_produces_finite_inference
  (end-to-end MPD wire-through with finite avg_att / period_effects
  inference assertions).

After this PR, 3 of 6 HC2/HC2-BM gates are lifted (DiD-absorb #458,
MPD-absorb #459, MPD-cluster-contrast-DOF this PR). Remaining: TWFE
absorb (Gate 1), weighted HC2-BM (Gates 4-5).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Local Codex review on commit 79e0962 returned ✅ with 3 P3s (all
documentation/coverage, no actionable P0/P1). Per the test-coverage P3
upgrade rule (feedback_test_coverage_gap_treat_as_actionable.md),
addressing all three:

P3 #1 (Code Quality): `_compute_cr2_bm_contrast_dof` was missing the
`ndim` validation that the parallel one-way `_compute_bm_dof_from_contrasts`
helper has, so a stray `(k,)` 1-D vector would die with a low-level
indexing error instead of a contract error. Added the same shape-tuple
check pattern (`if contrasts.ndim != 2 or contrasts.shape[0] != k`).

P3 #2 (Docs): two stale doc surfaces post-feature-lift —
  - `estimators.py:68-71` base estimator docstring still said MPD did
    NOT support cluster + hc2_bm. Rewrote to describe the new
    cluster-aware contrast-DOF support and flag survey CR2-BM as the
    remaining gate.
  - `tests/test_linalg_hc2_bm.py` module banner still said clustered
    CR2 BM was "deferred to a follow-up". Updated to describe both the
    per-coefficient and the new compound-contrast DOF surfaces, and
    narrow the deferral to the weighted CR2-BM case only.

P3 #3 (Tests): the new MPD test only asserted finite output, so a
regression that silently fell back to the shared n-k DOF would still
pass. Added `test_multi_period_cluster_hc2_bm_avg_att_uses_clubsandwich_dof`
which fits MPD on the new R `mpd_clustered_avg_att_dof` fixture and
recovers the implied Satterthwaite DOF by inverting
`avg_p_value = 2 * (1 - t.cdf(|avg_t_stat|, df))` via scipy.brentq. The
recovered DOF must match the R `Wald_test(test="HTZ")$df_denom` at
atol=1e-6. Also pins that the implied DOF is much smaller than the
n-k fallback (~39 here) — catches a regression to the shared df path.

All 254 tests in tests/test_linalg_hc2_bm.py + test_estimators_vcov_type.py
+ test_estimators.py pass; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Local Codex R2 returned ✅ with 1 substantive P3: the new MPD parity
test was fitting MPD without an explicit `post_periods=` argument, so
MPD's default "last half of periods" rule selected `[3, 4]` on the
4-period fixture while the R generator defines the parity contrast
over `[2, 3, 4]` (per `post_interaction_names` in the JSON). The
Satterthwaite DOFs happened to coincide here (~8.1) which masked the
estimand mismatch — the test would have stayed green if MPD silently
fit the wrong contrast.

Fix: derive `post_periods` from the golden JSON's
`post_interaction_names` field and pass it explicitly to MPD.fit().
The test now asserts that MPD computes `avg_att` over the exact same
contrast vector R uses for the Wald_test DOF target.

This makes the test a genuine estimand-level parity pin rather than
just a DOF-magnitude smell check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

Overall Assessment

Looks good — I did not find any unmitigated P0/P1 issues in the new clustered MultiPeriodDiD(..., vcov_type="hc2_bm") path.

Executive Summary

  • The new _compute_cr2_bm_contrast_dof() helper is methodologically aligned with the existing CR2 Bell-McCaffrey machinery and is wired into avg_att inference without introducing inline inference anti-patterns.
  • The rank-deficient-column handling looks consistent with the existing solve_ols() contract: DOF is computed on kept columns only, and avg_att still falls back to all-NaN inference when any post-period effect is unidentified.
  • The new tests cover the main methodology surface reasonably well: helper parity/refactor safety, clubSandwich parity for the compound contrast, shape validation, cluster-count validation, and estimator-level implied-DOF recovery.
  • The main issue is documentation consistency: the primary MultiPeriodDiD registry entry and the class docstring still say clustered hc2_bm is unsupported, while this PR implements and later documents it as supported.
  • Static review only: I could not execute the tests here because this environment lacks both pytest and numpy.

Methodology

  • No findings. On the changed code path, I do not see an undocumented deviation from the Methodology Registry or an incorrect variance/SE implementation. The new helper applies the same PT2018-style CR2 adjustment structure already used in _compute_cr2_bm, and the estimator continues to route final inference through safe_inference().

Code Quality

  • No findings. The new code keeps the inference contract centralized and avoids the project’s forbidden inline t = effect / se pattern.

Performance

  • P3 diff_diff/estimators.py:L1847-L1901, diff_diff/linalg.py:L1506-L1600, diff_diff/linalg.py:L1668-L1736: MultiPeriodDiD.fit() now recomputes the full CR2 precomputations (H, M, cluster blocks, A_g) after solve_ols() already built the same objects for the vcov path. Impact: clustered hc2_bm MPD fits now pay the expensive O(n²) CR2 setup twice. Concrete fix: plumb compound-contrast DOF through the existing CR2 vcov path, or cache/reuse the shared CR2 precomputes when both vcov and contrast DOF are needed.

Maintainability

  • No findings beyond the duplicated CR2 setup noted above.

Tech Debt

  • No findings. The remaining weighted Bell-McCaffrey limitations are already explicitly tracked in TODO.md:L103-L105, so I would not block on them here.

Security

  • No findings.

Documentation/Tests

  • P2 docs/methodology/REGISTRY.md:L167-L175, diff_diff/estimators.py:L1109-L1131, contrasted with docs/methodology/REGISTRY.md:L2557-L2557: the Methodology Registry’s main MultiPeriodDiD section and the MultiPeriodDiD class docstring still state that cluster + vcov_type="hc2_bm" is unsupported, but this PR lifts that gate and later documents it as supported. Impact: the project’s methodology source of truth is internally inconsistent, and users reading the class docstring or top-level registry entry will be told the new capability does not exist. Concrete fix: update the older “unsupported” text to match the new supported-path note, and reserve the remaining gate language for weighted/survey CR2-BM limitations only.

CI Codex review on PR #465 (commit 41a323e) returned ✅ with two
findings:

P2 (Documentation): Two stale doc surfaces post-feature-lift still said
MPD does NOT support cluster + hc2_bm. The HC2/BM scope-limitation Note
at REGISTRY.md:2557 was already updated in 79e0962, but:
- REGISTRY.md:167-176 (main MPD section under HeterogeneousAdoptionDiD
  requirements-checklist) still had the old "not supported" Note.
- estimators.py MPD class docstring's `cluster` and `vcov_type` blocks
  still said `cluster + hc2_bm` raises NotImplementedError.

Both rewritten to describe the new supported path with a pointer to
_compute_cr2_bm_contrast_dof and the clubSandwich Wald_test(HTZ)
parity target. Weighted CR2-BM remains the only documented gate.

P3 (Performance): _compute_cr2_bm_contrast_dof recomputes H, M, and
per-cluster A_g matrices that solve_ols → _compute_cr2_bm already
built for the vcov path. O(n²k) redundant work per clustered hc2_bm
MPD fit; acceptable for typical cluster-robust DiD panel sizes
(n ≤ few thousand). Tracked as a new Performance row in TODO.md;
acknowledged with a Note in REGISTRY.md per the codex deferral rules
(`**Note:**` label + TODO entry downgrades to P3-deferred).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 59971295ca0b6c794c0d746b48932980e70fafbd


Overall Assessment

Looks good — I did not find any unmitigated P0/P1 issues in the changed clustered MultiPeriodDiD(..., vcov_type="hc2_bm") path.

Executive Summary

  • The prior review’s documentation mismatch is resolved: the main registry entry and the estimator docstrings now consistently mark clustered hc2_bm as supported (docs/methodology/REGISTRY.md:L167-L185, docs/methodology/REGISTRY.md:L2565-L2567, diff_diff/estimators.py:L55-L74, diff_diff/estimators.py:L1109-L1131).
  • Methodology looks aligned: _compute_cr2_bm() now delegates the per-coefficient case to the new contrast-aware helper, and MultiPeriodDiD.fit() uses that helper for the compound avg_att contrast (diff_diff/linalg.py:L1545-L1736, diff_diff/estimators.py:L1847-L1908).
  • The changed inference path still routes through safe_inference(), and I did not find any new inline t = effect / se or partial-NaN inference anti-patterns in the touched files (diff_diff/estimators.py:L1920-L1974).
  • The remaining duplicated CR2 precomputes are now explicitly documented and tracked, so that issue is mitigated P3 tech debt rather than a blocker (TODO.md:L143-L153, docs/methodology/REGISTRY.md:L179-L185).
  • Static review only: this workspace does not have numpy or pytest, so I could not execute the updated tests.

Methodology

  • No findings. The changed path is consistent with the Methodology Registry’s stated CR2 Bell-McCaffrey contract, and the new helper is wired only into the contrast-DOF portion of avg_att inference (diff_diff/linalg.py:L1668-L1736, diff_diff/estimators.py:L1847-L1908).

Code Quality

  • No findings. The PR keeps inference centralized through safe_inference() and avoids the project’s forbidden inline inference pattern (diff_diff/estimators.py:L1920-L1974).

Performance

  • Severity: P3 (tracked in TODO.md, mitigated).
  • Impact: clustered hc2_bm MPD fits still rebuild the CR2 hat/residual-maker/adjustment matrices once in solve_ols() and again in the new contrast-DOF path, adding redundant O(n^2 k) work without changing results (diff_diff/estimators.py:L1862-L1900, diff_diff/linalg.py:L1545-L1600, diff_diff/linalg.py:L1668-L1736, TODO.md:L151-L153).
  • Concrete fix: reuse the existing CR2 precomputes or plumb compound-contrast DOF through the current CR2 vcov path.

Maintainability

  • No findings beyond the tracked performance duplication above.

Tech Debt

  • No new findings. The duplicate-precompute debt is now explicitly tracked in TODO.md:L151-L153, so it is mitigated under the review policy.

Security

  • No findings.

Documentation/Tests

  • No findings. The prior review’s P2 documentation inconsistency is resolved in both the registry and estimator docstrings (docs/methodology/REGISTRY.md:L167-L185, docs/methodology/REGISTRY.md:L2565-L2567, diff_diff/estimators.py:L55-L74, diff_diff/estimators.py:L1109-L1131).
  • No findings. The new tests cover helper parity, clubSandwich compound-contrast parity, validation failures, and estimator-level implied-DOF recovery (tests/test_linalg_hc2_bm.py:L689-L752, tests/test_estimators_vcov_type.py:L510-L628).
  • Static review only: I could not run the suite here because numpy and pytest are unavailable in this workspace.

@igerber igerber added the ready-for-ci Triggers CI test workflows label May 18, 2026
@igerber igerber merged commit 956445e into main May 18, 2026
33 of 34 checks passed
@igerber igerber deleted the mpd-cluster-hc2-bm-contrast-dof branch May 18, 2026 12:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant