Lift Gate 6: cluster-aware CR2 Bell-McCaffrey contrast DOF for MultiPeriodDiD avg_att #465
Merged
…vg_att inference
Closes Gate 6 of the six HC2/HC2-BM NotImplementedError gates:
MultiPeriodDiD(cluster=..., vcov_type="hc2_bm") at estimators.py:1657
previously raised NotImplementedError because _compute_cr2_bm returns
per-coefficient Satterthwaite DOF only — the post-period-average ATT
(`avg_att = (1/n_post) Sum_{t >= t_treat} beta_t`) is a compound
contrast that needed a cluster-aware contrast DOF helper.
New _compute_cr2_bm_contrast_dof in diff_diff/linalg.py generalizes the
per-coefficient loop in _compute_cr2_bm to arbitrary (k, m) contrast
matrices using the identical Pustejovsky-Tipton 2018 Section 4 algebra
(`q = X bread_inv c`, `omega_g = A_g X_g bread_inv c`,
`DOF = trace(B)^2 / trace(B^2)`). _compute_cr2_bm is refactored to
call the new helper via a private _cr2_bm_dof_inner with
`contrasts=eye(k)`; refactor regression at atol=1e-10 confirms the
per-coefficient DOFs are preserved (matmul ordering differs slightly
from the prior inline loop).
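
The algebra above can be sketched in NumPy as follows. This is a minimal illustration, not the code in diff_diff/linalg.py: the function name `cr2_bm_contrast_dof` is hypothetical, and the sketch assumes a full-rank design, an identity working covariance, and no weights. It uses the fact that `trace(B)` and `trace(B^2)` are unchanged when `B = G G'` is replaced by the smaller `G' G`, where column g of `G` is `(I - H)[:, g] omega_g`.

```python
import numpy as np


def cr2_bm_contrast_dof(X, cluster_ids, contrasts):
    """Satterthwaite DOF for each column of a (k, m) contrast matrix.

    Sketch only: assumes full-rank X, identity working covariance, no weights.
    """
    X = np.asarray(X, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    contrasts = np.asarray(contrasts, dtype=float)
    n, k = X.shape
    if contrasts.ndim != 2 or contrasts.shape[0] != k:
        raise ValueError(f"contrasts must have shape ({k}, m)")
    labels = np.unique(cluster_ids)
    if labels.size < 2:
        raise ValueError("need at least two clusters")

    bread_inv = np.linalg.inv(X.T @ X)
    M = np.eye(n) - X @ bread_inv @ X.T  # residual maker I - H

    # Per-cluster adjustment A_g = (I - H_gg)^(-1/2) via a symmetric
    # eigendecomposition of the diagonal block of M.
    blocks = []
    for g in labels:
        idx = np.flatnonzero(cluster_ids == g)
        evals, evecs = np.linalg.eigh(M[np.ix_(idx, idx)])
        A_g = (evecs * evals ** -0.5) @ evecs.T
        blocks.append((idx, A_g))

    dof = np.empty(contrasts.shape[1])
    for j in range(contrasts.shape[1]):
        q = X @ bread_inv @ contrasts[:, j]  # q = X bread_inv c
        # omega_g = A_g q_g; column g of G is M[:, idx_g] @ omega_g.
        G = np.column_stack([M[:, idx] @ (A_g @ q[idx]) for idx, A_g in blocks])
        B = G.T @ G  # same nonzero eigenvalues as G G'
        dof[j] = np.trace(B) ** 2 / np.trace(B @ B)
    return dof


# Illustrative data: 10 clusters of 4 observations, intercept + 3 regressors.
rng = np.random.default_rng(0)
n_clusters, per_cluster = 10, 4
cluster_ids = np.repeat(np.arange(n_clusters), per_cluster)
X = np.column_stack([np.ones(cluster_ids.size), rng.normal(size=(cluster_ids.size, 3))])

# One compound contrast: the average of the last three coefficients.
c_avg = np.array([[0.0], [1 / 3], [1 / 3], [1 / 3]])
dof_avg = cr2_bm_contrast_dof(X, cluster_ids, c_avg)
# Per-coefficient DOFs fall out of the same helper with contrasts = eye(k).
dof_each = cr2_bm_contrast_dof(X, cluster_ids, np.eye(4))
```

Because the Satterthwaite ratio is bounded by the rank of `B`, the result can never exceed the number of clusters, which makes a convenient sanity check.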
MultiPeriodDiD.fit() extends its existing avg_att DOF block (introduced
in PR #459) to branch on effective_cluster_ids: one-way
_compute_bm_dof_from_contrasts when None, cluster-aware
_compute_cr2_bm_contrast_dof otherwise. Cluster IDs are per-observation
length n and are NOT subscripted by the rank-deficient column-drop
mask `_kept` (which indexes coefficients, not observations).
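
The distinction between the two index spaces can be shown with a small sketch. The helper `kept_column_mask` and the toy data are hypothetical and only stand in for the estimator's own rank-deficiency handling: the mask has one entry per coefficient (length k), while cluster IDs have one entry per observation (length n), so the mask is applied to the columns of the design matrix and never to the cluster IDs.

```python
import numpy as np


def kept_column_mask(X, tol=1e-10):
    """Greedy mask over columns: keep a column only if it raises the rank."""
    kept = np.zeros(X.shape[1], dtype=bool)
    for j in range(X.shape[1]):
        trial = kept.copy()
        trial[j] = True
        if np.linalg.matrix_rank(X[:, trial], tol=tol) == trial.sum():
            kept = trial
    return kept


rng = np.random.default_rng(1)
n = 12
base = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
X_full = np.column_stack([base, base[:, 1]])  # last column duplicates column 1
obs_cluster_ids = np.repeat(np.arange(4), 3)  # length n, one ID per observation

kept = kept_column_mask(X_full)  # length k, one flag per coefficient
X_kept = X_full[:, kept]         # the mask subsets columns only

# The cluster IDs are passed through untouched; indexing them with `kept`
# would mix up the coefficient axis and the observation axis.
ids_for_dof = obs_cluster_ids
```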
R parity verified at atol=1e-10 against clubSandwich's
Wald_test(constraints=matrix(c, 1), test="HTZ")$df_denom on a new
mpd_clustered_avg_att_dof fixture in
benchmarks/data/clubsandwich_cr2_golden.json. On a 1-row constraint
matrix, HTZ reduces to a Satterthwaite t-test and its df_denom IS the
BM Satterthwaite DOF. The pre-flight smoke test against this same R
target passed at atol=1e-13 before any source edits.
Tests:
- TestCR2BMContrastDOF (4 new tests): refactor regression vs library,
R-parity for compound contrast, shape validation, cluster-count
validation.
- test_multi_period_cluster_plus_hc2_bm_rejected flipped to
test_multi_period_cluster_plus_hc2_bm_produces_finite_inference
(end-to-end MPD wire-through with finite avg_att / period_effects
inference assertions).
After this PR, 3 of 6 HC2/HC2-BM gates are lifted (DiD-absorb #458,
MPD-absorb #459, MPD-cluster-contrast-DOF this PR). Remaining: TWFE
absorb (Gate 1), weighted HC2-BM (Gates 4-5).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Local Codex review on commit 79e0962 returned ✅ with 3 P3s (all documentation/coverage, no actionable P0/P1). Per the test-coverage P3 upgrade rule (feedback_test_coverage_gap_treat_as_actionable.md), addressing all three:

P3 #1 (Code Quality): `_compute_cr2_bm_contrast_dof` was missing the `ndim` validation that the parallel one-way `_compute_bm_dof_from_contrasts` helper has, so a stray `(k,)` 1-D vector would die with a low-level indexing error instead of a contract error. Added the same shape-tuple check pattern (`if contrasts.ndim != 2 or contrasts.shape[0] != k`).

P3 #2 (Docs): two stale doc surfaces post-feature-lift:
- `estimators.py:68-71` base estimator docstring still said MPD did NOT support cluster + hc2_bm. Rewrote to describe the new cluster-aware contrast-DOF support and flag survey CR2-BM as the remaining gate.
- `tests/test_linalg_hc2_bm.py` module banner still said clustered CR2 BM was "deferred to a follow-up". Updated to describe both the per-coefficient and the new compound-contrast DOF surfaces, and narrow the deferral to the weighted CR2-BM case only.

P3 #3 (Tests): the new MPD test only asserted finite output, so a regression that silently fell back to the shared n-k DOF would still pass. Added `test_multi_period_cluster_hc2_bm_avg_att_uses_clubsandwich_dof`, which fits MPD on the new R `mpd_clustered_avg_att_dof` fixture and recovers the implied Satterthwaite DOF by inverting `avg_p_value = 2 * (1 - t.cdf(|avg_t_stat|, df))` via `scipy.optimize.brentq`. The recovered DOF must match the R `Wald_test(test="HTZ")$df_denom` at atol=1e-6. Also pins that the implied DOF is much smaller than the n-k fallback (~39 here), which catches a regression to the shared df path.

All 254 tests in tests/test_linalg_hc2_bm.py + test_estimators_vcov_type.py + test_estimators.py pass; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
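
The p-value inversion described under P3 #3 can be sketched as below. This is an illustration under assumptions, not the test's actual code: the helper name `implied_dof`, the bracket `[0.5, 1e4]`, and the numbers 2.3 and 8.1 are all made up for the example. It relies on the two-sided p-value being strictly decreasing in the degrees of freedom for a fixed |t|, so a bracketing root finder has a single root to locate.

```python
from scipy.optimize import brentq
from scipy.stats import t as student_t


def implied_dof(t_stat, p_value, lo=0.5, hi=1e4):
    """Solve 2 * P(T_df > |t_stat|) = p_value for df on the bracket [lo, hi]."""

    def gap(df):
        # sf(x) = 1 - cdf(x), computed without cancellation in the tail.
        return 2.0 * student_t.sf(abs(t_stat), df) - p_value

    return brentq(gap, lo, hi)


# Round trip on illustrative numbers: build a p-value from a known df,
# then recover that df from the (t, p) pair alone.
true_df = 8.1
p = 2.0 * student_t.sf(2.3, true_df)
recovered = implied_dof(2.3, p)
```

A fit that silently fell back to a much larger shared DOF would yield a recovered value far from the Satterthwaite target, which is what the estimator-level test is designed to catch.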
Local Codex R2 returned ✅ with 1 substantive P3: the new MPD parity test was fitting MPD without an explicit `post_periods=` argument, so MPD's default "last half of periods" rule selected `[3, 4]` on the 4-period fixture while the R generator defines the parity contrast over `[2, 3, 4]` (per `post_interaction_names` in the JSON). The Satterthwaite DOFs happened to coincide here (~8.1), which masked the estimand mismatch: the test would have stayed green if MPD silently fit the wrong contrast.

Fix: derive `post_periods` from the golden JSON's `post_interaction_names` field and pass it explicitly to MPD.fit(). The test now asserts that MPD computes `avg_att` over the exact same contrast vector R uses for the Wald_test DOF target. This makes the test a genuine estimand-level parity pin rather than just a DOF-magnitude smell check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Overall Assessment
✅ Looks good. I did not find any unmitigated P0/P1 issues in the new clustered …
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
CI Codex review on PR #465 (commit 41a323e) returned ✅ with two findings:

P2 (Documentation): Two stale doc surfaces post-feature-lift still said MPD does NOT support cluster + hc2_bm. The HC2/BM scope-limitation Note at REGISTRY.md:2557 was already updated in 79e0962, but:
- REGISTRY.md:167-176 (main MPD section under HeterogeneousAdoptionDiD requirements-checklist) still had the old "not supported" Note.
- estimators.py MPD class docstring's `cluster` and `vcov_type` blocks still said `cluster + hc2_bm` raises NotImplementedError.
Both rewritten to describe the new supported path with a pointer to _compute_cr2_bm_contrast_dof and the clubSandwich Wald_test(HTZ) parity target. Weighted CR2-BM remains the only documented gate.

P3 (Performance): _compute_cr2_bm_contrast_dof recomputes H, M, and per-cluster A_g matrices that solve_ols → _compute_cr2_bm already built for the vcov path. O(n²k) redundant work per clustered hc2_bm MPD fit; acceptable for typical cluster-robust DiD panel sizes (n ≤ few thousand). Tracked as a new Performance row in TODO.md; acknowledged with a Note in REGISTRY.md per the codex deferral rules (`**Note:**` label + TODO entry downgrades to P3-deferred).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good. I did not find any unmitigated P0/P1 issues in the changed clustered …
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Summary
- Lifts the `MultiPeriodDiD(cluster=..., vcov_type="hc2_bm")` `NotImplementedError` gate at `estimators.py:1657`. The post-period-average ATT (`avg_att = (1/n_post) Σ_{t ≥ t_treat} β_t`) is a compound contrast; pre-PR the cluster-aware CR2 Bell-McCaffrey Satterthwaite DOF was only implemented for per-coefficient contrasts.
- New `_compute_cr2_bm_contrast_dof` helper in `diff_diff/linalg.py` generalizes the per-coefficient loop in `_compute_cr2_bm` to arbitrary `(k, m)` contrast matrices via the identical Pustejovsky-Tipton 2018 Section 4 algebra. `_compute_cr2_bm` is refactored to call the new helper with `contrasts=eye(k)` (per-coefficient DOFs preserved at atol=1e-10).
- `MultiPeriodDiD.fit()` extends its existing avg_att DOF block (PR #459, "Lift MultiPeriodDiD-absorb HC2/HC2-BM gate via auto-route") to branch on `effective_cluster_ids`: one-way `_compute_bm_dof_from_contrasts` when None, cluster-aware `_compute_cr2_bm_contrast_dof` otherwise. Cluster IDs are passed unmodified (per-observation, not subscripted by the rank-deficient column-drop mask).

Methodology references
- clubSandwich's `Wald_test(constraints=matrix(c, 1), test="HTZ")$df_denom` is the R parity target (on a 1-row constraint matrix, HTZ reduces to a Satterthwaite t-test). Bell & McCaffrey (2002), Imbens & Kolesar (2016) for the underlying BM Satterthwaite framework. See `docs/methodology/REGISTRY.md` § HC2 + Bell-McCaffrey scope-limitation block (MPD cluster+hc2_bm status flipped from REJECT → SUPPORTED).
- R parity verified against `Wald_test(test="HTZ")$df_denom` at atol=1e-10 on the new `mpd_clustered_avg_att_dof` fixture (smoke test passed at atol=1e-13 before any source edits).

Validation
- `tests/test_linalg_hc2_bm.py::TestCR2BMContrastDOF`: 4 new tests: refactor regression (helper with `eye(k)` matches `_compute_cr2_bm` at atol=1e-10), R-parity for compound contrast vs clubSandwich (atol=1e-10), ndim+shape validation, cluster-count validation.
- `tests/test_estimators_vcov_type.py::TestFitBehavior::test_multi_period_cluster_plus_hc2_bm_produces_finite_inference`: existing rejection test flipped to behavioral; asserts finite avg_att + period_effects inference under the lifted gate.
- `tests/test_estimators_vcov_type.py::TestFitBehavior::test_multi_period_cluster_hc2_bm_avg_att_uses_clubsandwich_dof`: NEW end-to-end estimator-level parity test: fits MPD on the R `mpd_clustered_avg_att_dof` golden fixture, recovers the implied Satterthwaite DOF from `avg_p_value`, and asserts it matches the R `Wald_test` target at atol=1e-6. Derives `post_periods` from the golden's `post_interaction_names` so the Python and R contrasts are identical (local R2 found that MPD's default `[3,4]` post-period rule diverged from the R fixture's `[2,3,4]`).
- New fixture `mpd_clustered_avg_att_dof` in `benchmarks/R/generate_clubsandwich_golden.R` + regenerated `benchmarks/data/clubsandwich_cr2_golden.json`. 15-unit × 4-period staggered panel with cluster=unit.

Security / privacy
Generated with Claude Code