From 79e096249ead30da04b191f609a620dad071494c Mon Sep 17 00:00:00 2001 From: igerber Date: Sun, 17 May 2026 21:55:01 -0400 Subject: [PATCH 1/4] linalg: add cluster-aware CR2 Bell-McCaffrey contrast DOF; wire MPD avg_att inference MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Closes Gate 6 of the six HC2/HC2-BM NotImplementedError gates: MultiPeriodDiD(cluster=..., vcov_type="hc2_bm") at estimators.py:1657 previously raised NotImplementedError because _compute_cr2_bm returns per-coefficient Satterthwaite DOF only — the post-period-average ATT (`avg_att = (1/n_post) Sum_{t >= t_treat} beta_t`) is a compound contrast that needed a cluster-aware contrast DOF helper. New _compute_cr2_bm_contrast_dof in diff_diff/linalg.py generalizes the per-coefficient loop in _compute_cr2_bm to arbitrary (k, m) contrast matrices using the identical Pustejovsky-Tipton 2018 Section 4 algebra (`q = X bread_inv c`, `omega_g = A_g X_g bread_inv c`, `DOF = trace(B)^2 / trace(B^2)`). _compute_cr2_bm is refactored to call the new helper via a private _cr2_bm_dof_inner with `contrasts=eye(k)`; refactor regression at atol=1e-10 confirms the per-coefficient DOFs are preserved (matmul ordering differs slightly from the prior inline loop). MultiPeriodDiD.fit() extends its existing avg_att DOF block (introduced in PR #459) to branch on effective_cluster_ids: one-way _compute_bm_dof_from_contrasts when None, cluster-aware _compute_cr2_bm_contrast_dof otherwise. Cluster IDs are per-observation length n and are NOT subscripted by the rank-deficient column-drop mask `_kept` (which indexes coefficients, not observations). R parity verified at atol=1e-10 against clubSandwich's Wald_test(constraints=matrix(c, 1), test="HTZ")$df_denom on a new mpd_clustered_avg_att_dof fixture in benchmarks/data/clubsandwich_cr2_golden.json. On a 1-row constraint matrix, HTZ reduces to a Satterthwaite t-test and its df_denom IS the BM Satterthwaite DOF. The pre-flight smoke test against this same R target passed at atol=1e-13 before any source edits. Tests: - TestCR2BMContrastDOF (4 new tests): refactor regression vs library, R-parity for compound contrast, shape validation, cluster-count validation. - test_multi_period_cluster_plus_hc2_bm_rejected flipped to test_multi_period_cluster_plus_hc2_bm_produces_finite_inference (end-to-end MPD wire-through with finite avg_att / period_effects inference assertions). After this PR, 3 of 6 HC2/HC2-BM gates are lifted (DiD-absorb #458, MPD-absorb #459, MPD-cluster-contrast-DOF this PR). Remaining: TWFE absorb (Gate 1), weighted HC2-BM (Gates 4-5). Co-Authored-By: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 1 + benchmarks/R/generate_clubsandwich_golden.R | 60 ++++++++ benchmarks/data/clubsandwich_cr2_golden.json | 20 ++- diff_diff/estimators.py | 54 ++++--- diff_diff/linalg.py | 146 +++++++++++++++++-- docs/methodology/REGISTRY.md | 2 +- tests/test_estimators_vcov_type.py | 34 +++-- tests/test_linalg_hc2_bm.py | 127 ++++++++++++++++ 8 files changed, 389 insertions(+), 55 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 0ec74104..278ce96f 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] ### Added +- **`MultiPeriodDiD(cluster=..., vcov_type="hc2_bm")` now supported** (`diff_diff/estimators.py:1657`). Pre-PR the combination raised `NotImplementedError` because the cluster-aware CR2 Bell-McCaffrey Satterthwaite DOF for the post-period-average ATT (`avg_att = (1/n_post) Σ_{t ≥ t_treat} β_t`) was not implemented — only the per-coefficient case existed in `_compute_cr2_bm`. New `_compute_cr2_bm_contrast_dof` helper in `diff_diff/linalg.py` generalizes the per-coefficient loop to arbitrary `(k, m)` contrast matrices using the identical Pustejovsky-Tipton 2018 Section 4 algebra; `_compute_cr2_bm` is refactored to call it with `contrasts=eye(k)` so the existing per-coefficient parity to clubSandwich's `coef_test$df_Satt` is preserved (refactor regression at atol=1e-10). `MultiPeriodDiD.fit()` extends its existing avg_att DOF block to branch on `effective_cluster_ids`: one-way `_compute_bm_dof_from_contrasts` when None, cluster-aware `_compute_cr2_bm_contrast_dof` otherwise. Cluster IDs are per-observation length `n` and are NOT subscripted by the rank-deficient column-drop mask. R parity verified at atol=1e-10 against clubSandwich's `Wald_test(constraints=matrix(c, 1), test="HTZ")$df_denom` on the new `mpd_clustered_avg_att_dof` fixture in `benchmarks/data/clubsandwich_cr2_golden.json` (Wald_test's HTZ on a 1-row constraint matrix yields the Satterthwaite t-test DOF). Per-coefficient `period_effects[t].p_value` / `conf_int` and `avg_att` `avg_p_value` / `avg_conf_int` now reflect the correct Satterthwaite DOF rather than the n-k fallback under cluster+hc2_bm. Weighted CR2-BM (`survey_design=` paths) remains a separate gate. New tests: `tests/test_linalg_hc2_bm.py::TestCR2BMContrastDOF` (4 tests: refactor regression, R-parity, shape validation, cluster-count validation); existing `test_multi_period_cluster_plus_hc2_bm_rejected` flipped to behavioral `test_multi_period_cluster_plus_hc2_bm_produces_finite_inference`. - **`MultiPeriodDiD(absorb=..., vcov_type in {"hc2", "hc2_bm"})` now supported** (`diff_diff/estimators.py:1476`). Mirrors the DiD-absorb auto-route shipped earlier in this release: when `absorb=` is paired with `vcov_type in {"hc2","hc2_bm"}`, `MultiPeriodDiD.fit()` promotes the absorb columns to `fixed_effects=` internally so the existing full-dummy-design code path computes the algebraically correct vcov on the event-study design (`treated + period_X dummies + treated:period_X interactions + factor(unit)`). Verified at ~1e-10 vs `lm() + sandwich::vcovHC(type="HC2")` and `lm() + clubSandwich::vcovCR(cluster=1:n, type="CR2")` on a 5-cohort × 5-period event-study fixture (new `tests/test_estimators_vcov_type.py::TestMPDAbsorbedFERParity` against `benchmarks/data/clubsandwich_cr2_golden.json` scenario `mpd_absorbed_fe_did`). HC1/CR1 paths on `absorb=` are unchanged (no leverage term). `TwoWayFixedEffects(vcov_type in {"hc2","hc2_bm"})` rejection remains as a follow-up (different fit-path structure — no `fixed_effects=` equivalent inside TWFE). **Behavioral note (full `MultiPeriodDiDResults` surface change under auto-route):** under the auto-route, the entire returned `MultiPeriodDiDResults` reflects the full-dummy fit rather than the within-transformed fit — `result.coefficients`, `result.vcov`, `result.residuals`, `result.fitted_values`, `result.r_squared` all include the FE-dummy entries / un-demeaned values. `result.period_effects[t].effect` / `.se` / `.p_value` / `.conf_int` and `result.avg_att` / `.avg_se` are invariant to this routing (FWL guarantee). MPD requires a time-invariant ever-treated indicator that lies in the span of the intercept and the post-auto-route unit FE dummies (the exact alias depends on the omitted FE reference category under `pd.get_dummies(drop_first=True)`, not just on "the sum of treated-cohort unit dummies"), so `solve_ols` drops one column from that collinear set under R-style rank-deficiency handling. Which specific column is dropped is pivot-order and dummy-coding dependent (in the shipped parity fixture it is a never-treated unit dummy, not the `treated` main effect itself). The per-period interaction coefficients (`treated:period_X`) and `avg_att` are identified and invariant to that choice; parity tests target those rather than the `treated` main effect. **Survey-design scope (replicate weights):** when `survey_design=` uses replicate weights, the auto-route short-circuits the absorb-refit branch at `estimators.py:1693` and routes through the standard `compute_replicate_vcov` path on the fixed full-dummy design — correct because the design does not depend on replicate weights so no per-replicate refit is needed. **Redundant time-FE skip:** when the routed (or directly-supplied) `fixed_effects` list contains the `time` column, MPD silently skips emitting `