igerber · igerber · May 22, 2026 · May 21, 2026 · May 21, 2026 · May 21, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
 
 ### Added
+- **WooldridgeDiD `vcov_type` parameter, OLS path (Phase 1b PR 3/8).** `WooldridgeDiD(vcov_type=...)` now accepts `{"classical","hc1","hc2","hc2_bm"}` on `method="ols"` (defaults to `"hc1"`, preserves prior behavior at machine precision — the WLS-CR1 sandwich is algebraically invariant between the prior within-transform path and the new branched path, differing only by float64 multiplication ordering at sub-ULP scale; the full 106-test `tests/test_wooldridge.py` baseline still passes unchanged). `hc2_bm` auto-routes to a full-dummy saturated design (`[intercept, X_design, unit_dummies, time_dummies]`) + clubSandwich WLS-CR2 algebra (PR #475) — matches `clubSandwich::vcovCR(lm(...), type="CR2") + coef_test()$df_Satt` at `atol=1e-10` on the new `benchmarks/data/wooldridge_golden.json` fixture. `classical`/`hc2` supported via full-dummy + auto-drop of the unit auto-cluster (one-way families); explicit `cluster="X"` + one-way family raises at the linalg validator. Per-cell + aggregate p-values/CIs on `classical`/`hc2` paths use the residual DOF `n - rank(X)` (matches R `lm()` / `coef_test()` t-distribution), not normal-theory. **Bell-McCaffrey Satterthwaite DOF is threaded across ALL hc2_bm user-facing inference surfaces**: (1) per-cell `group_time_effects[(g, t)]` use `coef_test()$df_Satt` (matches R at atol=1e-6 from CI inversion); (2) overall ATT uses the post-period-aggregation contrast DOF from `_compute_cr2_bm_contrast_dof` (matches R `Wald_test(test="HTZ")$df_denom` at atol=1e-10); (3) `.aggregate("group" | "calendar" | "event")` recomputes contrast-specific BM DOFs lazily from BM artifacts stored on the Results object — the REDUCED kept-column design (`X_red`), cluster_ids, reduced bread matrix, and reduced-space coef-index map (using the reduced kept-column design after rank-deficient drops keeps the bread non-singular and matches the subspace `solve_ols` actually estimated in). Fail-closed (all-NaN inference) when BM DOF unavailable, mirrors PR #475 R7 and PR #479 R3. `method ∈ {"logit","poisson"}` + `vcov_type != "hc1"` raises `NotImplementedError` at `__init__` (GLM CR2-BM-on-pseudo-residuals composition needs derivation; deferred to follow-up TODO row). `SurveyDesign` + `vcov_type != "hc1"` raises `NotImplementedError` at `fit()` (survey TSL overrides analytical sandwich). `n_bootstrap > 0` + one-way (`hc2`/`classical`) raises at `fit()` regardless of `cluster=` setting (multiplier bootstrap is intrinsically clustered, but one-way vcov_type does not compose with cluster_ids — either the auto-cluster is dropped when `cluster=None` leaving the bootstrap with no cluster to draw at, or the linalg validator rejects one-way + cluster_ids when `cluster=X`). `conley` rejected at `__init__` with a deferral pointer. `vcov_type`, `cluster_name`, `n_clusters` added to `WooldridgeDiDResults` for downstream introspection (per `feedback_results_vcov_label_cluster_metadata`). Third PR of the Phase 1b standalone-estimator threading initiative (5 PRs to follow: CallawaySantAnna, ImputationDiD, TripleDifference, TwoStageDiD, EfficientDiD).
 - **`SpilloverDiD(survey_design=SurveyDesign.subpopulation(...))` full-design retention via zero-pad scores (Wave E.3).** Closes the Wave E.1/E.2/follow-up documented limitation at `REGISTRY.md:3249`: `SurveyDesign.subpopulation()`-derived designs AND warn-and-drop fits now preserve the full-domain resolved survey design — `n_psu` / `n_strata` / `df_survey` / Binder TSL per-stratum centering reflect the FULL domain rather than the post-`finite_mask` fit sample. **Documented synthesis (library-convention adoption, NOT new methodology):** Wave E.3 adopts the canonical "zero-pad scores to full panel + retain full-design resolved survey" pattern from R `survey::svyrecvar(subset())` (Lumley 2010 §2.5) already established in `diff_diff/imputation.py:2175-2183` (PreTrendsImputation lead regression — Omega_0 scores zero-padded back to full panel length) and `diff_diff/prep.py:1401-1432` (DCDH cell variance — IF zero-padded outside the cell). Wave E.3 propagates the same convention to SpilloverDiD's Wave E.1 Binder TSL × Wave D Gardner GMM × Wave E.2/follow-up stratified-Conley + serial Bartlett meat. **Mechanical realization (one new `_compute_gmm_corrected_meat` kwarg):** the gamma_hat / Psi build stays on SURVEY-FINITE-MASK inputs (`X_1_sparse_fit`, `X_10_sparse_fit`, `eps_10_fit` built on `survey_finite_mask = finite_mask & survey_weights > 0`; `X_2_kept_gamma`, `eps_2_fit_gamma`, `survey_weights_fit_gamma` projected from the fit-sample frame down to survey_finite_mask) so the drop-first stage-1 FE column space is bit-identical to the pre-E.3 path. `_compute_gmm_corrected_meat` gains a new optional kwarg `score_pad_mask: Optional[np.ndarray] = None`: when supplied, the helper zero-pads the fit-sample `Psi` to full panel length AFTER construction but BEFORE kernel dispatch via `Psi_padded[score_pad_mask] = Psi`. Kernel-dispatch arrays (`cluster_ids`, `conley_coords`, `conley_time`, `conley_unit`, `resolved_survey`) are passed at FULL length so the meat helpers (Binder TSL / stratified-Conley / serial Bartlett) see the full-domain PSU / strata / centroid / time geometry. The `_validate_conley_kwargs` call inside the helper reads `n_for_conley = len(score_pad_mask)` when the kwarg is set so the Conley shape checks see the full-length geometry. **`gamma_hat` invariance:** the gamma_hat solve operates on fit-sample inputs throughout — bit-identical to the pre-E.3 path (critical for the case where `_build_butts_fe_design_csr`'s `pd.factorize` re-compaction would drop a different unit's column under a full-length FE build than under a fit-length one). **Bread invariance:** `A_22 = X_2_kept' W X_2_kept` at `spillover.py:3187-3214` still uses fit-length `X_2_kept` because `A_22_full = X_2_full' W_full X_2_full` equals `A_22_kept` when zero-weight rows contribute zero. **A2 invariant:** warn-and-drop and `SurveyDesign.subpopulation()` drops are treated identically — both apply the zero-pad mechanism. The "both mechanisms compose cleanly" case (subpop-excluded row that is ALSO warn-and-dropped) produces `Psi = 0` from either cause; the PSU still counts toward `n_psu_full`. Hand-computation methodology anchor at `_scratch/wave_e3_smoke.py` codifies the A2 invariant on 4 PSU × 4 period × 3 obs synthetic. **Subpopulation parity vs upstream-subset:** `df_survey` matches the full domain regardless of how many rows the subpopulation mask excludes (mirrors R `svyglm(design=subset(d, mask))` vs `svyglm(design=svydesign(data=data[mask], ...))`). SE may differ by design — subpopulation retains zero-padded PSU geometry; upstream-subset drops PSUs entirely. **Pre-E.3 baseline parity:** when `finite_mask.all() == True` AND all weights `> 0`, the Wave E.3 zero-pad is a no-op — ATT + SE + n_psu + df_survey match pre-E.3 baseline values via FIXED GOLDEN values at `test_c` (`rtol=1e-12, atol=1e-12`). **Cross-surface n_psu consistency:** top-level `res.n_psu` reads from `len(resolved_survey_fit.weights)` on the implicit-PSU branch (was `int(finite_mask.sum())` pre-codex-R1-P2-fix); this keeps `res.n_psu == res.survey_metadata.n_psu` on weights-only / strata-only survey designs under warn-and-drop. Regression at `test_c2`. **Restrictions inherited:** replicate-weight variance + subpopulation continues to raise `NotImplementedError` at the Wave E.1 gate. TwoStageDiD's analogous `finite_mask + design-subset` pattern at `two_stage.py:567-601` is NOT yet adopted to Wave E.3 — separate parity follow-up tracked in `TODO.md` (an expected-divergence test was attempted but TwoStageDiD's always-treated handling at `two_stage.py:294-336` differs from SpilloverDiD's per-unit Omega_0 check, so the divergence didn't materialize on the standard fixture; the parity follow-up should add its own targeted regression). **Implementation:** `spillover.py:2845-2896` design-subset block deleted; `survey_weights_fit = survey_weights[finite_mask]` retained for the stage-2 OLS solve which still operates on the fit sample; `cluster_ids_full[finite_mask]` subset dropped on the survey path. `_compute_gmm_corrected_meat` call at `spillover.py:3163` now receives FIT-LENGTH gamma_hat-construction inputs (unchanged) plus FULL-LENGTH kernel-dispatch arrays (`cluster_ids_for_meat`, `conley_*_for_meat`, `resolved_survey_fit`) plus the new `score_pad_mask=survey_finite_mask` kwarg; no-survey path passes `score_pad_mask=None` and uses fit-length variables throughout (bit-identical to pre-E.3). `_compute_gmm_corrected_meat` at `two_stage.py:62-80` adds one new optional kwarg `score_pad_mask: Optional[np.ndarray] = None` and one post-Psi-construction zero-pad block; the `_validate_conley_kwargs` call uses `n_for_conley = len(score_pad_mask)` when the kwarg is set. Within-unit-constancy validator at `spillover.py:2913` updated to operate on full-length unit array. Second `compute_survey_metadata` recompute at `spillover.py:2954-2959` uses full-length `raw_w`. No `_compute_stratified_meat_from_psu_scores` / `_compute_stratified_conley_meat` / `_compute_stratified_serial_bartlett_meat` signature changes. **Tests:** new `TestSpilloverDiDWaveE3SubpopulationFullDesign` and `TestSpilloverDiDWaveE3SubpopulationFullDesignEventStudy` classes in `tests/test_spillover.py` (19 tests: pre-E.3 baseline parity via pinned goldens, n_psu cross-surface consistency on implicit-PSU branch, A2 invariant (zero-pad mechanics via mock-spy), subpopulation × explicit-PSU parity, conley + lag>0 + subpopulation × explicit-PSU / cluster-injection / weights-only branches, cluster-as-PSU + subpopulation parity, unit with BOTH zero weight AND no Omega_0 support, gamma_hat-build sample excludes zero-weight rows, n_obs / n_treated / n_control / n_far_away_obs reflect count_mask, warn-drop SE drift golden, ATT bit-equality under PSU-last-sort exclusion, exact event-study n_obs propagation, event-study on both is_staggered branches with analytical + conley+lag variants). Pre-existing Wave E.1 `test_p2_finite_mask_forces_drop_under_survey` assertion flipped from `n_psu=8` (subset) to `n_psu=10` (full domain) to reflect the new contract.
 - **ChaisemartinDHaultfoeuille (DCDH) methodology-review-tracker promotion.** Tracker row flipped **In Progress** → **Complete** with full Verified Components / Test Coverage / Corrections Made / Deviations / Outstanding Concerns structure mirroring the HAD precedent (PR #473) and ContinuousDiD precedent (PR #476). REGISTRY `## ChaisemartinDHaultfoeuille` gains a formal `### Deviations from the paper / from R / library extensions` block consolidating 7 documented deviations into a single AI-review-recognized labeled surface (per CLAUDE.md "Documenting Deviations (AI Review Compatibility)"): (D1) equal-cell weighting (deviation from BOTH AER 2020 Equation 3 AND R `DIDmultiplegtDYN`); (D2) period-based vs cohort-based stable controls; (D3) balanced-baseline panel + interior-gap drops + terminal-missingness retention + cell-period-allocator targeted `ValueError`; (D4) SE normalization `N_l` vs R `G` (~4% smaller analytical SE); (D5) singleton-cohort degeneracy → NaN with `UserWarning`; (D6) `<50%` switcher warning at far horizons (library extension citing Favara-Imbs application, footnote 14 of NBER WP 29873); (D7) Phase 3 `DID^X` covariate first-stage equal-cell weights. R cross-language coverage holds at documented tolerance bands in `tests/test_chaisemartin_dhaultfoeuille_parity.py` (`POINT_RTOL = 1e-4` on pure-direction point estimates, `MIXED_POINT_RTOL = 0.025` on mixed-direction, `PURE_DIRECTION_SE_RTOL = 0.05` on pure-direction SE, `SE_RTOL = 0.10` on multi-horizon SE, `se_rtol=0.15` on the long-panel `L_max=5` joiners-only scenario where cell-count-weighting compounds). No source code changes, no new tests, no new docstrings — consolidation only against the existing 12 methodology tests (`tests/test_methodology_chaisemartin_dhaultfoeuille.py`), 26 R-parity tests (`tests/test_chaisemartin_dhaultfoeuille_parity.py`), 352 unit tests (`tests/test_chaisemartin_dhaultfoeuille.py`), survey suites (`tests/test_survey_dcdh.py`, `tests/test_survey_dcdh_replicate_psu.py`, three cell-period coverage suites), and two primary-source DCDH paper reviews on disk (2020 AER + 2022/2023 NBER WP 29873 via PR #478; the `dechaisemartin-2026-review.md` on disk is HAD's primary source, not DCDH's, and is referenced as adjacent context only). The REGISTRY Deviations block uses semantic section-name anchors (rather than fragile line numbers) for back-references to other parts of the DCDH section — an intentional divergence from the PR #476 ContinuousDiD precedent reflecting PR-A wording-drift CI feedback that flagged line-number cross-references as drift-prone in long sections. `METHODOLOGY_REVIEW.md` DCDH row promoted **In Progress** → **Complete**; L27 In Progress example paragraph re-pointed to WooldridgeDiD; L1289 priority-order queue item #6 (DCDH) removed and items #7-#11 renumbered to #6-#10.
 

diff --git a/TODO.md b/TODO.md
@@ -99,9 +99,11 @@ Deferred items from PR reviews that were not addressed before merge.
 | PreTrendsPower: CS/SA `anticipation=1` R-parity fixture. The PR-C R-parity goldens cover NIS power + γ_p MDV at `atol=1e-4` on four shifted-grid / regular / irregular / K=1 fixtures, but R `pretrends` has no anticipation parameter so the Python-side `_extract_pre_period_params` anticipation filter (`if t < _pre_cutoff` in `pretrends.py` lines 1138-1150 for CS; mirror in SA branch) is not R-parity-locked. Build a synthetic `CallawaySantAnnaResults` (or `SunAbrahamResults`) with `anticipation=1` and a t=-1 event-study entry that should be filtered before reaching `_compute_power_nis`, then assert the resulting γ_p matches R's `slope_for_power()` on the K=4 shifted-grid fixture. Existing PR-B MC-based tests (`TestPretrendsPropositions`) and full-VCV tests (`TestPretrendsCovarianceSource`) already cover the filter mechanically; this would close the loop against R. | `tests/test_methodology_pretrends.py::TestPretrendsParityR`, `benchmarks/R/generate_pretrends_golden.R` | PR-C follow-up | Low |
 
 
-| Thread `vcov_type` (classical / hc1 / hc2 / hc2_bm) through the standalone estimators that expose `cluster=` but not yet `vcov_type=`: `CallawaySantAnna`, `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `WooldridgeDiD`, `EfficientDiD`. Phase 1a added the chain to DiD/MPD/TWFE; Phase 1b PR 1/8 added `SunAbraham`; Phase 1b PR 2/8 added `StackedDiD` (this row tracks the remaining 6). | multiple | Phase 1b | Medium |
+| Thread `vcov_type` (classical / hc1 / hc2 / hc2_bm) through the standalone estimators that expose `cluster=` but not yet `vcov_type=`: `CallawaySantAnna`, `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `EfficientDiD`. Phase 1a added the chain to DiD/MPD/TWFE; Phase 1b PR 1/8 added `SunAbraham`; Phase 1b PR 2/8 added `StackedDiD`; Phase 1b PR 3/8 added `WooldridgeDiD` OLS path (this row tracks the remaining 5). | multiple | Phase 1b | Medium |
 | Extend `SunAbraham` with `vcov_type="conley"` (Conley spatial-HAC) as a first-class feature: thread `conley_coords` / `conley_cutoff_km` / `conley_metric` / `conley_kernel` / `conley_time` / `conley_unit` / `conley_lag_cutoff` through `_fit_saturated_regression`. Phase 1b PR 1/8 deferred this; SA currently rejects `vcov_type="conley"` at `__init__` with a deferral message. | `diff_diff/sun_abraham.py` | follow-up | Medium |
 | Extend `StackedDiD` with `vcov_type="conley"` (Conley spatial-HAC) — thread the six `conley_*` params through `solve_ols` at `stacked_did.py:419` (and the `_refit_stacked` closure at `:444`). Phase 1b PR 2/8 deferred this; StackedDiD currently rejects `vcov_type="conley"` at `__init__` with a deferral message. Same shape as the SunAbraham conley follow-up. | `diff_diff/stacked_did.py` | follow-up | Medium |
+| Extend `WooldridgeDiD` with `vcov_type="conley"` — thread the six `conley_*` params through `solve_ols` in `_fit_ols`. Phase 1b PR 3/8 deferred this; WooldridgeDiD currently rejects `vcov_type="conley"` at `__init__` with a deferral message. Same shape as the SunAbraham / StackedDiD conley follow-ups. | `diff_diff/wooldridge.py` | follow-up | Medium |
+| Extend `WooldridgeDiD` `method ∈ {"logit","poisson"}` paths with `vcov_type ∈ {classical, hc2, hc2_bm}`. The GLM QMLE sandwich uses pseudo-residuals (`weights=p(1-p)` for logit, `weights=μ_i` for Poisson, aweight semantics); composing HC2 leverage and Bell-McCaffrey Satterthwaite DOF with QMLE on canonical-link pseudo-residuals needs derivation + R parity against `clubSandwich::vcovCR(glm(...), type="CR2")`. Phase 1b PR 3/8 rejects `method != "ols" + vcov_type != "hc1"` at `__init__` with a deferral pointer here. | `diff_diff/wooldridge.py` (`_fit_logit`, `_fit_poisson`) | follow-up | Medium |
 | Harmonize SunAbraham's HC1 within-transform finite-sample correction with `fixest::sunab()`. SA's `solve_ols` applies `n / (n - k_dm)` (within-transform columns only); fixest applies `n / (n - k_total)` (counts absorbed FE). SE values differ by ~1-2% on typical panel sizes (documented in REGISTRY.md "Deviation from R"; pinned at `atol=5e-3` in `tests/test_methodology_sun_abraham.py`). Either thread `df_adjustment` into the vcov scaling or document as an intentional difference. | `diff_diff/sun_abraham.py`, `diff_diff/linalg.py::compute_robust_vcov` | follow-up | Low |
 <!-- Rows 104-105 LIFTED 2026-05-20 via the clubSandwich WLS-CR2 port. The diff-diff
      form matches clubSandwich's specific algebra (W not sqrt(W), W^2 in bias term,
@@ -195,7 +197,7 @@ Ordered paydown view across the tables above. Tier A → D is by effort × risk,
 
 #### Tier B — Mid-size methodology (5-10 CI rounds expected, per memory cascade priors)
 
-- Thread `vcov_type` through 8 standalone estimators: `CallawaySantAnna`, `SunAbraham`, `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `StackedDiD`, `WooldridgeDiD`, `EfficientDiD` (none currently expose `self.vcov_type`)
+- Thread `vcov_type` through the 5 remaining standalone estimators: `CallawaySantAnna`, `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `EfficientDiD` (Phase 1b PR 1/8 added SunAbraham, PR 2/8 added StackedDiD, PR 3/8 added WooldridgeDiD-OLS)
 - SyntheticDiD: rename internal `placebo_effects` → `variance_effects` AND public `placebo_effects` field with deprecation alias retained for one release (`synthetic_did.py`, `results.py`)
 - StaggeredTripleDifference R parity: commit CSV fixtures + add covariate-adjusted scenarios + aggregation-SE assertions (`tests/test_methodology_staggered_triple_diff.py`, `benchmarks/R/benchmark_staggered_triplediff.R`)
 - StaggeredTripleDifference: per-cohort group-effect SE WIF override for exact R `triplediff` match (`staggered_triple_diff.py`)