Merged
4 changes: 3 additions & 1 deletion CHANGELOG.md
@@ -11,7 +11,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- **`ChaisemartinDHaultfoeuille.predict_het` × `placebo`: R-parity on both global and per-path surfaces.** R-verified — `did_multiplegt_dyn(predict_het, placebo)` emits heterogeneity OLS results on backward (placebo) horizons via R's `DIDmultiplegtDYN:::did_multiplegt_main` placebo block (`effect = matrix(-i, ...)` rbind site); the same block runs per-by_level under `did_multiplegt_dyn(by_path, predict_het, placebo)`, so both global `res$results$predict_het` and per-by_level `res$by_level_i$results$predict_het` slots emit backward rows. R's predict_het syntax with `placebo > 0` requires the `c(-1)` sentinel in the horizon vector to trigger "compute heterogeneity for ALL forward (1..effects) AND ALL placebo (1..placebo) positions" — passing positive-only horizons errors with "specified numbers in predict_het that exceed the number of placebos". Python mirrors via `_compute_heterogeneity_test(..., placebo=L_max)` (set automatically from `self.placebo` at both global and per-path call sites in `fit()`) — the function iterates forward (1..L_max) and backward (-1..-L_max) horizons in a single loop with an explicit `out_idx < 0` eligibility guard for backward horizons whose `F_g` is too small (would otherwise silently misread `N_mat` via numpy negative indexing). `results.heterogeneity_effects` uses negative-int keys for backward horizons; `path_heterogeneity_effects` does the same per path. Placebo rows in `to_dataframe(level="by_path")` have non-NaN `het_*` columns when `placebo=True` and `heterogeneity=` are both set. **Survey gate (warn + skip):** `survey_design + placebo + heterogeneity` emits a `UserWarning` at fit-time and falls back to forward-horizon-only heterogeneity on both surfaces — the Binder TSL cell-period allocator's REGISTRY justification is tied to **post-period** attribution; backward-horizon attribution puts ψ_g mass on a pre-period cell, a separate library-extension claim that needs its own derivation. 
Forward-horizon `predict_het + survey_design` continues to work unchanged on both global and per-path surfaces. The function-level `_compute_heterogeneity_test` keeps a per-iteration `NotImplementedError` backstop for direct callers that bypass fit(). Pre-period allocator derivation deferred to a follow-up methodology PR (tracked in TODO.md). R parity confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityHeterogeneityWithPlacebo` (scenario 23, `multi_path_reversible_predict_het_with_placebo_global`, `placebo=2, effects=3, no by_path`) and `::TestDCDHDynRParityByPathHeterogeneityWithPlacebo` (scenario 22, same DGP plus `by_path=3`); pinned at `BETA_RTOL=1e-6` / `SE_RTOL=1e-5` for `beta` / `se` / `t_stat` / `n_obs` and `INFERENCE_RTOL=1e-4` for `p_value` / `conf_int` across 3 paths × (3 forward + 2 placebo) = 15 horizons + 1 global × 5 horizons. Cross-surface invariants regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPredictHetPlacebo` (placebo het column population, survey-gate warn+skip behavior, forward+survey anti-regression, `out_idx<0` eligibility guard, single-path telescope `path_heterogeneity_effects[(only_path,)] == heterogeneity_effects` bit-exactly, summary rendering, direct-call `NotImplementedError` backstop). Closes TODO #422.

### Changed
- **`ChaisemartinDHaultfoeuille.predict_het` inference: t-distribution df threading (closes TODO pilot-412).** `_compute_heterogeneity_test` now passes `df = n_obs - rank(design)` to `safe_inference` on the non-survey OLS path, matching R `did_multiplegt_dyn(predict_het=...)`'s t-distribution inference (`DIDmultiplegtDYN:::did_multiplegt_main` `t_stat <- qt(0.975, df.residual(model))` site). Pre-PR Python used `df=None` (normal Z critical), producing 0.1-2% rtol gaps on `p_value` and `conf_int` vs R. Parity tolerance tightened on the existing forward-horizon scenarios (`multi_path_reversible_predict_het`, `multi_path_reversible_by_path_predict_het`) from "unpinned" to `INFERENCE_RTOL=1e-4` on `p_value` and `conf_int`; `beta` / `se` / `t_stat` continue at `BETA_RTOL=1e-6` / `SE_RTOL=1e-5`. **Post-drop rank (post-2026-05-16 wrap-up):** the df denominator uses the post-drop numerical rank via `_detect_rank_deficiency`, which `solve_ols` already calls internally. For full-rank designs `rank == n_params` and behavior is bit-identical to the pre-PR `n_obs - n_params` path; for near-rank-deficient designs that `solve_ols` retains rather than NaN-out (e.g., cohort-collinearity at high horizons), the post-drop rank is strictly lower and the post-PR `df` is larger, matching R's `lm()` convention. The Z-vs-t REGISTRY deviation note is replaced with an "R parity (post-2026-05-15 df threading)" positive-claim note.
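A minimal standalone sketch of the df convention this entry describes (toy data, plain numpy/scipy, not the library's `safe_inference`/`solve_ols` internals): the residual df follows R's `df.residual = n - rank(design)`, and the t critical value it feeds is strictly wider than the pre-fix normal-Z critical.

```python
import numpy as np
from scipy import stats

# Toy OLS: full-rank design with intercept + 4 regressors.
rng = np.random.default_rng(0)
n, k = 40, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, 0.0, -0.3, 0.2]) + rng.normal(size=n)

# lstsq reports the numerical rank of the design directly.
beta, _, rank_, _ = np.linalg.lstsq(X, y, rcond=None)
df_resid = n - int(rank_)  # R's df.residual convention: n - rank(design)

# t critical (R's qt(0.975, df.residual(model))) vs the pre-fix
# normal-Z critical: t > z, so Z-based CIs were slightly too narrow.
t_crit = stats.t.ppf(0.975, df_resid)
z_crit = stats.norm.ppf(0.975)
```

For a full-rank design like this one, `rank_ == k + 1`, so the rank-based df coincides with the column-count df; the two only diverge on rank-deficient designs, which is exactly the edge case the entry describes.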

- **`ChaisemartinDHaultfoeuille.by_path` negative-baseline path regression coverage.** New `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathNonBinary::test_negative_baseline_path_supported` exercises switchers with `D_{g,1} = -1` and asserts that `path_effects` correctly contains negative-baseline tuple keys (e.g., `(-1, 0, 0, 0)`, `(-1, 1, 1, 1)`). This closes the test-coverage gap from PR #419: the existing `test_negative_integer_D_supported` only covered paths with negative values in non-baseline positions (e.g., `(0, -1, -1, -1)`), which does not trigger R's documented `substr(path, 1, 1)` baseline-extraction bug. Python's tuple-key matching is correct under any baseline value; this test pins the contract. No R-parity fixture is added because R is the buggy side on this regime — the deviation is documented in the REGISTRY non-binary treatment Note.
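A toy contrast of the two keying schemes (a simplified rendition, not either library's actual code): a substr-style first-character baseline extraction truncates a multi-character baseline such as `-1`, while tuple keys carry the full integer.

```python
path = (-1, 0, 0, 0)  # switcher with negative baseline D_{g,1} = -1

# String-keyed scheme with substr(path, 1, 1)-style extraction
# (R's substr is 1-indexed; [0] is the Python equivalent):
path_str = ",".join(str(d) for d in path)  # "-1,0,0,0"
baseline_substr = path_str[0]              # "-" — truncated, wrong

# Tuple-keyed scheme (the contract the new test pins):
baseline_tuple = path[0]                   # -1 — correct for any baseline
```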

## [3.3.3] - 2026-05-15

5 changes: 1 addition & 4 deletions TODO.md
@@ -77,9 +77,7 @@ Deferred items from PR reviews that were not addressed before merge.
| dCDH: Phase 1 per-period placebo DID_M^pl has NaN SE (no IF derivation for the per-period aggregation path). Multi-horizon placebos (L_max >= 1) have valid SE. | `chaisemartin_dhaultfoeuille.py` | #294 | Low |
| dCDH: Survey cell-period allocator's post-period attribution is a library convention, not derived from the observation-level survey linearization. MC coverage is empirically close to nominal on the test DGP; a formal derivation (or a covariance-aware two-cell alternative) is deferred. Documented in REGISTRY.md survey IF expansion Note. | `chaisemartin_dhaultfoeuille.py`, `docs/methodology/REGISTRY.md` | #408 | Medium |
| dCDH: Parity test SE/CI assertions only cover pure-direction scenarios; mixed-direction SE comparison is structurally apples-to-oranges (cell-count vs obs-count weighting). | `test_chaisemartin_dhaultfoeuille_parity.py` | #294 | Low |
| dCDH by_path: survey-aware backward-horizon heterogeneity (`placebo + predict_het + survey_design`) is gated: `fit()` emits a `UserWarning` and falls back to forward-horizon-only heterogeneity, and direct calls to `_compute_heterogeneity_test` raise `NotImplementedError`, because the Binder TSL cell-period allocator's REGISTRY justification is tied to post-period attribution. Backward horizons would put ψ_g mass on a pre-period cell. Deriving the pre-period cell allocator (or adding a covariance-aware two-cell alternative) is deferred to a follow-up methodology PR. | `diff_diff/chaisemartin_dhaultfoeuille.py`, `docs/methodology/REGISTRY.md` | follow-up | Medium |
| CallawaySantAnna: consider materializing NaN entries for non-estimable (g,t) cells in group_time_effects dict (currently omitted with consolidated warning); would require updating downstream consumers (event study, balance_e, aggregation) | `staggered.py` | #256 | Low |
| ImputationDiD dense `(A0'A0).toarray()` scales O((U+T+K)^2), OOM risk on large panels | `imputation.py` | #141 | Medium (deferred — only triggers when sparse solver fails) |
| Multi-absorb weighted demeaning needs iterative alternating projections for N > 1 absorbed FE with survey weights; unweighted multi-absorb also uses single-pass (pre-existing, exact only for balanced panels) | `estimators.py` | #218 | Medium |
@@ -186,8 +184,7 @@ Ordered paydown view across the tables above. Tier A → D is by effort × risk,
- WooldridgeDiD: optional `weights="cohort_share"` on `aggregate()` (`wooldridge_results.py`)
- HAD survey-design API consolidation: drop deprecated `survey=`/`weights=` kwargs (`had.py`, `had_pretests.py`; gated on next minor bump)
- Survey-design resolution / collapse helper extraction across `continuous_did.py`, `efficient_did.py`, `stacked_did.py`
- dCDH survey + backward-horizon `predict_het` allocator derivation: lift the warn-and-skip fallback at `_compute_heterogeneity_test` once the pre-period Binder TSL cell-period allocator is derived (currently the gate emits a `UserWarning` and falls back to forward-horizon-only heterogeneity under `survey_design + placebo + heterogeneity`) (`chaisemartin_dhaultfoeuille.py`, `docs/methodology/REGISTRY.md`)
- Rust local-method solver path unification to `solve_wls_svd` + bootstrap-weight RNG parity audit (`rust/src/trop.rs`, `rust/src/bootstrap.rs`)
- AI review CI workflow-contract pin test expansion (`tests/test_openai_review.py`)
- In-site Sphinx render of `REPORTING.md` and `REGISTRY.md` (`docs/conf.py` + `:doc:` link migration)
59 changes: 47 additions & 12 deletions diff_diff/chaisemartin_dhaultfoeuille.py
@@ -5190,9 +5190,32 @@ def _compute_heterogeneity_test(
else:
design = np.hstack([intercept, x_arr])

# Compute post-drop numerical rank for both the small-sample
# guard and the t-distribution df. `_detect_rank_deficiency` is
# the same helper `solve_ols` calls internally; calling it
# explicitly here lets the guard use post-drop rank (matching
# R's `df.residual = n_obs - rank(design)` convention from
# `DIDmultiplegtDYN:::did_multiplegt_main` `qt(0.975,
# df.residual(model))`) instead of pre-drop column count,
# which would incorrectly short-circuit cases where solve_ols's
# R-style alias drop leaves `n_obs > rank > 0` (e.g., cohort-
# dummy collinearity at high horizons). For full-rank designs
# `rank == n_params` and behavior is bit-identical to the
# pre-PR `n_obs - n_params` path. The extra O(nk^2) cost is
# negligible at heterogeneity scale (k = intercept + X_het +
# cohort dummies, typically 5-30 columns; n = path-switcher
# count, typically 30-300 groups).
from diff_diff.linalg import _detect_rank_deficiency

n_params = design.shape[1]
rank, _dropped, _pivot = _detect_rank_deficiency(design)

# Guard: need MORE observations than rank for a well-defined
# residual df. When `n_obs <= rank`, the regression has zero
# residual df (perfect fit or under-identified) and inference
# is undefined. This is the rank-based replacement for the
# pre-PR `n_obs <= n_params` short-circuit.
if n_obs <= rank:
results[l_h] = {
"beta": float("nan"),
"se": float("nan"),
@@ -5203,27 +5226,39 @@
}
continue

df_ols = int(n_obs) - int(rank)

if not use_survey:
# Plain OLS path: standard inference per Lemma 7.
coefs, _residuals, vcov = solve_ols(
design,
dep_arr,
return_vcov=True,
rank_deficient_action=rank_deficient_action,
)
# Under rank-deficient designs solve_ols R-style-drops one
# or more columns and NaN-fills their coefs. If `X_het`
# (column index 1) is the dropped column, the heterogeneity
# coefficient is unidentified — NaN-fill the inference
# tuple (matches R's lm() returning NA for aliased
# coefficients). Other columns being dropped is fine: the
# X_het coefficient and its (1,1) vcov entry remain
# identified, df_ols already reflects the post-drop rank.
if not np.isfinite(coefs[1]):
results[l_h] = {
"beta": float("nan"),
"se": float("nan"),
"t_stat": float("nan"),
"p_value": float("nan"),
"conf_int": (float("nan"), float("nan")),
"n_obs": n_obs,
}
continue
beta_het = float(coefs[1])
se_het = float("nan")
if vcov is not None and np.isfinite(vcov[1, 1]) and vcov[1, 1] > 0:
se_het = float(np.sqrt(vcov[1, 1]))
t_stat, p_val, ci = safe_inference(beta_het, se_het, alpha=alpha, df=df_ols)
else:
# Survey-aware path: WLS with per-group weights + TSL IF variance.
W_elig = W_g_all[eligible]
2 changes: 1 addition & 1 deletion docs/methodology/REGISTRY.md

Large diffs are not rendered by default.
