Merged
4 changes: 3 additions & 1 deletion CHANGELOG.md
@@ -11,7 +11,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- **`ChaisemartinDHaultfoeuille.predict_het` × `placebo`: R-parity on both global and per-path surfaces.** R-verified — `did_multiplegt_dyn(predict_het, placebo)` emits heterogeneity OLS results on backward (placebo) horizons via R's `DIDmultiplegtDYN:::did_multiplegt_main` placebo block (`effect = matrix(-i, ...)` rbind site); the same block runs per-by_level under `did_multiplegt_dyn(by_path, predict_het, placebo)`, so both global `res$results$predict_het` and per-by_level `res$by_level_i$results$predict_het` slots emit backward rows. R's predict_het syntax with `placebo > 0` requires the `c(-1)` sentinel in the horizon vector to trigger "compute heterogeneity for ALL forward (1..effects) AND ALL placebo (1..placebo) positions" — passing positive-only horizons errors with "specified numbers in predict_het that exceed the number of placebos". Python mirrors via `_compute_heterogeneity_test(..., placebo=L_max)` (set automatically from `self.placebo` at both global and per-path call sites in `fit()`) — the function iterates forward (1..L_max) and backward (-1..-L_max) horizons in a single loop with an explicit `out_idx < 0` eligibility guard for backward horizons whose `F_g` is too small (would otherwise silently misread `N_mat` via numpy negative indexing). `results.heterogeneity_effects` uses negative-int keys for backward horizons; `path_heterogeneity_effects` does the same per path. Placebo rows in `to_dataframe(level="by_path")` have non-NaN `het_*` columns when `placebo=True` and `heterogeneity=` are both set. **Survey gate (warn + skip):** `survey_design + placebo + heterogeneity` emits a `UserWarning` at fit-time and falls back to forward-horizon-only heterogeneity on both surfaces — the Binder TSL cell-period allocator's REGISTRY justification is tied to **post-period** attribution; backward-horizon attribution puts ψ_g mass on a pre-period cell, a separate library-extension claim that needs its own derivation. 
Forward-horizon `predict_het + survey_design` continues to work unchanged on both global and per-path surfaces. The function-level `_compute_heterogeneity_test` keeps a per-iteration `NotImplementedError` backstop for direct callers that bypass fit(). Pre-period allocator derivation deferred to a follow-up methodology PR (tracked in TODO.md). R parity confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityHeterogeneityWithPlacebo` (scenario 23, `multi_path_reversible_predict_het_with_placebo_global`, `placebo=2, effects=3, no by_path`) and `::TestDCDHDynRParityByPathHeterogeneityWithPlacebo` (scenario 22, same DGP plus `by_path=3`); pinned at `BETA_RTOL=1e-6` / `SE_RTOL=1e-5` for `beta` / `se` / `t_stat` / `n_obs` and `INFERENCE_RTOL=1e-4` for `p_value` / `conf_int` across 3 paths × (3 forward + 2 placebo) = 15 horizons + 1 global × 5 horizons. Cross-surface invariants regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPredictHetPlacebo` (placebo het column population, survey-gate warn+skip behavior, forward+survey anti-regression, `out_idx<0` eligibility guard, single-path telescope `path_heterogeneity_effects[(only_path,)] == heterogeneity_effects` bit-exactly, summary rendering, direct-call `NotImplementedError` backstop). Closes TODO #422.

### Changed
- **`ChaisemartinDHaultfoeuille.predict_het` inference: t-distribution df threading (closes TODO pilot-412).** `_compute_heterogeneity_test` now passes `df = n_obs - rank(design)` to `safe_inference` on the non-survey OLS path, matching R `did_multiplegt_dyn(predict_het=...)`'s t-distribution inference (`DIDmultiplegtDYN:::did_multiplegt_main` `t_stat <- qt(0.975, df.residual(model))` site). Pre-PR Python used `df=None` (normal Z critical), producing 0.1-2% rtol gaps on `p_value` and `conf_int` vs R. Parity tolerance tightened on the existing forward-horizon scenarios (`multi_path_reversible_predict_het`, `multi_path_reversible_by_path_predict_het`) from "unpinned" to `INFERENCE_RTOL=1e-4` on `p_value` and `conf_int`; `beta` / `se` / `t_stat` continue at `BETA_RTOL=1e-6` / `SE_RTOL=1e-5`. **Post-drop rank (post-2026-05-16 wrap-up):** the df denominator uses the post-drop numerical rank via `_detect_rank_deficiency`, which `solve_ols` already calls internally. For full-rank designs `rank == n_params` and behavior is bit-identical to the pre-PR `n_obs - n_params` path; for near-rank-deficient designs that `solve_ols` retains rather than NaN-out (e.g., cohort-collinearity at high horizons), the post-drop rank is strictly lower and the post-PR `df` is larger, matching R's `lm()` convention. The Z-vs-t REGISTRY deviation note is replaced with an "R parity (post-2026-05-15 df threading)" positive-claim note.
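A minimal standalone sketch of the df convention this entry describes (toy data, plain numpy/scipy, not the library's `safe_inference`/`solve_ols` internals): the residual df follows R's `df.residual = n - rank(design)`, and the t critical value it feeds is strictly wider than the pre-fix normal-Z critical.

```python
import numpy as np
from scipy import stats

# Toy OLS: full-rank design with intercept + 4 regressors.
rng = np.random.default_rng(0)
n, k = 40, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, 0.0, -0.3, 0.2]) + rng.normal(size=n)

# lstsq reports the numerical rank of the design directly.
beta, _, rank_, _ = np.linalg.lstsq(X, y, rcond=None)
df_resid = n - int(rank_)  # R's df.residual convention: n - rank(design)

# t critical (R's qt(0.975, df.residual(model))) vs the pre-fix
# normal-Z critical: t > z, so Z-based CIs were slightly too narrow.
t_crit = stats.t.ppf(0.975, df_resid)
z_crit = stats.norm.ppf(0.975)
```

For a full-rank design like this one, `rank_ == k + 1`, so the rank-based df coincides with the column-count df; the two only diverge on rank-deficient designs, which is exactly the edge case the entry describes.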

- **`ChaisemartinDHaultfoeuille.by_path` negative-baseline path regression coverage.** New `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathNonBinary::test_negative_baseline_path_supported` exercises switchers with `D_{g,1} = -1` and asserts that `path_effects` correctly contains negative-baseline tuple keys (e.g., `(-1, 0, 0, 0)`, `(-1, 1, 1, 1)`). This closes the test-coverage gap from PR #419: the existing `test_negative_integer_D_supported` only covered paths with negative values in non-baseline positions (e.g., `(0, -1, -1, -1)`), which does not trigger R's documented `substr(path, 1, 1)` baseline-extraction bug. Python's tuple-key matching is correct under any baseline value; this test pins the contract. No R-parity fixture is added because R is the buggy side on this regime — the deviation is documented in the REGISTRY non-binary treatment Note.
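A toy contrast of the two keying schemes (a simplified rendition, not either library's actual code): a substr-style first-character baseline extraction truncates a multi-character baseline such as `-1`, while tuple keys carry the full integer.

```python
path = (-1, 0, 0, 0)  # switcher with negative baseline D_{g,1} = -1

# String-keyed scheme with substr(path, 1, 1)-style extraction
# (R's substr is 1-indexed; [0] is the Python equivalent):
path_str = ",".join(str(d) for d in path)  # "-1,0,0,0"
baseline_substr = path_str[0]              # "-" — truncated, wrong

# Tuple-keyed scheme (the contract the new test pins):
baseline_tuple = path[0]                   # -1 — correct for any baseline
```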

## [3.3.3] - 2026-05-15

5 changes: 1 addition & 4 deletions TODO.md
@@ -77,9 +77,7 @@ Deferred items from PR reviews that were not addressed before merge.
| dCDH: Phase 1 per-period placebo DID_M^pl has NaN SE (no IF derivation for the per-period aggregation path). Multi-horizon placebos (L_max >= 1) have valid SE. | `chaisemartin_dhaultfoeuille.py` | #294 | Low |
| dCDH: Survey cell-period allocator's post-period attribution is a library convention, not derived from the observation-level survey linearization. MC coverage is empirically close to nominal on the test DGP; a formal derivation (or a covariance-aware two-cell alternative) is deferred. Documented in REGISTRY.md survey IF expansion Note. | `chaisemartin_dhaultfoeuille.py`, `docs/methodology/REGISTRY.md` | #408 | Medium |
| dCDH: Parity test SE/CI assertions only cover pure-direction scenarios; mixed-direction SE comparison is structurally apples-to-oranges (cell-count vs obs-count weighting). | `test_chaisemartin_dhaultfoeuille_parity.py` | #294 | Low |
| dCDH by_path: survey-aware backward-horizon heterogeneity (`placebo + predict_het + survey_design`) is gated: `fit()` emits a `UserWarning` and falls back to forward-horizon-only heterogeneity, and direct calls to `_compute_heterogeneity_test` raise `NotImplementedError`, because the Binder TSL cell-period allocator's REGISTRY justification is tied to post-period attribution. Backward horizons would put ψ_g mass on a pre-period cell. Deriving the pre-period cell allocator (or adding a covariance-aware two-cell alternative) is deferred to a follow-up methodology PR. | `diff_diff/chaisemartin_dhaultfoeuille.py`, `docs/methodology/REGISTRY.md` | follow-up | Medium |
| CallawaySantAnna: consider materializing NaN entries for non-estimable (g,t) cells in group_time_effects dict (currently omitted with consolidated warning); would require updating downstream consumers (event study, balance_e, aggregation) | `staggered.py` | #256 | Low |
| ImputationDiD dense `(A0'A0).toarray()` scales O((U+T+K)^2), OOM risk on large panels | `imputation.py` | #141 | Medium (deferred — only triggers when sparse solver fails) |
| Multi-absorb weighted demeaning needs iterative alternating projections for N > 1 absorbed FE with survey weights; unweighted multi-absorb also uses single-pass (pre-existing, exact only for balanced panels) | `estimators.py` | #218 | Medium |
@@ -186,8 +184,7 @@ Ordered paydown view across the tables above. Tier A → D is by effort × risk,
- WooldridgeDiD: optional `weights="cohort_share"` on `aggregate()` (`wooldridge_results.py`)
- HAD survey-design API consolidation: drop deprecated `survey=`/`weights=` kwargs (`had.py`, `had_pretests.py`; gated on next minor bump)
- Survey-design resolution / collapse helper extraction across `continuous_did.py`, `efficient_did.py`, `stacked_did.py`
- dCDH survey + backward-horizon `predict_het` allocator derivation: lift the warn-and-skip fallback at `_compute_heterogeneity_test` once the pre-period Binder TSL cell-period allocator is derived (currently the gate emits a `UserWarning` and falls back to forward-horizon-only heterogeneity under `survey_design + placebo + heterogeneity`) (`chaisemartin_dhaultfoeuille.py`, `docs/methodology/REGISTRY.md`)
- Rust local-method solver path unification to `solve_wls_svd` + bootstrap-weight RNG parity audit (`rust/src/trop.rs`, `rust/src/bootstrap.rs`)
- AI review CI workflow-contract pin test expansion (`tests/test_openai_review.py`)
- In-site Sphinx render of `REPORTING.md` and `REGISTRY.md` (`docs/conf.py` + `:doc:` link migration)
59 changes: 47 additions & 12 deletions diff_diff/chaisemartin_dhaultfoeuille.py
@@ -5190,9 +5190,32 @@ def _compute_heterogeneity_test(
else:
design = np.hstack([intercept, x_arr])

# Compute post-drop numerical rank for both the small-sample
# guard and the t-distribution df. `_detect_rank_deficiency` is
# the same helper `solve_ols` calls internally; calling it
# explicitly here lets the guard use post-drop rank (matching
# R's `df.residual = n_obs - rank(design)` convention from
# `DIDmultiplegtDYN:::did_multiplegt_main` `qt(0.975,
# df.residual(model))`) instead of pre-drop column count,
# which would incorrectly short-circuit cases where solve_ols's
# R-style alias drop leaves `n_obs > rank > 0` (e.g., cohort-
# dummy collinearity at high horizons). For full-rank designs
# `rank == n_params` and behavior is bit-identical to the
# pre-PR `n_obs - n_params` path. The extra O(nk^2) cost is
# negligible at heterogeneity scale (k = intercept + X_het +
# cohort dummies, typically 5-30 columns; n = path-switcher
# count, typically 30-300 groups).
from diff_diff.linalg import _detect_rank_deficiency

n_params = design.shape[1]
rank, _dropped, _pivot = _detect_rank_deficiency(design)

# Guard: need MORE observations than rank for a well-defined
# residual df. When `n_obs <= rank`, the regression has zero
# residual df (perfect fit or under-identified) and inference
# is undefined. This is the rank-based replacement for the
# pre-PR `n_obs <= n_params` short-circuit.
if n_obs <= rank:
results[l_h] = {
"beta": float("nan"),
"se": float("nan"),
@@ -5203,27 +5226,39 @@
}
continue

df_ols = int(n_obs) - int(rank)

if not use_survey:
# Plain OLS path: standard inference per Lemma 7.
coefs, _residuals, vcov = solve_ols(
design,
dep_arr,
return_vcov=True,
rank_deficient_action=rank_deficient_action,
)
# Under rank-deficient designs solve_ols R-style-drops one
# or more columns and NaN-fills their coefs. If `X_het`
# (column index 1) is the dropped column, the heterogeneity
# coefficient is unidentified — NaN-fill the inference
# tuple (matches R's lm() returning NA for aliased
# coefficients). Other columns being dropped is fine: the
# X_het coefficient and its (1,1) vcov entry remain
# identified, df_ols already reflects the post-drop rank.
if not np.isfinite(coefs[1]):
results[l_h] = {
"beta": float("nan"),
"se": float("nan"),
"t_stat": float("nan"),
"p_value": float("nan"),
"conf_int": (float("nan"), float("nan")),
"n_obs": n_obs,
}
continue
beta_het = float(coefs[1])
se_het = float("nan")
if vcov is not None and np.isfinite(vcov[1, 1]) and vcov[1, 1] > 0:
se_het = float(np.sqrt(vcov[1, 1]))
t_stat, p_val, ci = safe_inference(beta_het, se_het, alpha=alpha, df=df_ols)
else:
# Survey-aware path: WLS with per-group weights + TSL IF variance.
W_elig = W_g_all[eligible]
2 changes: 1 addition & 1 deletion docs/methodology/REGISTRY.md

Large diffs are not rendered by default.
