diff --git a/CHANGELOG.md b/CHANGELOG.md
index 8dc9fb16..e0d494dc 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
 
 ### Added
+- **`TwoWayFixedEffects(vcov_type in {"hc2","hc2_bm"})` now supported** (`diff_diff/twfe.py:155`). Lifts Gate 1 of the six HC2/HC2-BM `NotImplementedError` gates, the last remaining absorbed-FE gate (DiD-absorb, MPD-absorb, and MPD cluster+contrast-DOF all shipped earlier in this release). Unlike DiD / MPD, TWFE has no `absorb=` / `fixed_effects=` parameter to swap (unit and time FEs are baked into the estimator's identity), so the auto-route trick used there does not apply. Instead, when `vcov_type in {"hc2","hc2_bm"}`, `TwoWayFixedEffects.fit()` bypasses the within-transform, explicitly stacks the full-dummy design `[intercept, treated×post, covariates, factor(unit), factor(time)]`, and runs OLS through the standard `solve_ols` path. The leverage correction `h_ii = x_i' (X'X)^{-1} x_i` and the CR2 Bell-McCaffrey adjustment `A_g = (I - H_gg)^{-1/2}` are therefore computed on the full FE projection (FWL preserves coefficients and residuals but NOT the hat matrix). Verified at `atol=1e-10` vs `lm(y ~ treat_post + factor(unit) + factor(post)) + sandwich::vcovHC(type="HC2")` for HC2, vs `clubSandwich::vcovCR(cluster=seq_len(n), type="CR2") + coef_test()$df_Satt` for the singleton-cluster one-way HC2-BM Satterthwaite DOF, and vs `vcovCR(cluster=unit, type="CR2")` for the auto-cluster CR2-BM path (new `twfe_two_period` scenario in `benchmarks/data/clubsandwich_cr2_golden.json`). **Auto-cluster default:** TWFE's unit auto-cluster is preserved on `hc2_bm` (routes to CR2-BM clustered at the unit level) and on `hc2 + wild_bootstrap` (the bootstrap consumes the cluster structure for resampling regardless of the analytical sandwich choice). It is dropped on explicit `hc2 + analytical` to match the one-way contract (the linalg validator rejects `hc2 + cluster_ids`). **User-visible surface change** (matches the DiD-absorb / MPD-absorb disclosures below): under `vcov_type in {"hc2","hc2_bm"}`, `result.coefficients`, `result.vcov`, `result.residuals`, `result.fitted_values`, and `result.r_squared` reflect the full-dummy fit rather than the within-transformed reduced fit. FE-dummy entries are included alongside the `"ATT"` key, `r_squared` is computed on the un-demeaned outcome, residuals and fitted values are on the original scale, and the `len(result.coefficients) == result.vcov.shape[0]` invariant holds. `result.att`, its SE, and analytical inference are unchanged (FWL-equivalent). HC1 / CR1 / Conley / classical paths remain on the within-transform. **Survey-design scope** (mirrors DiD-absorb): when `survey_design=` is supplied, the existing survey variance path (Taylor-series linearization or replicate-weight variance) takes precedence over the analytical HC2/HC2-BM sandwich; the full-dummy build only changes FE handling. **Rejected combos:** `vcov_type in {"hc2","hc2_bm"}` with a replicate-weight survey design (BRR / Fay / JK1 / JKn / SDR) raises `NotImplementedError` at `twfe.py:~233`. The replicate path re-demeans per replicate, which does not compose with the full-dummy build (that would require a per-replicate full-dummy refit). Workaround: use `vcov_type="hc1"` for replicate-weight CR1. `hc2_bm + weights` remains blocked at the linalg validator (the same gate as Gates 4-5, the weighted CR2 variants). New tests: `tests/test_estimators_vcov_type.py::TestFitBehavior` (9 tests: rejection flip → behavioral; refactor regression vs `DifferenceInDifferences(fixed_effects=[unit, time])` at `atol=1e-12`; auto-cluster default coverage on `hc2_bm`; no auto-cluster on explicit `hc2 + analytical`; auto-cluster preserved on `hc2 + wild_bootstrap`; `hc2` / `hc2_bm` + replicate rejection; finite ATT with an always-treated unit; coefficients-vs-vcov alignment invariant); `tests/test_methodology_twfe.py::TestTWFEHC2RParity` (3 R-parity tests at `atol=1e-10`).
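The following is a minimal NumPy sketch of why the full-dummy build matters. It uses a hypothetical balanced 6×4 panel, and the data and the helpers `ols`, `sandwich`, `cr2` and `within` are illustrative assumptions, not `diff_diff` code. The sketch checks the FWL statement above. The two-way within transform reproduces the full-dummy ATT coefficient and residuals, but every leverage `h_ii` is smaller by the diagonal of the FE projection, which is 1/T + 1/N - 1/(NT) in a balanced panel. As a result, HC2 computed on the within design understates the variance, while HC0 is unaffected. The sketch also shows that CR2 with singleton clusters collapses to HC2, which is what makes `vcovCR(cluster=seq_len(n), type="CR2")` a natural one-way reference.

```python
import numpy as np

# Hypothetical balanced panel: 6 units x 4 periods, units 0-2 treated from period 2.
rng = np.random.default_rng(0)
n_units, n_periods = 6, 4
unit = np.repeat(np.arange(n_units), n_periods)  # row r -> unit r // n_periods
time = np.tile(np.arange(n_periods), n_units)    # row r -> period r % n_periods
treat_post = ((unit < 3) & (time >= 2)).astype(float)
y = 1.5 * treat_post + 0.3 * unit + 0.2 * time + rng.normal(size=unit.size)


def ols(X, y):
    """Return coefficients, residuals, hat-matrix diagonal and (X'X)^{-1}."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # h_ii = x_i' (X'X)^{-1} x_i
    return beta, resid, h, XtX_inv


def sandwich(X, resid, XtX_inv, w):
    """(X'X)^{-1} X' diag(w * e^2) X (X'X)^{-1}; w = 1 gives HC0, w = 1/(1-h) gives HC2."""
    meat = (X * (w * resid**2)[:, None]).T @ X
    return XtX_inv @ meat @ XtX_inv


def cr2(X, resid, XtX_inv, clusters):
    """CR2 sandwich with A_g = (I - H_gg)^{-1/2}, via a pseudo-inverse square root."""
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):
        Xg, eg = X[clusters == g], resid[clusters == g]
        vals, vecs = np.linalg.eigh(np.eye(len(eg)) - Xg @ XtX_inv @ Xg.T)
        # Invert only eigenvalues clearly above zero (an assumption of this sketch).
        inv_sqrt = np.where(vals > 1e-12, 1.0 / np.sqrt(np.clip(vals, 1e-12, None)), 0.0)
        ug = Xg.T @ (vecs * inv_sqrt) @ vecs.T @ eg  # X_g' A_g e_g
        meat += np.outer(ug, ug)
    return XtX_inv @ meat @ XtX_inv


def within(v):
    """Balanced-panel two-way demeaning: v - unit mean - period mean + grand mean."""
    m = v.reshape(n_units, n_periods)
    return (m - m.mean(1, keepdims=True) - m.mean(0, keepdims=True) + m.mean()).ravel()


# Full-dummy design [intercept, treat_post, unit dummies, time dummies],
# dropping the first level of each factor.
D_unit = (unit[:, None] == np.arange(1, n_units)).astype(float)
D_time = (time[:, None] == np.arange(1, n_periods)).astype(float)
X_full = np.column_stack([np.ones(unit.size), treat_post, D_unit, D_time])
X_w = within(treat_post)[:, None]

b_f, e_f, h_f, inv_f = ols(X_full, y)
b_w, e_w, h_w, inv_w = ols(X_w, within(y))

# FWL: same ATT coefficient and residuals from both designs ...
assert np.isclose(b_f[1], b_w[0]) and np.allclose(e_f, e_w)
# ... but each full-design leverage exceeds the within one by the FE projection's
# diagonal, 1/T + 1/N - 1/(NT) = 0.375 for this 6 x 4 panel.
assert np.allclose(h_f - h_w, 1 / n_periods + 1 / n_units - 1 / (n_units * n_periods))

ones = np.ones(unit.size)
hc0_f = sandwich(X_full, e_f, inv_f, ones)[1, 1]
hc0_w = sandwich(X_w, e_w, inv_w, ones)[0, 0]
hc2_f = sandwich(X_full, e_f, inv_f, 1 / (1 - h_f))[1, 1]
hc2_w = sandwich(X_w, e_w, inv_w, 1 / (1 - h_w))[0, 0]
print(f"HC0 var: full={hc0_f:.6f} within={hc0_w:.6f}")  # identical
print(f"HC2 var: full={hc2_f:.6f} within={hc2_w:.6f}")  # within is smaller

# With one observation per cluster, CR2 reduces to HC2.
cr2_singleton = cr2(X_full, e_f, inv_f, np.arange(unit.size))[1, 1]
assert np.isclose(cr2_singleton, hc2_f)
```

The pseudo-inverse square root in `cr2` is an assumption of this sketch. With unit dummies in `X` and unit clusters, each unit's indicator vector lies in the column space of `X`, so 1 is an eigenvalue of `H_gg`. That makes `I - H_gg` singular, and a plain inverse square root does not exist. The sketch exercises only the singleton-cluster case.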
 - **Agent-discoverability contract test (`tests/test_agent_discoverability.py`).** New static-snapshot test pinning the agent-facing surface introduced by PR #464: `__all__` membership of `agent_workflow` / `profile_panel` / `get_llm_guide` / `practitioner_next_steps` / `BusinessReport`; `dir(diff_diff)` head-first ordering against `_AGENT_FACING_ORDER` (catches drift in the `_OrderedName` `__lt__` ordering trick); `_OrderedName` `isinstance(_, str)` + str-method compatibility; `dir()` full-namespace + `inspect.getmembers` parity; top-level `__doc__` first-paragraph mention of `agent_workflow` + named references to the 5-step workflow primitives; `agent_workflow()` script content references each downstream helper by name; canonical estimator class names (CallawaySantAnna, ContinuousDiD, HeterogeneousAdoptionDiD, etc.) remain importable. No live API calls; runs in the default pytest suite. Closes [issue #461](https://github.com/igerber/diff-diff/issues/461) (snapshot variant — live-agent regression test deferred to a separate follow-up that depends on causal-llm-eval packaging its harness). Also closes the `__dir__()` contract-test row from `TODO.md` that PR #464 deferred here.
 - **`diff_diff.agent_workflow(df, unit=..., time=..., treatment=..., outcome=...)` — stateless orchestrator for LLM-agent discoverability** (`diff_diff/agent_workflow.py`). Prints (and returns as dict) a copy-pasteable 5-step workflow with the caller's column names templated in: `profile_panel` → `get_llm_guide("autonomous")` → `(...).fit(df, ...)` → `practitioner_next_steps(result)` → `BusinessReport(result).full_report()`. The function calls nothing internally and does not inspect `df`; it is a guided tour, not a router. Surfaces the canonical workflow primitives (`profile_panel`, `get_llm_guide`, `practitioner_next_steps`, `BusinessReport`) that cold-start agent dry-passes at [igerber/causal-llm-eval](https://github.com/igerber/causal-llm-eval) showed agents practically never reach for on their own. Output structure: `{"profile_call", "guide_call", "fit_candidates", "validation_calls", "reporting_call", "script"}`; `fit_candidates` is a flat list of estimator/diagnostic class names referenced in the workflow patterns (each must remain importable on `diff_diff`, locked by `tests/test_agent_workflow.py::test_fit_candidates_all_importable`). Closes [issue #460](https://github.com/igerber/diff-diff/issues/460).
 - **Top-level `__doc__` rewritten to lead with the agent workflow** (`diff_diff/__init__.py`). `help(diff_diff)` now opens with the `agent_workflow(df, ...)` recommendation as the first non-blank paragraph; `get_llm_guide("full")` and `get_llm_guide("practitioner")` pointers preserved for the existing `tests/test_guides.py::test_module_docstring_mentions_helper` guard.
@@ -19,8 +20,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - **Helper API: `compute_pretrends_power` and `compute_mdv` now accept `violation_weights` and `pretest_form` (PR-B Step 6).** Closes the PR-A R18 helper/class API gap that previously made `violation_type='custom'` unusable from the helper functions. Helpers now forward both new parameters to the underlying `PreTrendsPower` class. Default `pretest_form='nis'` matches the class default. All existing helper call sites in `test_pretrends.py` and `test_pretrends_event_study.py` continue to pass without changes because the form-invariance of most assertions allowed the default flip with only 3 tests needing targeted updates.
 - **NEW `tests/test_methodology_pretrends.py` (PR-B Step 7).** Roth (2022) Section II.A-B paper-equation-numbered Verified Components walk-through. 8 classes, 30+ tests covering K=1 closed-form (Proposition 2 proof), NIS box probability via MC simulation cross-check, Propositions 1-4 simulation parity, linear-units γ-scale verification on regular / irregular / pandas.Period grids, custom-weight persistence regression, JSON-serializability of `to_dict`, CS/SA full-VCV adapter regression, helper API end-to-end, NIS-vs-Wald differentiation, and skip-gated `TestPretrendsParityR` stubs for PR-C R-package goldens.
 - **`benchmarks/R/generate_pretrends_golden.R` (PR-B Step 12).** R generator script for the PR-C deferred goldens. Script committed with a `` placeholder commit reference; PR-C pins the audited `pretrends` revision, runs the script, commits the JSON goldens at `benchmarks/data/r_pretrends_golden.json`, and activates the parity tests.
-- **`MultiPeriodDiD(absorb=..., vcov_type in {"hc2", "hc2_bm"})` now supported** (`diff_diff/estimators.py:1476`). Mirrors the DiD-absorb auto-route shipped earlier in this release: when `absorb=` is paired with `vcov_type in {"hc2","hc2_bm"}`, `MultiPeriodDiD.fit()` promotes the absorb columns to `fixed_effects=` internally so the existing full-dummy-design code path computes the algebraically correct vcov on the event-study design (`treated + period_X dummies + treated:period_X interactions + factor(unit)`). Verified at ~1e-10 vs `lm() + sandwich::vcovHC(type="HC2")` and `lm() + clubSandwich::vcovCR(cluster=1:n, type="CR2")` on a 5-cohort × 5-period event-study fixture (new `tests/test_estimators_vcov_type.py::TestMPDAbsorbedFERParity` against `benchmarks/data/clubsandwich_cr2_golden.json` scenario `mpd_absorbed_fe_did`). HC1/CR1 paths on `absorb=` are unchanged (no leverage term). `TwoWayFixedEffects(vcov_type in {"hc2","hc2_bm"})` rejection remains as a follow-up (different fit-path structure — no `fixed_effects=` equivalent inside TWFE).
**Behavioral note (full `MultiPeriodDiDResults` surface change under auto-route):** under the auto-route, the entire returned `MultiPeriodDiDResults` reflects the full-dummy fit rather than the within-transformed fit — `result.coefficients`, `result.vcov`, `result.residuals`, `result.fitted_values`, `result.r_squared` all include the FE-dummy entries / un-demeaned values. `result.period_effects[t].effect` / `.se` / `.p_value` / `.conf_int` and `result.avg_att` / `.avg_se` are invariant to this routing (FWL guarantee). MPD requires a time-invariant ever-treated indicator that lies in the span of the intercept and the post-auto-route unit FE dummies (the exact alias depends on the omitted FE reference category under `pd.get_dummies(drop_first=True)`, not just on "the sum of treated-cohort unit dummies"), so `solve_ols` drops one column from that collinear set under R-style rank-deficiency handling. Which specific column is dropped is pivot-order and dummy-coding dependent (in the shipped parity fixture it is a never-treated unit dummy, not the `treated` main effect itself). The per-period interaction coefficients (`treated:period_X`) and `avg_att` are identified and invariant to that choice; parity tests target those rather than the `treated` main effect. **Survey-design scope (replicate weights):** when `survey_design=` uses replicate weights, the auto-route short-circuits the absorb-refit branch at `estimators.py:1693` and routes through the standard `compute_replicate_vcov` path on the fixed full-dummy design — correct because the design does not depend on replicate weights so no per-replicate refit is needed. **Redundant time-FE skip:** when the routed (or directly-supplied) `fixed_effects` list contains the `time` column, MPD silently skips emitting `