Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
22 commits
Select commit Hold shift + click to select a range
5008d11
Add Roth (2022) paper review for PreTrendsPower methodology audit
igerber May 17, 2026
482eece
Address R1 review (3 P1 + 1 P3) on roth-2022-review.md
igerber May 17, 2026
4932323
Address R2 review (2 P2 + 2 P3) on roth-2022-review.md
igerber May 17, 2026
a80c7bb
Address R3 review (1 P1) on roth-2022-review.md
igerber May 17, 2026
2e7c334
Address R4 review (2 P1 + 1 P2) on roth-2022-review.md
igerber May 17, 2026
cf6f48a
Address R5 review (1 P1 + 2 P2 + 1 P3) on roth-2022-review.md
igerber May 17, 2026
038897d
Address R6 review with holistic restructure (1 P1 + 1 P2 + 1 P3)
igerber May 17, 2026
d7b04e6
Address R7 review (1 P1 + 2 P2) on roth-2022-review.md
igerber May 17, 2026
1e4a2d9
Address R8 review polish (2 P2 + 1 P3) on roth-2022-review.md
igerber May 17, 2026
8f14d73
Address R9 review (2 P1) on roth-2022-review.md
igerber May 17, 2026
3f90526
Address R10 review (1 P1 + 1 P3) on roth-2022-review.md
igerber May 17, 2026
dc518a2
Address R11 polish (1 P2 + 2 P3) on roth-2022-review.md
igerber May 17, 2026
3fa4e47
Address R12 polish (1 P2 + 3 P3) on roth-2022-review.md
igerber May 17, 2026
210fc41
Address R13 polish (1 P2 + 2 P3) on roth-2022-review.md + sibling upd…
igerber May 17, 2026
173d7f0
Address R14 review (1 P1 + 1 P2 + 1 P3)
igerber May 17, 2026
9185204
Address R15 polish (2 P3) on Roth audit suite
igerber May 17, 2026
a6f11d8
Address R16 polish (2 P3-informational, 2 no-action) on PreTrendsPower
igerber May 17, 2026
3f0f2bb
Address R17 blocker via deferral: TODO.md tracking + docstring revert
igerber May 17, 2026
a1d33c8
Address R18 blocker: silent-failure guard on power_at + REGISTRY line…
igerber May 17, 2026
af53b41
Address R19 polish (1 P2 + 2 P3) on PreTrendsPower
igerber May 17, 2026
3350d0b
Address R20 polish (1 P3-informational maintainability) on power_at f…
igerber May 17, 2026
b3f8181
Address PR #463 CI R1 (2 P3) — REPORTING parity + PDF reproducibility
igerber May 17, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions METHODOLOGY_REVIEW.md
Original file line number Diff line number Diff line change
Expand Up @@ -1053,11 +1053,11 @@ and covariate-adjusted specifications.)
**Documentation in place:**
- REGISTRY.md section: `## PreTrendsPower` (MDV at target power, four violation types — linear/constant/last_period/custom, power curve plotting, HonestDiD integration)
- Implementation: `tests/test_pretrends.py` (point-estimator, MDV, power curve, sensitivity) plus event-study coverage in `tests/test_pretrends_event_study.py`
- Paper review on file: `docs/methodology/papers/roth-2022-review.md` (added 2026-05-17; non-authoritative source audit — registry entry remains authoritative until the follow-up audit PR)

**Outstanding for promotion:**
- Paper review under `docs/methodology/papers/roth-2022-review.md`
- Dedicated `tests/test_methodology_pretrends.py` with paper-equation-numbered Verified Components walk-through
- R parity fixture against the `pretrends` R package (the four power calculations: linear, constant, last-period, custom)
- R parity fixture against the `pretrends` R package at a **pinned revision** (TODO.md tracks the revision-pin follow-up; until that lands, the R-package surface claims in `docs/methodology/papers/roth-2022-review.md` are provisional). Covers the four power calculations: linear, constant, last-period, custom. Note that `compute_pretrends_power` does not accept `violation_weights` today, so `"custom"` parity has to run through `PreTrendsPower(..., violation_weights=...)` directly until the helper is extended (TODO.md tracks the helper-extension follow-up); helper-only parity is limited to `linear` / `constant` / `last_period`.
- Verify the REGISTRY Implementation Checklist (all four items currently unchecked)

---
Expand Down
5 changes: 5 additions & 0 deletions TODO.md
Original file line number Diff line number Diff line change
Expand Up @@ -94,6 +94,11 @@ Deferred items from PR reviews that were not addressed before merge.
| WooldridgeDiD: aggregation weights use cell-level n_{g,t} counts. Paper (W2025 Eqs. 7.2-7.4) defines cohort-share weights. Add optional `weights="cohort_share"` parameter to `aggregate()`. | `wooldridge_results.py` | #216 | Medium |
| WooldridgeDiD: optional *efficiency hint* (NOT a canonical-link violation per W2023 Prop 3.1) when method/outcome pairing is sub-optimal — e.g., `method="ols"` on binary data is consistent under QMLE, but `method="logit"` is typically more efficient. The original framing in this row as a "canonical link requirement" tied to Prop 3.1 was incorrect: Wooldridge (2023) Table 1 lists Gaussian/OLS for "any response" and logistic-Bernoulli for "binary OR fractional". A useful hint exists (efficiency), but should not be framed as a methodology violation. See PR #453 R1 review for the corrected reading. | `wooldridge.py` | #216 | Low |
| WooldridgeDiD: Stata `jwdid` golden value tests — add R/Stata reference script and `TestReferenceValues` class. | `tests/test_wooldridge.py` | #216 | Medium |
| PreTrendsPower: `compute_pretrends_power` adapter uses `diag(ses^2)` instead of the full pre-period covariance block Σ_22 for `CallawaySantAnnaResults` (deliberate — non-bootstrap CS persists `event_study_vcov`; bootstrap CS fits clear it at `staggered.py:2032-2036`) and `SunAbrahamResults` (forced — SA does not expose an event-study/cohort VCV at all). Roth (2022)'s NIS box probability and the library's Wald object both depend on Σ_22 off-diagonals; diag fallback is not provably conservative. For non-bootstrap CS fits, route through `event_study_vcov`; for bootstrap CS fits the diag fallback is the only path. For SA, extend `SunAbrahamResults` to persist a cohort/event-study VCV (then route the adapter likewise). Or formally retain the diag fallback with explicit miscalibration framing. See REGISTRY.md `## PreTrendsPower` Note (deviation from paper) + `docs/methodology/papers/roth-2022-review.md`. | `diff_diff/pretrends.py:609-687`, `diff_diff/sun_abraham.py:30-88`, `docs/methodology/REGISTRY.md`, `docs/methodology/papers/roth-2022-review.md` | PR-A (Roth paper review, 2026-05-17) | Medium |
| PreTrendsPower: pin the R `pretrends` package commit/release before building the R-parity fixture. The paper review's R-package surface claims (`pretrends()`, `slope_for_power()`, NIS-only API, no joint-Wald target) are provisional pending a pinned revision; the audited revision should be recorded either in the review file's Gaps section or in this TODO row before any parity assertions are committed. | `docs/methodology/papers/roth-2022-review.md`, `METHODOLOGY_REVIEW.md` (PreTrendsPower row) | PR-A (Roth paper review, 2026-05-17) | Low |
| PreTrendsPower: helper `compute_pretrends_power(results, M, alpha, target_power, violation_type, pre_periods)` does NOT accept `violation_weights`, so `violation_type="custom"` is unusable from the helper (class-only today via `PreTrendsPower(..., violation_weights=...)`). Either add `violation_weights` to the helper signature and forward to the class, or document the helper as supporting only `linear` / `constant` / `last_period`. | `diff_diff/pretrends.py:1048-1095, 442-466` | PR-A (Roth paper review, 2026-05-17) | Low |
| PreTrendsPower: `PreTrendsPowerResults.power_at()` does not yet support `violation_type="custom"`. **Silent-failure path was mitigated** in PR-A (2026-05-17, R18 of the codex review): `power_at()` now raises `NotImplementedError` for custom fits rather than returning equal-weights output, locked in by `test_power_at_raises_on_custom_violation_type`. Remaining follow-up: persist the normalized fitted `violation_weights` on `PreTrendsPowerResults` (currently absent at `pretrends.py:77-90`) and re-enable `power_at()` for custom fits, with a parity test comparing `results.power_at(M)` to a fresh `PreTrendsPower(...).fit(..., M=M).power` on a custom-weights fixture. | `diff_diff/pretrends.py:77-90, ~196-235, ~878-892` | PR-A (Roth paper review, 2026-05-17) | Medium |
| PreTrendsPower: `linear` violation pattern does NOT implement Roth's δ_t = γ·t. `_get_violation_weights(violation_type="linear")` constructs a shifted, normalized `[n-1, ..., 1, 0]` direction from `n_pre` only (`pretrends.py:510-515`), and `fit()` never threads actual relative-time labels into that construction (`pretrends.py:862-866`). For irregular pre-period grids (e.g., anticipation-shifted `t ∈ {-5, -3, -1}`) this means the slope reported as MDV is not in Roth's γ units. Fix: build linear weights from the sorted actual relative-time values used in the fit, define the exposed parameter in γ units, persist any normalization separately, and add a regression test using anticipation-shifted / irregular pre-periods. If the shifted convention is intentional, add a `**Note (deviation from paper):**` to REGISTRY.md and convert reported MDV back to Roth's slope scale before exposing it. | `diff_diff/pretrends.py:488-531, 862-866`, `docs/methodology/REGISTRY.md:2786-2789` | PR-A (Roth paper review, 2026-05-17; surfaced by R17 of the iterative codex review on the paper review file) | **High** |
| Thread `vcov_type` (classical / hc1 / hc2 / hc2_bm) through the 8 standalone estimators that expose `cluster=`: `CallawaySantAnna`, `SunAbraham`, `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `StackedDiD`, `WooldridgeDiD`, `EfficientDiD`. Phase 1a added `vcov_type` to the `DifferenceInDifferences` inheritance chain only. | multiple | Phase 1a | Medium |
| Weighted one-way Bell-McCaffrey (`vcov_type="hc2_bm"` + `weights`, no cluster) currently raises `NotImplementedError`. `_compute_bm_dof_from_contrasts` builds its hat matrix from the unscaled design via `X (X'WX)^{-1} X' W`, but `solve_ols` solves the WLS problem by transforming to `X* = sqrt(w) X`, so the correct symmetric idempotent residual-maker is `M* = I - sqrt(W) X (X'WX)^{-1} X' sqrt(W)`. Rederive the Satterthwaite `(tr G)^2 / tr(G^2)` ratio on the transformed design and add weighted parity tests before lifting the guard. | `linalg.py::_compute_bm_dof_from_contrasts`, `linalg.py::_validate_vcov_args` | Phase 1a | Medium |
| HC2 / HC2 + Bell-McCaffrey on absorbed-FE fits — REMAINING sub-gates: `TwoWayFixedEffects` (`twfe.py:154` rejects unconditionally); `MultiPeriodDiD(absorb=..., vcov_type in {"hc2","hc2_bm"})` (`estimators.py:1458` rejects). The DiD sub-gate (`DifferenceInDifferences(absorb=..., vcov_type in {"hc2","hc2_bm"})`) was lifted via auto-route to `fixed_effects=` internally; clubSandwich-parity at 1e-10 verified. The same auto-route pattern can apply to MPD-absorb; TWFE is its own class and may need different surgery (TWFE always within-transforms with no equivalent `fixed_effects=` path). Within-transformation preserves coefficients and residuals under FWL but not the hat matrix; HC1/CR1 are unaffected (no leverage term). | `twfe.py::fit`, `estimators.py::MultiPeriodDiD.fit` | follow-up | Medium |
Expand Down
58 changes: 54 additions & 4 deletions diff_diff/pretrends.py
Original file line number Diff line number Diff line change
Expand Up @@ -209,9 +209,31 @@ def power_at(self, M: float) -> float:
-------
float
Power to detect violation of magnitude M.

Raises
------
NotImplementedError
If the fit was made with ``violation_type="custom"``. The
``PreTrendsPowerResults`` dataclass does not currently persist
the fitted ``violation_weights``, so this method cannot
reconstruct the custom weights. Refit
``PreTrendsPower(violation_type="custom", violation_weights=...)``
with the new ``M`` instead. Tracked in TODO.md as a planned
follow-up to persist the fitted weights.
"""
from scipy import stats

if self.violation_type == "custom":
raise NotImplementedError(
"PreTrendsPowerResults.power_at() does not support "
"violation_type='custom': fitted violation_weights are "
"not persisted on the result object, so the custom weights "
"cannot be reconstructed. Refit "
"PreTrendsPower(violation_type='custom', "
"violation_weights=...) with the new M instead. "
"See TODO.md (PreTrendsPower power_at custom path)."
)

n_pre = self.n_pre_periods

# Reconstruct violation weights based on violation type
Expand All @@ -227,8 +249,14 @@ def power_at(self, M: float) -> float:
weights = np.zeros(n_pre)
weights[-1] = 1.0
else:
# For custom, we can't reconstruct - use equal weights as fallback
weights = np.ones(n_pre)
# Fail loud on unknown violation_type values. Mirrors the raise
# at the end of _get_violation_weights(); prevents silent
# equal-weights output if a future violation_type is added to
# fit() but not threaded through power_at().
raise ValueError(
f"Unknown violation_type: {self.violation_type!r}. "
f"Expected one of: 'linear', 'constant', 'last_period', 'custom'."
)

# Normalize weights to unit L2 norm
norm = np.linalg.norm(weights)
Expand Down Expand Up @@ -1067,7 +1095,18 @@ def compute_pretrends_power(
target_power : float, default=0.80
Target power for MDV calculation.
violation_type : str, default='linear'
Type of violation pattern.
Type of violation pattern. This convenience helper supports
``linear`` / ``constant`` / ``last_period`` only and does NOT
accept ``violation_weights``, so passing
``violation_type='custom'`` will raise ``ValueError`` from the
underlying ``PreTrendsPower`` constructor (which requires
``violation_weights`` when ``violation_type='custom'``). To use a
custom violation pattern, instantiate ``PreTrendsPower(...,
violation_weights=...)`` directly. Note that
``PreTrendsPowerResults.power_at()`` on such a fit raises
``NotImplementedError`` because fitted weights are not yet
persisted on the result object; refit with the new ``M`` instead.
Both gaps are tracked in TODO.md until the follow-up audit lands.
pre_periods : list of int, optional
Explicit list of pre-treatment periods. If None, attempts to infer
from results. Use when you've estimated all periods as post_periods.
Expand Down Expand Up @@ -1114,7 +1153,18 @@ def compute_mdv(
target_power : float, default=0.80
Target power for MDV calculation.
violation_type : str, default='linear'
Type of violation pattern.
Type of violation pattern. This convenience helper supports
``linear`` / ``constant`` / ``last_period`` only and does NOT
accept ``violation_weights``, so passing
``violation_type='custom'`` will raise ``ValueError`` from the
underlying ``PreTrendsPower`` constructor (which requires
``violation_weights`` when ``violation_type='custom'``). To use a
custom violation pattern, instantiate ``PreTrendsPower(...,
violation_weights=...)`` directly. Note that
``PreTrendsPowerResults.power_at()`` on such a fit raises
``NotImplementedError`` because fitted weights are not yet
persisted on the result object; refit with the new ``M`` instead.
Both gaps are tracked in TODO.md until the follow-up audit lands.
pre_periods : list of int, optional
Explicit list of pre-treatment periods. If None, attempts to infer
from results. Use when you've estimated all periods as post_periods.
Expand Down
13 changes: 12 additions & 1 deletion docs/methodology/REGISTRY.md
Original file line number Diff line number Diff line change
Expand Up @@ -2761,7 +2761,7 @@ CRITICAL: δ_pre = β_pre pins pre-treatment violations to observed coefficients

## PreTrendsPower

**Primary source:** [Roth, J. (2022). Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends. *American Economic Review: Insights*, 4(3), 305-322.](https://doi.org/10.1257/aeri.20210236)
**Primary source:** [Roth, J. (2022). Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends. *American Economic Review: Insights*, 4(3), 305-322.](https://doi.org/10.1257/aeri.20210236). Paper review on file: `docs/methodology/papers/roth-2022-review.md` (non-authoritative source audit; this REGISTRY entry remains the authoritative methodology contract).

**Key implementation requirements:**

Expand Down Expand Up @@ -2793,6 +2793,10 @@ Violation types:
- **Last period**: δ_{-1} = c, others zero
- **Custom**: user-specified pattern

- **Note (deviation from paper — `linear` violation pattern):** the shipped `PreTrendsPower._get_violation_weights("linear")` constructs `[n_pre-1, ..., 1, 0]` from `n_pre` alone and `PreTrendsPower.fit()` never threads the actual relative-time labels into that construction (`pretrends.py:488-531`, `pretrends.py:862-866`). For irregular or anticipation-shifted pre-period grids (e.g., `t ∈ {-5, -3, -1}`), this means the slope reported as MDV is NOT in Roth's `γ` units — the shifted/normalized direction effectively assumes contiguous relative times `{-(n_pre-1), ..., -1}`. The follow-up audit (tracked in TODO.md) will either rebuild `linear` weights from the sorted actual relative-time values and expose the parameter in Roth's `γ` units, or formally retain the current shifted/normalized contract with this Note as the deviation record.

- **Note (silent-failure guard — `power_at()` with `violation_type="custom"`):** `PreTrendsPowerResults` does not currently persist the fitted `violation_weights`, so `power_at(M)` cannot reconstruct the custom direction. As of this commit, `PreTrendsPowerResults.power_at()` raises `NotImplementedError` for `violation_type="custom"` rather than silently returning equal-weights output. To compute power at a new `M` for a custom fit, refit `PreTrendsPower(violation_type="custom", violation_weights=...)` with the new `M`. Tracked in TODO.md as a planned follow-up to persist the fitted weights and lift the guard.

*Standard errors:*
- Power calculations are exact (no sampling variability)
- Uncertainty comes from estimated Σ
Expand All @@ -2802,6 +2806,13 @@ Violation types:
- Single pre-period: power calculation trivial
- Very high power: MDV approaches zero

- **Note (deviation from paper — diagonal pre-period VCV fallback):** Roth (2022)'s power and bias objects (both the paper-analyzed NIS box probability and the library's Wald / noncentral-χ² form) operate on the full pre-period covariance block Σ_22. The shipped `compute_pretrends_power` adapter currently uses different sources for the pre-period covariance by result type:
- `MultiPeriodDiDResults` (`pretrends.py:592-601`): extracts the full pre-period sub-block from `results.vcov` when `interaction_indices` is populated; falls back to `diag(ses^2)` otherwise.
- `CallawaySantAnnaResults` (`pretrends.py:609-652`): hard-codes `vcov = diag(ses^2)`. Non-bootstrap CS fits persist a full `event_study_vcov` matrix (`staggered_results.py:126-128`), so the diag fallback is a deliberate choice in that path. Bootstrap CS fits clear `event_study_vcov` before storing results (`staggered.py:2032-2036`) to prevent mixing analytical VCV with bootstrap SEs, so the full-Σ22 route is not available for bootstrap fits at all.
- `SunAbrahamResults` (`pretrends.py:660-687`): hard-codes `vcov = diag(ses^2)`; the diag fallback is *forced* because `SunAbrahamResults` does not currently expose an event-study or cohort covariance matrix.

Dropping the off-diagonals is NOT a paper-supported numerical choice and is NOT guaranteed to be conservative for MDV/power (the direction of the discrepancy depends on the sign and magnitude of the dropped correlations). The PR-B follow-up audit (tracked in `TODO.md`) will either extend full-sub-VCV consumption to all three paths (with SA also requiring upstream surface work on `SunAbrahamResults`) or formally retain the diag fallback with explicit miscalibration framing. See `docs/methodology/papers/roth-2022-review.md` for the full derivation.

**Reference implementation(s):**
- R: `pretrends` package (Roth's official package)

Expand Down
21 changes: 14 additions & 7 deletions docs/methodology/REPORTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -328,13 +328,20 @@ a library setting.
`DiagnosticReport.pretrends_power` block records
`covariance_source: "diag_fallback_available_full_vcov_unused"` in
that case, and `BusinessReport` downgrades a `well_powered` tier to
`moderately_powered` before rendering prose. This is a known
conservative deviation from the documented "use the full pre-period
covariance" position — it prevents the diagonal approximation from
producing an overly optimistic "well-powered" claim when correlated
pre-period errors could tighten the MDV. The right long-term fix is
to teach `compute_pretrends_power()` to consume `event_study_vcov`
and `event_study_vcov_index`; until that lands this downgrade stays.
`moderately_powered` before rendering prose. This is a documented
deviation from the paper-derived "use the full pre-period covariance"
position. **Not provably conservative**: under Roth (2022)'s NIS
framework and the library's Wald form, the MDV/power objects depend
on the off-diagonals of Σ_22, and the direction of the discrepancy
between full-Σ_22 and diag(ses^2) depends on the sign and magnitude
of the dropped correlations — see the `**Note (deviation from paper
— diagonal pre-period VCV fallback):**` block under `## PreTrendsPower`
in `docs/methodology/REGISTRY.md`. The `well_powered → moderately_powered`
downgrade in BusinessReport reduces the chance of an overly optimistic
claim in practice, but it is not a proof of conservatism. The right
long-term fix is to teach `compute_pretrends_power()` to consume
`event_study_vcov` and `event_study_vcov_index`; until that lands the
downgrade stays.

- **Note:** Unit-translation policy. BusinessReport does not
arithmetically translate log-points to percents or level effects to
Expand Down
Loading
Loading