From 5008d11a6c08e6cf464baec3d35a749a993d2037 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sat, 16 May 2026 20:49:05 -0400
Subject: [PATCH 01/22] Add Roth (2022) paper review for PreTrendsPower
 methodology audit

PR-A of the 3-PR PreTrendsPower methodology review sequence (paper review
-> implementation audit -> R parity goldens). Produces the structured
methodology contract that will inform the PR-B audit of diff_diff/pretrends.py
against the paper's exact equations.

The review covers:
- Propositions 1-4 (bias decomposition, sign-of-bias under monotone trend,
  variance decomposition, convexity gives variance reduction)
- Power calculation algorithm for the slope gamma_p at target power p
- Bias and coverage calculation framework (analytical via tmvtnorm-equivalent
  truncated MVN moments; simulation fallback)
- Implementation notes and 11-item gaps list flagging items the PR-B audit
  will need to resolve (notably: NIS vs Wald acceptance region, violation-type
  extensions beyond linear, R-package version pin for parity)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 docs/methodology/papers/roth-2022-review.md | 251 ++++++++++++++++++++
 1 file changed, 251 insertions(+)
 create mode 100644 docs/methodology/papers/roth-2022-review.md

diff --git a/docs/methodology/papers/roth-2022-review.md b/docs/methodology/papers/roth-2022-review.md
new file mode 100644
index 00000000..844faec6
--- /dev/null
+++ b/docs/methodology/papers/roth-2022-review.md
@@ -0,0 +1,251 @@
+# Paper Review: Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends
+
+**Authors:** Jonathan Roth
+**Citation:** Roth, J. (2022). Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends. *American Economic Review: Insights*, 4(3), 305-322.
+**PDF reviewed:** papers/roth-2022.pdf (18 pages, content pages 1-15)
+**Review date:** 2026-05-16
+**DOI:** https://doi.org/10.1257/aeri.20210236
+
+---
+
+## Methodology Registry Entry
+
+*Formatted to match docs/methodology/REGISTRY.md structure. Heading levels and labels align with existing entries — copy the `## PreTrendsPower` section into the registry (replacing the existing stub at line 2758).*
+
+## PreTrendsPower
+
+**Primary source:** [Roth, J. (2022). Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends. *American Economic Review: Insights*, 4(3), 305-322.](https://doi.org/10.1257/aeri.20210236)
+
+**Key implementation requirements:**
+
+*Assumption checks / warnings:*
+- Input: event-study coefficient vector beta_hat = (beta_hat_pre, beta_hat_post)' that is asymptotically normal under the underlying estimator (Equation 1; Remark 1 lists TWFE, GMM, Freyaldenhoven-Hansen-Shapiro, regression-adjustment/IPW/DR DiD per Sant'Anna-Zhao, Callaway-Sant'Anna, Sun-Abraham)
+- Input: estimated variance-covariance matrix Sigma_hat in R^{(K+M) x (K+M)} where K = # pre-period coefficients, M = # post-period coefficients
+- Pre-trend zero-anticipation assumption: tau_pre = 0 (Equation 2) — same identifying convention as Rambachan-Roth (2023)
+- Warn if pretest has low power: e.g., if the slope at 80% power (gamma_{0.8}) produces a |bias| > |estimated treatment effect|, the pretest is uninformative for the magnitudes that matter
+- Warn that pretest-conditioning distortions are NOT removed by larger samples — they persist as long as the pretest can fail with non-vanishing probability (footnote 12)
+
+*Causal decomposition (Equation 2):*
+
+    beta = (delta_pre, delta_post)' + (0, tau_post)'
+              \--------------/         \---------/
+                  delta = bias               tau = causal effect
+                  from trends
+
+where tau_pre = 0 by the no-anticipation assumption and delta is the bias from a difference in trends. The pretest acts on beta_hat_pre, which equals delta_pre under no anticipation.
+
+*Acceptance region of the standard "no individually significant" (NIS) pretest:*
+
+    B_NIS(Sigma) = { b in R^K : |b_t| <= 1.96 * sigma_t, for all t in {-K, ..., -1} }
+
+This corresponds to checking individual 95% CIs of each pre-period coefficient (the dominant convention in applied work: 11 of 12 surveyed papers, per Section I.B).
+
+Alternative acceptance regions supported by the framework:
+- **Joint Wald (chi-squared)**: B_W(Sigma) = { b : b' Sigma_22^{-1} b <= chi^2_{1-alpha, K} } (1 of 12 surveyed papers, plus Roth's discussion of joint tests)
+- **Slope-of-best-fit-line t-test**: discussed in Section I and tabulated in Table 1 column "|t| for slope"
+- **Custom user-supplied B(Sigma)**: any (measurable) acceptance set; Propositions 1, 3, 4 apply for any B; Proposition 2 requires the specific NIS form
+
+*Conditional bias after pretesting (Proposition 1):*
+
+    E[beta_hat_post | beta_hat_pre in B(Sigma)]
+        = tau_post + delta_post + Sigma_{12} Sigma_{22}^{-1} ( E[beta_hat_pre | beta_hat_pre in B(Sigma)] - beta_pre )
+
+The third (pretest bias) term depends on:
+- Sigma_{12} Sigma_{22}^{-1}: the regression coefficient of beta_hat_post on beta_hat_pre (akin to "leakage" from pre to post via the covariance)
+- The distortion E[beta_hat_pre | beta_hat_pre in B(Sigma)] - beta_pre: how much pretest conditioning skews the pre-period means
+
+*Sign-of-bias result under monotone trend (Proposition 2 + Assumption 1):*
+
+    Assumption 1: Sigma has a common term sigma^2 on the diagonal and a common term rho > 0 off the diagonal, with sigma^2 > rho.
+
+    If delta_pre < 0 elementwise and delta_post > 0 (upward pretrend), then:
+        E[beta_hat_post | beta_hat_pre in B_NIS(Sigma)] > beta_post > tau_post
+
+(Bias is worse after pretesting under monotone violations; symmetric statement for downward pretrend.)
+
+*Variance after pretesting (Proposition 3):*
+
+    Var[beta_hat_post | beta_hat_pre in B(Sigma)]
+        = Var[beta_hat_post]
+          + (Sigma_{12} Sigma_{22}^{-1}) (Var[beta_hat_pre | beta_hat_pre in B(Sigma)] - Var[beta_hat_pre]) (Sigma_{12} Sigma_{22}^{-1})'
+
+*Convexity gives variance reduction (Proposition 4):*
+
+    If B(Sigma) is a convex set, then Var[beta_hat_post | beta_hat_pre in B(Sigma)] <= Var[beta_hat_post].
+
+Implication: under parallel trends (delta = 0), conventional 95% CIs OVER-cover conditional on passing the pretest (CIs are based on the unconditional variance, which is too large). When parallel trends is violated, conventional 95% CIs UNDER-cover, because the bias dominates the variance reduction.
+
+*Target parameter (Section I.C):*
+
+    tau_* = l' tau_post, for some user-specified l in R^M
+
+Defaults Roth uses:
+- **Average post-treatment effect**: tau_bar = (1/M)(tau_1 + ... + tau_M), i.e., l = (1/M, ..., 1/M)' (main text emphasis)
+- **First-period-after-treatment effect**: tau_1, i.e., l = (1, 0, ..., 0)' (online Appendix)
+- **Custom**: any user-specified contrast l
+
+*Plug-in estimator and CI (Section I.C):*
+
+    tau_hat = l' beta_hat_post
+    CI_{tau_*} = tau_hat +/- 1.96 * sigma_{tau_hat}, where sigma^2_{tau_hat} = l' Sigma l
+
+*Power calculation against a linear violation (Section I.C "Power Calculations"):*
+
+For a linear violation with slope gamma (so delta_t = gamma * t with relative time t),
+the pretest "passes" probability is
+
+    P( beta_hat_pre in B_NIS(Sigma) ) = P( |beta_hat_pre,t| <= 1.96 * sigma_t, for all t )
+
+where beta_hat_pre ~ N(delta_pre, Sigma_22) with delta_pre,t = gamma * t. The library should
+solve for gamma at target power 1 - p in {0.5, 0.8}:
+
+    gamma_{1 - p} = inf{ gamma : P( beta_hat_pre NOT in B_NIS(Sigma) | delta = gamma * t ) >= 1 - p }
+
+These are Roth's gamma_{0.5} and gamma_{0.8} ("the slopes against which pretests have 50%
+or 80% power"). Roth uses 80% as a benchmark following Cohen (1988); 50% is supplementary.
+
+*Bias and size calculations against a given gamma (Section I.C):*
+
+- **Unconditional bias**: E[tau_hat - tau_*] = l' delta_post (with delta_t = gamma * t for relevant t)
+- **Conditional bias**: E[tau_hat - tau_* | beta_hat_pre in B_NIS(Sigma)] (computed via Proposition 1)
+- **Unconditional null rejection**: P(tau_* in CI_{tau_*}^c) under linear trend
+- **Conditional null rejection**: P(tau_* in CI_{tau_*}^c | beta_hat_pre in B_NIS(Sigma))
+
+*Computational shortcut (footnote 8):*
+
+Under joint normality, these probabilities and conditional moments can be calculated
+ANALYTICALLY using results from Cartinhour (1990) and Manjunath & Wilhelm (2012) — Roth
+implements via the R package `tmvtnorm`. Roth verifies simulations yield similar results.
+The library should support both an analytical truncated-multivariate-normal path AND a
+simulation fallback.
+
+*Standard errors (Section II.C; footnote 7 equivariance):*
+- Power calculations are EXACT (no sampling variability — gamma is computed against a hypothesized population trend, not estimated)
+- Uncertainty comes entirely from the user-supplied Sigma
+- Roth's bias and coverage results have NO dependence on the value of tau_post (footnote 7: the distribution of beta_hat_post conditional on beta_hat_pre passing the pretest is equivariant w.r.t. tau_post)
+
+*Edge cases (paper-stated):*
+- **Linear vs nonlinear violations**: paper formally analyzes linear trends; Caveats (Section I.D) note results extend to monotone nonlinear violations under homoskedasticity (Proposition 2); arbitrarily nonlinear violations addressed heuristically — bias is worse for exponentially-growing trends, better for log/shallow trends as pre-periods grow
+- **Adding more pretreatment periods**: helps power for linear/log trends, does NOT help (and can hurt) for trends concentrated near treatment (e.g., COVID-19-like shocks)
+- **K = 1 (single pre-period)**: explicit closed-form intuition via univariate truncated normal in proof of Proposition 2: E[beta_hat_pre | beta_hat_pre in B_NIS] - beta_pre proportional to phi(-1.96 - beta_pre/sigma) - phi(1.96 - beta_pre/sigma)
+- **Symmetric two-sided pretests under parallel trends**: beta_hat_post remains UNBIASED for tau_post (E[beta_hat_pre | beta_hat_pre in B] = 0 if B is symmetric and beta_pre = 0)
+- **Sigma violating Assumption 1 (unequal variances or non-constant covariances)**: Proposition 2 requires Assumption 1; under arbitrary Sigma, the sign of the pretest-bias term is ambiguous, and only the general decomposition in Proposition 1 applies
+- **Publication-bias trade-off (Equation 4, Section II.D)**: pretest-as-screen can REDUCE or INCREASE published bias depending on Bayes-factor of design type vs the bias-given-publication ratio; uninformative pretests are unambiguously harmful
+
+*Algorithm (no numbered algorithm in paper; implementation distilled from Section I.C):*
+
+1. Take user-supplied (beta_hat, Sigma, K, M) and a target estimand l in R^M (default: l = uniform 1/M)
+2. Compute B_NIS(Sigma) acceptance region using diagonal sigma_t = sqrt(Sigma_{tt}) for t in pre periods
+3. **Power**: solve gamma_{1-p} = root of P(reject pretest | gamma) = 1 - p
+   - For each candidate gamma, compute P(beta_hat_pre in B_NIS) under beta_hat_pre ~ N(gamma * t_pre, Sigma_22) using `tmvtnorm`-style multivariate normal CDF; or via simulation
+4. **Bias**: for gamma in {0, gamma_{0.5}, gamma_{0.8}, user-custom}:
+   - Compute unconditional bias = l' delta_post where delta_post,m = gamma * m
+   - Compute conditional bias via Proposition 1: requires E[beta_hat_pre | beta_hat_pre in B_NIS] from truncated MVN
+5. **Coverage**: for the same gamma values, compute unconditional and conditional null rejection probabilities P(tau_* not in CI):
+   - Unconditional: P(|tau_hat - tau_*|/sigma_{tau_hat} > 1.96) under beta_hat ~ N(beta, Sigma)
+   - Conditional: P(|tau_hat - tau_*|/sigma_{tau_hat} > 1.96 | beta_hat_pre in B_NIS) — joint truncated MVN
+6. Return a structured summary (Roth's Table 2/Table 3 layout)
+
+**Reference implementation(s):**
+- R: [`pretrends`](https://github.com/jonathandroth/pretrends) (Jonathan Roth's own package) and the accompanying Shiny app
+- R dependency: [`tmvtnorm`](https://cran.r-project.org/package=tmvtnorm) (Manjunath & Wilhelm 2012) for truncated multivariate normal moments and CDF
+
+**Requirements checklist:**
+- [ ] Acceptance regions: NIS (individual t), joint Wald, slope-of-best-fit, custom B
+- [ ] Power calculation against linear violation with slope gamma — solve for gamma_{0.5} and gamma_{0.8}
+- [ ] Analytical truncated multivariate normal path (tmvtnorm-equivalent) + simulation fallback
+- [ ] Unconditional and conditional bias for arbitrary linear contrast l in R^M
+- [ ] Unconditional and conditional null rejection / coverage for the same linear contrast
+- [ ] Custom slope / nonlinear trend hypotheses (linear, constant level shift, last-period jump, custom delta vector)
+- [ ] Plot of bias against pretest power for visual reporting (Roth's Figure 1 style)
+- [ ] Composes with HonestDiD result objects (shared beta_hat, Sigma_hat input contract)
+
+---
+
+## Implementation Notes
+
+### Data Structure Requirements
+- **Input**: beta_hat in R^{K+M} (concatenated pre + post event-study coefficients), Sigma_hat in R^{(K+M) x (K+M)} (variance-covariance matrix), integer K (# pre-period coefficients), integer M (# post-period coefficients)
+- **Optional input**: linear contrast l in R^M (defaults to uniform 1/M for average post-treatment effect, or e_1 for first-period-only)
+- **Optional input**: significance level alpha (default 0.05 → critical value 1.96)
+- **Optional input**: target power levels (default {0.5, 0.8} per Roth)
+- The pre-period coefficients are typically indexed by relative time t in {-K, -K+1, ..., -1}, with t = 0 omitted as the reference period
+- Compatible with the result classes of: MultiPeriodDiD (event study), CallawaySantAnna (staggered), SunAbraham (interaction-weighted), Freyaldenhoven-Hansen-Shapiro (covariate-based)
+
+### Computational Considerations
+- **Truncated MVN moments and probabilities**: scipy.stats provides truncated-normal moments only for the univariate case (`truncnorm`); library options for K > 1 are (a) port `tmvtnorm` (Manjunath-Wilhelm closed-form for orthant moments + Cartinhour 1990 for the rectangular box), (b) Monte Carlo simulation with rejection sampling. Recommend implementing both paths and validating that they agree to an absolute tolerance of 1e-3 for small K.
+- **Cost**: dominated by the multivariate normal box probability evaluations. For K <= 5, analytical methods are fast. For K > 10, simulation is preferable.
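+- **Root-finding for gamma_p**: the rejection probability is treated as a monotone function of |gamma|, so use bisection over [0, gamma_max]. Power approaches 1 only in the limit, so gamma_max should be a value at which power already exceeds the target; a univariate bound works, because the joint rejection probability is at least the rejection probability of any single pre-period coefficient.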
+- **Memoization**: power and bias share intermediate quantities (truncated MVN moments); cache by gamma.
+
+### Tuning Parameters
+
+| Parameter | Type | Default | Selection Method |
+|-----------|------|---------|-----------------|
+| `alpha` | float in (0, 1) | 0.05 | Standard significance level for pretest and reporting CI |
+| `target_power` | list[float] in (0, 1) | [0.5, 0.8] | Roth's reported benchmarks (Cohen 1988 conventional 0.8; 0.5 for "even-odds detection") |
+| `l` (contrast) | array in R^M | uniform 1/M | User-specified linear functional of tau_post |
+| `pretest_form` | enum | "individual" (NIS) | "individual", "joint_wald", "slope", "custom" |
+| `acceptance_region` | callable or set | B_NIS | Custom B(Sigma) for "custom" pretest_form |
+| `method` | enum | "analytical" | "analytical" (tmvtnorm-equivalent) or "simulation" |
+| `n_sim` | int | 10000 | Monte Carlo iterations when method="simulation" |
+
+### Relation to Existing diff-diff Estimators
+- **Pre-existing `diff_diff/pretrends.py`** (1133 lines) — implements a Roth-2022 framework; this paper review's main use is to audit the existing surface against the paper's exact equations
+- **Composes with**: `MultiPeriodDiD`, `CallawaySantAnna`, `SunAbraham`, `TwoWayFixedEffects` — any estimator producing an event-study coefficient vector and a consistent variance estimator
+- **Complement to `HonestDiD` (Rambachan-Roth 2023)**: Roth 2022 asks "what bias survives a pretest under linear violations?"; Rambachan-Roth 2023 asks "what is the identified set of tau_post under bounded violations?" Both use the same (beta_hat, Sigma_hat) input contract — the library should expose a unified entry-point that can produce both Roth-2022 and HonestDiD reports from one event-study result object.
+- **Shares zero-anticipation convention with HonestDiD**: tau_pre = 0, so beta_pre = delta_pre. Cross-reference the existing `diff_diff/honest_did.py` for the contract.
+
+---
+
+## Key Theorems / Propositions
+
+| # | Statement | Implementation use |
+|---|-----------|---------------------|
+| **Proposition 1** | For any B(Sigma): E[beta_hat_post | beta_hat_pre in B] = tau_post + delta_post + Sigma_{12} Sigma_{22}^{-1} (E[beta_hat_pre | beta_hat_pre in B] - beta_pre) | The main bias decomposition formula. Drives the conditional-bias computation in step 4 of the algorithm. |
+| **Proposition 2** | Under Assumption 1 (homoskedastic-equicorrelated Sigma) and monotone trend (delta_pre < 0, delta_post > 0): E[beta_hat_post | beta_hat_pre in B_NIS] > beta_post > tau_post | Justifies WARN that conditional bias is worse than unconditional bias under monotone trends — applicable in many but not all empirical settings. Library should detect when Assumption 1 holds (e.g., balanced panel + cluster-robust at unit level + equicorrelated errors) and surface this warning more strongly. |
+| **Proposition 3** | Var[beta_hat_post | beta_hat_pre in B] = Var[beta_hat_post] + (Sigma_{12} Sigma_{22}^{-1}) (Var[beta_hat_pre | beta_hat_pre in B] - Var[beta_hat_pre]) (Sigma_{12} Sigma_{22}^{-1})' | The conditional-variance formula; drives the over/under-coverage analysis. |
+| **Proposition 4** | If B(Sigma) is convex: Var[beta_hat_post | beta_hat_pre in B] <= Var[beta_hat_post]. CIs based on unconditional Sigma OVER-cover under parallel trends, UNDER-cover under violations. | Justifies the "do not interpret a wide CI as ample power" warning. |
+
+No formal theorems are stated for the publication-rules analysis (Section II.D); Equation 4 is the operational result.
+
+---
+
+## Calibrated DGP for Simulations (Section I.C "Calibrating the Model")
+
+For each paper in Roth's empirical survey:
+
+1. Calibrate finite-sample normal model (Equation 1): beta_hat ~ N(beta, Sigma) with K pre-periods + M post-periods matching the original paper
+2. Set Sigma = estimated variance-covariance matrix from the original paper (using whatever clustering method the authors specified)
+3. Set tau_post = original paper's beta_hat_post (footnote 7: has no impact on bias/coverage results by equivariance)
+4. Calibrate delta to a linear trend with slope gamma_{0.5} or gamma_{0.8}
+5. Re-compute power, bias, and coverage analytically (or by simulation)
+
+**Test fixture suggestion for the library**: a Roth-2022 parity test against one of the 12 papers in Table 1 (e.g., Bailey & Goodman-Bacon 2015 has 5 pre-periods + a clean calibrated VCV available in his replication data — `https://doi.org/10.3886/E151982V1`).
+
+---
+
+## Empirical Findings (Section I.C "Results"; Tables 2-3)
+
+Quoting Roth's key empirical results (for cross-validation):
+
+- **Power**: in the most extreme paper (Deryugina 2017), an unconditional bias of magnitude comparable to the estimated effect is detected only 50% of the time
+- **Coverage**: under gamma_{0.8} (80%-power slope), unconditional null rejection rates of 95% CIs range from 53% to 98% across the 12 papers
+- **Pretest bias**: percent additional bias from pretest conditioning (Table 3, gamma_{0.8}, tau_1): from -34% (Bosch-Campos-Vazquez 2014, beneficial — rare) to +120% (Deryugina 2017, harmful — common); paper-aggregate finding is that conditional bias EXCEEDS unconditional bias in 9 of 12 papers for tau_1 and in 10 of 12 for tau_bar
+- **Equation 4 sign**: the relative-fraction term is < 1 (pretest helps screen out biased designs); the conditional-bias term is typically > 1 (pretest amplifies bias when a biased design is published); net sign depends on which dominates — the paper does not provide closed-form criteria
+
+---
+
+## Gaps and Uncertainties
+
+- **Joint Wald acceptance region**: paper mentions joint tests only briefly (Section I.B notes 1 of 12 papers uses one). Power, bias, and coverage formulas all apply by replacing B_NIS with the joint Wald acceptance region B_W, but Roth does not work out a separate table. Library should implement both but test against R `pretrends` for the joint-Wald case (Roth's package supports it).
+- **"Slope-of-best-fit-line t-test" acceptance region**: Table 1 column shows the t-stat for the slope of the linear pre-trend. Paper does not analyze pretests based on this t-stat as a separate acceptance region; library should NOT extrapolate without further reading the `pretrends` package source.
+- **Nonlinear violations**: Section I.D acknowledges results extend to monotone violations under homoskedasticity (Proposition 2), but the linear-violation framework is the operational benchmark. Library's `violation_type in {"linear", "constant", "last_period", "custom"}` (per the existing REGISTRY entry) appears to predate the paper — the paper itself only formally analyzes linear violations. "Constant" and "last_period" are likely Roth-package extensions for practical reasoning; library should document this as an extension beyond Roth's published analysis.
+- **Custom delta**: paper does not propose a "custom delta vector" interface; this is an extension by Roth's R package. The library should preserve the convention.
+- **Choice of contrast l**: paper highlights l = uniform 1/M (average post-treatment) and l = e_1 (first period after treatment). No guidance on other contrasts (e.g., long-run effect l = e_M, dynamic-weighted contrast) — library should document defaults and warn that bias and coverage depend on l.
+- **K = 0 (no pre-periods)**: trivially no pretest possible; library should error.
+- **Heteroskedastic Sigma**: Proposition 2 requires Assumption 1. Library implements computations under arbitrary Sigma via Proposition 1; the sign of the bias-amplification effect is then NOT guaranteed. Library should NOT print "pretest amplifies bias under monotone trends" unless Assumption 1 is approximately satisfied (or just always issue the conditional warning).
+- **Equation 4 publication-rules analysis**: not standardly implemented in PreTrendsPower-style tools. Roth notes it as part of the discussion (Section II.D) but does not provide a numerical workflow for users. Library should NOT attempt to implement Equation 4 unless requested.
+- **Connection to `compute_pretrends_power` library helper** (referenced in feedback memory `feedback_verdict_powered_by_tools.md`): the paper review confirms that "minimum slope detectable at 80% power" is exactly Roth's gamma_{0.8}, and the library helper should compute and surface this. Need to verify the existing helper's calling convention against the paper's framework when auditing `diff_diff/pretrends.py`.
+- **R `pretrends` package version**: paper cites the package at https://github.com/jonathandroth/pretrends; no specific version cited. R-parity work should pin to a specific commit and document.
+- **Compatibility with multi-cohort estimators**: Remark 1 lists Callaway-Sant'Anna, Sun-Abraham, etc. as compatible. The paper does not detail how to construct (beta_hat, Sigma_hat) from those estimators when the event-study output is multi-cohort (e.g., cohort × event-time matrix). Library should document the aggregation convention (per Sun-Abraham overall ATT or per Callaway-Sant'Anna `aggregate=event`).

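The power step described in the review document above (algorithm step 3: solve for the slope gamma at which the NIS pretest rejects with a target probability) can be sketched with the simulation path alone. This is a minimal standard-library illustration, not the code in `diff_diff/pretrends.py` and not the R `pretrends` implementation. The function names (`cholesky`, `draw_noise`, `nis_rejection_rate`, `solve_gamma`) and the example matrix are hypothetical, and it uses Monte Carlo with common random numbers plus bisection instead of the `tmvtnorm`-style analytical path.

```python
import math
import random


def cholesky(a):
    """Lower-triangular Cholesky factor of a small symmetric PD matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def draw_noise(sigma_22, n_sim, seed=0):
    """Draw n_sim mean-zero vectors from N(0, Sigma_22)."""
    rng = random.Random(seed)
    low = cholesky(sigma_22)
    k = len(sigma_22)
    draws = []
    for _ in range(n_sim):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        draws.append([sum(low[i][j] * z[j] for j in range(i + 1)) for i in range(k)])
    return draws


def nis_rejection_rate(gamma, rel_times, sigma_22, noise, crit=1.96):
    """Share of draws in which some |beta_hat_pre,t| exceeds crit * sigma_t."""
    bounds = [crit * math.sqrt(sigma_22[i][i]) for i in range(len(rel_times))]
    rejected = 0
    for e in noise:
        # beta_hat_pre,t = gamma * t + noise_t under a linear violation
        if any(abs(gamma * t + e_t) > b for t, e_t, b in zip(rel_times, e, bounds)):
            rejected += 1
    return rejected / len(noise)


def solve_gamma(target_power, rel_times, sigma_22, n_sim=5000, seed=0, tol=1e-3):
    """Bisection for the smallest gamma >= 0 whose simulated power reaches the target."""
    noise = draw_noise(sigma_22, n_sim, seed)  # reused for every candidate gamma
    lo, hi = 0.0, 1.0
    while nis_rejection_rate(hi, rel_times, sigma_22, noise) < target_power:
        hi *= 2.0  # expand the bracket until power exceeds the target
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if nis_rejection_rate(mid, rel_times, sigma_22, noise) < target_power:
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical equicorrelated pre-period block with K = 3 (relative times -3..-1).
sigma_22 = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
rel_times = [-3, -2, -1]
gamma_50 = solve_gamma(0.5, rel_times, sigma_22)
gamma_80 = solve_gamma(0.8, rel_times, sigma_22)
print(f"gamma_0.5 ~ {gamma_50:.3f}, gamma_0.8 ~ {gamma_80:.3f}")
```

Reusing one set of noise draws for every candidate gamma keeps the simulated power curve stable across bisection steps, so the root does not jitter with fresh sampling error at each evaluation.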
From 482eece00d7de2e5a9b2d06ea168af1554931314 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sat, 16 May 2026 20:58:17 -0400
Subject: [PATCH 02/22] Address R1 review (3 P1 + 1 P3) on roth-2022-review.md

- P1 covariance block convention: add explicit "post-first" block decomposition
  paragraph defining Sigma_11=Var[beta_hat_post], Sigma_22=Var[beta_hat_pre],
  Sigma_12=Cov(beta_hat_post, beta_hat_pre). Roth's stacked vector is (pre, post)'
  but his proofs partition Var[(beta_hat_post, beta_hat_pre)'] post-first; without
  the explicit convention, Sigma_22 references in the propositions were ambiguous.
- P1 plug-in variance dimensional bug: change sigma^2_{tau_hat} = l' Sigma l to
  l' Sigma_11 l (since l in R^M and Sigma_11 = Var[beta_hat_post] is the only
  block of correct dimensions). Add explicit reminder pointing back to the
  convention.
- P1 missing Note/Deviation labels in copy-ready Registry section: inline
  **Note (deviation from paper):** for slope-of-best-fit acceptance region,
  constant/last_period/custom violation types, and the slope pretest_form
  enum value. Mark joint Wald + custom B(Sigma) as paper-supported framework
  (Propositions 1, 3, 4 apply) but not separately tabulated. Mark "individual"
  (NIS) as the only paper-analyzed form.
- P3 brittle line reference: replace "line 2758" with the heading reference
  "## PreTrendsPower stub".
- Algorithm step 2 cleanup: clarify sigma_t = sqrt(Sigma_22[t, t]) under the
  new convention.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 docs/methodology/papers/roth-2022-review.md | 27 ++++++++++++---------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/docs/methodology/papers/roth-2022-review.md b/docs/methodology/papers/roth-2022-review.md
index 844faec6..e010b895 100644
--- a/docs/methodology/papers/roth-2022-review.md
+++ b/docs/methodology/papers/roth-2022-review.md
@@ -10,7 +10,7 @@
 
 ## Methodology Registry Entry
 
-*Formatted to match docs/methodology/REGISTRY.md structure. Heading levels and labels align with existing entries — copy the `## PreTrendsPower` section into the registry (replacing the existing stub at line 2758).*
+*Formatted to match docs/methodology/REGISTRY.md structure. Heading levels and labels align with existing entries — copy the `## PreTrendsPower` section into the registry (replacing the existing `## PreTrendsPower` stub).*
 
 ## PreTrendsPower
 
@@ -21,6 +21,7 @@
 *Assumption checks / warnings:*
 - Input: event-study coefficient vector beta_hat = (beta_hat_pre, beta_hat_post)' that is asymptotically normal under the underlying estimator (Equation 1; Remark 1 lists TWFE, GMM, Freyaldenhoven-Hansen-Shapiro, regression-adjustment/IPW/DR DiD per Sant'Anna-Zhao, Callaway-Sant'Anna, Sun-Abraham)
 - Input: estimated variance-covariance matrix Sigma_hat in R^{(K+M) x (K+M)} where K = # pre-period coefficients, M = # post-period coefficients
+- **Block decomposition convention (per Roth, Section II.A-B)**: throughout this entry, the variance partition uses Roth's *post-first* ordering for the proofs, i.e., Var[(beta_hat_post, beta_hat_pre)'] = [[Sigma_11, Sigma_12], [Sigma_21, Sigma_22]] where Sigma_11 = Var[beta_hat_post] in R^{M x M} (the post-treatment block), Sigma_22 = Var[beta_hat_pre] in R^{K x K} (the pre-treatment block), Sigma_12 = Cov(beta_hat_post, beta_hat_pre) in R^{M x K}, Sigma_21 = Sigma_12'. The stacked input vector beta_hat is (pre, post)' as stated above; the (post, pre) block ordering is internal to the propositions and matches Roth's paper notation. Implementations must use the post-treatment block Sigma_11 (not the full Sigma_hat) wherever they need Var[beta_hat_post].
 - Pre-trend zero-anticipation assumption: tau_pre = 0 (Equation 2) — same identifying convention as Rambachan-Roth (2023)
 - Warn if pretest has low power: e.g., if the slope at 80% power (gamma_{0.8}) produces a |bias| > |estimated treatment effect|, the pretest is uninformative for the magnitudes that matter
 - Warn that pretest-conditioning distortions are NOT removed by larger samples — they persist as long as the pretest can fail with non-vanishing probability (footnote 12)
@@ -40,10 +41,10 @@ where tau_pre = 0 by the no-anticipation assumption and delta is the bias from a
 
 This corresponds to checking individual 95% CIs of each pre-period coefficient (the dominant convention in applied work: 11 of 12 surveyed papers, per Section I.B).
 
-Alternative acceptance regions supported by the framework:
-- **Joint Wald (chi-squared)**: B_W(Sigma) = { b : b' Sigma_22^{-1} b <= chi^2_{1-alpha, K} } (1 of 12 surveyed papers, plus Roth's discussion of joint tests)
-- **Slope-of-best-fit-line t-test**: discussed in Section I and tabulated in Table 1 column "|t| for slope"
-- **Custom user-supplied B(Sigma)**: any (measurable) acceptance set; Propositions 1, 3, 4 apply for any B; Proposition 2 requires the specific NIS form
+Alternative acceptance regions:
+- **Joint Wald (chi-squared)**: B_W(Sigma) = { b in R^K : b' Sigma_22^{-1} b <= chi^2_{1-alpha, K} }. **Note:** mentioned in the paper as a less common applied convention (1 of 12 surveyed papers, Section I.B). Propositions 1, 3, 4 apply to this B since it is convex; Roth does NOT separately tabulate power/bias/coverage for the Wald form.
+- **Slope-of-best-fit-line t-test**: the paper's Table 1 reports the t-statistic for the slope as an observed property of surveyed papers, but **Note (deviation from paper):** Roth does NOT analyze a slope-t-statistic acceptance region as a pretest framework. Library support for this acceptance form is an extension beyond Roth (2022).
+- **Custom user-supplied B(Sigma)**: any (measurable) acceptance set; Propositions 1, 3, 4 apply for any B (paper-supported framework). Proposition 2 (sign of bias under monotone trend) requires the specific NIS form plus Assumption 1.
 
 *Conditional bias after pretesting (Proposition 1):*
 
@@ -87,7 +88,9 @@ Defaults Roth uses:
 *Plug-in estimator and CI (Section I.C):*
 
     tau_hat = l' beta_hat_post
-    CI_{tau_*} = tau_hat +/- 1.96 * sigma_{tau_hat}, where sigma^2_{tau_hat} = l' Sigma l
+    CI_{tau_*} = tau_hat +/- 1.96 * sigma_{tau_hat}, where sigma^2_{tau_hat} = l' Sigma_11 l
+
+(Note: Sigma_11 is the post-treatment covariance block per the convention above, not the full Sigma_hat.)
 
 *Power calculation against a linear violation (Section I.C "Power Calculations"):*
 
@@ -135,7 +138,7 @@ simulation fallback.
 *Algorithm (no numbered algorithm in paper; implementation distilled from Section I.C):*
 
 1. Take user-supplied (beta_hat, Sigma, K, M) and a target estimand l in R^M (default: l = uniform 1/M)
-2. Compute B_NIS(Sigma) acceptance region using diagonal sigma_t = sqrt(Sigma_{tt}) for t in pre periods
+2. Compute B_NIS(Sigma) acceptance region using diagonal sigma_t = sqrt(Sigma_22[t, t]) for t in pre periods (Sigma_22 = Var[beta_hat_pre] per the block convention above)
 3. **Power**: solve gamma_{1-p} = root of P(reject pretest | gamma) = 1 - p
    - For each candidate gamma, compute P(beta_hat_pre in B_NIS) under beta_hat_pre ~ N(gamma * t_pre, Sigma_22) using `tmvtnorm`-style multivariate normal CDF; or via simulation
 4. **Bias**: for gamma in {0, gamma_{0.5}, gamma_{0.8}, user-custom}:
@@ -151,12 +154,12 @@ simulation fallback.
 - R dependency: [`tmvtnorm`](https://cran.r-project.org/package=tmvtnorm) (Manjunath & Wilhelm 2012) for truncated multivariate normal moments and CDF
 
 **Requirements checklist:**
-- [ ] Acceptance regions: NIS (individual t), joint Wald, slope-of-best-fit, custom B
+- [ ] Acceptance regions: NIS (individual t, paper-analyzed); joint Wald and custom B (paper-supported via Propositions 1, 3, 4, not separately tabulated by Roth); **Note (deviation from paper):** slope-of-best-fit-line is an extension beyond Roth (2022) — paper tabulates the slope t-stat but does not analyze a slope-t pretest framework
 - [ ] Power calculation against linear violation with slope gamma — solve for gamma_{0.5} and gamma_{0.8}
 - [ ] Analytical truncated multivariate normal path (tmvtnorm-equivalent) + simulation fallback
-- [ ] Unconditional and conditional bias for arbitrary linear contrast l in R^M
+- [ ] Unconditional and conditional bias for arbitrary linear contrast l in R^M (using Sigma_11 for the post-treatment variance)
 - [ ] Unconditional and conditional null rejection / coverage for the same linear contrast
-- [ ] Custom slope / nonlinear trend hypotheses (linear, constant level shift, last-period jump, custom delta vector)
+- [ ] **Note (deviation from paper):** non-linear trend hypotheses — Roth (2022) formally analyzes only LINEAR violations; "constant level shift", "last-period jump", and "custom delta vector" patterns are extensions from Roth's R `pretrends` package, applied via the same Proposition 1/3/4 framework
 - [ ] Plot of bias against pretest power for visual reporting (Roth's Figure 1 style)
 - [ ] Composes with HonestDiD result objects (shared beta_hat, Sigma_hat input contract)
 
@@ -185,8 +188,8 @@ simulation fallback.
 | `alpha` | float in (0, 1) | 0.05 | Standard significance level for pretest and reporting CI |
 | `target_power` | list[float] in (0, 1) | [0.5, 0.8] | Roth's reported benchmarks (Cohen 1988 conventional 0.8; 0.5 for "even-odds detection") |
 | `l` (contrast) | array in R^M | uniform 1/M | User-specified linear functional of tau_post |
-| `pretest_form` | enum | "individual" (NIS) | "individual", "joint_wald", "slope", "custom" |
-| `acceptance_region` | callable or set | B_NIS | Custom B(Sigma) for "custom" pretest_form |
+| `pretest_form` | enum | "individual" (NIS) | "individual" (paper-analyzed); "joint_wald" / "custom" (paper-supported via Propositions 1/3/4); "slope" — **deviation from paper**, R-package extension |
+| `acceptance_region` | callable or set | B_NIS | Custom B(Sigma) for "custom" pretest_form (paper-supported: Propositions 1, 3, 4 apply to any B) |
 | `method` | enum | "analytical" | "analytical" (tmvtnorm-equivalent) or "simulation" |
 | `n_sim` | int | 10000 | Monte Carlo iterations when method="simulation" |
 

From 4932323be64178efb2149efd49a4d6d6160fdbb2 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sat, 16 May 2026 21:06:28 -0400
Subject: [PATCH 03/22] Address R2 review (2 P2 + 2 P3) on roth-2022-review.md

- P2 API surface drift: add explicit Status disclaimer at top of "Methodology
  Registry Entry" section flagging the block as proposed-replacement-text-not-
  yet-merged (mirrors the established goodman-bacon-2021-review.md:14 pattern).
  Restructure the Tuning Parameters table to add a Status column distinguishing
  "Current" (matches diff_diff/pretrends.py:442-447) from "Proposed" (requires
  API extension via the PR-B audit), so the block is not mistaken for the
  shipped surface.
- P2 Assumption 1 design-heuristic overclaim: drop the "balanced panel +
  cluster-robust + equicorrelated" design-metadata heuristic. Roth's
  Assumption 1 is a condition on the estimated Sigma, not on the experimental
  design. Replace with the requirement that any sharper sign-of-bias warning
  must come from a direct numerical check of Sigma's structure.
- P3 dead internal reference: remove the parenthetical pointer to
  `feedback_verdict_powered_by_tools.md` (a Claude private-memory artifact,
  not a git-tracked file in this repo).
- P3 source-of-truth split: addressed by the Status disclaimer above, which
  explicitly names the existing PreTrendsPower REGISTRY stub as the sole
  authoritative methodology contract until PR-B lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
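Reviewer note (illustrative only, not part of the patch): the "direct numerical check of Sigma's structure" described above could look roughly like the sketch below. The helper name `looks_equicorrelated` and the 5% relative tolerance are assumptions made for this example; neither comes from Roth (2022) nor from `diff_diff/pretrends.py`. Sigma is assumed symmetric, so scanning the upper triangle is equivalent to scanning the entries below the diagonal.

```python
from itertools import combinations


def looks_equicorrelated(sigma, rtol=0.05):
    """Heuristic check of Assumption 1 on a symmetric covariance matrix.

    Hypothetical helper for illustration; `rtol` is an arbitrary tolerance.
    Returns True when the diagonal is roughly constant and the off-diagonal
    entries are positive and roughly constant.
    """
    n = len(sigma)
    diag = [sigma[i][i] for i in range(n)]
    # i < j walks the upper triangle; by symmetry it mirrors the lower one.
    off = [sigma[i][j] for i, j in combinations(range(n), 2)]

    def roughly_constant(values):
        mean = sum(values) / len(values)
        return all(abs(v - mean) <= rtol * abs(mean) for v in values)

    if min(diag) <= 0 or not roughly_constant(diag):
        return False
    if not off:
        # A 1x1 Sigma has no covariances to inspect.
        return True
    return min(off) > 0 and roughly_constant(off)


equicorrelated = [[1.0, 0.3, 0.3], [0.3, 1.0, 0.3], [0.3, 0.3, 1.0]]
heteroskedastic = [[1.0, 0.3, 0.3], [0.3, 4.0, 0.3], [0.3, 0.3, 1.0]]
print(looks_equicorrelated(equicorrelated), looks_equicorrelated(heteroskedastic))
```

If a check of this kind fails, the generic caveat (sign of bias ambiguous outside Assumption 1) would be the only warning issued.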
 docs/methodology/papers/roth-2022-review.md | 28 ++++++++++++---------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/docs/methodology/papers/roth-2022-review.md b/docs/methodology/papers/roth-2022-review.md
index e010b895..44f1bd6d 100644
--- a/docs/methodology/papers/roth-2022-review.md
+++ b/docs/methodology/papers/roth-2022-review.md
@@ -10,7 +10,9 @@
 
 ## Methodology Registry Entry
 
-*Formatted to match docs/methodology/REGISTRY.md structure. Heading levels and labels align with existing entries — copy the `## PreTrendsPower` section into the registry (replacing the existing `## PreTrendsPower` stub).*
+**Status: proposed replacement text for a future REGISTRY update.** This block has **not** been merged into `docs/methodology/REGISTRY.md` yet. The current `## PreTrendsPower` stub in `REGISTRY.md` remains the **sole authoritative methodology contract** until the follow-up audit PR for `diff_diff/pretrends.py` (PR-B in the 3-PR sequence) lands and replaces it. That audit PR will also assess which proposed parameters and capabilities below are already in the shipped surface (`diff_diff/pretrends.py:442-447` currently exposes `alpha`, `power`, `violation_type`, `violation_weights`) and which require API extension.
+
+*Formatted to match docs/methodology/REGISTRY.md structure. Heading levels and labels align with existing entries — copy the `## PreTrendsPower` section into the registry (replacing the existing `## PreTrendsPower` stub) once PR-B is ready.*
 
 ## PreTrendsPower
 
@@ -183,15 +185,17 @@ simulation fallback.
 
 ### Tuning Parameters
 
-| Parameter | Type | Default | Selection Method |
-|-----------|------|---------|-----------------|
-| `alpha` | float in (0, 1) | 0.05 | Standard significance level for pretest and reporting CI |
-| `target_power` | list[float] in (0, 1) | [0.5, 0.8] | Roth's reported benchmarks (Cohen 1988 conventional 0.8; 0.5 for "even-odds detection") |
-| `l` (contrast) | array in R^M | uniform 1/M | User-specified linear functional of tau_post |
-| `pretest_form` | enum | "individual" (NIS) | "individual" (paper-analyzed); "joint_wald" / "custom" (paper-supported via Propositions 1/3/4); "slope" — **deviation from paper**, R-package extension |
-| `acceptance_region` | callable or set | B_NIS | Custom B(Sigma) for "custom" pretest_form (paper-supported: Propositions 1, 3, 4 apply to any B) |
-| `method` | enum | "analytical" | "analytical" (tmvtnorm-equivalent) or "simulation" |
-| `n_sim` | int | 10000 | Monte Carlo iterations when method="simulation" |
+**Note:** The parameters below are what Roth's framework requires — they are NOT necessarily the current library's exposed API. The PR-B audit will compare these proposed knobs against `diff_diff/pretrends.py:442-447` (which currently exposes `alpha`, `power`, `violation_type`, `violation_weights`) and decide which to keep, rename, extend, or defer.
+
+| Parameter | Type | Default | Status | Selection Method |
+|-----------|------|---------|--------|-----------------|
+| `alpha` | float in (0, 1) | 0.05 | **Current** (matches `pretrends.py`) | Standard significance level for pretest and reporting CI |
+| `target_power` | list[float] in (0, 1) | [0.5, 0.8] | **Proposed** (current API exposes scalar `power=0.8` only) | Roth's reported benchmarks (Cohen 1988 conventional 0.8; 0.5 for "even-odds detection") |
+| `l` (contrast) | array in R^M | uniform 1/M | **Proposed** (not in current API) | User-specified linear functional of tau_post |
+| `pretest_form` | enum | "individual" (NIS) | **Proposed** (current API uses `violation_type`, a different axis) | "individual" (paper-analyzed); "joint_wald" / "custom" (paper-supported via Propositions 1/3/4); "slope" — **deviation from paper**, R-package extension |
+| `acceptance_region` | callable or set | B_NIS | **Proposed** (not in current API) | Custom B(Sigma) for "custom" pretest_form (paper-supported: Propositions 1, 3, 4 apply to any B) |
+| `method` | enum | "analytical" | **Proposed** (not in current API) | "analytical" (tmvtnorm-equivalent) or "simulation" |
+| `n_sim` | int | 10000 | **Proposed** (not in current API) | Monte Carlo iterations when method="simulation" |
 
 ### Relation to Existing diff-diff Estimators
 - **Pre-existing `diff_diff/pretrends.py`** (1133 lines) — implements a Roth-2022 framework; this paper review's main use is to audit the existing surface against the paper's exact equations
@@ -206,7 +210,7 @@ simulation fallback.
 | # | Statement | Implementation use |
 |---|-----------|---------------------|
 | **Proposition 1** | For any B(Sigma): E[beta_hat_post | beta_hat_pre in B] = tau_post + delta_post + Sigma_{12} Sigma_{22}^{-1} (E[beta_hat_pre | beta_hat_pre in B] - beta_pre) | The main bias decomposition formula. Drives the conditional-bias computation in step 4 of the algorithm. |
-| **Proposition 2** | Under Assumption 1 (homoskedastic-equicorrelated Sigma) and monotone trend (delta_pre < 0, delta_post > 0): E[beta_hat_post | beta_hat_pre in B_NIS] > beta_post > tau_post | Justifies WARN that conditional bias is worse than unconditional bias under monotone trends — applicable in many but not all empirical settings. Library should detect when Assumption 1 holds (e.g., balanced panel + cluster-robust at unit level + equicorrelated errors) and surface this warning more strongly. |
+| **Proposition 2** | Under Assumption 1 (homoskedastic-equicorrelated Sigma) and monotone trend (delta_pre < 0, delta_post > 0): E[beta_hat_post | beta_hat_pre in B_NIS] > beta_post > tau_post | Justifies a WARN that conditional bias is worse than unconditional bias under monotone trends — applicable in many but not all empirical settings. Assumption 1 is a condition on the *estimated covariance matrix* Sigma, not on design metadata; any sharper warning must therefore be triggered by a *direct numerical check* of Sigma (approximately-constant diagonal entries + approximately-constant positive off-diagonal entries below the diagonal). Without such a check, the library should issue only the generic caveat that the sign-of-bias result is ambiguous outside Assumption 1. |
 | **Proposition 3** | Var[beta_hat_post | beta_hat_pre in B] = Var[beta_hat_post] + (Sigma_{12} Sigma_{22}^{-1}) (Var[beta_hat_pre | beta_hat_pre in B] - Var[beta_hat_pre]) (Sigma_{12} Sigma_{22}^{-1})' | The conditional-variance formula; drives the over/under-coverage analysis. |
 | **Proposition 4** | If B(Sigma) is convex: Var[beta_hat_post | beta_hat_pre in B] <= Var[beta_hat_post]. CIs based on unconditional Sigma OVER-cover under parallel trends, UNDER-cover under violations. | Justifies the "do not interpret a wide CI as ample power" warning. |
 
@@ -249,6 +253,6 @@ Quoting Roth's key empirical results (for cross-validation):
 - **K = 0 (no pre-periods)**: trivially no pretest possible; library should error.
 - **Heteroskedastic Sigma**: Proposition 2 requires Assumption 1. Library implements computations under arbitrary Sigma via Proposition 1; the sign of the bias-amplification effect is then NOT guaranteed. Library should NOT print "pretest amplifies bias under monotone trends" unless Assumption 1 is approximately satisfied (or just always issue the conditional warning).
 - **Equation 4 publication-rules analysis**: not standardly implemented in PreTrendsPower-style tools. Roth notes it as part of the discussion (Section II.D) but does not provide a numerical workflow for users. Library should NOT attempt to implement Equation 4 unless requested.
-- **Connection to `compute_pretrends_power` library helper** (referenced in feedback memory `feedback_verdict_powered_by_tools.md`): the paper review confirms that "minimum slope detectable at 80% power" is exactly Roth's gamma_{0.8}, and the library helper should compute and surface this. Need to verify the existing helper's calling convention against the paper's framework when auditing `diff_diff/pretrends.py`.
+- **Connection to `compute_pretrends_power` library helper**: the paper review confirms that "minimum slope detectable at 80% power" is exactly Roth's gamma_{0.8}, and the library helper should compute and surface this. Need to verify the existing helper's calling convention against the paper's framework when auditing `diff_diff/pretrends.py`.
 - **R `pretrends` package version**: paper cites the package at https://github.com/jonathandroth/pretrends; no specific version cited. R-parity work should pin to a specific commit and document.
 - **Compatibility with multi-cohort estimators**: Remark 1 lists Callaway-Sant'Anna, Sun-Abraham, etc. as compatible. The paper does not detail how to construct (beta_hat, Sigma_hat) from those estimators when the event-study output is multi-cohort (e.g., cohort × event-time matrix). Library should document the aggregation convention (per Sun-Abraham overall ATT or per Callaway-Sant'Anna `aggregate=event`).

From a80c7bbb2387fe8986300f7ff85ce58ed4bd475e Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sat, 16 May 2026 21:12:28 -0400
Subject: [PATCH 04/22] Address R3 review (1 P1) on roth-2022-review.md

P1 Proposition 4 convexity overclaim: R1/R2 fixes added language to the
Alternative-acceptance-regions block, requirements checklist, pretest_form
row, and acceptance_region row that said "Propositions 1, 3, 4 apply to any
B(Sigma)". That overstates Roth: Proposition 1 (conditional mean) and
Proposition 3 (conditional variance) hold for any measurable B, but
Proposition 4 (variance reduction / over-coverage under parallel trends)
begins "Suppose that B(Sigma) is a convex set." Nonconvex custom pretests
lose the variance-reduction guarantee even though the conditional-moment
formulas still hold.

Fixed:
- "Custom user-supplied B(Sigma)" bullet now distinguishes Props 1+3 (any B)
  from Prop 4 (requires convex B) and explicitly notes nonconvex pretests
  lose the over-coverage guarantee.
- Requirements checklist now reads "joint Wald (convex, Props 1+3+4 all
  apply); custom B (Props 1+3 apply to any measurable B, Prop 4 only if B is
  convex)".
- `pretest_form` row in tuning parameters: same convexity split.
- `acceptance_region` row: "Propositions 1, 3 apply to any measurable B;
  Proposition 4 / variance-reduction guarantee additionally requires B to
  be convex".

Joint Wald (line 47) already correctly conditioned on convexity ("apply to
this B since it is convex") and was left as-is.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
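Reviewer note (illustrative only, not part of the patch): a one-dimensional sketch of why the convexity condition matters. With K = 1, Proposition 3 implies that the conditional variance of beta_hat_post moves in the same direction as Var[beta_hat_pre | beta_hat_pre in B] - Var[beta_hat_pre], so it is enough to look at a standard normal pre-period coefficient. The function names are hypothetical, and the two-tail region is a contrived nonconvex example chosen for this note, not a pretest anyone is known to use.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def var_given_inside(c):
    """Var[X | |X| <= c] for X ~ N(0, 1): a convex (interval) region."""
    mass = 2.0 * STD_NORMAL.cdf(c) - 1.0
    return 1.0 - 2.0 * c * STD_NORMAL.pdf(c) / mass


def var_given_outside(c):
    """Var[X | |X| > c] for X ~ N(0, 1): a nonconvex (two-tail) region."""
    tail = 1.0 - STD_NORMAL.cdf(c)
    return 1.0 + c * STD_NORMAL.pdf(c) / tail


c = STD_NORMAL.inv_cdf(0.975)
# Interval region: variance falls below 1. Two-tail region: variance exceeds 1.
print(var_given_inside(c), var_given_outside(c))
```

The interval region shrinks the variance, as Proposition 4 guarantees for convex B; the two-tail region inflates it, which is why the guarantee cannot be claimed for arbitrary custom B.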
 docs/methodology/papers/roth-2022-review.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/methodology/papers/roth-2022-review.md b/docs/methodology/papers/roth-2022-review.md
index 44f1bd6d..692d21bb 100644
--- a/docs/methodology/papers/roth-2022-review.md
+++ b/docs/methodology/papers/roth-2022-review.md
@@ -46,7 +46,7 @@ This corresponds to checking individual 95% CIs of each pre-period coefficient (
 Alternative acceptance regions:
 - **Joint Wald (chi-squared)**: B_W(Sigma) = { b in R^K : b' Sigma_22^{-1} b <= chi^2_{1-alpha, K} }. **Note:** mentioned in the paper as a less common applied convention (1 of 12 surveyed papers, Section I.B). Propositions 1, 3, 4 apply to this B since it is convex; Roth does NOT separately tabulate power/bias/coverage for the Wald form.
 - **Slope-of-best-fit-line t-test**: the paper's Table 1 reports the t-statistic for the slope as an observed property of surveyed papers, but **Note (deviation from paper):** Roth does NOT analyze a slope-t-statistic acceptance region as a pretest framework. Library support for this acceptance form is an extension beyond Roth (2022).
-- **Custom user-supplied B(Sigma)**: any (measurable) acceptance set; Propositions 1, 3, 4 apply for any B (paper-supported framework). Proposition 2 (sign of bias under monotone trend) requires the specific NIS form plus Assumption 1.
+- **Custom user-supplied B(Sigma)**: any (measurable) acceptance set; Propositions 1 and 3 (conditional mean and variance) apply to any B. Proposition 4 (variance reduction / over-coverage under parallel trends) requires B to be **convex** — Roth's statement begins "Suppose that B(Sigma) is a convex set." Nonconvex custom pretests therefore lose Roth's variance-reduction / over-coverage guarantee even though the conditional mean/variance formulas still hold. Proposition 2 (sign of bias under monotone trend) requires the specific NIS form plus Assumption 1.
 
 *Conditional bias after pretesting (Proposition 1):*
 
@@ -156,7 +156,7 @@ simulation fallback.
 - R dependency: [`tmvtnorm`](https://cran.r-project.org/package=tmvtnorm) (Manjunath & Wilhelm 2012) for truncated multivariate normal moments and CDF
 
 **Requirements checklist:**
-- [ ] Acceptance regions: NIS (individual t, paper-analyzed); joint Wald and custom B (paper-supported via Propositions 1, 3, 4, not separately tabulated by Roth); **Note (deviation from paper):** slope-of-best-fit-line is an extension beyond Roth (2022) — paper tabulates the slope t-stat but does not analyze a slope-t pretest framework
+- [ ] Acceptance regions: NIS (individual t, paper-analyzed); joint Wald (convex, Propositions 1+3+4 all apply); custom B (Propositions 1+3 apply to any measurable B, Proposition 4 only if B is convex); **Note (deviation from paper):** slope-of-best-fit-line is an extension beyond Roth (2022) — paper tabulates the slope t-stat but does not analyze a slope-t pretest framework
 - [ ] Power calculation against linear violation with slope gamma — solve for gamma_{0.5} and gamma_{0.8}
 - [ ] Analytical truncated multivariate normal path (tmvtnorm-equivalent) + simulation fallback
 - [ ] Unconditional and conditional bias for arbitrary linear contrast l in R^M (using Sigma_11 for the post-treatment variance)
@@ -192,8 +192,8 @@ simulation fallback.
 | `alpha` | float in (0, 1) | 0.05 | **Current** (matches `pretrends.py`) | Standard significance level for pretest and reporting CI |
 | `target_power` | list[float] in (0, 1) | [0.5, 0.8] | **Proposed** (current API exposes scalar `power=0.8` only) | Roth's reported benchmarks (Cohen 1988 conventional 0.8; 0.5 for "even-odds detection") |
 | `l` (contrast) | array in R^M | uniform 1/M | **Proposed** (not in current API) | User-specified linear functional of tau_post |
-| `pretest_form` | enum | "individual" (NIS) | **Proposed** (current API uses `violation_type`, a different axis) | "individual" (paper-analyzed); "joint_wald" / "custom" (paper-supported via Propositions 1/3/4); "slope" — **deviation from paper**, R-package extension |
-| `acceptance_region` | callable or set | B_NIS | **Proposed** (not in current API) | Custom B(Sigma) for "custom" pretest_form (paper-supported: Propositions 1, 3, 4 apply to any B) |
+| `pretest_form` | enum | "individual" (NIS) | **Proposed** (current API uses `violation_type`, a different axis) | "individual" (paper-analyzed); "joint_wald" (convex, Propositions 1+3+4 all apply); "custom" (Propositions 1+3 always; Proposition 4 only if user's B is convex); "slope" — **deviation from paper**, R-package extension |
+| `acceptance_region` | callable or set | B_NIS | **Proposed** (not in current API) | Custom B(Sigma) for "custom" pretest_form (Propositions 1, 3 apply to any measurable B; Proposition 4 / variance-reduction guarantee additionally requires B to be convex) |
 | `method` | enum | "analytical" | **Proposed** (not in current API) | "analytical" (tmvtnorm-equivalent) or "simulation" |
 | `n_sim` | int | 10000 | **Proposed** (not in current API) | Monte Carlo iterations when method="simulation" |
 

From 2e7c334bffc32745708e87fe2d95b1654bf05fbc Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sat, 16 May 2026 21:18:45 -0400
Subject: [PATCH 05/22] Address R4 review (2 P1 + 1 P2) on roth-2022-review.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- P1 nonlinear-trend "deviation" overstatement: walk back the "**Note
  (deviation from paper):**" labels on nonlinear violation hypotheses
  (constant level shift, last-period jump, custom delta vector). Roth's
  Section III ("Practical Recommendations") explicitly endorses power
  analyses against hypothesized nonlinear trends through the `pretrends`
  package, so the GENERAL nonlinear-power capability is paper-supported;
  only the SPECIFIC named API parameterizations are R-package conventions.
  Updates: Requirements checklist line and Gaps "Nonlinear violations" +
  "Custom delta" bullets reframed to distinguish paper-supported framework
  (Section III + Propositions 1+3 applying to any trend) from package-API
  parameterizations.
- P1 absolute "UNDER-cover" claim: Roth's wording after Prop 4 is "CIs will
  tend to under-cover if the bias is sufficiently large" — the under-coverage
  direction is contingent on bias magnitude, not universal. Updated the
  Implication paragraph after Proposition 4 to keep the conditionality
  ("tend to UNDER-cover **if the bias is sufficiently large** ... not
  universal").
- P2 "stub" mischaracterization: the live `## PreTrendsPower` entry at
  REGISTRY.md:2758-2808 is a populated 50-line block (Wald-test framed),
  not a stub. Updated the Status header to describe it accurately as "a
  populated 50-line block framed primarily around a joint-Wald pre-trends
  test" and clarify that this file is a non-authoritative source audit.

Slope-of-best-fit "deviation from paper" labels (lines 48, 159, 195) kept
as-is: Roth does NOT analyze a slope-t-stat as an acceptance region in
Section I or Section III; the slope t-stat in Table 1 is only an observed
property of surveyed papers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 docs/methodology/papers/roth-2022-review.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/docs/methodology/papers/roth-2022-review.md b/docs/methodology/papers/roth-2022-review.md
index 692d21bb..e46ed1ba 100644
--- a/docs/methodology/papers/roth-2022-review.md
+++ b/docs/methodology/papers/roth-2022-review.md
@@ -10,9 +10,9 @@
 
 ## Methodology Registry Entry
 
-**Status: proposed replacement text for a future REGISTRY update.** This block has **not** been merged into `docs/methodology/REGISTRY.md` yet. The current `## PreTrendsPower` stub in `REGISTRY.md` remains the **sole authoritative methodology contract** until the follow-up audit PR for `diff_diff/pretrends.py` (PR-B in the 3-PR sequence) lands and replaces it. That audit PR will also assess which proposed parameters and capabilities below are already in the shipped surface (`diff_diff/pretrends.py:442-447` currently exposes `alpha`, `power`, `violation_type`, `violation_weights`) and which require API extension.
+**Status: proposed replacement text for a future REGISTRY update; this file is a non-authoritative source audit.** The current `## PreTrendsPower` entry in `docs/methodology/REGISTRY.md` (lines 2758-2808) is a populated 50-line block framed primarily around a joint-Wald pre-trends test; it remains the **sole authoritative methodology contract** until the follow-up audit PR for `diff_diff/pretrends.py` (PR-B in the 3-PR sequence) lands and revises it. The PR-B audit will also assess which proposed parameters and capabilities below are already in the shipped surface (`diff_diff/pretrends.py:442-447` currently exposes `alpha`, `power`, `violation_type`, `violation_weights`) and which require API extension.
 
-*Formatted to match docs/methodology/REGISTRY.md structure. Heading levels and labels align with existing entries — copy the `## PreTrendsPower` section into the registry (replacing the existing `## PreTrendsPower` stub) once PR-B is ready.*
+*Formatted to match docs/methodology/REGISTRY.md structure. Heading levels and labels align with existing entries — once PR-B is ready, the `## PreTrendsPower` section below can replace the existing registry entry.*
 
 ## PreTrendsPower
 
@@ -76,7 +76,7 @@ The third (pretest bias) term depends on:
 
     If B(Sigma) is a convex set, then Var[beta_hat_post | beta_hat_pre in B(Sigma)] <= Var[beta_hat_post].
 
-Implication: under parallel trends (delta = 0), conventional 95% CIs OVER-cover conditional on passing the pretest (CIs are based on the unconditional variance, which is too large). When parallel trends is violated, conventional 95% CIs UNDER-cover, because the bias dominates the variance reduction.
+Implication (Roth, Section II.C, paragraph after Proposition 4): under parallel trends (delta = 0) and a B(Sigma) symmetric about zero, conventional 95% CIs tend to OVER-cover conditional on passing the pretest (CIs are based on the unconditional variance, which is too large). When parallel trends is violated, conventional 95% CIs tend to UNDER-cover **if the bias is sufficiently large** (i.e., when the bias dominates the variance reduction). The under-coverage direction is therefore contingent on bias magnitude, not universal.
 
 *Target parameter (Section I.C):*
 
@@ -161,7 +161,7 @@ simulation fallback.
 - [ ] Analytical truncated multivariate normal path (tmvtnorm-equivalent) + simulation fallback
 - [ ] Unconditional and conditional bias for arbitrary linear contrast l in R^M (using Sigma_11 for the post-treatment variance)
 - [ ] Unconditional and conditional null rejection / coverage for the same linear contrast
-- [ ] **Note (deviation from paper):** non-linear trend hypotheses — Roth (2022) formally analyzes only LINEAR violations; "constant level shift", "last-period jump", and "custom delta vector" patterns are extensions from Roth's R `pretrends` package, applied via the same Proposition 1/3/4 framework
+- [ ] Non-linear trend hypotheses (paper-supported via Section III): paper Section I.C formally tabulates power only against linear violations, but paper Section III ("Practical Recommendations") explicitly endorses power analyses against hypothesized nonlinear trends through the `pretrends` package, applied via the same Proposition 1+3 conditional-moment machinery (Proposition 4 still requires convex B). The specific named shapes "constant level shift", "last-period jump", and "custom delta vector" are R-package API parameterizations of that paper-supported framework, not separately analyzed in the published paper.
 - [ ] Plot of bias against pretest power for visual reporting (Roth's Figure 1 style)
 - [ ] Composes with HonestDiD result objects (shared beta_hat, Sigma_hat input contract)
 
@@ -247,8 +247,8 @@ Quoting Roth's key empirical results (for cross-validation):
 
 - **Joint Wald acceptance region**: paper mentions joint tests only briefly (Section I.B notes 1 of 12 papers uses one). Power, bias, and coverage formulas all apply by replacing B_NIS with the joint Wald acceptance region B_W, but Roth does not work out a separate table. Library should implement both but test against R `pretrends` for the joint-Wald case (Roth's package supports it).
 - **"Slope-of-best-fit-line t-test" acceptance region**: Table 1 column shows the t-stat for the slope of the linear pre-trend. Paper does not analyze pretests based on this t-stat as a separate acceptance region; library should NOT extrapolate without further reading the `pretrends` package source.
-- **Nonlinear violations**: Section I.D acknowledges results extend to monotone violations under homoskedasticity (Proposition 2), but the linear-violation framework is the operational benchmark. Library's `violation_type in {"linear", "constant", "last_period", "custom"}` (per the existing REGISTRY entry) appears to predate the paper — the paper itself only formally analyzes linear violations. "Constant" and "last_period" are likely Roth-package extensions for practical reasoning; library should document this as an extension beyond Roth's published analysis.
-- **Custom delta**: paper does not propose a "custom delta vector" interface; this is an extension by Roth's R package. The library should preserve the convention.
+- **Nonlinear violations**: Section I.C formally tabulates power only against linear violations; Section I.D extends the sign-of-bias result (Proposition 2) to monotone violations under homoskedasticity. Section III ("Practical Recommendations") explicitly endorses power analyses against hypothesized nonlinear trends via the `pretrends` package, so the general nonlinear capability is paper-supported even though the paper does not separately tabulate it. The specific named shapes the library exposes ("constant", "last_period") are R-package API conventions, not separately analyzed in the paper.
+- **Custom delta vector interface**: paper Section III endorses "power analyses for the types of violations of parallel trends deemed to be most relevant in their context," which is the paper-level framing for a user-supplied delta vector; the specific `violation_weights`-style INTERFACE used in the library and the R `pretrends` package is a package-API convention layered on top of that paper-level framework.
 - **Choice of contrast l**: paper highlights l = uniform 1/M (average post-treatment) and l = e_1 (first period after treatment). No guidance on other contrasts (e.g., long-run effect l = e_M, dynamic-weighted contrast) — library should document defaults and warn that bias and coverage depend on l.
 - **K = 0 (no pre-periods)**: trivially no pretest possible; library should error.
 - **Heteroskedastic Sigma**: Proposition 2 requires Assumption 1. Library implements computations under arbitrary Sigma via Proposition 1; the sign of the bias-amplification effect is then NOT guaranteed. Library should NOT print "pretest amplifies bias under monotone trends" unless Assumption 1 is approximately satisfied (or just always issue the conditional warning).

From cf6f48a5f4602b275cfddc8826794abe92a952b8 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sat, 16 May 2026 21:25:49 -0400
Subject: [PATCH 06/22] Address R5 review (1 P1 + 2 P2 + 1 P3) on
 roth-2022-review.md

- P1 alpha/1.96 internal inconsistency: the proposed registry block exposed
  `alpha` as a parameter but hard-coded 1.96 in B_NIS, the plug-in CI, the
  power formula, the K=1 closed-form, and the coverage formulas. Promoted
  to REGISTRY this would document the wrong inference rule for any
  alpha != 0.05. Generalize all five formulas to z_{1-alpha/2}; keep "=1.96"
  parenthetical annotations to ground them at the paper's running default.
- P2 sample vs population identity: "beta_hat_pre, which equals delta_pre
  under no anticipation" muddied Roth's stochastic model. Roth Equations
  (1)-(2) keep beta_hat (random draw) and beta (mean) distinct; under no
  anticipation, beta_pre = delta_pre (population), not beta_hat_pre =
  delta_pre. Reworded to "beta_hat_pre, whose mean beta_pre equals
  delta_pre under no anticipation" with explicit "random draw is not itself
  equal to delta_pre" caveat.
- P2 "11 of 12" overcount: Section I.B of Roth (2022) does not support the
  earlier draft's "11 of 12 use individual significance" count. The paper
  reports: 12 of 12 plot pointwise CIs (visual-NIS inspection possible),
  5 of 12 explicitly discuss individual significance, 1 of 12 reports a
  joint-significance test, several rely on visual inspection without a
  formal criterion. Restated using the paper's own breakdown and framed
  NIS as the implicit common denominator rather than a literal explicit-
  rule majority.
- P3 paper methodology vs library extension labeling: the Figure-1-style
  plotting and HonestDiD-composition checklist items inside the copy-ready
  block conflated paper-derived methodology with diff-diff design choices.
  Added "**Note (library extension):**" labels to both, distinguishing the
  paper-derived numerical content (bias and CI by gamma_p; methodology-
  agnostic on how beta_hat/Sigma_hat is produced) from the library
  presentation and cross-estimator composition decisions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
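Reviewer note (illustrative only, not part of the patch): a small sketch of the alpha-generalized NIS rule, using only the Python standard library. The function names are hypothetical and are not the `diff_diff` API. The K = 1 rejection probability below is the textbook two-sided normal formula, written out here as an assumption about what the reviewed file's closed form reduces to, not a quotation from it.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def nis_critical_value(alpha=0.05):
    """z_{1-alpha/2}: the (1 - alpha/2)-quantile of the standard normal."""
    return STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)


def in_nis_region(beta_pre, sigma_pre, alpha=0.05):
    """True when every pre-period coefficient is individually insignificant."""
    z = nis_critical_value(alpha)
    return all(abs(b) <= z * s for b, s in zip(beta_pre, sigma_pre))


def nis_rejection_prob_k1(mean_pre, sigma_pre, alpha=0.05):
    """P(|beta_hat_pre| > z * sigma) when beta_hat_pre ~ N(mean_pre, sigma^2)."""
    z = nis_critical_value(alpha)
    shift = mean_pre / sigma_pre
    accept = STD_NORMAL.cdf(z - shift) - STD_NORMAL.cdf(-z - shift)
    return 1.0 - accept


# The same coefficient passes at alpha = 0.05 but fails at alpha = 0.10.
print(in_nis_region([1.8], [1.0], alpha=0.05), in_nis_region([1.8], [1.0], alpha=0.10))
```

With a zero pre-period mean the rejection probability equals alpha, which is a quick sanity check that the critical value and the acceptance region use the same alpha.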
 docs/methodology/papers/roth-2022-review.md | 24 ++++++++++-----------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/docs/methodology/papers/roth-2022-review.md b/docs/methodology/papers/roth-2022-review.md
index e46ed1ba..b3bf5833 100644
--- a/docs/methodology/papers/roth-2022-review.md
+++ b/docs/methodology/papers/roth-2022-review.md
@@ -35,13 +35,13 @@
                   delta = bias               tau = causal effect
                   from trends
 
-where tau_pre = 0 by the no-anticipation assumption and delta is the bias from a difference in trends. The pretest acts on beta_hat_pre, which equals delta_pre under no anticipation.
+where tau_pre = 0 by the no-anticipation assumption and delta is the bias from a difference in trends. The pretest acts on the random vector beta_hat_pre, whose mean beta_pre equals delta_pre under no anticipation (i.e., the population-mean identity beta_pre = delta_pre follows from Equation 2 with tau_pre = 0; the random draw beta_hat_pre is not itself equal to delta_pre).
 
 *Acceptance region of the standard "no individually significant" (NIS) pretest:*
 
-    B_NIS(Sigma) = { b in R^K : |b_t| <= 1.96 * sigma_t, for all t in {-K, ..., -1} }
+    B_NIS(Sigma) = { b in R^K : |b_t| <= z_{1-alpha/2} * sigma_t, for all t in {-K, ..., -1} }
 
-This corresponds to checking individual 95% CIs of each pre-period coefficient (the dominant convention in applied work: 11 of 12 surveyed papers, per Section I.B).
+where z_{1-alpha/2} is the (1 - alpha/2)-quantile of the standard normal (= 1.96 at Roth's running default alpha = 0.05). This corresponds to checking individual (1 - alpha) CIs of each pre-period coefficient. Section I.B of Roth (2022) reports that all 12 surveyed papers plot pointwise CIs that allow individual-significance inspection, 5 of 12 explicitly discuss individual significance, 1 of 12 reports a joint-significance test, and several rely on visual inspection without specifying a formal criterion; the NIS form is therefore the implicit common denominator across the surveyed papers rather than a literal 11-of-12 explicit-rule count.
 
 Alternative acceptance regions:
 - **Joint Wald (chi-squared)**: B_W(Sigma) = { b in R^K : b' Sigma_22^{-1} b <= chi^2_{1-alpha, K} }. **Note:** mentioned in the paper as a less common applied convention (1 of 12 surveyed papers, Section I.B). Propositions 1, 3, 4 apply to this B since it is convex; Roth does NOT separately tabulate power/bias/coverage for the Wald form.
@@ -90,16 +90,16 @@ Defaults Roth uses:
 *Plug-in estimator and CI (Section I.C):*
 
     tau_hat = l' beta_hat_post
-    CI_{tau_*} = tau_hat +/- 1.96 * sigma_{tau_hat}, where sigma^2_{tau_hat} = l' Sigma_11 l
+    CI_{tau_*} = tau_hat +/- z_{1-alpha/2} * sigma_{tau_hat}, where sigma^2_{tau_hat} = l' Sigma_11 l
 
-(Note: Sigma_11 is the post-treatment covariance block per the convention above, not the full Sigma_hat.)
+(Notes: Sigma_11 is the post-treatment covariance block per the convention above, not the full Sigma_hat. z_{1-alpha/2} is the (1 - alpha/2)-quantile of the standard normal = 1.96 at alpha = 0.05.)
 
 *Power calculation against a linear violation (Section I.C "Power Calculations"):*
 
 For a linear violation with slope gamma (so delta_t = gamma * t with relative time t),
 the pretest "passes" probability is
 
-    P( beta_hat_pre in B_NIS(Sigma) ) = P( |beta_hat_pre,t| <= 1.96 * sigma_t, for all t )
+    P( beta_hat_pre in B_NIS(Sigma) ) = P( |beta_hat_pre,t| <= z_{1-alpha/2} * sigma_t, for all t )
 
 where beta_hat_pre ~ N(delta_pre, Sigma_22) with delta_pre,t = gamma * t. The library should
 solve for gamma at target power 1 - p in {0.5, 0.8}:
@@ -132,7 +132,7 @@ simulation fallback.
 *Edge cases (paper-stated):*
 - **Linear vs nonlinear violations**: paper formally analyzes linear trends; Caveats (Section I.D) note results extend to monotone nonlinear violations under homoskedasticity (Proposition 2); arbitrarily nonlinear violations addressed heuristically — bias is worse for exponentially-growing trends, better for log/shallow trends as pre-periods grow
 - **Adding more pretreatment periods**: helps power for linear/log trends, does NOT help (and can hurt) for trends concentrated near treatment (e.g., COVID-19-like shocks)
-- **K = 1 (single pre-period)**: explicit closed-form intuition via univariate truncated normal in proof of Proposition 2: E[beta_hat_pre | beta_hat_pre in B_NIS] - beta_pre proportional to phi(-1.96 - beta_pre/sigma) - phi(1.96 - beta_pre/sigma)
+- **K = 1 (single pre-period)**: explicit closed-form intuition via univariate truncated normal in proof of Proposition 2: E[beta_hat_pre | beta_hat_pre in B_NIS] - beta_pre proportional to phi(-z_{1-alpha/2} - beta_pre/sigma) - phi(z_{1-alpha/2} - beta_pre/sigma) (= phi(-1.96 - ...) - phi(1.96 - ...) at the paper's default alpha = 0.05)
 - **Symmetric two-sided pretests under parallel trends**: beta_hat_post remains UNBIASED for tau_post (E[beta_hat_pre | beta_hat_pre in B] = 0 if B is symmetric and beta_pre = 0)
 - **Heteroskedastic Sigma (off-diagonal not constant)**: Proposition 2 requires Assumption 1; under arbitrary Sigma, sign of pretest-bias term is ambiguous (worked out in Proposition 1's general form)
 - **Publication-bias trade-off (Equation 4, Section II.D)**: pretest-as-screen can REDUCE or INCREASE published bias depending on Bayes-factor of design type vs the bias-given-publication ratio; uninformative pretests are unambiguously harmful
@@ -147,8 +147,8 @@ simulation fallback.
    - Compute unconditional bias = l' delta_post where delta_post,m = gamma * m
    - Compute conditional bias via Proposition 1: requires E[beta_hat_pre | beta_hat_pre in B_NIS] from truncated MVN
 5. **Coverage**: for the same gamma values, compute unconditional and conditional null rejection probabilities P(tau_* not in CI):
-   - Unconditional: P(|tau_hat - tau_*|/sigma_{tau_hat} > 1.96) under beta_hat ~ N(beta, Sigma)
-   - Conditional: P(|tau_hat - tau_*|/sigma_{tau_hat} > 1.96 | beta_hat_pre in B_NIS) — joint truncated MVN
+   - Unconditional: P(|tau_hat - tau_*|/sigma_{tau_hat} > z_{1-alpha/2}) under beta_hat ~ N(beta, Sigma)
+   - Conditional: P(|tau_hat - tau_*|/sigma_{tau_hat} > z_{1-alpha/2} | beta_hat_pre in B_NIS) — joint truncated MVN
 6. Return a structured summary (Roth's Table 2/Table 3 layout)
 
 **Reference implementation(s):**
@@ -162,8 +162,8 @@ simulation fallback.
 - [ ] Unconditional and conditional bias for arbitrary linear contrast l in R^M (using Sigma_11 for the post-treatment variance)
 - [ ] Unconditional and conditional null rejection / coverage for the same linear contrast
 - [ ] Non-linear trend hypotheses (paper-supported via Section III): paper Section I.C formally tabulates power only against linear violations, but paper Section III ("Practical Recommendations") explicitly endorses power analyses against hypothesized nonlinear trends through the `pretrends` package, applied via the same Proposition 1+3 conditional-moment machinery (Proposition 4 still requires convex B). The specific named shapes "constant level shift", "last-period jump", and "custom delta vector" are R-package API parameterizations of that paper-supported framework, not separately analyzed in the published paper.
-- [ ] Plot of bias against pretest power for visual reporting (Roth's Figure 1 style)
-- [ ] Composes with HonestDiD result objects (shared beta_hat, Sigma_hat input contract)
+- [ ] Plot of bias against pretest power for visual reporting (Roth's Figure 1 layout). **Note (library extension):** the underlying numerical content is paper-derived (bias and CI by `gamma_p`), but the specific plotting interface is a library design choice.
+- [ ] Composes with HonestDiD result objects (shared beta_hat, Sigma_hat input contract). **Note (library extension):** Roth (2022) is methodology-agnostic about how `(beta_hat, Sigma_hat)` is produced; cross-estimator composition (with `HonestDiD`, `CallawaySantAnna`, etc.) is a diff-diff design choice.
 
 ---
 
@@ -172,7 +172,7 @@ simulation fallback.
 ### Data Structure Requirements
 - **Input**: beta_hat in R^{K+M} (concatenated pre + post event-study coefficients), Sigma_hat in R^{(K+M) x (K+M)} (variance-covariance matrix), integer K (# pre-period coefficients), integer M (# post-period coefficients)
 - **Optional input**: linear contrast l in R^M (defaults to uniform 1/M for average post-treatment effect, or e_1 for first-period-only)
-- **Optional input**: significance level alpha (default 0.05 → critical value 1.96)
+- **Optional input**: significance level alpha (default 0.05; critical value z_{1-alpha/2}, equal to 1.96 at the default)
 - **Optional input**: target power levels (default {0.5, 0.8} per Roth)
 - The pre-period coefficients are typically indexed by relative time t in {-K, -K+1, ..., -1}, with t = 0 omitted as the reference period
 - Compatible with the result classes of: MultiPeriodDiD (event study), CallawaySantAnna (staggered), SunAbraham (interaction-weighted), Freyaldenhoven-Hansen-Shapiro (covariate-based)

From 038897d6f17819b84bc79b5dcea0a3eb99e0d2b7 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sat, 16 May 2026 21:40:34 -0400
Subject: [PATCH 07/22] Address R6 review with holistic restructure (1 P1 + 1
 P2 + 1 P3)

Per user direction (option A: holistic restructure) after three review
rounds raised "paper-derived vs library-extension boundary" P1s (R3 Prop 4
convexity, R4 nonlinear-trend "deviation", R6 analytical/simulation as
paper-required). The recurring class warranted naming the invariant and
fixing it structurally rather than applying another surface patch.

Holistic fix:
- Split the single Requirements-checklist block into two named
  subsections inside the copy-ready Registry entry:
    ### Paper-derived requirements (must satisfy to be faithful to Roth 2022)
    ### Library design choices (extensions beyond Roth 2022)
- Moved each non-paper item (simulation fallback, method/n_sim knobs,
  pretest_form/acceptance_region surface, named non-linear violation
  parameterizations, Figure-1 plotting interface, HonestDiD composition)
  into the Library Design Choices subsection with explicit framing that
  the library may keep, drop, or extend each item via the follow-up audit.
- Renamed the Tuning Parameters table's Status column to Source, with
  values "Paper-derived" vs "Library extension", so every row's origin
  is unambiguous.

P2 (Table 2 factual error):
- Coverage bullet under "Empirical Findings" said the unconditional null
  rejection range under gamma_{0.8} was "53% to 98%". The actual Table 2
  range for tau_bar is 14% (Deschenes et al. 2017) to 98% (Lafortune
  et al. 2017; Markevich & Zhuravskaya 2018). Fixed.

P3 (transient process / line refs):
- Dropped "PR-B in the 3-PR sequence" language from the Status banner and
  Tuning Parameters note.
- Replaced literal "diff_diff/pretrends.py:442-447" with stable symbol
  refs (`compute_pretrends_power`, `PreTrendsPowerResults`).
- Replaced "lines 2758-2808" REGISTRY range with a non-line-numbered
  description ("populated block framed primarily around a joint-Wald
  pre-trends test").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
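For reference, a minimal sketch of the unconditional null rejection quantity
that the corrected coverage bullet reports. It assumes
tau_hat ~ N(tau_* + bias, se^2), takes `bias` and `se` as given inputs, and
does not reproduce any Table 2 value. The function name
`unconditional_rejection_rate` is hypothetical.

```python
from statistics import NormalDist


def unconditional_rejection_rate(bias: float, se: float, alpha: float = 0.05) -> float:
    """P(|tau_hat - tau_*| / se > z_{1-alpha/2}) when tau_hat - tau_* ~ N(bias, se^2).

    ASSUMPTION: no pretest conditioning; `bias` is l' delta_post and `se`
    is sqrt(l' Sigma_11 l), both supplied by the caller.
    """
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = bias / se  # mean of the standardized estimation error
    inside = std_normal.cdf(z - shift) - std_normal.cdf(-z - shift)
    return 1.0 - inside
```

With zero bias the rate equals alpha, and it rises toward 1 as |bias| / se
grows, so a wide spread of rejection rates across designs is expected when
that ratio differs across them.
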
 docs/methodology/papers/roth-2022-review.md | 58 +++++++++++++--------
 1 file changed, 37 insertions(+), 21 deletions(-)

diff --git a/docs/methodology/papers/roth-2022-review.md b/docs/methodology/papers/roth-2022-review.md
index b3bf5833..ca1525e7 100644
--- a/docs/methodology/papers/roth-2022-review.md
+++ b/docs/methodology/papers/roth-2022-review.md
@@ -10,9 +10,9 @@
 
 ## Methodology Registry Entry
 
-**Status: proposed replacement text for a future REGISTRY update; this file is a non-authoritative source audit.** The current `## PreTrendsPower` entry in `docs/methodology/REGISTRY.md` (lines 2758-2808) is a populated 50-line block framed primarily around a joint-Wald pre-trends test; it remains the **sole authoritative methodology contract** until the follow-up audit PR for `diff_diff/pretrends.py` (PR-B in the 3-PR sequence) lands and revises it. The PR-B audit will also assess which proposed parameters and capabilities below are already in the shipped surface (`diff_diff/pretrends.py:442-447` currently exposes `alpha`, `power`, `violation_type`, `violation_weights`) and which require API extension.
+**Status: proposed replacement text for a future REGISTRY update; this file is a non-authoritative source audit.** The current `## PreTrendsPower` entry in `docs/methodology/REGISTRY.md` is a populated block framed primarily around a joint-Wald pre-trends test; it remains the **sole authoritative methodology contract** until the follow-up audit PR for `compute_pretrends_power` (and its `PreTrendsPowerResults` class in `diff_diff/pretrends.py`) lands and revises it. The follow-up audit will also assess which proposed parameters and capabilities below are already in the shipped `compute_pretrends_power` signature (which currently exposes `alpha`, `power`, `violation_type`, `violation_weights`) and which require API extension.
 
-*Formatted to match docs/methodology/REGISTRY.md structure. Heading levels and labels align with existing entries — once PR-B is ready, the `## PreTrendsPower` section below can replace the existing registry entry.*
+*Formatted to match docs/methodology/REGISTRY.md structure. Heading levels and labels align with existing entries — once the follow-up audit is ready, the `## PreTrendsPower` section below can replace the existing registry entry.*
 
 ## PreTrendsPower
 
@@ -155,15 +155,31 @@ simulation fallback.
 - R: [`pretrends`](https://github.com/jonathandroth/pretrends) (Jonathan Roth's own package) and the accompanying Shiny app
 - R dependency: [`tmvtnorm`](https://cran.r-project.org/package=tmvtnorm) (Manjunath & Wilhelm 2012) for truncated multivariate normal moments and CDF
 
-**Requirements checklist:**
-- [ ] Acceptance regions: NIS (individual t, paper-analyzed); joint Wald (convex, Propositions 1+3+4 all apply); custom B (Propositions 1+3 apply to any measurable B, Proposition 4 only if B is convex); **Note (deviation from paper):** slope-of-best-fit-line is an extension beyond Roth (2022) — paper tabulates the slope t-stat but does not analyze a slope-t pretest framework
-- [ ] Power calculation against linear violation with slope gamma — solve for gamma_{0.5} and gamma_{0.8}
-- [ ] Analytical truncated multivariate normal path (tmvtnorm-equivalent) + simulation fallback
-- [ ] Unconditional and conditional bias for arbitrary linear contrast l in R^M (using Sigma_11 for the post-treatment variance)
-- [ ] Unconditional and conditional null rejection / coverage for the same linear contrast
-- [ ] Non-linear trend hypotheses (paper-supported via Section III): paper Section I.C formally tabulates power only against linear violations, but paper Section III ("Practical Recommendations") explicitly endorses power analyses against hypothesized nonlinear trends through the `pretrends` package, applied via the same Proposition 1+3 conditional-moment machinery (Proposition 4 still requires convex B). The specific named shapes "constant level shift", "last-period jump", and "custom delta vector" are R-package API parameterizations of that paper-supported framework, not separately analyzed in the published paper.
-- [ ] Plot of bias against pretest power for visual reporting (Roth's Figure 1 layout). **Note (library extension):** the underlying numerical content is paper-derived (bias and CI by `gamma_p`), but the specific plotting interface is a library design choice.
-- [ ] Composes with HonestDiD result objects (shared beta_hat, Sigma_hat input contract). **Note (library extension):** Roth (2022) is methodology-agnostic about how `(beta_hat, Sigma_hat)` is produced; cross-estimator composition (with `HonestDiD`, `CallawaySantAnna`, etc.) is a diff-diff design choice.
+### Paper-derived requirements
+
+*Required to remain faithful to Roth (2022). The follow-up audit PR must verify the library satisfies every item below.*
+
+- [ ] NIS acceptance region B_NIS(Sigma) with critical value z_{1-alpha/2} (paper-analyzed, Section I.B + Section II)
+- [ ] Joint Wald acceptance region B_W(Sigma) (paper-supported alternative; convex so Propositions 1+3+4 all apply; not separately tabulated by Roth)
+- [ ] Conditional-mean formula (Proposition 1) and conditional-variance formula (Proposition 3) for any measurable B(Sigma)
+- [ ] Variance-reduction / over-coverage guarantee (Proposition 4) gated on convex B(Sigma) only
+- [ ] Sign-of-bias result under monotone trend (Proposition 2) gated on Assumption 1 (homoskedastic-equicorrelated Sigma)
+- [ ] Power calculation against a linear violation with slope gamma — solve for gamma_p at user-specified target power 1 - p (Roth uses gamma_{0.5} and gamma_{0.8})
+- [ ] Plug-in estimator tau_hat = l' beta_hat_post and CI tau_hat +/- z_{1-alpha/2} * sqrt(l' Sigma_11 l) for any linear contrast l in R^M
+- [ ] Unconditional and conditional bias for the linear contrast
+- [ ] Unconditional and conditional null rejection / coverage for the linear contrast
+- [ ] Truncated MVN moments and probabilities computed via Roth's analytical path (footnote 8, `tmvtnorm` / Cartinhour-1990 / Manjunath-Wilhelm-2012)
+
+### Library design choices (extensions beyond Roth 2022)
+
+*These items are diff-diff or R-package conventions, NOT required by the paper. The library may keep, drop, or extend each item via the follow-up audit — preserving these items is a library design call, not a methodology requirement.*
+
+- **Simulation fallback path alongside the analytical TMVN computation** — Roth's footnote 8 reports simulation verification yields similar results, but neither the paper nor the R `pretrends` package requires a dual-path implementation. The simulation path is a library robustness choice for cases where the analytical computation is numerically unstable.
+- **`method` and `n_sim` API parameters** — proposed knobs to select between analytical and simulation; library design choice, not paper-required.
+- **`pretest_form` and `acceptance_region` API surface** — Roth's propositions apply to any (measurable) B(Sigma), so exposing the choice via a typed enum + custom-callable interface is an engineering choice. The enum values mix paper-analyzed forms ("individual" / NIS), paper-supported alternatives ("joint_wald", "custom"), and a non-paper extension ("slope" — Roth tabulates the slope t-stat in Table 1 as an observed property of surveyed papers but does not analyze it as an acceptance region).
+- **Non-linear violation parameterizations** ("constant", "last_period", "custom") — Roth Section III endorses power analyses against hypothesized nonlinear trends via the `pretrends` package (applying the same Propositions 1+3, with Proposition 4 conditioned on convex B). The specific named shapes are R-package API conventions, not separately analyzed in the published paper.
+- **Figure-1-style plotting interface** — the underlying numerical content (bias and CI by `gamma_p`) is paper-derived; the plotting layout is a library presentation choice.
+- **HonestDiD result-object composition / cross-estimator integration** — Roth (2022) is methodology-agnostic about how `(beta_hat, Sigma_hat)` is produced; composition with `HonestDiD`, `CallawaySantAnna`, `SunAbraham`, etc. is a diff-diff design choice.
 
 ---
 
@@ -185,17 +201,17 @@ simulation fallback.
 
 ### Tuning Parameters
 
-**Note:** The parameters below are what Roth's framework requires — they are NOT necessarily the current library's exposed API. The PR-B audit will compare these proposed knobs against `diff_diff/pretrends.py:442-447` (which currently exposes `alpha`, `power`, `violation_type`, `violation_weights`) and decide which to keep, rename, extend, or defer.
+**Note:** The parameters below span both paper-derived requirements (where the paper specifies a fixed default or a free parameter that affects the math) and proposed library extensions (engineering choices for the API surface). The `Source` column makes this distinction explicit. The follow-up audit for `compute_pretrends_power` (which currently exposes `alpha`, `power`, `violation_type`, `violation_weights`) will decide which proposed extensions to keep, rename, or defer.
 
-| Parameter | Type | Default | Status | Selection Method |
+| Parameter | Type | Default | Source | Selection Method |
 |-----------|------|---------|--------|-----------------|
-| `alpha` | float in (0, 1) | 0.05 | **Current** (matches `pretrends.py`) | Standard significance level for pretest and reporting CI |
-| `target_power` | list[float] in (0, 1) | [0.5, 0.8] | **Proposed** (current API exposes scalar `power=0.8` only) | Roth's reported benchmarks (Cohen 1988 conventional 0.8; 0.5 for "even-odds detection") |
-| `l` (contrast) | array in R^M | uniform 1/M | **Proposed** (not in current API) | User-specified linear functional of tau_post |
-| `pretest_form` | enum | "individual" (NIS) | **Proposed** (current API uses `violation_type`, a different axis) | "individual" (paper-analyzed); "joint_wald" (convex, Propositions 1+3+4 all apply); "custom" (Propositions 1+3 always; Proposition 4 only if user's B is convex); "slope" — **deviation from paper**, R-package extension |
-| `acceptance_region` | callable or set | B_NIS | **Proposed** (not in current API) | Custom B(Sigma) for "custom" pretest_form (Propositions 1, 3 apply to any measurable B; Proposition 4 / variance-reduction guarantee additionally requires B to be convex) |
-| `method` | enum | "analytical" | **Proposed** (not in current API) | "analytical" (tmvtnorm-equivalent) or "simulation" |
-| `n_sim` | int | 10000 | **Proposed** (not in current API) | Monte Carlo iterations when method="simulation" |
+| `alpha` | float in (0, 1) | 0.05 | Paper-derived (free parameter affecting z_{1-alpha/2}) — also currently exposed by `compute_pretrends_power` | Standard significance level for pretest and reporting CI |
+| `target_power` | list[float] in (0, 1) | [0.5, 0.8] | Paper-derived defaults (Cohen 1988 benchmark; 0.5 supplementary) — current API exposes scalar `power=0.8` only, so a list-valued knob is a proposed extension | Roth's reported benchmarks |
+| `l` (contrast) | array in R^M | uniform 1/M | Paper-derived (free parameter in Section I.C); not in current API as a top-level knob | User-specified linear functional of tau_post |
+| `pretest_form` | enum | "individual" (NIS) | **Library extension** (current API uses `violation_type`, a different axis; the paper has no single enum for this) | "individual" (paper-analyzed); "joint_wald" (convex, Propositions 1+3+4 all apply); "custom" (Propositions 1+3 always; Proposition 4 only if user's B is convex); "slope" — deviation beyond Roth (2022) |
+| `acceptance_region` | callable or set | B_NIS | **Library extension** (Roth's propositions apply to any measurable B, but the paper does not propose a callable interface) | Custom B(Sigma) for "custom" pretest_form (Propositions 1, 3 apply to any measurable B; Proposition 4 / variance-reduction guarantee additionally requires B to be convex) |
+| `method` | enum | "analytical" | **Library extension** (Roth uses analytical via `tmvtnorm`; simulation is a library robustness choice) | "analytical" (`tmvtnorm`-equivalent) or "simulation" |
+| `n_sim` | int | 10000 | **Library extension** (only meaningful when `method="simulation"`) | Monte Carlo iterations when method="simulation" |
 
 ### Relation to Existing diff-diff Estimators
 - **Pre-existing `diff_diff/pretrends.py`** (1133 lines) — implements a Roth-2022 framework; this paper review's main use is to audit the existing surface against the paper's exact equations
@@ -237,7 +253,7 @@ For each paper in Roth's empirical survey:
 Quoting Roth's key empirical results (for cross-validation):
 
 - **Power**: in the most extreme paper (Deryugina 2017), an unconditional bias of magnitude comparable to the estimated effect is detected only 50% of the time
-- **Coverage**: under gamma_{0.8} (80%-power slope), unconditional null rejection rates of 95% CIs range from 53% to 98% across the 12 papers
+- **Coverage**: under gamma_{0.8} (80%-power slope), unconditional null rejection rates of 95% CIs for tau_bar range from 14% (Deschenes et al. 2017) to 98% (Lafortune et al. 2017; Markevich & Zhuravskaya 2018) across the 12 papers in Table 2
 - **Pretest bias**: percent additional bias from pretest conditioning (Table 3, gamma_{0.8}, tau_1): from -34% (Bosch-Campos-Vazquez 2014, beneficial — rare) to +120% (Deryugina 2017, harmful — common); paper-aggregate finding is that conditional bias EXCEEDS unconditional bias in 9 of 12 papers for tau_1 and in 10 of 12 for tau_bar
 - **Equation 4 sign**: the relative-fraction term is < 1 (pretest helps screen out biased designs); the conditional-bias term is typically > 1 (pretest amplifies bias when a biased design is published); net sign depends on which dominates — the paper does not provide closed-form criteria
 

From d7b04e6bda87c57897729e5b3040ab4c309f1726 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sat, 16 May 2026 21:48:28 -0400
Subject: [PATCH 08/22] Address R7 review (1 P1 + 2 P2) on roth-2022-review.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- P1 joint-Wald + tmvtnorm still in paper-derived checklist: the R6
  restructure narrowed the recurring "paper vs library" class but two
  items remained mislabeled: joint Wald (a paper-mentioned alternative
  used by 1 of 12 surveyed papers and not used in Roth's empirical
  exercise) and the tmvtnorm-specific analytical backend (one of several
  valid numerical paths; the paper requires producing the right
  quantities, not using this specific solver). Moved both to the
  "Library design choices" subsection with explicit framing that the
  paper requirement is at the QUANTITY level (conditional moments,
  power, bias, coverage), not at the API or numerical-backend level.
  Renamed that subsection's heading to "Library design choices
  (paper-supported alternatives and extensions beyond Roth 2022)" to
  make the two categories explicit.
- P2 Proposition 4 summary row dropped "if bias is sufficiently large":
  the Key Theorems table row claimed CIs "UNDER-cover under violations"
  unconditionally, contradicting the conditional version stated correctly
  elsewhere in the file. Rewrote the row to limit the variance-reduction
  guarantee to its strict statement and pushed the CI coverage discussion
  into a clearly-conditioned addendum.
- P2 API misattribution: I had said `compute_pretrends_power` exposes
  `alpha, power, violation_type, violation_weights` (which is actually
  the `PreTrendsPower` class signature). The helper's actual signature
  is `compute_pretrends_power(results, M, alpha, target_power,
  violation_type, pre_periods)`. Updated the Status banner and Tuning
  Parameters note to name both surfaces explicitly with their actual
  signatures, and clarified that the follow-up audit will reconcile
  them with each other and with the proposed contract.
- Related P2-adjacent overclaim: line 217 said the library "implements
  a Roth-2022 framework"; line 270 said it "implements computations
  under arbitrary Sigma via Proposition 1". Both overstate what
  `pretrends.py` currently does. Tightened to "Wald-test-based pre-trends
  MDV/power workflow ... computes Wald power/MDV from the pre-period VCV
  block rather than the full arbitrary-Sigma Proposition 1/3/4
  conditional-moment computations", and reframed the heteroskedastic-
  Sigma gap as a future capability conditional on the Prop 1/3/4
  path being added.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
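For reference, a minimal sketch of the "quantity level, not backend level"
point for the K = 1 case: the conditional mean of beta_hat_pre given that it
passes the NIS pretest, computed once from the univariate truncated-normal
formula and once by plain Monte Carlo. Both function names are hypothetical,
and neither is the library's or the `pretrends` package's code. The K > 1
case needs truncated multivariate normal moments and is not covered here.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def conditional_mean_analytic(beta_pre: float, sigma: float, alpha: float = 0.05) -> float:
    """E[beta_hat_pre | |beta_hat_pre| <= z * sigma] for beta_hat_pre ~ N(beta_pre, sigma^2)."""
    z = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    lo = -z - beta_pre / sigma  # standardized lower truncation point
    hi = z - beta_pre / sigma   # standardized upper truncation point
    mass = _STD_NORMAL.cdf(hi) - _STD_NORMAL.cdf(lo)
    return beta_pre + sigma * (_STD_NORMAL.pdf(lo) - _STD_NORMAL.pdf(hi)) / mass


def conditional_mean_simulated(beta_pre: float, sigma: float, alpha: float = 0.05,
                               n_draws: int = 100_000, seed: int = 0) -> float:
    """Same quantity by rejection sampling: keep draws that pass the pretest."""
    z = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    rng = random.Random(seed)
    kept = [x for x in (rng.gauss(beta_pre, sigma) for _ in range(n_draws))
            if abs(x) <= z * sigma]
    if not kept:
        raise ValueError("no draws passed the pretest; increase n_draws")
    return sum(kept) / len(kept)
```

The two functions target the same conditional moment, so they should agree up
to Monte Carlo error. That agreement is what a backend-agnostic requirement
amounts to in practice.
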
 docs/methodology/papers/roth-2022-review.md | 28 ++++++++++-----------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/docs/methodology/papers/roth-2022-review.md b/docs/methodology/papers/roth-2022-review.md
index ca1525e7..a17e66a7 100644
--- a/docs/methodology/papers/roth-2022-review.md
+++ b/docs/methodology/papers/roth-2022-review.md
@@ -10,7 +10,7 @@
 
 ## Methodology Registry Entry
 
-**Status: proposed replacement text for a future REGISTRY update; this file is a non-authoritative source audit.** The current `## PreTrendsPower` entry in `docs/methodology/REGISTRY.md` is a populated block framed primarily around a joint-Wald pre-trends test; it remains the **sole authoritative methodology contract** until the follow-up audit PR for `compute_pretrends_power` (and its `PreTrendsPowerResults` class in `diff_diff/pretrends.py`) lands and revises it. The follow-up audit will also assess which proposed parameters and capabilities below are already in the shipped `compute_pretrends_power` signature (which currently exposes `alpha`, `power`, `violation_type`, `violation_weights`) and which require API extension.
+**Status: proposed replacement text for a future REGISTRY update; this file is a non-authoritative source audit.** The current `## PreTrendsPower` entry in `docs/methodology/REGISTRY.md` is a populated block framed primarily around a joint-Wald pre-trends test; it remains the **sole authoritative methodology contract** until the follow-up audit PR for `compute_pretrends_power` (the helper function), `PreTrendsPower` (the estimator class), and `PreTrendsPowerResults` (the results container) in `diff_diff/pretrends.py` lands and revises it. The follow-up audit will assess which proposed parameters and capabilities below are already in the shipped surfaces. Current signatures (for reference): the helper `compute_pretrends_power(results, M, alpha, target_power, violation_type, pre_periods)` exposes `alpha`, `target_power`, `violation_type`, `pre_periods` (plus the optional violation magnitude `M`); the class `PreTrendsPower(alpha, power, violation_type, violation_weights)` exposes `alpha`, `power`, `violation_type`, `violation_weights`. The audit will reconcile these two surfaces with each other and against this proposed contract.
 
 *Formatted to match docs/methodology/REGISTRY.md structure. Heading levels and labels align with existing entries — once the follow-up audit is ready, the `## PreTrendsPower` section below can replace the existing registry entry.*
 
@@ -157,25 +157,25 @@ simulation fallback.
 
 ### Paper-derived requirements
 
-*Required to remain faithful to Roth (2022). The follow-up audit PR must verify the library satisfies every item below.*
+*Required to remain faithful to Roth (2022). These are the mathematical quantities and conditioning formulas the paper specifies — they do not constrain the numerical backend or the API surface. The follow-up audit PR must verify the library can produce every item below.*
 
-- [ ] NIS acceptance region B_NIS(Sigma) with critical value z_{1-alpha/2} (paper-analyzed, Section I.B + Section II)
-- [ ] Joint Wald acceptance region B_W(Sigma) (paper-supported alternative; convex so Propositions 1+3+4 all apply; not separately tabulated by Roth)
+- [ ] NIS acceptance region B_NIS(Sigma) with critical value z_{1-alpha/2} (paper-analyzed in Section I.B + Section II, used for every empirical exercise)
 - [ ] Conditional-mean formula (Proposition 1) and conditional-variance formula (Proposition 3) for any measurable B(Sigma)
-- [ ] Variance-reduction / over-coverage guarantee (Proposition 4) gated on convex B(Sigma) only
+- [ ] Variance-reduction guarantee (Proposition 4) gated on convex B(Sigma) only
 - [ ] Sign-of-bias result under monotone trend (Proposition 2) gated on Assumption 1 (homoskedastic-equicorrelated Sigma)
 - [ ] Power calculation against a linear violation with slope gamma — solve for gamma_p at user-specified target power 1 - p (Roth uses gamma_{0.5} and gamma_{0.8})
 - [ ] Plug-in estimator tau_hat = l' beta_hat_post and CI tau_hat +/- z_{1-alpha/2} * sqrt(l' Sigma_11 l) for any linear contrast l in R^M
 - [ ] Unconditional and conditional bias for the linear contrast
 - [ ] Unconditional and conditional null rejection / coverage for the linear contrast
-- [ ] Truncated MVN moments and probabilities computed via Roth's analytical path (footnote 8, `tmvtnorm` / Cartinhour-1990 / Manjunath-Wilhelm-2012)
 
-### Library design choices (extensions beyond Roth 2022)
+### Library design choices (paper-supported alternatives and extensions beyond Roth 2022)
 
-*These items are diff-diff or R-package conventions, NOT required by the paper. The library may keep, drop, or extend each item via the follow-up audit — preserving these items is a library design call, not a methodology requirement.*
+*These items are diff-diff or R-package conventions, NOT required by the paper. They include both (i) acceptance regions and computations that the paper supports as alternatives without requiring or tabulating them, and (ii) genuine extensions beyond the paper's analysis. The library may keep, drop, or extend each item via the follow-up audit — preserving these items is a library design call, not a methodology requirement.*
 
-- **Simulation fallback path alongside the analytical TMVN computation** — Roth's footnote 8 reports simulation verification yields similar results, but neither the paper nor the R `pretrends` package requires a dual-path implementation. The simulation path is a library robustness choice for cases where the analytical computation is numerically unstable.
-- **`method` and `n_sim` API parameters** — proposed knobs to select between analytical and simulation; library design choice, not paper-required.
+- **Joint Wald acceptance region** B_W(Sigma) — paper-supported alternative (convex, so Propositions 1+3+4 all apply), but Roth's empirical exercise uses NIS only and the paper does not separately tabulate Wald-based power/bias/coverage. Library support is a paper-supported alternative, not a Roth-required item.
+- **Analytical computational backend** (`tmvtnorm` / Cartinhour-1990 / Manjunath-Wilhelm-2012) — Roth uses this analytical path in the paper, but the requirement is to produce the correct conditional moments and probabilities; any equivalent backend (`tmvtnorm`, a from-scratch port, Monte Carlo simulation, GHK simulator, etc.) is acceptable. The choice of backend is a library implementation decision, not a methodology requirement.
+- **Simulation fallback path alongside the analytical computation** — Roth's footnote 8 reports simulation verification yields similar results, but neither the paper nor the R `pretrends` package requires a dual-path implementation. The simulation path is a library robustness choice for cases where the analytical computation is numerically unstable.
+- **`method` and `n_sim` API parameters** — proposed knobs to select between analytical and simulation; library design choice.
 - **`pretest_form` and `acceptance_region` API surface** — Roth's propositions apply to any (measurable) B(Sigma), so exposing the choice via a typed enum + custom-callable interface is an engineering choice. The enum values mix paper-analyzed forms ("individual" / NIS), paper-supported alternatives ("joint_wald", "custom"), and a non-paper extension ("slope" — Roth tabulates the slope t-stat in Table 1 as an observed property of surveyed papers but does not analyze it as an acceptance region).
 - **Non-linear violation parameterizations** ("constant", "last_period", "custom") — Roth Section III endorses power analyses against hypothesized nonlinear trends via the `pretrends` package (applying the same Propositions 1+3, with Proposition 4 conditioned on convex B). The specific named shapes are R-package API conventions, not separately analyzed in the published paper.
 - **Figure-1-style plotting interface** — the underlying numerical content (bias and CI by `gamma_p`) is paper-derived; the plotting layout is a library presentation choice.
@@ -201,7 +201,7 @@ simulation fallback.
 
 ### Tuning Parameters
 
-**Note:** The parameters below span both paper-derived requirements (where the paper specifies a fixed default or a free parameter that affects the math) and proposed library extensions (engineering choices for the API surface). The `Source` column makes this distinction explicit. The follow-up audit for `compute_pretrends_power` (which currently exposes `alpha`, `power`, `violation_type`, `violation_weights`) will decide which proposed extensions to keep, rename, or defer.
+**Note:** The parameters below span both paper-derived requirements (where the paper specifies a fixed default or a free parameter that affects the math) and proposed library extensions (engineering choices for the API surface). The `Source` column makes this distinction explicit. The follow-up audit will reconcile this proposed table against the two current shipped surfaces — the helper `compute_pretrends_power` (exposes `alpha`, `target_power`, `violation_type`, `pre_periods`, plus the optional violation magnitude `M`) and the class `PreTrendsPower` (exposes `alpha`, `power`, `violation_type`, `violation_weights`) — and decide which proposed extensions to keep, rename, unify, or defer.
 
 | Parameter | Type | Default | Source | Selection Method |
 |-----------|------|---------|--------|-----------------|
@@ -214,7 +214,7 @@ simulation fallback.
 | `n_sim` | int | 10000 | **Library extension** (only meaningful when `method="simulation"`) | Monte Carlo iterations when method="simulation" |
 
 ### Relation to Existing diff-diff Estimators
-- **Pre-existing `diff_diff/pretrends.py`** (1133 lines) — implements a Roth-2022 framework; this paper review's main use is to audit the existing surface against the paper's exact equations
+- **Pre-existing `diff_diff/pretrends.py`** (1133 lines) — implements a Wald-test-based pre-trends MDV/power workflow framed around Roth (2022); the current code path computes Wald power/MDV from the pre-period variance-covariance block rather than the full arbitrary-Sigma Proposition 1 / Proposition 3 / Proposition 4 conditional-moment computations. This paper review's main use is to audit the existing surface against the paper's exact equations and identify which Roth-2022 quantities are missing.
 - **Composes with**: `MultiPeriodDiD`, `CallawaySantAnna`, `SunAbraham`, `TwoWayFixedEffects` — any estimator producing an event-study coefficient vector and a consistent variance estimator
 - **Complement to `HonestDiD` (Rambachan-Roth 2023)**: Roth 2022 asks "what bias survives a pretest under linear violations?"; Rambachan-Roth 2023 asks "what is the identified set of tau_post under bounded violations?" Both use the same (beta_hat, Sigma_hat) input contract — the library should expose a unified entry-point that can produce both Roth-2022 and HonestDiD reports from one event-study result object.
 - **Shares zero-anticipation convention with HonestDiD**: tau_pre = 0, so beta_pre = delta_pre. Cross-reference the existing `diff_diff/honest_did.py` for the contract.
@@ -228,7 +228,7 @@ simulation fallback.
 | **Proposition 1** | For any B(Sigma): E[beta_hat_post | beta_hat_pre in B] = tau_post + delta_post + Sigma_{12} Sigma_{22}^{-1} (E[beta_hat_pre | beta_hat_pre in B] - beta_pre) | The main bias decomposition formula. Drives the conditional-bias computation in step 4 of the algorithm. |
 | **Proposition 2** | Under Assumption 1 (homoskedastic-equicorrelated Sigma) and monotone trend (delta_pre < 0, delta_post > 0): E[beta_hat_post | beta_hat_pre in B_NIS] > beta_post > tau_post | Justifies a WARN that conditional bias is worse than unconditional bias under monotone trends — applicable in many but not all empirical settings. Assumption 1 is a condition on the *estimated covariance matrix* Sigma, not on design metadata; any sharper warning must therefore be triggered by a *direct numerical check* of Sigma (approximately-constant diagonal entries + approximately-constant positive off-diagonal entries below the diagonal). Without such a check, the library should issue only the generic caveat that the sign-of-bias result is ambiguous outside Assumption 1. |
 | **Proposition 3** | Var[beta_hat_post | beta_hat_pre in B] = Var[beta_hat_post] + (Sigma_{12} Sigma_{22}^{-1}) (Var[beta_hat_pre | beta_hat_pre in B] - Var[beta_hat_pre]) (Sigma_{12} Sigma_{22}^{-1})' | The conditional-variance formula; drives the over/under-coverage analysis. |
-| **Proposition 4** | If B(Sigma) is convex: Var[beta_hat_post | beta_hat_pre in B] <= Var[beta_hat_post]. CIs based on unconditional Sigma OVER-cover under parallel trends, UNDER-cover under violations. | Justifies the "do not interpret a wide CI as ample power" warning. |
+| **Proposition 4** | If B(Sigma) is convex: Var[beta_hat_post | beta_hat_pre in B] <= Var[beta_hat_post] (variance-reduction guarantee, conditional on convex B only). | Justifies the "do not interpret a wide CI as ample power" warning. Implication for CI coverage (Section II.C paragraph after Prop 4): CIs based on unconditional Sigma tend to OVER-cover under parallel trends with symmetric B; under violations they tend to UNDER-cover *only if the bias is sufficiently large* to outweigh the variance reduction — the under-coverage direction is contingent on bias magnitude, not universal. |
 
 No formal theorems are stated for the publication-rules analysis (Section II.D); Equation 4 is the operational result.
 
@@ -267,7 +267,7 @@ Quoting Roth's key empirical results (for cross-validation):
 - **Custom delta vector interface**: paper Section III endorses "power analyses for the types of violations of parallel trends deemed to be most relevant in their context," which is the paper-level framing for a user-supplied delta vector; the specific `violation_weights`-style INTERFACE used in the library and the R `pretrends` package is a package-API convention layered on top of that paper-level framework.
 - **Choice of contrast l**: paper highlights l = uniform 1/M (average post-treatment) and l = e_1 (first period after treatment). No guidance on other contrasts (e.g., long-run effect l = e_M, dynamic-weighted contrast) — library should document defaults and warn that bias and coverage depend on l.
 - **K = 0 (no pre-periods)**: trivially no pretest possible; library should error.
-- **Heteroskedastic Sigma**: Proposition 2 requires Assumption 1. Library implements computations under arbitrary Sigma via Proposition 1; the sign of the bias-amplification effect is then NOT guaranteed. Library should NOT print "pretest amplifies bias under monotone trends" unless Assumption 1 is approximately satisfied (or just always issue the conditional warning).
+- **Heteroskedastic Sigma**: Proposition 2 requires Assumption 1. Once Proposition 1 / Proposition 3 / Proposition 4 computations are added (current `diff_diff/pretrends.py` does Wald-test power/MDV only, not the full conditional-moment path), the library will be able to operate under arbitrary Sigma — but at that point the sign of the bias-amplification effect is NOT guaranteed without Assumption 1. The library should NOT print "pretest amplifies bias under monotone trends" unless Assumption 1 is approximately satisfied (or just always issue the conditional warning).
 - **Equation 4 publication-rules analysis**: not standardly implemented in PreTrendsPower-style tools. Roth notes it as part of the discussion (Section II.D) but does not provide a numerical workflow for users. Library should NOT attempt to implement Equation 4 unless requested.
 - **Connection to `compute_pretrends_power` library helper**: the paper review confirms that "minimum slope detectable at 80% power" is exactly Roth's gamma_{0.8}, and the library helper should compute and surface this. Need to verify the existing helper's calling convention against the paper's framework when auditing `diff_diff/pretrends.py`.
 - **R `pretrends` package version**: paper cites the package at https://github.com/jonathandroth/pretrends; no specific version cited. R-parity work should pin to a specific commit and document.

From 1e4a2d935d242dea2ec060539285764ca5b419d1 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sat, 16 May 2026 21:54:44 -0400
Subject: [PATCH 09/22] Address R8 review polish (2 P2 + 1 P3) on
 roth-2022-review.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

R8 verdict was ✅ Looks good (no P0/P1) — addressing remaining polish items.

- P2 internal contradiction on simulation-fallback requirement: the
  "Computational shortcut (footnote 8)" paragraph said the library "should
  support both an analytical truncated-multivariate-normal path AND a
  simulation fallback", contradicting the Library design choices subsection
  which correctly classifies simulation fallback as a library robustness
  choice. Rewrote the paragraph to state the paper-derived requirement at
  the quantity level (compute the correct conditional moments and
  probabilities) and explicitly defer backend choice to the Library design
  choices subsection.
- P2 joint-Wald R-parity overclaim: the Gaps section said "library should
  implement both but test against R `pretrends` for the joint-Wald case
  (Roth's package supports it)." Verified via R-package source/docs that
  the published `pretrends` package exposes only NIS-based surfaces
  (`pretrends()`, `slope_for_power()`, `*_NIS` helpers) — no joint-Wald
  parity target. Rewrote to make explicit that joint Wald is theoretically
  admissible under the propositions but parity would need an independent
  fixture/derivation, not direct R-package parity.
- P3 composition overclaim: the Relation section listed
  `TwoWayFixedEffects` and "any estimator producing an event-study
  coefficient vector" as currently composable, but the shipped
  `compute_pretrends_power` adapter only dispatches `MultiPeriodDiDResults`,
  `CallawaySantAnnaResults`, `SunAbrahamResults` and raises `TypeError`
  otherwise. Split into "Currently composes with" (the three shipped
  adapter targets) and "theoretical compatibility" (extends to TWFE etc.,
  pending follow-up audit adapters).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 docs/methodology/papers/roth-2022-review.md | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/docs/methodology/papers/roth-2022-review.md b/docs/methodology/papers/roth-2022-review.md
index a17e66a7..072414e7 100644
--- a/docs/methodology/papers/roth-2022-review.md
+++ b/docs/methodology/papers/roth-2022-review.md
@@ -121,8 +121,10 @@ or 80% power"). Roth uses 80% as a benchmark following Cohen (1988); 50% is supp
 Under joint normality, these probabilities and conditional moments can be calculated
 ANALYTICALLY using results from Cartinhour (1990) and Manjunath & Wilhelm (2012) — Roth
 implements via the R package `tmvtnorm`. Roth verifies simulations yield similar results.
-The library should support both an analytical truncated-multivariate-normal path AND a
-simulation fallback.
+The paper-derived requirement is to compute the correct conditional moments and
+probabilities; the specific backend (analytical via `tmvtnorm`-equivalent, an
+independent port, a Monte Carlo simulator, etc.) is a library implementation choice —
+see "Library design choices" above for the proposed `method` / `n_sim` knobs.
 
 *Standard errors (Section II.C; footnote 7 equivariance):*
 - Power calculations are EXACT (no sampling variability — gamma is computed against a hypothesized population trend, not estimated)
@@ -215,7 +217,7 @@ simulation fallback.
 
 ### Relation to Existing diff-diff Estimators
 - **Pre-existing `diff_diff/pretrends.py`** (1133 lines) — implements a Wald-test-based pre-trends MDV/power workflow framed around Roth (2022); the current code path computes Wald power/MDV from the pre-period variance-covariance block rather than the full arbitrary-Sigma Proposition 1 / Proposition 3 / Proposition 4 conditional-moment computations. This paper review's main use is to audit the existing surface against the paper's exact equations and identify which Roth-2022 quantities are missing.
-- **Composes with**: `MultiPeriodDiD`, `CallawaySantAnna`, `SunAbraham`, `TwoWayFixedEffects` — any estimator producing an event-study coefficient vector and a consistent variance estimator
+- **Currently composes with** (per the shipped `compute_pretrends_power` adapter in `diff_diff/pretrends.py`): `MultiPeriodDiDResults`, `CallawaySantAnnaResults`, `SunAbrahamResults`. The adapter raises `TypeError` for other result types. Theoretical compatibility extends to any estimator producing an event-study coefficient vector and a consistent variance estimator (e.g., `TwoWayFixedEffects`), but adapters for additional result families are a follow-up audit decision.
 - **Complement to `HonestDiD` (Rambachan-Roth 2023)**: Roth 2022 asks "what bias survives a pretest under linear violations?"; Rambachan-Roth 2023 asks "what is the identified set of tau_post under bounded violations?" Both use the same (beta_hat, Sigma_hat) input contract — the library should expose a unified entry-point that can produce both Roth-2022 and HonestDiD reports from one event-study result object.
 - **Shares zero-anticipation convention with HonestDiD**: tau_pre = 0, so beta_pre = delta_pre. Cross-reference the existing `diff_diff/honest_did.py` for the contract.
 
@@ -261,7 +263,7 @@ Quoting Roth's key empirical results (for cross-validation):
 
 ## Gaps and Uncertainties
 
-- **Joint Wald acceptance region**: paper mentions joint tests only briefly (Section I.B notes 1 of 12 papers uses one). Power, bias, and coverage formulas all apply by replacing B_NIS with the joint Wald acceptance region B_W, but Roth does not work out a separate table. Library should implement both but test against R `pretrends` for the joint-Wald case (Roth's package supports it).
+- **Joint Wald acceptance region**: paper mentions joint tests only briefly (Section I.B notes 1 of 12 papers uses one). Power, bias, and coverage formulas all apply by replacing B_NIS with the joint Wald acceptance region B_W (convex, so Propositions 1+3+4 all hold), but Roth does not work out a separate table. Joint Wald is theoretically admissible under the paper's propositions, but the published R `pretrends` package surface is NIS-based (`pretrends()`, `slope_for_power()`, `*_NIS` helpers) and does NOT expose a joint-Wald parity target — any library implementation of joint-Wald PreTrendsPower will need an independent fixture or derivation rather than direct R-package parity.
 - **"Slope-of-best-fit-line t-test" acceptance region**: Table 1 column shows the t-stat for the slope of the linear pre-trend. Paper does not analyze pretests based on this t-stat as a separate acceptance region; library should NOT extrapolate without further reading the `pretrends` package source.
 - **Nonlinear violations**: Section I.C formally tabulates power only against linear violations; Section I.D extends the sign-of-bias result (Proposition 2) to monotone violations under homoskedasticity. Section III ("Practical Recommendations") explicitly endorses power analyses against hypothesized nonlinear trends via the `pretrends` package, so the general nonlinear capability is paper-supported even though the paper does not separately tabulate it. The specific named shapes the library exposes ("constant", "last_period") are R-package API conventions, not separately analyzed in the paper.
 - **Custom delta vector interface**: paper Section III endorses "power analyses for the types of violations of parallel trends deemed to be most relevant in their context," which is the paper-level framing for a user-supplied delta vector; the specific `violation_weights`-style INTERFACE used in the library and the R `pretrends` package is a package-API convention layered on top of that paper-level framework.

From 8f14d73e2f39f6ff1811554532f6ef7609334b99 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sun, 17 May 2026 06:11:20 -0400
Subject: [PATCH 10/22] Address R9 review (2 P1) on roth-2022-review.md

- P1 CS/SA covariance-source deviation: the R7 edit said current
  `pretrends.py` "computes Wald power/MDV from the pre-period variance-
  covariance block", but only the `MultiPeriodDiDResults` branch extracts
  the full pre-period sub-VCV (pretrends.py:592-601). The CS branch
  (pretrends.py:609-652) and SA branch (pretrends.py:660-687) hard-code
  `vcov = np.diag(ses^2)`, dropping the pre-period correlations the
  paper's Propositions rely on. Added an explicit "Note (deviation in
  current covariance-source)" bullet in the "Relation to Existing
  diff-diff Estimators" section documenting the per-result-type split and
  cross-referencing the conservative-variance framing in REPORTING.md.
  The follow-up audit will decide whether to extend full-sub-VCV
  extraction to CS/SA or keep the diag fallback as a deliberate variance-
  conservatism choice.
- P1 Assumption 1 model-vs-estimated Sigma: the R2 edit said
  "Assumption 1 is a condition on the *estimated covariance matrix*
  Sigma". Roth's Section II.B states Assumption 1 on the *model*
  covariance Sigma (the population VCV in Equation 1). Software can only
  inspect the *estimated* Sigma_hat, so any direct numerical check is a
  HEURISTIC implementation proxy, not the paper's assumption itself.
  Rewrote the Prop 2 Implementation-use cell to (a) name Sigma as the
  model covariance, (b) acknowledge Sigma_hat as the implementation
  surrogate, (c) require any sharper warning based on Sigma_hat be
  labeled heuristic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 docs/methodology/papers/roth-2022-review.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/methodology/papers/roth-2022-review.md b/docs/methodology/papers/roth-2022-review.md
index 072414e7..cac217ee 100644
--- a/docs/methodology/papers/roth-2022-review.md
+++ b/docs/methodology/papers/roth-2022-review.md
@@ -218,6 +218,7 @@ see "Library design choices" above for the proposed `method` / `n_sim` knobs.
 ### Relation to Existing diff-diff Estimators
 - **Pre-existing `diff_diff/pretrends.py`** (1133 lines) — implements a Wald-test-based pre-trends MDV/power workflow framed around Roth (2022); the current code path computes Wald power/MDV from the pre-period variance-covariance block rather than the full arbitrary-Sigma Proposition 1 / Proposition 3 / Proposition 4 conditional-moment computations. This paper review's main use is to audit the existing surface against the paper's exact equations and identify which Roth-2022 quantities are missing.
 - **Currently composes with** (per the shipped `compute_pretrends_power` adapter in `diff_diff/pretrends.py`): `MultiPeriodDiDResults`, `CallawaySantAnnaResults`, `SunAbrahamResults`. The adapter raises `TypeError` for other result types. Theoretical compatibility extends to any estimator producing an event-study coefficient vector and a consistent variance estimator (e.g., `TwoWayFixedEffects`), but adapters for additional result families are a follow-up audit decision.
+- **Note (deviation in current covariance-source):** the shipped adapter uses different sources for the pre-period covariance depending on the result type. For `MultiPeriodDiDResults` it extracts the full pre-period sub-block from `results.vcov` when `interaction_indices` is populated, falling back to `diag(ses^2)` otherwise (`pretrends.py:592-601`). For `CallawaySantAnnaResults` and `SunAbrahamResults` it hard-codes `vcov = diag(ses^2)` even when an event-study VCV is available (`pretrends.py:609-652` for CS, `pretrends.py:660-687` for SA). The diagonal fallback drops the pre-period correlations the paper's Propositions rely on; the documented deviation is captured in `docs/methodology/REPORTING.md` as a conservative variance source for staggered adapters. The follow-up audit should decide whether to extend full-sub-VCV extraction to CS/SA (matching the MultiPeriodDiD path) or keep the staggered diag fallback as a deliberate variance-conservatism choice.
 - **Complement to `HonestDiD` (Rambachan-Roth 2023)**: Roth 2022 asks "what bias survives a pretest under linear violations?"; Rambachan-Roth 2023 asks "what is the identified set of tau_post under bounded violations?" Both use the same (beta_hat, Sigma_hat) input contract — the library should expose a unified entry-point that can produce both Roth-2022 and HonestDiD reports from one event-study result object.
 - **Shares zero-anticipation convention with HonestDiD**: tau_pre = 0, so beta_pre = delta_pre. Cross-reference the existing `diff_diff/honest_did.py` for the contract.
 
@@ -228,7 +229,7 @@ see "Library design choices" above for the proposed `method` / `n_sim` knobs.
 | # | Statement | Implementation use |
 |---|-----------|---------------------|
 | **Proposition 1** | For any B(Sigma): E[beta_hat_post | beta_hat_pre in B] = tau_post + delta_post + Sigma_{12} Sigma_{22}^{-1} (E[beta_hat_pre | beta_hat_pre in B] - beta_pre) | The main bias decomposition formula. Drives the conditional-bias computation in step 4 of the algorithm. |
-| **Proposition 2** | Under Assumption 1 (homoskedastic-equicorrelated Sigma) and monotone trend (delta_pre < 0, delta_post > 0): E[beta_hat_post | beta_hat_pre in B_NIS] > beta_post > tau_post | Justifies a WARN that conditional bias is worse than unconditional bias under monotone trends — applicable in many but not all empirical settings. Assumption 1 is a condition on the *estimated covariance matrix* Sigma, not on design metadata; any sharper warning must therefore be triggered by a *direct numerical check* of Sigma (approximately-constant diagonal entries + approximately-constant positive off-diagonal entries below the diagonal). Without such a check, the library should issue only the generic caveat that the sign-of-bias result is ambiguous outside Assumption 1. |
+| **Proposition 2** | Under Assumption 1 (homoskedastic-equicorrelated Sigma) and monotone trend (delta_pre < 0, delta_post > 0): E[beta_hat_post | beta_hat_pre in B_NIS] > beta_post > tau_post | Justifies a WARN that conditional bias is worse than unconditional bias under monotone trends — applicable in many but not all empirical settings. Assumption 1 in the paper is a condition on the *model* covariance Σ (the population variance-covariance of β̂ in Equation 1), not on design metadata. Software can only inspect the *estimated* Σ̂, so any direct numerical check (e.g., approximately-constant diagonal entries + approximately-constant positive off-diagonal entries below the diagonal) is a heuristic implementation aid, not the paper's assumption itself. A library that surfaces a sharper warning based on Σ̂ should label it as a heuristic; without such a check, the library should issue only the generic caveat that the sign-of-bias result is ambiguous outside Assumption 1. |
 | **Proposition 3** | Var[beta_hat_post | beta_hat_pre in B] = Var[beta_hat_post] + (Sigma_{12} Sigma_{22}^{-1}) (Var[beta_hat_pre | beta_hat_pre in B] - Var[beta_hat_pre]) (Sigma_{12} Sigma_{22}^{-1})' | The conditional-variance formula; drives the over/under-coverage analysis. |
 | **Proposition 4** | If B(Sigma) is convex: Var[beta_hat_post | beta_hat_pre in B] <= Var[beta_hat_post] (variance-reduction guarantee, conditional on convex B only). | Justifies the "do not interpret a wide CI as ample power" warning. Implication for CI coverage (Section II.C paragraph after Prop 4): CIs based on unconditional Sigma tend to OVER-cover under parallel trends with symmetric B; under violations they tend to UNDER-cover *only if the bias is sufficiently large* to outweigh the variance reduction — the under-coverage direction is contingent on bias magnitude, not universal. |
 

From 3f90526462410acc4531998fe99265792289f85c Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sun, 17 May 2026 06:18:21 -0400
Subject: [PATCH 11/22] Address R10 review (1 P1 + 1 P3) on roth-2022-review.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- P1 "conservative variance source" framing for diag fallback is wrong:
  R9 fix added a Note saying the CS/SA `diag(ses^2)` fallback could be
  retained as a "deliberate variance-conservatism choice". Roth's
  Propositions 1+3+4 (and the MDV/power computation w' Σ_22^{-1} w)
  operate on the full pre-period covariance block Σ_22 — dropping the
  off-diagonals is a non-paper approximation, not a paper-supported
  numerical option, and the direction of the discrepancy with the full-
  Σ_22 calc depends on the sign and magnitude of the dropped
  correlations (not guaranteed conservative). Reworded the Note to
  describe `diag(ses^2)` explicitly as a non-paper approximation; if the
  follow-up audit retains it, an explicit REGISTRY.md Note describing
  the approximation + its possible miscalibration is required instead.
- P3 R package version not pinned: the R `pretrends` package API claims in
  the Joint-Wald and version-pin Gaps bullets describe an unpinned
  revision and could silently go stale.
  Softened the Joint-Wald observation to "as observed at the time of
  this review (specific commit not pinned)" with an explicit ask for the
  follow-up audit to record the exact revision. Strengthened the
  version-pin Gaps bullet to enumerate exactly which surface claims
  depend on the unpinned R-package surface (NIS-only, no joint-Wald,
  `pretrends()` / `slope_for_power()` / `*_NIS` helpers).
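
Illustration for the P1 item above (a minimal sketch, not code from
`diff_diff/pretrends.py`): the weights `w`, the equicorrelated Sigma_22,
and the helper names `quad_form` / `equicorrelated` are hypothetical and
chosen only to show that `w' Σ_22^{-1} w` can move in either direction
when the off-diagonals are dropped.

```python
import numpy as np

def quad_form(w, sigma):
    """Return w' Sigma^{-1} w via a linear solve (no explicit inverse)."""
    return float(w @ np.linalg.solve(sigma, w))

def equicorrelated(k, rho):
    """k x k covariance with unit variances and constant correlation rho."""
    return (1.0 - rho) * np.eye(k) + rho * np.ones((k, k))

# Hypothetical linear pre-trend weights for three pre-periods.
w = np.array([3.0, 2.0, 1.0])

results = {}
for rho in (0.5, -0.3):
    full = equicorrelated(3, rho)
    diag_only = np.diag(np.diag(full))  # what a diag(ses^2) source keeps
    results[rho] = (quad_form(w, full), quad_form(w, diag_only))
    print(f"rho={rho:+.1f}  full={results[rho][0]:.3f}  diag={results[rho][1]:.3f}")
```

With these inputs the diagonal-only quadratic form is larger than the
full-Σ_22 value under positive correlation and smaller under negative
correlation, so the approximation is not one-sided.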

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 docs/methodology/papers/roth-2022-review.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/methodology/papers/roth-2022-review.md b/docs/methodology/papers/roth-2022-review.md
index cac217ee..14029a19 100644
--- a/docs/methodology/papers/roth-2022-review.md
+++ b/docs/methodology/papers/roth-2022-review.md
@@ -218,7 +218,7 @@ see "Library design choices" above for the proposed `method` / `n_sim` knobs.
 ### Relation to Existing diff-diff Estimators
 - **Pre-existing `diff_diff/pretrends.py`** (1133 lines) — implements a Wald-test-based pre-trends MDV/power workflow framed around Roth (2022); the current code path computes Wald power/MDV from the pre-period variance-covariance block rather than the full arbitrary-Sigma Proposition 1 / Proposition 3 / Proposition 4 conditional-moment computations. This paper review's main use is to audit the existing surface against the paper's exact equations and identify which Roth-2022 quantities are missing.
 - **Currently composes with** (per the shipped `compute_pretrends_power` adapter in `diff_diff/pretrends.py`): `MultiPeriodDiDResults`, `CallawaySantAnnaResults`, `SunAbrahamResults`. The adapter raises `TypeError` for other result types. Theoretical compatibility extends to any estimator producing an event-study coefficient vector and a consistent variance estimator (e.g., `TwoWayFixedEffects`), but adapters for additional result families are a follow-up audit decision.
-- **Note (deviation in current covariance-source):** the shipped adapter uses different sources for the pre-period covariance depending on the result type. For `MultiPeriodDiDResults` it extracts the full pre-period sub-block from `results.vcov` when `interaction_indices` is populated, falling back to `diag(ses^2)` otherwise (`pretrends.py:592-601`). For `CallawaySantAnnaResults` and `SunAbrahamResults` it hard-codes `vcov = diag(ses^2)` even when an event-study VCV is available (`pretrends.py:609-652` for CS, `pretrends.py:660-687` for SA). The diagonal fallback drops the pre-period correlations the paper's Propositions rely on; the documented deviation is captured in `docs/methodology/REPORTING.md` as a conservative variance source for staggered adapters. The follow-up audit should decide whether to extend full-sub-VCV extraction to CS/SA (matching the MultiPeriodDiD path) or keep the staggered diag fallback as a deliberate variance-conservatism choice.
+- **Note (deviation in current covariance-source):** the shipped adapter uses different sources for the pre-period covariance depending on the result type. For `MultiPeriodDiDResults` it extracts the full pre-period sub-block from `results.vcov` when `interaction_indices` is populated, falling back to `diag(ses^2)` otherwise (`pretrends.py:592-601`). For `CallawaySantAnnaResults` and `SunAbrahamResults` it hard-codes `vcov = diag(ses^2)` even when an event-study VCV is available (`pretrends.py:609-652` for CS, `pretrends.py:660-687` for SA). The diagonal fallback is a **non-paper approximation**: Roth's power and bias formulas operate on the full pre-period covariance block Σ_22 (the key MDV/power quantity is `w' Σ_22^{-1} w`, which depends on the off-diagonals), and dropping the off-diagonals is NOT a paper-supported numerical choice and is NOT guaranteed to be conservative for MDV/power (the direction of the discrepancy depends on the sign and magnitude of the dropped correlations). The follow-up audit should either extend full-sub-VCV extraction to CS/SA (matching the MultiPeriodDiD path) or, if the staggered diag fallback is retained, add an explicit `REGISTRY.md` Note describing the approximation and its possible miscalibration rather than framing it as conservative.
 - **Complement to `HonestDiD` (Rambachan-Roth 2023)**: Roth 2022 asks "what bias survives a pretest under linear violations?"; Rambachan-Roth 2023 asks "what is the identified set of tau_post under bounded violations?" Both use the same (beta_hat, Sigma_hat) input contract — the library should expose a unified entry-point that can produce both Roth-2022 and HonestDiD reports from one event-study result object.
 - **Shares zero-anticipation convention with HonestDiD**: tau_pre = 0, so beta_pre = delta_pre. Cross-reference the existing `diff_diff/honest_did.py` for the contract.
 
@@ -264,7 +264,7 @@ Quoting Roth's key empirical results (for cross-validation):
 
 ## Gaps and Uncertainties
 
-- **Joint Wald acceptance region**: paper mentions joint tests only briefly (Section I.B notes 1 of 12 papers uses one). Power, bias, and coverage formulas all apply by replacing B_NIS with the joint Wald acceptance region B_W (convex, so Propositions 1+3+4 all hold), but Roth does not work out a separate table. Joint Wald is theoretically admissible under the paper's propositions, but the published R `pretrends` package surface is NIS-based (`pretrends()`, `slope_for_power()`, `*_NIS` helpers) and does NOT expose a joint-Wald parity target — any library implementation of joint-Wald PreTrendsPower will need an independent fixture or derivation rather than direct R-package parity.
+- **Joint Wald acceptance region**: paper mentions joint tests only briefly (Section I.B notes 1 of 12 papers uses one). Power, bias, and coverage formulas all apply by replacing B_NIS with the joint Wald acceptance region B_W (convex, so Propositions 1+3+4 all hold), but Roth does not work out a separate table. Joint Wald is theoretically admissible under the paper's propositions, but the published R `pretrends` package surface, as observed at the time of this review (`github.com/jonathandroth/pretrends`; specific commit not pinned — the follow-up audit should record the exact revision against which parity is asserted), is NIS-based (`pretrends()`, `slope_for_power()`, `*_NIS` helpers) and does NOT expose a joint-Wald parity target. Any library implementation of joint-Wald PreTrendsPower will need an independent fixture or derivation rather than direct R-package parity.
 - **"Slope-of-best-fit-line t-test" acceptance region**: Table 1 column shows the t-stat for the slope of the linear pre-trend. Paper does not analyze pretests based on this t-stat as a separate acceptance region; library should NOT extrapolate without further reading the `pretrends` package source.
 - **Nonlinear violations**: Section I.C formally tabulates power only against linear violations; Section I.D extends the sign-of-bias result (Proposition 2) to monotone violations under homoskedasticity. Section III ("Practical Recommendations") explicitly endorses power analyses against hypothesized nonlinear trends via the `pretrends` package, so the general nonlinear capability is paper-supported even though the paper does not separately tabulate it. The specific named shapes the library exposes ("constant", "last_period") are R-package API conventions, not separately analyzed in the paper.
 - **Custom delta vector interface**: paper Section III endorses "power analyses for the types of violations of parallel trends deemed to be most relevant in their context," which is the paper-level framing for a user-supplied delta vector; the specific `violation_weights`-style INTERFACE used in the library and the R `pretrends` package is a package-API convention layered on top of that paper-level framework.
@@ -273,5 +273,5 @@ Quoting Roth's key empirical results (for cross-validation):
 - **Heteroskedastic Sigma**: Proposition 2 requires Assumption 1. Once Proposition 1 / Proposition 3 / Proposition 4 computations are added (current `diff_diff/pretrends.py` does Wald-test power/MDV only, not the full conditional-moment path), the library will be able to operate under arbitrary Sigma — but at that point the sign of the bias-amplification effect is NOT guaranteed without Assumption 1. The library should NOT print "pretest amplifies bias under monotone trends" unless Assumption 1 is approximately satisfied (or just always issue the conditional warning).
 - **Equation 4 publication-rules analysis**: not standardly implemented in PreTrendsPower-style tools. Roth notes it as part of the discussion (Section II.D) but does not provide a numerical workflow for users. Library should NOT attempt to implement Equation 4 unless requested.
 - **Connection to `compute_pretrends_power` library helper**: the paper review confirms that "minimum slope detectable at 80% power" is exactly Roth's gamma_{0.8}, and the library helper should compute and surface this. Need to verify the existing helper's calling convention against the paper's framework when auditing `diff_diff/pretrends.py`.
-- **R `pretrends` package version**: paper cites the package at https://github.com/jonathandroth/pretrends; no specific version cited. R-parity work should pin to a specific commit and document.
+- **R `pretrends` package version pin**: paper cites the package at https://github.com/jonathandroth/pretrends; no specific version cited in the paper, and this review has not pinned a commit either. Every API-surface statement in this file that refers to the R `pretrends` package (e.g., the "NIS-based, no joint-Wald parity target" claim above; the `pretrends()` / `slope_for_power()` / `*_NIS` helper references) should be read as "as observed in the audited revision". R-parity work should record the exact commit/release pinned and re-verify these claims at that revision.
 - **Compatibility with multi-cohort estimators**: Remark 1 lists Callaway-Sant'Anna, Sun-Abraham, etc. as compatible. The paper does not detail how to construct (beta_hat, Sigma_hat) from those estimators when the event-study output is multi-cohort (e.g., cohort × event-time matrix). Library should document the aggregation convention (per Sun-Abraham overall ATT or per Callaway-Sant'Anna `aggregate=event`).

From dc518a2805c9a986b75a5ed0d62fd68952c49bf2 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sun, 17 May 2026 06:24:17 -0400
Subject: [PATCH 12/22] Address R11 polish (1 P2 + 2 P3) on roth-2022-review.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

R11 verdict was ✅ Looks good (second consecutive) — addressing remaining
polish items.

- P2 CS/SA covariance-source distinction: R9/R10 Note lumped CS and SA
  together as "even when an event-study VCV is available". That is true
  for CallawaySantAnnaResults (which persists event_study_vcov at
  staggered_results.py:126-128) but NOT for SunAbrahamResults (which
  does not expose an event-study or cohort covariance surface; verified
  in sun_abraham.py:30-88). Split the Note into per-result-type bullets:
  CS diag fallback is a deliberate choice (off-diagonals available);
  SA diag fallback is forced (off-diagonals not persisted on the result
  object, so Roth-faithful off-diagonal support requires upstream surface
  work first).
- P3 unsourced K<=5 / K>10 cutoffs: the "For K <= 5, analytical methods
  are fast. For K > 10, simulation is preferable" guidance in
  Computational Considerations was a hunch, not benchmarked or
  paper-derived. Relabeled as "tentative heuristic (not benchmarked in
  this review and not specified by the paper)" with explicit ask for
  the follow-up audit to benchmark or replace.
- P3 R-package version pin still implicit: R10 fix softened the Joint
  Wald observation to "as observed at the time of this review" but the
  reviewer wanted a more explicit catch-all. Strengthened the Gaps
  bullet to enumerate every R-package surface claim that needs
  re-verification at a pinned revision (NIS-based surface, no joint-
  Wald target, named helpers, nonlinear-trend support claim) and label
  them collectively provisional.
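
For illustration only, a small numeric check of why the diag fallback
in the P2 item above is not guaranteed to be conservative for the
Wald-style quadratic form w' Σ_22^{-1} w that the Note cites. The 2x2
unit-variance Σ_22, the weights w = (2, 1), and the helper name
`wald_quadratic_form` are assumptions chosen for this example; they are
not values or functions from the paper or from `diff_diff/pretrends.py`:

```python
def wald_quadratic_form(w, rho):
    """Return w' S^{-1} w for S = [[1, rho], [rho, 1]] (2x2, unit variances)."""
    w1, w2 = w
    det = 1.0 - rho * rho  # determinant of S; requires |rho| < 1
    # S^{-1} = (1 / det) * [[1, -rho], [-rho, 1]]
    return (w1 * w1 - 2.0 * rho * w1 * w2 + w2 * w2) / det


w = (2.0, 1.0)  # hypothetical linear-trend weights for two pre-periods
diag_value = wald_quadratic_form(w, 0.0)  # what diag(ses^2) would give
for rho in (-0.5, 0.5, 0.9):
    full_value = wald_quadratic_form(w, rho)
    direction = "understates" if diag_value < full_value else "overstates"
    print(f"rho={rho:+.1f}: full={full_value:.3f}, "
          f"diag={diag_value:.3f} -> diag {direction}")
```

With these inputs the diagonal value is below the full value at
rho = -0.5 and rho = 0.9 and above it at rho = 0.5, so the direction of
the discrepancy depends on both the sign and the magnitude of the
dropped correlation.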

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 docs/methodology/papers/roth-2022-review.md | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/docs/methodology/papers/roth-2022-review.md b/docs/methodology/papers/roth-2022-review.md
index 14029a19..b0df67e8 100644
--- a/docs/methodology/papers/roth-2022-review.md
+++ b/docs/methodology/papers/roth-2022-review.md
@@ -197,7 +197,7 @@ see "Library design choices" above for the proposed `method` / `n_sim` knobs.
 
 ### Computational Considerations
 - **Truncated MVN moments and probabilities**: scipy.stats has only the univariate case; library options for K > 1 are (a) port `tmvtnorm` (Manjunath-Wilhelm closed-form for orthant moments + Cartinhour 1990 for the rectangular box), (b) Monte Carlo simulation with rejection sampling. Recommend implementing both paths and validating equivalence at alpha-tol = 1e-3 for small K.
-- **Cost**: dominated by the multivariate normal box probability evaluations. For K <= 5, analytical methods are fast. For K > 10, simulation is preferable.
+- **Cost**: dominated by the multivariate normal box probability evaluations. As a *tentative heuristic* (not benchmarked in this review and not specified by the paper), analytical methods are typically fast for small K (e.g., K <= 5) and simulation may become preferable for larger K (e.g., K > 10); the follow-up audit should either benchmark these cutoffs locally or replace them with empirically-derived thresholds.
 - **Root-finding for gamma_p**: monotone function of gamma; use bisection over [0, gamma_max] with gamma_max derived from a univariate upper bound (largest |gamma| at which power = 1).
 - **Memoization**: power and bias share intermediate quantities (truncated MVN moments); cache by gamma.
 
@@ -218,7 +218,12 @@ see "Library design choices" above for the proposed `method` / `n_sim` knobs.
 ### Relation to Existing diff-diff Estimators
 - **Pre-existing `diff_diff/pretrends.py`** (1133 lines) — implements a Wald-test-based pre-trends MDV/power workflow framed around Roth (2022); the current code path computes Wald power/MDV from the pre-period variance-covariance block rather than the full arbitrary-Sigma Proposition 1 / Proposition 3 / Proposition 4 conditional-moment computations. This paper review's main use is to audit the existing surface against the paper's exact equations and identify which Roth-2022 quantities are missing.
 - **Currently composes with** (per the shipped `compute_pretrends_power` adapter in `diff_diff/pretrends.py`): `MultiPeriodDiDResults`, `CallawaySantAnnaResults`, `SunAbrahamResults`. The adapter raises `TypeError` for other result types. Theoretical compatibility extends to any estimator producing an event-study coefficient vector and a consistent variance estimator (e.g., `TwoWayFixedEffects`), but adapters for additional result families are a follow-up audit decision.
-- **Note (deviation in current covariance-source):** the shipped adapter uses different sources for the pre-period covariance depending on the result type. For `MultiPeriodDiDResults` it extracts the full pre-period sub-block from `results.vcov` when `interaction_indices` is populated, falling back to `diag(ses^2)` otherwise (`pretrends.py:592-601`). For `CallawaySantAnnaResults` and `SunAbrahamResults` it hard-codes `vcov = diag(ses^2)` even when an event-study VCV is available (`pretrends.py:609-652` for CS, `pretrends.py:660-687` for SA). The diagonal fallback is a **non-paper approximation**: Roth's power and bias formulas operate on the full pre-period covariance block Σ_22 (the key MDV/power quantity is `w' Σ_22^{-1} w`, which depends on the off-diagonals), and dropping the off-diagonals is NOT a paper-supported numerical choice and is NOT guaranteed to be conservative for MDV/power (the direction of the discrepancy depends on the sign and magnitude of the dropped correlations). The follow-up audit should either extend full-sub-VCV extraction to CS/SA (matching the MultiPeriodDiD path) or, if the staggered diag fallback is retained, add an explicit `REGISTRY.md` Note describing the approximation and its possible miscalibration rather than framing it as conservative.
+- **Note (deviation in current covariance-source):** the shipped adapter uses different sources for the pre-period covariance depending on the result type:
+  - `MultiPeriodDiDResults`: extracts the full pre-period sub-block from `results.vcov` when `interaction_indices` is populated, falling back to `diag(ses^2)` otherwise (`pretrends.py:592-601`).
+  - `CallawaySantAnnaResults`: hard-codes `vcov = diag(ses^2)` (`pretrends.py:609-652`) even though the result object DOES persist an `event_study_vcov` matrix (`staggered_results.py:126-128`). This is a deliberate choice to drop the off-diagonals; the follow-up audit can change it to extract the full sub-VCV from `event_study_vcov`.
+  - `SunAbrahamResults`: hard-codes `vcov = diag(ses^2)` (`pretrends.py:660-687`). Unlike CS, `SunAbrahamResults` does NOT currently expose an event-study or cohort covariance matrix (`sun_abraham.py:30-88`), so the diag fallback is *forced* — Roth-faithful off-diagonal support on the SA path first requires extending `SunAbrahamResults` to persist an event-study/cohort covariance matrix, then routing it through the adapter.
+
+  In all three cases the diagonal fallback is a **non-paper approximation**: Roth's power and bias formulas operate on the full pre-period covariance block Σ_22 (the key MDV/power quantity is `w' Σ_22^{-1} w`, which depends on the off-diagonals), and dropping the off-diagonals is NOT a paper-supported numerical choice and is NOT guaranteed to be conservative for MDV/power (the direction of the discrepancy depends on the sign and magnitude of the dropped correlations). The follow-up audit should either extend full-sub-VCV consumption to all three paths (with SA also requiring upstream surface work) or, if the diag fallback is retained anywhere, add an explicit `REGISTRY.md` Note describing the approximation and its possible miscalibration rather than framing it as conservative.
 - **Complement to `HonestDiD` (Rambachan-Roth 2023)**: Roth 2022 asks "what bias survives a pretest under linear violations?"; Rambachan-Roth 2023 asks "what is the identified set of tau_post under bounded violations?" Both use the same (beta_hat, Sigma_hat) input contract — the library should expose a unified entry-point that can produce both Roth-2022 and HonestDiD reports from one event-study result object.
 - **Shares zero-anticipation convention with HonestDiD**: tau_pre = 0, so beta_pre = delta_pre. Cross-reference the existing `diff_diff/honest_did.py` for the contract.
 
@@ -273,5 +278,11 @@ Quoting Roth's key empirical results (for cross-validation):
 - **Heteroskedastic Sigma**: Proposition 2 requires Assumption 1. Once Proposition 1 / Proposition 3 / Proposition 4 computations are added (current `diff_diff/pretrends.py` does Wald-test power/MDV only, not the full conditional-moment path), the library will be able to operate under arbitrary Sigma — but at that point the sign of the bias-amplification effect is NOT guaranteed without Assumption 1. The library should NOT print "pretest amplifies bias under monotone trends" unless Assumption 1 is approximately satisfied (or just always issue the conditional warning).
 - **Equation 4 publication-rules analysis**: not standardly implemented in PreTrendsPower-style tools. Roth notes it as part of the discussion (Section II.D) but does not provide a numerical workflow for users. Library should NOT attempt to implement Equation 4 unless requested.
 - **Connection to `compute_pretrends_power` library helper**: the paper review confirms that "minimum slope detectable at 80% power" is exactly Roth's gamma_{0.8}, and the library helper should compute and surface this. Need to verify the existing helper's calling convention against the paper's framework when auditing `diff_diff/pretrends.py`.
-- **R `pretrends` package version pin**: paper cites the package at https://github.com/jonathandroth/pretrends; no specific version cited in the paper, and this review has not pinned a commit either. Every API-surface statement in this file that refers to the R `pretrends` package (e.g., the "NIS-based, no joint-Wald parity target" claim above; the `pretrends()` / `slope_for_power()` / `*_NIS` helper references) should be read as "as observed in the audited revision". R-parity work should record the exact commit/release pinned and re-verify these claims at that revision.
+- **R `pretrends` package version pin (provisional)**: paper cites the package at https://github.com/jonathandroth/pretrends; no specific version cited in the paper, and this review has NOT pinned a commit either. **All R-package surface claims in this file are therefore provisional, pending a pinned commit/release.** Specifically, the following statements must be re-verified against a recorded revision before the follow-up audit can rely on them:
+  - "the published R `pretrends` package surface ... is NIS-based" (the Joint Wald bullet above)
+  - "exposes `pretrends()` / `slope_for_power()` / `*_NIS` helpers only, not a joint-Wald interface"
+  - "Section III ... endorses power analyses against hypothesized nonlinear trends via the `pretrends` package"
+  - any reference to specific R-package function names or behavior
+
+  R-parity work should record the exact commit/release pinned and re-verify each of these claims at that revision; either pin a commit now or treat every package-API statement above as "to be verified against a pinned revision".
 - **Compatibility with multi-cohort estimators**: Remark 1 lists Callaway-Sant'Anna, Sun-Abraham, etc. as compatible. The paper does not detail how to construct (beta_hat, Sigma_hat) from those estimators when the event-study output is multi-cohort (e.g., cohort × event-time matrix). Library should document the aggregation convention (per Sun-Abraham overall ATT or per Callaway-Sant'Anna `aggregate=event`).

From 3fa4e47a92d1cc2841fea95c3a53489e9d77aed6 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sun, 17 May 2026 06:31:41 -0400
Subject: [PATCH 13/22] Address R12 polish (1 P2 + 3 P3) on roth-2022-review.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

R12 verdict was ✅ Looks good (third consecutive) — addressing remaining
polish items.

- P2 "uninformative pretests are unambiguously harmful" overstates II.D:
  Roth's Section II.D + Equation (4) say the net publication-screening
  effect is ambiguous, with underpowered pretests being "least effective
  and potentially harmful" — not unambiguously harmful in every
  parameterization. The old wording was also internally inconsistent
  with the review's own L266 summary ("net sign depends on which
  dominates"). Reworded the Publication-bias edge-case bullet to match
  the paper's own framing.
- P3 "adding more pre-periods does NOT help (and can hurt)" overstates
  Section I.D: paper only says additional distant pre-periods may not
  help / may be uninformative for near-treatment shocks. Softened to
  "may not help / may be uninformative" without the active-harm claim.
- P3 gamma_max = "largest |gamma| at which power = 1" is impossible:
  under the normal model power approaches 1 only asymptotically, so
  no finite gamma_max satisfies power(gamma_max) = 1 exactly. Replaced
  with a doubling-expansion + bisection bracketing strategy.
- P3 audit-notes vs registry-candidate boundary unclear: added an
  explicit HTML-comment boundary marker just before "## Implementation
  Notes", retitled that heading to "(audit notes — NOT registry-
  candidate)", and updated the Status banner to call out the boundary
  explicitly so tentative heuristics + provisional R-package claims
  cannot be misread as part of the proposed REGISTRY replacement text.
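
A minimal sketch of the doubling-expansion + bisection bracketing from
the gamma_max item above, for the K = 1 two-sided case only. The names
`norm_cdf`, `nis_power_k1`, and `solve_gamma_p` are hypothetical (they
are not functions in `diff_diff/pretrends.py`), and the sketch brackets
at power >= target_power rather than target_power + tolerance:

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def nis_power_k1(gamma, sigma=1.0, z_crit=1.959963984540054):
    """P(|beta_hat_pre| > z_crit * sigma) when beta_hat_pre ~ N(gamma, sigma^2)."""
    g = gamma / sigma
    return norm_cdf(-z_crit - g) + norm_cdf(g - z_crit)


def solve_gamma_p(power_fn, target_power, gamma_high=1.0,
                  tol=1e-10, max_doublings=60):
    """Smallest gamma >= 0 with power_fn(gamma) >= target_power (to within tol).

    Assumes power_fn is nondecreasing on [0, inf).
    """
    if power_fn(0.0) >= target_power:
        return 0.0
    # Doubling expansion: power only approaches 1 asymptotically, so
    # search for an upper bracket instead of a point where power == 1.
    for _ in range(max_doublings):
        if power_fn(gamma_high) >= target_power:
            break
        gamma_high *= 2.0
    else:
        raise RuntimeError("could not bracket target_power")
    # Bisection on [0, gamma_high].
    lo, hi = 0.0, gamma_high
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if power_fn(mid) < target_power:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


gamma_80 = solve_gamma_p(nis_power_k1, target_power=0.8)
print(gamma_80, nis_power_k1(gamma_80))
```

For K > 1 the same bracketing would apply with `power_fn` replaced by
whichever multivariate power calculation the audit settles on.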

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 docs/methodology/papers/roth-2022-review.md | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/docs/methodology/papers/roth-2022-review.md b/docs/methodology/papers/roth-2022-review.md
index b0df67e8..9e58af35 100644
--- a/docs/methodology/papers/roth-2022-review.md
+++ b/docs/methodology/papers/roth-2022-review.md
@@ -12,7 +12,7 @@
 
 **Status: proposed replacement text for a future REGISTRY update; this file is a non-authoritative source audit.** The current `## PreTrendsPower` entry in `docs/methodology/REGISTRY.md` is a populated block framed primarily around a joint-Wald pre-trends test; it remains the **sole authoritative methodology contract** until the follow-up audit PR for `compute_pretrends_power` (the helper function), `PreTrendsPower` (the estimator class), and `PreTrendsPowerResults` (the results container) in `diff_diff/pretrends.py` lands and revises it. The follow-up audit will assess which proposed parameters and capabilities below are already in the shipped surfaces. Current signatures (for reference): the helper `compute_pretrends_power(results, M, alpha, target_power, violation_type, pre_periods)` exposes `alpha`, `target_power`, `violation_type`, `pre_periods` (plus the optional violation magnitude `M`); the class `PreTrendsPower(alpha, power, violation_type, violation_weights)` exposes `alpha`, `power`, `violation_type`, `violation_weights`. The audit will reconcile these two surfaces with each other and against this proposed contract.
 
-*Formatted to match docs/methodology/REGISTRY.md structure. Heading levels and labels align with existing entries — once the follow-up audit is ready, the `## PreTrendsPower` section below can replace the existing registry entry.*
+*Formatted to match docs/methodology/REGISTRY.md structure. Heading levels and labels align with existing entries — once the follow-up audit is ready, the `## PreTrendsPower` section below can replace the existing registry entry. The registry-candidate text ends just before `## Implementation Notes`; everything below that boundary is **audit notes / implementation ideas** and is NOT part of the proposed registry replacement (it includes tentative heuristics, provisional R-package surface claims, and library design notes that should NOT be copied into REGISTRY.md as normative requirements).*
 
 ## PreTrendsPower
 
@@ -133,11 +133,11 @@ see "Library design choices" above for the proposed `method` / `n_sim` knobs.
 
 *Edge cases (paper-stated):*
 - **Linear vs nonlinear violations**: paper formally analyzes linear trends; Caveats (Section I.D) note results extend to monotone nonlinear violations under homoskedasticity (Proposition 2); arbitrarily nonlinear violations addressed heuristically — bias is worse for exponentially-growing trends, better for log/shallow trends as pre-periods grow
-- **Adding more pretreatment periods**: helps power for linear/log trends, does NOT help (and can hurt) for trends concentrated near treatment (e.g., COVID-19-like shocks)
+- **Adding more pretreatment periods**: helps power for linear/log trends; for trends concentrated near treatment (e.g., COVID-19-like shocks), Section I.D notes additional distant pre-periods may not help / may be uninformative — the paper does not assert that they actively *hurt*.
 - **K = 1 (single pre-period)**: explicit closed-form intuition via univariate truncated normal in proof of Proposition 2: E[beta_hat_pre | beta_hat_pre in B_NIS] - beta_pre proportional to phi(-z_{1-alpha/2} - beta_pre/sigma) - phi(z_{1-alpha/2} - beta_pre/sigma) (= phi(-1.96 - ...) - phi(1.96 - ...) at the paper's default alpha = 0.05)
 - **Symmetric two-sided pretests under parallel trends**: beta_hat_post remains UNBIASED for tau_post (E[beta_hat_pre | beta_hat_pre in B] = 0 if B is symmetric and beta_pre = 0)
 - **Heteroskedastic Sigma (off-diagonal not constant)**: Proposition 2 requires Assumption 1; under arbitrary Sigma, sign of pretest-bias term is ambiguous (worked out in Proposition 1's general form)
-- **Publication-bias trade-off (Equation 4, Section II.D)**: pretest-as-screen can REDUCE or INCREASE published bias depending on Bayes-factor of design type vs the bias-given-publication ratio; uninformative pretests are unambiguously harmful
+- **Publication-bias trade-off (Equation 4, Section II.D)**: pretest-as-screen can REDUCE or INCREASE published bias depending on the Bayes-factor of design type vs the bias-given-publication ratio; the net effect is ambiguous (Equation 4). The paper says underpowered pretests are "least effective and potentially harmful" — i.e., they are the worst-case regime, not unambiguously harmful in every parameterization.
 
 *Algorithm (no numbered algorithm in paper; implementation distilled from Section I.C):*
 
@@ -185,7 +185,13 @@ see "Library design choices" above for the proposed `method` / `n_sim` knobs.
 
 ---
 
-## Implementation Notes
+<!-- ============================================================
+     END of registry-candidate block. Everything BELOW this
+     marker is audit notes / implementation ideas and is NOT
+     part of the proposed REGISTRY.md replacement text.
+     ============================================================ -->
+
+## Implementation Notes (audit notes — NOT registry-candidate)
 
 ### Data Structure Requirements
 - **Input**: beta_hat in R^{K+M} (concatenated pre + post event-study coefficients), Sigma_hat in R^{(K+M) x (K+M)} (variance-covariance matrix), integer K (# pre-period coefficients), integer M (# post-period coefficients)
@@ -198,7 +204,7 @@ see "Library design choices" above for the proposed `method` / `n_sim` knobs.
 ### Computational Considerations
 - **Truncated MVN moments and probabilities**: scipy.stats has only the univariate case; library options for K > 1 are (a) port `tmvtnorm` (Manjunath-Wilhelm closed-form for orthant moments + Cartinhour 1990 for the rectangular box), (b) Monte Carlo simulation with rejection sampling. Recommend implementing both paths and validating equivalence at alpha-tol = 1e-3 for small K.
 - **Cost**: dominated by the multivariate normal box probability evaluations. As a *tentative heuristic* (not benchmarked in this review and not specified by the paper), analytical methods are typically fast for small K (e.g., K <= 5) and simulation may become preferable for larger K (e.g., K > 10); the follow-up audit should either benchmark these cutoffs locally or replace them with empirically-derived thresholds.
-- **Root-finding for gamma_p**: monotone function of gamma; use bisection over [0, gamma_max] with gamma_max derived from a univariate upper bound (largest |gamma| at which power = 1).
+- **Root-finding for gamma_p**: P(reject pretest | gamma) is monotone in |gamma|. Under the normal model power approaches 1 only asymptotically, so there is no finite gamma_max at which power equals 1 exactly. Use a doubling expansion (start with a univariate-derived gamma_high; double until P(reject pretest | gamma_high) >= target_power + tolerance), then bisect over [0, gamma_high] to find gamma_p.
 - **Memoization**: power and bias share intermediate quantities (truncated MVN moments); cache by gamma.
 
 ### Tuning Parameters

From 210fc41e022397f138525c8e7e3549faad3c4658 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sun, 17 May 2026 06:40:53 -0400
Subject: [PATCH 14/22] Address R13 polish (1 P2 + 2 P3) on roth-2022-review.md
 + sibling updates
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

R13 verdict was ✅ Looks good (4th consecutive) — addressing remaining
polish items.

- P2 Wald vs NIS power-object conflation: the CS/SA covariance-source
  Note said "the key MDV/power quantity is w' Σ_22^{-1} w". That scalar
  is the current library's Wald / noncentral-χ² object, not Roth's
  paper-analyzed NIS power object. Roth's NIS power is the multivariate
  box probability P(β̂_pre ∈ B_NIS(Σ)) computed under
  β̂_pre ~ N(δ_pre, Σ_22), and Proposition 1's pretest-bias term + Proposition
  3's conditional-variance term rely on truncated-MVN moments. Reworded
  the Note to separate the two surfaces: Roth's NIS box probability and
  the current library's Wald form are BOTH not guaranteed to be
  conservative under the diag fallback, but for different reasons, and
  the audit needs to pick one of them as the surface to promote.
- P3 scipy.stats over-restriction: replaced "scipy.stats has only the
  univariate case" with the accurate distinction —
  `scipy.stats.multivariate_normal.cdf` does cover box probabilities,
  but SciPy lacks a `tmvtnorm`-equivalent for the truncated-MVN MOMENTS
  that Propositions 1 + 3 need.
- P3 discoverability links (sibling updates, follows Bacon precedent at
  REGISTRY.md:2611):
  * REGISTRY.md `## PreTrendsPower` entry now includes
    `Paper review on file: docs/methodology/papers/roth-2022-review.md`
    inline with the primary source citation, with explicit note that
    REGISTRY remains authoritative until the follow-up audit lands.
  * METHODOLOGY_REVIEW.md PreTrendsPower section's "Documentation in
    place" list now records the paper review with the 2026-05-17 date;
    "Outstanding for promotion" has the paper-review bullet removed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 METHODOLOGY_REVIEW.md                       | 2 +-
 docs/methodology/REGISTRY.md                | 2 +-
 docs/methodology/papers/roth-2022-review.md | 8 ++++++--
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/METHODOLOGY_REVIEW.md b/METHODOLOGY_REVIEW.md
index 548b181f..36661161 100644
--- a/METHODOLOGY_REVIEW.md
+++ b/METHODOLOGY_REVIEW.md
@@ -1053,9 +1053,9 @@ and covariate-adjusted specifications.)
 **Documentation in place:**
 - REGISTRY.md section: `## PreTrendsPower` (MDV at target power, four violation types — linear/constant/last_period/custom, power curve plotting, HonestDiD integration)
 - Implementation: `tests/test_pretrends.py` (point-estimator, MDV, power curve, sensitivity) plus event-study coverage in `tests/test_pretrends_event_study.py`
+- Paper review on file: `docs/methodology/papers/roth-2022-review.md` (added 2026-05-17; non-authoritative source audit — registry entry remains authoritative until the follow-up audit PR)
 
 **Outstanding for promotion:**
-- Paper review under `docs/methodology/papers/roth-2022-review.md`
 - Dedicated `tests/test_methodology_pretrends.py` with paper-equation-numbered Verified Components walk-through
 - R parity fixture against the `pretrends` R package (the four power calculations: linear, constant, last-period, custom)
 - Verify the REGISTRY Implementation Checklist (all four items currently unchecked)
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
index 478716b8..762d07c0 100644
--- a/docs/methodology/REGISTRY.md
+++ b/docs/methodology/REGISTRY.md
@@ -2761,7 +2761,7 @@ CRITICAL: δ_pre = β_pre pins pre-treatment violations to observed coefficients
 
 ## PreTrendsPower
 
-**Primary source:** [Roth, J. (2022). Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends. *American Economic Review: Insights*, 4(3), 305-322.](https://doi.org/10.1257/aeri.20210236)
+**Primary source:** [Roth, J. (2022). Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends. *American Economic Review: Insights*, 4(3), 305-322.](https://doi.org/10.1257/aeri.20210236). Paper review on file: `docs/methodology/papers/roth-2022-review.md` (non-authoritative source audit; this REGISTRY entry remains the authoritative methodology contract).
 
 **Key implementation requirements:**
 
diff --git a/docs/methodology/papers/roth-2022-review.md b/docs/methodology/papers/roth-2022-review.md
index 9e58af35..ce5eebcb 100644
--- a/docs/methodology/papers/roth-2022-review.md
+++ b/docs/methodology/papers/roth-2022-review.md
@@ -202,7 +202,7 @@ see "Library design choices" above for the proposed `method` / `n_sim` knobs.
 - Compatible with the result classes of: MultiPeriodDiD (event study), CallawaySantAnna (staggered), SunAbraham (interaction-weighted), Freyaldenhoven-Hansen-Shapiro (covariate-based)
 
 ### Computational Considerations
-- **Truncated MVN moments and probabilities**: scipy.stats has only the univariate case; library options for K > 1 are (a) port `tmvtnorm` (Manjunath-Wilhelm closed-form for orthant moments + Cartinhour 1990 for the rectangular box), (b) Monte Carlo simulation with rejection sampling. Recommend implementing both paths and validating equivalence at alpha-tol = 1e-3 for small K.
+- **Truncated MVN moments and probabilities**: `scipy.stats.multivariate_normal.cdf` covers MVN box probabilities `P(β̂_pre ∈ B_NIS(Σ))`, but SciPy lacks a `tmvtnorm`-equivalent API for the truncated-MVN moments (`E[β̂_pre | β̂_pre ∈ B(Σ)]` and `Var[β̂_pre | β̂_pre ∈ B(Σ)]`) that Proposition 1's pretest-bias term and Proposition 3's conditional-variance term require. Library options for those moments are (a) port `tmvtnorm` (Manjunath-Wilhelm closed-form for orthant moments + Cartinhour 1990 for the rectangular box), (b) Monte Carlo simulation with rejection sampling. Recommend implementing both paths and validating equivalence at alpha-tol = 1e-3 for small K.
 - **Cost**: dominated by the multivariate normal box probability evaluations. As a *tentative heuristic* (not benchmarked in this review and not specified by the paper), analytical methods are typically fast for small K (e.g., K <= 5) and simulation may become preferable for larger K (e.g., K > 10); the follow-up audit should either benchmark these cutoffs locally or replace them with empirically-derived thresholds.
 - **Root-finding for gamma_p**: P(reject pretest | gamma) is monotone in |gamma|. Under the normal model power approaches 1 only asymptotically, so there is no finite gamma_max at which power equals 1 exactly. Use a doubling expansion (start with a univariate-derived gamma_high; double until P(reject pretest | gamma_high) >= target_power + tolerance), then bisect over [0, gamma_high] to find gamma_p.
 - **Memoization**: power and bias share intermediate quantities (truncated MVN moments); cache by gamma.
@@ -229,7 +229,11 @@ see "Library design choices" above for the proposed `method` / `n_sim` knobs.
   - `CallawaySantAnnaResults`: hard-codes `vcov = diag(ses^2)` (`pretrends.py:609-652`) even though the result object DOES persist an `event_study_vcov` matrix (`staggered_results.py:126-128`). This is a deliberate choice to drop the off-diagonals; the follow-up audit can change it to extract the full sub-VCV from `event_study_vcov`.
   - `SunAbrahamResults`: hard-codes `vcov = diag(ses^2)` (`pretrends.py:660-687`). Unlike CS, `SunAbrahamResults` does NOT currently expose an event-study or cohort covariance matrix (`sun_abraham.py:30-88`), so the diag fallback is *forced* — Roth-faithful off-diagonal support on the SA path first requires extending `SunAbrahamResults` to persist an event-study/cohort covariance matrix, then routing it through the adapter.
 
-  In all three cases the diagonal fallback is a **non-paper approximation**: Roth's power and bias formulas operate on the full pre-period covariance block Σ_22 (the key MDV/power quantity is `w' Σ_22^{-1} w`, which depends on the off-diagonals), and dropping the off-diagonals is NOT a paper-supported numerical choice and is NOT guaranteed to be conservative for MDV/power (the direction of the discrepancy depends on the sign and magnitude of the dropped correlations). The follow-up audit should either extend full-sub-VCV consumption to all three paths (with SA also requiring upstream surface work) or, if the diag fallback is retained anywhere, add an explicit `REGISTRY.md` Note describing the approximation and its possible miscalibration rather than framing it as conservative.
+  In all three cases the diagonal fallback is a **non-paper approximation**, and the impact differs by which power object the audit chooses to surface:
+  - **Roth's NIS power object** (paper-analyzed): the multivariate box probability `P(β̂_pre ∈ B_NIS(Σ))` computed under `β̂_pre ~ N(δ_pre, Σ_22)`. This box probability genuinely depends on the off-diagonals of Σ_22; replacing Σ_22 with `diag(ses^2)` treats the pre-periods as independent and changes the rejection probability in a sign-and-magnitude-dependent way (not provably conservative).
+  - **Current library Wald object** (in shipped `compute_pretrends_power`): a Wald / noncentral-χ² calculation involving the quadratic form `w' Σ_22^{-1} w`. Replacing Σ_22 with `diag(ses^2)` ignores the off-diagonal correlations of the score vector and is similarly not provably conservative for MDV/power.
+
+  In neither object is the diag fallback a paper-supported numerical choice. The follow-up audit should either extend full-sub-VCV consumption to all three paths (with SA also requiring upstream surface work) or, if the diag fallback is retained anywhere, add an explicit `REGISTRY.md` Note describing the approximation and its possible miscalibration rather than framing it as conservative.
 - **Complement to `HonestDiD` (Rambachan-Roth 2023)**: Roth 2022 asks "what bias survives a pretest under linear violations?"; Rambachan-Roth 2023 asks "what is the identified set of tau_post under bounded violations?" Both use the same (beta_hat, Sigma_hat) input contract — the library should expose a unified entry-point that can produce both Roth-2022 and HonestDiD reports from one event-study result object.
 - **Shares zero-anticipation convention with HonestDiD**: tau_pre = 0, so beta_pre = delta_pre. Cross-reference the existing `diff_diff/honest_did.py` for the contract.
 

From 173d7f05fe80e310fa17e4b0ef78d12b967e01c7 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sun, 17 May 2026 06:48:31 -0400
Subject: [PATCH 15/22] Address R14 review (1 P1 + 1 P2 + 1 P3)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

R13's REGISTRY pointer pulled the new audit's diag-fallback finding into
authoritative scope; R14 flagged that the authoritative `## PreTrendsPower`
REGISTRY entry didn't yet document the deviation. Adding the inline
documentation per the CLAUDE.md "Documenting Deviations" convention.

- P1 REGISTRY missing `**Note (deviation from paper):**` for diag-VCV
  fallback: added a Note inside the `## PreTrendsPower` Edge cases block
  documenting all three result-type paths (MultiPeriodDiD: full sub-VCV
  when interaction_indices populated, diag otherwise; CS: diag fallback
  even with event_study_vcov available — deliberate; SA: diag fallback
  forced — SunAbrahamResults has no event-study VCV surface). The Note
  cross-references the paper review file and the new TODO entries.
- P2 `compute_pretrends_power(..., violation_type="custom")` is broken
  today: added an explicit "**Helper/class API gap observed today**"
  call-out in the Status block of the paper review, naming the missing
  `violation_weights` parameter on the helper and clarifying that
  `"custom"` is class-only today.
- P3 R-package version pin + diag-fallback follow-up: added three TODO.md
  rows under "Methodology/Correctness":
  * Diag fallback follow-up (Medium) — route CS through event_study_vcov +
    extend SunAbrahamResults with event-study/cohort VCV, or formally
    retain diag with miscalibration framing.
  * R pretrends commit pin (Low) — record the audited revision before
    building parity fixture.
  * compute_pretrends_power "custom" helper gap (Low) — add
    violation_weights to helper signature or document helper as supporting
    only linear/constant/last_period.

All three TODO rows tagged "PR-A (Roth paper review, 2026-05-17)" for
provenance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 TODO.md                                     | 3 +++
 docs/methodology/REGISTRY.md                | 7 +++++++
 docs/methodology/papers/roth-2022-review.md | 2 +-
 3 files changed, 11 insertions(+), 1 deletion(-)
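
Reviewer note (illustrative only, not part of the commit): a minimal sketch of
why the diag fallback is not provably conservative for the Wald object. It
assumes two pre-periods, a hypothetical weight vector `w = (2, 1)`, unit
standard errors, and a correlation `rho`; the function names are illustrative
and are not part of `diff_diff`. Under the usual noncentral-χ² reading, the
noncentrality grows with `w' Σ_22^{-1} w`, so comparing that quadratic form
under the full matrix and under `diag(ses^2)` shows the direction of the
discrepancy.

```python
def quad_form_2x2(w, cov):
    """Return w' cov^{-1} w for a 2x2 covariance matrix (closed-form inverse)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    w1, w2 = w
    # cov^{-1} = (1/det) * [[d, -b], [-c, a]]
    return (d * w1 * w1 - (b + c) * w1 * w2 + a * w2 * w2) / det


def full_vs_diag(w, ses, rho):
    """Quadratic form under the full covariance and under diag(ses^2)."""
    s1, s2 = ses
    full = [[s1 * s1, rho * s1 * s2], [rho * s1 * s2, s2 * s2]]
    diag = [[s1 * s1, 0.0], [0.0, s2 * s2]]
    return quad_form_2x2(w, full), quad_form_2x2(w, diag)


w = (2.0, 1.0)     # hypothetical violation weights for two pre-periods
ses = (1.0, 1.0)   # hypothetical pre-period standard errors

for rho in (-0.5, 0.5, 0.9):
    full, diag = full_vs_diag(w, ses, rho)
    direction = "overstates" if diag > full else "understates"
    print(f"rho={rho:+.1f}: full={full:.3f}, diag={diag:.3f} -> diag {direction}")
```

With these inputs the diagonal value is 5 throughout, while the full-matrix
value is roughly 9.33, 4.00, and 7.37 for `rho` = -0.5, 0.5, and 0.9, so the
diag fallback both understates and overstates depending on the sign and
magnitude of the dropped correlation.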

diff --git a/TODO.md b/TODO.md
index b1079ad3..c83448c6 100644
--- a/TODO.md
+++ b/TODO.md
@@ -94,6 +94,9 @@ Deferred items from PR reviews that were not addressed before merge.
 | WooldridgeDiD: aggregation weights use cell-level n_{g,t} counts. Paper (W2025 Eqs. 7.2-7.4) defines cohort-share weights. Add optional `weights="cohort_share"` parameter to `aggregate()`. | `wooldridge_results.py` | #216 | Medium |
 | WooldridgeDiD: optional *efficiency hint* (NOT a canonical-link violation per W2023 Prop 3.1) when method/outcome pairing is sub-optimal — e.g., `method="ols"` on binary data is consistent under QMLE, but `method="logit"` is typically more efficient. The original framing in this row as a "canonical link requirement" tied to Prop 3.1 was incorrect: Wooldridge (2023) Table 1 lists Gaussian/OLS for "any response" and logistic-Bernoulli for "binary OR fractional". A useful hint exists (efficiency), but should not be framed as a methodology violation. See PR #453 R1 review for the corrected reading. | `wooldridge.py` | #216 | Low |
 | WooldridgeDiD: Stata `jwdid` golden value tests — add R/Stata reference script and `TestReferenceValues` class. | `tests/test_wooldridge.py` | #216 | Medium |
+| PreTrendsPower: `compute_pretrends_power` adapter uses `diag(ses^2)` instead of the full pre-period covariance block Σ_22 for `CallawaySantAnnaResults` (deliberate — CS persists `event_study_vcov`) and `SunAbrahamResults` (forced — SA does not expose an event-study/cohort VCV). Roth (2022)'s NIS box probability and the library's Wald object both depend on Σ_22 off-diagonals; diag fallback is not provably conservative. Either route CS through `event_study_vcov` + extend `SunAbrahamResults` to persist a cohort/event-study VCV (then route SA likewise) or formally retain the diag fallback with explicit miscalibration framing. See REGISTRY.md `## PreTrendsPower` Note (deviation from paper) + `docs/methodology/papers/roth-2022-review.md`. | `diff_diff/pretrends.py:609-687`, `diff_diff/sun_abraham.py:30-88`, `docs/methodology/REGISTRY.md`, `docs/methodology/papers/roth-2022-review.md` | PR-A (Roth paper review, 2026-05-17) | Medium |
+| PreTrendsPower: pin the R `pretrends` package commit/release before building the R-parity fixture. The paper review's R-package surface claims (`pretrends()`, `slope_for_power()`, NIS-only API, no joint-Wald target) are provisional pending a pinned revision; the audited revision should be recorded either in the review file's Gaps section or in this TODO row before any parity assertions are committed. | `docs/methodology/papers/roth-2022-review.md`, `METHODOLOGY_REVIEW.md` (PreTrendsPower row) | PR-A (Roth paper review, 2026-05-17) | Low |
+| PreTrendsPower: helper `compute_pretrends_power(results, M, alpha, target_power, violation_type, pre_periods)` does NOT accept `violation_weights`, so `violation_type="custom"` is unusable from the helper (class-only today via `PreTrendsPower(..., violation_weights=...)`). Either add `violation_weights` to the helper signature and forward to the class, or document the helper as supporting only `linear` / `constant` / `last_period`. | `diff_diff/pretrends.py:1048-1095, 442-466` | PR-A (Roth paper review, 2026-05-17) | Low |
 | Thread `vcov_type` (classical / hc1 / hc2 / hc2_bm) through the 8 standalone estimators that expose `cluster=`: `CallawaySantAnna`, `SunAbraham`, `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `StackedDiD`, `WooldridgeDiD`, `EfficientDiD`. Phase 1a added `vcov_type` to the `DifferenceInDifferences` inheritance chain only. | multiple | Phase 1a | Medium |
 | Weighted one-way Bell-McCaffrey (`vcov_type="hc2_bm"` + `weights`, no cluster) currently raises `NotImplementedError`. `_compute_bm_dof_from_contrasts` builds its hat matrix from the unscaled design via `X (X'WX)^{-1} X' W`, but `solve_ols` solves the WLS problem by transforming to `X* = sqrt(w) X`, so the correct symmetric idempotent residual-maker is `M* = I - sqrt(W) X (X'WX)^{-1} X' sqrt(W)`. Rederive the Satterthwaite `(tr G)^2 / tr(G^2)` ratio on the transformed design and add weighted parity tests before lifting the guard. | `linalg.py::_compute_bm_dof_from_contrasts`, `linalg.py::_validate_vcov_args` | Phase 1a | Medium |
 | HC2 / HC2 + Bell-McCaffrey on absorbed-FE fits — REMAINING sub-gates: `TwoWayFixedEffects` (`twfe.py:154` rejects unconditionally); `MultiPeriodDiD(absorb=..., vcov_type in {"hc2","hc2_bm"})` (`estimators.py:1458` rejects). The DiD sub-gate (`DifferenceInDifferences(absorb=..., vcov_type in {"hc2","hc2_bm"})`) was lifted via auto-route to `fixed_effects=` internally; clubSandwich-parity at 1e-10 verified. The same auto-route pattern can apply to MPD-absorb; TWFE is its own class and may need different surgery (TWFE always within-transforms with no equivalent `fixed_effects=` path). Within-transformation preserves coefficients and residuals under FWL but not the hat matrix; HC1/CR1 are unaffected (no leverage term). | `twfe.py::fit`, `estimators.py::MultiPeriodDiD.fit` | follow-up | Medium |
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
index 762d07c0..c7c6dac1 100644
--- a/docs/methodology/REGISTRY.md
+++ b/docs/methodology/REGISTRY.md
@@ -2802,6 +2802,13 @@ Violation types:
 - Single pre-period: power calculation trivial
 - Very high power: MDV approaches zero
 
+- **Note (deviation from paper — diagonal pre-period VCV fallback):** Roth (2022)'s power and bias objects (both the paper-analyzed NIS box probability and the library's Wald / noncentral-χ² form) operate on the full pre-period covariance block Σ_22. The shipped `compute_pretrends_power` adapter currently uses different sources for the pre-period covariance by result type:
+  - `MultiPeriodDiDResults` (`pretrends.py:592-601`): extracts the full pre-period sub-block from `results.vcov` when `interaction_indices` is populated; falls back to `diag(ses^2)` otherwise.
+  - `CallawaySantAnnaResults` (`pretrends.py:609-652`): hard-codes `vcov = diag(ses^2)` even though `staggered_results.py:126-128` persists a full `event_study_vcov` matrix — the diag fallback is a deliberate choice here.
+  - `SunAbrahamResults` (`pretrends.py:660-687`): hard-codes `vcov = diag(ses^2)`; the diag fallback is *forced* because `SunAbrahamResults` does not currently expose an event-study or cohort covariance matrix.
+
+  Dropping the off-diagonals is NOT a paper-supported numerical choice and is NOT guaranteed to be conservative for MDV/power (the direction of the discrepancy depends on the sign and magnitude of the dropped correlations). The PR-B follow-up audit (tracked in `TODO.md`) will either extend full-sub-VCV consumption to all three paths (with SA also requiring upstream surface work on `SunAbrahamResults`) or formally retain the diag fallback with explicit miscalibration framing. See `docs/methodology/papers/roth-2022-review.md` for the full derivation.
+
 **Reference implementation(s):**
 - R: `pretrends` package (Roth's official package)
 
diff --git a/docs/methodology/papers/roth-2022-review.md b/docs/methodology/papers/roth-2022-review.md
index ce5eebcb..f70b845c 100644
--- a/docs/methodology/papers/roth-2022-review.md
+++ b/docs/methodology/papers/roth-2022-review.md
@@ -10,7 +10,7 @@
 
 ## Methodology Registry Entry
 
-**Status: proposed replacement text for a future REGISTRY update; this file is a non-authoritative source audit.** The current `## PreTrendsPower` entry in `docs/methodology/REGISTRY.md` is a populated block framed primarily around a joint-Wald pre-trends test; it remains the **sole authoritative methodology contract** until the follow-up audit PR for `compute_pretrends_power` (the helper function), `PreTrendsPower` (the estimator class), and `PreTrendsPowerResults` (the results container) in `diff_diff/pretrends.py` lands and revises it. The follow-up audit will assess which proposed parameters and capabilities below are already in the shipped surfaces. Current signatures (for reference): the helper `compute_pretrends_power(results, M, alpha, target_power, violation_type, pre_periods)` exposes `alpha`, `target_power`, `violation_type`, `pre_periods` (plus the optional violation magnitude `M`); the class `PreTrendsPower(alpha, power, violation_type, violation_weights)` exposes `alpha`, `power`, `violation_type`, `violation_weights`. The audit will reconcile these two surfaces with each other and against this proposed contract.
+**Status: proposed replacement text for a future REGISTRY update; this file is a non-authoritative source audit.** The current `## PreTrendsPower` entry in `docs/methodology/REGISTRY.md` is a populated block framed primarily around a joint-Wald pre-trends test; it remains the **sole authoritative methodology contract** until the follow-up audit PR for `compute_pretrends_power` (the helper function), `PreTrendsPower` (the estimator class), and `PreTrendsPowerResults` (the results container) in `diff_diff/pretrends.py` lands and revises it. The follow-up audit will assess which proposed parameters and capabilities below are already in the shipped surfaces. Current signatures (for reference): the helper `compute_pretrends_power(results, M, alpha, target_power, violation_type, pre_periods)` exposes `alpha`, `target_power`, `violation_type`, `pre_periods` (plus the optional violation magnitude `M`); the class `PreTrendsPower(alpha, power, violation_type, violation_weights)` exposes `alpha`, `power`, `violation_type`, `violation_weights`. **Helper/class API gap observed today:** the helper does NOT accept or forward `violation_weights`, so calling `compute_pretrends_power(..., violation_type="custom")` cannot supply a custom weight vector — the `"custom"` path is class-only today. The audit will reconcile these two surfaces with each other and against this proposed contract.
 
 *Formatted to match docs/methodology/REGISTRY.md structure. Heading levels and labels align with existing entries — once the follow-up audit is ready, the `## PreTrendsPower` section below can replace the existing registry entry. The registry-candidate text ends just before `## Implementation Notes`; everything below that boundary is **audit notes / implementation ideas** and is NOT part of the proposed registry replacement (it includes tentative heuristics, provisional R-package surface claims, and library design notes that should NOT be copied into REGISTRY.md as normative requirements).*
 

From 91852047510d010ea679210a9345524005876e38 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sun, 17 May 2026 06:53:08 -0400
Subject: [PATCH 16/22] Address R15 polish (2 P3) on Roth audit suite
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

R15 verdict was ✅ Looks good (5th overall ✅) — addressing remaining polish.

- P3 CS event_study_vcov availability stated too broadly: previous wording
  in REGISTRY.md, roth-2022-review.md, and TODO.md said
  CallawaySantAnnaResults "persists event_study_vcov", but bootstrap CS
  fits explicitly clear that matrix at staggered.py:2032-2036 (to prevent
  mixing analytical VCV with bootstrap SEs). Qualified all three surfaces
  to split non-bootstrap (persists event_study_vcov, diag fallback is a
  deliberate choice) vs bootstrap (clears event_study_vcov, diag fallback
  is the only path) CS paths.
- P3 "four power calculations" parity target underspecified: the
  PreTrendsPower row in METHODOLOGY_REVIEW.md said future R parity should
  cover "the four power calculations" without acknowledging that
  `compute_pretrends_power(..., violation_type="custom")` is unusable from
  the helper today (helper does not accept violation_weights). Added
  clarification that "custom" parity has to run through PreTrendsPower
  directly until the helper is extended; helper-only parity is limited to
  linear/constant/last_period.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 METHODOLOGY_REVIEW.md                       | 2 +-
 TODO.md                                     | 2 +-
 docs/methodology/REGISTRY.md                | 2 +-
 docs/methodology/papers/roth-2022-review.md | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)
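
Reviewer note (illustrative only, not part of the commit): the non-bootstrap
vs bootstrap split described in this message can be sketched as a
covariance-source selector. `FakeCSResults` and `pre_period_cov` are
hypothetical stand-ins, not the shipped adapter or result class. The only
facts taken from this patch are that non-bootstrap CS fits persist
`event_study_vcov` and that bootstrap fits clear it; the sketch additionally
assumes the cleared matrix is stored as `None` and that the matrix is already
restricted to the pre-periods.

```python
from dataclasses import dataclass
from typing import List, Optional

Matrix = List[List[float]]


@dataclass
class FakeCSResults:
    """Hypothetical stand-in for a CS result object (not the library class)."""
    ses: List[float]                    # pre-period standard errors
    event_study_vcov: Optional[Matrix]  # assumed None when a bootstrap fit cleared it


def pre_period_cov(results: FakeCSResults) -> Matrix:
    """Prefer the full pre-period covariance; fall back to diag(ses^2)."""
    k = len(results.ses)
    vcov = results.event_study_vcov
    if vcov is not None:
        if len(vcov) != k or any(len(row) != k for row in vcov):
            raise ValueError("event_study_vcov must be k x k for k pre-periods")
        # Copy so the caller cannot mutate the matrix held by the result object.
        return [list(row) for row in vcov]
    # Bootstrap path (matrix cleared): only the diagonal is recoverable.
    return [[results.ses[i] ** 2 if i == j else 0.0 for j in range(k)]
            for i in range(k)]


analytical = FakeCSResults(ses=[1.0, 2.0],
                           event_study_vcov=[[1.0, 0.6], [0.6, 4.0]])
bootstrap = FakeCSResults(ses=[1.0, 2.0], event_study_vcov=None)
```

Calling `pre_period_cov(analytical)` returns the full matrix with its
off-diagonals, while `pre_period_cov(bootstrap)` returns the diagonal fallback.
Whether the real adapter should branch this way is a PR-B audit decision.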

diff --git a/METHODOLOGY_REVIEW.md b/METHODOLOGY_REVIEW.md
index 36661161..c55cbd24 100644
--- a/METHODOLOGY_REVIEW.md
+++ b/METHODOLOGY_REVIEW.md
@@ -1057,7 +1057,7 @@ and covariate-adjusted specifications.)
 
 **Outstanding for promotion:**
 - Dedicated `tests/test_methodology_pretrends.py` with paper-equation-numbered Verified Components walk-through
-- R parity fixture against the `pretrends` R package (the four power calculations: linear, constant, last-period, custom)
+- R parity fixture against the `pretrends` R package (the four power calculations: linear, constant, last-period, custom). Note that `compute_pretrends_power` does not accept `violation_weights` today, so `"custom"` parity has to run through `PreTrendsPower(..., violation_weights=...)` directly until the helper is extended (TODO.md tracks the helper-extension follow-up); helper-only parity is limited to `linear` / `constant` / `last_period`.
 - Verify the REGISTRY Implementation Checklist (all four items currently unchecked)
 
 ---
diff --git a/TODO.md b/TODO.md
index c83448c6..e3db83c2 100644
--- a/TODO.md
+++ b/TODO.md
@@ -94,7 +94,7 @@ Deferred items from PR reviews that were not addressed before merge.
 | WooldridgeDiD: aggregation weights use cell-level n_{g,t} counts. Paper (W2025 Eqs. 7.2-7.4) defines cohort-share weights. Add optional `weights="cohort_share"` parameter to `aggregate()`. | `wooldridge_results.py` | #216 | Medium |
 | WooldridgeDiD: optional *efficiency hint* (NOT a canonical-link violation per W2023 Prop 3.1) when method/outcome pairing is sub-optimal — e.g., `method="ols"` on binary data is consistent under QMLE, but `method="logit"` is typically more efficient. The original framing in this row as a "canonical link requirement" tied to Prop 3.1 was incorrect: Wooldridge (2023) Table 1 lists Gaussian/OLS for "any response" and logistic-Bernoulli for "binary OR fractional". A useful hint exists (efficiency), but should not be framed as a methodology violation. See PR #453 R1 review for the corrected reading. | `wooldridge.py` | #216 | Low |
 | WooldridgeDiD: Stata `jwdid` golden value tests — add R/Stata reference script and `TestReferenceValues` class. | `tests/test_wooldridge.py` | #216 | Medium |
-| PreTrendsPower: `compute_pretrends_power` adapter uses `diag(ses^2)` instead of the full pre-period covariance block Σ_22 for `CallawaySantAnnaResults` (deliberate — CS persists `event_study_vcov`) and `SunAbrahamResults` (forced — SA does not expose an event-study/cohort VCV). Roth (2022)'s NIS box probability and the library's Wald object both depend on Σ_22 off-diagonals; diag fallback is not provably conservative. Either route CS through `event_study_vcov` + extend `SunAbrahamResults` to persist a cohort/event-study VCV (then route SA likewise) or formally retain the diag fallback with explicit miscalibration framing. See REGISTRY.md `## PreTrendsPower` Note (deviation from paper) + `docs/methodology/papers/roth-2022-review.md`. | `diff_diff/pretrends.py:609-687`, `diff_diff/sun_abraham.py:30-88`, `docs/methodology/REGISTRY.md`, `docs/methodology/papers/roth-2022-review.md` | PR-A (Roth paper review, 2026-05-17) | Medium |
+| PreTrendsPower: `compute_pretrends_power` adapter uses `diag(ses^2)` instead of the full pre-period covariance block Σ_22 for `CallawaySantAnnaResults` (deliberate — non-bootstrap CS persists `event_study_vcov`; bootstrap CS fits clear it at `staggered.py:2032-2036`) and `SunAbrahamResults` (forced — SA does not expose an event-study/cohort VCV at all). Roth (2022)'s NIS box probability and the library's Wald object both depend on Σ_22 off-diagonals; diag fallback is not provably conservative. For non-bootstrap CS fits, route through `event_study_vcov`; for bootstrap CS fits the diag fallback is the only path. For SA, extend `SunAbrahamResults` to persist a cohort/event-study VCV (then route the adapter likewise). Or formally retain the diag fallback with explicit miscalibration framing. See REGISTRY.md `## PreTrendsPower` Note (deviation from paper) + `docs/methodology/papers/roth-2022-review.md`. | `diff_diff/pretrends.py:609-687`, `diff_diff/sun_abraham.py:30-88`, `docs/methodology/REGISTRY.md`, `docs/methodology/papers/roth-2022-review.md` | PR-A (Roth paper review, 2026-05-17) | Medium |
 | PreTrendsPower: pin the R `pretrends` package commit/release before building the R-parity fixture. The paper review's R-package surface claims (`pretrends()`, `slope_for_power()`, NIS-only API, no joint-Wald target) are provisional pending a pinned revision; the audited revision should be recorded either in the review file's Gaps section or in this TODO row before any parity assertions are committed. | `docs/methodology/papers/roth-2022-review.md`, `METHODOLOGY_REVIEW.md` (PreTrendsPower row) | PR-A (Roth paper review, 2026-05-17) | Low |
 | PreTrendsPower: helper `compute_pretrends_power(results, M, alpha, target_power, violation_type, pre_periods)` does NOT accept `violation_weights`, so `violation_type="custom"` is unusable from the helper (class-only today via `PreTrendsPower(..., violation_weights=...)`). Either add `violation_weights` to the helper signature and forward to the class, or document the helper as supporting only `linear` / `constant` / `last_period`. | `diff_diff/pretrends.py:1048-1095, 442-466` | PR-A (Roth paper review, 2026-05-17) | Low |
 | Thread `vcov_type` (classical / hc1 / hc2 / hc2_bm) through the 8 standalone estimators that expose `cluster=`: `CallawaySantAnna`, `SunAbraham`, `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `StackedDiD`, `WooldridgeDiD`, `EfficientDiD`. Phase 1a added `vcov_type` to the `DifferenceInDifferences` inheritance chain only. | multiple | Phase 1a | Medium |
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
index c7c6dac1..a6970b0f 100644
--- a/docs/methodology/REGISTRY.md
+++ b/docs/methodology/REGISTRY.md
@@ -2804,7 +2804,7 @@ Violation types:
 
 - **Note (deviation from paper — diagonal pre-period VCV fallback):** Roth (2022)'s power and bias objects (both the paper-analyzed NIS box probability and the library's Wald / noncentral-χ² form) operate on the full pre-period covariance block Σ_22. The shipped `compute_pretrends_power` adapter currently uses different sources for the pre-period covariance by result type:
   - `MultiPeriodDiDResults` (`pretrends.py:592-601`): extracts the full pre-period sub-block from `results.vcov` when `interaction_indices` is populated; falls back to `diag(ses^2)` otherwise.
-  - `CallawaySantAnnaResults` (`pretrends.py:609-652`): hard-codes `vcov = diag(ses^2)` even though `staggered_results.py:126-128` persists a full `event_study_vcov` matrix — the diag fallback is a deliberate choice here.
+  - `CallawaySantAnnaResults` (`pretrends.py:609-652`): hard-codes `vcov = diag(ses^2)`. Non-bootstrap CS fits persist a full `event_study_vcov` matrix (`staggered_results.py:126-128`), so the diag fallback is a deliberate choice in that path. Bootstrap CS fits clear `event_study_vcov` before storing results (`staggered.py:2032-2036`) to prevent mixing analytical VCV with bootstrap SEs, so the full-Σ22 route is not available for bootstrap fits at all.
   - `SunAbrahamResults` (`pretrends.py:660-687`): hard-codes `vcov = diag(ses^2)`; the diag fallback is *forced* because `SunAbrahamResults` does not currently expose an event-study or cohort covariance matrix.
 
   Dropping the off-diagonals is NOT a paper-supported numerical choice and is NOT guaranteed to be conservative for MDV/power (the direction of the discrepancy depends on the sign and magnitude of the dropped correlations). The PR-B follow-up audit (tracked in `TODO.md`) will either extend full-sub-VCV consumption to all three paths (with SA also requiring upstream surface work on `SunAbrahamResults`) or formally retain the diag fallback with explicit miscalibration framing. See `docs/methodology/papers/roth-2022-review.md` for the full derivation.
diff --git a/docs/methodology/papers/roth-2022-review.md b/docs/methodology/papers/roth-2022-review.md
index f70b845c..fd9262b0 100644
--- a/docs/methodology/papers/roth-2022-review.md
+++ b/docs/methodology/papers/roth-2022-review.md
@@ -226,7 +226,7 @@ see "Library design choices" above for the proposed `method` / `n_sim` knobs.
 - **Currently composes with** (per the shipped `compute_pretrends_power` adapter in `diff_diff/pretrends.py`): `MultiPeriodDiDResults`, `CallawaySantAnnaResults`, `SunAbrahamResults`. The adapter raises `TypeError` for other result types. Theoretical compatibility extends to any estimator producing an event-study coefficient vector and a consistent variance estimator (e.g., `TwoWayFixedEffects`), but adapters for additional result families are a follow-up audit decision.
 - **Note (deviation in current covariance-source):** the shipped adapter uses different sources for the pre-period covariance depending on the result type:
   - `MultiPeriodDiDResults`: extracts the full pre-period sub-block from `results.vcov` when `interaction_indices` is populated, falling back to `diag(ses^2)` otherwise (`pretrends.py:592-601`).
-  - `CallawaySantAnnaResults`: hard-codes `vcov = diag(ses^2)` (`pretrends.py:609-652`) even though the result object DOES persist an `event_study_vcov` matrix (`staggered_results.py:126-128`). This is a deliberate choice to drop the off-diagonals; the follow-up audit can change it to extract the full sub-VCV from `event_study_vcov`.
+  - `CallawaySantAnnaResults`: hard-codes `vcov = diag(ses^2)` (`pretrends.py:609-652`). Non-bootstrap CS fits persist a full `event_study_vcov` matrix on the result object (`staggered_results.py:126-128`), so the diag fallback is a deliberate choice in that path. Bootstrap CS fits explicitly clear `event_study_vcov` before storing results (`staggered.py:2032-2036`) to prevent mixing analytical VCV with bootstrap SEs, so the full-Σ22 route is not available for bootstrap fits at all. The follow-up audit can extract the full sub-VCV from `event_study_vcov` on the non-bootstrap path; bootstrap CS fits remain on the diag fallback regardless.
   - `SunAbrahamResults`: hard-codes `vcov = diag(ses^2)` (`pretrends.py:660-687`). Unlike CS, `SunAbrahamResults` does NOT currently expose an event-study or cohort covariance matrix (`sun_abraham.py:30-88`), so the diag fallback is *forced* — Roth-faithful off-diagonal support on the SA path first requires extending `SunAbrahamResults` to persist an event-study/cohort covariance matrix, then routing it through the adapter.
 
   In all three cases the diagonal fallback is a **non-paper approximation**, and the impact differs by which power object the audit chooses to surface:

From a6f11d8c79744ac017137df6d5f9bb9e540bcd5d Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sun, 17 May 2026 06:58:58 -0400
Subject: [PATCH 17/22] Address R16 polish (2 P3-informational, 2 no-action) on
 PreTrendsPower
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

R16 verdict was ✅ Looks good (6th overall ✅), with all findings classified
P3-informational. Two of the four findings ("None in this PR" / "TODO rows
are correct") required no action; addressing the two with concrete polish
suggestions.

- P3 helper docstring: `compute_pretrends_power` and `compute_mdv` both
  accept `violation_type` but don't accept `violation_weights`, so
  `violation_type="custom"` is unusable from either helper today. Added
  an explicit Note in the `violation_type` docstring entry of both
  convenience functions pointing users to instantiate `PreTrendsPower`
  directly for custom weights, and cross-referencing the TODO row that
  tracks the helper-extension follow-up.
- P3 METHODOLOGY_REVIEW.md "four power calculations" target: now reads
  "at a pinned revision" with cross-reference to the TODO row that
  tracks the R-package revision pin. Until that lands, the R-package
  surface claims in the paper review remain provisional.

Pyright diagnostics in pretrends.py (matplotlib import, numpy/list type
conflicts, ndarray-to-tuple coercion) are pre-existing in code I did not
touch; my edits are docstring-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 METHODOLOGY_REVIEW.md  |  2 +-
 diff_diff/pretrends.py | 14 ++++++++++++--
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/METHODOLOGY_REVIEW.md b/METHODOLOGY_REVIEW.md
index c55cbd24..ffe4f720 100644
--- a/METHODOLOGY_REVIEW.md
+++ b/METHODOLOGY_REVIEW.md
@@ -1057,7 +1057,7 @@ and covariate-adjusted specifications.)
 
 **Outstanding for promotion:**
 - Dedicated `tests/test_methodology_pretrends.py` with paper-equation-numbered Verified Components walk-through
-- R parity fixture against the `pretrends` R package (the four power calculations: linear, constant, last-period, custom). Note that `compute_pretrends_power` does not accept `violation_weights` today, so `"custom"` parity has to run through `PreTrendsPower(..., violation_weights=...)` directly until the helper is extended (TODO.md tracks the helper-extension follow-up); helper-only parity is limited to `linear` / `constant` / `last_period`.
+- R parity fixture against the `pretrends` R package at a **pinned revision** (TODO.md tracks the revision-pin follow-up; until that lands, the R-package surface claims in `docs/methodology/papers/roth-2022-review.md` are provisional). Covers the four power calculations: linear, constant, last-period, custom. Note that `compute_pretrends_power` does not accept `violation_weights` today, so `"custom"` parity has to run through `PreTrendsPower(..., violation_weights=...)` directly until the helper is extended (TODO.md tracks the helper-extension follow-up); helper-only parity is limited to `linear` / `constant` / `last_period`.
 - Verify the REGISTRY Implementation Checklist (all four items currently unchecked)
 
 ---
diff --git a/diff_diff/pretrends.py b/diff_diff/pretrends.py
index b249cef6..a4fcc40f 100644
--- a/diff_diff/pretrends.py
+++ b/diff_diff/pretrends.py
@@ -1067,7 +1067,12 @@ def compute_pretrends_power(
     target_power : float, default=0.80
         Target power for MDV calculation.
     violation_type : str, default='linear'
-        Type of violation pattern.
+        Type of violation pattern. Note: this convenience helper does NOT
+        accept ``violation_weights``, so ``violation_type='custom'`` is
+        unusable from the helper (only ``linear`` / ``constant`` /
+        ``last_period`` are supported here). For custom weights, instantiate
+        ``PreTrendsPower(..., violation_weights=...)`` directly. Tracked in
+        TODO.md as a planned helper extension.
     pre_periods : list of int, optional
         Explicit list of pre-treatment periods. If None, attempts to infer
         from results. Use when you've estimated all periods as post_periods.
@@ -1114,7 +1119,12 @@ def compute_mdv(
     target_power : float, default=0.80
         Target power for MDV calculation.
     violation_type : str, default='linear'
-        Type of violation pattern.
+        Type of violation pattern. Note: this convenience helper does NOT
+        accept ``violation_weights``, so ``violation_type='custom'`` is
+        unusable from the helper (only ``linear`` / ``constant`` /
+        ``last_period`` are supported here). For custom weights, instantiate
+        ``PreTrendsPower(..., violation_weights=...)`` directly. Tracked in
+        TODO.md as a planned helper extension.
     pre_periods : list of int, optional
         Explicit list of pre-treatment periods. If None, attempts to infer
         from results. Use when you've estimated all periods as post_periods.

From 3f0f2bb961ff7996e43a0be4281197bb626ba0cf Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sun, 17 May 2026 07:08:38 -0400
Subject: [PATCH 18/22] Address R17 blocker via deferral: TODO.md tracking +
 docstring revert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

R17 escalated to ⛔ Blocker after the agentic codex reviewer dug into
pretrends.py and surfaced two pre-existing library bugs that my R16
helper docstring update inadvertently directed users toward:

- P0 verified: PreTrendsPowerResults.power_at() silently wrong for
  violation_type="custom". The result dataclass does not persist fitted
  violation_weights (pretrends.py:77-90), and power_at() falls back to
  equal weights with an inline "we can't reconstruct" comment at lines
  230-231. My R16 docstring change pointed users to the broken class
  path.
- P1 verified: linear violation pattern at _get_violation_weights()
  (pretrends.py:510-515) ignores actual pre-period relative-time labels
  and constructs a shifted [n-1,...,0] direction from n_pre count alone,
  so for irregular / anticipation-shifted grids the reported MDV is
  NOT in Roth's γ units.
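
A minimal sketch of the P1 concern, not the library code: the helper
names below are hypothetical, and measuring offsets from the last
pre-period is an assumed convention chosen only to mirror the shifted
`[n-1, ..., 0]` shape. It compares a direction built from the period
count alone with one built from the actual relative-time labels.

```python
import numpy as np


def count_based_linear(n_pre: int) -> np.ndarray:
    """Unit-norm [n_pre-1, ..., 1, 0], built from the period count alone."""
    if n_pre < 2:
        raise ValueError("need at least two pre-periods for a non-zero direction")
    w = np.arange(n_pre - 1, -1, -1, dtype=float)
    return w / np.linalg.norm(w)


def label_based_offsets(rel_times) -> np.ndarray:
    """Distance of each sorted pre-period from the last one (assumed convention)."""
    t = np.sort(np.asarray(rel_times, dtype=float))
    return t[-1] - t


count_dir = count_based_linear(3)

# Evenly spaced but non-contiguous grid: same direction, different scale.
even_offsets = label_based_offsets([-5, -3, -1])      # [4., 2., 0.]
even_dir = even_offsets / np.linalg.norm(even_offsets)

# Unevenly spaced grid: the direction itself differs.
uneven_offsets = label_based_offsets([-5, -2, -1])    # [4., 1., 0.]
uneven_dir = uneven_offsets / np.linalg.norm(uneven_offsets)

# Ratio between the label-based and count-based norms on the even grid.
slope_scale = np.linalg.norm(even_offsets) / np.linalg.norm([2.0, 1.0, 0.0])

print(np.allclose(count_dir, even_dir))
print(np.allclose(count_dir, uneven_dir))
print(slope_scale)
```

If the reported magnitude multiplies the unit-norm direction, the even
grid changes only the slope scale, while the uneven grid changes the
shape of the violation as well. That is one reading of why the MDV may
not be in γ units on such grids.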

Per user direction, keeping PR-A paper-review-scoped and deferring both
bugs to PR-B (the implementation audit):
- Added two High-priority TODO.md rows tagged "PR-A (surfaced by R17 of
  the iterative codex review)" with explicit line refs + concrete fix
  outlines (persist normalized fitted weights on PreTrendsPowerResults
  for the P0; thread relative-time labels through _get_violation_weights
  for the P1).
- Reverted the R16 helper-docstring recommendation that pointed custom
  users to instantiate PreTrendsPower(..., violation_weights=...). The
  helper now describes the helper-vs-class API gap AND the
  PreTrendsPowerResults.power_at() known issue, without recommending
  the broken class path. Both gaps cross-referenced to TODO.md.

Pyright diagnostics in pretrends.py (matplotlib import, numpy/list type
conflicts, etc.) are pre-existing in code I did not touch; my edits are
docstring + TODO.md only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 TODO.md                |  2 ++
 diff_diff/pretrends.py | 28 ++++++++++++++++++----------
 2 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/TODO.md b/TODO.md
index e3db83c2..cbfcade5 100644
--- a/TODO.md
+++ b/TODO.md
@@ -97,6 +97,8 @@ Deferred items from PR reviews that were not addressed before merge.
 | PreTrendsPower: `compute_pretrends_power` adapter uses `diag(ses^2)` instead of the full pre-period covariance block Σ_22 for `CallawaySantAnnaResults` (deliberate — non-bootstrap CS persists `event_study_vcov`; bootstrap CS fits clear it at `staggered.py:2032-2036`) and `SunAbrahamResults` (forced — SA does not expose an event-study/cohort VCV at all). Roth (2022)'s NIS box probability and the library's Wald object both depend on Σ_22 off-diagonals; diag fallback is not provably conservative. For non-bootstrap CS fits, route through `event_study_vcov`; for bootstrap CS fits the diag fallback is the only path. For SA, extend `SunAbrahamResults` to persist a cohort/event-study VCV (then route the adapter likewise). Or formally retain the diag fallback with explicit miscalibration framing. See REGISTRY.md `## PreTrendsPower` Note (deviation from paper) + `docs/methodology/papers/roth-2022-review.md`. | `diff_diff/pretrends.py:609-687`, `diff_diff/sun_abraham.py:30-88`, `docs/methodology/REGISTRY.md`, `docs/methodology/papers/roth-2022-review.md` | PR-A (Roth paper review, 2026-05-17) | Medium |
 | PreTrendsPower: pin the R `pretrends` package commit/release before building the R-parity fixture. The paper review's R-package surface claims (`pretrends()`, `slope_for_power()`, NIS-only API, no joint-Wald target) are provisional pending a pinned revision; the audited revision should be recorded either in the review file's Gaps section or in this TODO row before any parity assertions are committed. | `docs/methodology/papers/roth-2022-review.md`, `METHODOLOGY_REVIEW.md` (PreTrendsPower row) | PR-A (Roth paper review, 2026-05-17) | Low |
 | PreTrendsPower: helper `compute_pretrends_power(results, M, alpha, target_power, violation_type, pre_periods)` does NOT accept `violation_weights`, so `violation_type="custom"` is unusable from the helper (class-only today via `PreTrendsPower(..., violation_weights=...)`). Either add `violation_weights` to the helper signature and forward to the class, or document the helper as supporting only `linear` / `constant` / `last_period`. | `diff_diff/pretrends.py:1048-1095, 442-466` | PR-A (Roth paper review, 2026-05-17) | Low |
+| PreTrendsPower: `PreTrendsPowerResults.power_at()` silently returns the wrong power for `violation_type="custom"`. The result dataclass does not persist the fitted `violation_weights` (`pretrends.py:77-90`), and `power_at()` falls back to equal weights (`np.ones(n_pre)`) when reconstructing weights for custom fits — the in-line comment at `pretrends.py:230-231` literally reads "For custom, we can't reconstruct - use equal weights as fallback". Fix: persist the normalized fitted weights on `PreTrendsPowerResults` and reuse them in `power_at()`; until that lands, raise on custom from `power_at()` rather than returning equal-weights output silently. Add regression test comparing `results.power_at(M)` against `PreTrendsPower(...).fit(..., M=M).power` on a custom-weights fixture. | `diff_diff/pretrends.py:77-90, 217-231, 878-892` | PR-A (Roth paper review, 2026-05-17; surfaced by R17 of the iterative codex review on the paper review file) | **High** |
+| PreTrendsPower: `linear` violation pattern does NOT implement Roth's δ_t = γ·t. `_get_violation_weights(violation_type="linear")` constructs a shifted, normalized `[n-1, ..., 1, 0]` direction from `n_pre` only (`pretrends.py:510-515`), and `fit()` never threads actual relative-time labels into that construction (`pretrends.py:862-866`). For irregular pre-period grids (e.g., anticipation-shifted `t ∈ {-5, -3, -1}`) this means the slope reported as MDV is not in Roth's γ units. Fix: build linear weights from the sorted actual relative-time values used in the fit, define the exposed parameter in γ units, persist any normalization separately, and add a regression test using anticipation-shifted / irregular pre-periods. If the shifted convention is intentional, add a `**Note (deviation from paper):**` to REGISTRY.md and convert reported MDV back to Roth's slope scale before exposing it. | `diff_diff/pretrends.py:488-531, 862-866`, `docs/methodology/REGISTRY.md:2786-2789` | PR-A (Roth paper review, 2026-05-17; surfaced by R17 of the iterative codex review on the paper review file) | **High** |
 | Thread `vcov_type` (classical / hc1 / hc2 / hc2_bm) through the 8 standalone estimators that expose `cluster=`: `CallawaySantAnna`, `SunAbraham`, `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `StackedDiD`, `WooldridgeDiD`, `EfficientDiD`. Phase 1a added `vcov_type` to the `DifferenceInDifferences` inheritance chain only. | multiple | Phase 1a | Medium |
 | Weighted one-way Bell-McCaffrey (`vcov_type="hc2_bm"` + `weights`, no cluster) currently raises `NotImplementedError`. `_compute_bm_dof_from_contrasts` builds its hat matrix from the unscaled design via `X (X'WX)^{-1} X' W`, but `solve_ols` solves the WLS problem by transforming to `X* = sqrt(w) X`, so the correct symmetric idempotent residual-maker is `M* = I - sqrt(W) X (X'WX)^{-1} X' sqrt(W)`. Rederive the Satterthwaite `(tr G)^2 / tr(G^2)` ratio on the transformed design and add weighted parity tests before lifting the guard. | `linalg.py::_compute_bm_dof_from_contrasts`, `linalg.py::_validate_vcov_args` | Phase 1a | Medium |
 | HC2 / HC2 + Bell-McCaffrey on absorbed-FE fits — REMAINING sub-gates: `TwoWayFixedEffects` (`twfe.py:154` rejects unconditionally); `MultiPeriodDiD(absorb=..., vcov_type in {"hc2","hc2_bm"})` (`estimators.py:1458` rejects). The DiD sub-gate (`DifferenceInDifferences(absorb=..., vcov_type in {"hc2","hc2_bm"})`) was lifted via auto-route to `fixed_effects=` internally; clubSandwich-parity at 1e-10 verified. The same auto-route pattern can apply to MPD-absorb; TWFE is its own class and may need different surgery (TWFE always within-transforms with no equivalent `fixed_effects=` path). Within-transformation preserves coefficients and residuals under FWL but not the hat matrix; HC1/CR1 are unaffected (no leverage term). | `twfe.py::fit`, `estimators.py::MultiPeriodDiD.fit` | follow-up | Medium |
diff --git a/diff_diff/pretrends.py b/diff_diff/pretrends.py
index a4fcc40f..ff182475 100644
--- a/diff_diff/pretrends.py
+++ b/diff_diff/pretrends.py
@@ -1067,12 +1067,16 @@ def compute_pretrends_power(
     target_power : float, default=0.80
         Target power for MDV calculation.
     violation_type : str, default='linear'
-        Type of violation pattern. Note: this convenience helper does NOT
+        Type of violation pattern. This convenience helper supports
+        ``linear`` / ``constant`` / ``last_period`` only and does NOT
         accept ``violation_weights``, so ``violation_type='custom'`` is
-        unusable from the helper (only ``linear`` / ``constant`` /
-        ``last_period`` are supported here). For custom weights, instantiate
-        ``PreTrendsPower(..., violation_weights=...)`` directly. Tracked in
-        TODO.md as a planned helper extension.
+        unusable from the helper. The custom path on
+        ``PreTrendsPower(..., violation_weights=...)`` accepts custom
+        weights at fit time but has known issues with
+        ``PreTrendsPowerResults.power_at()`` (fitted weights are not
+        persisted on the result object; ``power_at()`` silently falls
+        back to equal weights). Both gaps are tracked in TODO.md until
+        a follow-up audit lands.
     pre_periods : list of int, optional
         Explicit list of pre-treatment periods. If None, attempts to infer
         from results. Use when you've estimated all periods as post_periods.
@@ -1119,12 +1123,16 @@ def compute_mdv(
     target_power : float, default=0.80
         Target power for MDV calculation.
     violation_type : str, default='linear'
-        Type of violation pattern. Note: this convenience helper does NOT
+        Type of violation pattern. This convenience helper supports
+        ``linear`` / ``constant`` / ``last_period`` only and does NOT
         accept ``violation_weights``, so ``violation_type='custom'`` is
-        unusable from the helper (only ``linear`` / ``constant`` /
-        ``last_period`` are supported here). For custom weights, instantiate
-        ``PreTrendsPower(..., violation_weights=...)`` directly. Tracked in
-        TODO.md as a planned helper extension.
+        unusable from the helper. The custom path on
+        ``PreTrendsPower(..., violation_weights=...)`` accepts custom
+        weights at fit time but has known issues with
+        ``PreTrendsPowerResults.power_at()`` (fitted weights are not
+        persisted on the result object; ``power_at()`` silently falls
+        back to equal weights). Both gaps are tracked in TODO.md until
+        a follow-up audit lands.
     pre_periods : list of int, optional
         Explicit list of pre-treatment periods. If None, attempts to infer
         from results. Use when you've estimated all periods as post_periods.

From a1d33c8302bfb83c0668d12ae832b78cc8476609 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sun, 17 May 2026 07:22:45 -0400
Subject: [PATCH 19/22] Address R18 blocker: silent-failure guard on power_at +
 REGISTRY linear-deviation Note
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

R18 rejected the R17 TODO-only deferral: "a TODO/docstring note does not
mitigate a silent statistical bug." Per user direction, applying the
minimal partial fix that codex's verdict explicitly offered as acceptable.

- P0 silent-failure guard on PreTrendsPowerResults.power_at: rather than
  doing the full weight-persistence refactor (deferred to PR-B), the
  method now raises NotImplementedError for violation_type="custom" and
  points users to refit with the new M instead. The previous equal-weights
  fallback for custom fits (with the literal "we can't reconstruct"
  comment) is now preempted by a fail-loud guard at the top of power_at;
  the inner else branch is retained, with an updated comment, as a
  defensive fallback for future violation_type values.
  Added a Raises section to the docstring documenting the guard. Added
  a regression test
  `test_power_at_raises_on_custom_violation_type` to lock the behavior in;
  full test_pretrends.py suite passes 64/64.
- P1 REGISTRY linear-violation deviation Note: added a `**Note (deviation
  from paper — linear violation pattern):**` block inside the
  `## PreTrendsPower` entry's Violation-types block describing how the
  shipped `[n_pre-1, ..., 1, 0]` direction is built from `n_pre` count
  alone (no relative-time labels), so irregular / anticipation-shifted
  grids do not yield Roth's γ-units MDV. Cross-referenced to TODO.md.
- Silent-failure-guard Note: also added a paired Note documenting the
  power_at custom raise behavior in REGISTRY.md for symmetry with the
  linear deviation Note (both label the corresponding TODO follow-up).

Pre-existing pyright diagnostics in pretrends.py are unrelated to this
diff (my edits are confined to power_at, docstrings, REGISTRY, and a
new test method).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 diff_diff/pretrends.py       | 25 ++++++++++++++++++++++++-
 docs/methodology/REGISTRY.md |  4 ++++
 tests/test_pretrends.py      | 22 ++++++++++++++++++++++
 3 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/diff_diff/pretrends.py b/diff_diff/pretrends.py
index ff182475..7f1f3c92 100644
--- a/diff_diff/pretrends.py
+++ b/diff_diff/pretrends.py
@@ -209,9 +209,31 @@ def power_at(self, M: float) -> float:
         -------
         float
             Power to detect violation of magnitude M.
+
+        Raises
+        ------
+        NotImplementedError
+            If the fit was made with ``violation_type="custom"``. The
+            ``PreTrendsPowerResults`` dataclass does not currently persist
+            the fitted ``violation_weights``, so this method cannot
+            reconstruct the custom weights. Refit
+            ``PreTrendsPower(violation_type="custom", violation_weights=...)``
+            with the new ``M`` instead. Tracked in TODO.md as a planned
+            follow-up to persist the fitted weights.
         """
         from scipy import stats
 
+        if self.violation_type == "custom":
+            raise NotImplementedError(
+                "PreTrendsPowerResults.power_at() does not support "
+                "violation_type='custom': fitted violation_weights are "
+                "not persisted on the result object, so the custom weights "
+                "cannot be reconstructed. Refit "
+                "PreTrendsPower(violation_type='custom', "
+                "violation_weights=...) with the new M instead. "
+                "See TODO.md (PreTrendsPower power_at custom path)."
+            )
+
         n_pre = self.n_pre_periods
 
         # Reconstruct violation weights based on violation type
@@ -227,7 +249,8 @@ def power_at(self, M: float) -> float:
             weights = np.zeros(n_pre)
             weights[-1] = 1.0
         else:
-            # For custom, we can't reconstruct - use equal weights as fallback
+            # Defensive fallback for unknown violation_type values added
+            # in the future; equal weights at least produce a valid number.
             weights = np.ones(n_pre)
 
         # Normalize weights to unit L2 norm
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
index a6970b0f..30216326 100644
--- a/docs/methodology/REGISTRY.md
+++ b/docs/methodology/REGISTRY.md
@@ -2793,6 +2793,10 @@ Violation types:
 - **Last period**: δ_{-1} = c, others zero
 - **Custom**: user-specified pattern
 
+- **Note (deviation from paper — `linear` violation pattern):** the shipped `PreTrendsPower._get_violation_weights("linear")` constructs `[n_pre-1, ..., 1, 0]` from `n_pre` alone and `PreTrendsPower.fit()` never threads the actual relative-time labels into that construction (`pretrends.py:488-531`, `pretrends.py:862-866`). For irregular or anticipation-shifted pre-period grids (e.g., `t ∈ {-5, -3, -1}`), this means the slope reported as MDV is NOT in Roth's `γ` units — the shifted/normalized direction effectively assumes contiguous relative times `{-(n_pre-1), ..., -1}`. The follow-up audit (tracked in TODO.md) will either rebuild `linear` weights from the sorted actual relative-time values and expose the parameter in Roth's `γ` units, or formally retain the current shifted/normalized contract with this Note as the deviation record.
+
+- **Note (silent-failure guard — `power_at()` with `violation_type="custom"`):** `PreTrendsPowerResults` does not currently persist the fitted `violation_weights`, so `power_at(M)` cannot reconstruct the custom direction. As of this commit, `PreTrendsPowerResults.power_at()` raises `NotImplementedError` for `violation_type="custom"` rather than silently returning equal-weights output. To compute power at a new `M` for a custom fit, refit `PreTrendsPower(violation_type="custom", violation_weights=...)` with the new `M`. Tracked in TODO.md as a planned follow-up to persist the fitted weights and lift the guard.
+
 *Standard errors:*
 - Power calculations are exact (no sampling variability)
 - Uncertainty comes from estimated Σ
diff --git a/tests/test_pretrends.py b/tests/test_pretrends.py
index ba2c0a32..177f88d4 100644
--- a/tests/test_pretrends.py
+++ b/tests/test_pretrends.py
@@ -524,6 +524,28 @@ def test_power_adequate_property(self, mock_multiperiod_results):
 
         assert isinstance(results.power_adequate, bool)
 
+    def test_power_at_raises_on_custom_violation_type(self, mock_multiperiod_results):
+        """power_at(M) must raise NotImplementedError for violation_type='custom'.
+
+        The PreTrendsPowerResults dataclass does not currently persist the
+        fitted violation_weights, so power_at() cannot reconstruct the
+        custom direction. To prevent silent wrong output (equal-weights
+        fallback), the method raises NotImplementedError and points users
+        to refit with the new M. See REGISTRY.md PreTrendsPower section's
+        silent-failure-guard Note, the audit at
+        docs/methodology/papers/roth-2022-review.md, and the TODO.md row
+        tracking the planned weight-persistence follow-up.
+        """
+        # mock_multiperiod_results has 4 pre-periods but period 3 is the
+        # reference, so n_pre_periods after fit is 3 (matches
+        # test_results_n_pre_periods expectation in this class).
+        weights = np.array([0.1, 0.3, 0.6])
+        pt = PreTrendsPower(violation_type="custom", violation_weights=weights)
+        results = pt.fit(mock_multiperiod_results)
+
+        with pytest.raises(NotImplementedError, match="violation_type='custom'"):
+            results.power_at(0.5)
+
 
 # =============================================================================
 # Tests for convenience functions

From af53b41a88e7f3cf25814dfb1b615d8b08b5a035 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sun, 17 May 2026 07:29:36 -0400
Subject: [PATCH 20/22] Address R19 polish (1 P2 + 2 P3) on PreTrendsPower
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

R19 verdict was ✅ Looks good — addressing all 3 cleanup items so the
documentation, tracking, and tests align with R18's actual mitigation.

- P2 helper docstrings stale: `compute_pretrends_power` and `compute_mdv`
  docstrings still said `power_at()` "silently falls back to equal
  weights" even though R18 changed the behavior to raise
  `NotImplementedError`. Updated both docstrings to describe the actual
  current behavior: helper raises `ValueError` from the underlying
  constructor for `violation_type='custom'`; `PreTrendsPowerResults.
  power_at()` on a custom class fit raises `NotImplementedError` because
  fitted weights are not yet persisted.
- P3 TODO.md row stale wording: `TODO.md:100` row still described the
  silent-failure path in present tense. Rewrote to record that the
  silent-failure path was mitigated in R18 (locked by
  test_power_at_raises_on_custom_violation_type) and narrowed the
  remaining follow-up to the weight-persistence work needed to re-enable
  `power_at()` for custom fits. Downgraded priority from High to Medium
  since the silent-bug surface is closed.
- P3 helper-level regression test: added two new tests to
  `TestConvenienceFunctions`:
  * `test_compute_pretrends_power_rejects_custom_violation_type` —
    confirms ValueError when `compute_pretrends_power(...,
    violation_type='custom')` is called without weights.
  * `test_compute_mdv_rejects_custom_violation_type` — mirrors the
    contract for `compute_mdv`.

Full test_pretrends.py suite: 66/66 passing (64 existing + 2 new helper tests).
Pre-existing pyright diagnostics in pretrends.py + test_pretrends.py are
unrelated to this diff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 TODO.md                 |  2 +-
 diff_diff/pretrends.py  | 36 ++++++++++++++++++++----------------
 tests/test_pretrends.py | 27 +++++++++++++++++++++++++++
 3 files changed, 48 insertions(+), 17 deletions(-)

diff --git a/TODO.md b/TODO.md
index cbfcade5..b5988d45 100644
--- a/TODO.md
+++ b/TODO.md
@@ -97,7 +97,7 @@ Deferred items from PR reviews that were not addressed before merge.
 | PreTrendsPower: `compute_pretrends_power` adapter uses `diag(ses^2)` instead of the full pre-period covariance block Σ_22 for `CallawaySantAnnaResults` (deliberate — non-bootstrap CS persists `event_study_vcov`; bootstrap CS fits clear it at `staggered.py:2032-2036`) and `SunAbrahamResults` (forced — SA does not expose an event-study/cohort VCV at all). Roth (2022)'s NIS box probability and the library's Wald object both depend on Σ_22 off-diagonals; diag fallback is not provably conservative. For non-bootstrap CS fits, route through `event_study_vcov`; for bootstrap CS fits the diag fallback is the only path. For SA, extend `SunAbrahamResults` to persist a cohort/event-study VCV (then route the adapter likewise). Or formally retain the diag fallback with explicit miscalibration framing. See REGISTRY.md `## PreTrendsPower` Note (deviation from paper) + `docs/methodology/papers/roth-2022-review.md`. | `diff_diff/pretrends.py:609-687`, `diff_diff/sun_abraham.py:30-88`, `docs/methodology/REGISTRY.md`, `docs/methodology/papers/roth-2022-review.md` | PR-A (Roth paper review, 2026-05-17) | Medium |
 | PreTrendsPower: pin the R `pretrends` package commit/release before building the R-parity fixture. The paper review's R-package surface claims (`pretrends()`, `slope_for_power()`, NIS-only API, no joint-Wald target) are provisional pending a pinned revision; the audited revision should be recorded either in the review file's Gaps section or in this TODO row before any parity assertions are committed. | `docs/methodology/papers/roth-2022-review.md`, `METHODOLOGY_REVIEW.md` (PreTrendsPower row) | PR-A (Roth paper review, 2026-05-17) | Low |
 | PreTrendsPower: helper `compute_pretrends_power(results, M, alpha, target_power, violation_type, pre_periods)` does NOT accept `violation_weights`, so `violation_type="custom"` is unusable from the helper (class-only today via `PreTrendsPower(..., violation_weights=...)`). Either add `violation_weights` to the helper signature and forward to the class, or document the helper as supporting only `linear` / `constant` / `last_period`. | `diff_diff/pretrends.py:1048-1095, 442-466` | PR-A (Roth paper review, 2026-05-17) | Low |
-| PreTrendsPower: `PreTrendsPowerResults.power_at()` silently returns the wrong power for `violation_type="custom"`. The result dataclass does not persist the fitted `violation_weights` (`pretrends.py:77-90`), and `power_at()` falls back to equal weights (`np.ones(n_pre)`) when reconstructing weights for custom fits — the in-line comment at `pretrends.py:230-231` literally reads "For custom, we can't reconstruct - use equal weights as fallback". Fix: persist the normalized fitted weights on `PreTrendsPowerResults` and reuse them in `power_at()`; until that lands, raise on custom from `power_at()` rather than returning equal-weights output silently. Add regression test comparing `results.power_at(M)` against `PreTrendsPower(...).fit(..., M=M).power` on a custom-weights fixture. | `diff_diff/pretrends.py:77-90, 217-231, 878-892` | PR-A (Roth paper review, 2026-05-17; surfaced by R17 of the iterative codex review on the paper review file) | **High** |
+| PreTrendsPower: `PreTrendsPowerResults.power_at()` does not yet support `violation_type="custom"`. **Silent-failure path was mitigated** in PR-A (2026-05-17, R18 of the codex review): `power_at()` now raises `NotImplementedError` for custom fits rather than returning equal-weights output, locked in by `test_power_at_raises_on_custom_violation_type`. Remaining follow-up: persist the normalized fitted `violation_weights` on `PreTrendsPowerResults` (currently absent at `pretrends.py:77-90`) and re-enable `power_at()` for custom fits, with a parity test comparing `results.power_at(M)` to a fresh `PreTrendsPower(...).fit(..., M=M).power` on a custom-weights fixture. | `diff_diff/pretrends.py:77-90, ~196-235, ~878-892` | PR-A (Roth paper review, 2026-05-17) | Medium |
 | PreTrendsPower: `linear` violation pattern does NOT implement Roth's δ_t = γ·t. `_get_violation_weights(violation_type="linear")` constructs a shifted, normalized `[n-1, ..., 1, 0]` direction from `n_pre` only (`pretrends.py:510-515`), and `fit()` never threads actual relative-time labels into that construction (`pretrends.py:862-866`). For irregular pre-period grids (e.g., anticipation-shifted `t ∈ {-5, -3, -1}`) this means the slope reported as MDV is not in Roth's γ units. Fix: build linear weights from the sorted actual relative-time values used in the fit, define the exposed parameter in γ units, persist any normalization separately, and add a regression test using anticipation-shifted / irregular pre-periods. If the shifted convention is intentional, add a `**Note (deviation from paper):**` to REGISTRY.md and convert reported MDV back to Roth's slope scale before exposing it. | `diff_diff/pretrends.py:488-531, 862-866`, `docs/methodology/REGISTRY.md:2786-2789` | PR-A (Roth paper review, 2026-05-17; surfaced by R17 of the iterative codex review on the paper review file) | **High** |
 | Thread `vcov_type` (classical / hc1 / hc2 / hc2_bm) through the 8 standalone estimators that expose `cluster=`: `CallawaySantAnna`, `SunAbraham`, `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `StackedDiD`, `WooldridgeDiD`, `EfficientDiD`. Phase 1a added `vcov_type` to the `DifferenceInDifferences` inheritance chain only. | multiple | Phase 1a | Medium |
 | Weighted one-way Bell-McCaffrey (`vcov_type="hc2_bm"` + `weights`, no cluster) currently raises `NotImplementedError`. `_compute_bm_dof_from_contrasts` builds its hat matrix from the unscaled design via `X (X'WX)^{-1} X' W`, but `solve_ols` solves the WLS problem by transforming to `X* = sqrt(w) X`, so the correct symmetric idempotent residual-maker is `M* = I - sqrt(W) X (X'WX)^{-1} X' sqrt(W)`. Rederive the Satterthwaite `(tr G)^2 / tr(G^2)` ratio on the transformed design and add weighted parity tests before lifting the guard. | `linalg.py::_compute_bm_dof_from_contrasts`, `linalg.py::_validate_vcov_args` | Phase 1a | Medium |
diff --git a/diff_diff/pretrends.py b/diff_diff/pretrends.py
index 7f1f3c92..e559a308 100644
--- a/diff_diff/pretrends.py
+++ b/diff_diff/pretrends.py
@@ -1092,14 +1092,16 @@ def compute_pretrends_power(
     violation_type : str, default='linear'
         Type of violation pattern. This convenience helper supports
         ``linear`` / ``constant`` / ``last_period`` only and does NOT
-        accept ``violation_weights``, so ``violation_type='custom'`` is
-        unusable from the helper. The custom path on
-        ``PreTrendsPower(..., violation_weights=...)`` accepts custom
-        weights at fit time but has known issues with
-        ``PreTrendsPowerResults.power_at()`` (fitted weights are not
-        persisted on the result object; ``power_at()`` silently falls
-        back to equal weights). Both gaps are tracked in TODO.md until
-        a follow-up audit lands.
+        accept ``violation_weights``, so passing
+        ``violation_type='custom'`` will raise ``ValueError`` from the
+        underlying ``PreTrendsPower`` constructor (which requires
+        ``violation_weights`` when ``violation_type='custom'``). To use a
+        custom violation pattern, instantiate ``PreTrendsPower(...,
+        violation_weights=...)`` directly. Note that
+        ``PreTrendsPowerResults.power_at()`` on such a fit raises
+        ``NotImplementedError`` because fitted weights are not yet
+        persisted on the result object; refit with the new ``M`` instead.
+        Both gaps are tracked in TODO.md until the follow-up audit lands.
     pre_periods : list of int, optional
         Explicit list of pre-treatment periods. If None, attempts to infer
         from results. Use when you've estimated all periods as post_periods.
@@ -1148,14 +1150,16 @@ def compute_mdv(
     violation_type : str, default='linear'
         Type of violation pattern. This convenience helper supports
         ``linear`` / ``constant`` / ``last_period`` only and does NOT
-        accept ``violation_weights``, so ``violation_type='custom'`` is
-        unusable from the helper. The custom path on
-        ``PreTrendsPower(..., violation_weights=...)`` accepts custom
-        weights at fit time but has known issues with
-        ``PreTrendsPowerResults.power_at()`` (fitted weights are not
-        persisted on the result object; ``power_at()`` silently falls
-        back to equal weights). Both gaps are tracked in TODO.md until
-        a follow-up audit lands.
+        accept ``violation_weights``, so passing
+        ``violation_type='custom'`` will raise ``ValueError`` from the
+        underlying ``PreTrendsPower`` constructor (which requires
+        ``violation_weights`` when ``violation_type='custom'``). To use a
+        custom violation pattern, instantiate ``PreTrendsPower(...,
+        violation_weights=...)`` directly. Note that
+        ``PreTrendsPowerResults.power_at()`` on such a fit raises
+        ``NotImplementedError`` because fitted weights are not yet
+        persisted on the result object; refit with the new ``M`` instead.
+        Both gaps are tracked in TODO.md until the follow-up audit lands.
     pre_periods : list of int, optional
         Explicit list of pre-treatment periods. If None, attempts to infer
         from results. Use when you've estimated all periods as post_periods.
diff --git a/tests/test_pretrends.py b/tests/test_pretrends.py
index 177f88d4..c42d305f 100644
--- a/tests/test_pretrends.py
+++ b/tests/test_pretrends.py
@@ -581,6 +581,33 @@ def test_compute_mdv(self, mock_multiperiod_results):
         assert isinstance(mdv, float)
         assert mdv > 0
 
+    def test_compute_pretrends_power_rejects_custom_violation_type(
+        self, mock_multiperiod_results
+    ):
+        """compute_pretrends_power(..., violation_type='custom') must raise ValueError.
+
+        The helper does not accept ``violation_weights``, so a custom-type
+        call cannot supply the required weights vector. The underlying
+        PreTrendsPower constructor must raise to prevent the helper from
+        silently coercing a custom request into a degenerate fit. See
+        REGISTRY.md PreTrendsPower section + docs/methodology/papers/
+        roth-2022-review.md (helper/class API gap).
+        """
+        with pytest.raises(ValueError, match="violation_weights"):
+            compute_pretrends_power(
+                mock_multiperiod_results, violation_type="custom"
+            )
+
+    def test_compute_mdv_rejects_custom_violation_type(self, mock_multiperiod_results):
+        """compute_mdv(..., violation_type='custom') must raise ValueError.
+
+        Same contract as ``compute_pretrends_power``: the helper does not
+        accept ``violation_weights``, so the custom path is unusable from
+        the helper.
+        """
+        with pytest.raises(ValueError, match="violation_weights"):
+            compute_mdv(mock_multiperiod_results, violation_type="custom")
+
 
 # =============================================================================
 # Tests for get_params and set_params

From 3350d0b66789c4d2e552cd77e5189180c9d6f3c2 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sun, 17 May 2026 07:34:13 -0400
Subject: [PATCH 21/22] Address R20 polish (1 P3-informational maintainability)
 on power_at fallback
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

R20 verdict was ✅ Looks good. The two methodology + tech-debt P3-informational
findings were explicitly marked "none required for approval" (already tracked
via REGISTRY notes + TODO rows). Addressed the one remaining finding with a
concrete fix:

- P3 maintainability — PreTrendsPowerResults.power_at() still had an
  unknown-violation_type else fallback that defaulted to equal weights
  (np.ones(n_pre)). That meant a future violation_type added to fit() but
  not threaded through power_at() would silently produce equal-weights
  output. Replaced with an explicit ValueError mirroring the raise at the
  end of _get_violation_weights() (lines 530-531). Aligns the two
  weight-reconstruction paths and closes the future-drift silent surface.

Test suite: 66/66 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 diff_diff/pretrends.py | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/diff_diff/pretrends.py b/diff_diff/pretrends.py
index e559a308..8b32c471 100644
--- a/diff_diff/pretrends.py
+++ b/diff_diff/pretrends.py
@@ -249,9 +249,14 @@ def power_at(self, M: float) -> float:
             weights = np.zeros(n_pre)
             weights[-1] = 1.0
         else:
-            # Defensive fallback for unknown violation_type values added
-            # in the future; equal weights at least produce a valid number.
-            weights = np.ones(n_pre)
+            # Fail loud on unknown violation_type values. Mirrors the raise
+            # at the end of _get_violation_weights(); prevents silent
+            # equal-weights output if a future violation_type is added to
+            # fit() but not threaded through power_at().
+            raise ValueError(
+                f"Unknown violation_type: {self.violation_type!r}. "
+                "Expected one of: 'linear', 'constant', 'last_period', 'custom'."
+            )
 
         # Normalize weights to unit L2 norm
         norm = np.linalg.norm(weights)

From b3f818100a13bc48d96044ac4bd69a8027fce4aa Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sun, 17 May 2026 07:57:07 -0400
Subject: [PATCH 22/22] =?UTF-8?q?Address=20PR=20#463=20CI=20R1=20(2=20P3)?=
 =?UTF-8?q?=20=E2=80=94=20REPORTING=20parity=20+=20PDF=20reproducibility?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

CI codex Round 1 verdict was ✅ Looks good. Addressing the two actionable
P3 doc findings; the other 4 P3 findings were explicit "no fix required"
mitigation confirmations.

- P3 REPORTING.md parity: the diagonal-covariance-fallback Note still
  called the deviation "conservative" at REPORTING.md:324-337, but the
  new authoritative REGISTRY.md note (in this PR) describes it as a
  non-paper approximation that is NOT provably conservative (direction
  of the discrepancy with full Σ_22 depends on sign + magnitude of
  dropped correlations). Aligned the REPORTING note to match the
  REGISTRY language, kept the description of the BusinessReport
  well_powered → moderately_powered downgrade as a practical safeguard
  (not a proof of conservatism), and cross-referenced the REGISTRY
  authoritative deviation block.
- P3 paper-review reproducibility: the roth-2022-review.md header
  listed "PDF reviewed: papers/roth-2022.pdf" but the /papers/ directory
  is gitignored, so the path isn't reproducible from the repo alone.
  Replaced the path with the published-article citation + DOI + an
  explicit note that the PDF was reviewed externally and that the
  /papers/ working directory is gitignored. Added pointers to the
  jonathandroth.com author page for reproduction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 docs/methodology/REPORTING.md               | 21 ++++++++++++++-------
 docs/methodology/papers/roth-2022-review.md |  4 ++--
 2 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/docs/methodology/REPORTING.md b/docs/methodology/REPORTING.md
index 0a04b4e6..f459dc8a 100644
--- a/docs/methodology/REPORTING.md
+++ b/docs/methodology/REPORTING.md
@@ -328,13 +328,20 @@ a library setting.
   `DiagnosticReport.pretrends_power` block records
   `covariance_source: "diag_fallback_available_full_vcov_unused"` in
   that case, and `BusinessReport` downgrades a `well_powered` tier to
-  `moderately_powered` before rendering prose. This is a known
-  conservative deviation from the documented "use the full pre-period
-  covariance" position — it prevents the diagonal approximation from
-  producing an overly optimistic "well-powered" claim when correlated
-  pre-period errors could tighten the MDV. The right long-term fix is
-  to teach `compute_pretrends_power()` to consume `event_study_vcov`
-  and `event_study_vcov_index`; until that lands this downgrade stays.
+  `moderately_powered` before rendering prose. This is a documented
+  deviation from the paper-derived "use the full pre-period covariance"
+  position. **Not provably conservative**: under Roth (2022)'s NIS
+  framework and the library's Wald form, the MDV/power objects depend
+  on the off-diagonals of Σ_22, and the direction of the discrepancy
+  between full-Σ_22 and diag(ses^2) depends on the sign and magnitude
+  of the dropped correlations — see the `**Note (deviation from paper
+  — diagonal pre-period VCV fallback):**` block under `## PreTrendsPower`
+  in `docs/methodology/REGISTRY.md`. The `well_powered → moderately_powered`
+  downgrade in BusinessReport reduces the chance of an overly optimistic
+  claim in practice, but it is not a proof of conservatism. The right
+  long-term fix is to teach `compute_pretrends_power()` to consume
+  `event_study_vcov` and `event_study_vcov_index`; until that lands the
+  downgrade stays.
 
 - **Note:** Unit-translation policy. BusinessReport does not
   arithmetically translate log-points to percents or level effects to
diff --git a/docs/methodology/papers/roth-2022-review.md b/docs/methodology/papers/roth-2022-review.md
index fd9262b0..caf3bac9 100644
--- a/docs/methodology/papers/roth-2022-review.md
+++ b/docs/methodology/papers/roth-2022-review.md
@@ -2,9 +2,9 @@
 
 **Authors:** Jonathan Roth
 **Citation:** Roth, J. (2022). Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends. *American Economic Review: Insights*, 4(3), 305-322.
-**PDF reviewed:** papers/roth-2022.pdf (18 pages, content pages 1-15)
-**Review date:** 2026-05-16
 **DOI:** https://doi.org/10.1257/aeri.20210236
+**Source reviewed:** AER:I 4(3), 305-322 (18 pages, content pages 1-15). PDF was reviewed externally and is not committed to the repository (the `/papers/` working directory is gitignored). Reproduce by downloading the published article via the DOI above or from the author's page at https://www.jonathandroth.com/.
+**Review date:** 2026-05-16
 
 ---