Skip to content

Add Goodman-Bacon (2021) paper review#451

Merged
igerber merged 2 commits into
mainfrom
docs/goodman-bacon-2021-review
May 16, 2026
Merged

Add Goodman-Bacon (2021) paper review#451
igerber merged 2 commits into
mainfrom
docs/goodman-bacon-2021-review

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented May 16, 2026

Summary

  • Comprehensive paper review of Goodman-Bacon (2021) Difference-in-differences with variation in treatment timing (J. Econometrics 225(2), 254-277) at docs/methodology/papers/goodman-bacon-2021-review.md. Covers Theorem 1 (Eqs. 10a-g), the three 2x2 comparison types, variance terms (Eqs. 7-9), the VWATT/VWCT/ΔATT probability-limit decomposition (Eqs. 14-15), the linear trend-break wrong-sign case (Eqs. 17-18), the timing-only pair simplification (Eq. 11), the controlled DD extension (Section 5.2, Eqs. 21-27), and the Oaxaca-Blinder-Kitagawa specification-comparison decomposition (Eq. 20).
  • Includes a "Methodology Registry Entry" block formatted to match REGISTRY.md, marked proposed — REGISTRY replacement is deferred to the BaconDecomposition methodology-audit PR (PR-B).
  • Documents two library-vs-paper deviations: unbalanced-panel acceptance (bacon.py:491-499 warns but does not raise; paper Appendix A assumes balanced) and always-treated U-bucketing (bacon.py maps only first_treat ∈ {0, np.inf} to U; paper footnote 11 also puts t_i < 1 units in U).
  • Adds a TODO.md row under Methodology/Correctness tech debt for the BaconDecomposition methodology-audit follow-up, with six concrete items the audit must address.

Methodology references (required if estimator / math changes)

  • Method name(s): BaconDecomposition (paper-review-only PR; no estimator code changes)
  • Paper / source link(s): Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. J. Econometrics 225(2), 254-277. DOI: 10.1016/j.jeconom.2021.03.014
  • Any intentional deviations from the source (and why): None in this PR. Two pre-existing library deviations (unbalanced-panel acceptance and narrower U-bucketing) are documented in the review file for the audit PR to address.

Validation

  • Tests added/updated: No test changes (paper review is a docs deliverable)
  • Backtest / simulation / notebook evidence (if applicable): N/A — pure documentation
  • Next step: BaconDecomposition methodology-audit PR (tracked in TODO.md row added in this PR) will carry the REGISTRY replacement, tests/test_methodology_bacon.py, R parity fixtures via bacondecomp::bacon(), and METHODOLOGY_REVIEW.md status flip to Complete.

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

Comprehensive review of "Difference-in-differences with variation in
treatment timing" (Journal of Econometrics 225(2), 254-277,
DOI: 10.1016/j.jeconom.2021.03.014) targeting the BaconDecomposition
methodology-review promotion. Covers:

- Theorem 1 decomposition (Eqs. 10a-g) with the three 2x2 comparison
  types: treated/untreated, earlier/later, later/earlier
- Variance terms (Eqs. 7-9) and decomposition weights (Eqs. 10e-g)
  with explicit superscript-convention note on which group plays the
  treatment role
- Timing-only pair simplification (Eq. 11, μ_kℓ)
- Probability-limit decomposition VWATT + VWCT - ΔATT (Eqs. 14a-c, 15)
  with the VWCT approximation (Eq. 19) — w_k^T and w_k^C formulas
  written with paper-faithful superscripts (all `k` for treatment-role
  weights, all `j` for control-role weights)
- Linear trend-break wrong-sign case (Eqs. 17-18)
- Constant-ATT identification (Eq. 16)
- Controlled DD extension (Section 5.2, Eqs. 21-27)
- Oaxaca-Blinder-Kitagawa specification-comparison decomposition
  (Eq. 20)

Includes a "Methodology Registry Entry" block formatted to match the
REGISTRY.md structure, marked as **proposed** — not yet merged into
docs/methodology/REGISTRY.md. The audit-pass PR for diff_diff/bacon.py
(tracked in TODO.md "Tech Debt from Code Reviews" → Methodology/
Correctness) carries the REGISTRY replacement plus six concrete
follow-up items (verify Theorem 1 / Eqs 10a-g, decide on always-
treated handling, R parity fixtures, methodology test file, REGISTRY
replacement, METHODOLOGY_REVIEW.md status flip).

Documents two library-vs-paper deviations explicitly:

1. Unbalanced panels: paper Appendix A assumes balanced panels;
   diff_diff/bacon.py:491-499 accepts unbalanced panels with a
   UserWarning. The decomposition is exact only on balanced panels.

2. Always-treated unit U-bucketing: paper footnote 11 puts always-
   treated units (t_i < 1) into U alongside never-treated; diff_diff/
   bacon.py:437-439 documents only first_treat ∈ {0, np.inf} as the
   U sentinels, and bacon.py:504-507 implements only that mask.
   Genuinely always-treated units (0 < first_treat <= min(time), per
   docs/troubleshooting.rst:739-747) are not automatically remapped.
   The audit needs to decide how to handle this gap.

Adds a TODO.md row under Methodology/Correctness tech debt for the
BaconDecomposition methodology-audit follow-up, listing the six
items the audit needs to address.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

Overall Assessment

✅ Looks good

Executive Summary

  • This is a docs/TODO-only PR; it does not change estimator code, weighting code, inference code, or defaults.
  • I did not find a new P0/P1 issue in the changed files. The substantive Bacon methodology gaps this PR discusses are pre-existing code issues and are now explicitly tracked in TODO.md:77, so they are non-blocking for this PR.
  • P2: the new review claims the reviewed source PDF is papers/goodman-bacon.pdf, but that file is not present in the repository, which makes the review artifact non-reproducible from the tree: docs/methodology/papers/goodman-bacon-2021-review.md:5.
  • P3: the review labels itself a “source-of-truth target” and “authoritative reference” before REGISTRY.md is updated, which creates a temporary second methodology contract: docs/methodology/papers/goodman-bacon-2021-review.md:14.
  • The new TODO entry is useful and appropriately scopes the follow-up methodology audit: TODO.md:77.

Methodology

  • P3 — Informational / tracked: the PR usefully surfaces the pre-existing Bacon audit scope in TODO.md:77 rather than pretending the current implementation has already been verified against Goodman-Bacon (2021). Impact: this keeps the docs-only PR honest about the current state and makes the follow-up audit concrete. Concrete fix: none required for this PR; complete the follow-up audit listed in TODO.md:77.

Code Quality

  • No findings; the diff does not modify executable code.

Performance

  • No findings; the diff is documentation-only.

Maintainability

  • P3 — docs/methodology/papers/goodman-bacon-2021-review.md:14 says the paper review is the “source-of-truth target” and “authoritative reference for the audit work” even though docs/methodology/REGISTRY.md remains the official methodology contract used by reviewers and tooling. Impact: this creates two competing sources of truth until the follow-up PR lands. Concrete fix: reword line 14 to say this is “proposed replacement text for a future REGISTRY update,” and keep REGISTRY.md as the sole authoritative contract until that update is merged.

Tech Debt

  • P3 — The new deferred-work row in TODO.md:77 is well-scoped and appropriately tracks the Bacon methodology audit. Impact: non-blocking work is now visible and tied to concrete follow-up tasks. Concrete fix: none for this PR.

Security

  • No findings; no secrets, credentials, or new executable surfaces are introduced.

Documentation/Tests

  • P2 — docs/methodology/papers/goodman-bacon-2021-review.md:5 states that the “PDF reviewed” is papers/goodman-bacon.pdf, but no such file exists in the repository. Impact: future reviewers cannot reproduce the stated review artifact from repo contents, and the path currently reads like a checked-in dependency that is missing. Concrete fix: either remove the repo-local path and cite the DOI/journal version actually reviewed, or replace it with a retrieval note that points to the external source rather than a nonexistent local file.
  • P3 — No test additions is acceptable here because the diff is docs/TODO-only and the promised methodology tests are explicitly deferred in TODO.md:77. Impact: none for merge. Concrete fix: add tests/test_methodology_bacon.py in the follow-up audit PR.

P2 (Documentation/Tests): the "PDF reviewed" line referenced
papers/goodman-bacon.pdf, which is under the gitignored /papers/
directory and therefore not reproducible from the tree. Replaced
the local-path citation with the Journal of Econometrics version
referenced by DOI 10.1016/j.jeconom.2021.03.014, with an explicit
note that local PDFs are gitignored. This matches the pattern in
rambachan-roth-2023-review.md and wooldridge-2023-review.md
(DOI/URL citations) rather than the local-path pattern in the
older butts/clarke/conley reviews.

P3 (Maintainability): the "source-of-truth target" / "authoritative
reference for the audit work" framing at L14 created dual contracts
with REGISTRY.md until the audit PR lands. Reworded to:
- "proposed replacement text for a future REGISTRY update"
- explicit statement that REGISTRY.md remains the sole authoritative
  contract until the audit PR lands
- pointer to the TODO.md row tracking that audit

No estimator, weighting, variance, or inference code changed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 70795a62f8ccc4fa184c05435873032da3f831db


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review: the two prior documentation findings are addressed. The review now cites the journal/DOI PDF instead of a nonexistent repo-local file, and it explicitly keeps REGISTRY.md as the sole authoritative methodology contract until the follow-up audit. docs/methodology/papers/goodman-bacon-2021-review.md:L5-L14, .gitignore:L91-L95
  • This is still a docs/TODO-only diff; no estimator code, weighting logic, inference logic, variance/SE code, or defaults changed.
  • Spot checks against the published paper did not uncover an undocumented methodology mismatch on the high-risk points surfaced in the new review: the positive-weight 2x2 decomposition framing, the always-treated t_i < 1 convention for U, the Section 5.2 controls extension, and the external-appendix note for additional appendices. docs/methodology/papers/goodman-bacon-2021-review.md:L20-L28, docs/methodology/papers/goodman-bacon-2021-review.md:L171-L176, docs/methodology/papers/goodman-bacon-2021-review.md:L195-L199, docs/methodology/papers/goodman-bacon-2021-review.md:L221-L239, docs/methodology/papers/goodman-bacon-2021-review.md:L317-L321 (gregoryeady.com)
  • The new TODO row properly tracks the substantive Bacon methodology audit as deferred follow-up instead of implying the current implementation has already been verified. TODO.md:L77-L77

Methodology

  • No findings. The changed review’s methodology claims are consistent with the published Goodman-Bacon paper on the points it elevates, and the file explicitly avoids creating a second live contract by deferring authority to docs/methodology/REGISTRY.md. docs/methodology/papers/goodman-bacon-2021-review.md:L14-L20, docs/methodology/papers/goodman-bacon-2021-review.md:L171-L176, docs/methodology/papers/goodman-bacon-2021-review.md:L195-L199, docs/methodology/papers/goodman-bacon-2021-review.md:L221-L239 (gregoryeady.com)

Code Quality

  • No findings. No executable code changed.

Performance

  • No findings. No runtime paths changed.

Maintainability

  • No findings. The earlier dual-source-of-truth concern is resolved by the explicit “proposed replacement” disclaimer plus the linked follow-up TODO. docs/methodology/papers/goodman-bacon-2021-review.md:L10-L14, TODO.md:L77-L77

Tech Debt

  • Severity: P3. Impact: the PR appropriately records that Bacon still needs a substantive methodology audit covering Theorem 1 parity, always-treated handling, R fixtures, methodology tests, REGISTRY replacement, and METHODOLOGY_REVIEW.md completion, but that work is now explicitly tracked rather than silently deferred. TODO.md:L77-L77 Concrete fix: none for this PR; complete the tracked follow-up.

Security

  • No findings. No secrets, credentials, or new executable surfaces are introduced.

Documentation/Tests

  • No findings. The prior reproducibility issue is resolved: the review now points to the journal/DOI source and the repo’s /papers/ ignore rule rather than a nonexistent tracked PDF. docs/methodology/papers/goodman-bacon-2021-review.md:L5-L5, .gitignore:L91-L95

@igerber igerber added the ready-for-ci Triggers CI test workflows label May 16, 2026
@igerber igerber merged commit 21d4975 into main May 16, 2026
11 of 12 checks passed
@igerber igerber deleted the docs/goodman-bacon-2021-review branch May 16, 2026 11:48
igerber added a commit that referenced this pull request May 16, 2026
Promotes BaconDecomposition from In Progress → Complete (R parity pending)
in METHODOLOGY_REVIEW.md. Operationalizes the paper review landed in
PR #451 against diff_diff/bacon.py.

Audit findings and corrections:

1. Theorem 1 exact-weights rewrite (bacon.py:_recompute_exact_weights).
   The prior "exact" mode did not actually compute Eqs. 7-9 / 10e-g —
   it was missing the (1 - n_kU) factor in the within-subsample treatment
   variance, did not square the sample share, and added an extraneous
   unit_share factor not present in the paper. The post-hoc sum-to-1
   normalization masked the relative-weight error but produced ~0.3%
   decomposition error vs TWFE (0.007 absolute on a 3-cohort + never-
   treated DGP). Rewrote the function to compute exact numerators of
   Eqs. 10e/f/g and let post-hoc normalization handle the V_hat^D
   denominator (Theorem 1 guarantees V_hat^D = sum(numerators)). Now
   matches TWFE at atol=1e-10 on noisy and hand-calculable DGPs.

2. Default `weights` flipped from "approximate" to "exact" at 3 entry
   points: BaconDecomposition.__init__() (bacon.py:397), bacon_decompose()
   (bacon.py:1064), TwoWayFixedEffects.decompose() (twfe.py:684). The
   approximate path remains opt-in for speed-sensitive diagnostic loops.
   diff_diff/diagnostic_report.py:1740 updated to pass explicit
   weights="exact". The existing test_weighted_sum_equals_twfe tolerance
   was tightened from < 0.1 to < 1e-10 to lock the Theorem 1 algebraic
   identity contract.

   **Survey-design behavior change**: weights="exact" routes through
   _validate_unit_constant_survey, which rejects survey designs whose
   weights / strata / PSU / FPC columns vary within a unit across
   periods. The previous approximate default tolerated time-varying
   within-unit survey weights via observation-level weighted means.
   Migration: pass weights="approximate" explicitly to retain the
   legacy path. Documented in CHANGELOG Changed entry and the new
   bacon_decompose() docstring survey_design parameter block.

3. Always-treated warn+remap per paper footnote 11 (bacon.py:fit()).
   Units with first_treat <= min(time) (excluding never-treated
   sentinels 0 and np.inf) are auto-remapped to U via an internal
   column (__bacon_first_treat_internal__), preserving the user's
   first_treat column unchanged. The count is exposed on the result
   via a new BaconDecompositionResults.n_always_treated_remapped field
   and rendered in summary() when nonzero.

   **n_never_treated reports TRUE never-treated only**, computed from
   the original user column before remap — remapped always-treated
   units appear separately as n_always_treated_remapped, no double-
   counting.

   **Loop gate uses POST-remap U count**: the treated_vs_never
   comparison loop gates on n_units_in_U_bucket (post-remap) so panels
   whose U is composed entirely of remapped always-treated units still
   emit beta_kU^{2x2} terms. Without this distinction the loop would
   silently drop those terms and break the Theorem 1 identity.

   **Ordered-time logic**: detection uses first_treat <= min(time)
   (not positive-sign restriction), so event-time panels with
   negative or zero-crossing period labels (e.g. time ∈ [-2,..,3])
   work correctly. A cohort at first_treat=-1 on such a panel is a
   valid timing group; a cohort at first_treat=-3 is remapped to U.
   Both timing_groups filters updated to exclude only the U
   sentinels, not positive values.

REGISTRY.md replacement:

- Replaced ## BaconDecomposition block with paper-review-sourced content
  plus four sub-notes (weight modes, always-treated remap with ordered-
  time logic, R parity status, unbalanced-panel deviation).
- Explicitly removed the prior block's "Weights may be negative for
  later-vs-earlier comparisons" claim. Theorem 1 weights are strictly
  positive and sum to 1 (positivity is the headline of the theorem);
  negative weights are an estimand-level phenomenon (Borusyak-Jaravel
  2017, de Chaisemartin-D'Haultfoeuille 2020) at the ATT level, not
  estimator-level.
- Narrowed the machine-precision claim to balanced panels only; the
  unbalanced-panel library extension is documented as an explicit
  Deviation block (Goodman-Bacon Appendix A's proof assumes balanced
  panels; under unbalance, the Theorem 1 identity holds only
  approximately, though outputs remain finite).

New artifacts:

- tests/test_methodology_bacon.py (~1050 lines, ~28 tests across 6
  classes):
    - TestBaconHandCalculation: hand-checks Eqs. 7-9 + 10b-d at
      atol=1e-10 on a minimal hand-derived balanced panel (weights
      {0.3, 0.4, 0.1, 0.2} hand-computed from sample shares and
      treatment-share variances).
    - TestBaconParityR: skips on missing R goldens.
    - TestBaconAlwaysTreatedRemap: regression-tests warn+remap
      mechanics including user-data-preservation, U-bucket-only-from-
      remap (the bug from the loop-gate fix), negative first_treat
      as a valid cohort (event-time encoding), and remap of negative
      first_treat below min(time).
    - TestBaconEdgeCases: no-untreated, single-cohort, unbalanced
      panel (finite but NOT machine precision), constant-ATT recovery.
    - TestBaconWeightModes: locks exact-is-default contract.
    - TestBaconSurveyDesignNarrowing: survey_design composes with
      exact mode + warn+remap; defaulted BaconDecomposition(),
      bacon_decompose(), and the survey-time-varying-weights
      rejection contract are pinned.

- benchmarks/R/generate_bacon_golden.R (234 lines): R generator script
  for bacondecomp::bacon() parity goldens across 3 DGP fixtures.
  JSON goldens deferred until bacondecomp R package is installed in
  the local R library (TODO.md follow-up row).

CHANGELOG entries: ### Changed (default flip + survey behavior change),
TODO.md: prior BaconDecomposition row replaced by narrower R-parity
goldens deferral row.

Test results: 96 passed, 3 skipped (R parity) across all bacon/decompose
callers (test_bacon.py, test_methodology_bacon.py, test_business_report,
test_methodology_twfe, test_practitioner, test_target_parameter,
test_survey_phase3, test_survey_phase6).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant