Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions CHANGELOG.md

Large diffs are not rendered by default.

45 changes: 34 additions & 11 deletions METHODOLOGY_REVIEW.md
Original file line number Diff line number Diff line change
Expand Up @@ -78,7 +78,7 @@ The catalog grew incrementally over several quarters, so formats vary across the

| Tool | Module | R Reference | Status | Last Review |
|------|--------|-------------|--------|-------------|
| BaconDecomposition | `bacon.py` | `bacondecomp::bacon()` | **In Progress** | — |
| BaconDecomposition | `bacon.py` | `bacondecomp::bacon()` | **Complete** (R parity pending) | 2026-05-16 |
| HonestDiD | `honest_did.py` | `HonestDiD` package | **Complete** | 2026-04-01 |
| PreTrendsPower | `pretrends.py` | `pretrends` package | **In Progress** | — |
| PowerAnalysis | `power.py` | `pwr` / `DeclareDesign` | **In Progress** | — |
Expand Down Expand Up @@ -909,18 +909,41 @@ and covariate-adjusted specifications.)
| Module | `bacon.py` |
| Primary Reference | Goodman-Bacon (2021), *Difference-in-differences with variation in treatment timing*, J. Econometrics 225(2), 254-277 |
| R Reference | `bacondecomp::bacon()` |
| Status | **In Progress** |
| Last Review | |
| Status | **Complete** (R parity goldens pending) |
| Last Review | 2026-05-16 |

**Documentation in place:**
- REGISTRY.md section: `## BaconDecomposition` (three comparison types — treated_vs_never, earlier_vs_later, later_vs_earlier; weight construction; TWFE reconstitution; weighted survey path under Phase 3)
- Implementation: `tests/test_bacon.py` (basic decomposition, weight properties, integration with `TwoWayFixedEffects.decompose()`)
**Verified Components:**
- [x] Theorem 1 decomposition identity: `β̂^DD = Σ s · β̂^{2x2}` at `atol=1e-10` (hand-calculable + noisy DGPs)
- [x] Weight sum-to-1: `Σ s = 1.0` at `atol=1e-10` under `weights="exact"`
- [x] Three comparison types correctly classified: `treated_vs_never`, `earlier_vs_later`, `later_vs_earlier`
- [x] Eq. 7 hand-checked: `V̂_{kU}^D = n_{kU}(1-n_{kU}) · D̄_k(1-D̄_k)` (via weight-ratio test, `atol=1e-10`)
- [x] Eq. 8 hand-checked: `V̂_{kℓ}^{D,k} = n_{kℓ}(1-n_{kℓ}) · (D̄_k-D̄_ℓ)/(1-D̄_ℓ) · (1-D̄_k)/(1-D̄_ℓ)`
- [x] Eq. 9 hand-checked: `V̂_{kℓ}^{D,ℓ} = n_{kℓ}(1-n_{kℓ}) · D̄_ℓ/D̄_k · (D̄_k-D̄_ℓ)/D̄_k`
- [x] Eq. 10b 2x2 estimator value: hand-calculable panel → β̂_{kU}^{2x2} = ATT exactly
- [x] Always-treated remap to U (paper footnote 11): `first_treat <= min(time)` (excluding never-treated sentinels `0` and `np.inf`) units auto-remapped via internal column, user's data preserved, count exposed on result
- [x] `weights="exact"` is the default (PR-B 2026-05-16); `weights="approximate"` retained as opt-in
- [x] Unbalanced panel: accepted with `UserWarning` (paper assumes balanced; library extension)
- [x] No untreated group: `s_{kU}` terms drop, weights renormalize, sum-to-1 still holds
- [x] Single timing group with U: only `treated_vs_never` comparisons
- [x] Survey design composes cleanly with exact mode and warn+remap
- [ ] R `bacondecomp::bacon()` parity at `atol=1e-6` — R generator script committed; JSON goldens pending follow-up R install (see TODO.md)

**Outstanding for promotion:**
- **Substantive review pass — first target chosen during the 2026-05-15 methodology-review refresh session.** Read Goodman-Bacon (2021), audit `bacon.py` against the paper's decomposition (Equation 11, weight construction in Section 3, three comparison types in Section 4), generate R parity fixtures via `bacondecomp::bacon()`, write `tests/test_methodology_bacon.py` with paper-equation-numbered assertions, populate Verified Components / Corrections Made / Deviations here.
- Paper review under `docs/methodology/papers/goodman-bacon-2021-review.md`
- R parity fixture against `bacondecomp::bacon()` covering treated_vs_never, earlier_vs_later, later_vs_earlier weight buckets and their relative shares
- Verify the REGISTRY Implementation Checklist (all rows currently unchecked except the survey-design Phase 3 row)
**Test Coverage:**
- 24 methodology tests in `tests/test_methodology_bacon.py` across 6 classes (21 active + 3 R-parity tests that skip on missing goldens)
- 32 existing tests in `tests/test_bacon.py` (basic decomposition, weight properties, weights-parameter API, TWFE integration, visualization, balanced-panel warnings, edge cases)

**R Comparison Results:**
- **Pending**: `bacondecomp` R package not installed in the local R 4.5.2 library at PR-B authoring time. R generator script committed at `benchmarks/R/generate_bacon_golden.R`; running it requires `install.packages("bacondecomp")` + `install.packages("jsonlite")` then `cd benchmarks/R && Rscript generate_bacon_golden.R`. JSON goldens land at `benchmarks/data/r_bacondecomp_golden.json` once generated. `tests/test_methodology_bacon.py::TestBaconParityR` skips with a pointer until then. Tracked in TODO.md follow-up row.

**Corrections Made:**
1. **Theorem 1 exact-weights rewrite** (`bacon.py:_recompute_exact_weights`, lines ~740-880). The previous "exact" mode implementation did not actually compute Eqs. 7-9 / 10e-g — it was missing the `(1 - n_kU)` factor in the within-subsample treatment variance, did not square the sample share, and added an extraneous `unit_share` factor not present in the paper. The post-hoc sum-to-1 normalization masked the relative-weight error but produced a decomposition error of ~0.3% (0.007 absolute) against TWFE on a 3-cohort + never-treated DGP. **Rewrote** the function to compute the exact numerators of Eqs. 10e/f/g (with proper Eqs. 7-9 variances) and let the post-hoc normalization handle the `V̂^D` denominator (Theorem 1 identity guarantees `V̂^D = Σ numerators`). Now matches TWFE at `atol=1e-10`. The existing `test_weighted_sum_equals_twfe` tolerance was tightened from `< 0.1` to `< 1e-10` to lock the contract.
2. **Default `weights` flipped from `"approximate"` to `"exact"`** at three entry points: `BaconDecomposition.__init__()` (`bacon.py:397`), `bacon_decompose()` convenience function (`bacon.py:1064`), `TwoWayFixedEffects.decompose()` (`twfe.py:684`). The paper-faithful Theorem 1 weights are now the default; the simplified approximate path remains opt-in via explicit `weights="approximate"`. `diff_diff/diagnostic_report.py:1740` (production diagnostic surface) was updated to pass explicit `weights="exact"`.
3. **Always-treated warn+remap via internal column** (`bacon.py:fit()`, lines ~487-525). Paper footnote 11 puts units with `t_i < 1` in `U`, but `bacon.py` previously only mapped `first_treat ∈ {0, np.inf}` into U. Added detection using ordered-time logic on the **time axis** (`first_treat <= min(time)` while excluding the never-treated sentinels `0` and `np.inf`) with `UserWarning` and automatic remap via an internal column (`__bacon_first_treat_internal__`), preserving the user's `first_treat` column unchanged. Detection handles event-time-encoded panels (`time ∈ [-2,..,3]`) correctly; the `0` sentinel restriction applies only to `first_treat`. Count exposed via new `BaconDecompositionResults.n_always_treated_remapped` field.

**Deviations from R's `bacondecomp::bacon()`:**
1. **Unbalanced panel acceptance** (library extension): R errors on unbalanced panels; Python emits a `UserWarning` and decomposes. The paper's Appendix A proof assumes balanced panels — decomposition on unbalanced panels is approximate to Theorem 1.
2. **Approximate weight mode** (Python-only optimization): `weights="approximate"` is a library-only fast path with simplified variance computation, not present in R. Users who want Python-R numerical parity should pass `weights="exact"` (the new default).
3. **NaN for invalid inference fields not applicable**: the decomposition is deterministic; there are no SE/p-value fields on the comparison output. The `decomposition_error` field is a finite float (zero in well-conditioned cases).

---

Expand Down
2 changes: 1 addition & 1 deletion TODO.md
Original file line number Diff line number Diff line change
Expand Up @@ -74,7 +74,7 @@ Deferred items from PR reviews that were not addressed before merge.

| Issue | Location | PR | Priority |
|-------|----------|----|----------|
| BaconDecomposition: substantive methodology audit pass against Goodman-Bacon (2021). Paper review on file at `docs/methodology/papers/goodman-bacon-2021-review.md` includes a proposed Methodology Registry Entry, but the `REGISTRY.md` `## BaconDecomposition` section (lines ~2598-2654) still carries the older contract. Audit pass needs to (a) verify `bacon.py` matches Theorem 1 / Eqs. 10a-g exactly, (b) decide how to handle genuinely always-treated units (`0 < first_treat <= min(time)`) — paper convention puts them in `U`, but `bacon.py` currently treats only `first_treat ∈ {0, np.inf}` as the `U` bucket, (c) generate R parity fixtures via `bacondecomp::bacon()`, (d) write `tests/test_methodology_bacon.py`, (e) replace the REGISTRY entry with the proposed text, (f) populate Verified Components / Corrections Made / Deviations in `METHODOLOGY_REVIEW.md` and flip status to Complete. | `diff_diff/bacon.py`, `docs/methodology/REGISTRY.md`, `tests/test_methodology_bacon.py`, `METHODOLOGY_REVIEW.md` | follow-up | Medium |
| BaconDecomposition R parity goldens: `bacondecomp` R package not installed in the local R 4.5.2 library at PR-B authoring time (2026-05-16). R generator script committed at `benchmarks/R/generate_bacon_golden.R`; running it requires `install.packages("bacondecomp")` + `install.packages("jsonlite")` then `cd benchmarks/R && Rscript generate_bacon_golden.R`, writing `benchmarks/data/r_bacondecomp_golden.json`. `tests/test_methodology_bacon.py::TestBaconParityR` (3 tests) skips with a pointer until the JSON lands. The PR-B audit substantiates Theorem 1 (Eqs. 7-9 + 10e-g) via hand-calculable + machine-precision identity tests; R parity is desirable as a cross-language anchor but not the only substantiation. Mirrors StaggeredTripleDifference precedent (PR #245). | `benchmarks/R/generate_bacon_golden.R`, `benchmarks/data/r_bacondecomp_golden.json` (TBD), `tests/test_methodology_bacon.py::TestBaconParityR` | follow-up | Medium |
| dCDH: Phase 1 per-period placebo DID_M^pl has NaN SE (no IF derivation for the per-period aggregation path). Multi-horizon placebos (L_max >= 1) have valid SE. | `chaisemartin_dhaultfoeuille.py` | #294 | Low |
| dCDH: Survey cell-period allocator's post-period attribution is a library convention, not derived from the observation-level survey linearization. MC coverage is empirically close to nominal on the test DGP; a formal derivation (or a covariance-aware two-cell alternative) is deferred. Documented in REGISTRY.md survey IF expansion Note. | `chaisemartin_dhaultfoeuille.py`, `docs/methodology/REGISTRY.md` | #408 | Medium |
| dCDH: Parity test SE/CI assertions only cover pure-direction scenarios; mixed-direction SE comparison is structurally apples-to-oranges (cell-count vs obs-count weighting). | `test_chaisemartin_dhaultfoeuille_parity.py` | #294 | Low |
Expand Down
234 changes: 234 additions & 0 deletions benchmarks/R/generate_bacon_golden.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,234 @@
#!/usr/bin/env Rscript
# Generate R bacondecomp parity goldens for diff-diff BaconDecomposition.
#
# Requires: install.packages("bacondecomp") (CRAN; main function is bacon())
# install.packages("jsonlite")
# Output: ../data/r_bacondecomp_golden.json
#
# The diff-diff BaconDecomposition implementation (`diff_diff/bacon.py`) with
# the default ``weights="exact"`` is expected to match the values in this JSON
# to atol=1e-6 on the per-component (treated, control, type) tuples, and to
# match the TWFE coefficient to the same tolerance. The ``weights="approximate"``
# path is a library-only optimization and is NOT covered by this parity harness.
#
# Three fixtures:
# 1. uniform_3groups_with_never_treated — 3 timing groups + never-treated U;
# exercises all three comparison types (treated/never, earlier/later,
# later/earlier).
# 2. two_groups_no_never_treated — 2 timing groups only; tests the
# timing-only decomposition where the s_{kU} terms drop.
# 3. always_treated_remapped — 3 timing groups + 1 always-treated cohort
# (first_treat = 1). Validates that Python's warn+remap of t_i < 1 into
# U matches R bacondecomp's native behavior.
#
# Run:
# cd benchmarks/R && Rscript generate_bacon_golden.R

suppressPackageStartupMessages({
library(bacondecomp)
library(jsonlite)
})

stopifnot(packageVersion("bacondecomp") >= "0.1.0")

# ---------------------------------------------------------------------------
# DGP helpers
# ---------------------------------------------------------------------------

# Build a balanced panel with absorbing treatment.
# n_units : units per timing group (excluding never-treated)
# n_periods : panel length (1..T)
# cohort_times : vector of first-treatment times, one per cohort
# always_treated_count : optional cohort treated at first_treat = 1
# (i.e., always-treated for the observable window)
# never_treated_count : units with first_treat = 0
# true_effect : constant ATT
# seed : reproducibility
build_panel <- function(n_units_per_cohort, n_periods, cohort_times,
always_treated_count = 0L, never_treated_count = 0L,
true_effect = 2.0, seed = 42L) {
set.seed(seed)
units <- list()
uid <- 1L

# Always-treated cohort (first_treat = 1; treated in every observable period)
if (always_treated_count > 0L) {
for (i in seq_len(always_treated_count)) {
units[[length(units) + 1L]] <- data.frame(
unit = uid, time = seq_len(n_periods), first_treat = 1L
)
uid <- uid + 1L
}
}

# Never-treated U
if (never_treated_count > 0L) {
for (i in seq_len(never_treated_count)) {
units[[length(units) + 1L]] <- data.frame(
unit = uid, time = seq_len(n_periods), first_treat = 0L
)
uid <- uid + 1L
}
}

# Treated cohorts
for (g in cohort_times) {
for (i in seq_len(n_units_per_cohort)) {
units[[length(units) + 1L]] <- data.frame(
unit = uid, time = seq_len(n_periods), first_treat = as.integer(g)
)
uid <- uid + 1L
}
}

df <- do.call(rbind, units)

# Treatment indicator: D_{it} = 1 iff first_treat in {1,..,T} AND time >= first_treat.
df$D <- as.integer(df$first_treat > 0L & df$time >= df$first_treat)

# Outcome: unit FE + linear time + constant treatment effect + noise.
unit_fe <- rnorm(uid - 1L, sd = 2.0)
df$y <- unit_fe[df$unit] +
0.1 * df$time +
true_effect * df$D +
rnorm(nrow(df), sd = 0.5)

df
}

# ---------------------------------------------------------------------------
# Extract bacondecomp::bacon() output into a fixture-shaped list.
# ---------------------------------------------------------------------------

extract_bacon <- function(df, fixture_name) {
# bacondecomp::bacon takes the OUTCOME ~ TREATMENT formula plus id_var/time_var.
# It returns a data.frame with columns: treated, untreated, estimate, weight,
# plus a `type` column (e.g. "Both Treated", "Treated vs Untreated"), and an
# attribute beta_hat_w (the weighted sum, which equals the TWFE coefficient).
res <- bacondecomp::bacon(
formula = y ~ D,
data = df,
id_var = "unit",
time_var = "time"
)

# When the data contains a never-treated group, bacon() returns a list with
# $two_by_twos (the per-component table) and $Omega (the variance-weighted
# contributions). Without never-treated, it returns the data.frame directly.
if (is.list(res) && !is.data.frame(res)) {
components_df <- res$two_by_twos
twfe_coef <- as.numeric(attr(res, "beta_hat_w"))
# Fallback: re-derive TWFE from the components if attr is missing.
if (is.null(twfe_coef) || length(twfe_coef) == 0L) {
twfe_coef <- sum(components_df$estimate * components_df$weight)
}
} else {
components_df <- res
twfe_coef <- sum(components_df$estimate * components_df$weight)
}

# Components vary across bacondecomp versions; normalize the column names.
cols <- names(components_df)
treated_col <- if ("treated" %in% cols) "treated" else "g1"
untreated_col <- if ("untreated" %in% cols) "untreated" else "g2"
estimate_col <- if ("estimate" %in% cols) "estimate" else "Estimate"
weight_col <- if ("weight" %in% cols) "weight" else "Weight"
type_col <- if ("type" %in% cols) "type" else NA_character_

components <- lapply(seq_len(nrow(components_df)), function(i) {
list(
treated_group = as.numeric(components_df[[treated_col]][i]),
control_group = as.numeric(components_df[[untreated_col]][i]),
estimate = as.numeric(components_df[[estimate_col]][i]),
weight = as.numeric(components_df[[weight_col]][i]),
type = if (!is.na(type_col))
as.character(components_df[[type_col]][i])
else NA_character_
)
})

weights_sum <- sum(sapply(components, function(c) c$weight))

list(
panel = list(
unit = as.integer(df$unit),
time = as.integer(df$time),
y = as.numeric(df$y),
first_treat = as.integer(df$first_treat),
treated = as.integer(df$D)
),
r_twfe_coef = twfe_coef,
r_components = components,
r_weights_sum = weights_sum,
n_components = length(components)
)
}

# ---------------------------------------------------------------------------
# Fixtures
# ---------------------------------------------------------------------------

cat("Building fixture 1: uniform_3groups_with_never_treated...\n")
df1 <- build_panel(
n_units_per_cohort = 30L,
n_periods = 6L,
cohort_times = c(3L, 4L, 5L),
always_treated_count = 0L,
never_treated_count = 30L,
true_effect = 2.0,
seed = 101L
)
fixture_1 <- extract_bacon(df1, "uniform_3groups_with_never_treated")

cat("Building fixture 2: two_groups_no_never_treated...\n")
df2 <- build_panel(
n_units_per_cohort = 30L,
n_periods = 6L,
cohort_times = c(3L, 5L),
always_treated_count = 0L,
never_treated_count = 0L,
true_effect = 2.0,
seed = 202L
)
fixture_2 <- extract_bacon(df2, "two_groups_no_never_treated")

cat("Building fixture 3: always_treated_remapped...\n")
# 3 timing-cohorts + 5 always-treated units (first_treat = 1, i.e., treated
# in every observable period) + 30 never-treated. R's bacondecomp natively
# groups the first_treat=1 cohort with U (since they are treated throughout
# every observable period and never serve as a within-window control), which
# matches what diff-diff's warn+remap does in Python.
df3 <- build_panel(
n_units_per_cohort = 25L,
n_periods = 6L,
cohort_times = c(3L, 4L, 5L),
always_treated_count = 5L,
never_treated_count = 25L,
true_effect = 2.0,
seed = 303L
)
fixture_3 <- extract_bacon(df3, "always_treated_remapped")

# ---------------------------------------------------------------------------
# Write JSON
# ---------------------------------------------------------------------------

out <- list(
meta = list(
generated_at = format(Sys.Date()),
bacondecomp_version = as.character(packageVersion("bacondecomp")),
r_version = R.version.string,
description = paste(
"Goodman-Bacon (2021) decomposition parity goldens for diff-diff",
"BaconDecomposition. Parity target: atol=1e-6 on per-component",
"(treated, control, type) tuples plus the TWFE coefficient."
)
),
uniform_3groups_with_never_treated = fixture_1,
two_groups_no_never_treated = fixture_2,
always_treated_remapped = fixture_3
)

out_path <- "../data/r_bacondecomp_golden.json"
write_json(out, out_path, pretty = TRUE, digits = NA, auto_unbox = TRUE)
cat(sprintf("Wrote %s\n", out_path))
Loading
Loading