igerber · igerber · May 21, 2026 · May 21, 2026 · May 21, 2026 · May 21, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
diff --git a/TODO.md b/TODO.md
@@ -99,8 +99,9 @@ Deferred items from PR reviews that were not addressed before merge.
 | PreTrendsPower: CS/SA `anticipation=1` R-parity fixture. The PR-C R-parity goldens cover NIS power + γ_p MDV at `atol=1e-4` on four shifted-grid / regular / irregular / K=1 fixtures, but R `pretrends` has no anticipation parameter so the Python-side `_extract_pre_period_params` anticipation filter (`if t < _pre_cutoff` in `pretrends.py` lines 1138-1150 for CS; mirror in SA branch) is not R-parity-locked. Build a synthetic `CallawaySantAnnaResults` (or `SunAbrahamResults`) with `anticipation=1` and a t=-1 event-study entry that should be filtered before reaching `_compute_power_nis`, then assert the resulting γ_p matches R's `slope_for_power()` on the K=4 shifted-grid fixture. Existing PR-B MC-based tests (`TestPretrendsPropositions`) and full-VCV tests (`TestPretrendsCovarianceSource`) already cover the filter mechanically; this would close the loop against R. | `tests/test_methodology_pretrends.py::TestPretrendsParityR`, `benchmarks/R/generate_pretrends_golden.R` | PR-C follow-up | Low |
 
 
-| Thread `vcov_type` (classical / hc1 / hc2 / hc2_bm) through the 7 standalone estimators that expose `cluster=` but not yet `vcov_type=`: `CallawaySantAnna`, `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `StackedDiD`, `WooldridgeDiD`, `EfficientDiD`. Phase 1a added the chain to DiD/MPD/TWFE; Phase 1b PR 1/8 added `SunAbraham` (this row tracks the remaining 7). | multiple | Phase 1b | Medium |
+| Thread `vcov_type` (classical / hc1 / hc2 / hc2_bm) through the standalone estimators that expose `cluster=` but not yet `vcov_type=`: `CallawaySantAnna`, `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `WooldridgeDiD`, `EfficientDiD`. Phase 1a added the chain to DiD/MPD/TWFE; Phase 1b PR 1/8 added `SunAbraham`; Phase 1b PR 2/8 added `StackedDiD` (this row tracks the remaining 6). | multiple | Phase 1b | Medium |
 | Extend `SunAbraham` with `vcov_type="conley"` (Conley spatial-HAC) as a first-class feature: thread `conley_coords` / `conley_cutoff_km` / `conley_metric` / `conley_kernel` / `conley_time` / `conley_unit` / `conley_lag_cutoff` through `_fit_saturated_regression`. Phase 1b PR 1/8 deferred this; SA currently rejects `vcov_type="conley"` at `__init__` with a deferral message. | `diff_diff/sun_abraham.py` | follow-up | Medium |
+| Extend `StackedDiD` with `vcov_type="conley"` (Conley spatial-HAC) — thread the six `conley_*` params through `solve_ols` at `stacked_did.py:419` (and the `_refit_stacked` closure at `:444`). Phase 1b PR 2/8 deferred this; StackedDiD currently rejects `vcov_type="conley"` at `__init__` with a deferral message. Same shape as the SunAbraham conley follow-up. | `diff_diff/stacked_did.py` | follow-up | Medium |
 | Harmonize SunAbraham's HC1 within-transform finite-sample correction with `fixest::sunab()`. SA's `solve_ols` applies `n / (n - k_dm)` (within-transform columns only); fixest applies `n / (n - k_total)` (counts absorbed FE). SE values differ by ~1-2% on typical panel sizes (documented in REGISTRY.md "Deviation from R"; pinned at `atol=5e-3` in `tests/test_methodology_sun_abraham.py`). Either thread `df_adjustment` into the vcov scaling or document as an intentional difference. | `diff_diff/sun_abraham.py`, `diff_diff/linalg.py::compute_robust_vcov` | follow-up | Low |
 <!-- Rows 104-105 LIFTED 2026-05-20 via the clubSandwich WLS-CR2 port. The diff-diff
      form matches clubSandwich's specific algebra (W not sqrt(W), W^2 in bias term,

diff --git a/benchmarks/R/generate_stacked_did_golden.R b/benchmarks/R/generate_stacked_did_golden.R
@@ -0,0 +1,231 @@
+# Generate StackedDiD CR1 + CR2-BM golden values via R clubSandwich.
+#
+# This script is the parity source for Phase 1b 2/8 (StackedDiD vcov_type
+# threading). It loads the pre-stacked panel produced by the Python helper
+# at `benchmarks/data/stacked_did_test_panel.csv`, fits a single WLS regression
+# with weights=Q (matching the StackedDiD post-PR pattern in `stacked_did.py`
+# which calls `solve_ols(weights=composed_weights, vcov_type=...)`), and
+# computes both CR1 and CR2 Bell-McCaffrey goldens against which Python's
+# output is pinned at atol=1e-10.
+#
+# The pre-stacked CSV is committed (not regenerated by this script) so the
+# R-side regression is independent of any Python-side stacking-logic changes.
+# To regenerate the CSV, run:
+#     python3 benchmarks/python_helpers/dump_stacked_did_panel.py
+# (which uses StackedDiD.fit to extract the internal stacked dataset).
+#
+# Usage:
+#   Rscript benchmarks/R/generate_stacked_did_golden.R
+#
+# Requirements:
+#   clubSandwich (CRAN) >= 0.7.0
+#   jsonlite
+#
+# Output:
+#   benchmarks/data/stacked_did_golden.json
+
+suppressPackageStartupMessages({
+  library(clubSandwich)
+  library(jsonlite)
+})
+
+stopifnot(packageVersion("clubSandwich") >= "0.7.0")
+
+# --- Variant configurations -------------------------------------------------
+# Each variant has its own pre-stacked CSV + reference period + non_ref vector.
+# Default aggregate (kappa_pre=2, kappa_post=2, anticipation=0, weighting='aggregate')
+# is the primary case; non-default variants pin the methodology-relevant
+# parameter interactions (per local codex R8 P1).
+variants <- list(
+  default = list(
+    csv = "benchmarks/data/stacked_did_test_panel.csv",
+    ref_period = -1L,
+    out_key = NULL  # written to top-level keys: unit, unit_subexp
+  ),
+  population = list(
+    csv = "benchmarks/data/stacked_did_population_panel.csv",
+    ref_period = -1L,
+    out_key = "population_unit"  # heterogeneous-population Q-weights
+  ),
+  anticipation1 = list(
+    csv = "benchmarks/data/stacked_did_anticipation1_panel.csv",
+    ref_period = -2L,  # shifted: e = -1 - anticipation
+    out_key = "anticipation1_unit"
+  ),
+  sample_share = list(
+    csv = "benchmarks/data/stacked_did_sample_share_panel.csv",
+    ref_period = -1L,
+    out_key = "sample_share_unit"  # third Q-weight formula
+  )
+)
+
+
+process_variant <- function(csv_path, ref_period) {
+  stopifnot(file.exists(csv_path))
+  df <- read.csv(csv_path)
+
+# Sanity check expected columns
+required_cols <- c("unit", "period", "outcome", "_sub_exp", "_event_time",
+                   "_D_sa", "_Q_weight", "unit_subexp")
+# read.csv mangles underscore-prefix columns to X_sub_exp etc; normalize:
+colnames(df) <- gsub("^X_", "_", colnames(df))
+for (col in required_cols) {
+  stopifnot(col %in% colnames(df))
+}
+
+  # --- Build the design matrix ----------------------------------------------
+  # StackedDiD's regression form (Equation 3 in Wing-Freedman-Hollingsworth 2024):
+  #   y ~ intercept + D_sa + sum_h lambda_h * I(e=h) + sum_h delta_h * D_sa * I(e=h)
+  # with the reference period dropped.
+  event_times <- sort(unique(df$`_event_time`))
+  non_ref <- event_times[event_times != ref_period]
+# Rename underscore-prefix columns to R-friendly names for the formula.
+# (lm formulas can't reference backtick-quoted names with leading underscore
+# reliably across R versions.)
+df$D_sa <- df$`_D_sa`
+df$Q_weight <- df$`_Q_weight`
+df$event_time <- df$`_event_time`
+
+# Build dummies + interactions with R-friendly names. Use _neg suffix for
+# negative event times so column names are valid identifiers.
+ev_name <- function(h) {
+  if (h < 0) paste0("_neg", abs(h)) else paste0("_", h)
+}
+  for (h in non_ref) {
+    lname <- paste0("lambda", ev_name(h))
+    dname <- paste0("delta", ev_name(h))
+    df[[lname]] <- as.integer(df$event_time == h)
+    df[[dname]] <- df[[lname]] * df$D_sa
+  }
+
+  # Construct formula explicitly
+  lambda_cols <- paste0("lambda", sapply(non_ref, ev_name))
+  delta_cols <- paste0("delta", sapply(non_ref, ev_name))
+  lambda_terms <- paste(lambda_cols, collapse = " + ")
+  delta_terms <- paste(delta_cols, collapse = " + ")
+  form_str <- paste("outcome ~", "D_sa", "+", lambda_terms, "+", delta_terms)
+  form <- as.formula(form_str)
+
+  # Fit WLS with weights=Q
+  fit <- lm(form, data = df, weights = Q_weight)
+
+  coef_names <- names(coef(fit))
+  es_coef_names <- delta_cols
+
+  # Anticipation: under non-default anticipation>0, the "post-period" for
+  # the overall ATT contrast is `h >= -anticipation`, which depends on the
+  # reference period offset. anticipation = -1 - ref_period (since
+  # ref_period = -1 - anticipation in Python).
+  anticipation <- -1L - ref_period
+  post_threshold <- -anticipation  # h >= -anticipation enters overall ATT
+
+  clusters_to_test <- list(unit = df$unit, unit_subexp = df$unit_subexp)
+  variant_result <- list(
+    meta = list(
+      event_times_non_ref = non_ref,
+      es_coef_names = es_coef_names,
+      ref_period = ref_period,
+      anticipation = anticipation
+    )
+  )
+
+  for (cl_name in names(clusters_to_test)) {
+  cl <- clusters_to_test[[cl_name]]
+
+  # Use CR1S (Stata-style: G/(G-1) * (n-1)/(n-p) finite-sample correction)
+  # to match diff-diff's HC1 + cluster output. clubSandwich's plain "CR1"
+  # uses only the G/(G-1) factor and would diverge by ~(n-1)/(n-p) (~1.4%
+  # on this 325-row, 10-col stacked design).
+  vcov_cr1 <- vcovCR(fit, cluster = cl, type = "CR1S")
+  se_cr1_all <- sqrt(diag(vcov_cr1))
+  se_cr1_es <- se_cr1_all[es_coef_names]
+
+  vcov_cr2 <- vcovCR(fit, cluster = cl, type = "CR2")
+  ct_cr2 <- coef_test(fit, vcov = vcov_cr2)
+  se_cr2_all <- sqrt(diag(vcov_cr2))
+  # coef_test() returns rows in the order of names(coef(fit)); index by name.
+  rownames(ct_cr2) <- coef_names
+  se_cr2_es <- se_cr2_all[es_coef_names]
+  dof_bm_es <- ct_cr2[es_coef_names, "df_Satt"]
+
+    # Overall ATT contrast: post-period delta average. Under anticipation>0
+    # the post-period starts at h=-anticipation, NOT h=0 (mirrors Python's
+    # `_post_event_times_preview` filter at stacked_did.py:546-548).
+    post_delta_names <- delta_cols[non_ref >= post_threshold]
+    K_post <- length(post_delta_names)
+    c_overall <- matrix(0, nrow = 1, ncol = length(coef_names))
+    colnames(c_overall) <- coef_names
+    for (nm in post_delta_names) {
+      c_overall[1, nm] <- 1.0 / K_post
+    }
+    wt_overall <- Wald_test(fit, constraints = c_overall, vcov = vcov_cr2, test = "HTZ")
+    dof_bm_overall <- as.numeric(wt_overall$df_denom)
+    overall_var_cr2 <- as.numeric(c_overall %*% vcov_cr2 %*% t(c_overall))
+    se_overall_cr2 <- sqrt(overall_var_cr2)
+    overall_var_cr1 <- as.numeric(c_overall %*% vcov_cr1 %*% t(c_overall))
+    se_overall_cr1 <- sqrt(overall_var_cr1)
+
+    variant_result[[cl_name]] <- list(
+      se_cr1_es = as.numeric(se_cr1_es),
+      se_cr2_es = as.numeric(se_cr2_es),
+      dof_bm_es = as.numeric(dof_bm_es),
+      dof_bm_overall = dof_bm_overall,
+      se_overall_cr1 = se_overall_cr1,
+      se_overall_cr2 = se_overall_cr2,
+      es_coef_names = es_coef_names,
+      se_cr1_intercept = as.numeric(se_cr1_all["(Intercept)"]),
+      se_cr2_intercept = as.numeric(se_cr2_all["(Intercept)"]),
+      coef_es = as.numeric(coef(fit)[es_coef_names])
+    )
+  }
+  return(variant_result)
+}
+
+
+# --- Drive all variants -----------------------------------------------------
+output <- list(
+  meta = list(
+    clubSandwich_version = as.character(packageVersion("clubSandwich")),
+    R_version = R.version.string,
+    panel_descriptor = list(
+      n_units = 50L, n_periods = 8L,
+      cohort_periods = c(3L, 5L, 7L),
+      never_treated_frac = 0.3, treatment_effect = 2.0,
+      dynamic_effects = FALSE, seed = 20260521L
+    ),
+    estimator_descriptor = list(kappa_pre = 2L, kappa_post = 2L)
+  )
+)
+
+for (variant_name in names(variants)) {
+  v <- variants[[variant_name]]
+  cat("Processing variant:", variant_name, "(csv=", v$csv, ", ref=", v$ref_period, ")\n")
+  result <- process_variant(v$csv, v$ref_period)
+  if (variant_name == "default") {
+    # Default variant: write per-cluster keys at top level (back-compat)
+    output$unit <- result$unit
+    output$unit_subexp <- result$unit_subexp
+    output$meta$event_times_non_ref <- result$meta$event_times_non_ref
+    output$meta$es_coef_names <- result$meta$es_coef_names
+  } else {
+    # Non-default variants: write under variant-specific key
+    output[[variant_name]] <- result
+  }
+}
+
+# --- Write JSON -------------------------------------------------------------
+out_path <- "benchmarks/data/stacked_did_golden.json"
+write_json(output, out_path, digits = 15, auto_unbox = TRUE, pretty = TRUE)
+cat("Wrote", out_path, "\n")
+cat("Default CR1 unit-cluster SE for delta_0:",
+    output$unit$se_cr1_es[which(output$meta$event_times_non_ref == 0)], "\n")
+cat("Default CR2-BM unit-cluster SE for delta_0:",
+    output$unit$se_cr2_es[which(output$meta$event_times_non_ref == 0)], "\n")
+if (!is.null(output$population)) {
+  cat("Population CR2-BM unit SE for delta_0:",
+      output$population$unit$se_cr2_es[which(output$population$unit$es_coef_names == "delta_0")], "\n")
+}
+if (!is.null(output$anticipation1)) {
+  cat("Anticipation1 CR2-BM unit overall_se:",
+      output$anticipation1$unit$se_overall_cr2, "\n")
+}