Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md

Large diffs are not rendered by default.

3 changes: 2 additions & 1 deletion TODO.md
Original file line number Diff line number Diff line change
Expand Up @@ -99,8 +99,9 @@ Deferred items from PR reviews that were not addressed before merge.
| PreTrendsPower: CS/SA `anticipation=1` R-parity fixture. The PR-C R-parity goldens cover NIS power + γ_p MDV at `atol=1e-4` on four shifted-grid / regular / irregular / K=1 fixtures, but R `pretrends` has no anticipation parameter so the Python-side `_extract_pre_period_params` anticipation filter (`if t < _pre_cutoff` in `pretrends.py` lines 1138-1150 for CS; mirror in SA branch) is not R-parity-locked. Build a synthetic `CallawaySantAnnaResults` (or `SunAbrahamResults`) with `anticipation=1` and a t=-1 event-study entry that should be filtered before reaching `_compute_power_nis`, then assert the resulting γ_p matches R's `slope_for_power()` on the K=4 shifted-grid fixture. Existing PR-B MC-based tests (`TestPretrendsPropositions`) and full-VCV tests (`TestPretrendsCovarianceSource`) already cover the filter mechanically; this would close the loop against R. | `tests/test_methodology_pretrends.py::TestPretrendsParityR`, `benchmarks/R/generate_pretrends_golden.R` | PR-C follow-up | Low |


| Thread `vcov_type` (classical / hc1 / hc2 / hc2_bm) through the 7 standalone estimators that expose `cluster=` but not yet `vcov_type=`: `CallawaySantAnna`, `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `StackedDiD`, `WooldridgeDiD`, `EfficientDiD`. Phase 1a added the chain to DiD/MPD/TWFE; Phase 1b PR 1/8 added `SunAbraham` (this row tracks the remaining 7). | multiple | Phase 1b | Medium |
| Thread `vcov_type` (classical / hc1 / hc2 / hc2_bm) through the standalone estimators that expose `cluster=` but not yet `vcov_type=`: `CallawaySantAnna`, `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `WooldridgeDiD`, `EfficientDiD`. Phase 1a added the chain to DiD/MPD/TWFE; Phase 1b PR 1/8 added `SunAbraham`; Phase 1b PR 2/8 added `StackedDiD` (this row tracks the remaining 6). | multiple | Phase 1b | Medium |
| Extend `SunAbraham` with `vcov_type="conley"` (Conley spatial-HAC) as a first-class feature: thread `conley_coords` / `conley_cutoff_km` / `conley_metric` / `conley_kernel` / `conley_time` / `conley_unit` / `conley_lag_cutoff` through `_fit_saturated_regression`. Phase 1b PR 1/8 deferred this; SA currently rejects `vcov_type="conley"` at `__init__` with a deferral message. | `diff_diff/sun_abraham.py` | follow-up | Medium |
| Extend `StackedDiD` with `vcov_type="conley"` (Conley spatial-HAC) — thread the six `conley_*` params through `solve_ols` at `stacked_did.py:419` (and the `_refit_stacked` closure at `:444`). Phase 1b PR 2/8 deferred this; StackedDiD currently rejects `vcov_type="conley"` at `__init__` with a deferral message. Same shape as the SunAbraham conley follow-up. | `diff_diff/stacked_did.py` | follow-up | Medium |
| Harmonize SunAbraham's HC1 within-transform finite-sample correction with `fixest::sunab()`. SA's `solve_ols` applies `n / (n - k_dm)` (within-transform columns only); fixest applies `n / (n - k_total)` (counts absorbed FE). SE values differ by ~1-2% on typical panel sizes (documented in REGISTRY.md "Deviation from R"; pinned at `atol=5e-3` in `tests/test_methodology_sun_abraham.py`). Either thread `df_adjustment` into the vcov scaling or document as an intentional difference. | `diff_diff/sun_abraham.py`, `diff_diff/linalg.py::compute_robust_vcov` | follow-up | Low |
<!-- Rows 104-105 LIFTED 2026-05-20 via the clubSandwich WLS-CR2 port. The diff-diff
form matches clubSandwich's specific algebra (W not sqrt(W), W^2 in bias term,
Expand Down
231 changes: 231 additions & 0 deletions benchmarks/R/generate_stacked_did_golden.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,231 @@
# Generate StackedDiD CR1 + CR2-BM golden values via R clubSandwich.
#
# This script is the parity source for Phase 1b 2/8 (StackedDiD vcov_type
# threading). It loads the pre-stacked panel produced by the Python helper
# at `benchmarks/data/stacked_did_test_panel.csv`, fits a single WLS regression
# with weights=Q (matching the StackedDiD post-PR pattern in `stacked_did.py`
# which calls `solve_ols(weights=composed_weights, vcov_type=...)`), and
# computes both CR1 and CR2 Bell-McCaffrey goldens against which Python's
# output is pinned at atol=1e-10.
#
# The pre-stacked CSV is committed (not regenerated by this script) so the
# R-side regression is independent of any Python-side stacking-logic changes.
# To regenerate the CSV, run:
# python3 benchmarks/python_helpers/dump_stacked_did_panel.py
# (which uses StackedDiD.fit to extract the internal stacked dataset).
#
# Usage:
# Rscript benchmarks/R/generate_stacked_did_golden.R
#
# Requirements:
# clubSandwich (CRAN) >= 0.7.0
# jsonlite
#
# Output:
# benchmarks/data/stacked_did_golden.json

suppressPackageStartupMessages({
library(clubSandwich)
library(jsonlite)
})

stopifnot(packageVersion("clubSandwich") >= "0.7.0")

# --- Variant configurations -------------------------------------------------
# Each variant has its own pre-stacked CSV + reference period + non_ref vector.
# Default aggregate (kappa_pre=2, kappa_post=2, anticipation=0, weighting='aggregate')
# is the primary case; non-default variants pin the methodology-relevant
# parameter interactions (per local codex R8 P1).
variants <- list(
default = list(
csv = "benchmarks/data/stacked_did_test_panel.csv",
ref_period = -1L,
out_key = NULL # written to top-level keys: unit, unit_subexp
),
population = list(
csv = "benchmarks/data/stacked_did_population_panel.csv",
ref_period = -1L,
out_key = "population_unit" # heterogeneous-population Q-weights
),
anticipation1 = list(
csv = "benchmarks/data/stacked_did_anticipation1_panel.csv",
ref_period = -2L, # shifted: e = -1 - anticipation
out_key = "anticipation1_unit"
),
sample_share = list(
csv = "benchmarks/data/stacked_did_sample_share_panel.csv",
ref_period = -1L,
out_key = "sample_share_unit" # third Q-weight formula
)
)


process_variant <- function(csv_path, ref_period) {
stopifnot(file.exists(csv_path))
df <- read.csv(csv_path)

# Sanity check expected columns
required_cols <- c("unit", "period", "outcome", "_sub_exp", "_event_time",
"_D_sa", "_Q_weight", "unit_subexp")
# read.csv mangles underscore-prefix columns to X_sub_exp etc; normalize:
colnames(df) <- gsub("^X_", "_", colnames(df))
for (col in required_cols) {
stopifnot(col %in% colnames(df))
}

# --- Build the design matrix ----------------------------------------------
# StackedDiD's regression form (Equation 3 in Wing-Freedman-Hollingsworth 2024):
# y ~ intercept + D_sa + sum_h lambda_h * I(e=h) + sum_h delta_h * D_sa * I(e=h)
# with the reference period dropped.
event_times <- sort(unique(df$`_event_time`))
non_ref <- event_times[event_times != ref_period]
# Rename underscore-prefix columns to R-friendly names for the formula.
# (lm formulas can't reference backtick-quoted names with leading underscore
# reliably across R versions.)
df$D_sa <- df$`_D_sa`
df$Q_weight <- df$`_Q_weight`
df$event_time <- df$`_event_time`

# Build dummies + interactions with R-friendly names. Use _neg suffix for
# negative event times so column names are valid identifiers.
ev_name <- function(h) {
if (h < 0) paste0("_neg", abs(h)) else paste0("_", h)
}
for (h in non_ref) {
lname <- paste0("lambda", ev_name(h))
dname <- paste0("delta", ev_name(h))
df[[lname]] <- as.integer(df$event_time == h)
df[[dname]] <- df[[lname]] * df$D_sa
}

# Construct formula explicitly
lambda_cols <- paste0("lambda", sapply(non_ref, ev_name))
delta_cols <- paste0("delta", sapply(non_ref, ev_name))
lambda_terms <- paste(lambda_cols, collapse = " + ")
delta_terms <- paste(delta_cols, collapse = " + ")
form_str <- paste("outcome ~", "D_sa", "+", lambda_terms, "+", delta_terms)
form <- as.formula(form_str)

# Fit WLS with weights=Q
fit <- lm(form, data = df, weights = Q_weight)

coef_names <- names(coef(fit))
es_coef_names <- delta_cols

# Anticipation: under non-default anticipation>0, the "post-period" for
# the overall ATT contrast is `h >= -anticipation`, which depends on the
# reference period offset. anticipation = -1 - ref_period (since
# ref_period = -1 - anticipation in Python).
anticipation <- -1L - ref_period
post_threshold <- -anticipation # h >= -anticipation enters overall ATT

clusters_to_test <- list(unit = df$unit, unit_subexp = df$unit_subexp)
variant_result <- list(
meta = list(
event_times_non_ref = non_ref,
es_coef_names = es_coef_names,
ref_period = ref_period,
anticipation = anticipation
)
)

for (cl_name in names(clusters_to_test)) {
cl <- clusters_to_test[[cl_name]]

# Use CR1S (Stata-style: G/(G-1) * (n-1)/(n-p) finite-sample correction)
# to match diff-diff's HC1 + cluster output. clubSandwich's plain "CR1"
# uses only the G/(G-1) factor and would diverge by ~(n-1)/(n-p) (~1.4%
# on this 325-row, 10-col stacked design).
vcov_cr1 <- vcovCR(fit, cluster = cl, type = "CR1S")
se_cr1_all <- sqrt(diag(vcov_cr1))
se_cr1_es <- se_cr1_all[es_coef_names]

vcov_cr2 <- vcovCR(fit, cluster = cl, type = "CR2")
ct_cr2 <- coef_test(fit, vcov = vcov_cr2)
se_cr2_all <- sqrt(diag(vcov_cr2))
# coef_test() returns rows in the order of names(coef(fit)); index by name.
rownames(ct_cr2) <- coef_names
se_cr2_es <- se_cr2_all[es_coef_names]
dof_bm_es <- ct_cr2[es_coef_names, "df_Satt"]

# Overall ATT contrast: post-period delta average. Under anticipation>0
# the post-period starts at h=-anticipation, NOT h=0 (mirrors Python's
# `_post_event_times_preview` filter at stacked_did.py:546-548).
post_delta_names <- delta_cols[non_ref >= post_threshold]
K_post <- length(post_delta_names)
c_overall <- matrix(0, nrow = 1, ncol = length(coef_names))
colnames(c_overall) <- coef_names
for (nm in post_delta_names) {
c_overall[1, nm] <- 1.0 / K_post
}
wt_overall <- Wald_test(fit, constraints = c_overall, vcov = vcov_cr2, test = "HTZ")
dof_bm_overall <- as.numeric(wt_overall$df_denom)
overall_var_cr2 <- as.numeric(c_overall %*% vcov_cr2 %*% t(c_overall))
se_overall_cr2 <- sqrt(overall_var_cr2)
overall_var_cr1 <- as.numeric(c_overall %*% vcov_cr1 %*% t(c_overall))
se_overall_cr1 <- sqrt(overall_var_cr1)

variant_result[[cl_name]] <- list(
se_cr1_es = as.numeric(se_cr1_es),
se_cr2_es = as.numeric(se_cr2_es),
dof_bm_es = as.numeric(dof_bm_es),
dof_bm_overall = dof_bm_overall,
se_overall_cr1 = se_overall_cr1,
se_overall_cr2 = se_overall_cr2,
es_coef_names = es_coef_names,
se_cr1_intercept = as.numeric(se_cr1_all["(Intercept)"]),
se_cr2_intercept = as.numeric(se_cr2_all["(Intercept)"]),
coef_es = as.numeric(coef(fit)[es_coef_names])
)
}
return(variant_result)
}


# --- Drive all variants -----------------------------------------------------
output <- list(
meta = list(
clubSandwich_version = as.character(packageVersion("clubSandwich")),
R_version = R.version.string,
panel_descriptor = list(
n_units = 50L, n_periods = 8L,
cohort_periods = c(3L, 5L, 7L),
never_treated_frac = 0.3, treatment_effect = 2.0,
dynamic_effects = FALSE, seed = 20260521L
),
estimator_descriptor = list(kappa_pre = 2L, kappa_post = 2L)
)
)

for (variant_name in names(variants)) {
v <- variants[[variant_name]]
cat("Processing variant:", variant_name, "(csv=", v$csv, ", ref=", v$ref_period, ")\n")
result <- process_variant(v$csv, v$ref_period)
if (variant_name == "default") {
# Default variant: write per-cluster keys at top level (back-compat)
output$unit <- result$unit
output$unit_subexp <- result$unit_subexp
output$meta$event_times_non_ref <- result$meta$event_times_non_ref
output$meta$es_coef_names <- result$meta$es_coef_names
} else {
# Non-default variants: write under variant-specific key
output[[variant_name]] <- result
}
}

# --- Write JSON -------------------------------------------------------------
out_path <- "benchmarks/data/stacked_did_golden.json"
write_json(output, out_path, digits = 15, auto_unbox = TRUE, pretty = TRUE)
cat("Wrote", out_path, "\n")
cat("Default CR1 unit-cluster SE for delta_0:",
output$unit$se_cr1_es[which(output$meta$event_times_non_ref == 0)], "\n")
cat("Default CR2-BM unit-cluster SE for delta_0:",
output$unit$se_cr2_es[which(output$meta$event_times_non_ref == 0)], "\n")
if (!is.null(output$population)) {
cat("Population CR2-BM unit SE for delta_0:",
output$population$unit$se_cr2_es[which(output$population$unit$es_coef_names == "delta_0")], "\n")
}
if (!is.null(output$anticipation1)) {
cat("Anticipation1 CR2-BM unit overall_se:",
output$anticipation1$unit$se_overall_cr2, "\n")
}
Loading
Loading