diff --git a/TODO.md b/TODO.md index 42ddd3a5..70eef04e 100644 --- a/TODO.md +++ b/TODO.md @@ -101,7 +101,6 @@ Deferred items from PR reviews that were not addressed before merge. | Weighted one-way Bell-McCaffrey (`vcov_type="hc2_bm"` + `weights`, no cluster) currently raises `NotImplementedError`. `_compute_bm_dof_from_contrasts` builds its hat matrix from the unscaled design via `X (X'WX)^{-1} X' W`, but `solve_ols` solves the WLS problem by transforming to `X* = sqrt(w) X`, so the correct symmetric idempotent residual-maker is `M* = I - sqrt(W) X (X'WX)^{-1} X' sqrt(W)`. Rederive the Satterthwaite `(tr G)^2 / tr(G^2)` ratio on the transformed design and add weighted parity tests before lifting the guard. | `linalg.py::_compute_bm_dof_from_contrasts`, `linalg.py::_validate_vcov_args` | Phase 1a | Medium | | HC2 / HC2 + Bell-McCaffrey on absorbed-FE fits currently raises `NotImplementedError` in three places: `TwoWayFixedEffects` unconditionally; `DifferenceInDifferences(absorb=..., vcov_type in {"hc2","hc2_bm"})`; `MultiPeriodDiD(absorb=..., vcov_type in {"hc2","hc2_bm"})`. Within-transformation preserves coefficients and residuals under FWL but not the hat matrix, so the reduced-design `h_ii` is not the diagonal of the full FE projection and CR2's block adjustment `A_g = (I - H_gg)^{-1/2}` is likewise wrong on absorbed cluster blocks. Lifting the guard needs HC2/CR2-BM computed from the full absorbed projection (unit/time FE dummies reconstructed internally, or a FE-aware hat-matrix formulation) and a parity harness against a full-dummy OLS run or R `fixest`/`clubSandwich`. HC1/CR1 are unaffected by this because they have no leverage term. | `twfe.py::fit`, `estimators.py::DifferenceInDifferences.fit`, `estimators.py::MultiPeriodDiD.fit` | Phase 1a | Medium | | Weighted CR2 Bell-McCaffrey cluster-robust (`vcov_type="hc2_bm"` + `cluster_ids` + `weights`) currently raises `NotImplementedError`. Weighted hat matrix and residual rebalancing need threading per clubSandwich WLS handling. | `linalg.py::_compute_cr2_bm` | Phase 1a | Medium | -| Regenerate `benchmarks/data/clubsandwich_cr2_golden.json` from R (`Rscript benchmarks/R/generate_clubsandwich_golden.R`). Current JSON has `source: python_self_reference` as a stability anchor until an authoritative R run. | `benchmarks/R/generate_clubsandwich_golden.R` | Phase 1a | Medium | | `honest_did.py:1907` `np.linalg.solve(A_sys, b_sys) / except LinAlgError: continue` is a silent basis-rejection in the vertex-enumeration loop that is algorithmically intentional (try the next basis). Consider surfacing a count of rejected bases as a diagnostic when ARP enumeration exhausts, so users see when the vertex search was heavily constrained. Not a silent failure in the sense of the Phase 2 audit (the algorithm is supposed to skip), but the diagnostic would help debug borderline cases. | `honest_did.py` | #334 | Low | | Unify Rust local-method `estimate_model` solver path to `solve_wls_svd` (the same SVD helper used by the global-method since PR #348) for sub-1e-14 bootstrap SE parity. Current local-method bootstrap parity test (`tests/test_rust_backend.py::TestTROPRustEdgeCaseParity::test_bootstrap_seed_reproducibility_local`) passes at `atol=1e-5` — the residual ~1e-7 gap is roundoff between Rust's `estimate_model` matrix factorization and numpy's `lstsq`, which accumulates differently across per-replicate bootstrap fits. Main-fit ATT parity is regime-dependent (`atol=1e-14` for `lambda_nn=inf`, `atol=1e-10` for finite `lambda_nn` — see `test_local_method_main_fit_parity`); the bootstrap gap is a same-solver-path roundoff concern and not a user-visible correctness bug. | `rust/src/trop.rs::estimate_model`, `rust/src/linalg.rs::solve_wls_svd` | follow-up | Low | | Rust multiplier-bootstrap weight RNG (`generate_bootstrap_weights_batch` in `rust/src/bootstrap.rs:9-10, 57-75`) uses `Xoshiro256PlusPlus::seed_from_u64(seed + i)` per row for Rademacher/Mammen/Webb generation. If any Python caller (SDID / efficient-DiD multiplier bootstrap) has a numpy-canonical equivalent, the two backends likely diverge under the same seed. Audit Python callers (`diff_diff/sdid.py`, `diff_diff/efficient_did_bootstrap.py`, `diff_diff/bootstrap_utils.py::generate_bootstrap_weights_batch_numpy`) for parity-test gaps. Same fix shape as TROP RNG parity (PR #354): pre-generate weights in Python via numpy and pass them to Rust through PyO3. | `rust/src/bootstrap.rs`, `diff_diff/bootstrap_utils.py` | follow-up | Medium | @@ -166,7 +165,6 @@ Ordered paydown view across the tables above. Tier A → D is by effort × risk, #### Tier A — Quick wins (≤1 day, ≤3 CI rounds expected) -- Regenerate `benchmarks/data/clubsandwich_cr2_golden.json` from R via `benchmarks/R/generate_clubsandwich_golden.R` (currently `source: python_self_reference`) - HonestDiD `test_m0_short_circuit`: replace wall-clock `elapsed < 0.5s` proxy with a state flag (`tests/test_methodology_honest_did.py:246`) - EfficientDiD `control_group="last_cohort"` REGISTRY-vs-code alignment with `anticipation>0` (`efficient_did.py`, one design decision) - TripleDifference: add `generate_ddd_panel_data` for panel DDD power analysis (`prep_dgp.py`, `power.py`) diff --git a/benchmarks/R/generate_clubsandwich_golden.R b/benchmarks/R/generate_clubsandwich_golden.R index f1673dda..1de9490c 100644 --- a/benchmarks/R/generate_clubsandwich_golden.R +++ b/benchmarks/R/generate_clubsandwich_golden.R @@ -7,7 +7,7 @@ # Rscript benchmarks/R/generate_clubsandwich_golden.R # # Requirements: -# clubSandwich (CRAN), jsonlite, readr +# clubSandwich (CRAN), jsonlite # # Output: # benchmarks/data/clubsandwich_cr2_golden.json @@ -50,12 +50,12 @@ for (nm in names(datasets)) { d <- datasets[[nm]] fit <- lm(y ~ x, data = d) vcov_cr2 <- vcovCR(fit, cluster = d$cluster, type = "CR2") - # Per-contrast Bell-McCaffrey DOF: one per coefficient via a unit contrast. + # Per-coefficient Bell-McCaffrey Satterthwaite DOF via coef_test()$df_Satt. + # (clubSandwich 0.7+ removed `Wald_test(..., test="Satterthwaite")`; the + # `df_Satt` column from coef_test() is the idiomatic per-coefficient form + # and is numerically identical to the old per-unit-contrast path.) + ct <- coef_test(fit, vcov = vcov_cr2) coef_names <- names(coef(fit)) - dof_vec <- sapply(coef_names, function(nm_coef) { - ctr <- setNames(as.numeric(names(coef(fit)) == nm_coef), names(coef(fit))) - Wald_test(fit, constraints = matrix(ctr, 1), vcov = vcov_cr2, test = "Satterthwaite")$df - }) output[[nm]] <- list( x = d$x, y = d$y, @@ -64,7 +64,7 @@ for (nm in names(datasets)) { coef_names = coef_names, vcov_cr2 = as.numeric(vcov_cr2), vcov_shape = dim(vcov_cr2), - dof_bm = as.numeric(dof_vec), + dof_bm = as.numeric(ct$df_Satt), cluster_sizes = as.numeric(table(d$cluster)) ) } diff --git a/benchmarks/data/clubsandwich_cr2_golden.json b/benchmarks/data/clubsandwich_cr2_golden.json index 29d3d3dd..1e1b3417 100644 --- a/benchmarks/data/clubsandwich_cr2_golden.json +++ b/benchmarks/data/clubsandwich_cr2_golden.json @@ -1,480 +1,42 @@ { "balanced_small": { - "x": [ - 0.9435325056105539, - 0.35942103334157316, - 0.7848054119699771, - 0.5912781852294118, - 0.2943285611969282, - 0.9227256864229143, - 0.8693315446785694, - 0.36413842629052284, - 0.973176814632894, - 0.22452433073513123, - 0.8054958679281221, - 0.6808962312808441, - 0.47106052122569997, - 0.030805470551009684, - 0.8947982030827969, - 0.5736325238146748, - 0.39030825765317734, - 0.354679037214903, - 0.6519730194604051, - 0.3470284246126759, - 0.50757990814917, - 0.37093570271031107, - 0.05520285394721014, - 0.2504092003297004, - 0.8409630382078536, - 0.8181544146253424, - 0.667325426235869, - 0.4705875910980839, - 0.9698444927156448, - 0.8402607539796008 - ], - "y": [ - 0.9124653444274926, - 0.4072731282261453, - 0.6441477445320447, - 0.6245322729646807, - 0.3940054202909884, - 0.35804100091454355, - 1.2073548572143984, - 0.7569702431699881, - 1.0735173904729616, - 0.24506888765882986, - 0.3155516277058717, - 0.3214244220887048, - 0.47584839029817083, - 0.6428414009886432, - 1.0784737561824687, - 1.0295056172852604, - 1.074654892499979, - 0.7325318128152004, - 1.5443613535937253, - 1.2483762247493897, - 1.3339598940009474, - 1.1013735885136728, - 1.2772239312210443, - 1.3363012870469466, - 0.9458188110658502, - 0.8533370526745974, - 1.0314299477607014, - 1.102902757481314, - 1.1075988192549155, - 1.083261134760365 - ], - "cluster": [ - 1, - 1, - 1, - 1, - 1, - 1, - 2, - 2, - 2, - 2, - 2, - 2, - 3, - 3, - 3, - 3, - 3, - 3, - 4, - 4, - 4, - 4, - 4, - 4, - 5, - 5, - 5, - 5, - 5, - 5 - ], - "coef": [ - 0.8339617023201237, - 0.07167199961782436 - ], - "coef_names": [ - "(Intercept)", - "x" - ], - "vcov_cr2": [ - 0.09782504656993866, - -0.09922720822392328, - -0.09922720822392324, - 0.11191805849057651 - ], - "vcov_shape": [ - 2, - 2 - ], - "dof_bm": [ - 3.046739807991152, - 3.867820152608498 - ] + "x": [0.3721983763389289, 0.04382481542415917, 0.7096840182784945, 0.6576903965324163, 0.2498557232320309, 0.3000548330601305, 0.5848666259553283, 0.3334671433549374, 0.6220119637437165, 0.5458285543136299, 0.8797957301139832, 0.7068747407756746, 0.7319725945126265, 0.9316344279795885, 0.4551205944735557, 0.5903197291772813, 0.8204360948875546, 0.2241184804588556, 0.4116668293718249, 0.03861056081950665, 0.7007115453016013, 0.9568374615628272, 0.2133520019706339, 0.6610615001991391, 0.9233188820071518, 0.7957197614014149, 0.07121255435049534, 0.3894077676814049, 0.4064512162003666, 0.6593550781253725], + "y": [1.05667907248243, 1.066947846044605, 1.204576917404174, 0.9393918637932542, 1.177146043692329, 0.7712803978302757, 0.9609694640137826, 0.7179921799716888, 0.9795764040580304, 0.9476640198770669, 1.19400793499814, 0.9843903995036551, 1.596792196243911, 1.080428664799787, 1.494763722918098, 1.179517769839397, 1.473047550838304, 1.325359020870347, 0.4626772743433495, 0.7001639163946355, 1.038012356684575, 1.221226309323235, 0.2339753492538406, 0.8297946161223602, 0.2154287997879057, 0.4532916275621509, 0.1242393529242412, 0.02833331953496379, 0.1200535751172539, 0.007748016921758638], + "cluster": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5], + "coef": [0.6300385597851063, 0.4180976333279622], + "coef_names": ["(Intercept)", "x"], + "vcov_cr2": [0.05521248306042478, -0.03274214553627666, -0.03274214553627678, 0.06774697283981687], + "vcov_shape": [2, 2], + "dof_bm": [3.212621630843576, 3.575881216615637], + "cluster_sizes": [6, 6, 6, 6, 6] }, "unbalanced_medium": { - "x": [ - 0.3462874999237423, - 0.443655907974409, - 0.815755133576451, - 0.6874916807834863, - 0.30042599121099556, - 0.44014173776833065, - 0.5635108182432734, - 0.8367388142376376, - 0.2123988054077447, - 0.914502769923609, - 0.31540445328100175, - 0.7782568094004593, - 0.5247419459408266, - 0.3489167128946181, - 0.557303561462073, - 0.48586633346308306, - 0.1524714507792745, - 0.44748254053693426, - 0.9278430129623557, - 0.6581790789559127, - 0.11995237785491086, - 0.7014607669463622, - 0.6993877122197518, - 0.4949398323613621, - 0.02940081961158314, - 0.5543191925927794, - 0.2823434957508282, - 0.5427086647116763, - 0.38029675833106513, - 0.4025733964767805, - 0.9375123829093304, - 0.9440645103303017, - 0.1724656629410518, - 0.246872424727282, - 0.9844803926901483, - 0.5559872436147494, - 0.6878771576696437, - 0.5882087419536575, - 0.12223333644812606, - 0.5527777076815845, - 0.04316256681068387, - 0.0010296217941053731, - 0.019728001928271177, - 0.5033598479801504, - 0.08921189595557244, - 0.03511513041575487, - 0.09666762523940087, - 0.5241598869956512, - 0.6325041926946793, - 0.2154994386324135, - 0.9150819636448457, - 0.18191716317896556 - ], - "y": [ - 0.206431750468072, - 0.52914126374223, - 0.948372476430183, - 1.462848242343694, - 1.1787490896437192, - 1.2232918502435313, - 1.49883714029803, - 1.6892231717734079, - 1.0692241224594425, - 1.7836788181223873, - 1.7498094746587713, - 1.6074538152426974, - 1.90081335871295, - 1.8000176179263754, - 1.7859040936875021, - 1.8352051679895616, - 1.4032402426917756, - 1.7463091897477208, - 1.8233311000151493, - 1.373229624702863, - 0.8988692201058794, - 1.7028444873926833, - 1.1367695830608349, - 1.6372544722023115, - 1.0930977055066182, - 2.193470097143512, - 2.004641154578546, - 2.0681249662828667, - 2.1985314952996164, - 1.9785659459956395, - 2.097973041979551, - 2.273697189865885, - 1.529964015810914, - 1.7325466539629746, - 1.7756476746468373, - 1.9205833759797464, - 2.1275707263673027, - 2.061119528784678, - 1.6714765399702147, - 1.3753867950348604, - 1.5654904578276585, - 1.8951529649563046, - 0.5616087105402756, - 1.0365570638248616, - 0.5889041512587929, - 1.010971201580437, - 0.30497699123241295, - 0.8359690569328175, - 0.9175163185192178, - 0.8870270600603117, - 1.1408269122291879, - 0.7996542383725431 - ], - "cluster": [ - 1, - 1, - 1, - 2, - 2, - 2, - 2, - 3, - 3, - 3, - 3, - 3, - 4, - 4, - 4, - 4, - 4, - 4, - 5, - 5, - 5, - 5, - 5, - 5, - 5, - 6, - 6, - 6, - 6, - 6, - 6, - 6, - 6, - 7, - 7, - 7, - 7, - 7, - 7, - 7, - 7, - 7, - 8, - 8, - 8, - 8, - 8, - 8, - 8, - 8, - 8, - 8 - ], - "coef": [ - 1.1239728426835145, - 0.7158670108521947 - ], - "coef_names": [ - "(Intercept)", - "x" - ], - "vcov_cr2": [ - 0.0769099709181706, - -0.0648802497792549, - -0.0648802497792549, - 0.07456720905801277 - ], - "vcov_shape": [ - 2, - 2 - ], - "dof_bm": [ - 4.0976627009493924, - 4.560090785535048 - ] + "x": [0.128862570039928, 0.6701512336730957, 0.3297898611053824, 0.2037271915469319, 0.368284564698115, 0.7823035586625338, 0.1984838922508061, 0.2190779817756265, 0.4419498639181256, 0.6422768677584827, 0.07627971330657601, 0.2270524124614894, 0.2186972440686077, 0.7315562770236284, 0.02659884956665337, 0.2013396944385022, 0.6228771209716797, 0.6576305113267154, 0.6803408381529152, 0.1647861595265567, 0.4449463130440563, 0.5335798722226173, 0.8809765174519271, 0.6302034477703273, 0.218909045914188, 0.4129483439028263, 0.457392189418897, 0.4145040600560606, 0.9525077047292143, 0.8236416757572442, 0.3981174814980477, 0.1786765605211258, 0.9758388390764594, 0.8111260565929115, 0.2872961156535894, 0.462518839398399, 0.8308089969214052, 0.4791092942468822, 0.9257831228896976, 0.5312510745134205, 0.6328634682577103, 0.8924754566978663, 0.8108281444292516, 0.2660467734094709, 0.8168796338140965, 0.306065121665597, 0.7733509626705199, 0.897745214169845, 0.6402185300830752, 0.2910342407412827, 0.5996965372469276, 0.03418785543181002], + "y": [1.281270710066877, 1.722881312144584, 1.281775401696968, 1.413528483141223, 1.043366086227046, 1.45241495488675, 1.675613928059604, 0.8983891882314028, 0.8686731426084141, 1.093944759246245, 0.9249145411811377, 0.8493184917053391, 2.85305770155655, 2.514730609257615, 2.436761333058375, 2.341302160178671, 2.505201653980506, 2.944303753193914, 2.227941048801415, 2.412340826904834, 2.207925805301888, 2.2923019573263, 2.342348481613817, 2.226590726847836, 2.086344787976651, 1.656871526710177, 1.904644888582141, 1.862885866969904, 1.964874740458616, 1.520180307273704, 1.685953582170327, 1.515607792243089, 2.005418184074291, 0.8171658134910009, 0.6646849084479258, 1.019695539332419, 1.070006641198372, 0.7096919907678358, 1.075733955227796, 0.8546317660179634, 0.903663211647689, 0.9348092890610543, 0.4821668914973627, -0.04429208656038075, 0.3016826155382485, 0.4284240023231432, 0.663698185175872, 0.9351900235072026, 0.5567615926932054, 0.4072033162858779, 0.1744773447144307, 0.5377150345076495], + "cluster": [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8], + "coef": [1.407664197609156, -0.02639903174573305], + "coef_names": ["(Intercept)", "x"], + "vcov_cr2": [0.1192052700634108, -0.04806533789647253, -0.04806533789647224, 0.09454689012283392], + "vcov_shape": [2, 2], + "dof_bm": [6.387834602264484, 5.850537239807559], + "cluster_sizes": [3, 4, 5, 6, 7, 8, 9, 10] }, "singletons_present": { - "x": [ - 0.21443241914065292, - 0.41682182086464903, - 0.807695239893341, - 0.27392327582317477, - 0.8157795313236993, - 0.10763306403508877, - 0.43640245442460357, - 0.8388322165462259, - 0.19866476008995304, - 0.3026576176172062, - 0.3431542671793545, - 0.21186815334857012, - 0.9208492603469947, - 0.5571125190421009, - 0.8457028297541836, - 0.5345114831042415, - 0.24845868416282413, - 0.2642429245246207, - 0.9465367828433652, - 0.20945542694752117, - 0.009759083685863867, - 0.6259345454282155, - 0.3392297185756302, - 0.46294549338368507, - 0.1317216186447715, - 0.48751512348895587, - 0.15110039365510097, - 0.630530192366627, - 0.006966019961796244, - 0.5849538716054395, - 0.5801774527180487, - 0.5915577625702408, - 0.679731222994373, - 0.3730404645361476, - 0.7146163443155258, - 0.800633446837909, - 0.8946270764765526, - 0.5683047391141852, - 0.9479855397490015, - 0.8267433552654698, - 0.5713537595492587, - 0.0243267381892448, - 0.7476080024405406, - 0.33535028127617283, - 0.7698511234427318, - 0.4701191328296702 - ], - "y": [ - 1.3243096175984646, - 0.8431396417660063, - 1.4419843094572695, - 1.2870746718828974, - 1.596866114476496, - 1.1552637530029242, - 1.6100814232512937, - 1.7512158081862361, - 1.532545529434858, - 1.160147918245946, - 1.453429043900455, - 0.3908349902873533, - 0.942299040729968, - 0.5055662291402832, - 0.5005710305092965, - 0.6181560617843084, - 0.021769149201485205, - 0.16713924693374213, - 0.673423433392893, - 0.5733752379061821, - -0.0468281630117352, - 0.45355321026007617, - 1.519957688127892, - 1.269751930094416, - 1.5788986239640754, - 1.55947674117304, - 1.4519195876176205, - 1.7886169479007779, - 1.1911132158800422, - 0.18966340391475178, - 0.7023759773342588, - 0.42401023482172623, - 0.6938869588679403, - 0.6120292225026871, - 0.5707096718476062, - 0.34871206846203284, - 0.9559282948976917, - 1.69305984433428, - 1.8182487816381958, - 1.9520010432247459, - 1.921437822619558, - 1.5786029246631785, - 1.755924603719404, - 1.8120520346821916, - 1.6968331198272542, - 1.7746413567714139 - ], - "cluster": [ - 1, - 2, - 3, - 3, - 4, - 4, - 4, - 5, - 5, - 5, - 5, - 6, - 6, - 6, - 6, - 6, - 7, - 7, - 7, - 7, - 7, - 7, - 8, - 8, - 8, - 8, - 8, - 8, - 8, - 9, - 9, - 9, - 9, - 9, - 9, - 9, - 9, - 10, - 10, - 10, - 10, - 10, - 10, - 10, - 10, - 10 - ], - "coef": [ - 1.0020238985950929, - 0.20694026200922197 - ], - "coef_names": [ - "(Intercept)", - "x" - ], - "vcov_cr2": [ - 0.08616619395052143, - -0.07752403165758125, - -0.07752403165758119, - 0.19494058151826515 - ], - "vcov_shape": [ - 2, - 2 - ], - "dof_bm": [ - 5.408661566845492, - 5.965130041512546 - ] + "x": [0.3096159978304058, 0.8345736733172089, 0.6098155525978655, 0.7399769551120698, 0.6496841341722757, 0.9805952606257051, 0.5273389085195959, 0.4678991283290088, 0.2815611527767032, 0.2787613344844431, 0.9055589258205146, 0.8726157851051539, 0.1036925727967173, 0.03638615319505334, 0.2336558362003416, 0.2130224851425737, 0.4085248110350221, 0.2336621598806232, 0.7284043852705508, 0.654739553341642, 0.8465219340287149, 0.6553786264266819, 0.8273450890555978, 0.2404767158441246, 0.8276755115948617, 0.6109203656669706, 0.8021095618605614, 0.1160071832127869, 0.2334099512081593, 0.367526771267876, 0.1624952557031065, 0.04603013768792152, 0.7806035145185888, 0.2781105572357774, 0.3106493707746267, 0.2741349330171943, 0.3256947943009436, 0.4693535177502781, 0.7107607442885637, 0.8296667435206473, 0.7135459324344993, 0.3497182361315936, 0.6398582113906741, 0.5172503236681223, 0.1625131070613861, 0.7787523062434047], + "y": [0.950136668408636, 1.754592223394735, 1.122852938797655, 1.636705917889595, 1.33602946824798, 1.194648962092246, 1.148033096402285, 0.9885434625790169, 0.7050616623244204, 0.7449203796991308, 0.8522637796426105, 2.487209473904, 1.976909132715369, 2.347476489585664, 1.967524934768897, 2.602879791128856, 1.220252609394451, 1.332954036617961, 1.740259975846541, 1.115942116955388, 1.740710236232752, 1.506143920276894, 0.7486150649323104, 0.6946189692315179, 0.9177986725014496, 1.285459212606915, 1.339739549860439, 0.673901009685542, 0.4746185839128876, 0.7341158063604124, 0.7481991133308572, 0.3787909792265402, 0.8604465686455036, 1.011410595825842, 0.8174743408007981, 0.7755017956919983, 0.881726590825053, 1.302321464323421, 1.60501077060975, 1.948830729098463, 1.385133092418095, 1.418004152520744, 1.20172718727563, 1.009673937225518, 1.20690782633116, 1.588919597999956], + "cluster": [1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10], + "coef": [1.054441705396782, 0.3911987276515195], + "coef_names": ["(Intercept)", "x"], + "vcov_cr2": [0.1616023734490589, -0.1899859611041032, -0.1899859611041032, 0.238142463018284], + "vcov_shape": [2, 2], + "dof_bm": [4.423211980441538, 5.952656131218101], + "cluster_sizes": [1, 1, 2, 3, 4, 5, 6, 7, 8, 9] }, "meta": { - "source": "python_self_reference", - "note": "Regression anchor bootstrapped from diff_diff._compute_cr2_bm. Run benchmarks/R/generate_clubsandwich_golden.R to overwrite with authoritative clubSandwich values." + "source": "clubSandwich", + "clubSandwich_version": "0.7.0", + "R_version": "R version 4.5.2 (2025-10-31)", + "generated_at": "2026-05-16 10:20:19 UTC", + "note": "CR2 Bell-McCaffrey cluster-robust parity target for diff_diff._compute_cr2_bm" } -} \ No newline at end of file +} diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index 099b8059..4913903a 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -2540,7 +2540,7 @@ Shipped in `diff_diff/had_pretests.py` as `stute_joint_pretest()` (residuals-in - **Note (scope limitation on absorbed FE):** HC2 and HC2 + Bell-McCaffrey are rejected on any estimator that uses within-transformation (demeaning) for fixed effects: `TwoWayFixedEffects` unconditionally; `DifferenceInDifferences(absorb=..., vcov_type in {"hc2","hc2_bm"})`; `MultiPeriodDiD(absorb=..., vcov_type in {"hc2","hc2_bm"})`. FWL preserves coefficients and residuals under within-transformation but NOT the hat matrix: `h_ii = x_i' (X'X)^{-1} x_i` on the reduced design is not the diagonal of the full FE projection, and CR2's block adjustment `A_g = (I - H_gg)^{-1/2}` likewise depends on the full cluster-block hat matrix. Applying the reduced-design leverage would silently mis-state small-sample SEs/DOF, so the combinations raise `NotImplementedError` with a pointer to workarounds: use `vcov_type="hc1"` (HC1/CR1 have no leverage term and survive FWL), or switch to `fixed_effects=` dummies so the hat matrix is computed on the full design. Lifting the guard requires computing HC2/CR2-BM from the full absorbed projection and validating it against a full-dummy or `fixest`/`clubSandwich` reference. Tracked in `TODO.md` under Methodology/Correctness. - [x] Phase 1a: `vcov_type` enum threaded through `DifferenceInDifferences` (`MultiPeriodDiD`, `TwoWayFixedEffects` inherit); `robust=True` <=> `vcov_type="hc1"`, `robust=False` <=> `vcov_type="classical"`. Conflict detection at `__init__`. Results summary prints the variance-family label. - **Note (deviation from the fully-symmetric enum):** `MultiPeriodDiD(cluster=..., vcov_type="hc2_bm")` is intentionally **not supported** and raises `NotImplementedError`. The scalar-coefficient `DifferenceInDifferences` path handles the cluster + CR2 Bell-McCaffrey combination (`_compute_cr2_bm` returns a per-coefficient Satterthwaite DOF that is valid for the single-ATT contrast), but `MultiPeriodDiD` also reports a post-period-average ATT constructed as a *contrast* of the event-study coefficients. The cluster-aware CR2 BM DOF for that contrast (i.e., the Pustejovsky-Tipton 2018 per-cluster adjustment matrices applied to an arbitrary aggregation contrast) is not yet implemented. Pairing CR2 cluster-robust SEs with the one-way Imbens-Kolesar (2016) contrast DOF would be a broken hybrid, so the combination fails fast with a clear workaround message (drop the cluster for one-way HC2+BM, or use `vcov_type="hc1"` with cluster for CR1 Liang-Zeger). Tracked in `TODO.md` under Methodology/Correctness. Applies only to `MultiPeriodDiD`; `DifferenceInDifferences(cluster=..., vcov_type="hc2_bm")` works. -- [x] Phase 1a: `clubSandwich::vcovCR(..., type="CR2")` parity harness committed: R script at `benchmarks/R/generate_clubsandwich_golden.R` plus a regression-anchor JSON at `benchmarks/data/clubsandwich_cr2_golden.json`. **Note:** the committed JSON currently has `"source": "python_self_reference"` and pins numerical stability only; authoritative R-produced values are generated by running the R script, which the TODO.md row under Methodology/Correctness tracks. The parity test at `tests/test_linalg_hc2_bm.py::TestCR2BMCluster::test_cr2_parity_with_golden` runs at 1e-6 tolerance (Phase 1a plan commits 6-digit parity once R regen completes). +- [x] Phase 1a: `clubSandwich::vcovCR(..., type="CR2")` parity harness committed: R script at `benchmarks/R/generate_clubsandwich_golden.R` plus the authoritative R-generated JSON at `benchmarks/data/clubsandwich_cr2_golden.json` (`"source": "clubSandwich"`, with `clubSandwich_version`, `R_version`, and `generated_at` captured in `meta` for forensic traceability). The parity test at `tests/test_linalg_hc2_bm.py::TestCR2BMCluster::test_cr2_parity_with_golden` runs at 1e-6 tolerance and passes at ≤ 7.1e-15 across all three datasets — Python's `_compute_cr2_bm` matches clubSandwich at machine precision. - [x] Phase 1b: Calonico-Cattaneo-Farrell (2018) MSE-optimal bandwidth selector. In-house port of `nprobust::lpbwselect(bwselect="mse-dpi")` (nprobust 0.5.0, SHA `36e4e53`) as `diff_diff.mse_optimal_bandwidth` and `BandwidthResult`, backed by the private `diff_diff._nprobust_port` module (`kernel_W`, `lprobust_bw`, `lpbwselect_mse_dpi`). Three-stage DPI with four `lprobust.bw` calls at orders `q+1`, `q+2`, `q`, `p`. Python matches R to `0.0000%` relative error (i.e., bit-parity within float64 precision, ~8-13 digits agreement) on all five stage bandwidths (`c_bw`, `bw_mp2`, `bw_mp3`, `b_mse`, `h_mse`) across three deterministic DGPs (uniform, Beta(2,2), half-normal) via `benchmarks/R/generate_nprobust_golden.R` → `benchmarks/data/nprobust_mse_dpi_golden.json`. **Note:** `weights=` is currently unsupported (raises `NotImplementedError`); nprobust's `lpbwselect` has no weight argument so there is no parity anchor. Weighted-data support deferred to Phase 2 (survey-design adaptation). **Note (public API scope restriction):** the exported wrapper `mse_optimal_bandwidth` hard-codes the HAD Phase 1b configuration (`p=1`, `deriv=0`, `interior=False`, `vce="nn"`, `nnmatch=3`). The underlying port supports a broader surface (`hc0`/`hc1`/`hc2`/`hc3` variance, interior evaluation, higher `p`), but those paths are not parity-tested against `nprobust` and are deferred. Callers needing the broader surface should use `diff_diff._nprobust_port.lpbwselect_mse_dpi` directly and accept that parity has not been verified on non-HAD configurations. **Note (input contract):** the wrapper enforces HAD's support restriction `D_{g,2} >= 0` (front-door `ValueError` on negative doses and empty inputs). `boundary` must equal `0` (Design 1') or `float(d.min())` (Design 1 continuous-near-d_lower) within float tolerance; off-support values raise `ValueError`. When `boundary ~ 0`, the wrapper additionally requires `d.min() <= 0.05 * median(|d|)` as a Design 1' support plausibility heuristic, chosen to pass the paper's thin-boundary-density DGPs (Beta(2,2), d.min/median ~ 3%) while rejecting substantially off-support samples (U(0.5, 1.0), d.min/median ~ 1.0). Detected mass-point designs (`d.min() > 0` with modal fraction at `d.min() > 2%`) raise `NotImplementedError` pointing to the Phase 2 2SLS path per paper Section 3.2.4. - [x] Phase 1c: First-order bias estimator `M̂_{ĥ*_G}` and robust variance `V̂_{ĥ*_G}`. Implemented via Calonico-Cattaneo-Titiunik (2014) bias-combined design matrix `Q.q` in the in-house port `diff_diff._nprobust_port.lprobust` (single-eval-point path of `nprobust::lprobust`, npfunctions.R:177-246). - [x] Phase 1c: Bias-corrected CI (Equation 8) with `nprobust` parity. Public wrapper `diff_diff.bias_corrected_local_linear` returns `BiasCorrectedFit` with μ̂-scale point estimate, robust SE, and bias-corrected 95% CI `[tau.bc ± z_{1-α/2} * se.rb]`. The β-scale rescaling from Equation 8, `(1/G) Σ D_{g,2}`, is applied by Phase 2's `HeterogeneousAdoptionDiD.fit()`. Parity against `nprobust::lprobust(..., bwselect="mse-dpi")` is asserted at `atol=1e-12` on `tau_cl`/`tau_bc`/`se_cl`/`se_rb`/`ci_low`/`ci_high` across the three unclustered golden DGPs (DGP 1 and DGP 3 typically land closer to `1e-13`). The Python wrapper computes its own `z_{1-α/2}` via `scipy.stats.norm.ppf` inside `safe_inference()`; R's `qnorm` value is stored in the golden JSON for audit, and the parity harness compares Python's CI bounds to R's pre-computed CI bounds so any residual drift is purely the floating-point arithmetic in `tau.bc ± z * se.rb`, not a critical-value disagreement. The clustered DGP achieves bit-parity (`atol=1e-14`) when cluster IDs are in first-appearance order; otherwise BLAS reduction ordering can drift to `atol=1e-10`. Generator: `benchmarks/R/generate_nprobust_lprobust_golden.R`. **Note:** The wrapper matches nprobust's `rho=1` default (`b = h` in auto mode), so Phase 1b's separately-computed `b_mse` is surfaced via `bandwidth_diagnostics.b_mse` but not applied. **Note (public-API surface restriction):** Phase 1c restricts the public wrapper's `vce` parameter to `"nn"`; hc0/hc1/hc2/hc3 raise `NotImplementedError` and are queued for Phase 2+ pending dedicated R parity goldens. The port-level `diff_diff._nprobust_port.lprobust` still accepts all five vce modes (matching R's `nprobust::lprobust` signature) for callers who need the broader surface and accept that the hc-mode variance path — which reuses p-fit hat-matrix leverage for the q-fit residual in R (lprobust.R:229-241) — has not been separately parity-tested. **Note (Phase 1c internal bug workaround):** The clustered golden DGP 4 uses manual `h=b=0.3` to sidestep an nprobust-internal singleton-cluster shape bug in `lprobust.vce` fired by the mse-dpi pilot fits; the Python port has no equivalent bug. diff --git a/tests/test_linalg_hc2_bm.py b/tests/test_linalg_hc2_bm.py index 5a0f21e4..c9706931 100644 --- a/tests/test_linalg_hc2_bm.py +++ b/tests/test_linalg_hc2_bm.py @@ -524,11 +524,12 @@ def test_cr2_bm_singleton_clusters(self): def test_cr2_parity_with_golden(self): """Parity against benchmarks/data/clubsandwich_cr2_golden.json. - The golden values are authoritative once regenerated by - benchmarks/R/generate_clubsandwich_golden.R (clubSandwich source); - until then the JSON is a self-reference anchor that pins numerical - stability. Test tolerance is 1e-6, well within the 6-digit parity - target stated in the Phase 1a plan. + The JSON is the authoritative clubSandwich-generated fixture + (regenerated via benchmarks/R/generate_clubsandwich_golden.R; + `meta.source = "clubSandwich"`). Test tolerance is 1e-6, well + within the 6-digit parity target stated in the Phase 1a plan; + empirically Python matches clubSandwich at ≤ 7.1e-15 across all + three datasets. """ import json from pathlib import Path