Thread vcov_type through SunAbraham (Phase 1b 1/8) #472
Overall Assessment
I cross-checked the PR against the Sun-Abraham registry entry and the in-code Sun & Abraham references. I did not find an untracked variance-formula or identification mismatch in the implemented analytical paths, and the HC1/fixest gap plus the Conley deferral are both documented and tracked. The blocking issue is narrower: the new
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
c227c35 to 269e904
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
The re-review resolves the two prior P1 test-coverage findings, but there is still one methodology-significant inference bug in the new
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
269e904 to d7bb9da
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Affected method: Sun-Abraham interaction-weighted event study. The PR correctly fixes the prior
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
d7bb9da to 5996d43
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Affected method: Sun-Abraham interaction-weighted event study. The paper’s load-bearing point is the contamination of staggered-TWFE lead/lag coefficients under heterogeneous treatment effects and the interaction-weighted alternative that avoids it; this PR keeps that estimand intact and changes only the variance/inference layer around it. (ideas.repec.org)
Executive Summary
Methodology
Code Quality
No findings.
Performance
No findings.
Maintainability
No findings.
Tech Debt
Security
No findings.
Documentation/Tests
Path to Approval
…1b 1/8)
Adds `vcov_type` parameter to `SunAbraham`, mirroring the DiD/MPD/TWFE
chain from Phase 1a. Defaults to "hc1" (preserves prior bit-equal
behavior - SA historically hard-coded HC1). First PR of Phase 1b, which
threads `vcov_type` through the 8 standalone estimators that expose
`cluster=` but not yet `vcov_type=`.
Methodology: when `vcov_type ∈ {classical, hc2, hc2_bm}`,
`_fit_saturated_regression` auto-routes to a full-dummy saturated design
(intercept + cohort × event-time interactions + unit dummies + time
dummies). FWL preserves cohort coefficients but not the hat matrix, so
HC2 leverage and Bell-McCaffrey DOF must be computed on the full FE
projection. Mirrors TWFE Gate 1 from PR #469. Empirically matches
`lm() + sandwich::vcovHC(type="HC2")` and
`clubSandwich::vcovCR(..., type="CR2") + coef_test()$df_Satt` at
atol=1e-10 (pinned in tests/test_methodology_sun_abraham.py).
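The FWL point above can be checked numerically. The following is a minimal sketch, not the library's code: it uses plain NumPy on a small synthetic balanced panel with one regressor (the panel sizes, variable names, and the helper functions `residualize` and `leverage` are all illustrative assumptions). It shows that the within-transformed regression reproduces the full-dummy coefficient, while its leverages omit the contribution of the fixed-effect columns.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_periods = 6, 4  # arbitrary illustrative panel size
unit = np.repeat(np.arange(n_units), n_periods)
time = np.tile(np.arange(n_periods), n_units)
n = unit.size

# One regressor of interest and an outcome; values are synthetic.
x = rng.normal(size=n)
y = 1.5 * x + 0.3 * unit + 0.2 * time + rng.normal(size=n)

# Fixed-effect block: intercept + unit dummies + time dummies
# (first level of each factor dropped to avoid collinearity).
U = (unit[:, None] == np.arange(1, n_units)).astype(float)
T = (time[:, None] == np.arange(1, n_periods)).astype(float)
D = np.column_stack([np.ones(n), U, T])
X_full = np.column_stack([x, D])  # full-dummy design


def residualize(v, D):
    """Remove from v its projection onto the columns of D (FWL step)."""
    coef, *_ = np.linalg.lstsq(D, v, rcond=None)
    return v - D @ coef


def leverage(X):
    """Diagonal of the hat matrix X (X'X)^{-1} X' via a thin QR."""
    Q, _ = np.linalg.qr(X)
    return np.sum(Q**2, axis=1)


x_w = residualize(x, D)
y_w = residualize(y, D)

beta_full = np.linalg.lstsq(X_full, y, rcond=None)[0][0]
beta_within = (x_w @ y_w) / (x_w @ x_w)

h_full = leverage(X_full)          # leverage HC2 needs
h_within = leverage(x_w[:, None])  # leverage seen after the within transform
h_fe = leverage(D)                 # part contributed by the FE columns

print("coefficients equal:", np.isclose(beta_full, beta_within))
print("leverages equal:   ", np.allclose(h_full, h_within))
print("h_full == h_within + h_fe:", np.allclose(h_full, h_within + h_fe))
```

Because the residualized regressor is orthogonal to the fixed-effect columns, the full hat matrix splits into the two projections, so any HC2-style `1 / (1 - h_ii)` rescaling computed from `h_within` alone would use leverages that are too small.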
Scope limits: replicate-weight survey + hc2/hc2_bm raises
NotImplementedError (per-replicate full-dummy refit not implemented).
`vcov_type="conley"` rejected at __init__ with a deferral message
(threading conley_* params is a follow-up). Auto-cluster-at-unit is
dropped when the user opts into explicit `vcov_type="hc2"` or
`"classical"` (both one-way only); preserved for `"hc1"` and
`"hc2_bm"`.
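The scope rules in this paragraph can be summarized as a small decision sketch. This is a hypothetical illustration, not the PR's implementation: the function names `validate_vcov_type` and `resolve_auto_cluster`, the exception types, and the message text are assumptions, and it only models the case where the user has not passed `cluster=` themselves.

```python
ANALYTICAL_VCOV_TYPES = ("classical", "hc1", "hc2", "hc2_bm")
ONE_WAY_ONLY = ("classical", "hc2")  # families described as one-way only


def validate_vcov_type(vcov_type):
    """Hypothetical constructor-time check for vcov_type."""
    if vcov_type == "conley":
        # Exception type and wording are assumptions; the commit only says
        # conley is rejected at __init__ with a deferral message.
        raise NotImplementedError(
            "vcov_type='conley' is deferred to a follow-up"
        )
    if vcov_type not in ANALYTICAL_VCOV_TYPES:
        raise ValueError(f"unknown vcov_type: {vcov_type!r}")
    return vcov_type


def resolve_auto_cluster(vcov_type, vcov_type_explicit, unit_col):
    """Return the auto-cluster column, or None when it is dropped.

    vcov_type_explicit is True when the user set vcov_type themselves
    rather than relying on the "hc1" default.
    """
    if vcov_type_explicit and vcov_type in ONE_WAY_ONLY:
        return None  # explicit hc2 / classical: no unit auto-cluster
    return unit_col  # hc1 and hc2_bm keep clustering at the unit


for vt, explicit in [("hc1", False), ("hc2", True), ("hc2_bm", True)]:
    print(vt, "->", resolve_auto_cluster(validate_vcov_type(vt), explicit, "unit"))
```

Keeping the explicit-versus-default flag separate from the value itself is what lets the default `"hc1"` path stay bit-equal to the previous behavior.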
Documented deviation from R: SA's within-transform HC1 SE differs from
`fixest::sunab()` by ~1-2% on typical panel sizes (different (n-k)
finite-sample correction). The IW aggregation is otherwise identical;
parity at atol=5e-3.
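To see how a different parameter count alone can move a standard error by a percent or two, consider a generic finite-sample factor proportional to `1 / (n - k)` in the variance. The sketch below is an arithmetic illustration only: the helper `se_ratio_from_dof` and the numbers passed to it are hypothetical, and it does not reproduce the exact small-sample adjustments of either implementation.

```python
import math


def se_ratio_from_dof(n_obs, k_small, k_large):
    """Ratio of two SEs that differ only in k within a 1/(n - k) variance factor.

    The variance scales with 1 / (n_obs - k), so the SE computed with the
    larger k exceeds the SE computed with the smaller k by
    sqrt((n_obs - k_small) / (n_obs - k_large)).
    """
    if not (0 < k_small <= k_large < n_obs):
        raise ValueError("require 0 < k_small <= k_large < n_obs")
    return math.sqrt((n_obs - k_small) / (n_obs - k_large))


# Hypothetical counts: 1000 observations, 20 within-transformed columns,
# and 50 parameters once some absorbed fixed effects are also counted.
ratio = se_ratio_from_dof(n_obs=1000, k_small=20, k_large=50)
print(f"SE ratio: {ratio:.4f}")
```

With these illustrative counts the ratio is between 1.01 and 1.02, the same order of magnitude as the documented gap; the actual difference depends on the panel and on which fixed effects each implementation counts.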
Test surface: 15 new behavioral tests in test_sun_abraham.py covering
default-vs-explicit bit-equality, all four vcov_type values
finite-and-distinct, auto-cluster drop/preserve, replicate-weight
reject, get_params/set_params, clone+repeat-fit idempotence, invalid
value rejection, cluster_var=None cascade through survey-PSU injection,
full-dummy vs within-transform HC2 divergence. 4 new R-parity tests in
test_methodology_sun_abraham.py against sandwich/clubSandwich/fixest
goldens.
New R golden scenario `sun_abraham_two_cohort` in
benchmarks/data/clubsandwich_cr2_golden.json (5 cohorts × 8 periods
panel; pins classical_se, hc2_se, cr2_bm_singleton_se+dof,
cr2_bm_unit_se+dof, sunab_hc1_event_study_e0_se).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5996d43 to 119db85
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
Affected method: Sun-Abraham interaction-weighted event study. The PR changes the variance/inference layer, not the IW estimand. On re-review, the previous P1 is resolved and I do not see any unmitigated P0/P1 issues in the changed code.
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Summary
- Thread `vcov_type ∈ {classical, hc1, hc2, hc2_bm}` through `SunAbraham`, mirroring the DiD/MPD/TWFE chain from Phase 1a (PR #469, "Lift Gate 1: HC2/HC2-BM for TwoWayFixedEffects via full-dummy auto-route", et al.). Defaults to `"hc1"`, which preserves prior behavior bit-equally (SA historically hard-coded HC1).
- When `vcov_type ∈ {classical, hc2, hc2_bm}`, `_fit_saturated_regression` auto-routes to a full-dummy saturated design (intercept + cohort × event-time interactions + unit dummies + time dummies). FWL preserves cohort coefficients but not the hat matrix, so HC2 leverage and Bell-McCaffrey Satterthwaite DOF require the full FE projection; classical also routes through full-dummy so the `(n-k)` finite-sample correction matches R's `lm()` interpretation. Same Part B surgery shape as TWFE Gate 1 (PR #469).
- `hc1` keeps the within-transform path (cluster-robust HC1 does not depend on the hat matrix; matches the `fixest::sunab(cluster=~unit)` convention).
- Auto-cluster-at-unit is dropped for the explicit one-way families (`hc2`, `classical`); preserved for `hc1` and `hc2_bm` (which routes to CR2-BM at unit).
- `SurveyDesign` (any kind: analytical weights / stratified / PSU / replicate-weight) combined with `vcov_type ∈ {classical, hc2, hc2_bm}` raises `NotImplementedError`: the survey TSL (or replicate-weight refit) variance overrides the analytical sandwich family, and the auto-cluster guard for one-way families would silently downgrade unit-level PSUs to per-observation PSUs. Use `vcov_type="hc1"` (default) for survey designs.
- `vcov_type="conley"` is rejected at `__init__` with a deferral message (a TODO row tracks the threading needed for `conley_*` params on the saturated regression call).
- `vcov_type` is propagated to `SunAbrahamResults.vcov_type` for downstream introspection.
Methodology references (required if estimator / math changes)
- SA's within-transform HC1 SE differs from `fixest::sunab()` by ~1-2% (~2e-3 absolute) due to a different `(n-k)` count: fixest counts absorbed FE in `k_total`; SA's `solve_ols` counts only within-transformed columns. The IW aggregation step is otherwise identical. Documented in the `docs/methodology/REGISTRY.md` SunAbraham section, pinned at `atol=5e-3` in `tests/test_methodology_sun_abraham.py`, and tracked in TODO.md for follow-up harmonization.
Validation
- `tests/test_sun_abraham.py`: 17 new behavioral tests in `TestSunAbrahamVcovType` (all `vcov_type` values finite-and-distinct, auto-cluster drop/preserve, replicate/survey rejects, `get_params`/`set_params` + `_vcov_type_explicit` refresh, clone+repeat-fit idempotence, invalid-value rejection, `vcov_type` propagated to `SunAbrahamResults`, `n_psu` + `df_survey` regression for survey path, full-dummy-vs-within-transform HC2 divergence).
- `tests/test_methodology_sun_abraham.py`: NEW file, 5 R-parity tests: classical / hc2 / hc2_bm cohort SE at `atol=1e-10` vs `lm()` + `sandwich` / `clubSandwich`, BM Satterthwaite DOF (singleton + cluster=unit) at `atol=1e-10`, HC1 event-study e=0 vs `fixest::sunab(cluster=~unit)` at `atol=5e-3` (documented deviation).
- `benchmarks/R/generate_clubsandwich_golden.R`: new `sun_abraham_two_cohort` scenario (5-cohort × 8-period balanced panel; saturated full-dummy `lm()` + `sandwich::vcovHC` + `clubSandwich::vcovCR` at unit + singleton-cluster + `fixest::sunab` event-study e=0). JSON golden regenerated.
- Smoke test of `solve_ols` full-dummy vs R `lm()` + `vcovHC` / `vcovCR` at `atol=1e-12` to `1e-15` before any source edit (per `feedback_r_source_smoke_test_before_implementing.md`).
- `/ai-review-local --backend codex` run until ✅ clean; only P3 informational items remain (HC1 finite-sample-correction deviation + Conley deferral, both tracked in TODO.md).
Security / privacy
Generated with Claude Code