Skip to content

[v4.0.1] Fix contrib-CTC structural reforms in US microsim path#300

Closed
MaxGhenis wants to merge 17 commits intomainfrom
v4-fix-contrib-ctc-structural-reforms
Closed

[v4.0.1] Fix contrib-CTC structural reforms in US microsim path#300
MaxGhenis wants to merge 17 commits intomainfrom
v4-fix-contrib-ctc-structural-reforms

Conversation

@MaxGhenis
Copy link
Copy Markdown
Contributor

Summary

Fixes a silent bug in the US microsim path that made dict-reforms involving gov.contrib.ctc.* parameters crash at .ensure() with:

AttributeError: 'NoneType' object has no attribute 'entity'

Concretely: Tara Watson's published CTC+EITC reform (policy 94589 on app.policyengine.org) — 42 parameters with three gov.contrib.ctc.*.in_effect = True gates — was completely unable to run through pe.us.economic_impact_analysis on the v4.0 stack.

Root cause

PolicyEngineUSLatest._build_simulation_from_dataset was instantiating entities against the module-level policyengine_us.system:

builder.populations = system.instantiate_entities()   # module system, no user reform
# ...
microsim.build_from_populations(builder.populations)

But Microsimulation.__init__ applies structural reforms (triggered by user-reform parameter gates like gov.contrib.ctc.minimum_refundable.in_effect=True) to its own tax_benefit_system — not the module one. Building populations against the module system then left reform-registered variables like ctc_minimum_refundable_amount invisible at calc time; refundable_ctc (which had been swapped to the reform version) would then crash trying to sum them.

Fix

Pass microsim.tax_benefit_system to _build_simulation_from_dataset instead of the module-level system. One-line change in src/policyengine/tax_benefit_models/us/model.py.

Evidence

On the v4.0 stack (pe-us 1.653.3 + us-data 1.73.0), running the full 42-param reform now produces:

Metric Baseline Reform Δ
Federal income tax $2,141.8B $2,053.8B −$87.9B
CTC $138.2B $236.3B +$98.0B
Refundable CTC $35.9B $215.7B +$179.8B
EITC $61.4B $38.9B −$22.6B

vs. pe-us 1.601.0 (March 2026): −$69.6B (the +$18B drift is upstream model evolution, not a pe.py regression).

Tests

New tests/test_us_microsim_structural_reforms.py — builds a tiny in-memory dataset, applies the minimum-refundable reform via dict, runs the sim end-to-end. Before the fix: crashes. After: passes.

72/72 existing tests still pass.

Version

Bumped to 4.0.1 (patch — bug fix, no API changes).

🤖 Generated with Claude Code

@MaxGhenis
Copy link
Copy Markdown
Contributor Author

Subagent review complete — all blockers addressed

Two reviewers ran on this PR: code-simplifier and reproducibility-reviewer. Key findings, all addressed in commit e03dc8a:

Reproducibility (blocking)

  • The original "module-level system stays pristine" invariant was brittle — would flip false if a future `policyengine_us` release ever shipped `ctc_minimum_refundable_amount` unconditionally. Replaced with a direct identity assertion via monkeypatch: the system passed into `_build_simulation_from_dataset` must `is` `microsim.tax_benefit_system`. That's the fix's actual contract; survives arbitrary upstream changes.

Reproducibility (follow-ups)

  • Parameter-change-shows-up-in-output test now covers BOTH CTC AND EITC variables, catching partial-application regressions (not only no-op ones).
  • Added one non-CTC gate (`gov.contrib.streamlined_eitc`) to the parametrised smoke test to prove the fix generalises beyond the CTC family.

Code-simplifier

  • `simple = lambda` (with `# noqa: E731`) → `def _simple`.
  • Fix's inline comment shortened from 6 lines to 4, kept the load-bearing "hide reform-registered variables" clause.

UK latent bug

Verification

  • 7/7 tests pass with fix.
  • Fix reverted → 3/7 tests fail with clean diagnostic errors (the invariant test now reports `system is not microsim.tax_benefit_system` instead of the raw `NoneType` traceback).

Ready for merge on CI green.

MaxGhenis and others added 16 commits April 20, 2026 09:12
Collapses the household-calculator journey into one obvious call:

    import policyengine as pe
    result = pe.us.calculate_household(
        people=[{"age": 35, "employment_income": 60000}],
        tax_unit={"filing_status": "SINGLE"},
        year=2026,
        reform={"gov.irs.deductions.standard.amount.SINGLE": 5000},
        extra_variables=["adjusted_gross_income"],
    )
    print(result.tax_unit.income_tax, result.tax_unit.adjusted_gross_income)

Design goal: a fresh coding session with no prior context and a 20-file
browse budget reaches a correct number in two tool calls — one to
`import policyengine as pe`, one for `pe.us.calculate_household(...)`.
The old surface forced an agent to pick among three entry points
(`calculate_household_impact`, `managed_microsimulation`, raw
`Simulation`), build a pydantic `Input` wrapper, construct a `Policy`
object with `ParameterValue`s, then dig into a `list[dict[str, Any]]`
to get the number. Every one of those layers is gone.

Changes:

- Populate `policyengine/__init__.py` (previously empty) with
  `us`, `uk`, and `Simulation` accessors.
- Add `tax_benefit_models/{us,uk}/household.py` with a kwargs-based
  `calculate_household` that builds a policyengine_us/uk Simulation
  with a situation dict and returns a dot-access HouseholdResult.
- Add `tax_benefit_models/common/` with:
    - `compile_reform(dict) -> core reform dict` (scalar or
      `{effective_date: value}` shapes)
    - `dispatch_extra_variables(names)` — flat list, library looks up
      each name's entity via `variables_by_name`
    - `EntityResult(dict)` with `__getattr__` for dot access +
      paste-able-fix AttributeError on unknown names
    - `HouseholdResult(dict)` with `.to_dict()` / `.write(path)`
- Add `utils/household_validation.py` that catches typo'd variable
  names in entity dicts with difflib close-match suggestions.
- Remove `USHouseholdInput`, `UKHouseholdInput`, `USHouseholdOutput`,
  `UKHouseholdOutput`, and `calculate_household_impact` from both
  country modules (v4 breaking).
- Each country __init__.py exposes `model` (the pinned
  `TaxBenefitModelVersion`) alongside the existing `us_latest` /
  `uk_latest` so agents can guess either name.
- Rewrite `tests/test_household_impact.py` (19 tests) around the new
  API: kwargs inputs, dot-access results, flat `extra_variables`,
  error messages with paste-able fixes, JSON serialization.
- Rewrite `tests/test_us_reform_application.py` around reform-dict
  inputs instead of `Policy(parameter_values=[...])`.
- Update `tests/fixtures/us_reform_fixtures.py` to store
  household fixtures as plain kwargs dicts that splat into
  `calculate_household(**fixture)`.

223 tests pass locally.

Downstream migration (policyengine-api-v2-alpha, the sole consumer of
the 3.x surface): replace `calculate_household_impact(input, policy=p)`
with `calculate_household(**input, reform=reform_dict)` — fixture
script grep of call sites suggests ~25 LOC touched. The migration
guide will show the before/after.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The review called out five ship-blockers. This commit fixes all five
plus the three footguns:

1. Entity-aware validation. Placing `filing_status` on `people`
   instead of `tax_unit` now raises with the correct entity and the
   exact kwarg-swap to make: `tax_unit={'filing_status': <value>}`.

2. Realistic docstring examples. Top-of-module examples in us/household.py
   and uk/household.py are now lone-parent-with-child cases that
   exercise every grouping decision (state_code on household,
   is_tax_unit_dependent on person, would_claim_child_benefit on
   benunit), not single-adult-no-state cases that hide them.

3. Reform-path validation. `compile_reform` now takes `model_version`
   and raises with a difflib close-match suggestion on unknown
   parameter paths, matching the validator quality on variable names.

4. Scalar reform default date. Scalar reform values previously
   defaulted to `date.today().isoformat()` — a caller running a
   year=2026 sim mid-2026 got a mid-year effective date and a blended
   result. Now defaults to `{year}-01-01` (passed through from
   calculate_household).

5. Unexpected-kwargs catcher. UK `calculate_household(tax_unit=...)`
   and US `calculate_household(benunit=...)` now raise a TypeError
   that names the correct country-specific kwarg. Other unexpected
   kwargs get a difflib close-match from the allowed set.

Also added:

- `people=[]` check with an explicit error before the calc blows up
  inside policyengine_us.
- Tests for all new error paths (`test__variable_on_wrong_entity`,
  `test__empty_people`, `test__unknown_reform_path`,
  `test__us_kwarg_on_uk`, `test__uk_kwarg_on_us`).

151 tests pass locally across the facade + reform + regression suites.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Separates release-manifest + TRACE TRO emission from the core value
layer. Consumers who only need Simulation / Policy / Variable /
Parameter no longer transitively import h5py through
scoping_strategy / constituency_impact / local_authority_impact.

File moves:
- core/release_manifest.py -> provenance/manifest.py
- core/trace_tro.py        -> provenance/trace.py

New provenance/__init__.py re-exports the public surface
(get_release_manifest, build_trace_tro_from_release_bundle,
serialize_trace_tro, canonical_json_bytes, etc.).

core/__init__.py drops the 20 provenance re-exports and keeps only
value objects (Dataset, Variable, Parameter*, Policy, Dynamic,
Simulation, Region, scoping strategies, TaxBenefitModel,
TaxBenefitModelVersion). Explicit core -> provenance import in
tax_benefit_model_version.py.

Lazy h5py:
- core/scoping_strategy.py: h5py no longer at top of module; imported
  inside WeightReplacementStrategy.apply() only.
- outputs/constituency_impact.py: same.
- outputs/local_authority_impact.py: same.

Internal callers migrated:
- tax_benefit_models/{us,uk}/model.py
- tax_benefit_models/{us,uk}/datasets.py
- countries/{us,uk}/regions.py
- cli.py
- results/trace_tro.py
- scripts/generate_trace_tros.py
- tests/test_{release_manifests,trace_tro,manifest_version_mismatch}.py
- docs/release-bundles.md

216 tests pass locally across the v4 surface. `from policyengine.core
import Simulation` + `from policyengine.provenance import
get_release_manifest` both work without h5py installed (verified by
temporarily uninstalling and retrying). The full `import policyengine
as pe` still pulls h5py because policyengine_us / policyengine_uk
import it eagerly (upstream); that's outside our control.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two byte-identical classes split only by British/American spelling
(program_name vs programme_name). Collapsed into a single
policyengine.outputs.ProgramStatistics; both country analysis helpers
import it from there now. Saves ~106 LOC of duplication and removes
an API-surface footgun for cross-country code.

Changes:

- Add policyengine/outputs/program_statistics.py with the unified class.
- Re-export from policyengine/outputs/__init__.py.
- Delete tax_benefit_models/us/outputs.py and
  tax_benefit_models/uk/outputs.py.
- us/__init__.py and uk/__init__.py re-export from policyengine.outputs.
- uk/analysis.py: rename programme_name -> program_name,
  programme_statistics -> program_statistics, programmes -> programs,
  programme_df/collection -> program_df/collection. Field on
  PolicyReformAnalysis also changes.

Migration for callers:
- from policyengine.tax_benefit_models.uk import ProgrammeStatistics
  -> from policyengine.outputs import ProgramStatistics
- stats.programme_name -> stats.program_name

205 tests pass locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pulls ~300 lines of shared init/save/load logic out of
PolicyEngineUSLatest and PolicyEngineUKLatest into a
MicrosimulationModelVersion base in tax_benefit_models.common.

The base handles:
- Release-manifest fetch + installed-version warning
- Data-release certification
- Variable/parameter population from the country system
- save() / load() + output-dataset filepath convention
- _build_entity_relationships via declared group_entities

Subclasses declare country_code, package_name, group_entities,
entity_variables, and implement four thin hooks (_load_system,
_load_region_registry, _dataset_class, _get_runtime_data_build_metadata).
run() intentionally stays per-country: the US applies reforms at
Microsimulation construction and manually copies structural columns,
while the UK wraps inputs as UKSingleYearDataset and applies reforms
after construction. Hiding those behind a shared skeleton would mask
real divergence.

Behaviour preservation is guarded by a byte-level snapshot test
(tests/test_base_extraction_snapshot.py) covering four US and four
UK household cases plus a model-surface snapshot. All 391 tests pass
with zero snapshot drift.
- README and core-concepts now lead with pe.uk/pe.us entry points and
  pe.uk.calculate_household / pe.us.calculate_household (flat kwargs,
  dot-access result, dict reforms).
- economic-impact-analysis, country-models-{uk,us}, and
  regions-and-scoping switched from `from policyengine.tax_benefit_models...`
  to the top-level facade.
- Removed the "Legacy filter fields" section from regions-and-scoping
  now that filter_field/filter_value have been dropped (v4 breaking).
- dev.md package-layout diagram updated to mention common/ base,
  provenance/ subpackage, and the MicrosimulationModelVersion extraction.
- examples/household_impact_example.py rewritten against the v4 API and
  verified end-to-end against both UK and US models.
Unifies the v4 reform surface: the same flat {"param.path": value} /
{"param.path": {date: value}} dict already accepted by
pe.{uk,us}.calculate_household(reform=...) now works on population
Simulation too. Dicts are compiled to Policy / Dynamic objects in a
model_validator(mode="after") using tax_benefit_model_version for
parameter-path validation and dataset.year for scalar effective-date
defaulting.

Adds compile_reform_to_policy / compile_reform_to_dynamic helpers
in tax_benefit_models.common.reform, tested directly in
tests/test_dict_reforms_on_simulation.py (6 tests covering scalar
defaulting, effective-date mappings, path validation, pass-through of
existing Policy objects, and the "no model_version" error path).

Unknown parameter paths raise with close-match suggestions (same
behaviour as the household calculator) so agents don't silently get a
no-op reform from a typo.

397/397 tests pass. End-to-end microsim with
Simulation(policy={"gov.irs.credits.ctc.amount.base[0].amount": 3000})
produces the same -$25.5B revenue impact as the manual
Policy+ParameterValue construction it replaces.
The fixture is already registered in conftest.py; pytest auto-injects
it by parameter name. Importing it explicitly triggered F811.
Bumps to 4.0.0 and addresses three reviewer passes (practitioner,
code-simplifier, end-to-end verification) before v4 ships:

Version / branding
- pyproject.toml: 3.6.0 -> 4.0.0
- release_manifests/{us,uk}.json: bundle_id and policyengine_version
  bumped to 4.0.0 so the bundle TRO URLs point at the right git tag
- test_release_manifests.py: assertion values updated

API ergonomics
- Simulation class now carries a full __doc__ with the canonical dict-
  reform call shape; help(pe.Simulation) used to return Pydantic boiler-
  plate, which hid the headline v4 feature from any agent that hits
  help() before reading source.
- RowFilterStrategy.variable_value: Union[str, int, float]. Numeric
  columns (state_fips, county_fips) are now scopable; "state_code"
  doesn't exist on enhanced_cps_2024 so docs directed users at a
  column that would crash.
- pe.__all__ now exports `outputs` so a fresh agent can tab-complete
  from pe. to the Aggregate/ChangeAggregate family without reading
  source.

Docs
- README: state_code_str -> state_code (consistent with us/household.py)
- core-concepts.md: "Reform as a dict" section leads, "Reform as a
  Policy object" relegated to the escape-hatch appendix
- economic-impact-analysis.md: both US and UK examples collapsed to
  single-line reform dicts (was 20 lines each of Parameter/
  ParameterValue boilerplate)
- country-models-{us,uk}.md: "Common policy reforms" sections rewritten
  as one-liners (lost ~130 lines of deprecated-ceremony boilerplate)
- regions-and-scoping.md: variable_name="state_code" (broken) ->
  variable_name="state_fips", variable_value=6
- reform.py module docstring: document the [N].amount / [N].threshold
  indexed-parameter convention so agents don't hit the bracket-head
  trap

Code simplification (simplifier review)
- model_version.py: except (ValueError, Exception) -> except Exception
- reform.py: compile_reform_to_policy / compile_reform_to_dynamic now
  share a private _compile_reform_to() helper (was 25 lines of
  copy-paste)
- simulation.py: _compile_dict_reforms loops over (field, compiler)
  pairs instead of branching twice by hand
- tests/test_base_extraction_snapshot.py renamed to
  test_household_calculator_snapshot.py (matches what it actually
  pins, not the refactor that motivated it); fixture dir follows

397 tests pass. ruff clean.
Simulation(policy={"gov.contrib.ctc.*": ...}) was crashing at
.ensure() with 'NoneType has no attribute entity' because
_build_simulation_from_dataset instantiated entities against the
module-level policyengine_us.system (no user reform applied) while
building the population from the per-sim Microsimulation whose
tax_benefit_system DID have the structural reform applied. Reform-
registered variables like ctc_minimum_refundable_amount were then
absent from the population's entity registry at calc time.

Fix: pass microsim.tax_benefit_system to _build_simulation_from_dataset
instead of the module-level system. The per-sim system already has
all structural reforms (gov.contrib.ctc.minimum_refundable,
per_child_phase_in, per_child_phase_out) applied by
Microsimulation.__init__'s post-user-reform structural pass.

Real-world impact: Tara Watson's CTC+EITC expansion (policy 94589,
42 parameters including three gov.contrib.ctc.* in_effect gates)
now runs end-to-end on the v4.0 stack, producing the full -$87.9B
federal income tax impact for 2025 instead of crashing.

Regression guarded by tests/test_us_microsim_structural_reforms.py
which builds a tiny in-memory dataset, applies the minimum-refundable
reform via policy dict, and asserts the sim runs to completion.

Version bumped to 4.0.1.
Adds three additional tests to guard against the same class of bug:

- Parametrized per-gate smoke test across all three gov.contrib.ctc
  structural-reform gates (minimum_refundable, per_child_phase_in,
  per_child_phase_out). Each activates cleanly through a dict reform.

- Invariant test that the module-level policyengine_us.system stays
  pristine — structural reforms must only mutate the per-sim system.
  If someone refactors _build_simulation_from_dataset and
  accidentally points it back at the module system, both ends of the
  invariant fire.

- "Reform parameter change reaches the output" test using a plain
  scalar parameter override (no structural-reform machinery), so
  regressions where ANY parameter-value reform silently becomes a
  no-op get caught, not just the contrib-CTC class.

Confirmed each test fails when the fix is reverted; 3/6 catch the
original bug (the two per_child_phase_* reforms only update existing
variables so build_from_populations doesn't trip on them — kept as
smoke tests for defense-in-depth).

403/403 tests pass with fix applied.
Code-simplifier:
- Inline-lambda -> def _simple (tests/test_us_microsim_structural_reforms.py
  L68) — drops the noqa: E731 smell.
- Tighten the fix's inline comment by ~3 lines, keep the load-bearing
  'hide reform-registered variables' clause.

Reproducibility-reviewer (blocker):
- Replace brittle "module-level system stays pristine" check with a
  direct identity assertion: the TaxBenefitSystem passed into
  _build_simulation_from_dataset must be microsim.tax_benefit_system.
  Uses monkeypatch to capture the argument; passes a positive
  assertion that the captured system has the structural-reform
  variable registered. Resilient to upstream policyengine_us ever
  shipping that variable unconditionally.

Reproducibility-reviewer (follow-ups):
- test__reform_parameter_change_is_reflected_in_output now asserts
  on BOTH CTC (base-amount change) AND EITC (phase_in_rate change)
  so partial-application regressions are caught, not only no-op ones.
- Added one non-CTC contrib gate (gov.contrib.streamlined_eitc) to
  the parametrised smoke test to prove the fix generalises beyond
  the CTC family.

Verified fail-without-fix: reverting the one-line fix breaks 3/7
tests (same set as before) with clean errors — the invariant test
now reports "system is not microsim.tax_benefit_system" instead of
the raw NoneType traceback, which is the actual contract.

7 tests, all pass.
@MaxGhenis MaxGhenis force-pushed the v4-fix-contrib-ctc-structural-reforms branch from e03dc8a to e5207ed Compare April 20, 2026 13:13
@MaxGhenis MaxGhenis changed the base branch from main to v4 April 20, 2026 13:13
@MaxGhenis MaxGhenis changed the base branch from v4 to main April 20, 2026 13:28
@MaxGhenis
Copy link
Copy Markdown
Contributor Author

Superseded by fresh rebase on main — see new PR.

@MaxGhenis MaxGhenis closed this Apr 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant