Launch v4: agent-first surface, shared microsim base, dict reforms, refreshed docs by MaxGhenis · Pull Request #298 · PolicyEngine/policyengine.py

MaxGhenis · 2026-04-19T20:54:58Z

Summary

Consolidated v4 launch PR. Collapses the rest of the v4 stack (#291, #292, #293, #294, #295, #297) into a single merge to main after #288 and #290 landed.

What v4 ships

1. Agent-first household calculator (replaces ceremony)

# Before:
from policyengine.tax_benefit_models.us import USHouseholdInput, calculate_household_impact
calculate_household_impact(USHouseholdInput(people=[...], tax_unit={...}, year=2024))

# After:
import policyengine as pe
pe.us.calculate_household(people=[...], tax_unit={...}, year=2026)

- pe.uk.calculate_household / pe.us.calculate_household on the top-level package
- Flat kwargs; HouseholdResult(dict) with __getattr__ dot-access
- Entity-aware input validation (misplaced-variable detection, close-match suggestions, **kwargs catcher for cross-country confusion)
- Reform dicts: {"param.path": value} or {"param.path": {date: value}}, compiled internally
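
The dict-with-dot-access result shape can be illustrated in a few lines. This is a minimal sketch, not the library's implementation: the class name `DotDict`, the error wording, and the sample value are all hypothetical.

```python
from typing import Any


class DotDict(dict):
    """A dict whose keys can also be read as attributes (hypothetical name)."""

    def __getattr__(self, name: str) -> Any:
        # __getattr__ only runs after normal attribute lookup fails,
        # so dict methods such as .keys() keep working.
        try:
            value = self[name]
        except KeyError:
            raise AttributeError(
                f"No result named {name!r}; available: {sorted(self)}"
            ) from None
        # Wrap nested dicts so chained access (result.tax_unit.income_tax) works.
        return DotDict(value) if isinstance(value, dict) else value


# Made-up number, for illustration only.
result = DotDict({"tax_unit": {"income_tax": 1234.5}})
print(result.tax_unit.income_tax)
```

Because the object is still a plain `dict`, it serializes with `json.dumps` and supports `result["tax_unit"]["income_tax"]` alongside attribute access.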

2. Dict reforms on `Simulation` too (#297, net-new)

The same flat dict is now accepted by population microsim:

Simulation(
    dataset=dataset,
    tax_benefit_model_version=pe.us.model,
    policy={"gov.irs.credits.ctc.amount.base[0].amount": 3_000},
)

No more hand-built Parameter/ParameterValue objects. Path validation and effective-date defaulting use tax_benefit_model_version + dataset.year.
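
The two behaviours named above, path validation and effective-date defaulting, can be sketched independently of the library. The helper name `normalize_reform` and the `known_paths` argument are hypothetical; the `{year}-01-01` default and the close-match suggestion follow the behaviour this PR describes for scalar values and unknown paths.

```python
import difflib
from typing import Any, Dict, Iterable


def normalize_reform(
    reform: Dict[str, Any], known_paths: Iterable[str], year: int
) -> Dict[str, Dict[str, Any]]:
    """Expand {"path": value} into {"path": {"YYYY-01-01": value}} (hypothetical helper)."""
    known = list(known_paths)
    compiled: Dict[str, Dict[str, Any]] = {}
    for path, value in reform.items():
        if path not in known:
            # Suggest the nearest valid path so a typo is not a silent no-op.
            close = difflib.get_close_matches(path, known, n=1)
            hint = f" Did you mean {close[0]!r}?" if close else ""
            raise ValueError(f"Unknown parameter path {path!r}.{hint}")
        if isinstance(value, dict):
            # Already {effective_date: value}; normalize the keys to strings.
            compiled[path] = {str(date): v for date, v in value.items()}
        else:
            # Scalar: take effect on 1 January of the simulation year.
            compiled[path] = {f"{year}-01-01": value}
    return compiled


paths = ["gov.irs.credits.ctc.amount.base[0].amount"]
print(normalize_reform({paths[0]: 3_000}, paths, year=2026))
```

Anchoring scalars to the start of the simulation year avoids a mid-year effective date, which would blend baseline and reform values within a single year.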

3. Provenance split out of core (+ lazy h5py)

- policyengine.provenance.{manifest,trace} replaces policyengine.core.release_manifest / policyengine.core.trace_tro
- h5py now imports lazily inside WeightReplacementStrategy.apply(); a bare `import policyengine` no longer pulls h5py
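
The lazy-import pattern itself is small. The sketch below uses the stdlib module `wave` as a stand-in for h5py so that it runs anywhere; `LazyWeightStrategy` is a hypothetical class, not the library's `WeightReplacementStrategy`.

```python
import importlib
import sys

# Stand-in for a heavy optional dependency such as h5py (assumption: any
# importable module illustrates the pattern; "wave" is a small stdlib one).
HEAVY_MODULE = "wave"


class LazyWeightStrategy:
    """Hypothetical strategy whose heavy dependency is needed only in apply()."""

    def apply(self) -> str:
        # The import runs on the first call, not when this module is imported.
        heavy = importlib.import_module(HEAVY_MODULE)
        return heavy.__name__


strategy = LazyWeightStrategy()  # constructing the strategy imports nothing heavy
loaded_name = strategy.apply()   # the import happens here
print(loaded_name, HEAVY_MODULE in sys.modules)
```

A deferred import only helps if nothing else on the import path loads the same dependency eagerly; any other package that imports it at module level brings it in regardless.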

4. Shared `MicrosimulationModelVersion` base

~300 lines of duplicated __init__ / save / load / variable+parameter loading extracted from PolicyEngineUSLatest / PolicyEngineUKLatest to tax_benefit_models.common.model_version. Country subclasses declare class-level metadata (country_code, package_name, group_entities, entity_variables) and implement four thin hooks. run() stays per-country — subtle divergences in reform application and output post-processing make a shared skeleton actively misleading.
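
The shape of that extraction can be sketched as a base class with class-level metadata and abstract hooks. Everything here is illustrative: `ModelVersionBase`, `USModel`, the single `_load_system` hook body, and the entity list are simplified assumptions, not the real classes.

```python
from abc import ABC, abstractmethod
from typing import ClassVar, Dict, List


class ModelVersionBase(ABC):
    """Hypothetical base: shared logic lives here, country specifics in subclasses."""

    country_code: ClassVar[str]
    group_entities: ClassVar[List[str]]

    def __init__(self) -> None:
        # Shared initialisation delegates the country-specific part to a hook.
        self.system = self._load_system()

    @abstractmethod
    def _load_system(self) -> Dict[str, str]:
        """Return the country's system (a plain dict in this sketch)."""

    @abstractmethod
    def run(self, data: List[float]) -> List[float]:
        """Deliberately left abstract: each country keeps its own run()."""

    def describe(self) -> str:
        return f"{self.country_code}: {', '.join(self.group_entities)}"


class USModel(ModelVersionBase):
    country_code = "us"
    group_entities = ["tax_unit", "household"]  # illustrative subset

    def _load_system(self) -> Dict[str, str]:
        return {"name": "us-system"}

    def run(self, data: List[float]) -> List[float]:
        return [x * 2 for x in data]  # placeholder computation


print(USModel().describe())
```

Keeping `run()` abstract rather than providing a shared skeleton mirrors the decision described above: the base owns only the logic that is genuinely identical across countries.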

Guarded by byte-level snapshot tests (tests/test_base_extraction_snapshot.py) freezing household outputs for 4 US + 4 UK cases and model-surface aggregates. Zero drift post-refactor.
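
A byte-level snapshot check of this kind can be sketched as follows. The helper names and the sample payloads are hypothetical, and the real test file may serialize and store its snapshots differently.

```python
import json
import tempfile
from pathlib import Path


def canonical_bytes(payload: dict) -> bytes:
    # Sorted keys and fixed separators give a stable byte representation.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def check_snapshot(payload: dict, snapshot_path: Path) -> bool:
    """Return True if payload matches the stored bytes; create the snapshot if absent."""
    current = canonical_bytes(payload)
    if not snapshot_path.exists():
        snapshot_path.write_bytes(current)
        return True
    return snapshot_path.read_bytes() == current


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "household.json"
    # Made-up values, for illustration only.
    first = check_snapshot({"income_tax": 100.0, "age": 35}, path)   # creates the file
    same = check_snapshot({"age": 35, "income_tax": 100.0}, path)    # key order is irrelevant
    drift = check_snapshot({"income_tax": 100.01, "age": 35}, path)  # value changed

print(first, same, drift)
```

Comparing bytes rather than values within a tolerance means even a tiny numerical change fails the test, which is the point when the goal is to prove a refactor changed nothing.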

5. Unified `ProgramStatistics`

One class replaces split ProgramStatistics / ProgrammeStatistics.

6. Docs refreshed for the agent-first surface

README, core-concepts, economic-impact-analysis, country-models-{uk,us}, regions-and-scoping, dev, and examples all lead with pe.uk.* / pe.us.*. Removed "Legacy filter fields" doc section (dropped in #290). examples/household_impact_example.py rewritten against the v4 API and verified end-to-end.

Breaking changes

- policyengine.tax_benefit_models.{uk,us}.calculate_household_impact / USHouseholdInput / UKHouseholdInput removed (use pe.{uk,us}.calculate_household)
- Simulation.filter_field / Simulation.filter_value removed in #290 (use scoping_strategy=RowFilterStrategy(...))
- policyengine.core.release_manifest / policyengine.core.trace_tro moved to policyengine.provenance.{manifest,trace}
- plotly moved to [plotting] optional extra
- Reform dict is the only supported reform surface for calculate_household; reform JSON sidecars removed

Replacements for the v4 stack

Closes / supersedes: #291, #292, #293, #294, #295, #297 (all CI-green; rebased into this single consolidated branch).

Test plan

- 397/397 tests pass locally (Python 3.12)
- Byte-level snapshot test guards household + model-surface outputs across US/UK
- End-to-end microsim with dict reform produces expected -$25.5B revenue impact for CTC-base $3,000 reform on enhanced_cps_2024
- examples/household_impact_example.py runs end-to-end (UK + US)

🤖 Generated with Claude Code

MaxGhenis · 2026-04-19T21:25:34Z

Review fixes pushed (commit `76ea9a0`)

Three reviewer passes (practitioner, code-simplifier, end-to-end verification) ran on the consolidated v4 branch. All pre-launch blockers addressed:

API ergonomics

- Simulation class now has a full __doc__ showing the canonical dict-reform call shape; help(pe.Simulation) was returning Pydantic boilerplate, hiding the headline v4 feature
- RowFilterStrategy.variable_value is now Union[str, int, float]: numeric columns (state_fips, county_fips) were silently unscopable, and the docs example for CA used state_code, which doesn't exist on the shipped dataset
- pe.__all__ now exports outputs for tab-completion discoverability

Version / branding

- pyproject.toml: 3.6.0 → 4.0.0
- release_manifests/{us,uk}.json: bundle_id + policyengine_version bumped so bundle TRO URLs point at the right git tag

Docs

- README, core-concepts, economic-impact-analysis, country-models-{us,uk}, regions-and-scoping all updated to lead with dict reforms. Lost ~130 lines of deprecated Parameter/ParameterValue ceremony.
- state_code_str → state_code (consistent with us/household.py)
- reform.py docstring documents the [N].amount / [N].threshold indexed-parameter convention so agents don't hit the bracket-head trap

Code simplification (simplifier review)

- except (ValueError, Exception) → except Exception
- compile_reform_to_policy / compile_reform_to_dynamic now share a private helper (was 25-line copy-paste)
- Simulation._compile_dict_reforms loops over (field, compiler) pairs
- test_base_extraction_snapshot.py → test_household_calculator_snapshot.py (matches what it actually pins, not the refactor that motivated it)
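
The shared-helper and loop-over-pairs simplifications in the list above follow a common pattern, sketched here under stated assumptions. All function names and the returned dict shape are hypothetical; the real compilers return Policy / Dynamic objects rather than dicts.

```python
from functools import partial
from typing import Any, Callable, Dict


def _compile_reform_to(kind: str, reform: Dict[str, Any]) -> Dict[str, Any]:
    """Shared private helper; `kind` is the only thing that differs between callers."""
    return {"kind": kind, "values": dict(reform)}


# Two public entry points that are thin wrappers over the shared helper.
compile_to_policy = partial(_compile_reform_to, "policy")
compile_to_dynamic = partial(_compile_reform_to, "dynamic")


def compile_dict_fields(fields: Dict[str, Any]) -> Dict[str, Any]:
    """Compile any dict-valued reform field, looping over (field, compiler) pairs."""
    compilers: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = {
        "policy": compile_to_policy,
        "dynamic": compile_to_dynamic,
    }
    out = dict(fields)
    for field, compiler in compilers.items():
        value = out.get(field)
        if isinstance(value, dict):  # None and non-dict values pass through untouched
            out[field] = compiler(value)
    return out


print(compile_dict_fields({"policy": {"some.param": 1}, "dynamic": None}))
```

Adding a third reform-like field then means adding one entry to the mapping rather than a third hand-written branch.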

Remaining as v4.1 follow-ups (not blocking):

- Lazy-h5py claim is inaccurate: a bare `import policyengine` still pulls h5py via policyengine_core and policyengine_uk.data.dataset_schema. Internal lazy imports work; upstream transitives don't. Would need pe.us.__getattr__ / pe.uk.__getattr__ gating.
- Some tax_benefit_models.common.model_version.py docstrings are multi-paragraph where a line would do (cosmetic only)

397 tests pass. ruff + ruff format clean.

Collapses the household-calculator journey into one obvious call:

    import policyengine as pe

    result = pe.us.calculate_household(
        people=[{"age": 35, "employment_income": 60000}],
        tax_unit={"filing_status": "SINGLE"},
        year=2026,
        reform={"gov.irs.deductions.standard.amount.SINGLE": 5000},
        extra_variables=["adjusted_gross_income"],
    )
    print(result.tax_unit.income_tax, result.tax_unit.adjusted_gross_income)

Design goal: a fresh coding session with no prior context and a 20-file browse budget reaches a correct number in two tool calls: one to `import policyengine as pe`, one for `pe.us.calculate_household(...)`.

The old surface forced an agent to pick among three entry points (`calculate_household_impact`, `managed_microsimulation`, raw `Simulation`), build a pydantic `Input` wrapper, construct a `Policy` object with `ParameterValue`s, then dig into a `list[dict[str, Any]]` to get the number. Every one of those layers is gone.

Changes:

- Populate `policyengine/__init__.py` (previously empty) with `us`, `uk`, and `Simulation` accessors.
- Add `tax_benefit_models/{us,uk}/household.py` with a kwargs-based `calculate_household` that builds a policyengine_us/uk Simulation with a situation dict and returns a dot-access HouseholdResult.
- Add `tax_benefit_models/common/` with:
  - `compile_reform(dict) -> core reform dict` (scalar or `{effective_date: value}` shapes)
  - `dispatch_extra_variables(names)`: flat list, library looks up each name's entity via `variables_by_name`
  - `EntityResult(dict)` with `__getattr__` for dot access + paste-able-fix AttributeError on unknown names
  - `HouseholdResult(dict)` with `.to_dict()` / `.write(path)`
- Add `utils/household_validation.py` that catches typo'd variable names in entity dicts with difflib close-match suggestions.
- Remove `USHouseholdInput`, `UKHouseholdInput`, `USHouseholdOutput`, `UKHouseholdOutput`, and `calculate_household_impact` from both country modules (v4 breaking).
- Each country __init__.py exposes `model` (the pinned `TaxBenefitModelVersion`) alongside the existing `us_latest` / `uk_latest` so agents can guess either name.
- Rewrite `tests/test_household_impact.py` (19 tests) around the new API: kwargs inputs, dot-access results, flat `extra_variables`, error messages with paste-able fixes, JSON serialization.
- Rewrite `tests/test_us_reform_application.py` around reform-dict inputs instead of `Policy(parameter_values=[...])`.
- Update `tests/fixtures/us_reform_fixtures.py` to store household fixtures as plain kwargs dicts that splat into `calculate_household(**fixture)`.

223 tests pass locally.

Downstream migration (policyengine-api-v2-alpha, the sole consumer of the 3.x surface): replace `calculate_household_impact(input, policy=p)` with `calculate_household(**input, reform=reform_dict)`; fixture script grep of call sites suggests ~25 LOC touched. The migration guide will show the before/after.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
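
The flat `extra_variables` idea, where the caller passes names and the library works out which entity each belongs to, can be sketched like this. The function name `group_by_entity` and the lookup table are hypothetical stand-ins for the library's `dispatch_extra_variables` and `variables_by_name`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_entity(
    names: Iterable[str], entity_of: Dict[str, str]
) -> Dict[str, List[str]]:
    """Group a flat list of variable names by the entity each one is defined on."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        if name not in entity_of:
            raise ValueError(f"Unknown variable {name!r}")
        grouped[entity_of[name]].append(name)
    return dict(grouped)


# Hypothetical lookup table: variable name -> entity.
entity_of = {"adjusted_gross_income": "tax_unit", "age": "person"}
print(group_by_entity(["adjusted_gross_income", "age"], entity_of))
```

The caller never has to know the entity of a variable up front, which removes one of the lookups an agent would otherwise need before making the call.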

The review called out five ship-blockers. This commit fixes all five plus the three footguns:

1. Entity-aware validation. Placing `filing_status` on `people` instead of `tax_unit` now raises with the correct entity and the exact kwarg-swap to make: `tax_unit={'filing_status': <value>}`.
2. Realistic docstring examples. Top-of-module examples in us/household.py and uk/household.py are now lone-parent-with-child cases that exercise every grouping decision (state_code on household, is_tax_unit_dependent on person, would_claim_child_benefit on benunit), not single-adult-no-state cases that hide them.
3. Reform-path validation. `compile_reform` now takes `model_version` and raises with a difflib close-match suggestion on unknown parameter paths, matching the validator quality on variable names.
4. Scalar reform default date. Scalar reform values previously defaulted to `date.today().isoformat()`, so a caller running a year=2026 sim mid-2026 got a mid-year effective date and a blended result. Now defaults to `{year}-01-01` (passed through from calculate_household).
5. Unexpected-kwargs catcher. UK `calculate_household(tax_unit=...)` and US `calculate_household(benunit=...)` now raise a TypeError that names the correct country-specific kwarg. Other unexpected kwargs get a difflib close-match from the allowed set.

Also added:

- `people=[]` check with an explicit error before the calc blows up inside policyengine_us.
- Tests for all new error paths (`test__variable_on_wrong_entity`, `test__empty_people`, `test__unknown_reform_path`, `test__us_kwarg_on_uk`, `test__uk_kwarg_on_us`).

151 tests pass locally across the facade + reform + regression suites.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
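
Entity-aware validation with a paste-able fix can be sketched as below. The function name, the lookup table, and the exact error wording are hypothetical; only the behaviour (naming the correct entity, showing the kwarg to use, and suggesting close matches for typos) follows the commit message above.

```python
import difflib
from typing import Any, Dict


def validate_entity_inputs(
    entity: str, values: Dict[str, Any], entity_of: Dict[str, str]
) -> None:
    """Raise ValueError if a variable is unknown or placed on the wrong entity."""
    for name, value in values.items():
        if name not in entity_of:
            close = difflib.get_close_matches(name, list(entity_of), n=1)
            hint = f" Did you mean {close[0]!r}?" if close else ""
            raise ValueError(f"Unknown variable {name!r} on {entity}.{hint}")
        owner = entity_of[name]
        if owner != entity:
            # Spell out the exact kwarg the caller should pass instead.
            raise ValueError(
                f"{name!r} belongs on {owner}, not {entity}. "
                f"Pass {owner}={{{name!r}: {value!r}}} instead."
            )


# Hypothetical lookup table: variable name -> entity.
entity_of = {"age": "person", "employment_income": "person", "filing_status": "tax_unit"}

try:
    validate_entity_inputs("person", {"filing_status": "SINGLE"}, entity_of)
except ValueError as exc:
    print(exc)
```

An error message that contains the corrected call lets a caller, human or agent, fix the input without consulting the docs.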

Separates release-manifest + TRACE TRO emission from the core value layer. Consumers who only need Simulation / Policy / Variable / Parameter no longer transitively import h5py through scoping_strategy / constituency_impact / local_authority_impact.

File moves:

- core/release_manifest.py -> provenance/manifest.py
- core/trace_tro.py -> provenance/trace.py

New provenance/__init__.py re-exports the public surface (get_release_manifest, build_trace_tro_from_release_bundle, serialize_trace_tro, canonical_json_bytes, etc.). core/__init__.py drops the 20 provenance re-exports and keeps only value objects (Dataset, Variable, Parameter*, Policy, Dynamic, Simulation, Region, scoping strategies, TaxBenefitModel, TaxBenefitModelVersion). Explicit core -> provenance import in tax_benefit_model_version.py.

Lazy h5py:

- core/scoping_strategy.py: h5py no longer at top of module; imported inside WeightReplacementStrategy.apply() only.
- outputs/constituency_impact.py: same.
- outputs/local_authority_impact.py: same.

Internal callers migrated:

- tax_benefit_models/{us,uk}/model.py
- tax_benefit_models/{us,uk}/datasets.py
- countries/{us,uk}/regions.py
- cli.py
- results/trace_tro.py
- scripts/generate_trace_tros.py
- tests/test_{release_manifests,trace_tro,manifest_version_mismatch}.py
- docs/release-bundles.md

216 tests pass locally across the v4 surface. `from policyengine.core import Simulation` + `from policyengine.provenance import get_release_manifest` both work without h5py installed (verified by temporarily uninstalling and retrying). The full `import policyengine as pe` still pulls h5py because policyengine_us / policyengine_uk import it eagerly (upstream); that's outside our control.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two byte-identical classes split only by British/American spelling (program_name vs programme_name). Collapsed into a single policyengine.outputs.ProgramStatistics; both country analysis helpers import it from there now. Saves ~106 LOC of duplication and removes an API-surface footgun for cross-country code.

Changes:

- Add policyengine/outputs/program_statistics.py with the unified class.
- Re-export from policyengine/outputs/__init__.py.
- Delete tax_benefit_models/us/outputs.py and tax_benefit_models/uk/outputs.py.
- us/__init__.py and uk/__init__.py re-export from policyengine.outputs.
- uk/analysis.py: rename programme_name -> program_name, programme_statistics -> program_statistics, programmes -> programs, programme_df/collection -> program_df/collection. Field on PolicyReformAnalysis also changes.

Migration for callers:

- from policyengine.tax_benefit_models.uk import ProgrammeStatistics -> from policyengine.outputs import ProgramStatistics
- stats.programme_name -> stats.program_name

205 tests pass locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Pulls ~300 lines of shared init/save/load logic out of PolicyEngineUSLatest and PolicyEngineUKLatest into a MicrosimulationModelVersion base in tax_benefit_models.common. The base handles:

- Release-manifest fetch + installed-version warning
- Data-release certification
- Variable/parameter population from the country system
- save() / load() + output-dataset filepath convention
- _build_entity_relationships via declared group_entities

Subclasses declare country_code, package_name, group_entities, entity_variables, and implement four thin hooks (_load_system, _load_region_registry, _dataset_class, _get_runtime_data_build_metadata).

run() intentionally stays per-country: the US applies reforms at Microsimulation construction and manually copies structural columns, while the UK wraps inputs as UKSingleYearDataset and applies reforms after construction. Hiding those behind a shared skeleton would mask real divergence.

Behaviour preservation is guarded by a byte-level snapshot test (tests/test_base_extraction_snapshot.py) covering four US and four UK household cases plus a model-surface snapshot. All 391 tests pass with zero snapshot drift.

- README and core-concepts now lead with pe.uk/pe.us entry points and pe.uk.calculate_household / pe.us.calculate_household (flat kwargs, dot-access result, dict reforms).
- economic-impact-analysis, country-models-{uk,us}, and regions-and-scoping switched from `from policyengine.tax_benefit_models...` to the top-level facade.
- Removed the "Legacy filter fields" section from regions-and-scoping now that filter_field/filter_value have been dropped (v4 breaking).
- dev.md package-layout diagram updated to mention common/ base, provenance/ subpackage, and the MicrosimulationModelVersion extraction.
- examples/household_impact_example.py rewritten against the v4 API and verified end-to-end against both UK and US models.

Unifies the v4 reform surface: the same flat {"param.path": value} / {"param.path": {date: value}} dict already accepted by pe.{uk,us}.calculate_household(reform=...) now works on population Simulation too. Dicts are compiled to Policy / Dynamic objects in a model_validator(mode="after") using tax_benefit_model_version for parameter-path validation and dataset.year for scalar effective-date defaulting.

Adds compile_reform_to_policy / compile_reform_to_dynamic helpers in tax_benefit_models.common.reform, tested directly in tests/test_dict_reforms_on_simulation.py (6 tests covering scalar defaulting, effective-date mappings, path validation, pass-through of existing Policy objects, and the "no model_version" error path). Unknown parameter paths raise with close-match suggestions (same behaviour as the household calculator) so agents don't silently get a no-op reform from a typo.

397/397 tests pass. End-to-end microsim with Simulation(policy={"gov.irs.credits.ctc.amount.base[0].amount": 3000}) produces the same -$25.5B revenue impact as the manual Policy+ParameterValue construction it replaces.

The us_test_dataset fixture is already registered in conftest.py; pytest auto-injects it by parameter name. Importing it explicitly triggered F811.

Bumps to 4.0.0 and addresses three reviewer passes (practitioner, code-simplifier, end-to-end verification) before v4 ships:

Version / branding

- pyproject.toml: 3.6.0 -> 4.0.0
- release_manifests/{us,uk}.json: bundle_id and policyengine_version bumped to 4.0.0 so the bundle TRO URLs point at the right git tag
- test_release_manifests.py: assertion values updated

API ergonomics

- Simulation class now carries a full __doc__ with the canonical dict-reform call shape; help(pe.Simulation) used to return Pydantic boilerplate, which hid the headline v4 feature from any agent that hits help() before reading source.
- RowFilterStrategy.variable_value: Union[str, int, float]. Numeric columns (state_fips, county_fips) are now scopable; "state_code" doesn't exist on enhanced_cps_2024 so docs directed users at a column that would crash.
- pe.__all__ now exports `outputs` so a fresh agent can tab-complete from pe. to the Aggregate/ChangeAggregate family without reading source.

Docs

- README: state_code_str -> state_code (consistent with us/household.py)
- core-concepts.md: "Reform as a dict" section leads, "Reform as a Policy object" relegated to the escape-hatch appendix
- economic-impact-analysis.md: both US and UK examples collapsed to single-line reform dicts (was 20 lines each of Parameter/ParameterValue boilerplate)
- country-models-{us,uk}.md: "Common policy reforms" sections rewritten as one-liners (lost ~130 lines of deprecated-ceremony boilerplate)
- regions-and-scoping.md: variable_name="state_code" (broken) -> variable_name="state_fips", variable_value=6
- reform.py module docstring: document the [N].amount / [N].threshold indexed-parameter convention so agents don't hit the bracket-head trap

Code simplification (simplifier review)

- model_version.py: except (ValueError, Exception) -> except Exception
- reform.py: compile_reform_to_policy / compile_reform_to_dynamic now share a private _compile_reform_to() helper (was 25 lines of copy-paste)
- simulation.py: _compile_dict_reforms loops over (field, compiler) pairs instead of branching twice by hand
- tests/test_base_extraction_snapshot.py renamed to test_household_calculator_snapshot.py (matches what it actually pins, not the refactor that motivated it); fixture dir follows

397 tests pass. ruff clean.

MaxGhenis and others added 12 commits April 20, 2026 09:12

Snapshot test: freeze US/UK household+model outputs pre-base-extraction

3ce219b

Add changelog fragment for MicrosimulationModelVersion extraction

8366d58

Drop quotes from inline Simulation type annotations (ruff UP037)

101d6c0

ruff format + drop redundant us_test_dataset import

1865c4c

The fixture is already registered in conftest.py; pytest auto-injects it by parameter name. Importing it explicitly triggered F811.

MaxGhenis force-pushed the v4 branch from 76ea9a0 to aa0c3b9 Compare April 20, 2026 13:13

MaxGhenis mentioned this pull request Apr 20, 2026

Migrate docs from MyST to Quarto + prototype auto-generated reference #301

Closed

3 tasks

MaxGhenis merged commit 28d8904 into main Apr 20, 2026
12 checks passed

MaxGhenis mentioned this pull request Apr 20, 2026

Swap docs toolchain from MyST to Quarto #304

Merged

3 tasks

MaxGhenis merged 12 commits into main from v4
