Migrate docs from MyST to Quarto + prototype auto-generated reference#301
Collapses the household-calculator journey into one obvious call:
import policyengine as pe

result = pe.us.calculate_household(
    people=[{"age": 35, "employment_income": 60000}],
    tax_unit={"filing_status": "SINGLE"},
    year=2026,
    reform={"gov.irs.deductions.standard.amount.SINGLE": 5000},
    extra_variables=["adjusted_gross_income"],
)
print(result.tax_unit.income_tax, result.tax_unit.adjusted_gross_income)
Design goal: a fresh coding session with no prior context and a 20-file
browse budget reaches a correct number in two tool calls — one to
`import policyengine as pe`, one for `pe.us.calculate_household(...)`.
The old surface forced an agent to pick among three entry points
(`calculate_household_impact`, `managed_microsimulation`, raw
`Simulation`), build a pydantic `Input` wrapper, construct a `Policy`
object with `ParameterValue`s, then dig into a `list[dict[str, Any]]`
to get the number. Every one of those layers is gone.
Changes:
- Populate `policyengine/__init__.py` (previously empty) with
`us`, `uk`, and `Simulation` accessors.
- Add `tax_benefit_models/{us,uk}/household.py` with a kwargs-based
`calculate_household` that builds a policyengine_us/uk Simulation
with a situation dict and returns a dot-access HouseholdResult.
- Add `tax_benefit_models/common/` with:
- `compile_reform(dict) -> core reform dict` (scalar or
`{effective_date: value}` shapes)
- `dispatch_extra_variables(names)` — flat list, library looks up
each name's entity via `variables_by_name`
- `EntityResult(dict)` with `__getattr__` for dot access +
paste-able-fix AttributeError on unknown names
- `HouseholdResult(dict)` with `.to_dict()` / `.write(path)`
- Add `utils/household_validation.py` that catches typo'd variable
names in entity dicts with difflib close-match suggestions.
- Remove `USHouseholdInput`, `UKHouseholdInput`, `USHouseholdOutput`,
`UKHouseholdOutput`, and `calculate_household_impact` from both
country modules (v4 breaking).
- Each country __init__.py exposes `model` (the pinned
`TaxBenefitModelVersion`) alongside the existing `us_latest` /
`uk_latest` so agents can guess either name.
- Rewrite `tests/test_household_impact.py` (19 tests) around the new
API: kwargs inputs, dot-access results, flat `extra_variables`,
error messages with paste-able fixes, JSON serialization.
- Rewrite `tests/test_us_reform_application.py` around reform-dict
inputs instead of `Policy(parameter_values=[...])`.
- Update `tests/fixtures/us_reform_fixtures.py` to store
household fixtures as plain kwargs dicts that splat into
`calculate_household(**fixture)`.
223 tests pass locally.
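The dot-access result with a paste-able error can be sketched roughly as below. This is a minimal illustration, not the library's implementation: the error wording, the use of `difflib` inside `__getattr__`, and the sample values are all assumptions.

```python
import difflib
import json


class EntityResult(dict):
    """Dict subclass that also allows attribute access to its keys."""

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails, so dict methods
        # such as .keys() keep working.
        try:
            return self[name]
        except KeyError:
            close = difflib.get_close_matches(name, list(self), n=1)
            hint = f" Did you mean '{close[0]}'?" if close else ""
            raise AttributeError(
                f"'{name}' was not computed.{hint} "
                f"To request it, pass extra_variables=['{name}']."
            ) from None


# Illustrative values only; these are not model output.
tax_unit = EntityResult(income_tax=4500.0, adjusted_gross_income=60000.0)
print(tax_unit.income_tax)
# Because the result is still a dict, JSON serialization needs no adapter.
print(json.dumps(tax_unit, sort_keys=True))
```

Subclassing `dict` keeps `.to_dict()`-style serialization trivial while the `__getattr__` hook carries the error message an agent can act on directly.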
Downstream migration (policyengine-api-v2-alpha, the sole consumer of
the 3.x surface): replace `calculate_household_impact(input, policy=p)`
with `calculate_household(**input, reform=reform_dict)` — fixture
script grep of call sites suggests ~25 LOC touched. The migration
guide will show the before/after.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The review called out five ship-blockers. This commit fixes all five
plus the three footguns:
1. Entity-aware validation. Placing `filing_status` on `people`
instead of `tax_unit` now raises with the correct entity and the
exact kwarg-swap to make: `tax_unit={'filing_status': <value>}`.
2. Realistic docstring examples. Top-of-module examples in us/household.py
and uk/household.py are now lone-parent-with-child cases that
exercise every grouping decision (state_code on household,
is_tax_unit_dependent on person, would_claim_child_benefit on
benunit), not single-adult-no-state cases that hide them.
3. Reform-path validation. `compile_reform` now takes `model_version`
and raises with a difflib close-match suggestion on unknown
parameter paths, matching the validator quality on variable names.
4. Scalar reform default date. Scalar reform values previously
defaulted to `date.today().isoformat()` — a caller running a
year=2026 sim mid-2026 got a mid-year effective date and a blended
result. Now defaults to `{year}-01-01` (passed through from
calculate_household).
5. Unexpected-kwargs catcher. UK `calculate_household(tax_unit=...)`
and US `calculate_household(benunit=...)` now raise a TypeError
that names the correct country-specific kwarg. Other unexpected
kwargs get a difflib close-match from the allowed set.
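Items 3 and 4 combine into one normalisation step, sketched here under stated assumptions: `compile_reform_sketch`, `KNOWN_PATHS`, and the output shape `{path: {date: value}}` are hypothetical stand-ins, since the real `compile_reform` validates against the model version's parameter tree.

```python
import difflib

# Hypothetical stand-in for the parameter paths a model version knows about.
KNOWN_PATHS = {
    "gov.irs.deductions.standard.amount.SINGLE",
    "gov.irs.deductions.standard.amount.JOINT",
}


def compile_reform_sketch(reform, year, known_paths=KNOWN_PATHS):
    """Normalise {path: scalar | {date: value}} to {path: {date: value}}."""
    compiled = {}
    for path, spec in reform.items():
        if path not in known_paths:
            close = difflib.get_close_matches(path, sorted(known_paths), n=1)
            hint = f" Did you mean '{close[0]}'?" if close else ""
            raise ValueError(f"Unknown parameter path '{path}'.{hint}")
        if isinstance(spec, dict):
            # Already an {effective_date: value} mapping; keep it as given.
            compiled[path] = {str(date): value for date, value in spec.items()}
        else:
            # Scalars take effect on 1 January of the simulation year rather
            # than on today's date, so the year is not split mid-way.
            compiled[path] = {f"{year}-01-01": spec}
    return compiled


print(compile_reform_sketch(
    {"gov.irs.deductions.standard.amount.SINGLE": 5000}, year=2026
))
```

Anchoring the scalar default to the simulation year makes the result independent of the day the code happens to run.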
Also added:
- `people=[]` check with an explicit error before the calc blows up
inside policyengine_us.
- Tests for all new error paths (`test__variable_on_wrong_entity`,
`test__empty_people`, `test__unknown_reform_path`,
`test__us_kwarg_on_uk`, `test__uk_kwarg_on_us`).
151 tests pass locally across the facade + reform + regression suites.
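The entity-aware check and the empty-`people` guard might look something like the following. The `VARIABLE_ENTITY` table and the function name are hypothetical; the library presumably resolves each variable's entity from the country model rather than from a hard-coded mapping.

```python
import difflib

# Hypothetical variable -> entity lookup used only for this illustration.
VARIABLE_ENTITY = {
    "age": "people",
    "employment_income": "people",
    "filing_status": "tax_unit",
    "state_code": "household",
}


def validate_household_sketch(people, **groups):
    """Raise ValueError with a paste-able fix for misplaced or unknown names."""
    if not people:
        raise ValueError(
            "people must contain at least one person, e.g. people=[{'age': 35}]"
        )
    placements = [("people", person) for person in people]
    placements += list(groups.items())
    for entity, values in placements:
        for name, value in values.items():
            expected = VARIABLE_ENTITY.get(name)
            if expected is None:
                close = difflib.get_close_matches(name, sorted(VARIABLE_ENTITY), n=1)
                hint = f" Did you mean '{close[0]}'?" if close else ""
                raise ValueError(f"Unknown variable '{name}' on {entity}.{hint}")
            if expected != entity:
                fix = f"{{'{name}': {value!r}}}"
                if expected == "people":
                    fix = f"[{fix}]"  # people is a list of dicts
                raise ValueError(
                    f"'{name}' belongs on {expected}, not {entity}. "
                    f"Try {expected}={fix}"
                )


# A correctly grouped household passes silently.
validate_household_sketch(
    [{"age": 35, "employment_income": 60000}],
    tax_unit={"filing_status": "SINGLE"},
)
```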
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Separates release-manifest + TRACE TRO emission from the core value
layer. Consumers who only need Simulation / Policy / Variable /
Parameter no longer transitively import h5py through
scoping_strategy / constituency_impact / local_authority_impact.
File moves:
- core/release_manifest.py -> provenance/manifest.py
- core/trace_tro.py -> provenance/trace.py
New provenance/__init__.py re-exports the public surface
(get_release_manifest, build_trace_tro_from_release_bundle,
serialize_trace_tro, canonical_json_bytes, etc.).
core/__init__.py drops the 20 provenance re-exports and keeps only
value objects (Dataset, Variable, Parameter*, Policy, Dynamic,
Simulation, Region, scoping strategies, TaxBenefitModel,
TaxBenefitModelVersion). Explicit core -> provenance import in
tax_benefit_model_version.py.
Lazy h5py:
- core/scoping_strategy.py: h5py no longer at top of module; imported
inside WeightReplacementStrategy.apply() only.
- outputs/constituency_impact.py: same.
- outputs/local_authority_impact.py: same.
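The deferred-import pattern described above can be illustrated as follows. `LazyWeightReader` and `lazy_import` are hypothetical names, and the `backend` parameter exists only so the sketch runs without h5py installed; the real strategy classes are not shown in this commit message.

```python
import importlib


def lazy_import(name):
    """Import a module on first use, with an actionable error if missing."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise ImportError(
            f"'{name}' is required for this operation; "
            f"install it with `pip install {name}`"
        ) from exc


class LazyWeightReader:
    def __init__(self, path, backend="h5py"):
        # Construction never touches the heavy dependency.
        self.path = path
        self.backend = backend

    def apply(self):
        # Only callers of apply() pay the import cost (or see the error).
        module = lazy_import(self.backend)
        return module.__name__


# Building the object works even when the default backend is absent.
reader = LazyWeightReader("weights.h5")
# Using a stdlib module as the backend shows the deferred import succeeding.
print(LazyWeightReader("weights.h5", backend="json").apply())
```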
Internal callers migrated:
- tax_benefit_models/{us,uk}/model.py
- tax_benefit_models/{us,uk}/datasets.py
- countries/{us,uk}/regions.py
- cli.py
- results/trace_tro.py
- scripts/generate_trace_tros.py
- tests/test_{release_manifests,trace_tro,manifest_version_mismatch}.py
- docs/release-bundles.md
216 tests pass locally across the v4 surface. `from policyengine.core
import Simulation` + `from policyengine.provenance import
get_release_manifest` both work without h5py installed (verified by
temporarily uninstalling and retrying). The full `import policyengine
as pe` still pulls h5py because policyengine_us / policyengine_uk
import it eagerly (upstream); that's outside our control.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two byte-identical classes split only by British/American spelling
(program_name vs programme_name). Collapsed into a single
policyengine.outputs.ProgramStatistics; both country analysis helpers
import it from there now. Saves ~106 LOC of duplication and removes an
API-surface footgun for cross-country code.
Changes:
- Add policyengine/outputs/program_statistics.py with the unified class.
- Re-export from policyengine/outputs/__init__.py.
- Delete tax_benefit_models/us/outputs.py and
tax_benefit_models/uk/outputs.py.
- us/__init__.py and uk/__init__.py re-export from policyengine.outputs.
- uk/analysis.py: rename programme_name -> program_name,
programme_statistics -> program_statistics, programmes -> programs,
programme_df/collection -> program_df/collection. Field on
PolicyReformAnalysis also changes.
Migration for callers:
- from policyengine.tax_benefit_models.uk import ProgrammeStatistics
-> from policyengine.outputs import ProgramStatistics
- stats.programme_name -> stats.program_name
205 tests pass locally.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pulls ~300 lines of shared init/save/load logic out of
PolicyEngineUSLatest and PolicyEngineUKLatest into a
MicrosimulationModelVersion base in tax_benefit_models.common.
The base handles:
- Release-manifest fetch + installed-version warning
- Data-release certification
- Variable/parameter population from the country system
- save() / load() + output-dataset filepath convention
- _build_entity_relationships via declared group_entities
Subclasses declare country_code, package_name, group_entities,
entity_variables, and implement four thin hooks (_load_system,
_load_region_registry, _dataset_class,
_get_runtime_data_build_metadata).
run() intentionally stays per-country: the US applies reforms at
Microsimulation construction and manually copies structural columns,
while the UK wraps inputs as UKSingleYearDataset and applies reforms
after construction. Hiding those behind a shared skeleton would mask
real divergence.
Behaviour preservation is guarded by a byte-level snapshot test
(tests/test_base_extraction_snapshot.py) covering four US and four UK
household cases plus a model-surface snapshot. All 391 tests pass with
zero snapshot drift.
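The shape of that extraction is a template-method base with per-country hooks. The sketch below is a toy under stated assumptions: the class names, the dict returned by `_load_system`, and the strings returned by `run()` are hypothetical, and only one of the four hooks is shown.

```python
from abc import ABC, abstractmethod


class ModelVersionBaseSketch(ABC):
    # Declared by each subclass.
    country_code: str
    group_entities: tuple

    def __init__(self):
        # Shared initialisation lives in the base; subclasses supply hooks.
        self.system = self._load_system()
        self.variables = sorted(self.system["variables"])

    @abstractmethod
    def _load_system(self):
        """Hook: return the country system (a plain dict in this sketch)."""

    @abstractmethod
    def run(self, inputs):
        """Deliberately not shared: each country keeps its own run()."""


class USSketch(ModelVersionBaseSketch):
    country_code = "us"
    group_entities = ("tax_unit", "spm_unit", "family", "marital_unit", "household")

    def _load_system(self):
        return {"variables": {"income_tax": "tax_unit", "age": "person"}}

    def run(self, inputs):
        return f"us: reform applied at construction for {len(inputs)} rows"


class UKSketch(ModelVersionBaseSketch):
    country_code = "uk"
    group_entities = ("benunit", "household")

    def _load_system(self):
        return {"variables": {"child_benefit": "benunit", "age": "person"}}

    def run(self, inputs):
        return f"uk: reform applied after construction for {len(inputs)} rows"


print(USSketch().variables, UKSketch().run([1, 2, 3]))
```

Keeping `run()` abstract rather than sharing a skeleton mirrors the stated reason: the two countries apply reforms at different points in their pipelines.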
- README and core-concepts now lead with pe.uk/pe.us entry points and
pe.uk.calculate_household / pe.us.calculate_household (flat kwargs,
dot-access result, dict reforms).
- economic-impact-analysis, country-models-{uk,us}, and
regions-and-scoping switched from `from policyengine.tax_benefit_models...`
to the top-level facade.
- Removed the "Legacy filter fields" section from regions-and-scoping
now that filter_field/filter_value have been dropped (v4 breaking).
- dev.md package-layout diagram updated to mention common/ base,
provenance/ subpackage, and the MicrosimulationModelVersion extraction.
- examples/household_impact_example.py rewritten against the v4 API and
verified end-to-end against both UK and US models.
Unifies the v4 reform surface: the same flat {"param.path": value} /
{"param.path": {date: value}} dict already accepted by
pe.{uk,us}.calculate_household(reform=...) now works on population
Simulation too. Dicts are compiled to Policy / Dynamic objects in a
model_validator(mode="after") using tax_benefit_model_version for
parameter-path validation and dataset.year for scalar effective-date
defaulting.
Adds compile_reform_to_policy / compile_reform_to_dynamic helpers
in tax_benefit_models.common.reform, tested directly in
tests/test_dict_reforms_on_simulation.py (6 tests covering scalar
defaulting, effective-date mappings, path validation, pass-through of
existing Policy objects, and the "no model_version" error path).
Unknown parameter paths raise with close-match suggestions (same
behaviour as the household calculator) so agents don't silently get a
no-op reform from a typo.
397/397 tests pass. End-to-end microsim with
Simulation(policy={"gov.irs.credits.ctc.amount.base[0].amount": 3000})
produces the same -$25.5B revenue impact as the manual
Policy+ParameterValue construction it replaces.
The fixture is already registered in conftest.py; pytest auto-injects it by parameter name. Importing it explicitly triggered F811.
Bumps to 4.0.0 and addresses three reviewer passes (practitioner,
code-simplifier, end-to-end verification) before v4 ships:
Version / branding
- pyproject.toml: 3.6.0 -> 4.0.0
- release_manifests/{us,uk}.json: bundle_id and policyengine_version
bumped to 4.0.0 so the bundle TRO URLs point at the right git tag
- test_release_manifests.py: assertion values updated
API ergonomics
- Simulation class now carries a full __doc__ with the canonical
dict-reform call shape; help(pe.Simulation) used to return Pydantic
boilerplate, which hid the headline v4 feature from any agent that hits
help() before reading source.
- RowFilterStrategy.variable_value: Union[str, int, float]. Numeric
columns (state_fips, county_fips) are now scopable; "state_code"
doesn't exist on enhanced_cps_2024 so docs directed users at a
column that would crash.
- pe.__all__ now exports `outputs` so a fresh agent can tab-complete
from pe. to the Aggregate/ChangeAggregate family without reading
source.
Docs
- README: state_code_str -> state_code (consistent with us/household.py)
- core-concepts.md: "Reform as a dict" section leads, "Reform as a
Policy object" relegated to the escape-hatch appendix
- economic-impact-analysis.md: both US and UK examples collapsed to
single-line reform dicts (was 20 lines each of Parameter/
ParameterValue boilerplate)
- country-models-{us,uk}.md: "Common policy reforms" sections rewritten
as one-liners (lost ~130 lines of deprecated-ceremony boilerplate)
- regions-and-scoping.md: variable_name="state_code" (broken) ->
variable_name="state_fips", variable_value=6
- reform.py module docstring: document the [N].amount / [N].threshold
indexed-parameter convention so agents don't hit the bracket-head
trap
Code simplification (simplifier review)
- model_version.py: except (ValueError, Exception) -> except Exception
- reform.py: compile_reform_to_policy / compile_reform_to_dynamic now
share a private _compile_reform_to() helper (was 25 lines of
copy-paste)
- simulation.py: _compile_dict_reforms loops over (field, compiler)
pairs instead of branching twice by hand
- tests/test_base_extraction_snapshot.py renamed to
test_household_calculator_snapshot.py (matches what it actually
pins, not the refactor that motivated it); fixture dir follows
397 tests pass. ruff clean.
New subpackage for querying PolicyEngine Variable dependency
structure by AST-walking source trees. No runtime dependency on
country models — the extractor is pure static analysis, so it works
on any `policyengine-us` / `policyengine-uk` checkout (or fork)
regardless of whether the jurisdiction is installed.
Motivation: grep answers "who mentions this symbol" but not "what is
the dataflow DAG." PolicyEngine variables form deep dependency
chains (state EITC depends on federal EITC depends on earned income,
etc.). A graph-shaped API beats grep for refactor-impact checks,
docs generation, and code-introspection queries from
policyengine-claude (once migrated to call policyengine.py as the
programmatic contract).
Public API (from policyengine.graph):
- extract_from_path(path) -> VariableGraph
- VariableGraph.deps(var) — direct dependencies
- VariableGraph.impact(var) — transitive downstream
- VariableGraph.path(src, dst) — shortest dependency chain
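The three queries can be sketched with plain dictionaries and breadth-first search. This is an illustration only: `VariableGraphSketch` is a hypothetical class, the toy variable names are made up, and the direction of `path` (upstream input to downstream variable) is an assumption. The commit notes that the real subpackage depends on networkx.

```python
from collections import deque


class VariableGraphSketch:
    def __init__(self, deps):
        # deps maps each variable to the variables its formula reads.
        self._deps = {var: set(reads) for var, reads in deps.items()}
        # Reverse index: variable -> variables that read it.
        self._dependents = {}
        for var, reads in self._deps.items():
            for read in reads:
                self._dependents.setdefault(read, set()).add(var)

    def deps(self, var):
        """Direct dependencies of var."""
        return set(self._deps.get(var, ()))

    def impact(self, var):
        """Every variable transitively downstream of var."""
        seen, queue = set(), deque([var])
        while queue:
            for dependent in self._dependents.get(queue.popleft(), ()):
                if dependent not in seen:
                    seen.add(dependent)
                    queue.append(dependent)
        return seen

    def path(self, src, dst):
        """Shortest chain from src to a downstream dst, or None."""
        parents, queue = {src: None}, deque([src])
        while queue:
            node = queue.popleft()
            if node == dst:
                chain = []
                while node is not None:
                    chain.append(node)
                    node = parents[node]
                return chain[::-1]
            for dependent in sorted(self._dependents.get(node, ())):
                if dependent not in parents:
                    parents[dependent] = node
                    queue.append(dependent)
        return None


graph = VariableGraphSketch({"eitc": ["earned_income"], "state_eitc": ["eitc"]})
print(graph.impact("earned_income"))
print(graph.path("earned_income", "state_eitc"))
```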
Reference patterns recognized in v1:
1. <entity>("<var>", <period>) — person/tax_unit/spm_unit/household/
family/marital_unit/benunit direct call with string literal arg
2. add(<entity>, <period>, ["v1", "v2"]) — sum helper; each string
in the list becomes an edge (also handles ``aggr``)
Tracked for v2: parameter edges (parameters(period).gov...),
entity.sum("var") method calls, dynamic variable names (string
concatenation / f-strings).
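A simplified version of the two v1 patterns, using only the standard-library `ast` module, might look like this. It is a sketch rather than the extractor itself: `referenced_variables` is a hypothetical helper that scans one source snippet instead of attributing edges to individual Variable classes, and the sample formula is a toy, not the real formula body.

```python
import ast

ENTITIES = {
    "person", "tax_unit", "spm_unit", "household",
    "family", "marital_unit", "benunit",
}


def _is_str(node):
    return isinstance(node, ast.Constant) and isinstance(node.value, str)


def referenced_variables(source):
    """Collect variable names referenced via the two v1 call patterns."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            continue
        name, args = node.func.id, node.args
        # Pattern 1: <entity>("<var>", <period>)
        if name in ENTITIES and args and _is_str(args[0]):
            found.add(args[0].value)
        # Pattern 2: add(<entity>, <period>, ["v1", "v2"]) and aggr(...)
        elif name in {"add", "aggr"} and len(args) >= 3:
            if isinstance(args[2], (ast.List, ast.Tuple)):
                found.update(e.value for e in args[2].elts if _is_str(e))
    return found


SAMPLE = '''
def formula(tax_unit, period, parameters):
    gross = tax_unit("irs_gross_income", period)
    extra = add(tax_unit, period, ["above_the_line_deductions", "basic_income"])
    dynamic = tax_unit(prefix + "_income", period)  # skipped: not a literal
    return gross - extra
'''

print(sorted(referenced_variables(SAMPLE)))
```

Because nothing is imported or executed from the scanned source, the same approach works on a checkout whose country package is not installed.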
Real-data smoke: indexes policyengine-us (4,577 variables) in 0.8s.
impact("adjusted_gross_income") returns 504 transitively-dependent
variables; direct deps are exactly {irs_gross_income,
above_the_line_deductions, basic_income} matching the formula body.
Tests: tests/test_graph/test_extractor.py (9 pass). The tests load
the graph submodule via importlib rather than going through
`policyengine/__init__.py`, because the full package init eagerly
imports country models which can fail in dev environments with
missing release manifests. The graph subpackage is dep-light
(stdlib + networkx) so this workaround is both clean and
well-motivated.
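Loading a module straight from its file, without triggering the parent package's `__init__.py`, can be done with `importlib.util`. The sketch below is an assumption about the general technique, not a copy of the test helper: `load_module_from_file` is a hypothetical name, and the temporary demo module stands in for the graph submodule.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def load_module_from_file(name, path):
    """Execute a single .py file as a module, bypassing any package init."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    # Register before executing so the module can be found during its own import.
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module


# Demo with a throwaway module; a real test would point at the source file.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "extractor_demo.py"
    target.write_text("ANSWER = 42\n")
    demo = load_module_from_file("extractor_demo", target)

print(demo.ANSWER)
```

A subpackage that uses relative imports would also need `submodule_search_locations` set on the spec, which this single-file sketch does not cover.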
Limitations noted in the module docstrings:
- Parameter references not yet captured (v2).
- Dynamic variable names skipped (low prevalence).
- entity.sum("var") method calls not yet recognized (v2).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Replace docs/myst.yml with docs/_quarto.yml (same navigation layout).
- Convert {literalinclude} directives to Quarto include syntax in examples.md.
- Update Makefile docs/docs-serve targets to invoke quarto.
- Update CI workflow to install Quarto instead of mystmd and run quarto render.
- Add docs-generate-reference Makefile target wiring the new auto-generator.
- Drop docs/_build; use docs/_site/ and docs/_freeze/ for Quarto artifacts.
Content: all 11 existing MyST pages render cleanly under Quarto without
prose changes (CommonMark overlap is near-total). The generator script
from the earlier prototype is included but does not run automatically
during the PR check.
Replaces the ported MyST docs with a fresh user-journey-oriented set
written against the actual policyengine 4.0.0 API surface:
- index.md: one-pager with install, minimal example, 'where to go' table
- getting-started.md (new): install, first household, first reform
- households.md (new): calculate_household reference, entities, reforms,
results shape; replaces the households portion of the old 731-line
core-concepts.md
- reforms.md (new): parametric + time-varying + structural reforms,
combining them, validation
- microsim.md (new): Simulation, ensure_datasets, managed_microsimulation,
memory/performance notes
- impact-analysis.md: economic_impact_analysis walkthrough; replaces the
longer economic-impact-analysis.md
- outputs.md (new): one section per output type with usage examples
(Aggregate, ChangeAggregate, DecileImpact, IntraDecileImpact, Poverty,
Inequality, geographic impacts, ProgramStatistics, OutputCollection)
- regions.md: US states/districts and UK constituencies/local authorities
- countries.md: US vs UK entity differences, default datasets, poverty
measures, programs - replaces the long country-models-uk/us pair
- visualisation.md: to_plotly helpers, household curves, palette
- release-bundles.md: provenance manifests and content addressing
- examples.md: runnable examples list with Quarto include directives
- dev.md: setup, tests, docs, CI, architecture
Deleted: core-concepts.md (731 lines -> split across new pages),
advanced-outputs.md (replaced by outputs.md), regions-and-scoping.md
(-> regions.md), country-models-{uk,us}.md (-> countries.md),
economic-impact-analysis.md (-> impact-analysis.md).
Total went from 2986 lines to 847 lines with better coverage. Navigation
in _quarto.yml reorganized around user journey.
Quick take: I'd like to see this split before landing.
What's actually in the PR (pre-rebase)
Four distinct commits:
Strong positives
Concerns
Recommendation
Don't merge as-is. Split into 2-3 PRs, rebase after #298. I'd review A (Quarto swap) in ~15 min, B (graph + generator) in ~45 min, C (docs rewrite) in ~1 hour. Bundled, it needs a full afternoon and approval confidence is lower.
All six points valid. Plan:
Closing this and will reopen as A/B/C once #298 merges.
Two changes bundled:
1. MyST → Quarto toolchain swap
All 11 existing MyST pages render cleanly under Quarto without prose changes — CommonMark overlap is near-total.
2. Prototype: auto-generated reference
Generated pages are not yet committed to the repo — the generator is scaffolded, but when to run it (per-release vs in CI) is a follow-up decision.
Why Quarto
Test plan