
[v4] Provenance package + lazy h5py #292

Closed
MaxGhenis wants to merge 1 commit into v4-facade from v4-provenance


@MaxGhenis
Contributor

Stacked on #291. Fourth in the v4 PR chain.

What

Separates the provenance layer from the core value-object layer:

  • `policyengine/core/release_manifest.py` → `policyengine/provenance/manifest.py`
  • `policyengine/core/trace_tro.py` → `policyengine/provenance/trace.py`
  • New `policyengine.provenance` package with re-exports of the public surface (`get_release_manifest`, `build_trace_tro_from_release_bundle`, `serialize_trace_tro`, `canonical_json_bytes`, etc.)
  • `policyengine/core/__init__.py` drops 20 provenance re-exports and keeps value objects only (Dataset, Variable, Parameter*, Policy, Dynamic, Simulation, Region, scoping strategies, TaxBenefitModel, TaxBenefitModelVersion)
  • `h5py` moved from top-of-module import to lazy import inside `WeightReplacementStrategy.apply()`, `constituency_impact.run()`, `local_authority_impact.run()`
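
For reviewers unfamiliar with the lazy-import pattern in the last bullet, here is a minimal sketch. `LazyWeightReader`, the file name, and the dataset name are hypothetical stand-ins, not the actual `WeightReplacementStrategy` code; the point is only that the `import h5py` statement lives inside the method body.

```python
import sys


class LazyWeightReader:
    """Hypothetical stand-in for a strategy whose apply() needs h5py."""

    def __init__(self, path: str, dataset: str) -> None:
        self.path = path
        self.dataset = dataset

    def apply(self):
        # Deferred import: h5py is loaded on the first call to apply(),
        # not when the module that defines this class is imported.
        import h5py

        with h5py.File(self.path, "r") as f:
            # Read the whole dataset into memory as a NumPy array.
            return f[self.dataset][:]


# Hypothetical file and dataset names, for illustration only.
reader = LazyWeightReader("weights.h5", "household_weight")

# Defining the class and constructing an instance does not import h5py.
h5py_loaded_after_init = "h5py" in sys.modules
```

Only callers that actually invoke `apply()` need h5py installed; a missing h5py surfaces as an `ImportError` at that call rather than at package import time.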

Why (agent-UX framing)

  • Provenance is a narrow, opinionated API; it doesn't belong at the same layer as `Variable`. A new session reading `policyengine.core` now sees ~15 value-object exports, not 46 names with provenance internals mixed in.
  • `from policyengine.core import Simulation` no longer drags ~15 MB of h5py through the import graph. Lightweight consumers get a faster, smaller import.

Migration

Downstream code that did:
```python
from policyengine.core import DataReleaseManifest, get_release_manifest
```
now uses:
```python
from policyengine.provenance import DataReleaseManifest, get_release_manifest
```
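
Downstream code that has to run against both the old and the new layout during the transition could resolve the name at import time. This is a sketch of one option, not something this PR ships; `import_first` is a hypothetical helper.

```python
import importlib


def import_first(attr: str, module_names: list[str]):
    """Return `attr` from the first module in `module_names` that provides it."""
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Module (or its parent package) is missing: try the next one.
            continue
        if hasattr(module, attr):
            return getattr(module, attr)
    raise ImportError(f"{attr!r} not found in any of {module_names}")


# Intended use (requires policyengine to be installed):
# get_release_manifest = import_first(
#     "get_release_manifest",
#     ["policyengine.provenance", "policyengine.core"],
# )
```

Listing `policyengine.provenance` first means the new location wins whenever it exists, and the fallback is only consulted on older releases.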

Test plan

  • 216 tests pass locally across the full v4 surface (household, reform, release manifests, trace TRO, models, regions, scoping strategy, poverty/inequality/decile/constituency/LA impacts)
  • `ruff check .` + `ruff format --check .` clean
  • Verified manually: `from policyengine.core import Simulation` + `from policyengine.provenance import get_release_manifest` both work without h5py installed (tested by temporarily uninstalling)
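
The manual "uninstall and retry" check could be automated without touching the environment. The sketch below is an assumption about how such a test might look, not part of this PR: it relies on the fact that a `None` entry in `sys.modules` makes `import` raise `ImportError`. `module_blocked` and `imports_without` are hypothetical names.

```python
import contextlib
import importlib
import sys


@contextlib.contextmanager
def module_blocked(name: str):
    """Temporarily make `import name` fail, as if the package were not installed."""
    # Remember and evict any already-loaded copies of the module and its submodules.
    saved = {
        key: mod
        for key, mod in sys.modules.items()
        if key == name or key.startswith(name + ".")
    }
    for key in saved:
        del sys.modules[key]
    sys.modules[name] = None  # a None entry makes the import machinery raise ImportError
    try:
        yield
    finally:
        del sys.modules[name]
        sys.modules.update(saved)


def imports_without(blocked: str, target: str) -> bool:
    """Return True if `target` can be imported while `blocked` is unavailable."""
    with module_blocked(blocked):
        try:
            importlib.import_module(target)
        except ImportError:
            return False
    return True
```

In a fresh interpreter, `imports_without("h5py", "policyengine.core")` would then stand in for the manual check. It has to run before `policyengine.core` is first imported, because an already-cached module is returned without re-running its imports.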

Remaining in v4

  • `MicrosimulationModelVersion` base extraction (~600 LOC duplication between us/model.py and uk/model.py)
  • Unify `ProgramStatistics` / `ProgrammeStatistics`
  • Lazy `us_latest` / `uk_latest` cold-start (defer to v4.1)

🤖 Generated with Claude Code

Separates release-manifest + TRACE TRO emission from the core value
layer. Consumers who only need Simulation / Policy / Variable /
Parameter no longer transitively import h5py through
scoping_strategy / constituency_impact / local_authority_impact.

File moves:
- core/release_manifest.py -> provenance/manifest.py
- core/trace_tro.py        -> provenance/trace.py

New provenance/__init__.py re-exports the public surface
(get_release_manifest, build_trace_tro_from_release_bundle,
serialize_trace_tro, canonical_json_bytes, etc.).

core/__init__.py drops the 20 provenance re-exports and keeps only
value objects (Dataset, Variable, Parameter*, Policy, Dynamic,
Simulation, Region, scoping strategies, TaxBenefitModel,
TaxBenefitModelVersion). Explicit core -> provenance import in
tax_benefit_model_version.py.

Lazy h5py:
- core/scoping_strategy.py: h5py no longer at top of module; imported
  inside WeightReplacementStrategy.apply() only.
- outputs/constituency_impact.py: same.
- outputs/local_authority_impact.py: same.

Internal callers migrated:
- tax_benefit_models/{us,uk}/model.py
- tax_benefit_models/{us,uk}/datasets.py
- countries/{us,uk}/regions.py
- cli.py
- results/trace_tro.py
- scripts/generate_trace_tros.py
- tests/test_{release_manifests,trace_tro,manifest_version_mismatch}.py
- docs/release-bundles.md

216 tests pass locally across the v4 surface. `from policyengine.core
import Simulation` + `from policyengine.provenance import
get_release_manifest` both work without h5py installed (verified by
temporarily uninstalling and retrying). The full `import policyengine
as pe` still pulls h5py because policyengine_us / policyengine_uk
import it eagerly (upstream); that's outside our control.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@MaxGhenis
Contributor Author

Superseded by #298 (consolidated v4 launch PR). All commits cherry-picked cleanly onto v4.

@MaxGhenis MaxGhenis closed this Apr 19, 2026