Guard _fast_cache mutations; compare assert_near at narrower float dtype by MaxGhenis · Pull Request #474 · PolicyEngine/policyengine-core

MaxGhenis · 2026-04-17T20:59:36Z

Summary

Fixes two regressions that surfaced after 3.24.1 reached PyPI and are currently blocking policyengine-uk / policyengine-uk-data on main. Tracked in #473.

1. `AttributeError: 'Microsimulation' object has no attribute '_fast_cache'`

Simulation.__init__ initialises self._fast_cache = {}, but country-package subclasses (e.g. policyengine_uk.Simulation) legitimately override __init__ without calling super().__init__ — they set the attributes they need directly and don't mirror every private attribute core owns. The read-side fast path in calculate() already does getattr(self, "_fast_cache", None), but four write-side sites in set_input, delete_arrays, and purge_cache_of_invalid_values use bare self._fast_cache.pop(...) / .items() / re-assign, so they crashed on those subclasses.

Core now tolerates subclasses that don't mirror the attribute instead of requiring every downstream __init__ to set it. Regression test: tests/core/test_fast_cache_guards.py.

Repro (from policyengine-uk-data#359):

File "policyengine_uk/simulation.py", line 420, in build_from_single_year_dataset
    self.build_from_multi_year_dataset(dataset)
File "policyengine_uk/simulation.py", line 453, in build_from_multi_year_dataset
    self.set_input(variable, year, table[variable])
File "policyengine_core/simulations/simulation.py", line 1244, in set_input
    self._fast_cache.pop((variable_name, period), None)
AttributeError: 'Microsimulation' object has no attribute '_fast_cache'

2. False YAML-test failures on float-typed Variables (post-#464)

#464 switched assert_near to compare both operands at float64 so H6-style bugs (25_000_000 vs 25_000_001 silently equal under float32) would be caught. That's correct for int/float64 operands. But PolicyEngine stores float Variables as float32 (VALUE_TYPES[float]["dtype"] = numpy.float32), so YAML tests like

output:
  minimum_wage: 8.91

compare a float32-stored 8.90999985... against the literal 8.91. Under float64 comparison that diff is real (1.5e-7), so the test now fails where it silently passed before.

The float32 storage rounding isn't a regression — it's been there since forever. Make assert_near pick the comparison dtype from value.dtype:

float32 when the value is already float32 — preserves the pre-#464 behaviour for YAML tests against float Variables
float64 otherwise — preserves #464's H6 coverage for int/float64 operands

The existing H6 regression test (test_assert_near_detects_large_integer_difference) still passes (value here is int → promotes to float64). Added a new regression test (test_assert_near_accepts_float32_storage_rounding) asserting a float32 8.91 compares equal to the Python literal 8.91 with absolute_error_margin=0.

Downstream: unblocks the 7 pre-existing minimum_wage + UC YAML failures in policyengine-uk main and the build_from_single_year_dataset AttributeError in policyengine-uk-data.

Test plan

510 existing core tests still pass locally (pytest tests)
3 new tests in tests/core/test_fast_cache_guards.py pass
1 new test in tests/core/test_assert_near_precision.py passes; existing H6 test still passes
ruff format --check + ruff check clean
CI matrix 3.9 – 3.14 on ubuntu + windows

Two regressions surfaced after 3.24.1 reached PyPI, both blocking downstream country packages. 1. AttributeError on Microsimulation._fast_cache ------------------------------------------------- ``Simulation.__init__`` initialises ``self._fast_cache = {}``, but country-package subclasses (e.g. ``policyengine_uk.Simulation``) legitimately override ``__init__`` without calling ``super().__init__`` — they set the attributes they need directly and do not mirror every private attribute core owns. On those subclasses, ``set_input``, ``delete_arrays``, and ``purge_cache_of_invalid_values`` raised AttributeError: 'Microsimulation' object has no attribute '_fast_cache' during dataset load. Add ``getattr(self, "_fast_cache", None)`` guards at the four bare call sites — the read-side fast path in ``calculate()`` already guards this way, so this just makes the write-side consistent. Core now tolerates subclasses that don't mirror the attribute, instead of requiring every downstream ``__init__`` to set it explicitly. Regression tests: ``tests/core/test_fast_cache_guards.py``. 2. False YAML-test failures on float-typed Variables (post-#464) ----------------------------------------------------------------- #464 switched ``assert_near`` to compare both operands at float64 so H6-style bugs (``25_000_000`` vs ``25_000_001`` silently equal under float32) would be caught. That's correct for int/float64 operands. But PolicyEngine stores float Variables as float32 (``VALUE_TYPES[float]["dtype"] = numpy.float32``), so YAML tests like output: minimum_wage: 8.91 compare a float32-stored ``8.90999985...`` against the literal ``8.91``. Under float64 comparison that diff is real (1.5e-7), so the test now fails where it silently passed before. The storage rounding isn't new. Make ``assert_near`` pick the comparison dtype from ``value.dtype``: float32 when the value is already float32, float64 otherwise. This keeps #464's H6 coverage (the int/float64 repro at ``tests/core/test_assert_near_precision.py::test_assert_near_detects_large_integer_difference`` still passes) without surfacing the pre-existing float32 storage artefacts. Added a new regression test that asserts a float32 8.91 compares equal to the Python literal 8.91 with ``absolute_error_margin=0``. Downstream: unblocks ``policyengine-uk`` ``minimum_wage`` + UC YAML tests (7 pre-existing on main under core 3.24.0+) and ``policyengine-uk-data`` Microsimulation build. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The H3 fix (PR PolicyEngine#463) added `_invalidate_all_caches` to wipe `holder._memory_storage._arrays` for every variable after `apply_reform`, so formula output caches couldn't survive a reform that invalidated them. But user-provided `set_input` values share the same storage, so they got wiped too. Country packages (e.g. `policyengine_uk.Simulation.__init__`) follow this pattern: 1. build populations, call `self.set_input(...)` from the dataset 2. apply a structural reform derived from parameters 3. calculate downstream variables Step 2's `apply_reform` triggered `_invalidate_all_caches`, silently discarding the dataset loaded in step 1. Surfaced in PolicyEngine/policyengine.py#1628 (UK household-impact tests returning 0 because age, employment_income, would_claim_* — every single input — were wiped before any calculation ran). Fix: track every `(variable, branch, period)` populated via `Simulation.set_input` in a new `_user_input_keys` set. On `_invalidate_all_caches`, snapshot those entries from storage, perform the wipe, then replay them back. Formula-output caches are still invalidated; user inputs survive. The attribute is lazy-initialised inside `set_input` so country- package subclasses that bypass `super().__init__` (the same pattern `_fast_cache` was guarded for in PR PolicyEngine#474) automatically pick up the preservation without a downstream code change. Three new tests under `tests/core/test_apply_reform_preserves_user_inputs.py`: - a set_input value survives a no-op reform - multiple set_input values survive - the H3 fix (formula cache invalidation) still holds — a neutralize_variable reform still drops the cached formula output End-to-end: installing this branch into a policyengine.py checkout with `policyengine-uk==2.88.0` turns the failing `TestUKHouseholdImpact.test_single_adult_with_employment_income` (0.0 > 0) into a passing £7,486 income tax on a £50k salary, and `test_family_with_children` into a passing £2,328 child benefit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The H3 fix (PR #463) added `_invalidate_all_caches` to wipe `holder._memory_storage._arrays` for every variable after `apply_reform`, so formula output caches couldn't survive a reform that invalidated them. But user-provided `set_input` values share the same storage, so they got wiped too. Country packages (e.g. `policyengine_uk.Simulation.__init__`) follow this pattern: 1. build populations, call `self.set_input(...)` from the dataset 2. apply a structural reform derived from parameters 3. calculate downstream variables Step 2's `apply_reform` triggered `_invalidate_all_caches`, silently discarding the dataset loaded in step 1. Surfaced in PolicyEngine/policyengine.py#1628 (UK household-impact tests returning 0 because age, employment_income, would_claim_* — every single input — were wiped before any calculation ran). Fix: track every `(variable, branch, period)` populated via `Simulation.set_input` in a new `_user_input_keys` set. On `_invalidate_all_caches`, snapshot those entries from storage, perform the wipe, then replay them back. Formula-output caches are still invalidated; user inputs survive. The attribute is lazy-initialised inside `set_input` so country- package subclasses that bypass `super().__init__` (the same pattern `_fast_cache` was guarded for in PR #474) automatically pick up the preservation without a downstream code change. Three new tests under `tests/core/test_apply_reform_preserves_user_inputs.py`: - a set_input value survives a no-op reform - multiple set_input values survive - the H3 fix (formula cache invalidation) still holds — a neutralize_variable reform still drops the cached formula output End-to-end: installing this branch into a policyengine.py checkout with `policyengine-uk==2.88.0` turns the failing `TestUKHouseholdImpact.test_single_adult_with_employment_income` (0.0 > 0) into a passing £7,486 income tax on a £50k salary, and `test_family_with_children` into a passing £2,328 child benefit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Context: PR #474 landed the fix for a `_fast_cache` AttributeError that only surfaced in `policyengine-uk` / `policyengine-uk-data` because core's Microsimulation __init__ behavior changed subtly. Similarly, the breakdown-validator tightening in #449 silently broke `policyengine-us`'s `additional_tax_bracket` param until downstream pypkg integration tests tripped on it. The existing `SmokeTestForMultipleVersions` job already installs policyengine-us from PyPI, but the test it runs (`tests/smoke/test_us.py`) is gated by `RUN_SMOKE_TESTS=1` while CI sets `RUN_SMOKE_TESTS: "0"`, so it's always skipped. Add `tests/smoke/test_country_init.py` that simply instantiates `CountryTaxBenefitSystem()` for policyengine-us and policyengine-uk. No microdata required, no HF token required, runs in seconds. This catches any future core change that breaks country-model init — breakdown validator errors, missing private attrs, etc. — before it hits downstream. Also extends the workflow to install policyengine-uk alongside policyengine-us so both tests have something to exercise. Leaves the existing `test_us.py` / `RUN_SMOKE_TESTS` env-var gating alone — those tests need microdata credentials and are a separate decision. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Follow-up to PR #474's _fast_cache guards. The same pattern applied to the other private cache attribute on Simulation: country-package subclasses that override __init__ without calling super().__init__ can skip initialising self.invalidated_caches, so purge_cache_of_invalid_values and invalidate_cache_entry now fall back gracefully instead of raising AttributeError. purge_cache_of_invalid_values: early-return if invalidated_caches is missing. invalidate_cache_entry: lazily initialise invalidated_caches with the new entry if it's missing. Extends tests/core/test_fast_cache_guards.py with two new regression tests for the invalidated_caches path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Preserve set_input values across apply_reform The H3 fix (PR #463) added `_invalidate_all_caches` to wipe `holder._memory_storage._arrays` for every variable after `apply_reform`, so formula output caches couldn't survive a reform that invalidated them. But user-provided `set_input` values share the same storage, so they got wiped too. Country packages (e.g. `policyengine_uk.Simulation.__init__`) follow this pattern: 1. build populations, call `self.set_input(...)` from the dataset 2. apply a structural reform derived from parameters 3. calculate downstream variables Step 2's `apply_reform` triggered `_invalidate_all_caches`, silently discarding the dataset loaded in step 1. Surfaced in PolicyEngine/policyengine.py#1628 (UK household-impact tests returning 0 because age, employment_income, would_claim_* — every single input — were wiped before any calculation ran). Fix: track every `(variable, branch, period)` populated via `Simulation.set_input` in a new `_user_input_keys` set. On `_invalidate_all_caches`, snapshot those entries from storage, perform the wipe, then replay them back. Formula-output caches are still invalidated; user inputs survive. The attribute is lazy-initialised inside `set_input` so country- package subclasses that bypass `super().__init__` (the same pattern `_fast_cache` was guarded for in PR #474) automatically pick up the preservation without a downstream code change. Three new tests under `tests/core/test_apply_reform_preserves_user_inputs.py`: - a set_input value survives a no-op reform - multiple set_input values survive - the H3 fix (formula cache invalidation) still holds — a neutralize_variable reform still drops the cached formula output End-to-end: installing this branch into a policyengine.py checkout with `policyengine-uk==2.88.0` turns the failing `TestUKHouseholdImpact.test_single_adult_with_employment_income` (0.0 > 0) into a passing £7,486 income tax on a £50k salary, and `test_family_with_children` into a passing £2,328 child benefit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Restore nested-branch input inheritance and cover situation-dict set_input Two follow-up fixes on top of the ``Simulation.set_input`` preservation in this PR, plus a latent bug that surfaced while wiring them up. Together they unblock ``policyengine-us`` bumping its core floor past 3.24.0 (``PolicyEngine/policyengine-us#8066``). ## 1. ``Holder.set_input`` now also records ``_user_input_keys`` The previous patch only recorded keys inside ``Simulation.set_input``. But ``SimulationBuilder.finalize_variables_init`` (the path taken when ``Simulation(situation=...)`` is called) routes inputs through ``holder.set_input`` directly, bypassing the simulation-level hook. So every situation-dict input got wiped by the post-``apply_reform`` cache invalidation in country-package subclasses that apply a structural reform during construction (the ``policyengine-us`` pattern). Recording the key inside ``Holder.set_input`` covers both paths: ``Simulation.set_input`` still adds its own entry (harmless duplicate in a set), and ``holder.set_input`` picks up the situation- dict and dataset loader paths. ## 2. ``Holder.get_array`` walks ``simulation.parent_branch`` The C1 fix (``fix-holder-get-array-branch-leak``) correctly stopped ``get_array`` from returning values stored under an arbitrary sibling branch — that had caused silent reform↔baseline cross-contamination. But it only fell back to the ``default`` branch, so a nested branch (``no_salt`` cloned from ``itemizing``) could no longer read inputs set on its parent. ``policyengine-us`` uses that two-level pattern: ``tax_liability_if_itemizing`` sets ``tax_unit_itemizes=True`` on an ``itemizing`` branch, and ``ctc_limiting_tax_liability`` forks a ``no_salt`` sub-branch from it. Without parent-branch fallback, ``tax_unit_itemizes`` re-runs its formula on ``no_salt``, which calls ``tax_liability_if_itemizing`` again, producing a ``CycleError`` → eventually surfaced as a recursion exception. The fix walks ``simulation.parent_branch`` up to the root and returns the first ancestor that has a value. Sibling branches (no parent relationship) still don't leak into each other — the C1 guarantee holds. ## 3. ``GroupPopulation.clone`` passes the cloned population to holders Latent bug that surfaced while fixing #2: ``GroupPopulation.clone`` was calling ``holder.clone(self)`` — passing the *source* population to each holder. ``Holder.clone`` then set ``new_dict["simulation"] = population.simulation``, pointing the cloned holder's ``.simulation`` reference back at the original sim rather than the clone. That meant a holder on the ``no_salt`` clone thought it belonged to the ``itemizing`` simulation, so the ``parent_branch`` walk started from the wrong simulation and missed the ancestor's inputs. Pass ``result`` (the cloned population) so the holder's ``.simulation`` points at the clone. ## Tests - ``test_apply_reform_preserves_situation_dict_inputs`` — covers the ``Simulation(situation=...)`` path that bypasses ``Simulation.set_input`` (fails without #1). - ``test_get_array_falls_back_through_parent_branch_chain`` — covers nested-branch parent inheritance (fails without #2). - ``test_group_population_clone_sets_holder_simulation_to_clone`` — pins the cloned holder's ``.simulation`` to the clone (fails without #3). All existing core tests still pass (514 pass, 1 pre-existing parameter security failure unrelated to these changes). The ``tax_unit_itemizes.yaml`` integration test (7/7) and the full ``gov/irs/income/taxable_income`` suite (253/253) in ``policyengine-us`` 1.647.0 pass under this branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add downstream preflight smoke tests against published country models Context: PR #474 landed the fix for a `_fast_cache` AttributeError that only surfaced in `policyengine-uk` / `policyengine-uk-data` because core's Microsimulation __init__ behavior changed subtly. Similarly, the breakdown-validator tightening in #449 silently broke `policyengine-us`'s `additional_tax_bracket` param until downstream pypkg integration tests tripped on it. The existing `SmokeTestForMultipleVersions` job already installs policyengine-us from PyPI, but the test it runs (`tests/smoke/test_us.py`) is gated by `RUN_SMOKE_TESTS=1` while CI sets `RUN_SMOKE_TESTS: "0"`, so it's always skipped. Add `tests/smoke/test_country_init.py` that simply instantiates `CountryTaxBenefitSystem()` for policyengine-us and policyengine-uk. No microdata required, no HF token required, runs in seconds. This catches any future core change that breaks country-model init — breakdown validator errors, missing private attrs, etc. — before it hits downstream. Also extends the workflow to install policyengine-uk alongside policyengine-us so both tests have something to exercise. Leaves the existing `test_us.py` / `RUN_SMOKE_TESTS` env-var gating alone — those tests need microdata credentials and are a separate decision. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Guard invalidated_caches mutation sites like _fast_cache Follow-up to PR #474's _fast_cache guards. The same pattern applied to the other private cache attribute on Simulation: country-package subclasses that override __init__ without calling super().__init__ can skip initialising self.invalidated_caches, so purge_cache_of_invalid_values and invalidate_cache_entry now fall back gracefully instead of raising AttributeError. purge_cache_of_invalid_values: early-return if invalidated_caches is missing. invalidate_cache_entry: lazily initialise invalidated_caches with the new entry if it's missing. Extends tests/core/test_fast_cache_guards.py with two new regression tests for the invalidated_caches path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Add spm-calculator to --no-deps fallback install for py3.13+ policyengine-us 1.647.0 depends on spm-calculator (>=0.3.1), but the 3.13+ workflow branch uses --no-deps to dodge pytest-dependency build issues and manually reinstalls a subset. Add spm-calculator to that subset so the new preflight smoke test can instantiate CountryTaxBenefitSystem() on 3.13/3.14. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

MaxGhenis merged commit 25ab6c2 into master Apr 17, 2026
22 checks passed

MaxGhenis deleted the fix-fast-cache-and-assert-near branch April 17, 2026 21:05

MaxGhenis mentioned this pull request Apr 17, 2026

Preserve set_input values across apply_reform #475

Merged

9 tasks

MaxGhenis mentioned this pull request Apr 18, 2026

Downstream preflight + invalidated_caches guards #477

Merged

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Guard _fast_cache mutations; compare assert_near at narrower float dtype#474

Guard _fast_cache mutations; compare assert_near at narrower float dtype#474
MaxGhenis merged 1 commit intomasterfrom
fix-fast-cache-and-assert-near

MaxGhenis commented Apr 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

MaxGhenis commented Apr 17, 2026

Summary

1. AttributeError: 'Microsimulation' object has no attribute '_fast_cache'

2. False YAML-test failures on float-typed Variables (post-#464)

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

1. `AttributeError: 'Microsimulation' object has no attribute '_fast_cache'`