Honor Simulation.extra_variables on US/UK microsim paths (closes #303)#307
Merged
Honor Simulation.extra_variables on US/UK microsim paths (closes #303)#307
Conversation
Closes #303. The field was declared on the base Simulation class and honored by pe.us.calculate_household / pe.uk.calculate_household, but the population run() methods only iterated self.entity_variables — any variable a caller added via Simulation(extra_variables={...}) was silently dropped from output_dataset. Fix: - New MicrosimulationModelVersion.resolve_entity_variables(simulation) in tax_benefit_models.common.model_version merges bundled defaults with simulation.extra_variables. Order preserved (defaults first, extras appended). Duplicates deduped. - Unknown entity keys raise ValueError with close-match suggestions (e.g. "tax_units" -> "did you mean 'tax_unit'?"). Same pattern used by compile_reform for parameter paths. - Unknown variable names raise with close-match suggestions. - Both US run() (tax_benefit_models/us/model.py) and UK run() (tax_benefit_models/uk/model.py) now call resolve_entity_variables in place of iterating self.entity_variables directly. Regression tests in tests/test_extra_variables.py: - US: extras appear on output_dataset for household + tax_unit; dedupes when extra overlaps a default; unknown entity/variable names raise with suggestions. - UK: direct test of resolve_entity_variables on pe.uk.model (full UK microsim fixture is heavier than needed; the method is the integration point and both countries call it identically). 403 existing tests still pass; 6 new tests pass.
Reproducibility-reviewer (blocking):
- Replace direct-helper UK test with end-to-end UK integration test
(test__uk_extra_variables_appear_on_output_dataset). Catches the
one-character regression where uk/model.py:212 reverts to
self.entity_variables.items() — verified by manually reverting.
- Strengthen dedup assertion: also check the column has no NaN
entries so a silent merge-blanks-column regression fails, not just
duplicate-column regressions.
Reproducibility-reviewer (follow-ups):
- Add test__default_empty_extras_produces_exact_bundled_defaults
covering the default-construction ({}) path.
- Add test__partial_coverage_leaves_untouched_entities_unchanged
covering the case where extras target one entity only.
- Comment explaining net_worth substitution (test_extra_variables.py:
net_worth not in pe-us 1.653.3 codebase; tests use
adjusted_gross_income as stand-in).
Code-simplifier (nice-to-have):
- Move get_close_matches import to module level (consistent with
common/extra_variables.py's style).
Verified fail-without-fix:
- Reverting uk/model.py:212 back to self.entity_variables.items()
makes test__uk_extra_variables_appear_on_output_dataset fail
with the expected missing-column assertion.
8 tests, all pass. Ruff clean.
This was referenced Apr 20, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #303.
Summary
Simulation.extra_variableswas declared on the base class and honored by the household calculator, but the populationrun()methods only iteratedself.entity_variables— extras passed viaSimulation(extra_variables={...})silently never reachedoutput_dataset.Fix
New
MicrosimulationModelVersion.resolve_entity_variables(simulation)intax_benefit_models/common/model_version.py:simulation.extra_variablesValueErrorwith close-match suggestions (e.g."tax_units"→"did you mean 'tax_unit'?"). Same idiomcompile_reformuses for parameter paths.Both
PolicyEngineUSLatest.run()(us/model.py:235) andPolicyEngineUKLatest.run()(uk/model.py:212) now call the shared helper in place of iteratingself.entity_variablesdirectly.Tests
New
tests/test_extra_variables.py(6 tests):output_datasetforhousehold+tax_unitresolve_entity_variablesdirect call merges extras (full UK microsim fixture is heavier than needed; the method is the integration point and both countries call it identically)403 existing tests still pass.
Why a shared helper vs. inline fix
Audit (spawned in parallel with this fix) confirmed no other
Simulationfields have the "declared but silently dropped" pattern. But two potential follow-ups surfaced:Simulation.save()raisesAttributeErrorwhenoutput_dataset is Noneinstead of a clear error — not silent, but unhelpful.created_at/updated_ataren't truly round-tripped;load()re-derives them from filesystemstat. The docstring atcommon/model_version.py:236oversells this as a "round-trip."Neither is urgent. Flagging for v4.0.2 / v4.1 consideration; not expanding this PR.
🤖 Generated with Claude Code