Honor Simulation.extra_variables on US/UK microsim paths (closes #303) by MaxGhenis · Pull Request #307 · PolicyEngine/policyengine.py

MaxGhenis · 2026-04-20T13:38:51Z

Closes #303.

Summary

Simulation.extra_variables was declared on the base class and honored by the household calculator, but the population run() methods only iterated self.entity_variables — extras passed via Simulation(extra_variables={...}) silently never reached output_dataset.

Fix

New MicrosimulationModelVersion.resolve_entity_variables(simulation) in tax_benefit_models/common/model_version.py:

Merges bundled defaults with simulation.extra_variables
Order preserved (defaults first, extras appended)
Duplicates deduped
Unknown entity keys raise ValueError with close-match suggestions (e.g. "tax_units" → "did you mean 'tax_unit'?"). Same idiom compile_reform uses for parameter paths.
Unknown variable names raise with close-match suggestions.

Both PolicyEngineUSLatest.run() (us/model.py:235) and PolicyEngineUKLatest.run() (uk/model.py:212) now call the shared helper in place of iterating self.entity_variables directly.

Tests

New tests/test_extra_variables.py (6 tests):

US: extras appear on output_dataset for household + tax_unit
US: dedupes when extra overlaps a default (no duplicate columns)
US: unknown entity key raises with suggestion
US: unknown variable name raises with suggestion
UK: resolve_entity_variables direct call merges extras (full UK microsim fixture is heavier than needed; the method is the integration point and both countries call it identically)
UK: unknown variable name raises with suggestion

403 existing tests still pass.

Why a shared helper vs. inline fix

Audit (spawned in parallel with this fix) confirmed no other Simulation fields have the "declared but silently dropped" pattern. But two potential follow-ups surfaced:

Simulation.save() raises AttributeError when output_dataset is None instead of a clear error — not silent, but unhelpful.
created_at/updated_at aren't truly round-tripped; load() re-derives them from filesystem stat. The docstring at common/model_version.py:236 oversells this as a "round-trip."

Neither is urgent. Flagging for v4.0.2 / v4.1 consideration; not expanding this PR.

🤖 Generated with Claude Code

Closes #303. The field was declared on the base Simulation class and honored by pe.us.calculate_household / pe.uk.calculate_household, but the population run() methods only iterated self.entity_variables — any variable a caller added via Simulation(extra_variables={...}) was silently dropped from output_dataset. Fix: - New MicrosimulationModelVersion.resolve_entity_variables(simulation) in tax_benefit_models.common.model_version merges bundled defaults with simulation.extra_variables. Order preserved (defaults first, extras appended). Duplicates deduped. - Unknown entity keys raise ValueError with close-match suggestions (e.g. "tax_units" -> "did you mean 'tax_unit'?"). Same pattern used by compile_reform for parameter paths. - Unknown variable names raise with close-match suggestions. - Both US run() (tax_benefit_models/us/model.py) and UK run() (tax_benefit_models/uk/model.py) now call resolve_entity_variables in place of iterating self.entity_variables directly. Regression tests in tests/test_extra_variables.py: - US: extras appear on output_dataset for household + tax_unit; dedupes when extra overlaps a default; unknown entity/variable names raise with suggestions. - UK: direct test of resolve_entity_variables on pe.uk.model (full UK microsim fixture is heavier than needed; the method is the integration point and both countries call it identically). 403 existing tests still pass; 6 new tests pass.

Reproducibility-reviewer (blocking): - Replace direct-helper UK test with end-to-end UK integration test (test__uk_extra_variables_appear_on_output_dataset). Catches the one-character regression where uk/model.py:212 reverts to self.entity_variables.items() — verified by manually reverting. - Strengthen dedup assertion: also check the column has no NaN entries so a silent merge-blanks-column regression fails, not just duplicate-column regressions. Reproducibility-reviewer (follow-ups): - Add test__default_empty_extras_produces_exact_bundled_defaults covering the default-construction ({}) path. - Add test__partial_coverage_leaves_untouched_entities_unchanged covering the case where extras target one entity only. - Comment explaining net_worth substitution (test_extra_variables.py: net_worth not in pe-us 1.653.3 codebase; tests use adjusted_gross_income as stand-in). Code-simplifier (nice-to-have): - Move get_close_matches import to module level (consistent with common/extra_variables.py's style). Verified fail-without-fix: - Reverting uk/model.py:212 back to self.entity_variables.items() makes test__uk_extra_variables_appear_on_output_dataset fail with the expected missing-column assertion. 8 tests, all pass. Ruff clean.

MaxGhenis added 2 commits April 20, 2026 09:38

MaxGhenis merged commit 447aa8d into main Apr 20, 2026
11 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Honor Simulation.extra_variables on US/UK microsim paths (closes #303)#307

Honor Simulation.extra_variables on US/UK microsim paths (closes #303)#307
MaxGhenis merged 2 commits intomainfrom
fix-extra-variables-silently-ignored

MaxGhenis commented Apr 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

MaxGhenis commented Apr 20, 2026

Summary

Fix

Tests

Why a shared helper vs. inline fix

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant