Impute selected_marketplace_plan_benchmark_ratio from CPS premiums by MaxGhenis · Pull Request #801 · PolicyEngine/policyengine-us-data

MaxGhenis · 2026-04-20T03:58:45Z

Closes #800.

Problem

`selected_marketplace_plan_benchmark_ratio` in policyengine-us is currently an input variable with `default_value = 1.0`. Nothing populates it, so PolicyEngine-US effectively assumes every Marketplace enrollee picks the benchmark silver plan. CMS data show that roughly 35% pick bronze and roughly 15% pick gold or platinum, so the rules-based `marketplace_net_premium` (PolicyEngine/policyengine-us#8105) overstates out-of-pocket (OOP) costs for bronze enrollees and understates them for gold and platinum enrollees.

Approach — CPS premium back-out

For each Marketplace-enrolled tax unit in CPS, back out the implied plan-to-SLCSP ratio from the accounting identity:

```
reported_net_premium ≈ selected_plan_cost − APTC
selected_plan_cost = SLCSP × benchmark_ratio
→ benchmark_ratio = (reported_net_premium + computed_PTC) / SLCSP
```

CPS-reported premium is net of APTC for subsidized Marketplace takers, so adding back PolicyEngine's computed PTC recovers the sticker price. Dividing by SLCSP gives the ratio.

Ratios are clipped to `[0.5, 1.5]` to handle reporting noise and Marketplace-flag false positives.
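A minimal sketch of the back-out and clip described above, in plain Python. The function name `backout_benchmark_ratio`, its argument names, and the sample dollar amounts are illustrative assumptions; this is not the signature of `compute_marketplace_plan_benchmark_ratio` in the PR.

```python
from typing import Sequence

DEFAULT_RATIO = 1.0        # default kept for non-takers and zero-SLCSP units
RATIO_BOUNDS = (0.5, 1.5)  # clip range stated in this PR


def backout_benchmark_ratio(
    reported_net_premium: Sequence[float],
    computed_ptc: Sequence[float],
    slcsp: Sequence[float],
    takes_up: Sequence[bool],
) -> list[float]:
    """Hypothetical helper: ratio = (net premium + PTC) / SLCSP, clipped."""
    lower, upper = RATIO_BOUNDS
    ratios = []
    for net, ptc, benchmark, taker in zip(
        reported_net_premium, computed_ptc, slcsp, takes_up
    ):
        if not taker or benchmark <= 0:
            # No plan choice to infer, or no benchmark to divide by.
            ratios.append(DEFAULT_RATIO)
            continue
        raw = (net + ptc) / benchmark  # adding PTC back recovers the sticker price
        ratios.append(min(max(raw, lower), upper))
    return ratios


# Synthetic tax units: silver, bronze, gold, clipped outlier, zero SLCSP, non-taker.
example = backout_benchmark_ratio(
    reported_net_premium=[1_200.0, 0.0, 4_000.0, 9_000.0, 500.0, 300.0],
    computed_ptc=[4_800.0, 4_200.0, 3_200.0, 3_000.0, 0.0, 0.0],
    slcsp=[6_000.0, 6_000.0, 6_000.0, 6_000.0, 0.0, 6_000.0],
    takes_up=[True, True, True, True, True, False],
)
print(example)
```

The fourth unit has a raw ratio of 2.0, which the clip pulls down to 1.5; the last two fall back to the 1.0 default.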

Changes

- New pure-function helper `compute_marketplace_plan_benchmark_ratio` in `policyengine_us_data/datasets/cps/cps.py`: all the math, no Microsimulation, unit-testable.
- New CPS-stage function `add_marketplace_plan_benchmark_ratio` that loads the dataset, runs a Microsimulation to pull SLCSP, ACA PTC, reported premium, and takeup, and writes the ratio back.
- `add_marketplace_plan_benchmark_ratio` is called in `generate()` immediately after `add_takeup` and before downsampling.
- 6 unit tests covering the silver, bronze, gold, non-taker, zero-SLCSP, and clipping cases, using synthetic inputs only with no H5 dependency.

Relationship to #618

#618 (open, by @daphnehanse11) adds state-level calibration targets for total APTC-taking Marketplace tax units and the bronze-plan subset, from CMS 2024 OEP data. That constrains the aggregate weighted distribution.

This PR is complementary — it sets the per-household ratio so each household's computed `marketplace_net_premium` reflects the plan they actually picked. After both land, the state-level targets from #618 can validate that the aggregated ratio distribution lines up with CMS effectuated-plan metal shares.

Caveats

- Marketplace flag noise: CPS respondents sometimes confuse Marketplace coverage with employer or Medicaid coverage. The `takes_up_aca_if_eligible` gate filters most of these cases, and the `[0.5, 1.5]` clip limits the effect of the rest.
- CSR silver variants: cost-sharing-reduction silver plans have the same sticker premium as standard silver plans but reduced deductibles and copays. The back-out correctly lands on ratio ≈ 1.0; plan-value differences are a separate MOOP question.
- Non-takers / zero SLCSP: keep the 1.0 default. These paths are zeroed out downstream by `takes_up_aca_if_eligible` gating inside `selected_marketplace_plan_premium_proxy`, so the default never leaks into calculated OOP.
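To illustrate why the 1.0 default is harmless for non-takers and zero-SLCSP units, here is a deliberately simplified stand-in for the gating. The function `premium_proxy_sketch` is hypothetical and assumes the proxy is just SLCSP times the ratio, gated on takeup; the real `selected_marketplace_plan_premium_proxy` formula in policyengine-us may differ.

```python
def premium_proxy_sketch(slcsp: float, ratio: float, takes_up: bool) -> float:
    """Hypothetical simplification: gated SLCSP x ratio (not the real formula)."""
    if not takes_up:
        return 0.0  # non-takers contribute no plan premium, whatever the ratio
    return slcsp * ratio


# Non-taker with the default ratio: gated to zero.
non_taker = premium_proxy_sketch(slcsp=6_000.0, ratio=1.0, takes_up=False)
# Taker with no benchmark: zero SLCSP times any ratio is still zero.
zero_slcsp = premium_proxy_sketch(slcsp=0.0, ratio=1.0, takes_up=True)
print(non_taker, zero_slcsp)
```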

Testing

`uv run pytest tests/unit/datasets/test_marketplace_plan_benchmark_ratio.py -v` → 6/6 pass.

Test plan

- Unit tests pass
- `ruff check` clean on new code (pre-existing unrelated errors in cps.py are unchanged)
- CI passes
- Integration: verify that the weighted ratio distribution across Enhanced CPS approximately matches CMS metal-tier shares (manual check after the data build)
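The manual integration check could be sketched along these lines. The tier cutoffs (`bronze_below=0.9`, `gold_above=1.1`), the function name, and the sample weights are all assumptions made for illustration; the PR does not define how ratios map to metal tiers. The caller is assumed to pass only Marketplace takers, since non-takers carry the 1.0 default.

```python
from typing import Sequence


def weighted_tier_shares(
    ratios: Sequence[float],
    weights: Sequence[float],
    bronze_below: float = 0.9,  # assumed cutoff, not from the PR
    gold_above: float = 1.1,    # assumed cutoff, not from the PR
) -> dict[str, float]:
    """Bucket ratios into rough metal tiers and return weighted shares."""
    totals = {"bronze": 0.0, "silver": 0.0, "gold_platinum": 0.0}
    for ratio, weight in zip(ratios, weights):
        if ratio < bronze_below:
            tier = "bronze"
        elif ratio > gold_above:
            tier = "gold_platinum"
        else:
            tier = "silver"
        totals[tier] += weight
    total_weight = sum(totals.values())
    if total_weight == 0:
        raise ValueError("weights sum to zero; no takers to summarize")
    return {tier: value / total_weight for tier, value in totals.items()}


# Synthetic takers with survey weights (made-up numbers).
shares = weighted_tier_shares(
    ratios=[0.7, 1.0, 1.2, 1.0],
    weights=[350.0, 300.0, 150.0, 200.0],
)
print(shares)
```

The resulting shares can then be compared by eye against the CMS metal-tier shares cited in the Problem section.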

Previously `selected_marketplace_plan_benchmark_ratio` defaulted to 1.0 for every household, which means PolicyEngine-US treats every Marketplace enrollee as on the benchmark silver plan. In reality roughly 35% pick bronze and 15% pick gold/platinum, so downstream variables like `selected_marketplace_plan_premium_proxy` and the new `marketplace_net_premium` miss real variation.

Adds `compute_marketplace_plan_benchmark_ratio` (pure-Python helper) and `add_marketplace_plan_benchmark_ratio` (CPS-stage integration) that back out the implied ratio per tax unit:

    reported_premium ≈ plan_cost − APTC
    plan_cost = SLCSP × ratio
    → ratio = (reported_premium + computed_PTC) / SLCSP

Ratios are clipped to [0.5, 1.5] to handle CPS reporting noise and Marketplace-flag false positives. Non-takers and tax units with zero SLCSP keep the 1.0 default: those paths either zero out the plan proxy via `takes_up_aca_if_eligible` or have no benchmark to divide against.

Pure-function helper lets us unit-test the math with synthetic inputs (no Microsimulation) covering the silver / bronze / gold / non-taker / zero-SLCSP / clipping cases.

Closes #800.

Pre-existing format drift from #801 that ruff 0.9.0+ flags; unblocks the lint check on this branch. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add state and AGI cross-tab EITC calibration targets (#802)

  Extend build_loss_matrix() with two new target families sourced from IRS SOI:

  * Per-state EITC returns and amounts from Historical Table 2 (eitc_state.csv), ~102 new loss-matrix columns covering 50 states + DC.
  * Per-(qualifying-children x AGI bucket) EITC returns and amounts from Publication 1304 Table 2.5 (eitc_by_agi_and_children.csv), ~224 new columns over the SOI small-bin AGI structure.

  Both targets use the existing eitc_spending_uprating / population_uprating factors so they move with the Treasury EITC and population trajectories. A _skip_unverified_target helper keeps the optimizer from consuming "[TO BE CALCULATED]" placeholders.

  Also adds refresh_eitc_state_and_agi_targets.py, a parameterized data-pull script that future-year refreshes can run with --year <tax_year>, plus tests/unit/calibration/test_eitc_extended_targets.py covering CSV shape, the IRS state-sum-to-national crosscheck, loss-matrix column naming, and placeholder skipping.

  State sum crosscheck for TY2022: 23,679,560 returns / $59,178,091,000 vs IRS US row 23,692,190 returns / $59,204,588,000, ~0.05% off, within disclosure rounding. Gap vs Treasury outlay target ($77.3B) reflects the refundable-only Treasury definition; IRS SOI is the correct comparator for the full eitc variable.

  Related to #802.

* fixup! Add state and AGI cross-tab EITC calibration targets (#802)

* Drop contradictory Treasury+legacy EITC targets; add regression tests

  Codex review of #803 found two internal contradictions in the EITC target set: (1) the loss function targeted Treasury's $67B outlay parameter alongside the new SOI-derived $59B state-row sum and $60B AGI×children-row sum, forcing the optimizer onto an unsatisfiable Pareto front; (2) the legacy eitc.csv carried TY2020 per-child-count values that duplicated (and conflicted with) the new cross-tab.

  Fix by anchoring EITC calibration on IRS SOI TY2022 tables alone: keep state and (child × AGI bucket) targets, drop the Treasury aggregate column and the stale per-child-count rows. Treasury's parameter is still used to derive the dollar uprating trajectory.

  New tests cover the cases Codex flagged as unverified: mixed-placeholder rows (valid returns + [TO BE CALCULATED] amount) must keep the valid metric and drop the invalid one without breaking matrix/target alignment; the "3 or more children" bucket uses >= so a 4-child household registers once, in c3 only; non-unity uprating factors propagate to target values. Two regression tests pin the removals: nation/treasury/eitc must never appear as a loss-matrix column, and count_children_ slugs stay out of the source.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Apply ruff format to cps.py

  Pre-existing format drift from #801 that ruff 0.9.0+ flags; unblocks the lint check on this branch. No behavior change.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

MaxGhenis mentioned this pull request Apr 20, 2026

Add ACA marketplace bronze-selection target ETL #618

Open

Pass explicit period to Microsimulation calls for future-proofing

6002940

MaxGhenis merged commit ac645ef into main Apr 20, 2026
5 of 6 checks passed

MaxGhenis deleted the marketplace-benchmark-ratio-imputation branch April 20, 2026 11:27

MaxGhenis merged 2 commits into main from marketplace-benchmark-ratio-imputation
