Add ACA marketplace bronze-selection target ETL #618
Merged
7 commits
6ad2911 Rebase ACA marketplace ETL onto main (daphnehanse11)
8df074d Format CPS marketplace benchmark helper (daphnehanse11)
8fd8990 Fix bronze-stratum domain_variable ordering and drop dead ETL (MaxGhenis)
3e2292d Make domain_variable ordering deterministic and fix stale integration… (MaxGhenis)
bd15fd7 Use correlated subquery for domain_variable ordering (SQLite portabil… (MaxGhenis)
59a0491 Restore etl_aca_agi_state_targets.py alongside new marketplace ETL (MaxGhenis)
04dbaea Merge remote-tracking branch 'upstream/main' into codex/aca-marketpla… (MaxGhenis)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| Add an ACA marketplace ETL that loads state-level HC.gov bronze-plan | ||
| selection targets for APTC recipients into the calibration database. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,236 @@ | ||
| from __future__ import annotations | ||
|
|
||
| import logging | ||
| from pathlib import Path | ||
|
|
||
| import pandas as pd | ||
| from sqlmodel import Session, create_engine | ||
|
|
||
| from policyengine_us_data.calibration.calibration_utils import STATE_CODES | ||
| from policyengine_us_data.db.create_database_tables import ( | ||
| Stratum, | ||
| StratumConstraint, | ||
| Target, | ||
| ) | ||
| from policyengine_us_data.storage import CALIBRATION_FOLDER, STORAGE_FOLDER | ||
| from policyengine_us_data.utils.db import etl_argparser, get_geographic_strata | ||
|
|
||
| logger = logging.getLogger(__name__) | ||
|
|
||
| # `selected_marketplace_plan_benchmark_ratio == 1.0` represents benchmark | ||
| # silver coverage, so bronze plan selections are the subset below this ratio. | ||
| BENCHMARK_SILVER_RATIO = 1.0 | ||
|
|
||
| STATE_METAL_SELECTION_PATH = ( | ||
| CALIBRATION_FOLDER / "aca_marketplace_state_metal_selection_2024.csv" | ||
| ) | ||
|
|
||
| STATE_ABBR_TO_FIPS = {abbr: fips for fips, abbr in STATE_CODES.items()} | ||
|
|
||
|
|
||
| def _extra_args(parser) -> None: | ||
| parser.add_argument( | ||
| "--state-metal-csv", | ||
| type=Path, | ||
| default=STATE_METAL_SELECTION_PATH, | ||
| help=("State-metal CMS OEP proxy CSV. Default: %(default)s"), | ||
| ) | ||
|
|
||
|
|
||
| def extract_aca_marketplace_state_metal_data( | ||
| state_metal_csv_path: Path, | ||
| ) -> pd.DataFrame: | ||
| """Extract CMS marketplace state metal-status inputs from the checked-in CSV. | ||
|
|
||
| This ETL keeps an explicit extract step even though the source file already | ||
| lives in the repository. The original CMS 2024 OEP state metal status PUF | ||
| is not currently pulled from a stable direct-download endpoint in CI, so we | ||
| store the normalized input CSV at | ||
| `policyengine_us_data/storage/calibration_targets/aca_marketplace_state_metal_selection_2024.csv`. | ||
|
|
||
| Source (CMS Marketplace Open Enrollment Period Public Use Files): | ||
| https://www.cms.gov/marketplace/resources/data/public-use-files | ||
|
|
||
| To reproduce or update that file: | ||
| 1. Download the CMS 2024 OEP State, Metal Level, and Enrollment Status PUF | ||
| from the URL above. | ||
| 2. Preserve one row per state/platform/metal/enrollment-status combination. | ||
| 3. Keep the `state_code`, `platform`, `metal_level`, | ||
| `enrollment_status`, `consumers`, and `aptc_consumers` columns. | ||
| 4. Save the normalized output back to `state_metal_csv_path`. | ||
| """ | ||
| return pd.read_csv(state_metal_csv_path) | ||
|
|
||
|
|
||
| def build_state_marketplace_bronze_aptc_targets( | ||
| state_metal_df: pd.DataFrame, | ||
| ) -> pd.DataFrame: | ||
| """ | ||
| Build HC.gov state bronze-selection targets among APTC consumers. | ||
|
|
||
| The 2024 CMS state-metal-status PUF exposes: | ||
| - metal rows (`B`, `G`, `S`) with enrollment_status=`All` | ||
| - aggregate rows (`All`) broken out by enrollment status (`01-atv`, etc.) | ||
|
|
||
| We use: | ||
| - total APTC consumers = sum of `aptc_consumers` for `metal_level == All` | ||
| across enrollment statuses | ||
| - bronze APTC consumers = `aptc_consumers` on the bronze row | ||
| """ | ||
| df = state_metal_df.copy() | ||
| df = df[df["platform"] == "HC.gov"].copy() | ||
|
|
||
| total_rows = df[ | ||
| (df["metal_level"] == "All") & (df["aptc_consumers"].notna()) | ||
| ].copy() | ||
| bronze_rows = df[ | ||
| (df["metal_level"] == "B") | ||
| & (df["enrollment_status"] == "All") | ||
| & (df["aptc_consumers"].notna()) | ||
| ].copy() | ||
|
|
||
| total_aptc = total_rows.groupby("state_code", as_index=False).agg( | ||
| marketplace_aptc_consumers=("aptc_consumers", "sum"), | ||
| marketplace_consumers=("consumers", "sum"), | ||
| ) | ||
| bronze_aptc = bronze_rows[["state_code", "aptc_consumers", "consumers"]].rename( | ||
| columns={ | ||
| "aptc_consumers": "bronze_aptc_consumers", | ||
| "consumers": "bronze_consumers", | ||
| } | ||
| ) | ||
|
|
||
| result = total_aptc.merge(bronze_aptc, on="state_code", how="inner") | ||
| result["state_fips"] = result["state_code"].map(STATE_ABBR_TO_FIPS) | ||
| result = result[result["state_fips"].notna()].copy() | ||
| result["state_fips"] = result["state_fips"].astype(int) | ||
| invalid_bronze = ( | ||
| result["bronze_aptc_consumers"] > result["marketplace_aptc_consumers"] | ||
| ) | ||
| if invalid_bronze.any(): | ||
| bad_states = result.loc[invalid_bronze, "state_code"].tolist() | ||
| raise ValueError( | ||
| "Bronze APTC consumers exceed total APTC consumers for states: " | ||
| f"{bad_states}. Source CSV likely corrupted." | ||
| ) | ||
| result["bronze_aptc_share"] = ( | ||
| result["bronze_aptc_consumers"] / result["marketplace_aptc_consumers"] | ||
| ) | ||
| result.insert(0, "year", 2024) | ||
| result.insert(1, "source", "cms_2024_oep_state_metal_status_puf") | ||
| return result.sort_values("state_code").reset_index(drop=True) | ||
|
|
||
|
|
||
| def load_state_marketplace_bronze_aptc_targets( | ||
| targets_df: pd.DataFrame, | ||
| year: int, | ||
| ) -> None: | ||
| db_url = f"sqlite:///{STORAGE_FOLDER / 'calibration' / 'policy_data.db'}" | ||
| engine = create_engine(db_url) | ||
|
|
||
| with Session(engine) as session: | ||
| geo_strata = get_geographic_strata(session) | ||
|
|
||
| for row in targets_df.itertuples(index=False): | ||
| state_fips = int(row.state_fips) | ||
| parent_id = geo_strata["state"].get(state_fips) | ||
| if parent_id is None: | ||
| logger.warning( | ||
| "No state geographic stratum for FIPS %s, skipping", state_fips | ||
| ) | ||
| continue | ||
|
|
||
| # We intentionally do not subset to `tax_unit_is_filer == 1`. | ||
| # These CMS targets describe marketplace coverage groups rather | ||
| # than the IRS filer universe, so the closest calibration entity is | ||
| # a tax unit with positive modeled APTC use. | ||
| aptc_stratum = Stratum( | ||
| parent_stratum_id=parent_id, | ||
| notes=f"State FIPS {state_fips} Marketplace APTC recipients", | ||
| ) | ||
| aptc_stratum.constraints_rel = [ | ||
| StratumConstraint( | ||
| constraint_variable="state_fips", | ||
| operation="==", | ||
| value=str(state_fips), | ||
| ), | ||
| StratumConstraint( | ||
| constraint_variable="used_aca_ptc", | ||
| operation=">", | ||
| value="0", | ||
| ), | ||
| ] | ||
| aptc_stratum.targets_rel.append( | ||
| Target( | ||
| # We use `tax_unit_count` rather than household/person | ||
| # counts because insurance groups map most closely to | ||
| # PolicyEngine tax units in the current calibration schema. | ||
| variable="tax_unit_count", | ||
| period=year, | ||
| value=float(row.marketplace_aptc_consumers), | ||
| active=True, | ||
| source="CMS 2024 OEP state metal status PUF", | ||
| notes="HC.gov APTC consumers across all enrollment statuses", | ||
| ) | ||
| ) | ||
| session.add(aptc_stratum) | ||
| session.flush() | ||
|
|
||
| bronze_stratum = Stratum( | ||
| parent_stratum_id=aptc_stratum.stratum_id, | ||
| notes=f"State FIPS {state_fips} Marketplace bronze APTC recipients", | ||
| ) | ||
| bronze_stratum.constraints_rel = [ | ||
| StratumConstraint( | ||
| constraint_variable="state_fips", | ||
| operation="==", | ||
| value=str(state_fips), | ||
| ), | ||
| StratumConstraint( | ||
| constraint_variable="selected_marketplace_plan_benchmark_ratio", | ||
| operation="<", | ||
| value=str(BENCHMARK_SILVER_RATIO), | ||
| ), | ||
| StratumConstraint( | ||
| constraint_variable="used_aca_ptc", | ||
| operation=">", | ||
| value="0", | ||
| ), | ||
| ] | ||
| bronze_stratum.targets_rel.append( | ||
| Target( | ||
| variable="tax_unit_count", | ||
| period=year, | ||
| value=float(row.bronze_aptc_consumers), | ||
| active=True, | ||
| source="CMS 2024 OEP state metal status PUF", | ||
| notes="HC.gov bronze plan selections among APTC consumers", | ||
| ) | ||
| ) | ||
| session.add(bronze_stratum) | ||
| session.flush() | ||
|
|
||
| session.commit() | ||
|
|
||
|
|
||
| def main() -> None: | ||
| args, year = etl_argparser( | ||
| "ETL for ACA marketplace bronze-selection calibration targets", | ||
| extra_args_fn=_extra_args, | ||
| ) | ||
|
|
||
| state_metal = extract_aca_marketplace_state_metal_data(args.state_metal_csv) | ||
| targets_df = build_state_marketplace_bronze_aptc_targets(state_metal) | ||
| if targets_df.empty: | ||
| raise RuntimeError("No HC.gov marketplace bronze/APTC targets were generated.") | ||
|
|
||
| print( | ||
| "Loading ACA marketplace bronze/APTC state targets for " | ||
| f"{len(targets_df)} states from {args.state_metal_csv}" | ||
| ) | ||
| load_state_marketplace_bronze_aptc_targets(targets_df, year) | ||
| print("ACA marketplace bronze/APTC targets loaded.") | ||
|
|
||
|
|
||
| if __name__ == "__main__": | ||
| main() | ||
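
The loader above nests a bronze stratum under each state's APTC stratum, and each stratum carries `(constraint_variable, operation, value)` triples with string values. The sketch below illustrates how such triples could select tax units. It assumes constraints are combined with logical AND and that values are compared numerically; how the calibration code actually evaluates `StratumConstraint` rows is not shown in this diff. The `matches` helper and the sample tax units are hypothetical.

```python
import operator

# Assumed mapping from the operation strings used in the diff to comparisons.
OPS = {"==": operator.eq, ">": operator.gt, "<": operator.lt}


def matches(unit, constraints):
    """True if the unit satisfies every constraint (assumed AND semantics)."""
    return all(
        OPS[op](float(unit[variable]), float(value))
        for variable, op, value in constraints
    )


# Constraint triples mirroring the two strata built for one state (FIPS 48).
aptc_constraints = [
    ("state_fips", "==", "48"),
    ("used_aca_ptc", ">", "0"),
]
bronze_constraints = aptc_constraints + [
    ("selected_marketplace_plan_benchmark_ratio", "<", "1.0"),
]

# Hypothetical tax units; all values are invented for illustration.
tax_units = [
    {"state_fips": 48, "used_aca_ptc": 1200.0,
     "selected_marketplace_plan_benchmark_ratio": 0.8},  # APTC and bronze
    {"state_fips": 48, "used_aca_ptc": 900.0,
     "selected_marketplace_plan_benchmark_ratio": 1.0},  # APTC only
    {"state_fips": 48, "used_aca_ptc": 0.0,
     "selected_marketplace_plan_benchmark_ratio": 0.7},  # no APTC
    {"state_fips": 6, "used_aca_ptc": 500.0,
     "selected_marketplace_plan_benchmark_ratio": 0.8},  # other state
]

aptc_count = sum(matches(u, aptc_constraints) for u in tax_units)
bronze_count = sum(matches(u, bronze_constraints) for u in tax_units)
print(aptc_count, bronze_count)
```

Because the bronze constraints repeat the parent's constraints and add one more, every unit in the bronze stratum also falls in the APTC stratum, which matches the `bronze <= total` check in the build step.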