diff --git a/changelog.d/docs-content-rewrite.changed.md b/changelog.d/docs-content-rewrite.changed.md new file mode 100644 index 00000000..77c1a95f --- /dev/null +++ b/changelog.d/docs-content-rewrite.changed.md @@ -0,0 +1 @@ +Rewrite docs content for the v4 API: separate pages per task (households, reforms, microsim, outputs, impact analysis, regions), updated code samples against real output classes and `Simulation` dict reforms. diff --git a/docs/_quarto.yml b/docs/_quarto.yml index 8d46eaba..0ad4c6c3 100644 --- a/docs/_quarto.yml +++ b/docs/_quarto.yml @@ -14,27 +14,38 @@ website: left: - href: index.md text: Overview - - core-concepts.md - - economic-impact-analysis.md - - country-models-us.md - - country-models-uk.md + - getting-started.md + - households.md + - reforms.md + - microsim.md + - impact-analysis.md - examples.md - dev.md sidebar: style: "floating" collapse-level: 2 contents: - - index.md - - core-concepts.md - - economic-impact-analysis.md - - advanced-outputs.md - - regions-and-scoping.md - - country-models-uk.md - - country-models-us.md - - examples.md - - visualisation.md - - release-bundles.md - - dev.md + - section: "Get started" + contents: + - index.md + - getting-started.md + - section: "Calculate" + contents: + - households.md + - reforms.md + - microsim.md + - section: "Analyse" + contents: + - impact-analysis.md + - outputs.md + - regions.md + - visualisation.md + - section: "Reference" + contents: + - countries.md + - release-bundles.md + - examples.md + - dev.md format: html: diff --git a/docs/advanced-outputs.md b/docs/advanced-outputs.md deleted file mode 100644 index 5fdbaead..00000000 --- a/docs/advanced-outputs.md +++ /dev/null @@ -1,276 +0,0 @@ -# Advanced outputs - -Beyond `Aggregate` and `ChangeAggregate` (covered in [Core concepts](core-concepts.md)), the package provides specialised output types for distributional analysis, poverty measurement, and inequality metrics. 
- -All output types follow the same pattern: create an instance, call `.run()`, read the result fields. Convenience functions are provided for common use cases. - -## OutputCollection - -Many convenience functions return an `OutputCollection[T]`, a container holding both the individual output objects and a pandas DataFrame: - -```python -from policyengine.core import OutputCollection - -# Returned by calculate_decile_impacts(), calculate_us_poverty_rates(), etc. -collection = calculate_us_poverty_rates(simulation) - -# Access individual objects -for poverty in collection.outputs: - print(f"{poverty.poverty_type}: {poverty.rate:.4f}") - -# Access as DataFrame -print(collection.dataframe) -``` - -## DecileImpact - -Calculates the impact of a policy reform on a single income decile: baseline and reform mean income, absolute and relative change, and counts of people better off, worse off, and unchanged. - -### Using the convenience function - -```python -from policyengine.outputs.decile_impact import calculate_decile_impacts - -decile_impacts = calculate_decile_impacts( - dataset=dataset, - tax_benefit_model_version=us_latest, - baseline_policy=None, # Current law - reform_policy=reform, - income_variable="household_net_income", # Default for US -) - -for d in decile_impacts.outputs: - print(f"Decile {d.decile}: " - f"baseline={d.baseline_mean:,.0f}, " - f"reform={d.reform_mean:,.0f}, " - f"change={d.absolute_change:+,.0f} " - f"({d.relative_change:+.2f}%)") -``` - -### Using directly - -```python -from policyengine.outputs.decile_impact import DecileImpact - -impact = DecileImpact( - baseline_simulation=baseline_sim, - reform_simulation=reform_sim, - income_variable="household_net_income", - decile=5, # 5th decile -) -impact.run() - -print(f"Count better off: {impact.count_better_off:,.0f}") -print(f"Count worse off: {impact.count_worse_off:,.0f}") -``` - -### Parameters - -| Parameter | Default | Description | -|---|---|---| -| `income_variable` | 
`equiv_hbai_household_net_income` | Income variable to group by and measure changes | -| `decile_variable` | `None` | Use a pre-computed grouping variable instead of `qcut` | -| `entity` | Auto-detected | Entity level for the income variable | -| `quantiles` | `10` | Number of quantile groups (10 = deciles, 5 = quintiles) | - -For US simulations, use `income_variable="household_net_income"`. The UK default (`equiv_hbai_household_net_income`) is the equivalised HBAI measure. - -## IntraDecileImpact - -Classifies people within each decile into five income change categories: - -| Category | Threshold | -|---|---| -| Lose more than 5% | change <= -5% | -| Lose less than 5% | -5% < change <= -0.1% | -| No change | -0.1% < change <= 0.1% | -| Gain less than 5% | 0.1% < change <= 5% | -| Gain more than 5% | change > 5% | - -Proportions are people-weighted (using `household_count_people * household_weight`). - -### Using the convenience function - -```python -from policyengine.outputs.intra_decile_impact import compute_intra_decile_impacts - -intra = compute_intra_decile_impacts( - baseline_simulation=baseline_sim, - reform_simulation=reform_sim, - income_variable="household_net_income", -) - -for row in intra.outputs: - if row.decile == 0: - label = "Overall" - else: - label = f"Decile {row.decile}" - print(f"{label}: " - f"lose>5%={row.lose_more_than_5pct:.2%}, " - f"lose<5%={row.lose_less_than_5pct:.2%}, " - f"no change={row.no_change:.2%}, " - f"gain<5%={row.gain_less_than_5pct:.2%}, " - f"gain>5%={row.gain_more_than_5pct:.2%}") -``` - -The function returns deciles 1-10 plus an overall average at `decile=0`. - -## Poverty - -Calculates poverty headcount and rates for a single simulation, with optional demographic filtering. 
- -### Poverty types - -**UK** (4 measures): -- Absolute before housing costs (BHC) -- Absolute after housing costs (AHC) -- Relative before housing costs (BHC) -- Relative after housing costs (AHC) - -**US** (2 measures): -- SPM poverty -- Deep SPM poverty (below 50% of SPM threshold) - -### Calculating all poverty rates - -```python -from policyengine.outputs.poverty import ( - calculate_uk_poverty_rates, - calculate_us_poverty_rates, -) - -# US -us_poverty = calculate_us_poverty_rates(simulation) -for p in us_poverty.outputs: - print(f"{p.poverty_type}: headcount={p.headcount:,.0f}, rate={p.rate:.4f}") - -# UK -uk_poverty = calculate_uk_poverty_rates(simulation) -for p in uk_poverty.outputs: - print(f"{p.poverty_type}: headcount={p.headcount:,.0f}, rate={p.rate:.4f}") -``` - -### Poverty by demographic group - -```python -from policyengine.outputs.poverty import ( - calculate_us_poverty_by_age, - calculate_us_poverty_by_gender, - calculate_us_poverty_by_race, - calculate_uk_poverty_by_age, - calculate_uk_poverty_by_gender, -) - -# By age group (child <18, adult 18-64, senior 65+) -by_age = calculate_us_poverty_by_age(simulation) -for p in by_age.outputs: - print(f"{p.filter_group} {p.poverty_type}: {p.rate:.4f}") - -# By gender -by_gender = calculate_us_poverty_by_gender(simulation) - -# By race (US only: WHITE, BLACK, HISPANIC, OTHER) -by_race = calculate_us_poverty_by_race(simulation) -``` - -### Custom filters - -```python -from policyengine.outputs.poverty import Poverty - -# Child poverty only -child_poverty = Poverty( - simulation=simulation, - poverty_variable="spm_unit_is_in_spm_poverty", - entity="person", - filter_variable="age", - filter_variable_leq=17, -) -child_poverty.run() -print(f"Child SPM poverty rate: {child_poverty.rate:.4f}") -``` - -### Result fields - -| Field | Description | -|---|---| -| `headcount` | Weighted count of people in poverty | -| `total_population` | Weighted total population (after filters) | -| `rate` | `headcount / 
total_population` | -| `filter_group` | Group label set by demographic convenience functions | - -## Inequality - -Calculates weighted inequality metrics for a single simulation: Gini coefficient and income share measures. - -### Using convenience functions - -```python -from policyengine.outputs.inequality import ( - calculate_uk_inequality, - calculate_us_inequality, -) - -# US (uses household_net_income by default) -ineq = calculate_us_inequality(simulation) -print(f"Gini: {ineq.gini:.4f}") -print(f"Top 10% share: {ineq.top_10_share:.4f}") -print(f"Top 1% share: {ineq.top_1_share:.4f}") -print(f"Bottom 50% share: {ineq.bottom_50_share:.4f}") - -# UK (uses equiv_hbai_household_net_income by default) -ineq = calculate_uk_inequality(simulation) -``` - -### With demographic filters - -```python -# Inequality among working-age adults only -ineq = calculate_us_inequality( - simulation, - filter_variable="age", - filter_variable_geq=18, - filter_variable_leq=64, -) -``` - -### Using directly - -```python -from policyengine.outputs.inequality import Inequality - -ineq = Inequality( - simulation=simulation, - income_variable="household_net_income", - entity="household", -) -ineq.run() -``` - -### Result fields - -| Field | Description | -|---|---| -| `gini` | Weighted Gini coefficient (0 = perfect equality, 1 = perfect inequality) | -| `top_10_share` | Share of total income held by top 10% | -| `top_1_share` | Share of total income held by top 1% | -| `bottom_50_share` | Share of total income held by bottom 50% | - -## Comparing baseline and reform - -Poverty and inequality are single-simulation outputs. 
To compare baseline and reform, compute both and take the difference: - -```python -baseline_poverty = calculate_us_poverty_rates(baseline_sim) -reform_poverty = calculate_us_poverty_rates(reform_sim) - -for bp, rp in zip(baseline_poverty.outputs, reform_poverty.outputs): - change = rp.rate - bp.rate - print(f"{bp.poverty_type}: {bp.rate:.4f} -> {rp.rate:.4f} ({change:+.4f})") - -baseline_ineq = calculate_us_inequality(baseline_sim) -reform_ineq = calculate_us_inequality(reform_sim) -print(f"Gini change: {reform_ineq.gini - baseline_ineq.gini:+.4f}") -``` - -The `economic_impact_analysis()` function does this automatically and returns both baseline and reform poverty/inequality in the `PolicyReformAnalysis` result. See [Economic impact analysis](economic-impact-analysis.md). diff --git a/docs/core-concepts.md b/docs/core-concepts.md deleted file mode 100644 index 7d61a404..00000000 --- a/docs/core-concepts.md +++ /dev/null @@ -1,731 +0,0 @@ -# Core concepts - -PolicyEngine.py is a Python package for tax-benefit microsimulation analysis. It provides a unified interface for running policy simulations, analysing distributional impacts, and visualising results across different countries. - -## Quick start - -Most analyses start from the country entry points on the top-level -package — ``policyengine.uk`` and ``policyengine.us``. They expose flat -keyword-argument functions that return structured results with -dot-access for scalar lookups. 
- -```python -import policyengine as pe - -# UK: single adult earning £50,000 -uk = pe.uk.calculate_household( - people=[{"age": 35, "employment_income": 50_000}], - year=2026, -) -print(uk.household.hbai_household_net_income) # net income -print(uk.person[0].income_tax) # per-person dot access - -# US: married couple with two kids in Texas -us = pe.us.calculate_household( - people=[ - {"age": 35, "employment_income": 40_000}, - {"age": 33}, - {"age": 8}, - {"age": 5}, - ], - tax_unit={"filing_status": "JOINT"}, - household={"state_code": "TX"}, - year=2026, -) -print(us.tax_unit.income_tax, us.tax_unit.eitc, us.tax_unit.ctc) - -# Apply a reform: just pass a parameter-path dict -reformed = pe.us.calculate_household( - people=[{"age": 35, "employment_income": 60_000}], - tax_unit={"filing_status": "SINGLE"}, - year=2026, - reform={"gov.irs.credits.ctc.amount.adult_dependent": 1000}, -) -``` - -Reforms can be scalar values (treated as ``{year}-01-01`` onwards) or a -mapping of effective-date strings to values for time-varying reforms. -Unknown variable names raise with suggestions instead of silently -returning zero. - -For population-level analysis (budget impact, distributional effects), -see [Economic impact analysis](economic-impact-analysis.md). - -## Architecture overview - -The package is organised around several core concepts: - -- **Tax-benefit models**: Country-specific implementations (UK, US) that define tax and benefit rules -- **Datasets**: Microdata representing populations at entity level (person, household, etc.) -- **Simulations**: Execution environments that apply tax-benefit models to datasets -- **Outputs**: Analysis tools for extracting insights from simulation results -- **Policies**: Parametric reforms that modify tax-benefit system parameters - -## Tax-benefit models - -Tax-benefit models define the rules and calculations for a country's tax and benefit system. 
Each model version contains: - -- **Variables**: Calculated values (e.g., income tax, universal credit) -- **Parameters**: System settings (e.g., personal allowance, benefit rates) -- **Parameter values**: Time-bound values for parameters - -### Using a tax-benefit model - -The country entry points expose pinned model versions as ``pe.uk.model`` -and ``pe.us.model``: - -```python -import policyengine as pe - -uk_latest = pe.uk.model -us_latest = pe.us.model - -# UK model includes variables like: -# - income_tax, national_insurance, universal_credit -# - Parameters like personal allowance, NI thresholds - -# US model includes variables like: -# - income_tax, payroll_tax, eitc, ctc, snap -# - Parameters like standard deduction, EITC rates -``` - -## Datasets - -Datasets contain microdata representing a population. Each dataset has: - -- **Entity-level data**: Separate dataframes for person, household, and other entities -- **Weights**: Survey weights for population representation -- **Join keys**: Relationships between entities (e.g., which household each person belongs to) - -### Dataset structure - -```python -from policyengine.tax_benefit_models.uk import PolicyEngineUKDataset # or: pe.uk.PolicyEngineUKDataset - -dataset = PolicyEngineUKDataset( - name="FRS 2023-24", - description="Family Resources Survey microdata", - filepath="./data/frs_2023_24_year_2026.h5", - year=2026, -) - -# Access entity-level data -person_data = dataset.data.person # MicroDataFrame -household_data = dataset.data.household -benunit_data = dataset.data.benunit # Benefit unit (UK only) -``` - -### Creating custom datasets - -You can create custom datasets for scenario analysis: - -```python -import pandas as pd -from microdf import MicroDataFrame -from policyengine.tax_benefit_models.uk import PolicyEngineUKDataset, UKYearData - -# Create person data -person_df = MicroDataFrame( - pd.DataFrame({ - "person_id": [0, 1, 2], - "person_household_id": [0, 0, 1], - "person_benunit_id": [0, 0, 1], 
- "age": [35, 8, 40], - "employment_income": [30000, 0, 50000], - "person_weight": [1.0, 1.0, 1.0], - }), - weights="person_weight" -) - -# Create household data -household_df = MicroDataFrame( - pd.DataFrame({ - "household_id": [0, 1], - "region": ["LONDON", "SOUTH_EAST"], - "rent": [15000, 12000], - "household_weight": [1.0, 1.0], - }), - weights="household_weight" -) - -# Create benunit data -benunit_df = MicroDataFrame( - pd.DataFrame({ - "benunit_id": [0, 1], - "would_claim_uc": [True, True], - "benunit_weight": [1.0, 1.0], - }), - weights="benunit_weight" -) - -dataset = PolicyEngineUKDataset( - name="Custom scenario", - description="Single parent vs single adult", - filepath="./custom.h5", - year=2026, - data=UKYearData( - person=person_df, - household=household_df, - benunit=benunit_df, - ) -) -``` - -## Data loading - -Before running simulations, you need representative microdata. The package provides three functions for managing datasets: - -- **`ensure_datasets()`**: Load from disk if available, otherwise download and compute (recommended) -- **`create_datasets()`**: Always download from HuggingFace and compute from scratch -- **`load_datasets()`**: Load previously saved HDF5 files from disk - -```python -from policyengine.tax_benefit_models.us import ensure_datasets # or: pe.us.ensure_datasets - -# First run: downloads from HuggingFace, computes variables, saves to ./data/ -# Subsequent runs: loads from disk instantly -datasets = ensure_datasets( - datasets=["hf://policyengine/policyengine-us-data/enhanced_cps_2024.h5"], - years=[2026], - data_folder="./data", -) -dataset = datasets["enhanced_cps_2024_2026"] -``` - -```python -from policyengine.tax_benefit_models.uk import ensure_datasets # or: pe.uk.ensure_datasets - -datasets = ensure_datasets( - datasets=["hf://policyengine/policyengine-uk-data/enhanced_frs_2023_24.h5"], - years=[2026], - data_folder="./data", -) -dataset = datasets["enhanced_frs_2023_24_2026"] -``` - -All datasets are stored as HDF5 
files on disk. No database server is required. - -## Simulations - -Simulations apply tax-benefit models to datasets, calculating all variables for the specified year. - -### Running a simulation - -```python -import policyengine as pe -from policyengine.core import Simulation - -simulation = Simulation( - dataset=dataset, - tax_benefit_model_version=pe.uk.model, -) -simulation.run() - -# Access output data -output_person = simulation.output_dataset.data.person -output_household = simulation.output_dataset.data.household - -# Check calculated variables -print(output_household[["household_id", "household_net_income", "household_tax"]]) -``` - -### Simulation lifecycle: `run()` vs `ensure()` - -The `Simulation` class provides two methods for computing results: - -| Method | Behaviour | -|---|---| -| `simulation.run()` | Always recomputes from scratch. No caching. | -| `simulation.ensure()` | Checks in-memory LRU cache, then tries loading from disk, then falls back to `run()` + `save()`. | - -```python -# One-off computation (no caching) -simulation.run() - -# Cache-or-compute (preferred for production use) -simulation.ensure() -``` - -`ensure()` uses a module-level LRU cache (max 100 simulations) and saves output datasets as HDF5 files alongside the input dataset. On repeated calls, it returns cached results instantly. For baseline-vs-reform comparisons, `economic_impact_analysis()` calls `ensure()` internally, so you rarely need to call it yourself. 
- -### Accessing calculated variables - -After running a simulation, you can access the calculated variables from the output dataset: - -```python -simulation = Simulation( - dataset=dataset, - tax_benefit_model_version=pe.uk.model, -) -simulation.run() - -# Access specific variables -output = simulation.output_dataset.data -person_data = output.person[["person_id", "age", "employment_income", "income_tax"]] -household_data = output.household[["household_id", "household_net_income"]] -benunit_data = output.benunit[["benunit_id", "universal_credit", "child_benefit"]] -``` - -## Policies - -Policies modify tax-benefit system parameters through parametric reforms. - -### Reform as a dict - -The canonical form — same shape ``pe.{uk,us}.calculate_household(reform=...)`` -accepts — is a flat ``{parameter.path: value}`` / ``{parameter.path: {date: value}}`` -dict. ``Simulation`` compiles it to a ``Policy`` at construction: - -```python -import policyengine as pe -from policyengine.core import Simulation - -baseline = Simulation(dataset=dataset, tax_benefit_model_version=pe.uk.model) -reform = Simulation( - dataset=dataset, - tax_benefit_model_version=pe.uk.model, - # Personal allowance raised from ~£12,570 to £15,000. - policy={"gov.hmrc.income_tax.allowances.personal_allowance.amount": 15_000}, -) -baseline.run() -reform.run() -``` - -Scalar values default their effective date to ``{dataset.year}-01-01``. -For time-varying reforms pass a nested ``{date: value}`` mapping: - -```python -policy = { - "gov.hmrc.income_tax.allowances.personal_allowance.amount": { - "2026-01-01": 13_000, - "2027-01-01": 15_000, - } -} -``` - -Unknown paths raise ``ValueError`` with a close-match suggestion. 
- -### Reform as a Policy object (escape hatch) - -For reforms that can't be expressed as parameter-value changes (e.g., -custom ``simulation_modifier`` callables), build a ``Policy`` directly: - -```python -from policyengine.core import Parameter, ParameterValue, Policy -import datetime - -policy = Policy( - name="Increased personal allowance", - parameter_values=[ - ParameterValue( - parameter=Parameter( - name="gov.hmrc.income_tax.allowances.personal_allowance.amount", - tax_benefit_model_version=pe.uk.model, - data_type=float, - ), - start_date=datetime.date(2026, 1, 1), - end_date=datetime.date(2026, 12, 31), - value=15_000, - ), - ], -) - -Simulation(dataset=dataset, tax_benefit_model_version=pe.uk.model, policy=policy) -``` - -### Combining policies - -Policies can be combined using the `+` operator: - -```python -combined = policy_a + policy_b -# Concatenates parameter_values and chains simulation_modifiers -``` - -### Simulation modifiers - -For reforms that cannot be expressed as parameter value changes, `Policy` accepts a `simulation_modifier` callable that directly manipulates the underlying `policyengine_core` simulation: - -```python -def my_modifier(sim): - """Custom reform logic applied to the core simulation object.""" - p = sim.tax_benefit_system.parameters - # Modify parameters programmatically - return sim - -policy = Policy( - name="Custom reform", - simulation_modifier=my_modifier, -) -``` - -Note: the UK model supports `simulation_modifier`. The US model currently only uses the `parameter_values` path. - -## Dynamic behavioural responses - -The `Dynamic` class is structurally identical to `Policy` and represents behavioural responses to policy changes (e.g., labour supply elasticities). It is applied after the policy in the simulation pipeline. 
- -```python -from policyengine.core.dynamic import Dynamic - -dynamic = Dynamic( - name="Labour supply response", - parameter_values=[...], # Same format as Policy -) - -simulation = Simulation( - dataset=dataset, - tax_benefit_model_version=pe.uk.model, - policy=policy, - dynamic=dynamic, -) -``` - -Dynamic responses can also be combined using the `+` operator and support `simulation_modifier` callables. - -## Outputs - -Output classes provide structured analysis of simulation results. - -### Aggregate - -Calculate aggregate statistics (sum, mean, count) for any variable: - -```python -from policyengine.outputs.aggregate import Aggregate, AggregateType - -# Total universal credit spending -agg = Aggregate( - simulation=simulation, - variable="universal_credit", - aggregate_type=AggregateType.SUM, - entity="benunit", # Map to benunit level -) -agg.run() -print(f"Total UC spending: £{agg.result / 1e9:.1f}bn") - -# Mean household income in top decile -agg = Aggregate( - simulation=simulation, - variable="household_net_income", - aggregate_type=AggregateType.MEAN, - filter_variable="household_net_income", - quantile=10, - quantile_eq=10, # 10th decile -) -agg.run() -print(f"Mean income in top decile: £{agg.result:,.0f}") -``` - -### ChangeAggregate - -Analyse impacts of policy reforms: - -```python -from policyengine.outputs.change_aggregate import ChangeAggregate, ChangeAggregateType - -# Count winners and losers -winners = ChangeAggregate( - baseline_simulation=baseline, - reform_simulation=reform, - variable="household_net_income", - aggregate_type=ChangeAggregateType.COUNT, - change_geq=1, # Gain at least £1 -) -winners.run() -print(f"Winners: {winners.result / 1e6:.1f}m households") - -losers = ChangeAggregate( - baseline_simulation=baseline, - reform_simulation=reform, - variable="household_net_income", - aggregate_type=ChangeAggregateType.COUNT, - change_leq=-1, # Lose at least £1 -) -losers.run() -print(f"Losers: {losers.result / 1e6:.1f}m households") - -# 
Revenue impact -revenue = ChangeAggregate( - baseline_simulation=baseline, - reform_simulation=reform, - variable="household_tax", - aggregate_type=ChangeAggregateType.SUM, -) -revenue.run() -print(f"Revenue change: £{revenue.result / 1e9:.1f}bn") -``` - -## Entity mapping - -The package automatically handles entity mapping when variables are defined at different entity levels. - -### Entity hierarchy - -**UK:** -``` -household - └── benunit (benefit unit) - └── person -``` - -**US:** -``` -household - ├── tax_unit - ├── spm_unit - ├── family - └── marital_unit - └── person -``` - -### Automatic mapping - -When you request a person-level variable (like `ssi`) at household level, the package: -1. Sums person-level values within each household (aggregation) -2. Returns household-level data with proper weights - -```python -# SSI is defined at person level, but we want household-level totals -agg = Aggregate( - simulation=simulation, - variable="ssi", # Person-level variable - entity="household", # Target household level - aggregate_type=AggregateType.SUM, -) -# Internally maps person → household by summing SSI for all persons in each household -``` - -When you request a household-level variable at person level: -1. 
Replicates household values to all persons in that household (expansion) - -### Direct entity mapping - -You can also map data between entities directly using the `map_to_entity` method: - -```python -# Map person income to household level (sum) -household_income = dataset.data.map_to_entity( - source_entity="person", - target_entity="household", - columns=["employment_income"], - how="sum" -) - -# Map household rent to person level (project/broadcast) -person_rent = dataset.data.map_to_entity( - source_entity="household", - target_entity="person", - columns=["rent"], - how="project" -) -``` - -#### Mapping with custom values - -You can map custom value arrays instead of existing columns: - -```python -# Map custom per-person values to household level -import numpy as np - -# Create custom values (e.g., imputed data) -custom_values = np.array([100, 200, 150, 300]) - -household_totals = dataset.data.map_to_entity( - source_entity="person", - target_entity="household", - values=custom_values, - how="sum" -) -``` - -#### Aggregation methods - -The `how` parameter controls how values are mapped: - -**Person → Group (aggregation):** -- `how='sum'` (default): Sum values within each group -- `how='first'`: Take first person's value in each group - -```python -# Sum person incomes to household level -household_income = data.map_to_entity( - source_entity="person", - target_entity="household", - columns=["employment_income"], - how="sum" -) - -# Take first person's age as household reference -household_age = data.map_to_entity( - source_entity="person", - target_entity="household", - columns=["age"], - how="first" -) -``` - -**Group → Person (expansion):** -- `how='project'` (default): Broadcast group value to all members -- `how='divide'`: Split group value equally among members - -```python -# Broadcast household rent to each person -person_rent = data.map_to_entity( - source_entity="household", - target_entity="person", - columns=["rent"], - how="project" -) - -# Split 
household savings equally per person -person_savings = data.map_to_entity( - source_entity="household", - target_entity="person", - columns=["total_savings"], - how="divide" -) -``` - -**Group → Group (via person entity):** -- `how='sum'` (default): Sum through person entity -- `how='first'`: Take first source group's value -- `how='project'`: Broadcast first source group's value -- `how='divide'`: Split proportionally based on person counts - -```python -# UK: Sum benunit benefits to household level -household_benefits = data.map_to_entity( - source_entity="benunit", - target_entity="household", - columns=["universal_credit"], - how="sum" -) - -# US: Map tax unit income to household, splitting by members -household_from_tax = data.map_to_entity( - source_entity="tax_unit", - target_entity="household", - columns=["taxable_income"], - how="divide" -) -``` - -## Visualisation - -The package includes utilities for creating PolicyEngine-branded visualisations: - -```python -from policyengine.utils.plotting import format_fig, COLORS -import plotly.graph_objects as go - -fig = go.Figure() -fig.add_trace(go.Scatter(x=[1, 2, 3], y=[4, 5, 6])) - -format_fig( - fig, - title="My chart", - xaxis_title="X axis", - yaxis_title="Y axis", - height=600, - width=800, -) -fig.show() -``` - -### Brand colours - -```python -COLORS = { - "primary": "#319795", # Teal - "success": "#22C55E", # Green - "warning": "#FEC601", # Yellow - "error": "#EF4444", # Red - "info": "#1890FF", # Blue - "blue_secondary": "#026AA2", # Dark blue - "gray": "#667085", # Gray -} -``` - -## Common workflows - -### 1. Analyse employment income variation - -See [UK employment income variation](examples.md#uk-employment-income-variation) for a complete example of: -- Creating custom datasets with varied parameters -- Running single simulations -- Extracting results with filters -- Visualising benefit phase-outs - -### 2. 
Policy reform analysis - -See [UK policy reform analysis](examples.md#uk-policy-reform-analysis) for: -- Applying parametric reforms -- Comparing baseline and reform -- Analysing winners/losers by decile -- Calculating revenue impacts - -### 3. Distributional analysis - -See [US income distribution](examples.md#us-income-distribution) for: -- Loading representative microdata -- Calculating statistics by income decile -- Mapping variables across entity levels -- Creating interactive visualisations - -## Best practices - -### Creating custom datasets - -1. **Always set would_claim variables**: Benefits won't be claimed unless explicitly enabled - ```python - "would_claim_uc": [True] * n_households - ``` - -2. **Set disability variables explicitly**: Prevents random UC spikes from LCWRA element - ```python - "is_disabled_for_benefits": [False] * n_people - "uc_limited_capability_for_WRA": [False] * n_people - ``` - -3. **Include required join keys**: Person data needs entity membership - ```python - "person_household_id": household_ids - "person_benunit_id": benunit_ids # UK only - ``` - -4. **Set required household fields**: Vary by country - ```python - # UK - "region": ["LONDON"] * n_households - "tenure_type": ["RENT_PRIVATELY"] * n_households - - # US - "state_code": ["CA"] * n_households - ``` - -### Performance optimisation - -1. **Single simulation for variations**: Create all scenarios in one dataset, run once -2. **Custom variable selection**: Only calculate needed variables -3. **Filter efficiently**: Use quantile filters for decile analysis -4. **Parallel analysis**: Multiple Aggregate calls can run independently - -### Data integrity - -1. **Check weights**: Ensure weights sum to expected population -2. **Validate join keys**: All persons should link to valid households -3. **Review output ranges**: Check calculated values are reasonable -4. 
**Test edge cases**: Zero income, high income, disabled, elderly
-
-## Next steps
-
-- [Economic impact analysis](economic-impact-analysis.md): Full baseline-vs-reform comparison workflow
-- [Advanced outputs](advanced-outputs.md): DecileImpact, Poverty, Inequality, IntraDecileImpact
-- [Regions and scoping](regions-and-scoping.md): Sub-national analysis (states, constituencies, districts)
-- Country-specific documentation:
-  - [UK tax-benefit model](country-models-uk.md)
-  - [US tax-benefit model](country-models-us.md)
-- [Visualisation](visualisation.md): Publication-ready charts
-- [Examples](examples.md): Complete working scripts
diff --git a/docs/countries.md b/docs/countries.md
new file mode 100644
index 00000000..005f696b
--- /dev/null
+++ b/docs/countries.md
@@ -0,0 +1,83 @@
+---
+title: "Country models"
+---
+
+The `policyengine` package is country-agnostic; country-specific rules live in separate packages (`policyengine-us`, `policyengine-uk`). This page captures the differences that matter to users.
+
+## Entities
+
+| US | UK |
+|---|---|
+| `person` | `person` |
+| `family` | — |
+| `marital_unit` | — |
+| `tax_unit` | `benunit` |
+| `spm_unit` | — |
+| `household` | `household` |
+
+The UK `benunit` is the closest analog to the US `tax_unit` for means-testing — a single adult or married couple plus dependent children.
+
+## Default income variable
+
+Net-income calculations use country-specific defaults:
+
+| | Variable |
+|---|---|
+| US | `spm_unit_net_income` |
+| UK | `hbai_household_net_income` |
+
+Override in any output with `income_variable=`.
+
+## Default dataset
+
+| | Dataset |
+|---|---|
+| US | Enhanced CPS 2024 (`enhanced_cps_2024.h5`) |
+| UK | Enhanced FRS 2023/24 (`enhanced_frs_2023_24.h5`) |
+
+## State / regional breakdown
+
+US: `state_code` and `congressional_district` on every household.
+
+UK: constituency code and local authority code on every household where available.
+ +## Poverty + +US: SPM (Supplemental Poverty Measure) and deep SPM (below half the threshold). Tracked measures are listed in `US_POVERTY_VARIABLES`. + +UK: AHC (After Housing Costs) and BHC (Before Housing Costs), both relative (60 % of median) and absolute. + +## Reform targeting + +Parameter paths mirror the country's rule-making structure: + +- US: `gov.irs.*`, `gov.states..*`, `gov.usda.*`, `gov.hhs.*` +- UK: `gov.hmrc.*`, `gov.dwp.*`, `gov.obr.*` + +See [Reforms](reforms.md) for how to express changes in either tree. + +## Switching countries + +Most analysis patterns are identical — swap `pe.us` for `pe.uk`: + +```python +# US +pe.us.calculate_household( + people=[{"age": 35, "employment_income": 60_000}], + tax_unit={"filing_status": "SINGLE"}, + household={"state_code": "CA"}, + year=2026, +) + +# UK +pe.uk.calculate_household( + people=[{"age": 35, "employment_income": 50_000}], + year=2026, +) +``` + +Microsim is similarly parallel: `pe.us.ensure_datasets` / `pe.uk.ensure_datasets`, `Simulation(tax_benefit_model_version=pe.us.model)` / `pe.uk.model`. + +## Pinned versions + +Each `policyengine` release pins specific `policyengine-us` and `policyengine-uk` versions. Check them via `pe.us.model.manifest` and `pe.uk.model.manifest`. If the installed country package version diverges, the model warns — see [Release bundles](release-bundles.md). diff --git a/docs/country-models-uk.md b/docs/country-models-uk.md deleted file mode 100644 index 2d09e43e..00000000 --- a/docs/country-models-uk.md +++ /dev/null @@ -1,362 +0,0 @@ -# UK tax-benefit model - -The UK tax-benefit model implements the United Kingdom's tax and benefit system using PolicyEngine UK as the underlying calculation engine. 
- -## Quick start - -```python -import policyengine as pe - -# Single adult earning £50k -result = pe.uk.calculate_household( - people=[{"age": 35, "employment_income": 50_000}], - year=2026, -) -print(result.person[0].income_tax, result.household.hbai_household_net_income) - -# Family renting, with benefit claims explicitly on -result = pe.uk.calculate_household( - people=[ - {"age": 35, "employment_income": 30_000}, - {"age": 33}, - {"age": 8}, - {"age": 5}, - ], - benunit={"would_claim_uc": True, "would_claim_child_benefit": True}, - household={"rent": 12_000, "region": "NORTH_WEST"}, - year=2026, -) - -# Apply a reform -result = pe.uk.calculate_household( - people=[{"age": 35, "employment_income": 50_000}], - year=2026, - reform={ - "gov.hmrc.income_tax.allowances.personal_allowance.amount": 15_000, - }, -) -``` - -For population-level analysis and reform analysis, see -[Economic impact analysis](economic-impact-analysis.md). - -## Entity structure - -The UK model uses three entity levels: - -``` -household - └── benunit (benefit unit) - └── person -``` - -### Person - -Individual people with demographic and income characteristics. - -**Key variables:** -- `age`: Person's age in years -- `employment_income`: Annual employment income -- `self_employment_income`: Annual self-employment income -- `pension_income`: Annual pension income -- `savings_interest_income`: Annual interest from savings -- `dividend_income`: Annual dividend income -- `income_tax`: Total income tax paid -- `national_insurance`: Total NI contributions -- `is_disabled_for_benefits`: Whether disabled for benefit purposes - -### Benunit (benefit unit) - -The unit for benefit assessment. Usually a single person or a couple with dependent children. 
- -**Key variables:** -- `universal_credit`: Annual UC payment -- `child_benefit`: Annual child benefit -- `working_tax_credit`: Annual WTC (legacy system) -- `child_tax_credit`: Annual CTC (legacy system) -- `pension_credit`: Annual pension credit -- `income_support`: Annual income support -- `housing_benefit`: Annual housing benefit -- `council_tax_support`: Annual council tax support - -**Important flags:** -- `would_claim_uc`: Must be True to claim UC -- `would_claim_WTC`: Must be True to claim WTC -- `would_claim_CTC`: Must be True to claim CTC -- `would_claim_IS`: Must be True to claim IS -- `would_claim_pc`: Must be True to claim pension credit -- `would_claim_child_benefit`: Must be True to claim child benefit -- `would_claim_housing_benefit`: Must be True to claim HB - -### Household - -The residence unit, typically sharing accommodation. - -**Key variables:** -- `household_net_income`: Total household net income -- `hbai_household_net_income`: HBAI-equivalised net income -- `household_benefits`: Total benefits received -- `household_tax`: Total tax paid -- `household_market_income`: Total market income - -**Required fields:** -- `region`: UK region (e.g., "LONDON", "SOUTH_EAST") -- `tenure_type`: Housing tenure (e.g., "RENT_PRIVATELY", "OWNED_OUTRIGHT") -- `rent`: Annual rent paid -- `council_tax`: Annual council tax - -## Using the UK model - -### Loading representative data - -```python -from policyengine.tax_benefit_models.uk import PolicyEngineUKDataset - -dataset = PolicyEngineUKDataset( - name="FRS 2023-24", - description="Family Resources Survey microdata", - filepath="./data/frs_2023_24_year_2026.h5", - year=2026, -) - -print(f"People: {len(dataset.data.person):,}") -print(f"Benefit units: {len(dataset.data.benunit):,}") -print(f"Households: {len(dataset.data.household):,}") -``` - -### Creating custom scenarios - -```python -import pandas as pd -from microdf import MicroDataFrame -from policyengine.tax_benefit_models.uk import UKYearData - -# 
Single parent with 2 children -person_df = MicroDataFrame( - pd.DataFrame({ - "person_id": [0, 1, 2], - "person_benunit_id": [0, 0, 0], - "person_household_id": [0, 0, 0], - "age": [35, 8, 5], - "employment_income": [25000, 0, 0], - "person_weight": [1.0, 1.0, 1.0], - "is_disabled_for_benefits": [False, False, False], - "uc_limited_capability_for_WRA": [False, False, False], - }), - weights="person_weight" -) - -benunit_df = MicroDataFrame( - pd.DataFrame({ - "benunit_id": [0], - "benunit_weight": [1.0], - "would_claim_uc": [True], - "would_claim_child_benefit": [True], - "would_claim_WTC": [True], - "would_claim_CTC": [True], - }), - weights="benunit_weight" -) - -household_df = MicroDataFrame( - pd.DataFrame({ - "household_id": [0], - "household_weight": [1.0], - "region": ["LONDON"], - "rent": [15000], # £1,250/month - "council_tax": [2000], - "tenure_type": ["RENT_PRIVATELY"], - }), - weights="household_weight" -) - -dataset = PolicyEngineUKDataset( - name="Single parent scenario", - description="One adult, two children", - filepath="./single_parent.h5", - year=2026, - data=UKYearData( - person=person_df, - benunit=benunit_df, - household=household_df, - ) -) -``` - -### Running a simulation - -```python -from policyengine.core import Simulation -import policyengine as pe - -simulation = Simulation( - dataset=dataset, - tax_benefit_model_version=pe.uk.model, -) -simulation.run() - -# Check results -output = simulation.output_dataset.data -print(output.household[["household_net_income", "household_benefits", "household_tax"]]) -``` - -## Key parameters - -### Income tax - -- `gov.hmrc.income_tax.allowances.personal_allowance.amount`: Personal allowance (£12,570 in 2024-25) -- `gov.hmrc.income_tax.rates.uk[0].rate`: Basic rate (20%) -- `gov.hmrc.income_tax.rates.uk[1].rate`: Higher rate (40%) -- `gov.hmrc.income_tax.rates.uk[2].rate`: Additional rate (45%) -- `gov.hmrc.income_tax.rates.uk[0].threshold`: Basic rate threshold (£50,270) -- 
`gov.hmrc.income_tax.rates.uk[1].threshold`: Higher rate threshold (£125,140) - -### National insurance - -- `gov.hmrc.national_insurance.class_1.main.primary_threshold`: Primary threshold (£12,570) -- `gov.hmrc.national_insurance.class_1.main.upper_earnings_limit`: Upper earnings limit (£50,270) -- `gov.hmrc.national_insurance.class_1.main.rate`: Main rate (12% below UEL, 2% above) - -### Universal credit - -- `gov.dwp.universal_credit.elements.standard_allowance.single_adult`: Standard allowance for single adult (£334.91/month in 2024-25) -- `gov.dwp.universal_credit.elements.child.first_child`: First child element (£333.33/month) -- `gov.dwp.universal_credit.elements.child.subsequent_child`: Subsequent children (£287.92/month each) -- `gov.dwp.universal_credit.means_test.reduction_rate`: Taper rate (55%) -- `gov.dwp.universal_credit.means_test.earned_income.disregard`: Work allowance - -### Child benefit - -- `gov.hmrc.child_benefit.rates.eldest_child`: First child rate (£25.60/week) -- `gov.hmrc.child_benefit.rates.additional_child`: Additional children (£16.95/week each) -- `gov.hmrc.child_benefit.income_tax_charge.threshold`: HICBC threshold (£60,000) - -## Common policy reforms - -All reform examples use the same flat ``{parameter.path: value}`` dict -the household calculator accepts. ``Simulation`` compiles it into a -``Policy`` at construction; scalar values default to -``{dataset.year}-01-01``. - -### Increasing personal allowance - -```python -policy = {"gov.hmrc.income_tax.allowances.personal_allowance.amount": 15_000} -``` - -### Adjusting UC taper rate - -```python -policy = {"gov.dwp.universal_credit.means_test.reduction_rate": 0.50} -``` - -### Abolishing the two-child limit - -```python -# Set the subsequent-child element equal to the first-child rate. -policy = {"gov.dwp.universal_credit.elements.child.subsequent_child": 333.33} -``` - -Plug any of the above into ``Simulation(policy=policy, ...)``. 
- -## Regional variations - -The UK model accounts for regional differences: - -- **Council tax**: Varies by local authority -- **Rent levels**: Regional housing markets -- **Scottish income tax**: Different rates and thresholds for Scottish taxpayers - -### Regions - -Valid region values: -- `LONDON` -- `SOUTH_EAST` -- `SOUTH_WEST` -- `EAST_OF_ENGLAND` -- `WEST_MIDLANDS` -- `EAST_MIDLANDS` -- `YORKSHIRE` -- `NORTH_WEST` -- `NORTH_EAST` -- `WALES` -- `SCOTLAND` -- `NORTHERN_IRELAND` - -## Entity mapping - -The UK model has a simpler entity structure than the US, with three levels: person → benunit → household. - -### Direct entity mapping - -You can map data between entities using the `map_to_entity` method: - -```python -# Map person income to benunit level -benunit_income = dataset.data.map_to_entity( - source_entity="person", - target_entity="benunit", - columns=["employment_income"], - how="sum" -) - -# Split household rent equally among persons -person_rent_share = dataset.data.map_to_entity( - source_entity="household", - target_entity="person", - columns=["rent"], - how="divide" -) - -# Map benunit UC to household level -household_uc = dataset.data.map_to_entity( - source_entity="benunit", - target_entity="household", - columns=["universal_credit"], - how="sum" -) -``` - -See the [Entity mapping section](core-concepts.md#entity-mapping) in Core Concepts for full documentation on aggregation methods. - -## Data sources - -The UK model can use several data sources: - -1. **Family Resources Survey (FRS)**: Official UK household survey - - ~19,000 households - - Detailed income and benefit receipt - - Published annually - -2. **Enhanced FRS**: Uprated and enhanced version - - Calibrated to population totals - - Additional imputed variables - - Multiple projection years - -3. 
**Custom datasets**: User-created scenarios - - Full control over household composition - - Exact income levels - - Specific benefit claiming patterns - -## Validation - -When creating custom datasets, validate: - -1. **Would claim flags**: All set to True -2. **Disability flags**: Set explicitly (not random) -3. **Join keys**: Person data links to benunits and households -4. **Required fields**: Region, tenure_type set correctly -5. **Weights**: Sum to expected values -6. **Income ranges**: Realistic values - -## Examples - -- [UK employment income variation](examples.md#uk-employment-income-variation): Vary employment income, analyse benefit phase-outs -- [UK policy reform analysis](examples.md#uk-policy-reform-analysis): Apply reforms, analyse winners/losers -- [UK income bands](examples.md#uk-income-bands): Calculate net income and tax by income decile - -## References - -- PolicyEngine UK documentation: https://policyengine.github.io/policyengine-uk/ -- UK tax-benefit system: https://www.gov.uk/browse/benefits -- HBAI methodology: https://www.gov.uk/government/statistics/households-below-average-income-for-financial-years-ending-1995-to-2023 diff --git a/docs/country-models-us.md b/docs/country-models-us.md deleted file mode 100644 index 52b44d85..00000000 --- a/docs/country-models-us.md +++ /dev/null @@ -1,439 +0,0 @@ -# US tax-benefit model - -The US tax-benefit model implements the United States federal tax and benefit system using PolicyEngine US as the underlying calculation engine. 
- -## Quick start - -```python -import policyengine as pe - -# Single adult earning $60k (SINGLE filer, default state) -result = pe.us.calculate_household( - people=[{"age": 35, "employment_income": 60_000}], - tax_unit={"filing_status": "SINGLE"}, - year=2026, -) -print(result.tax_unit.income_tax, result.household.household_net_income) - -# With a reform -result = pe.us.calculate_household( - people=[{"age": 35, "employment_income": 60_000}], - tax_unit={"filing_status": "SINGLE"}, - year=2026, - reform={"gov.irs.credits.ctc.amount.adult_dependent": 1000}, -) - -# Request extra variables not in the default result -result = pe.us.calculate_household( - people=[{"age": 35, "employment_income": 60_000}], - tax_unit={"filing_status": "SINGLE"}, - year=2026, - extra_variables=["adjusted_gross_income", "taxable_income"], -) -``` - -For population-level analysis and reform analysis, see -[Economic impact analysis](economic-impact-analysis.md). - -## Entity structure - -The US model uses a more complex entity hierarchy: - -``` -household - ├── tax_unit (federal tax filing unit) - ├── spm_unit (Supplemental Poverty Measure unit) - ├── family (Census definition) - └── marital_unit (married couple or single person) - └── person -``` - -### Person - -Individual people with demographic and income characteristics. - -**Key variables:** -- `age`: Person's age in years -- `employment_income`: Annual employment income -- `self_employment_income`: Annual self-employment income -- `social_security`: Annual Social Security benefits -- `ssi`: Annual Supplemental Security Income -- `medicaid`: Annual Medicaid value -- `medicare`: Annual Medicare value -- `unemployment_compensation`: Annual unemployment benefits - -### Tax unit - -The federal tax filing unit (individual or married filing jointly). 
- -**Key variables:** -- `income_tax`: Federal income tax liability -- `employee_payroll_tax`: Employee payroll tax (FICA) -- `eitc`: Earned Income Tax Credit -- `ctc`: Child Tax Credit -- `income_tax_before_credits`: Tax before credits - -### SPM unit - -The Supplemental Poverty Measure unit used for SNAP and other means-tested benefits. - -**Key variables:** -- `snap`: Annual SNAP (food stamps) benefits -- `tanf`: Annual TANF (cash assistance) benefits -- `spm_unit_net_income`: SPM net income -- `spm_unit_size`: Number of people in unit - -### Family - -Census definition of family (related individuals). - -**Key variables:** -- `family_id`: Family identifier -- `family_weight`: Survey weight - -### Marital unit - -Married couple or single person. - -**Key variables:** -- `marital_unit_id`: Marital unit identifier -- `marital_unit_weight`: Survey weight - -### Household - -The residence unit. - -**Key variables:** -- `household_net_income`: Total household net income -- `household_benefits`: Total benefits received -- `household_tax`: Total tax paid -- `household_market_income`: Total market income before taxes and transfers - -**Required fields:** -- `state_code`: State (e.g., "CA", "NY", "TX") - -## Using the US model - -### Loading representative data - -```python -from policyengine.tax_benefit_models.us import PolicyEngineUSDataset - -dataset = PolicyEngineUSDataset( - name="Enhanced CPS 2024", - description="Enhanced Current Population Survey microdata", - filepath="./data/enhanced_cps_2024_year_2024.h5", - year=2024, -) - -print(f"People: {len(dataset.data.person):,}") -print(f"Tax units: {len(dataset.data.tax_unit):,}") -print(f"SPM units: {len(dataset.data.spm_unit):,}") -print(f"Households: {len(dataset.data.household):,}") -``` - -### Creating custom scenarios - -```python -import pandas as pd -from microdf import MicroDataFrame -from policyengine.tax_benefit_models.us import USYearData - -# Married couple with 2 children -person_df = MicroDataFrame( - 
pd.DataFrame({ - "person_id": [0, 1, 2, 3], - "person_household_id": [0, 0, 0, 0], - "person_tax_unit_id": [0, 0, 0, 0], - "person_spm_unit_id": [0, 0, 0, 0], - "person_family_id": [0, 0, 0, 0], - "person_marital_unit_id": [0, 0, 1, 2], - "age": [35, 33, 8, 5], - "employment_income": [60000, 40000, 0, 0], - "person_weight": [1.0, 1.0, 1.0, 1.0], - }), - weights="person_weight" -) - -tax_unit_df = MicroDataFrame( - pd.DataFrame({ - "tax_unit_id": [0], - "tax_unit_weight": [1.0], - }), - weights="tax_unit_weight" -) - -spm_unit_df = MicroDataFrame( - pd.DataFrame({ - "spm_unit_id": [0], - "spm_unit_weight": [1.0], - }), - weights="spm_unit_weight" -) - -family_df = MicroDataFrame( - pd.DataFrame({ - "family_id": [0], - "family_weight": [1.0], - }), - weights="family_weight" -) - -marital_unit_df = MicroDataFrame( - pd.DataFrame({ - "marital_unit_id": [0, 1, 2], - "marital_unit_weight": [1.0, 1.0, 1.0], - }), - weights="marital_unit_weight" -) - -household_df = MicroDataFrame( - pd.DataFrame({ - "household_id": [0], - "household_weight": [1.0], - "state_code": ["CA"], - }), - weights="household_weight" -) - -dataset = PolicyEngineUSDataset( - name="Married couple scenario", - description="Two adults, two children", - filepath="./married_couple.h5", - year=2024, - data=USYearData( - person=person_df, - tax_unit=tax_unit_df, - spm_unit=spm_unit_df, - family=family_df, - marital_unit=marital_unit_df, - household=household_df, - ) -) -``` - -### Running a simulation - -```python -from policyengine.core import Simulation -import policyengine as pe - -simulation = Simulation( - dataset=dataset, - tax_benefit_model_version=pe.us.model, -) -simulation.run() - -# Check results -output = simulation.output_dataset.data -print(output.household[["household_net_income", "household_benefits", "household_tax"]]) -``` - -## Key parameters - -### Income tax - -- `gov.irs.income.standard_deduction.joint`: Standard deduction (married filing jointly) -- 
`gov.irs.income.standard_deduction.single`: Standard deduction (single) -- `gov.irs.income.bracket.rates[0]`: 10% bracket rate -- `gov.irs.income.bracket.rates[1]`: 12% bracket rate -- `gov.irs.income.bracket.rates[2]`: 22% bracket rate -- `gov.irs.income.bracket.thresholds.joint[0]`: 10% bracket threshold (MFJ) -- `gov.irs.income.bracket.thresholds.single[0]`: 10% bracket threshold (single) - -### Payroll tax - -- `gov.ssa.payroll.rate.employee`: Employee OASDI rate (6.2%) -- `gov.medicare.payroll.rate`: Medicare rate (1.45%) -- `gov.ssa.payroll.cap`: OASDI wage base ($168,600 in 2024) - -### Child Tax Credit - -- `gov.irs.credits.ctc.amount.base`: Base CTC amount ($2,000 per child) -- `gov.irs.credits.ctc.refundable.amount.max`: Maximum refundable amount ($1,700) -- `gov.irs.credits.ctc.phase_out.threshold.joint`: Phase-out threshold (MFJ) -- `gov.irs.credits.ctc.phase_out.rate`: Phase-out rate - -### Earned Income Tax Credit - -- `gov.irs.credits.eitc.max[0]`: Maximum EITC (0 children) -- `gov.irs.credits.eitc.max[1]`: Maximum EITC (1 child) -- `gov.irs.credits.eitc.max[2]`: Maximum EITC (2 children) -- `gov.irs.credits.eitc.max[3]`: Maximum EITC (3+ children) -- `gov.irs.credits.eitc.phase_out.start[0]`: Phase-out start (0 children) -- `gov.irs.credits.eitc.phase_out.rate[0]`: Phase-out rate (0 children) - -### SNAP - -- `gov.usda.snap.normal_allotment.max[1]`: Maximum benefit (1 person) -- `gov.usda.snap.normal_allotment.max[2]`: Maximum benefit (2 people) -- `gov.usda.snap.income_limit.net`: Net income limit (100% FPL) -- `gov.usda.snap.income_deduction.earned.rate`: Earned income deduction rate (20%) - -## Common policy reforms - -All reform examples use the same flat ``{parameter.path: value}`` dict -the household calculator accepts. ``Simulation`` compiles it into a -``Policy`` at construction; scalar values default to -``{dataset.year}-01-01``. Indexed-breakdown parameters (age groups, -filing statuses) end in ``[N].amount``. 
- -### Increasing standard deduction - -```python -policy = {"gov.irs.income.standard_deduction.single": 20_000} -``` - -### Expanding Child Tax Credit - -```python -policy = {"gov.irs.credits.ctc.amount.base[0].amount": 3_000} -``` - -### Making CTC fully refundable - -```python -policy = {"gov.irs.credits.ctc.refundable.amount.max": 2_000} -``` - -### Time-varying reform - -```python -policy = { - "gov.irs.credits.ctc.amount.base[0].amount": { - "2026-07-01": 2_500, - "2027-01-01": 3_000, - }, -} -``` - -Plug any of the above into ``Simulation(policy=policy, ...)``. - -## State variations - -The US model includes state-level variations for: - -- **State income tax**: Different rates and structures by state -- **State EITC**: State supplements to federal EITC -- **Medicaid**: State-specific eligibility and benefits -- **TANF**: State-administered cash assistance - -### State codes - -Use two-letter state codes (e.g., "CA", "NY", "TX"). All 50 states plus DC are supported. - -## Entity mapping considerations - -The US model's complex entity structure requires careful attention to entity mapping: - -### Person → Household - -When mapping person-level variables (like `ssi`) to household level, values are summed across all household members: - -```python -agg = Aggregate( - simulation=simulation, - variable="ssi", # Person-level - entity="household", # Aggregate to household - aggregate_type=AggregateType.SUM, -) -# Result: Total SSI for all persons in each household -``` - -### Tax unit → Household - -Tax units nest within households. 
A household may contain multiple tax units (e.g., adult child filing separately): - -```python -agg = Aggregate( - simulation=simulation, - variable="income_tax", # Tax unit level - entity="household", # Aggregate to household - aggregate_type=AggregateType.SUM, -) -# Result: Total income tax for all tax units in each household -``` - -### Household → Person - -Household variables are replicated to all household members: - -```python -# household_net_income at person level -# Each person in household gets the same household_net_income value -``` - -### Direct entity mapping - -For complex multi-entity scenarios, you can use `map_to_entity` directly: - -```python -# Map SPM unit SNAP benefits to household level -household_snap = dataset.data.map_to_entity( - source_entity="spm_unit", - target_entity="household", - columns=["snap"], - how="sum" -) - -# Split tax unit income equally among persons -person_tax_income = dataset.data.map_to_entity( - source_entity="tax_unit", - target_entity="person", - columns=["taxable_income"], - how="divide" -) - -# Map custom analysis values -custom_analysis = dataset.data.map_to_entity( - source_entity="person", - target_entity="tax_unit", - values=custom_values_array, - how="sum" -) -``` - -See the [Entity mapping section](core-concepts.md#entity-mapping) in Core Concepts for full documentation on aggregation methods. - -## Data sources - -The US model can use several data sources: - -1. **Current Population Survey (CPS)**: Census Bureau household survey - - ~60,000 households - - Detailed income and demographic data - - Published annually - -2. **Enhanced CPS**: Calibrated and enhanced version - - Uprated to population totals - - Imputed benefit receipt - - Multiple projection years - -3. **Custom datasets**: User-created scenarios - - Full control over household composition - - Exact income levels - - Specific tax filing scenarios - -## Validation - -When creating custom datasets, validate: - -1. 
**Entity relationships**: All persons link to valid tax_unit, spm_unit, household -2. **Join key naming**: Use `person_household_id`, `person_tax_unit_id`, etc. -3. **Weights**: Appropriate weights for each entity level -4. **State codes**: Valid two-letter codes -5. **Filing status**: Tax units should reflect actual filing patterns - -## Examples - -- [US income distribution](examples.md#us-income-distribution): Analyse benefit distribution by income decile -- [US employment income variation](examples.md#us-employment-income-variation): Vary employment income, analyse phase-outs -- [US budgetary impact](examples.md#us-budgetary-impact): Full baseline-vs-reform comparison -- [Simulation performance](examples.md#simulation-performance): Performance benchmarking - -## References - -- PolicyEngine US documentation: https://policyengine.github.io/policyengine-us/ -- IRS tax information: https://www.irs.gov/forms-pubs -- Benefits.gov: https://www.benefits.gov/ -- SPM methodology: https://www.census.gov/topics/income-poverty/supplemental-poverty-measure.html diff --git a/docs/dev.md b/docs/dev.md index 5dc7ac5f..62a3e398 100644 --- a/docs/dev.md +++ b/docs/dev.md @@ -70,7 +70,7 @@ This project uses [towncrier](https://towncrier.readthedocs.io/) for changelog m ```bash # Fragment types: breaking, added, changed, fixed, removed -echo "Description of change" > changelog.d/my-change.added +echo "Description of change" > changelog.d/my-branch.added.md ``` On merge, the versioning workflow bumps the version, builds the changelog, and creates a GitHub Release. @@ -84,14 +84,17 @@ For the target release-bundle architecture, see [Release bundles](release-bundle ``` src/policyengine/ ├── __init__.py # Public surface: `pe.uk`, `pe.us`, `pe.Simulation` +├── cli.py # `policyengine` entry point (e.g. TRACE TRO emission) ├── core/ # Domain models (Simulation, Dataset, Policy, etc.) 
├── tax_benefit_models/ │ ├── common/ # MicrosimulationModelVersion base, result types, reform compiler │ ├── uk/ # UK model, datasets, household calculator, reform analysis │ └── us/ # US model, datasets, household calculator, reform analysis ├── outputs/ # Output templates (Aggregate, Poverty, etc.) +├── results/ # Typed results + schema validation ├── provenance/ # Release manifests + TRACE TRO export ├── countries/ # Geographic region registries (scoping, constituencies, districts) +├── data/ # Bundled release manifests and schemas └── utils/ # Helpers (reforms, entity mapping, plotting) ``` diff --git a/docs/economic-impact-analysis.md b/docs/economic-impact-analysis.md deleted file mode 100644 index 9a81f46b..00000000 --- a/docs/economic-impact-analysis.md +++ /dev/null @@ -1,242 +0,0 @@ -# Economic impact analysis - -The `economic_impact_analysis()` function is the canonical way to compare a baseline simulation against a reform simulation. It produces a comprehensive `PolicyReformAnalysis` containing decile impacts, programme-by-programme statistics, poverty rates, and inequality metrics in a single call. - -## Overview - -There are two approaches to comparing simulations: - -| Approach | Use case | -|---|---| -| `ChangeAggregate` | Single-metric queries: "What is the total tax revenue change?" | -| `economic_impact_analysis()` | Full analysis: decile impacts, programme stats, poverty, inequality | - -`ChangeAggregate` gives you one number per call. `economic_impact_analysis()` runs ~30+ aggregate computations and returns a structured result containing everything. - -## Full analysis workflow - -### US example - -```python -import policyengine as pe -from policyengine.core import Simulation - -# 1. Load data -datasets = pe.us.ensure_datasets( - datasets=["hf://policyengine/policyengine-us-data/enhanced_cps_2024.h5"], - years=[2026], - data_folder="./data", -) -dataset = datasets["enhanced_cps_2024_2026"] - -# 2. Build baseline and reform simulations. 
-# The reform dict is the same shape `pe.us.calculate_household(reform=...)` accepts — -# Simulation compiles it into a Policy automatically. -baseline_sim = Simulation(dataset=dataset, tax_benefit_model_version=pe.us.model) -reform_sim = Simulation( - dataset=dataset, - tax_benefit_model_version=pe.us.model, - policy={"gov.irs.credits.ctc.amount.base[0].amount": 3_000}, -) - -# 3. Run full analysis (ensure() is called internally) -analysis = pe.us.economic_impact_analysis(baseline_sim, reform_sim) -``` - -### UK example - -```python -import policyengine as pe -from policyengine.core import Simulation - -datasets = pe.uk.ensure_datasets( - datasets=["hf://policyengine/policyengine-uk-data/enhanced_frs_2023_24.h5"], - years=[2026], - data_folder="./data", -) -dataset = datasets["enhanced_frs_2023_24_2026"] - -baseline_sim = Simulation(dataset=dataset, tax_benefit_model_version=pe.uk.model) -reform_sim = Simulation( - dataset=dataset, - tax_benefit_model_version=pe.uk.model, - policy={"gov.hmrc.income_tax.allowances.personal_allowance.amount": 0}, -) - -analysis = pe.uk.economic_impact_analysis(baseline_sim, reform_sim) -``` - -> If you need the full `Policy` / `ParameterValue` construction (e.g., a reform with a custom ``simulation_modifier`` callable), you can still pass an object; see `policyengine.core.policy` for details. - -## What `economic_impact_analysis()` computes - -The function calls `ensure()` on both simulations (run + cache if not already computed), then produces: - -### Decile impacts - -Mean income changes by income decile (1-10), with counts of people better off, worse off, and unchanged. 
- -```python -for d in analysis.decile_impacts.outputs: - print(f"Decile {d.decile}: avg change={d.absolute_change:+.0f}, " - f"relative={d.relative_change:+.2f}%") -``` - -**Fields on each `DecileImpact`:** -- `decile`: 1-10 -- `baseline_mean`, `reform_mean`: Mean income before and after reform -- `absolute_change`: Mean absolute income change -- `relative_change`: Mean percentage income change -- `count_better_off`, `count_worse_off`, `count_no_change`: Weighted counts - -### Programme/program statistics - -Per-programme totals, changes, and winner/loser counts. - -**US programs analysed:** `income_tax`, `payroll_tax`, `state_income_tax`, `snap`, `tanf`, `ssi`, `social_security`, `medicare`, `medicaid`, `eitc`, `ctc` - -**UK programmes analysed:** `income_tax`, `national_insurance`, `vat`, `council_tax`, `universal_credit`, `child_benefit`, `pension_credit`, `income_support`, `working_tax_credit`, `child_tax_credit` - -```python -for p in analysis.program_statistics.outputs: # US - print(f"{p.program_name}: baseline=${p.baseline_total/1e9:.1f}B, " - f"reform=${p.reform_total/1e9:.1f}B, change=${p.change/1e9:+.1f}B") -``` - -**Fields on each `ProgramStatistics` / `ProgrammeStatistics`:** -- `program_name` / `programme_name`: Variable name -- `baseline_total`, `reform_total`: Weighted sums -- `change`: `reform_total - baseline_total` -- `baseline_count`, `reform_count`: Weighted recipient counts -- `winners`, `losers`: Weighted counts of people gaining/losing - -### Poverty rates - -Poverty headcount and rates for both baseline and reform simulations. - -**US poverty types:** SPM poverty, deep SPM poverty - -**UK poverty types:** Absolute BHC, absolute AHC, relative BHC, relative AHC - -```python -for bp, rp in zip(analysis.baseline_poverty.outputs, - analysis.reform_poverty.outputs): - print(f"{bp.poverty_type}: baseline={bp.rate:.4f}, reform={rp.rate:.4f}") -``` - -### Inequality metrics - -Gini coefficient and income share metrics for both simulations. 
- -```python -bi = analysis.baseline_inequality -ri = analysis.reform_inequality -print(f"Gini: baseline={bi.gini:.4f}, reform={ri.gini:.4f}") -print(f"Top 10% share: baseline={bi.top_10_share:.4f}, reform={ri.top_10_share:.4f}") -print(f"Top 1% share: baseline={bi.top_1_share:.4f}, reform={ri.top_1_share:.4f}") -print(f"Bottom 50% share: baseline={bi.bottom_50_share:.4f}, reform={ri.bottom_50_share:.4f}") -``` - -## The `PolicyReformAnalysis` return type - -```python -class PolicyReformAnalysis(BaseModel): - decile_impacts: OutputCollection[DecileImpact] - program_statistics: OutputCollection[ProgramStatistics] # US - # programme_statistics: OutputCollection[ProgrammeStatistics] # UK - baseline_poverty: OutputCollection[Poverty] - reform_poverty: OutputCollection[Poverty] - baseline_inequality: Inequality - reform_inequality: Inequality -``` - -Each `OutputCollection` contains: -- `outputs`: List of individual output objects -- `dataframe`: A pandas DataFrame with all results in tabular form - -## Using ChangeAggregate for targeted queries - -When you only need a single metric, `ChangeAggregate` is more direct than the full analysis pipeline. It requires that both simulations have already been run (or ensure'd). 
- -### Tax revenue change - -```python -from policyengine.outputs.change_aggregate import ChangeAggregate, ChangeAggregateType - -baseline_sim.run() -reform_sim.run() - -revenue = ChangeAggregate( - baseline_simulation=baseline_sim, - reform_simulation=reform_sim, - variable="household_tax", - aggregate_type=ChangeAggregateType.SUM, -) -revenue.run() -print(f"Revenue change: ${revenue.result / 1e9:.1f}B") -``` - -### Winners and losers - -```python -winners = ChangeAggregate( - baseline_simulation=baseline_sim, - reform_simulation=reform_sim, - variable="household_net_income", - aggregate_type=ChangeAggregateType.COUNT, - change_geq=1, # Gained at least $1 -) -winners.run() - -losers = ChangeAggregate( - baseline_simulation=baseline_sim, - reform_simulation=reform_sim, - variable="household_net_income", - aggregate_type=ChangeAggregateType.COUNT, - change_leq=-1, # Lost at least $1 -) -losers.run() -``` - -### Filtering by income decile - -```python -# Average loss in the 3rd income decile -avg_loss = ChangeAggregate( - baseline_simulation=baseline_sim, - reform_simulation=reform_sim, - variable="household_net_income", - aggregate_type=ChangeAggregateType.MEAN, - filter_variable="household_net_income", - quantile=10, - quantile_eq=3, -) -avg_loss.run() -``` - -### Filter options reference - -**Absolute change filters:** -- `change_geq`: Change >= value (e.g., gain >= 500) -- `change_leq`: Change <= value (e.g., loss <= -500) -- `change_eq`: Change == value - -**Relative change filters:** -- `relative_change_geq`: Relative change >= value (decimal, e.g., 0.05 = 5%) -- `relative_change_leq`: Relative change <= value -- `relative_change_eq`: Relative change == value - -**Variable filters:** -- `filter_variable`: Variable to filter on (from the baseline simulation) -- `filter_variable_eq`, `filter_variable_leq`, `filter_variable_geq`: Comparison operators - -**Quantile filters:** -- `quantile`: Number of quantiles (e.g., 10 for deciles, 5 for quintiles) -- 
`quantile_eq`: Exact quantile (e.g., 3 for 3rd decile) -- `quantile_leq`: Maximum quantile -- `quantile_geq`: Minimum quantile - -## Examples - -- [UK policy reform analysis](examples.md#uk-policy-reform-analysis): Full reform analysis with ChangeAggregate and visualisation -- [US budgetary impact](examples.md#us-budgetary-impact): Budgetary impact comparing both approaches diff --git a/docs/getting-started.md b/docs/getting-started.md new file mode 100644 index 00000000..bbaa3cee --- /dev/null +++ b/docs/getting-started.md @@ -0,0 +1,113 @@ +--- +title: "Getting started" +--- + +## Install + +```bash +pip install policyengine +``` + +The base install contains the wrapper API. Install each country's rules alongside: + +```bash +pip install policyengine[us] # US +pip install policyengine[uk] # UK +pip install policyengine[us,uk] # both +``` + +Country modules (`pe.us`, `pe.uk`) are only importable if the matching country package is installed. + +## Compute one household + +```python +import policyengine as pe + +result = pe.us.calculate_household( + people=[{"age": 35, "employment_income": 60_000}], + tax_unit={"filing_status": "SINGLE"}, + household={"state_code": "CA"}, + year=2026, +) + +result.tax_unit.income_tax +result.tax_unit.eitc +result.household.household_net_income +``` + +Each `.*` lookup is a regular Python scalar. The result is a typed `HouseholdResult` with entity sections (`person[i]`, `tax_unit`, `spm_unit`, `household`) populated from every variable in the country model. 
+ +## Apply a reform + +Reforms are parameter-path → value dicts: + +```python +reformed = pe.us.calculate_household( + people=[{"age": 35, "employment_income": 60_000}], + tax_unit={"filing_status": "SINGLE"}, + year=2026, + reform={"gov.irs.credits.ctc.amount.adult_dependent": 1_000}, +) +``` + +For values effective on specific dates, pass a nested dict: + +```python +reform = { + "gov.irs.credits.ctc.amount.adult_dependent": { + "2026-01-01": 1_000, + "2028-01-01": 2_000, + }, +} +``` + +Scale parameters are addressed by bracket index: + +```python +reform = {"gov.irs.credits.ctc.amount.base[0].amount": 3_000} +``` + +See [Reforms](reforms.md) for structural reforms and the `Simulation.policy=` counterpart for population analysis. + +## Scale up + +For population estimates — budget cost, distributional impact, poverty — move to a microsimulation over calibrated microdata. The reform dict carries over unchanged; only the constructor changes. + +```python +from policyengine.core import Simulation + +datasets = pe.us.ensure_datasets( + datasets=["hf://policyengine/policyengine-us-data/enhanced_cps_2024.h5"], + years=[2026], +) +dataset = datasets["enhanced_cps_2024_2026"] + +baseline = Simulation( + dataset=dataset, + tax_benefit_model_version=pe.us.model, +) +reformed = Simulation( + dataset=dataset, + tax_benefit_model_version=pe.us.model, + policy={"gov.irs.credits.ctc.amount.adult_dependent": 1_000}, +) +``` + +(Note: `reform=` for household calc, `policy=` for `Simulation` — same dict shape.) + +## What you get back + +Every calculation returns a typed result with sections per entity: + +- **US**: `person`, `family`, `marital_unit`, `tax_unit`, `spm_unit`, `household` +- **UK**: `person`, `benunit`, `household` + +Person-level lookups index the list: `result.person[0].age`. Group-entity lookups don't: `result.tax_unit.income_tax`. + +Unknown variable names raise with suggestions — no silent zero returns. 
+ +## Next + +- [Households](households.md) — full reference for `calculate_household` +- [Reforms](reforms.md) — parametric and structural reforms +- [Microsimulation](microsim.md) — population-level analysis diff --git a/docs/households.md b/docs/households.md new file mode 100644 index 00000000..cae9501a --- /dev/null +++ b/docs/households.md @@ -0,0 +1,129 @@ +--- +title: "Households" +--- + +`pe.us.calculate_household` and `pe.uk.calculate_household` compute every variable in the country model for a single household. Same keyword arguments, different entity structures. + +## US + +```python +result = pe.us.calculate_household( + people=[ + {"age": 35, "employment_income": 40_000}, + {"age": 33}, + {"age": 8}, + {"age": 5}, + ], + tax_unit={"filing_status": "JOINT"}, + household={"state_code": "TX"}, + year=2026, +) +``` + +### Entities + +| Argument | Purpose | +|---|---| +| `people` | List of person dicts. Keys are any person-level variable on the model. | +| `tax_unit` | Tax-unit inputs (e.g. `filing_status`). | +| `spm_unit` | SPM-unit inputs. | +| `household` | Household inputs. `state_code` is essentially always needed. | +| `family` | Family-level inputs. | +| `marital_unit` | Marital-unit inputs. | + +All adults default to one shared tax unit and household. For separate tax units (e.g. two adult roommates), construct the `Simulation` directly and set the entity-membership arrays. + +## UK + +```python +result = pe.uk.calculate_household( + people=[ + {"age": 35, "employment_income": 50_000}, + {"age": 33, "employment_income": 30_000}, + {"age": 4}, + ], + benunit={}, + household={}, + year=2026, +) +``` + +| Argument | Purpose | +|---|---| +| `people` | Person-level inputs. | +| `benunit` | Benefit unit (closest analog to US tax unit — single adult or couple plus their dependent children). | +| `household` | Household-level inputs. 
| + +## Reforms + +Pass a `reform` dict of parameter-path to value: + +```python +pe.us.calculate_household( + ..., + reform={"gov.irs.credits.ctc.amount.adult_dependent": 1_000}, +) +``` + +Scale parameters use bracket indexing: + +```python +reform = {"gov.irs.credits.ctc.amount.base[0].amount": 3_000} +``` + +Time-varying reforms use a nested dict of `YYYY-MM-DD → value`: + +```python +reform = { + "gov.irs.credits.ctc.amount.adult_dependent": { + "2026-01-01": 1_000, + "2028-01-01": 2_000, + }, +} +``` + +Structural reforms (new variables, formula swaps) require the `Simulation` path — see [Reforms](reforms.md). + +## Year + +```python +pe.us.calculate_household(..., year=2026) +``` + +The year determines which parameter values apply. For multi-year analysis, call the function once per year rather than building a custom reform. + +## Extra variables + +The result exposes every variable in the model by default. To surface variables that aren't in the default catalog explicitly: + +```python +result = pe.us.calculate_household( + ..., + extra_variables=["medicaid_income_level", "spm_unit_spm_threshold"], +) +``` + +## Accessing the result + +```python +result.person[0].income_tax # first person +result.person[2].age # third person +result.tax_unit.income_tax # single tax unit +result.household.household_net_income # single household +``` + +The result is a Pydantic model — `.model_dump()` gives you a dict, individual sections are regular attribute lookups. + +## Errors + +Unknown variables raise with the closest match: + +``` +ValueError: Unknown variable 'income_ax'. Did you mean 'income_tax'? +``` + +Unknown parameters in reforms raise similarly. Misplaced inputs (a person-level variable under `tax_unit=...`) raise with entity hints. The catalog is enumerated at construction time — typos fail fast. + +## When not to use this + +Loops over many households are much slower than a single `Simulation` call. 
For population analysis, see [Microsimulation](microsim.md) — the reform dict carries over identically. diff --git a/docs/impact-analysis.md b/docs/impact-analysis.md new file mode 100644 index 00000000..6f7d43fe --- /dev/null +++ b/docs/impact-analysis.md @@ -0,0 +1,110 @@ +--- +title: "Impact analysis" +--- + +`economic_impact_analysis` runs a baseline and a reform simulation through a bundled set of outputs — decile impacts, program statistics, poverty, and inequality — and returns a typed `PolicyReformAnalysis`. + +## Usage + +```python +import policyengine as pe +from policyengine.core import Simulation + +datasets = pe.us.ensure_datasets( + datasets=["hf://policyengine/policyengine-us-data/enhanced_cps_2024.h5"], + years=[2026], +) +dataset = datasets["enhanced_cps_2024_2026"] + +baseline = Simulation(dataset=dataset, tax_benefit_model_version=pe.us.model) +reformed = Simulation( + dataset=dataset, + tax_benefit_model_version=pe.us.model, + policy={"gov.irs.credits.ctc.amount.base[0].amount": 3_000}, +) + +analysis = pe.us.economic_impact_analysis(baseline, reformed) +``` + +The UK equivalent is `pe.uk.economic_impact_analysis`. Both call `Simulation.ensure()` internally — run/cached simulations are reused, fresh ones are computed and cached. 
+ +## What it returns + +A `PolicyReformAnalysis` with: + +| Attribute | Type | Content | +|---|---|---| +| `decile_impacts` | `OutputCollection[DecileImpact]` | Mean baseline / reform / change and winner-loser counts per decile | +| `program_statistics` | `OutputCollection[ProgramStatistics]` | Totals, counts, winners/losers per program | +| `baseline_poverty` | `OutputCollection[Poverty]` | Baseline rates by measure and demographic group | +| `reform_poverty` | `OutputCollection[Poverty]` | Reform rates, same schema as baseline | +| `baseline_inequality` | `Inequality` | Gini plus top / bottom income shares (baseline) | +| `reform_inequality` | `Inequality` | Same, under the reform | + +`OutputCollection` exposes `.outputs` (typed list) and `.dataframe` (flat DataFrame). + +```python +for prog in analysis.program_statistics.outputs: + print(prog.program_name, prog.change) + +for d in analysis.decile_impacts.outputs: + print(d.decile, d.absolute_change, d.relative_change) + +analysis.reform_inequality.gini - analysis.baseline_inequality.gini +``` + +## When to call it + +- Producing a reform brief covering multiple metrics +- Standardising reporting across reforms where each run should cover the same bundle + +## When not to + +- If you only need one number (a budget cost, a single poverty rate), `ChangeAggregate` / `Aggregate` / `Poverty` avoids running ~30+ aggregations. +- If you're sweeping a parameter, cache the baseline and build a new reform simulation per iteration: + +```python +baseline = Simulation(dataset=dataset, tax_benefit_model_version=pe.us.model) +for amount in [0, 1_000, 2_000, 3_000]: + reformed = Simulation( + dataset=dataset, + tax_benefit_model_version=pe.us.model, + policy={"gov.irs.credits.ctc.amount.base[0].amount": amount}, + ) + analysis = pe.us.economic_impact_analysis(baseline, reformed) +``` + +## Composing manually + +`economic_impact_analysis` is a thin wrapper over the convenience functions in `policyengine.outputs`. 
Replicate it if you need a different bundle or can skip sections: + +```python +from policyengine.outputs import ( + ChangeAggregate, ChangeAggregateType, + calculate_decile_impacts, + calculate_us_poverty_rates, + calculate_us_inequality, +) + +budget = ChangeAggregate( + baseline_simulation=baseline, + reform_simulation=reformed, + variable="household_net_income", + aggregate_type=ChangeAggregateType.SUM, +) +budget.run() + +deciles = calculate_decile_impacts(baseline_simulation=baseline, reform_simulation=reformed) + +baseline_poverty = calculate_us_poverty_rates(simulation=baseline) +reform_poverty = calculate_us_poverty_rates(simulation=reformed) + +baseline_ineq = calculate_us_inequality(simulation=baseline) +reform_ineq = calculate_us_inequality(simulation=reformed) +``` + +## Next + +- [Outputs](outputs.md) — individual output classes and their options +- [Regions](regions.md) — state, constituency, and district breakdowns +- [Examples](examples.md) — runnable scripts using this helper diff --git a/docs/index.md b/docs/index.md index bbd88974..1aa1b28f 100644 --- a/docs/index.md +++ b/docs/index.md @@ -1,20 +1,46 @@ -# policyengine.py - -This package aims to simplify and productionise the use of PolicyEngine's tax-benefit microsimulation models to flexibly produce useful information at scale, slotting into existing analysis pipelines while also standardising analysis. 
- -We do this by: -* Standardising around a set of core types that let us do policy analysis in an object-oriented way -* Exemplifying this behaviour by using this package in all PolicyEngine's production applications, and analyses - -## Documentation - -- [Core concepts](core-concepts.md): Architecture, datasets, simulations, policies, outputs, entity mapping -- [Economic impact analysis](economic-impact-analysis.md): Full baseline-vs-reform comparison workflow -- [Advanced outputs](advanced-outputs.md): DecileImpact, Poverty, Inequality, IntraDecileImpact -- [Regions and scoping](regions-and-scoping.md): Sub-national analysis (states, constituencies, districts) -- [UK tax-benefit model](country-models-uk.md): Entities, parameters, reform examples -- [US tax-benefit model](country-models-us.md): Entities, parameters, reform examples -- [Examples](examples.md): Complete working scripts -- [Visualisation](visualisation.md): Publication-ready charts with Plotly -- [Release bundles](release-bundles.md): Reproducible model-plus-data certification and provenance -- [Development](dev.md): Setup, testing, CI, architecture +--- +title: "policyengine" +--- + +`policyengine` is the Python wrapper for PolicyEngine's tax-benefit microsimulation models. One package for both single-household calculations and population-scale microsimulation, US and UK. 
+ +## Install + +```bash +pip install policyengine[us] # US rules +pip install policyengine[uk] # UK rules +pip install policyengine[us,uk] # both +``` + +## Minimal example + +```python +import policyengine as pe + +result = pe.us.calculate_household( + people=[{"age": 35, "employment_income": 60_000}], + tax_unit={"filing_status": "SINGLE"}, + household={"state_code": "CA"}, + year=2026, +) + +result.tax_unit.income_tax +result.household.household_net_income +``` + +## Where to go next + +| If you want to | Go to | +|---|---| +| Install and run your first calculation | [Getting started](getting-started.md) | +| Compute taxes and benefits for one family | [Households](households.md) | +| Express a policy change | [Reforms](reforms.md) | +| Produce population estimates (budget cost, poverty) | [Microsimulation](microsim.md) | +| See the full catalog of typed outputs | [Outputs](outputs.md) | +| Run the canonical baseline-vs-reform bundle | [Impact analysis](impact-analysis.md) | +| Break results down by state, constituency, district | [Regions](regions.md) | +| Understand US vs UK differences | [Countries](countries.md) | +| Build publication-ready charts | [Visualisation](visualisation.md) | +| Pin a reproducible model-plus-data version | [Release bundles](release-bundles.md) | +| See a full worked script | [Examples](examples.md) | +| Develop against the source | [Development](dev.md) | diff --git a/docs/microsim.md b/docs/microsim.md new file mode 100644 index 00000000..9a80059e --- /dev/null +++ b/docs/microsim.md @@ -0,0 +1,152 @@ +--- +title: "Microsimulation" +--- + +For population-level estimates — budget cost, winners and losers, poverty impact — run a microsimulation over calibrated microdata. 
+ +## Quick example + +```python +import policyengine as pe +from policyengine.core import Simulation +from policyengine.outputs import Aggregate, AggregateType + +datasets = pe.us.ensure_datasets( + datasets=["hf://policyengine/policyengine-us-data/enhanced_cps_2024.h5"], + years=[2026], +) +dataset = datasets["enhanced_cps_2024_2026"] + +baseline = Simulation(dataset=dataset, tax_benefit_model_version=pe.us.model) +baseline.ensure() + +total_snap = Aggregate( + simulation=baseline, + variable="snap", + aggregate_type=AggregateType.SUM, +) +total_snap.run() +total_snap.result +``` + +`Simulation.ensure()` loads a cached result if one exists, or runs and caches on miss. Call `Simulation.run()` explicitly if you want to bypass the cache. + +## Datasets + +Microdata is stored as HDF5 on Hugging Face. `ensure_datasets` downloads, caches, and uprates: + +```python +datasets = pe.us.ensure_datasets( + datasets=["hf://policyengine/policyengine-us-data/enhanced_cps_2024.h5"], + years=[2024, 2026], + data_folder="./data", # local cache directory +) +# Keys combine the dataset name and the year, joined by "_": +dataset = datasets["enhanced_cps_2024_2026"] +``` + +The default US dataset is **Enhanced CPS 2024** — CPS ASEC fused with IRS SOI tax-return records and calibrated to IRS, CMS, SNAP, and other administrative totals. The UK default is **Enhanced FRS 2023/24** — the Family Resources Survey fused with tax-return microdata and calibrated to HMRC and DWP totals. 
+ +List datasets already known to the country: + +```python +pe.us.load_datasets() # or pe.uk.load_datasets() +``` + +## Simulations + +A `Simulation` needs a dataset, a tax-benefit model version, and optionally a policy (reform): + +```python +baseline = Simulation( + dataset=dataset, + tax_benefit_model_version=pe.us.model, +) + +reformed = Simulation( + dataset=dataset, + tax_benefit_model_version=pe.us.model, + policy={"gov.irs.credits.ctc.amount.base[0].amount": 3_000}, +) +``` + +`policy=` accepts the same flat `{"param.path": value}` dict shape as `pe.us.calculate_household(reform=...)`, or a `Policy` object with explicit `ParameterValue` entries. Scale parameters use bracket indexing — see [Reforms](reforms.md). + +## Outputs + +Every output has the same lifecycle: instantiate with the simulation(s) and configuration, call `.run()`, read the typed result fields. + +```python +from policyengine.outputs import ( + Aggregate, AggregateType, + ChangeAggregate, ChangeAggregateType, +) + +snap_cost = Aggregate( + simulation=baseline, + variable="snap", + aggregate_type=AggregateType.SUM, +) +snap_cost.run() + +budget = ChangeAggregate( + baseline_simulation=baseline, + reform_simulation=reformed, + variable="household_net_income", + aggregate_type=ChangeAggregateType.SUM, +) +budget.run() +``` + +See [Outputs](outputs.md) for the full catalog. + +## Memory and performance + +A full Enhanced CPS microsimulation uses roughly 4 GB of memory and takes 15–30 seconds on a laptop. 
For parameter sweeps, reuse the baseline: + +```python +baseline = Simulation(dataset=dataset, tax_benefit_model_version=pe.us.model) +for amount in [0, 1_000, 2_000, 3_000]: + reformed = Simulation( + dataset=dataset, + tax_benefit_model_version=pe.us.model, + policy={"gov.irs.credits.ctc.amount.base[0].amount": amount}, + ) + # each iteration runs only the reform +``` + +Downsampled datasets are available for testing: + +```python +datasets = pe.us.ensure_datasets( + datasets=["hf://policyengine/policyengine-us-data/cps_small_2024.h5"], + years=[2026], +) +``` + +These run in seconds and are fine for integration tests. Don't use them for production analysis — the weights are not calibration-tuned. + +## Managed microsimulation + +`managed_microsimulation` constructs a country-package `Microsimulation` pinned to the `policyengine.py` release bundle (so the dataset selection is certified, not ad-hoc): + +```python +from policyengine.tax_benefit_models.us import managed_microsimulation + +sim = managed_microsimulation() +# `sim` is a policyengine_us.Microsimulation — use its API directly +``` + +Pass `allow_unmanaged=True` with a custom `dataset=` to opt out of the release bundle. + +## Pinned model versions + +Every `policyengine` release pins specific country-model and country-data versions so results are reproducible. `pe.us.model` and `pe.uk.model` expose the pinned `TaxBenefitModelVersion`. + +If the installed country-package version doesn't match the pinned manifest, `managed_microsimulation` warns. For strict reproducibility, pin country packages to the versions the `policyengine` release was built against — see [Release bundles](release-bundles.md). 
+ +## Next + +- [Outputs](outputs.md) — catalog of typed output classes +- [Impact analysis](impact-analysis.md) — full baseline-vs-reform in one call +- [Regions](regions.md) — sub-national analysis diff --git a/docs/outputs.md b/docs/outputs.md new file mode 100644 index 00000000..0d2de093 --- /dev/null +++ b/docs/outputs.md @@ -0,0 +1,246 @@ +--- +title: "Outputs" +--- + +Outputs are Pydantic models that hold a simulation (or a baseline + reform pair) plus configuration, and populate result fields when you call `.run()`. + +```python +out = Aggregate( + simulation=baseline, + variable="snap", + aggregate_type=AggregateType.SUM, +) +out.run() +out.result +``` + +Convenience functions (e.g. `calculate_decile_impacts`, `calculate_us_poverty_rates`) construct and run collections of outputs in one call and return an `OutputCollection[T]` with `.outputs` (typed list) and `.dataframe`. + +## Aggregate + +Single-number summaries over one simulation. + +```python +from policyengine.outputs import Aggregate, AggregateType + +snap = Aggregate( + simulation=baseline, + variable="snap", + aggregate_type=AggregateType.SUM, +) +snap.run() +snap.result +``` + +`AggregateType` values: `SUM`, `MEAN`, `COUNT`. + +Filter by another variable: + +```python +Aggregate( + simulation=baseline, + variable="household_net_income", + aggregate_type=AggregateType.MEAN, + filter_variable="household_size", + filter_variable_geq=4, +) +``` + +Or by quantile: + +```python +Aggregate( + simulation=baseline, + variable="household_net_income", + aggregate_type=AggregateType.MEAN, + filter_variable="household_net_income", + quantile=10, + quantile_eq=1, # bottom decile +) +``` + +## ChangeAggregate + +Difference between a baseline and a reform, optionally filtered. 
+ +```python +from policyengine.outputs import ChangeAggregate, ChangeAggregateType + +budget = ChangeAggregate( + baseline_simulation=baseline, + reform_simulation=reform, + variable="household_net_income", + aggregate_type=ChangeAggregateType.SUM, +) +budget.run() +budget.result +``` + +`ChangeAggregateType` values: `SUM`, `MEAN`, `COUNT`. + +The change is `reform - baseline`. Filter on the change itself: + +```python +winners = ChangeAggregate( + baseline_simulation=baseline, + reform_simulation=reform, + variable="household_net_income", + aggregate_type=ChangeAggregateType.COUNT, + change_geq=1, # households gaining at least $1 +) +``` + +Or on a relative change — `relative_change_geq=0.05` selects households with a 5 %+ gain. + +## DecileImpact + +One decile's baseline mean, reform mean, and mean change. For all ten at once, use `calculate_decile_impacts`. + +```python +from policyengine.outputs import calculate_decile_impacts + +impacts = calculate_decile_impacts( + baseline_simulation=baseline, + reform_simulation=reform, + income_variable="household_net_income", +) + +for row in impacts.outputs: + print(row.decile, row.absolute_change, row.relative_change) + +impacts.dataframe # same data as a DataFrame +``` + +## IntraDecileImpact + +Distribution of household-level impact within each decile (five bucket categories summing to 1.0). Use `compute_intra_decile_impacts` for the full set. + +```python +from policyengine.outputs import compute_intra_decile_impacts + +spread = compute_intra_decile_impacts( + baseline_simulation=baseline, + reform_simulation=reform, + income_variable="household_net_income", +) +``` + +## Poverty + +Poverty headcount and rate for one measure and one simulation. 
+ +```python +from policyengine.outputs import Poverty + +rate = Poverty( + simulation=baseline, + poverty_variable="spm_unit_is_in_spm_poverty", + entity="person", +) +rate.run() +rate.headcount, rate.total_population, rate.rate +``` + +For all canonical poverty measures over one simulation: + +```python +from policyengine.outputs import calculate_us_poverty_rates + +rates = calculate_us_poverty_rates(simulation=baseline) +rates.outputs # Poverty entries for each measure +rates.dataframe +``` + +Call it once per simulation for a baseline-vs-reform comparison. Age / gender / race breakdowns: `calculate_us_poverty_by_age`, `_by_gender`, `_by_race`. UK counterparts: `calculate_uk_poverty_rates`, `_by_age`, `_by_gender`. + +## Inequality + +Gini, top-10 share, top-1 share, bottom-50 share — for one simulation. + +```python +from policyengine.outputs import Inequality + +ineq = Inequality( + simulation=baseline, + income_variable="household_net_income", + entity="household", +) +ineq.run() +ineq.gini, ineq.top_10_share, ineq.top_1_share, ineq.bottom_50_share +``` + +With defaults pre-wired for the country: + +```python +from policyengine.outputs import calculate_us_inequality, USInequalityPreset + +baseline_ineq = calculate_us_inequality( + simulation=baseline, + preset=USInequalityPreset.STANDARD, +) +reform_ineq = calculate_us_inequality( + simulation=reform, + preset=USInequalityPreset.STANDARD, +) +``` + +`calculate_uk_inequality` is the UK equivalent. + +## ProgramStatistics + +Per-program totals, counts, and winners/losers for a reform. 
+ +```python +from policyengine.outputs import ProgramStatistics + +stats = ProgramStatistics( + baseline_simulation=baseline, + reform_simulation=reform, + program_name="snap", + entity="spm_unit", +) +stats.run() +stats.baseline_total, stats.reform_total, stats.change +stats.baseline_count, stats.reform_count +stats.winners, stats.losers +``` + +`is_tax=True` treats the variable as a tax (positive baseline is a cost to households). + +## Geographic outputs + +### US congressional districts + +```python +from policyengine.outputs import compute_us_congressional_district_impacts + +impacts = compute_us_congressional_district_impacts( + baseline_simulation=baseline, + reform_simulation=reform, +) +for row in impacts.district_results: + print(row["district_geoid"], row["avg_change"], row["winner_percentage"]) +``` + +### UK constituencies / local authorities + +Constituency and local-authority breakdowns require externally-supplied weight matrices: + +```python +from policyengine.outputs import compute_uk_constituency_impacts + +impacts = compute_uk_constituency_impacts( + baseline_simulation=baseline, + reform_simulation=reform, + weight_matrix_path="parliamentary_constituency_weights.h5", + constituency_csv_path="constituencies_2024.csv", + year="2025", +) +impacts.constituency_results +``` + +`compute_uk_local_authority_impacts` follows the same pattern. See [Regions](regions.md). + +## Writing your own + +Subclass `Output`, declare Pydantic fields for configuration and results, implement `run()` to populate the result fields. The base class is a plain `BaseModel` — see `src/policyengine/outputs/aggregate.py` for the simplest reference implementation. diff --git a/docs/reforms.md b/docs/reforms.md new file mode 100644 index 00000000..f66dffd0 --- /dev/null +++ b/docs/reforms.md @@ -0,0 +1,108 @@ +--- +title: "Reforms" +--- + +A reform is a change to the rules used in a calculation. 
PolicyEngine supports two kinds: **parametric** (adjust a parameter value) and **structural** (swap or subclass a rule formula). + +## Parametric reforms + +A dict of parameter path → new value. The same shape works for `calculate_household` (`reform=`) and `Simulation` (`policy=`). + +```python +reform = { + "gov.irs.credits.ctc.amount.adult_dependent": 1_000, +} + +pe.us.calculate_household(..., reform=reform) +``` + +Scalar values are treated as effective January 1 of the simulation year and onward. + +### Time-varying + +```python +reform = { + "gov.irs.credits.ctc.amount.adult_dependent": { + "2026-01-01": 1_000, + "2028-01-01": 2_000, + }, +} +``` + +Dates that haven't passed become "from this date onward." Earlier dates replace the baseline schedule. + +### Multiple changes + +Any number of paths in the same dict compose into one reform: + +```python +reform = { + "gov.irs.credits.ctc.amount.adult_dependent": 1_000, + "gov.irs.credits.eitc.phase_out.rate[0]": 0.08, + "gov.states.ca.tax.income.credits.eitc.max_amount": 500, +} +``` + +### Scale and array parameters + +Scale parameters (brackets with thresholds and amounts) are addressed by bracket index: + +```python +reform = { + "gov.irs.income.tax.rate[0]": 0.12, # first bracket rate + "gov.irs.income.tax.threshold[1]": 50_000, # second bracket threshold + "gov.irs.credits.ctc.amount.base[0].amount": 3_000, +} +``` + +### Where parameters live + +Every parameter has a canonical path matching the YAML directory structure in the country model. `gov.irs.credits.ctc.amount.adult_dependent` corresponds to `policyengine_us/parameters/gov/irs/credits/ctc/amount/adult_dependent.yaml`. + +An auto-generated parameter reference is a planned addition; for now, browse the YAML tree in the country-model repository or rely on the error-message suggestions — an unknown path raises with the closest match. 
+ +## Parametric reforms in microsimulation + +Same dict, different keyword name: + +```python +from policyengine.core import Simulation + +baseline = Simulation( + dataset=dataset, + tax_benefit_model_version=pe.us.model, +) +reformed = Simulation( + dataset=dataset, + tax_benefit_model_version=pe.us.model, + policy={"gov.irs.credits.ctc.amount.base[0].amount": 3_000}, +) +``` + +## Structural reforms + +For rule changes that can't be expressed as a parameter change — swapping a formula, adding a variable, neutralising a program — drop down to the underlying country package. Both `policyengine_us` and `policyengine_uk` expose `Reform.from_dict(...)` and class-based reforms with overridable formulas; use them directly and run the simulation via `managed_microsimulation` (or by constructing a country-package `Microsimulation` yourself). + +```python +from policyengine_us import Microsimulation +from policyengine_us.model_api import Reform + +class NeutralizeEITC(Reform): + def apply(self): + self.neutralize_variable("eitc") + +sim = Microsimulation(reform=NeutralizeEITC) +``` + +The current `policyengine.py` surface (`Simulation`, `calculate_household`) accepts parametric reforms only. + +## Validating a reform before you run it + +The parameter catalog is known at import time. Wrong paths raise before the simulation starts, with the closest-matching path. + +For time-varying reforms, effective dates are checked against the parameter's defined start and end. A date before the parameter existed raises. 
+ +## Worked examples + +- [Impact analysis](impact-analysis.md) — baseline-vs-reform with population estimates +- [Examples](examples.md) — runnable scripts for reform scenarios in `examples/` diff --git a/docs/regions-and-scoping.md b/docs/regions-and-scoping.md deleted file mode 100644 index 01914889..00000000 --- a/docs/regions-and-scoping.md +++ /dev/null @@ -1,228 +0,0 @@ -# Regions and scoping - -The package supports sub-national analysis through a geographic region system. Regions can scope simulations to states, constituencies, congressional districts, local authorities, and cities. - -## Region system - -### Region - -A `Region` represents a geographic area with a unique prefixed code: - -| Region type | Code format | Examples | -|---|---|---| -| National | `us`, `uk` | `us`, `uk` | -| State | `state/{code}` | `state/ca`, `state/ny` | -| Congressional district | `congressional_district/{ST-DD}` | `congressional_district/CA-01` | -| Place/city | `place/{ST-FIPS}` | `place/NJ-57000` | -| UK country | `country/{name}` | `country/england` | -| Constituency | `constituency/{name}` | `constituency/Sheffield Central` | -| Local authority | `local_authority/{code}` | `local_authority/E09000001` | - -### RegionRegistry - -Each model version has a `RegionRegistry` providing O(1) lookups: - -```python -import policyengine as pe - -registry = pe.us.model.region_registry - -# Look up by code -california = registry.get("state/ca") -print(f"{california.label}: {california.region_type}") - -# Get all regions of a type -states = registry.get_by_type("state") -print(f"{len(states)} states") - -districts = registry.get_by_type("congressional_district") -print(f"{len(districts)} congressional districts") - -# Get children of a region -ca_districts = registry.get_children("state/ca") -``` - -```python -import policyengine as pe - -registry = pe.uk.model.region_registry - -# UK countries -countries = registry.get_by_type("country") -for c in countries: - print(f"{c.code}: 
{c.label}") -``` - -### Region counts - -**US:** 1 national + 51 states (inc. DC) + 436 congressional districts + 333 census places = 821 regions - -**UK:** 1 national + 4 countries. Constituencies and local authorities are available via extended registry builders. - -## Scoping strategies - -Scoping strategies control how a national dataset is narrowed to represent a sub-national region. They are applied during `Simulation.run()`, before the microsimulation calculation. - -### RowFilterStrategy - -Filters dataset rows where a household-level variable matches a specific value. Used for UK countries and US places/cities. - -```python -from policyengine.core import Simulation -from policyengine.core.scoping_strategy import RowFilterStrategy - -# Simulate only California households -simulation = Simulation( - dataset=dataset, - tax_benefit_model_version=pe.us.model, - scoping_strategy=RowFilterStrategy( - variable_name="state_fips", - variable_value=6, # California FIPS code - ), -) -simulation.run() -``` - -This removes all non-California households from the dataset before running the simulation. The remaining household weights still reflect California's population. - -```python -# UK: simulate only England -simulation = Simulation( - dataset=dataset, - tax_benefit_model_version=pe.uk.model, - scoping_strategy=RowFilterStrategy( - variable_name="country", - variable_value="ENGLAND", - ), -) -``` - -### WeightReplacementStrategy - -Replaces household weights from a pre-computed weight matrix stored in Google Cloud Storage. Used for UK constituencies and local authorities, where the weight matrix (shape: N_regions x N_households) reweights all households to represent each region's demographics. 
- -```python -from policyengine.core.scoping_strategy import WeightReplacementStrategy - -simulation = Simulation( - dataset=dataset, - tax_benefit_model_version=pe.uk.model, - scoping_strategy=WeightReplacementStrategy( - weight_matrix_bucket="policyengine-uk-data", - weight_matrix_key="parliamentary_constituency_weights.h5", - lookup_csv_bucket="policyengine-uk-data", - lookup_csv_key="constituencies_2024.csv", - region_code="Sheffield Central", - ), -) -``` - -Unlike row filtering, weight replacement keeps all households but assigns region-specific weights. This is more statistically robust for small geographic areas where filtering would leave too few households. - -## Geographic impact outputs - -The package provides output types that compute per-region metrics across all regions simultaneously. - -### CongressionalDistrictImpact (US) - -Groups households by `congressional_district_geoid` and computes weighted average and relative income changes per district. - -```python -from policyengine.outputs.congressional_district_impact import ( - compute_us_congressional_district_impacts, -) - -baseline_sim.run() -reform_sim.run() - -impact = compute_us_congressional_district_impacts(baseline_sim, reform_sim) - -for d in impact.district_results: - print(f"District {d['state_fips']:02d}-{d['district_number']:02d}: " - f"avg change=${d['average_household_income_change']:+,.0f}, " - f"relative={d['relative_household_income_change']:+.2%}") -``` - -**Result fields per district:** -- `district_geoid`: Integer SSDD (state FIPS * 100 + district number) -- `state_fips`: State FIPS code -- `district_number`: District number within state -- `average_household_income_change`: Weighted mean change -- `relative_household_income_change`: Weighted relative change -- `population`: Weighted household count - -### ConstituencyImpact (UK) - -Uses pre-computed weight matrices (650 x N_households) to compute per-constituency income changes without filtering. 
- -```python -from policyengine.outputs.constituency_impact import ( - compute_uk_constituency_impacts, -) - -impact = compute_uk_constituency_impacts( - baseline_simulation=baseline_sim, - reform_simulation=reform_sim, - weight_matrix_path="parliamentary_constituency_weights.h5", - constituency_csv_path="constituencies_2024.csv", - year="2025", -) - -for c in impact.constituency_results: - print(f"{c['constituency_name']}: " - f"avg change={c['average_household_income_change']:+,.0f}") -``` - -**Result fields per constituency:** -- `constituency_code`, `constituency_name`: Identifiers -- `x`, `y`: Hex map coordinates -- `average_household_income_change`, `relative_household_income_change` -- `population`: Weighted household count - -### LocalAuthorityImpact (UK) - -Works identically to `ConstituencyImpact` but for local authorities (360 x N_households weight matrix). - -```python -from policyengine.outputs.local_authority_impact import ( - compute_uk_local_authority_impacts, -) - -impact = compute_uk_local_authority_impacts( - baseline_simulation=baseline_sim, - reform_simulation=reform_sim, - weight_matrix_path="local_authority_weights.h5", - local_authority_csv_path="local_authorities_2024.csv", - year="2025", -) -``` - -## Using regions with `economic_impact_analysis()` - -Scoping strategies compose naturally with the full analysis pipeline: - -```python -from policyengine.core.scoping_strategy import RowFilterStrategy - -# State-level analysis -baseline_sim = Simulation( - dataset=dataset, - tax_benefit_model_version=pe.us.model, - scoping_strategy=RowFilterStrategy( - variable_name="state_fips", - variable_value=6, # California FIPS code - ), -) -reform_sim = Simulation( - dataset=dataset, - tax_benefit_model_version=pe.us.model, - policy=reform, - scoping_strategy=RowFilterStrategy( - variable_name="state_fips", - variable_value=6, # California FIPS code - ), -) - -# Full analysis scoped to California -analysis = economic_impact_analysis(baseline_sim, 
reform_sim) -``` diff --git a/docs/regions.md b/docs/regions.md new file mode 100644 index 00000000..9aa30e3b --- /dev/null +++ b/docs/regions.md @@ -0,0 +1,147 @@ +--- +title: "Regional analysis" +--- + +Sub-national breakdowns: state / district filters on any output, plus dedicated classes for US congressional districts and UK constituencies / local authorities. + +## US states + +`state_code` is an Enum variable on every household (values `"CA"`, `"TX"`, ...). Pass it as a filter on any `Aggregate` or `ChangeAggregate`: + +```python +from policyengine.outputs import Aggregate, AggregateType + +ca_snap = Aggregate( + simulation=baseline, + variable="snap", + aggregate_type=AggregateType.SUM, + filter_variable="state_code", + filter_variable_eq="CA", +) +ca_snap.run() +``` + +Each state is a region in the US registry, with its own dataset: + +```python +states = pe.us.model.region_registry.get_by_type("state") +for region in states: + print(region.code, region.label, region.dataset_path) +``` + +For state-specific datasets (rather than filtering a national one), pass `scoping_strategy=region.scoping_strategy` or resolve the dataset path directly. + +## US congressional districts + +```python +from policyengine.outputs import compute_us_congressional_district_impacts + +impacts = compute_us_congressional_district_impacts( + baseline_simulation=baseline, + reform_simulation=reform, +) +for row in impacts.district_results: + print(row["district_geoid"], row["avg_change"], row["winner_percentage"]) +``` + +`district_geoid` is the SSDD integer (state FIPS × 100 + district number). Requires a dataset with `congressional_district_geoid` populated — the default enhanced CPS does. 
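Because `district_geoid` packs two numbers into one integer, splitting it back out is a single `divmod`. The helper name below is hypothetical and not part of the package.

```python
def split_district_geoid(geoid: int) -> tuple:
    """Split an SSDD geoid into (state_fips, district_number)."""
    state_fips, district_number = divmod(geoid, 100)
    return state_fips, district_number


state_fips, district_number = split_district_geoid(601)
print(f"{state_fips:02d}-{district_number:02d}")  # 06-01 (state FIPS 6 is California)
```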
+ +## UK parliamentary constituencies + +Constituency-level impacts reweight every household to each constituency's demographic profile using a pre-computed weight matrix, so both the weight file and a constituency metadata CSV are required inputs: + +```python +from policyengine.outputs import compute_uk_constituency_impacts + +impacts = compute_uk_constituency_impacts( + baseline_simulation=baseline, + reform_simulation=reform, + weight_matrix_path="parliamentary_constituency_weights.h5", + constituency_csv_path="constituencies_2024.csv", + year="2025", +) +impacts.constituency_results +``` + +## UK local authorities + +```python +from policyengine.outputs import compute_uk_local_authority_impacts + +impacts = compute_uk_local_authority_impacts( + baseline_simulation=baseline, + reform_simulation=reform, + weight_matrix_path="local_authority_weights.h5", + local_authority_csv_path="local_authorities_2021.csv", + year="2025", +) +impacts.local_authority_results +``` + +## Region registries + +`pe.us.model.region_registry` and `pe.uk.model.region_registry` enumerate supported sub-national units: + +```python +pe.us.model.region_registry.get_by_type("state") +pe.us.model.region_registry.get_by_type("congressional_district") + +pe.uk.model.region_registry.get_by_type("constituency") +pe.uk.model.region_registry.get_by_type("local_authority") +``` + +Other helpers: `.get(code)` resolves a single region, `.get_children(parent_code)` returns sub-regions, `.get_national()` returns the national region. 
+ +## Custom geographies + +For a geography not covered by the built-in classes, compute the underlying variables via `Simulation.run()` and group yourself: + +```python +import pandas as pd + +baseline.run() +reform.run() + +baseline_hh = baseline.output_dataset.data.household +reform_hh = reform.output_dataset.data.household + +df = pd.DataFrame({ + "baseline": baseline_hh["household_net_income"].values, + "reform": reform_hh["household_net_income"].values, + "geo": baseline_hh["custom_geography_id"].values, + "weight": baseline_hh["household_weight"].values, +}) + +df["change"] = df["reform"] - df["baseline"] +df.groupby("geo").apply(lambda g: (g["change"] * g["weight"]).sum() / g["weight"].sum()) +``` + +## Scoping datasets to a region + +For reforms defined only over a sub-national slice, pass a scoping strategy to `Simulation`. `RowFilterStrategy` keeps only matching households; `WeightReplacementStrategy` reweights the full sample to represent the region. + +```python +from policyengine.core.scoping_strategy import RowFilterStrategy + +baseline = Simulation( + dataset=dataset, + tax_benefit_model_version=pe.us.model, + scoping_strategy=RowFilterStrategy( + variable_name="state_code", + variable_value="CA", + ), +) +``` + +Regions that filter (US places, UK countries, and any region with `region.requires_filter == True`) carry their own `scoping_strategy`. Pull it off the region object rather than reconstructing it: + +```python +nyc = pe.us.model.region_registry.get("place/NY-51000") +baseline = Simulation( + dataset=dataset, + tax_benefit_model_version=pe.us.model, + scoping_strategy=nyc.scoping_strategy, +) +``` + +US states and congressional districts don't use a scoping strategy — they point to dedicated state- or district-specific datasets via `region.dataset_path`. Pass that dataset to `Simulation` instead.
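
The difference between row filtering and weight replacement is easiest to see on a toy table. The sketch below uses plain Python rather than the `RowFilterStrategy` and `WeightReplacementStrategy` classes themselves; the households and both sets of weights are invented for illustration.

```python
# Toy households: (state_code, household_net_income, national_weight).
households = [
    ("CA", 50_000.0, 100.0),
    ("TX", 40_000.0, 200.0),
    ("CA", 80_000.0, 300.0),
]


def weighted_mean(values, weights):
    """Weighted mean of `values` under `weights`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Row filtering: drop non-matching households, keep their original weights.
kept = [h for h in households if h[0] == "CA"]
filtered_mean = weighted_mean([h[1] for h in kept], [h[2] for h in kept])

# Weight replacement: keep every household, swap in region-specific weights.
# These replacement weights are made up for illustration.
region_weights = [10.0, 5.0, 25.0]
reweighted_mean = weighted_mean([h[1] for h in households], region_weights)

print(filtered_mean, reweighted_mean)
```

Filtering shrinks the sample to the matching rows, while reweighting keeps every row and changes how much each one counts, which is why the two approaches generally give different regional estimates.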