Declarative country spec schema with Belgium as its first consumer#269

Merged: MaxGhenis merged 3 commits into main from country-spec-schema on Jul 2, 2026

@MaxGhenis

Fixes #261. Part of #259 (populace-be epic); the concrete first step of #159/#160, built greenfield for Belgium on the conventions the repo already enforces (country_package.json, spec-only country packages, manifest-defined stages, Ledger references).

What

populace.build.country_spec — one loader for a spec-only country package, returning a validated, content-hashed CountrySpec:

  • Existing manifests load unchanged: source_stages.json (SourceManifest), support_spine.json (SupportSpineManifest); the US and UK packages load through the same loader (tested).
  • Three new typed resources, each a shared schema class with the repo's from_mapping validation style:
    • geography_spine.json (GeographySpineManifest) — clone-and-assign spine as data: geography level, code system and vintage, pool multiplier, collision avoidance, assignment source citation. vintage_policy admits only "error" — a commune-grain target bound to one code vintage joined against a spine of another must fail compilation, never silently partial-join (the lesson of #205, "Translate old-vintage CD targets to current district geography", schema-enforced).
    • gates.json (GatesManifest) — gate selection from the gates.py vocabulary with release_blocking/diagnostic criticality per selection; unknown gate functions are refused, and a package whose every gate is diagnostic is refused (no release contract).
    • release_contract.json (ReleaseContractManifest) — artifact repo/staging repo, dataset filename template, required release files, public/private boundary, licence. restricted licence ⇒ private repo is enforced at parse, and ordinal version tokens (-v1, _v2) anywhere in names are refused (data_build_id + HF revisions are the versioning).
  • target_references.json rows parse as LedgerTargetReference (the exact class the US fiscal references use), and any row carrying an observed-value key (value, values, observed…) is refused with "values live in Ledger" — targets are references, structurally.
  • country_stage_plan(spec, implementations) compiles source stages + the spine stage into a StagePlan with the US plan's no-fallback posture: every declared stage needs an implementation, unknown implementations are refused, and each stage carries its manifest citation as the donor record.
  • Content addressing: every resource is sha256-hashed at load; CountrySpec.fingerprint composes them (trace.compute_composition_fingerprint), so a release manifest can name exactly the spec content that built it.
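
A minimal sketch of the content-addressing idea in the last bullet, under stated assumptions: the real composition lives in trace.compute_composition_fingerprint, whose algorithm is not shown in this PR, so compose_fingerprint below is a hypothetical stand-in that hashes the sorted (resource name, digest) pairs. The helper names and sample bytes are illustrative only.

```python
import hashlib
import json


def resource_sha256(raw: bytes) -> str:
    """Hash the exact bytes of one resource file."""
    return hashlib.sha256(raw).hexdigest()


def compose_fingerprint(resource_hashes: dict[str, str]) -> str:
    """Hypothetical composition: hash the sorted (name, digest) pairs.

    Sorting makes the result independent of load order; the real
    trace.compute_composition_fingerprint may compose differently.
    """
    canonical = json.dumps(sorted(resource_hashes.items()), separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Illustrative resource bytes, not the real BE package content.
resources = {
    "gates.json": b'{"gates": []}',
    "release_contract.json": b'{"licence": "restricted"}',
}
hashes = {name: resource_sha256(raw) for name, raw in resources.items()}
fingerprint = compose_fingerprint(hashes)
```

Because each digest covers raw bytes, editing any byte of any resource changes the composed fingerprint, which is the property the golden-file test relies on.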

be/ country package — the first full consumer, pure data (zero .py, enforced by the existing test_spec_only_country_packages.py which now covers it automatically):

  • source_stages.json — the BE-SILC stage against real EU-SILC register names (DB030/RB030 ids, DB040 region, DB090 weights, PY010G → the Axiom article-23 worker-remuneration input), with the survey-year/income-reference-year lag declared as a first-class operation (declare_income_reference_offset: -1) so the build's period bookkeeping and the target profiles bind to the same basis (feeds ledger#71).
  • geography_spine.json — commune NIS spine, 2025 vintage, 20× clone pool, NUTS1-constrained assignment (SILC carries region; the spine assigns the commune), collision avoidance; generalizes the UK OA rowwise operators (implementation lands with #263, "Build the Belgian calibration surface by recalibrating the US release recipe").
  • target_references.json — six initial families by reference (Statbel demography, Statbel commune fiscal income @ NIS 2025 diagnostic-only, SPF PIT total against the Axiom settlement output, ONSS contributions, ONEM caseloads, NBB validation-tier anchor), source names bound to the planned ledger-be packages (ledger#69/#70 own the facts and the full profile).
  • gates.json — release-blocking national+NUTS1 posture with commune rows diagnostic (the CD posture of #204, "Replace all remaining policyengine-us-data artifacts with Populace"); the incumbent-comparison gates (parity/export_surface/target_surface) are deliberately not selected — Belgium has no incumbent; external oracles replace self-parity (#264, "Validate populace-be against EUROMOD-BE and Federal Planning Bureau scores").
  • release_contract.json — policyengine/populace-be-private + staging repo, country-neutral source_coverage.json naming (per #265, "Stand up the populace-be release channel"), reform_validation.json in the required set, restricted-licence boundary.

Two general operation kinds join the source-manifest vocabulary: map_columns and declare_income_reference_offset (general operators, not country escape hatches).

Acceptance, as verified

  • packages/populace-build/src/populace/build/be/ contains zero .py — and the standing spec-only tests now enforce that for be/ with no test changes.
  • Golden-file test: the loaded spec — stage order, gate ids, release contract, and the sha256 of every resource — is byte-compared against tests/golden/be_country_spec.json; editing any BE spec byte fails the test until the golden is regenerated and reviewed. Deterministic across loads (fingerprint-stability test).
  • The plan compiler refuses to assemble on any missing or unknown stage implementation (tests mirror test_us_plan).
  • Refusal suite: undeclared files on disk, missing declared resources, country mismatches, unknown gate functions, all-diagnostic gate sets, smuggled target values, restricted-licence-public-repo, ordinal tokens, non-error vintage policy — each named in its error message.
  • Third-country story: a new country is mkdir + five JSON files copied and edited; the loader, plan compiler, hashes, and enforcement tests need no changes (the US/UK-load tests are the proof).
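
Two of the gate refusals in the suite above can be sketched as follows. This is an illustration under stated assumptions, not the GatesManifest code: the gate names in KNOWN_GATES, the row keys ("gate", "criticality"), and the error wording are all hypothetical; only the release_blocking/diagnostic vocabulary and the two refusal rules come from this PR.

```python
from dataclasses import dataclass

# Hypothetical allowlist; the real one mirrors gates.py __all__.
KNOWN_GATES = frozenset({"gate_a", "gate_b"})
CRITICALITIES = frozenset({"release_blocking", "diagnostic"})


@dataclass(frozen=True)
class GateSelection:
    gate: str
    criticality: str


def parse_gate_selections(rows: list[dict]) -> tuple[GateSelection, ...]:
    """Validate gate rows, naming the offending item in each error."""
    selections = []
    for row in rows:
        gate, criticality = row["gate"], row["criticality"]
        if gate not in KNOWN_GATES:
            raise ValueError(f"unknown gate function: {gate!r}")
        if criticality not in CRITICALITIES:
            raise ValueError(f"unknown criticality for {gate!r}: {criticality!r}")
        selections.append(GateSelection(gate, criticality))
    # A package with no release-blocking gate has no release contract.
    if not any(s.criticality == "release_blocking" for s in selections):
        raise ValueError("no release_blocking gate selected: no release contract")
    return tuple(selections)
```

Raising at parse time keeps the refusal in the loader, so a malformed package never reaches plan compilation.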

Full populace-build suite + frame + data pass locally; ruff clean.

🤖 Generated with Claude Code

MaxGhenis and others added 2 commits July 2, 2026 01:15
…umer

populace.build.country_spec loads a spec-only country package as one
validated, content-hashed object: the existing source/support-spine
manifests plus three new typed resources — geography_spine.json
(clone-and-assign, vintage-aware geography codes; mismatch is a compile
error per the populace#205 lesson), gates.json (gate selection from the
gates.py vocabulary with release_blocking/diagnostic criticality), and
release_contract.json (repos, artifact set, data_build_id naming with
ordinal tokens refused, licence boundary with restricted => private
enforced). target_references.json rows parse as LedgerTargetReference
and any observed-value key is refused — values live in Ledger.
country_stage_plan() compiles the manifests into a StagePlan with the
US plan's no-fallback posture, and every resource byte is pinned by a
sha256 in the golden-file test.

The be/ package is the first full consumer: BE-SILC source stage (real
EU-SILC register names, income-reference-year lag declared as data),
commune NIS spine (2025 vintage, NUTS1-constrained assignment,
collision avoidance), six initial target-reference families bound to
the planned ledger-be packages, a gate selection that deliberately
omits the incumbent-comparison gates (external oracles replace
self-parity), and the private-repo release contract. Zero Python in
the country folder; the US and UK packages load through the same
loader unchanged.

Two general operation kinds join the source-manifest vocabulary:
map_columns and declare_income_reference_offset.

Fixes #261. Part of #259.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…, recursive value guard

- Replace belgium_pit_final_income_tax_payable (does not exist in
  rulespec-be) with belgium_pit_federal_and_local_tax_before_withholding,
  the actual assessed-total liability output (Article 134 federal tax
  plus Articles 465-468 local additions), matching the SPF Finances
  "assessed total" series the target anchors.
- Tighten the ordinal-token guard: lookahead instead of \b so embedded
  tokens (populace_xx_v2_staging) are refused, while sha-v2x and nuts1
  stay benign. Loader-level tests for both sides.
- Recurse the observed-value guard: a value key nested under
  ledger_selector or metadata is refused like a top-level one.
- Regenerate the Belgian golden (fingerprint acc6e4b7bcbe).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@MaxGhenis

Adversarial re-review (re-run of the review that died at last night's session limit)

Reviewed at 0ffbb7f against the 10-probe spec (full diff read, protocol/schema semantics, EU-SILC and rulespec-be name verification, schema-hole hunt, suite + ruff + US/UK load checks). Verdict: merge-clean after fixes — one factual defect and two guards weaker than described, all fixed in 1c290a9.

Findings → fixes

  1. Fabricated engine variable (fixed). be/target_references.json declared "measure": "belgium_pit_final_income_tax_payable" — that variable does not exist in rulespec-be (verified at 7f201e0: 0 matches). When ledger#70 wires the profile, the release-blocking PIT anchor would reference a non-existent column and fail downstream while CI here stays green. Replaced with belgium_pit_federal_and_local_tax_before_withholding (be/statutes/income_tax/individual/final_tax.yaml:37 — Article 134 federal tax + Articles 465–468 local additions), which matches the "Personal income tax: assessed total" series the row anchors. The other three engine names in the spec are real (verified: belgium_pit_article_23_worker_remuneration, belgium_pit_taxable_income, belgium_worker_article_17_uncapped_component_contribution).
  2. Ordinal-token guard missed embedded tokens (fixed). [-_]v\d+\b fails when the token is followed by _ or a letter, so populace_be_v2_staging and artifact_v2_repo evaded the refusal — the most natural place for an ordinal in _-separated names. Now [-_]v\d+(?=[-_.]|$): embedded tokens refused, sha-v2x/nuts1/dates stay benign. Loader-level tests added for both directions.
  3. No-value guard was top-level only (fixed). {"ledger_selector": {..., "value": 99999.0}} loaded without refusal (verified by execution). Mitigated in practice — calibration values come exclusively from the resolved Ledger fact, so a smuggled value was inert — but the guard now recurses so the structural claim is true. Test added.
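
Findings 2 and 3 can be sketched together. The two regular expressions are the ones quoted in finding 2; everything else is an assumption for illustration: the function names, the error wording beyond "values live in Ledger", and the OBSERVED_KEYS set (the PR elides the full key list as "value, values, observed…", so the prefix check here is a guess at its shape).

```python
import re

# Patterns quoted in finding 2: the old guard and its replacement.
OLD_ORDINAL = re.compile(r"[-_]v\d+\b")
NEW_ORDINAL = re.compile(r"[-_]v\d+(?=[-_.]|$)")

# Assumed key set; the full list is elided in the PR text.
OBSERVED_KEYS = frozenset({"value", "values"})


def has_ordinal_token(name: str) -> bool:
    """True when a name carries an ordinal version token such as _v2."""
    return NEW_ORDINAL.search(name) is not None


def find_observed_value_key(node, path=""):
    """Return the dotted path of the first observed-value key, or None.

    Recurses through nested mappings and lists, so a key under
    ledger_selector or metadata is found like a top-level one.
    """
    if isinstance(node, dict):
        for key, child in node.items():
            here = f"{path}.{key}" if path else key
            if key in OBSERVED_KEYS or key.startswith("observed"):
                return here
            found = find_observed_value_key(child, here)
            if found is not None:
                return found
    elif isinstance(node, list):
        for index, child in enumerate(node):
            found = find_observed_value_key(child, f"{path}[{index}]")
            if found is not None:
                return found
    return None


def refuse_observed_values(row: dict) -> None:
    """Raise when a target-reference row smuggles an observed value."""
    found = find_observed_value_key(row)
    if found is not None:
        raise ValueError(f"values live in Ledger: refusing key {found!r}")
```

The old pattern misses populace_be_v2_staging because `_` is a word character, so no `\b` boundary exists between `2` and `_`; the lookahead accepts only a separator or end of string, which is also why sha-v2x stays benign.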

Noted, no action needed

  • receives_unemployment_benefit / household_disposable_income measures are declared-future (not yet in rulespec-be); the row notes and file header already hedge this.
  • country_package.json's schema_version is never read by the loader (resource-level version is what's validated) — vestigial.
  • The importlib.resources to Path(str(...)) conversion would break under zip imports; nothing in this repo zip-imports.
  • .DS_Store fails the undeclared-file check, consistent with the existing spec-only test posture.
  • country_stage_plan sets consumes=() so the spine's dependency on region_nuts1 isn't a plan edge — acceptable while transforms are injected later.
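
On the importlib.resources note above: should zip imports ever matter, the standard library offers a zip-safe route. The sketch below is not the loader's code; it reads from the stdlib json package purely so that it runs anywhere, and read_resource_bytes is a hypothetical helper name.

```python
from importlib.resources import as_file, files


def read_resource_bytes(package: str, name: str) -> bytes:
    """Read a packaged resource without assuming it is a real file on disk."""
    return files(package).joinpath(name).read_bytes()


# The stdlib json package stands in for a country package here.
raw = read_resource_bytes("json", "__init__.py")

# When an API needs a real filesystem path, as_file provides one,
# extracting to a temporary file if the package is zip-imported.
with as_file(files("json").joinpath("__init__.py")) as real_path:
    size_on_disk = real_path.stat().st_size
```

Reading through the traversable avoids converting it to a string path, which is the step that fails when the package lives inside an archive.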

Clean probes

Gate allowlist is an exact match with gates.py __all__ (14/14). All seven EU-SILC names verified correct (DB030/RB030/DB040/DB090/RB080/RB090/PY010G). Golden was not stale and is regenerated (acc6e4b7bcbe). Source-manifest diff is exactly the two claimed operation kinds, both refusal-safe without handlers. Suite: 650 passed, 6 skipped post-fix; ruff clean; US (024315780c1f) and UK (a2d7e32fd348) load unchanged.

🤖 Generated with Claude Code

@MaxGhenis MaxGhenis merged commit de00cae into main Jul 2, 2026
4 checks passed
@MaxGhenis MaxGhenis deleted the country-spec-schema branch July 2, 2026 14:27
Development

Successfully merging this pull request may close these issues.

Define the declarative country spec schema with Belgium as its first consumer
