Add policyengine.graph and reference-generator prototype#306

Merged
MaxGhenis merged 2 commits into main from variable-graph-and-reference-generator
Apr 20, 2026

@MaxGhenis
Contributor

Part 2 of 3 (A: toolchain swap → B: graph + generator → C: docs rewrite) per PR #301's review feedback.

Scope

  1. policyengine.graph subpackage — static AST-based variable dependency extractor for PolicyEngine source trees.
  2. docs/_generator/build_reference.py — reference page generator using country-model metadata.
  3. New [graph] optional extra so networkx doesn't bloat base install.

policyengine.graph

Pure static analysis — walks a directory of .py files, picks out class Foo(Variable): definitions, and extracts edges from formula-method bodies. Recognized patterns:

  • <entity>("<var>", <period>) — direct calls on person, tax_unit, spm_unit, household, family, marital_unit, benunit.
  • add(<entity>, <period>, ["v1", "v2", ...]) — sum-helper list.

Because it never imports user code, it works on any PolicyEngine source tree regardless of whether the jurisdiction is installed. Useful for refactor-impact analysis, CI pre-merge checks, docs generation, and agent-session introspection (where the country packages may not be importable in the sandbox).
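The two recognized patterns can be illustrated with a short standalone sketch built on the standard-library `ast` module. This is not the code in `policyengine.graph`; the names `extract_edges`, `ENTITY_NAMES`, and `EXAMPLE` are hypothetical, and treating any method whose name starts with `formula` as a formula method is an assumption.

```python
import ast

# Entity names listed in the PR description.
ENTITY_NAMES = {
    "person", "tax_unit", "spm_unit", "household",
    "family", "marital_unit", "benunit",
}


def _is_variable_class(node):
    """True for `class Foo(Variable):` definitions."""
    return isinstance(node, ast.ClassDef) and any(
        isinstance(base, ast.Name) and base.id == "Variable"
        for base in node.bases
    )


def _string_literal(node):
    """Return the value of a plain string literal, else None."""
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return node.value
    return None


def extract_edges(source):
    """Map each Variable class name to the variable names its formulas reference."""
    edges = {}
    for cls in ast.walk(ast.parse(source)):
        if not _is_variable_class(cls):
            continue
        deps = edges.setdefault(cls.name, set())
        for method in cls.body:
            # Assumption: formula methods are named `formula` or `formula_<suffix>`.
            if not (isinstance(method, ast.FunctionDef)
                    and method.name.startswith("formula")):
                continue
            for call in ast.walk(method):
                if not (isinstance(call, ast.Call)
                        and isinstance(call.func, ast.Name)):
                    continue
                # Pattern 1: <entity>("<var>", <period>)
                if call.func.id in ENTITY_NAMES and call.args:
                    name = _string_literal(call.args[0])
                    if name is not None:
                        deps.add(name)
                # Pattern 2: add(<entity>, <period>, ["v1", "v2", ...])
                elif call.func.id == "add" and len(call.args) >= 3:
                    third = call.args[2]
                    if isinstance(third, ast.List):
                        deps.update(
                            n for n in map(_string_literal, third.elts)
                            if n is not None
                        )
    return edges


# The source is only parsed, never executed, so `Variable` and `add`
# do not need to be importable.
EXAMPLE = '''
class net_income(Variable):
    def formula(household, period, parameters):
        gross = household("gross_income", period)
        taxes = add(household, period, ["income_tax", "payroll_tax"])
        return gross - taxes
'''

edges = extract_edges(EXAMPLE)
print(sorted(edges["net_income"]))  # ['gross_income', 'income_tax', 'payroll_tax']
```

Because the source text is parsed rather than imported, the sketch runs without any country package installed, which is the property the paragraph above relies on.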

Limitations (v2 targets)

  • Parameter references not captured.
  • Dynamic variable names (f-strings, etc.) skipped.
  • entity.sum("var") method calls not recognized.
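The last two limitations follow from how Python parses these forms. The sketch below assumes a matcher that looks for a bare-name callee with a plain string literal as its first argument; whether the real extractor is written exactly that way is an assumption, and `call_shape` is a hypothetical helper.

```python
import ast


def call_shape(expr):
    """Parse one call expression; report the node types of its callee and first argument."""
    call = ast.parse(expr, mode="eval").body
    return type(call.func).__name__, type(call.args[0]).__name__


# Recognized form: bare-name callee, plain string literal argument.
print(call_shape('person("employment_income", period)'))  # ('Name', 'Constant')

# Dynamic name: an f-string parses to JoinedStr, so no literal name is available.
print(call_shape('person(f"{prefix}_income", period)'))   # ('Name', 'JoinedStr')

# Method call: the callee is an Attribute node, not a Name.
print(call_shape('household.sum("employment_income")'))   # ('Attribute', 'Constant')
```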

Reference generator

docs/_generator/build_reference.py introspects a country model's TaxBenefitSystem and writes one .qmd page per variable — metadata (entity, value type, unit, period, defined_for), documentation, adds / subtracts decomposition, statutory references, source file path.

Also emits a program-coverage page from programs.yaml. Quarto's built-in directory listings handle the per-subtree index pages automatically.
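As a rough illustration of the per-variable page step, the sketch below renders a metadata dict to `.qmd` text and writes it to a temporary directory. It does not reproduce `build_reference.py`: the function `render_variable_page`, the dict keys, the page layout, and every value in `example` are hypothetical.

```python
from pathlib import Path
import tempfile


def render_variable_page(meta):
    """Render one variable's metadata dict as the text of a .qmd page."""
    # YAML front matter carries the page title.
    lines = ["---", f'title: "{meta["name"]}"', "---", ""]
    if meta.get("documentation"):
        lines += [meta["documentation"], ""]
    # Scalar metadata goes into a two-column table; missing fields are skipped.
    lines += ["| Field | Value |", "|---|---|"]
    for key in ("entity", "value_type", "unit", "period", "defined_for", "source_file"):
        if meta.get(key) is not None:
            lines.append(f"| {key} | `{meta[key]}` |")
    # List-valued metadata gets one section per non-empty list.
    for key in ("adds", "subtracts", "references"):
        if meta.get(key):
            lines += ["", f"## {key.capitalize()}", ""]
            lines += [f"- {item}" for item in meta[key]]
    return "\n".join(lines) + "\n"


# Made-up metadata for illustration; not taken from any country model.
example = {
    "name": "example_benefit",
    "entity": "spm_unit",
    "value_type": "float",
    "unit": "currency-USD",
    "period": "year",
    "documentation": "Hypothetical benefit used to illustrate the page layout.",
    "adds": ["example_component_a", "example_component_b"],
    "source_file": "variables/example/example_benefit.py",
}

with tempfile.TemporaryDirectory() as out_dir:
    page_path = Path(out_dir) / "example_benefit.qmd"
    page_path.write_text(render_variable_page(example), encoding="utf-8")
    print(page_path.read_text(encoding="utf-8"))
```

In the real generator the metadata comes from the imported country model rather than a hand-written dict.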

Against a CHIP subset of policyengine-us, the generator produces 34 variable pages, 1 programs page, and 56 directory indices in under a second, and Quarto compiles all of them cleanly.

Optional extra

pip install policyengine[graph]

networkx is only imported when the user explicitly imports policyengine.graph. Missing networkx raises a clear ImportError pointing at the install command.
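A common way to get this behavior is a guarded import that re-raises with an install hint. The sketch below shows that general pattern, not the code in `policyengine.graph`: `require_networkx`, `INSTALL_HINT`, and the message wording are hypothetical, and the `module_name` parameter exists only so the failure path can be exercised without uninstalling networkx.

```python
import importlib

# Hypothetical wording; the actual message in policyengine.graph may differ.
INSTALL_HINT = (
    "policyengine.graph requires networkx. "
    "Install it with: pip install policyengine[graph]"
)


def require_networkx(module_name="networkx"):
    """Import the optional dependency, or raise an ImportError naming the extra."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(INSTALL_HINT) from exc


# Exercise the failure path with a module name that does not exist.
try:
    require_networkx("module_that_is_not_installed_for_this_demo")
except ImportError as err:
    print(err)
```

Calling such a helper only inside the graph subpackage keeps `import policyengine` working on a base install.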

Testing

  • 9/9 tests in tests/test_graph/test_extractor.py pass locally
  • Tests use synthetic source-tree fixtures; no dependency on a live country model
  • Generator manually verified end-to-end against policyengine-us

Dependency on PR A

The generator writes .qmd files, so it assumes Quarto is the docs toolchain (PR #304). There is no technical blocker, since the .qmd extension is just a label, but reviewing in the order A → B is cleaner.

Test plan

  • Graph unit tests pass
  • Generator runs against policyengine-us end-to-end
  • pip install policyengine[graph] pulls networkx; a base install without the extra still imports policyengine successfully (the ImportError is raised only at from policyengine.graph import ...)
  • CI passes

Two related additions behind one new optional extra.

### policyengine.graph

New subpackage for querying PolicyEngine Variable dependency structure
by AST-walking source trees. No runtime dependency on country models —
the extractor is pure static analysis, so it works on any
`policyengine-us` / `policyengine-uk` checkout (or fork) regardless of
whether the jurisdiction is installed. Particularly useful in agent
sessions where the country packages may not be importable in the
sandbox.

Recognized reference patterns in v1:

- `<entity>("<var>", <period>)` calls on entity Names
  (`person`, `tax_unit`, `spm_unit`, `household`, `family`,
  `marital_unit`, `benunit`).
- `add(<entity>, <period>, ["v1", "v2", ...])` sum-helper list.

Limitations noted in module docstrings:

- Parameter references not yet captured (v2).
- Dynamic variable names skipped (low prevalence).
- `entity.sum("var")` method calls not yet recognized (v2).

### Reference generator prototype

`docs/_generator/build_reference.py` walks a country model's
`TaxBenefitSystem` and writes one `.qmd` page per variable grouped by
its parameter-tree path. Also emits a program-coverage page from
`programs.yaml`. The generator reads everything from the imported
country model — no web API calls, no cached JSON — which keeps the
build offline-reproducible and pinned to whatever country model
version the `policyengine` package has installed.

Run against a CHIP subset of `policyengine-us`, the generator emits
34 variable pages + 1 programs page + 56 directory indices in under
a second; Quarto compiles all of them cleanly.

### Optional extra

`pip install policyengine[graph]` pulls in networkx; base install
stays lean. `policyengine.graph.graph` raises an informative
`ImportError` when networkx is missing, pointing at the extra.

### Testing

9/9 graph extractor tests pass (`tests/test_graph/`). Tests use
synthetic source-tree fixtures; no dependency on a live country model.
@MaxGhenis merged commit 9310339 into main on Apr 20, 2026
12 checks passed