PolicyEngine · anth-volk · Apr 18, 2026
diff --git a/.github/review/global.md b/.github/review/global.md
@@ -0,0 +1,21 @@
+# PR Review Instructions
+
+Primary goal: identify bugs, regressions, missing tests, contract drift, scope drift, and hidden operational risk.
+
+Review rules:
+- Findings first. Do not lead with summary or praise.
+- Prioritize behavior, correctness, migration boundaries, and release risk over style.
+- Ignore purely cosmetic issues unless they hide a behavioral problem.
+- Distinguish direct evidence from inference.
+- Be explicit about blind spots such as unrun tests, missing optional dependencies, or unclear runtime context.
+
+Required structure:
+- `Severity`: `high`, `medium`, or `low`
+- `Confidence`: `high`, `medium`, or `low`
+- `Basis`: `direct_code_evidence`, `test_evidence`, `inference`, or `missing_context`
+- `Why it matters`
+- `Suggested fix`
+
+Use `.github/review/segments/priority_and_confidence.md` for the detailed severity and confidence rubric.
+
+If there are no findings, say so explicitly and still list residual risks or blind spots.
diff --git a/.github/review/segments/general.md b/.github/review/segments/general.md
@@ -0,0 +1,11 @@
+# General Review Segment
+
+Check for:
+- obvious bugs and behavioral regressions
+- changed control flow that no longer matches caller expectations
+- signature drift between callers and callees
+- data path mistakes, especially path handling, identifiers, and selection logic
+- missing or misleading validation
+- missing unit coverage for newly introduced logic
+
+Bias toward concise, actionable findings. Do not manufacture issues to fill space.
diff --git a/.github/review/segments/priority_and_confidence.md b/.github/review/segments/priority_and_confidence.md
@@ -0,0 +1,36 @@
+# Priority and Confidence Segment
+
+Use this segment to classify both finding priority and confidence level consistently.
+
+## Priority rubric
+
+Classify each finding's priority (reported as `Severity` in the required structure) as `high`, `medium`, or `low`.
+
+- `high`
+  A likely merge blocker. The issue can cause incorrect behavior, runtime failure, broken contracts, artifact corruption, publication mistakes, or materially misleading output.
+- `medium`
+  Important, but not always a blocker. The issue can plausibly cause regressions, maintenance traps, incomplete migrations, or missing coverage around meaningful new behavior.
+- `low`
+  Real but limited impact. The issue is worth fixing, but it is unlikely to cause immediate user-facing failure or operational damage.
+
+If a concern is merely stylistic or speculative, do not promote it into a finding.
+
+## Confidence rubric
+
+Classify each finding as `high`, `medium`, or `low`.
+
+- `high`
+  Directly supported by code in the diff, surrounding code, or executed tests.
+- `medium`
+  Strong inference from the code path, but not fully validated by execution or complete context.
+- `low`
+  Plausible concern, but evidence is incomplete or significant context is missing.
+
+Also state the basis for the finding:
+
+- `direct_code_evidence`
+- `test_evidence`
+- `inference`
+- `missing_context`
+
+When confidence is not `high`, briefly say what is missing.
diff --git a/.github/review/segments/staged_prs.md b/.github/review/segments/staged_prs.md
@@ -0,0 +1,17 @@
+# Staged PR Review Segment
+
+This repository often uses staged migration PRs with narrow scope limits.
+
+Check for:
+- scope drift beyond the intended phase
+- contract breaks across staged seams
+- compatibility regressions in dual-path or legacy-adapter code
+- accidental schema or artifact format changes
+- conflicting implementations that should have one clear owner
+- code landing in the wrong layer, such as orchestration absorbing domain logic
+
+Call out whether each finding is:
+- a true merge blocker
+- a follow-up that can wait
+
+If the PR looks intentionally transitional, say so, but still flag broken boundaries.
diff --git a/.github/review/segments/testing.md b/.github/review/segments/testing.md
@@ -0,0 +1,12 @@
+# Testing Review Segment
+
+Check whether the PR adds focused tests for the new behavior it introduces.
+
+Look for:
+- direct unit coverage for newly added branch logic
+- overreliance on broad integration tests when a narrow unit test would be clearer
+- tests that are brittle because they depend on ambient environment state
+- module-reload or monkeypatch patterns that can poison the rest of the suite
+- new code paths with no test exercising them
+
+If coverage is partial, say which production files or behaviors remain uncovered.
diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1,32 @@
+# Codex Instructions
+
+These instructions apply repository-wide.
+
+## PR review workflow
+
+When the task is a pull request review:
+
+1. Read `.github/review/global.md`.
+2. Always read:
+   - `.github/review/segments/general.md`
+   - `.github/review/segments/priority_and_confidence.md`
+3. Inspect the changed files and selectively read these additional segments:
+   - `.github/review/segments/staged_prs.md`
+     Use when the PR touches staged-migration areas such as `modal_app/local_area.py`, `modal_app/worker_script.py`, `modal_app/pipeline.py`, `policyengine_us_data/calibration/local_h5/`, or `policyengine_us_data/calibration/validate_staging.py`.
+   - `.github/review/segments/testing.md`
+     Use when the PR changes production code or tests.
+4. Prioritize bugs, regressions, contract drift, scope drift, and missing tests.
+5. Present findings first.
+6. For every finding, include:
+   - severity
+   - confidence
+   - basis
+   - why it matters
+   - suggested fix
+7. If there are no findings, say so explicitly and still mention blind spots.
+
+## General engineering expectations
+
+- Prefer direct evidence over speculation.
+- Flag missing execution context when confidence is limited.
+- Focus on behavior and operational risk before style.
diff --git a/changelog.d/796.added.md b/changelog.d/796.added.md
@@ -0,0 +1 @@
+Add repo-native Codex PR review instruction files for experimental pull request review guidance.
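Review note: the required finding structure in `.github/review/global.md` and the rubric in `priority_and_confidence.md` could be checked mechanically. The following is only a minimal sketch of that idea, not part of this PR. The names `Finding`, `validate_finding`, and the `missing` field are hypothetical; the allowed values are copied from the instruction files above.

```python
from dataclasses import dataclass

# Allowed values, taken from the "Required structure" list in global.md.
LEVELS = {"high", "medium", "low"}
BASES = {"direct_code_evidence", "test_evidence", "inference", "missing_context"}


@dataclass
class Finding:
    """Hypothetical container for one review finding."""

    severity: str
    confidence: str
    basis: str
    why_it_matters: str
    suggested_fix: str
    # Assumption: a free-text note for "When confidence is not `high`,
    # briefly say what is missing."
    missing: str = ""


def validate_finding(finding: Finding) -> list[str]:
    """Return a list of rubric violations; an empty list means the finding is well-formed."""
    problems = []
    if finding.severity not in LEVELS:
        problems.append(f"severity must be one of {sorted(LEVELS)}")
    if finding.confidence not in LEVELS:
        problems.append(f"confidence must be one of {sorted(LEVELS)}")
    if finding.basis not in BASES:
        problems.append(f"basis must be one of {sorted(BASES)}")
    if not finding.why_it_matters.strip():
        problems.append("why_it_matters is empty")
    if not finding.suggested_fix.strip():
        problems.append("suggested_fix is empty")
    # The rubric asks for a short note on what is missing when confidence is below high.
    if finding.confidence in {"medium", "low"} and not finding.missing.strip():
        problems.append("confidence is not high: say what is missing")
    return problems


example = Finding(
    severity="medium",
    confidence="medium",
    basis="inference",
    why_it_matters="Callers may pass the old argument order.",
    suggested_fix="Add a unit test covering the new signature.",
    missing="Tests were not run locally.",
)
print(validate_finding(example))
```

A check like this would only enforce the shape of a finding. Whether the severity and confidence labels are justified still depends on the reviewer applying the rubric.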