From 9595d2360cdf6f14a788f429d808e721a65ad0aa Mon Sep 17 00:00:00 2001
From: Anthony Volk <anth.volk@gmail.com>
Date: Sat, 18 Apr 2026 03:01:35 +0200
Subject: [PATCH] Add Codex PR review guidance docs

---
 .github/review/global.md                      | 21 +++++++++++
 .github/review/segments/general.md            | 11 ++++++
 .../segments/priority_and_confidence.md       | 36 +++++++++++++++++++
 .github/review/segments/staged_prs.md         | 17 +++++++++
 .github/review/segments/testing.md            | 12 +++++++
 AGENTS.md                                     | 32 +++++++++++++++++
 changelog.d/796.added.md                      |  1 +
 7 files changed, 130 insertions(+)
 create mode 100644 .github/review/global.md
 create mode 100644 .github/review/segments/general.md
 create mode 100644 .github/review/segments/priority_and_confidence.md
 create mode 100644 .github/review/segments/staged_prs.md
 create mode 100644 .github/review/segments/testing.md
 create mode 100644 AGENTS.md
 create mode 100644 changelog.d/796.added.md

diff --git a/.github/review/global.md b/.github/review/global.md
new file mode 100644
index 000000000..a653beb48
--- /dev/null
+++ b/.github/review/global.md
@@ -0,0 +1,21 @@
+# PR Review Instructions
+
+Primary goal: identify bugs, regressions, missing tests, contract drift, scope drift, and hidden operational risk.
+
+Review rules:
+- Findings first. Do not lead with summary or praise.
+- Prioritize behavior, correctness, migration boundaries, and release risk over style.
+- Ignore purely cosmetic issues unless they hide a behavioral problem.
+- Distinguish direct evidence from inference.
+- Be explicit about blind spots such as unrun tests, missing optional dependencies, or unclear runtime context.
+
+Required structure:
+- `Severity`: `high`, `medium`, or `low`
+- `Confidence`: `high`, `medium`, or `low`
+- `Basis`: `direct_code_evidence`, `test_evidence`, `inference`, or `missing_context`
+- `Why it matters`
+- `Suggested fix`
+
+Use `.github/review/segments/priority_and_confidence.md` for the detailed severity and confidence rubric.
+
+If there are no findings, say so explicitly and still list residual risks or blind spots.
diff --git a/.github/review/segments/general.md b/.github/review/segments/general.md
new file mode 100644
index 000000000..1499c8ad4
--- /dev/null
+++ b/.github/review/segments/general.md
@@ -0,0 +1,11 @@
+# General Review Segment
+
+Check for:
+- obvious bugs and behavioral regressions
+- changed control flow that no longer matches caller expectations
+- signature drift between callers and callees
+- data path mistakes, especially path handling, identifiers, and selection logic
+- missing or misleading validation
+- missing unit coverage for newly introduced logic
+
+Bias toward concise, actionable findings. Do not manufacture issues to fill space.
diff --git a/.github/review/segments/priority_and_confidence.md b/.github/review/segments/priority_and_confidence.md
new file mode 100644
index 000000000..1530c4704
--- /dev/null
+++ b/.github/review/segments/priority_and_confidence.md
@@ -0,0 +1,36 @@
+# Priority and Confidence Segment
+
+Use this segment to classify both finding priority and confidence level consistently.
+
+## Priority rubric
+
+Classify each finding as `high`, `medium`, or `low`.
+
+- `high`
+  A likely merge blocker. The issue can cause incorrect behavior, runtime failure, broken contracts, artifact corruption, publication mistakes, or materially misleading output.
+- `medium`
+  Important, but not always a blocker. The issue can plausibly cause regressions, maintenance traps, incomplete migrations, or missing coverage around meaningful new behavior.
+- `low`
+  Real but limited impact. The issue is worth fixing, but it is unlikely to cause immediate user-facing failure or operational damage.
+
+If a concern is merely stylistic or speculative, do not promote it into a finding.
+
+## Confidence rubric
+
+Classify each finding as `high`, `medium`, or `low`.
+
+- `high`
+  Directly supported by code in the diff, surrounding code, or executed tests.
+- `medium`
+  Strong inference from the code path, but not fully validated by execution or complete context.
+- `low`
+  Plausible concern, but evidence is incomplete or significant context is missing.
+
+Also state the basis for the finding:
+
+- `direct_code_evidence`
+- `test_evidence`
+- `inference`
+- `missing_context`
+
+When confidence is not `high`, briefly say what is missing.
diff --git a/.github/review/segments/staged_prs.md b/.github/review/segments/staged_prs.md
new file mode 100644
index 000000000..ed7717f5b
--- /dev/null
+++ b/.github/review/segments/staged_prs.md
@@ -0,0 +1,17 @@
+# Staged PR Review Segment
+
+This repository often uses staged migration PRs with narrow scope limits.
+
+Check for:
+- scope drift beyond the intended phase
+- contract breaks across staged seams
+- compatibility regressions in dual-path or legacy-adapter code
+- accidental schema or artifact format changes
+- conflicting implementations that should have one clear owner
+- code landing in the wrong layer, such as orchestration absorbing domain logic
+
+Call out whether each finding is:
+- a true merge blocker
+- a follow-up that can wait
+
+If the PR looks intentionally transitional, say so, but still flag broken boundaries.
diff --git a/.github/review/segments/testing.md b/.github/review/segments/testing.md
new file mode 100644
index 000000000..43d62a372
--- /dev/null
+++ b/.github/review/segments/testing.md
@@ -0,0 +1,12 @@
+# Testing Review Segment
+
+Check whether the PR adds focused tests for the new behavior it introduces.
+
+Look for:
+- direct unit coverage for newly added branch logic
+- overreliance on broad integration tests when a narrow unit test would be clearer
+- tests that are brittle because they depend on ambient environment state
+- module-reload or monkeypatch patterns that can poison the rest of the suite
+- new code paths with no test exercising them
+
+If coverage is partial, say which production files or behaviors remain uncovered.
diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 000000000..ff86b0267
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,32 @@
+# Codex Instructions
+
+These instructions apply repository-wide.
+
+## PR review workflow
+
+When the task is a pull request review:
+
+1. Read `.github/review/global.md`.
+2. Always read:
+   - `.github/review/segments/general.md`
+   - `.github/review/segments/priority_and_confidence.md`
+3. Inspect the changed files and selectively read these additional segments:
+   - `.github/review/segments/staged_prs.md`
+     Use when the PR touches staged-migration areas such as `modal_app/local_area.py`, `modal_app/worker_script.py`, `modal_app/pipeline.py`, `policyengine_us_data/calibration/local_h5/`, or `policyengine_us_data/calibration/validate_staging.py`.
+   - `.github/review/segments/testing.md`
+     Use when the PR changes production code or tests.
+4. Prioritize bugs, regressions, contract drift, scope drift, and missing tests.
+5. Present findings first.
+6. For every finding, include:
+   - severity
+   - confidence
+   - basis
+   - why it matters
+   - suggested fix
+7. If there are no findings, say so explicitly and still mention blind spots.
+
+## General engineering expectations
+
+- Prefer direct evidence over speculation.
+- Flag missing execution context when confidence is limited.
+- Focus on behavior and operational risk before style.
diff --git a/changelog.d/796.added.md b/changelog.d/796.added.md
new file mode 100644
index 000000000..fd9fc98fc
--- /dev/null
+++ b/changelog.d/796.added.md
@@ -0,0 +1 @@
+Add repo-native Codex PR review instruction files for experimental pull request review guidance.