From 6281e04bf2ff2a6549f810dcd659af15a710b3ea Mon Sep 17 00:00:00 2001
From: prrao87 <35005448+prrao87@users.noreply.github.com>
Date: Wed, 6 May 2026 16:40:00 -0400
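Subject: [PATCH 1/2] Add docs audit area rotation

Add a `select-areas` subcommand to scripts/run_audit.py that picks a
bounded set of enabled area manifests for each weekly run. Selection is
configured by a new `[area_selection]` block in config.toml (`mode` and
`areas_per_run`): changed areas are considered first, and any remaining
slots are filled by rotating through `enabled_areas`. The README and the
weekly automation prompt now run `select-areas --refresh --advance` once,
then run `prepare` for each selected area without `--refresh`.
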
---
 workflows/docs-audit/README.md                |  64 +++++--
 workflows/docs-audit/config.toml              |   8 +-
 .../docs-audit/prompts/weekly_automation.md   |  31 +--
 workflows/docs-audit/scripts/run_audit.py     | 177 +++++++++++++++++-
 4 files changed, 245 insertions(+), 35 deletions(-)
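
Reviewer note (sits between the diffstat and the first diff, so it is not part of the commit): the sketch below is a condensed restatement of the `select_area_names` logic added in scripts/run_audit.py, intended only to illustrate how `changed_first_rotate` combines changed areas with the rotation cursor. It omits the validation of `mode` and `areas_per_run` that the real function performs, and the area list and changed sets are hypothetical inputs chosen for the walk-through.

```python
def select_area_names(enabled, changed, rotation_index, areas_per_run, mode):
    """Condensed copy of the selector in this patch (validation omitted)."""
    if mode == "all":
        return list(enabled), rotation_index
    # Changed areas claim slots first, but only in changed_first_rotate mode.
    selected = list(changed[:areas_per_run]) if mode == "changed_first_rotate" else []
    next_index, visited = rotation_index, 0
    # Fill any remaining slots by walking enabled areas from the cursor.
    while len(selected) < areas_per_run and visited < len(enabled):
        candidate = enabled[next_index % len(enabled)]
        next_index = (next_index + 1) % len(enabled)
        visited += 1
        if candidate not in selected:
            selected.append(candidate)
    return selected, next_index


# Hypothetical inputs for illustration only.
enabled = ["indexing", "reranking", "embeddings", "storage", "namespaces"]
mode = "changed_first_rotate"

# One changed area takes the first slot; rotation fills the second.
week1 = select_area_names(enabled, ["storage"], 0, 2, mode)
# Nothing changed: both slots come from rotation, starting at the saved cursor.
week2 = select_area_names(enabled, [], week1[1], 2, mode)
# Changed areas alone fill the budget, so the cursor does not move.
week3 = select_area_names(enabled, ["reranking", "embeddings", "storage"], week2[1], 2, mode)

print(week1)  # (['storage', 'indexing'], 1)
print(week2)  # (['reranking', 'embeddings'], 3)
print(week3)  # (['reranking', 'embeddings'], 3)
```

One consequence visible in the third call: when changed areas fill every slot, the third changed area is dropped for that week and rotation does not advance. With empty state every page counts as changed, so a first run would be expected to pick the first `areas_per_run` entries of `enabled_areas`.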

diff --git a/workflows/docs-audit/README.md b/workflows/docs-audit/README.md
index 674a9e3..9f9bc07 100644
--- a/workflows/docs-audit/README.md
+++ b/workflows/docs-audit/README.md
@@ -39,21 +39,22 @@ The audit runner only reads those repos and records refresh status. This workspa
 Each weekly run follows the same sequence:
 
 1. Refresh watched repos with safe fast-forward pulls.
-2. Read the enabled area manifests.
-3. Build deterministic evidence bundles for each page in the selected area.
-4. Compare current evidence fingerprints to the last completed run.
-5. Select pages to audit:
+2. Select a bounded set of enabled area manifests for the weekly run.
+3. Read the selected area manifests.
+4. Build deterministic evidence bundles for each page in each selected area.
+5. Compare current evidence fingerprints to the last completed run.
+6. Select pages to audit:
    - always include pages whose mapped evidence changed
-   - then include one rotating extra page for broader coverage
-   - if no pages changed, the rotating extra page becomes the only selected page
-   - the rotation walks through the pages in manifest order and advances one slot after each completed run
-6. Use Codex LLM passes on the selected page bundles to extract:
+   - then include rotating extra pages for broader coverage
+   - if no pages changed, the rotating extra pages become the selected pages
+   - the rotation walks through the pages in manifest order and advances as rotating pages are added
+7. Use Codex LLM passes on the selected page bundles to extract:
    - code claims
    - doc claims
    - candidate gaps and final markdown observations
-7. Save artifacts under a timestamped run directory.
-8. Mark the run complete and update state.
-9. Surface the final markdown report through a Codex inbox item.
+8. Save artifacts under a timestamped run directory.
+9. Mark the run complete and update state.
+10. Surface the final markdown report through a Codex inbox item.
 
 ## Workspace Layout
 
@@ -72,11 +73,12 @@ The deterministic layer is responsible for the parts that should not require sem
 
 - refreshing repos with `git pull --ff-only`
 - reading manifests
+- selecting changed enabled areas first, then filling the weekly area budget by rotation
 - selecting source files per page
 - hashing file contents and detecting changed pages
 - extracting compact raw signals from docs pages and code-side surfaces
 - writing page-scoped evidence bundles
-- selecting changed pages plus one rotating extra page
+- selecting changed pages plus rotating extra pages
 - updating local state after a completed run
 
 The deterministic layer intentionally keeps evidence compact so the LLM does not need to read entire files or repos.
@@ -102,7 +104,16 @@ The saved artifacts should include:
 From this workspace root:
 
 ```bash
-uv run python scripts/run_audit.py prepare --area indexing --refresh
+uv run python scripts/run_audit.py select-areas --refresh --advance
+```
+
+This chooses a bounded list of enabled area manifests for the weekly run. The selector uses
+`[area_selection]` in `config.toml`: changed enabled areas are considered first, then any remaining
+weekly slots are filled by rotating through `enabled_areas`. Use the printed `selected_areas` list
+for the per-area `prepare` commands.
+
+```bash
+uv run python scripts/run_audit.py prepare --area indexing
 ```
 
 `--area` is the manifest name, not a hardcoded value in the script. The runner loads:
@@ -112,11 +123,36 @@ uv run python scripts/run_audit.py prepare --area indexing --refresh
 So `--area indexing` maps to `manifests/indexing.toml`. If you add `manifests/search.toml`, you would run:
 
 ```bash
-uv run python scripts/run_audit.py prepare --area search --refresh
+uv run python scripts/run_audit.py prepare --area search
 ```
 
 This creates a pending run directory under `artifacts/pending/<run_id>/` and prints a JSON summary to stdout.
 
+When running after `select-areas --refresh`, omit `--refresh` from `prepare`; the repos were already
+refreshed once for the weekly selection.
+For a standalone one-area audit where you skip `select-areas`, pass `--refresh` to `prepare`.
+
+## Area Selection
+
+`enabled_areas` is the full pool of manifests the weekly automation may audit. The `[area_selection]`
+block controls how many of those enabled manifests are selected for a single weekly run:
+
+```toml
+[area_selection]
+mode = "changed_first_rotate"
+areas_per_run = 2
+```
+
+Supported modes:
+
+- `all`: select every enabled area.
+- `rotate`: ignore changed-area detection and select only by rotating through `enabled_areas`.
+- `changed_first_rotate`: select changed enabled areas first, up to `areas_per_run`, then fill any remaining slots by rotation.
+
+The area rotation cursor is stored in `state/state.json` under `area_selection.rotation_index` when
+you run `select-areas --advance`. Page-level rotation still happens independently inside each
+selected area through `[selection].rotation_extra_pages`.
+
 After the LLM phase writes the expected outputs into that pending run directory, complete the run with:
 
 ```bash
diff --git a/workflows/docs-audit/config.toml b/workflows/docs-audit/config.toml
index f593263..8df3eb3 100644
--- a/workflows/docs-audit/config.toml
+++ b/workflows/docs-audit/config.toml
@@ -5,10 +5,14 @@ enabled_areas = [
     "table-operations",
     "reranking",
     "embeddings",
-    "storage",,
+    "storage",
     "namespaces",
 ]
 
+[area_selection]
+mode = "changed_first_rotate"
+areas_per_run = 2
+
 [repos.lancedb]
 path = "../../../lancedb"
 
@@ -19,7 +23,7 @@ path = "../.."
 path = "../../../sophon"
 
 [selection]
-rotation_extra_pages = 10
+rotation_extra_pages = 5
 prefer_changed_pages = true
 
 [paths]
diff --git a/workflows/docs-audit/prompts/weekly_automation.md b/workflows/docs-audit/prompts/weekly_automation.md
index 73db24c..931a92d 100644
--- a/workflows/docs-audit/prompts/weekly_automation.md
+++ b/workflows/docs-audit/prompts/weekly_automation.md
@@ -16,12 +16,17 @@ This workflow also includes manifest maintenance. Before each audit run, review
 - `prompts/page_audit_guidelines.md`
 - `skills/area-manifest-authoring/SKILL.md`
 
-Then read the manifest file for each area listed in `enabled_areas` in `config.toml`.
+Then select the area manifests for this run using the deterministic area selector.
 
 ## Required workflow
 
 1. Read `config.toml` and determine the enabled areas from `enabled_areas`.
-2. For each enabled area, run a manifest maintenance pass before `prepare`.
+2. Select the areas for this weekly run:
+   - `uv run python scripts/run_audit.py select-areas --refresh --advance`
+   - Use the printed `selected_areas` list for the rest of this workflow.
+   - The selector refreshes watched repos once, detects changed enabled areas, and fills the remaining weekly slots by area rotation.
+   - Do not run unselected enabled areas in this weekly pass.
+3. For each selected area, run a manifest maintenance pass before `prepare`.
    - Use `skills/area-manifest-authoring/SKILL.md` as the procedure.
    - Read the current `manifests/<area>.toml`.
    - Check whether the docs area boundary has changed:
@@ -34,33 +39,31 @@ Then read the manifest file for each area listed in `enabled_areas` in `config.t
      - source blocks whose `applies_to` mapping is now too broad or too narrow
    - Keep the manifest compact. Do not add files just because they mention the topic; add them only if they are likely to expose user-visible behavior the docs may be missing.
    - If the manifest changes, save the updated `manifests/<area>.toml` before preparing the run.
-3. Run the deterministic prepare step for each enabled area.
-   - For the first area, refresh the watched repos:
-     - `uv run python scripts/run_audit.py prepare --area <first-area> --refresh`
-   - For subsequent areas in the same weekly run, skip the refresh to avoid repeating `git pull`:
-     - `uv run python scripts/run_audit.py prepare --area <next-area>`
-4. Read the JSON summary printed by each `prepare` command and locate each pending run directory.
+4. Run the deterministic prepare step for each selected area.
+   - Repos were already refreshed by `select-areas`, so skip `--refresh` here:
+     - `uv run python scripts/run_audit.py prepare --area <area>`
+5. Read the JSON summary printed by each `prepare` command and locate each pending run directory.
    - Use the printed `run_dir`; it should point under `artifacts/pending/<run_id>`.
    - Do not create or write directly under `artifacts/runs/<run_id>` before completion.
-5. For each pending run directory, read `selected_pages.json` and the corresponding files in `page_bundles/`.
-6. For each selected page bundle:
+6. For each pending run directory, read `selected_pages.json` and the corresponding files in `page_bundles/`.
+7. For each selected page bundle:
    - apply `prompts/page_audit_guidelines.md` as the page-level review rubric
    - infer normalized code claims from the evidence bundle
    - infer normalized doc claims from the docs bundle
    - identify only the missing documentation
-7. Write semantic outputs under `llm_outputs/` in each pending run directory.
+8. Write semantic outputs under `llm_outputs/` in each pending run directory.
    - one file per page for code claims
    - one file per page for doc claims
    - one file per page for candidate gaps
-8. Write `report.md` in each pending run directory.
+9. Write `report.md` in each pending run directory.
    - `report.md` is the docs-gap summary only.
    - Do not include refresh status, manifest-maintenance notes, selected-pages bookkeeping, or any other workflow narration in `report.md`.
    - Include operational notes only if they materially affected audit quality, such as an unrefreshable repo, missing source files, or a manifest ambiguity that changes confidence in the findings.
-9. Complete each run:
+10. Complete each run:
    - `uv run python scripts/run_audit.py complete --run-id <run_id>`
    - Completion publishes the pending directory to `artifacts/runs/<run_id>` and updates `artifacts/latest_run.json`.
    - Only completed runs with `report.md` should appear under `artifacts/runs/`.
-10. Return a concise markdown summary suitable for the Codex inbox item.
+11. Return a concise markdown summary suitable for the Codex inbox item.
 
 ## Manifest maintenance rules
 
diff --git a/workflows/docs-audit/scripts/run_audit.py b/workflows/docs-audit/scripts/run_audit.py
index 1dddf15..7cad94c 100755
--- a/workflows/docs-audit/scripts/run_audit.py
+++ b/workflows/docs-audit/scripts/run_audit.py
@@ -24,6 +24,7 @@
 ROOT = Path(__file__).resolve().parent.parent
 DEFAULT_CONFIG = ROOT / "config.toml"
 DEFAULT_PENDING_ARTIFACTS_DIR = "artifacts/pending"
+DEFAULT_AREA_SELECTION_MODE = "all"
 EXCERPT_LIMIT = 12
 SYMBOL_PATTERNS = [
     re.compile(r"\b(pub\s+enum|pub\s+struct|pub\s+fn|async\s+fn|fn\s+test_|def\s+test_|test\(|describe\()"),
@@ -281,6 +282,17 @@ def pending_runs_dir(config: dict[str, Any]) -> Path:
     return ROOT / config["paths"].get("pending_artifacts_dir", DEFAULT_PENDING_ARTIFACTS_DIR)
 
 
+def repo_infos(config: dict[str, Any]) -> dict[str, RepoInfo]:
+    return {
+        name: RepoInfo(name=name, path=Path(info["path"]))
+        for name, info in config["repos"].items()
+    }
+
+
+def manifest_path(config: dict[str, Any], area: str) -> Path:
+    return ROOT / config["paths"]["manifests_dir"] / f"{area}.toml"
+
+
 def load_state(path: Path) -> dict[str, Any]:
     if not path.exists():
         return {"version": 1, "areas": {}}
@@ -408,6 +420,31 @@ def collect_page_bundle(
     }
 
 
+def collect_area_fingerprints(
+    manifest: AreaManifest,
+    config: dict[str, Any],
+    repos: dict[str, RepoInfo],
+) -> dict[str, str]:
+    fingerprints: dict[str, str] = {}
+    for page in manifest.pages:
+        bundle = collect_page_bundle(manifest, config, page, repos)
+        fingerprints[page.id] = bundle["page_fingerprint"]
+    return fingerprints
+
+
+def changed_pages_for_area(
+    pages: list[Page],
+    fingerprints: dict[str, str],
+    previous_fingerprints: dict[str, str],
+) -> list[str]:
+    ordered_ids = [page.id for page in pages]
+    return [
+        page_id
+        for page_id in ordered_ids
+        if previous_fingerprints.get(page_id) != fingerprints[page_id]
+    ]
+
+
 def select_pages(
     pages: list[Page],
     fingerprints: dict[str, str],
@@ -432,13 +469,121 @@ def select_pages(
     return changed, selected, next_index
 
 
-def prepare(args: argparse.Namespace) -> int:
+def select_area_names(
+    enabled_areas: list[str],
+    changed_areas: list[str],
+    rotation_index: int,
+    areas_per_run: int,
+    mode: str,
+) -> tuple[list[str], int]:
+    if not enabled_areas:
+        return [], rotation_index
+    if areas_per_run <= 0:
+        raise SystemExit("area_selection.areas_per_run must be greater than 0")
+    if mode == "all":
+        return enabled_areas, rotation_index
+    if mode not in {"rotate", "changed_first_rotate"}:
+        raise SystemExit(
+            "area_selection.mode must be one of: all, rotate, changed_first_rotate"
+        )
+
+    selected: list[str] = []
+    if mode == "changed_first_rotate":
+        selected.extend(changed_areas[:areas_per_run])
+
+    next_index = rotation_index
+    visited = 0
+    while len(selected) < areas_per_run and visited < len(enabled_areas):
+        candidate = enabled_areas[next_index % len(enabled_areas)]
+        next_index = (next_index + 1) % len(enabled_areas)
+        visited += 1
+        if candidate in selected:
+            continue
+        selected.append(candidate)
+
+    return selected, next_index
+
+
+def select_areas(args: argparse.Namespace) -> int:
     config = load_config(Path(args.config))
-    repos = {
-        name: RepoInfo(name=name, path=Path(info["path"]))
-        for name, info in config["repos"].items()
+    enabled_areas = list(config.get("enabled_areas", []))
+    if not enabled_areas:
+        raise SystemExit("No enabled_areas configured")
+
+    area_selection = config.get("area_selection", {})
+    mode = str(area_selection.get("mode", DEFAULT_AREA_SELECTION_MODE))
+    areas_per_run = int(area_selection.get("areas_per_run", len(enabled_areas)))
+    state_path = ROOT / config["paths"]["state_file"]
+    state = load_state(state_path)
+    selection_state = state.setdefault("area_selection", {})
+    rotation_index = int(selection_state.get("rotation_index", 0))
+    repos = repo_infos(config)
+
+    refresh_results = []
+    simulated = set(args.simulate_refresh_failure or [])
+    for repo in repos.values():
+        refresh_results.append(repo_snapshot(repo, args.refresh, repo.name in simulated))
+
+    area_status: dict[str, Any] = {}
+    changed_areas: list[str] = []
+    for area in enabled_areas:
+        manifest = load_manifest(manifest_path(config, area))
+        fingerprints = collect_area_fingerprints(manifest, config, repos)
+        previous_fingerprints = (
+            state.get("areas", {}).get(area, {}).get("page_fingerprints", {})
+        )
+        changed_pages = changed_pages_for_area(
+            manifest.pages,
+            fingerprints,
+            previous_fingerprints,
+        )
+        if changed_pages:
+            changed_areas.append(area)
+        area_status[area] = {
+            "changed_pages": changed_pages,
+            "page_count": len(manifest.pages),
+        }
+
+    selected_areas, next_rotation_index = select_area_names(
+        enabled_areas,
+        changed_areas,
+        rotation_index,
+        areas_per_run,
+        mode,
+    )
+
+    created_at = datetime.now(timezone.utc).isoformat()
+    summary = {
+        "created_at": created_at,
+        "mode": mode,
+        "areas_per_run": areas_per_run,
+        "enabled_areas": enabled_areas,
+        "changed_areas": changed_areas,
+        "selected_areas": selected_areas,
+        "rotation": {
+            "previous_index": rotation_index,
+            "next_index": next_rotation_index,
+        },
+        "area_status": area_status,
+        "refresh": refresh_results,
+        "advanced": args.advance,
     }
-    manifest = load_manifest(ROOT / config["paths"]["manifests_dir"] / f"{args.area}.toml")
+
+    if args.advance:
+        selection_state["rotation_index"] = next_rotation_index
+        selection_state["last_selected_areas"] = selected_areas
+        selection_state["last_changed_areas"] = changed_areas
+        selection_state["selected_at"] = created_at
+        write_json(state_path, state)
+
+    print(json.dumps(summary, indent=2, sort_keys=True))
+    return 0
+
+
+def prepare(args: argparse.Namespace) -> int:
+    config = load_config(Path(args.config))
+    repos = repo_infos(config)
+    manifest = load_manifest(manifest_path(config, args.area))
     state_path = ROOT / config["paths"]["state_file"]
     state = load_state(state_path)
     area_state = state.get("areas", {}).get(args.area, {})
@@ -631,6 +776,28 @@ def build_parser() -> argparse.ArgumentParser:
     )
     prepare_parser.set_defaults(func=prepare)
 
+    select_areas_parser = subparsers.add_parser(
+        "select-areas",
+        help="Select enabled area manifests for a weekly run",
+    )
+    select_areas_parser.add_argument(
+        "--refresh",
+        action="store_true",
+        help="Attempt git pull --ff-only on watched repos before detecting changed areas",
+    )
+    select_areas_parser.add_argument(
+        "--advance",
+        action="store_true",
+        help="Persist the next area rotation cursor after selecting areas",
+    )
+    select_areas_parser.add_argument(
+        "--simulate-refresh-failure",
+        action="append",
+        choices=["lancedb", "docs", "sophon"],
+        help="Simulate an unrefreshable repo for manual validation",
+    )
+    select_areas_parser.set_defaults(func=select_areas)
+
     complete_parser = subparsers.add_parser("complete", help="Mark a run complete and update state")
     complete_parser.add_argument("--run-id", required=True, help="Run ID returned by prepare")
     complete_parser.set_defaults(func=complete)

From 03e81bcbcd6807435c78199e0a0f488ad28d9d22 Mon Sep 17 00:00:00 2001
From: prrao87 <35005448+prrao87@users.noreply.github.com>
Date: Wed, 6 May 2026 16:48:25 -0400
Subject: [PATCH 2/2] Make docs audit prompts agent-neutral

---
 workflows/docs-audit/AGENTS.md                    |  2 +-
 workflows/docs-audit/README.md                    | 14 +++++++-------
 workflows/docs-audit/prompts/weekly_automation.md |  2 +-
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/workflows/docs-audit/AGENTS.md b/workflows/docs-audit/AGENTS.md
index 0b78cdf..d367e91 100644
--- a/workflows/docs-audit/AGENTS.md
+++ b/workflows/docs-audit/AGENTS.md
@@ -6,7 +6,7 @@ This workspace orchestrates a docs-gap audit across external local repos. It doe
 
 - Deterministic scripts live in `scripts/`.
 - Area manifests live in `manifests/`.
-- Codex prompt templates live in `prompts/`.
+- Agent prompt templates live in `prompts/`.
 - Run state lives in `state/`.
 - Generated run artifacts live in `artifacts/`.
 
diff --git a/workflows/docs-audit/README.md b/workflows/docs-audit/README.md
index 9f9bc07..83bb9e3 100644
--- a/workflows/docs-audit/README.md
+++ b/workflows/docs-audit/README.md
@@ -11,7 +11,7 @@ The goal is to find what is missing from the docs, especially conceptual and imp
 This is a research workflow, not a production service. The design favors:
 
 - compact deterministic preprocessing
-- page-scoped LLM work inside the Codex app
+- page-scoped LLM work by the running agent
 - saved local artifacts for inspection and reuse
 - simple extension through manifests
 
@@ -20,7 +20,7 @@ This is a research workflow, not a production service. The design favors:
 This workspace does not:
 
 - clone or vendor source code from the watched repos
-- attempt to enforce a hard token quota in Codex
+- attempt to enforce a hard token quota in the agent runtime
 - produce doc fixes automatically
 - behave like a production CI system
 
@@ -48,19 +48,19 @@ Each weekly run follows the same sequence:
    - then include rotating extra pages for broader coverage
    - if no pages changed, the rotating extra pages become the selected pages
    - the rotation walks through the pages in manifest order and advances as rotating pages are added
-7. Use Codex LLM passes on the selected page bundles to extract:
+7. Use page-scoped LLM passes on the selected page bundles to extract:
    - code claims
    - doc claims
    - candidate gaps and final markdown observations
 8. Save artifacts under a timestamped run directory.
 9. Mark the run complete and update state.
-10. Surface the final markdown report through a Codex inbox item.
+10. Surface the final markdown report through an inbox item.
 
 ## Workspace Layout
 
 - `config.toml`: repo paths, enabled areas, selection rules, and output paths
 - `manifests/`: docs-area manifests
-- `prompts/`: reusable Codex prompt templates
+- `prompts/`: reusable agent prompt templates
 - `scripts/`: deterministic extraction, refresh, selection, and state utilities
 - `state/`: lightweight run state and rotation cursor
 - `artifacts/`: per-run evidence bundles, LLM outputs, and reports
@@ -85,7 +85,7 @@ The deterministic layer intentionally keeps evidence compact so the LLM does not
 
 ## LLM-Assisted Layer
 
-The semantic layer runs inside Codex through the automation prompt. For each selected page bundle, the LLM should:
+The semantic layer runs through the automation prompt. For each selected page bundle, the LLM should:
 
 1. infer normalized code claims from the evidence bundle
 2. infer normalized doc claims from the docs bundle
@@ -343,7 +343,7 @@ The runner is designed so new docs areas should generally require a new manifest
 
 ## Weekly Automation
 
-The weekly Codex automation should use this workspace as its cwd and follow `prompts/weekly_automation.md`.
+The weekly automation should use this workspace as its cwd and follow `prompts/weekly_automation.md`.
 
 The automation should:
 
diff --git a/workflows/docs-audit/prompts/weekly_automation.md b/workflows/docs-audit/prompts/weekly_automation.md
index 931a92d..c255a73 100644
--- a/workflows/docs-audit/prompts/weekly_automation.md
+++ b/workflows/docs-audit/prompts/weekly_automation.md
@@ -63,7 +63,7 @@ Then select the area manifests for this run using the deterministic area selecto
    - `uv run python scripts/run_audit.py complete --run-id <run_id>`
    - Completion publishes the pending directory to `artifacts/runs/<run_id>` and updates `artifacts/latest_run.json`.
    - Only completed runs with `report.md` should appear under `artifacts/runs/`.
-11. Return a concise markdown summary suitable for the Codex inbox item.
+11. Return a concise markdown summary suitable for the inbox item.
 
 ## Manifest maintenance rules