From f5ad53747bde429b9d1fc881ff0b86327ea54184 Mon Sep 17 00:00:00 2001 From: Gabor Szabo Date: Mon, 1 Jun 2026 15:59:55 +0200 Subject: [PATCH 01/44] =?UTF-8?q?feat(docs,repo):=20flow-pack=20E1=20found?= =?UTF-8?q?ation=20=E2=80=94=20/flow-prime=20+=20tracked=20contract=20+=20?= =?UTF-8?q?rule=20(#369)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- .gitignore | 5 +- PRPs/PRP-flow-pack-E1-foundation.md | 258 ++++++++++++++++++++++++ docs/flow-pack-methodology.md | 270 ++++++++++++++++++++++++++ docs/flow-pack/commands/flow-prime.md | 154 +++++++++++++++ 4 files changed, 685 insertions(+), 2 deletions(-) create mode 100644 PRPs/PRP-flow-pack-E1-foundation.md create mode 100644 docs/flow-pack-methodology.md create mode 100644 docs/flow-pack/commands/flow-prime.md diff --git a/.gitignore b/.gitignore index 9f21159a..f7432f39 100644 --- a/.gitignore +++ b/.gitignore @@ -41,9 +41,10 @@ artifacts/ # Local session artifacts (plans, handoffs, current session notes) .agents/ +.flow/ .handoffs/ HANDOFF.md # Local CI / dogfood logs and screenshots (per-session, never committed) -.ci-logs/ -docs/manual_hun/ +.ci-logs/ +docs/manual_hun/ diff --git a/PRPs/PRP-flow-pack-E1-foundation.md b/PRPs/PRP-flow-pack-E1-foundation.md new file mode 100644 index 00000000..f9e7db50 --- /dev/null +++ b/PRPs/PRP-flow-pack-E1-foundation.md @@ -0,0 +1,258 @@ +name: "PRP — flow-pack E1 Foundation (tracked contract + /flow-prime + rule + local install)" +description: | + Foundation slice of the flow: command-suite integration. Lands the tracked durable + source-of-truth (docs/flow-pack/**) plus the first command (/flow-prime) and the + umbrella-issue rule as a regenerable local install under .claude/**. Blocks E2–E5. 
+ + + +## Issue links +- Umbrella: **#368** — feat(repo): integrate flow-pack methodology as the flow: command suite +- This epic: **#369** — flow-pack E1 — foundation (minimal viable; BLOCKS E2–E5) +- Milestone: **#1 flow-pack-suite** · labels: `epic`, `flow` + +--- + +## Goal +Implement the **E1 foundation** of the `flow:` command suite: a tracked, durable methodology +contract and the first command, installed locally per the **durable-source split**. The end +state: a reviewer on a fresh clone can read `docs/flow-pack/**`, regenerate the local +`.claude/commands/flow/` install, and run `/flow-prime` to produce a current-workflow-map + +"you are here" snapshot written to `.flow/state.md`. + +**Deliverable:** 4 files + 1 documented recovery path (see Desired tree). No E2–E5. No GitHub +issue creation. No commit/push. + +## Why +- `.claude/` is gitignored (CLAUDE.md "Learnings"), so commands/rules placed only there are lost + on a fresh clone and cannot be the source of truth. E1 establishes the tracked `docs/flow-pack/**` + as durable, with `.claude/**` as a regenerable runtime install. +- E1 is the minimal viable slice that proves the loop and unblocks the parallel epics E2–E4. + +## What +A docs-first foundation: tracked contract + tracked command template → local install → working +`/flow-prime`. + +### Success Criteria +- [ ] Tracked `docs/flow-pack-methodology.md` exists (Mermaid pipeline + invariants + FLAI mapping + + portability manifest + a "Fresh-clone recovery" section). +- [ ] Tracked `docs/flow-pack/commands/flow-prime.md` exists (the canonical command template/spec). +- [ ] Local `.claude/commands/flow/flow-prime.md` present, byte-regenerable from the tracked template. +- [ ] Local `.claude/rules/umbrella-issue.md` present (durable narrative lives in the methodology doc). +- [ ] Fresh-clone recovery documented and verified: `cp docs/flow-pack/commands/*.md .claude/commands/flow/` + reproduces the local command(s). 
+- [ ] `/flow-prime` runs: delegates to `core_piv_loop:prime` (codebase) + gathers gh state + writes + `.flow/state.md`; output includes a current-workflow-map and a you-are-here snapshot. +- [ ] Every created artifact carries a provenance header linking to its KB source. +- [ ] `git check-ignore .claude/commands/flow/flow-prime.md` confirms the local copy is NOT tracked; + `git status` shows `docs/flow-pack/**` as the only NEW tracked additions. + +## All Needed Context + +### Documentation & References +```yaml +# MUST READ — the reverse-engineered methodology (already analyzed in Phases 0–2) +- file: .flow/state.md + why: the full Phase 0–5 decision record — chosen workflow, durable-source split, epic plan +- file: .flow/brainstorm-log.md + why: the V1→V2 dogfood, 5-dim scores, and the 3-subagent research findings E1 is built on + +- file: /home/w7-hector/_KB-BASE-BY-w7/JOB/DIA-FLOW/ai_engineering_mermaid_flow_pack/docs/flow-analysis/01-decomposition.md + why: umbrella 7-field contract + epic phase taxonomy (foundation/parallel/release) + hierarchy-as-data — the umbrella-issue.md rule is reverse-engineered from this +- file: /home/w7-hector/_KB-BASE-BY-w7/JOB/DIA-FLOW/ai_engineering_mermaid_flow_pack/docs/flow-analysis/03-continuation-discipline.md + why: baseline → V1 → 3 read-only agents → score → V2; /flow-prime captures the "baseline reality" step +- file: /home/w7-hector/_KB-BASE-BY-w7/JOB/DIA-FLOW/ai_engineering_mermaid_flow_pack/docs/flow-analysis/02-execution-pipeline.md + why: the issue→5-subtask pipeline = the existing issue-to-subtasks skill; flow: hands off, does not reimplement + +# PATTERNS TO MIRROR (house style — match exactly) +- file: .claude/commands/core_piv_loop/prime.md + why: /flow-prime DELEGATES to this for codebase priming; mirror its command structure + section style +- file: .claude/rules/commit-format.md + why: rule anatomy to mirror (Title → Purpose → Rules/tables → Examples → Before X checklist) +- file: 
.claude/rules/branch-naming.md + why: second rule-style reference; also dictates the branch name for this work +- file: .claude/commands/base_prp/prp-create.md + why: the hand-off target invoked per epic after /flow-epics + +# CONSTRAINTS +- file: CLAUDE.md + section: "Learnings" — ".claude/ is gitignored — skills, rules, and hooks are local-only" + critical: this is WHY the durable-source split exists; do not treat .claude/** as committed truth +- file: .gitignore + why: confirm /.claude and .claude are ignored; .flow/ is local working state (consider ignoring it too) +``` + +### Current Codebase tree (relevant slice) +```bash +.claude/ + commands/{base_cm,base_evals,base_prp,core_piv_loop,do,git-operations,prompts,validation}/ # NO flow/ yet + rules/{branch-naming,commit-format,output-formatting,product-vision,security-patterns,shadcn-ui,test-requirements,ui-design,versioning}.md # NO umbrella-issue.md yet +docs/ # tracked; NO flow-pack/ yet +.flow/ # working state (untracked): state.md, brainstorm-log.md +PRPs/ # this PRP lives here +``` + +### Desired Codebase tree (files to add + responsibility) +```bash +docs/ + flow-pack-methodology.md # TRACKED durable contract: Mermaid pipeline, invariants, FLAI mapping, portability manifest, fresh-clone recovery + flow-pack/ + commands/ + flow-prime.md # TRACKED canonical template/spec for /flow-prime (source of truth) +.claude/ + commands/flow/ + flow-prime.md # LOCAL install — regenerable byte-copy of the tracked template (gitignored, NOT durable) + rules/ + umbrella-issue.md # LOCAL rule — agent contract; durable narrative is in docs/flow-pack-methodology.md +``` + +### Known Gotchas & Quirks +```text +# CRITICAL: .claude/ is gitignored (/.claude and .claude in .gitignore). The local command + +# rule are NEVER the committed source. Durable truth = docs/flow-pack/**. 
Verify with: +# git check-ignore .claude/commands/flow/flow-prime.md (must print the path = ignored) +# CRITICAL: the local install must be a faithful regeneration of the tracked template, NOT a +# hand-edited divergent copy. If they drift, the tracked template wins. Recovery = cp. +# GOTCHA: commit-format.md requires every commit reference an open issue → commit against (#369); +# branch off dev per branch-naming.md, e.g. feat/flow-pack-e1-foundation. +# GOTCHA: /flow-prime must DELEGATE to core_piv_loop:prime for codebase priming (do NOT duplicate +# its git ls-files/tree/log logic) and ADD only the gh-state + you-are-here + .flow/state.md write. +# GOTCHA: .flow/ is local working state. Consider adding `.flow/` to .gitignore so it is not +# accidentally committed (optional task T6; respect the dirty worktree — do not stage uv.lock or +# docker-compose.lan.yml). +# SCOPE: do NOT create flow-analyze/brainstorm/umbrella/epics commands here — E1 ships /flow-prime +# only. /flow-analyze is permanently deferred (folded into prime/brainstorm). +# PROVENANCE: every file starts with an HTML-comment provenance header naming its KB source. +# NO new runtime deps, no agent-teams/tmux (research fan-out = plain read-only subagents). 
+``` + +## Implementation Blueprint + +### list of tasks (dependency order) +```yaml +Task 1 — CREATE docs/flow-pack-methodology.md (tracked source of truth): + - INCLUDE: a Mermaid diagram of the 4-command pipeline (see .flow/state.md "Chosen workflow") + - INCLUDE: invariants list (read-only-until-approval; hierarchy-as-data; exactly-3 research agents; + score bands >=40/36-39/<36; 7-field umbrella; foundation->parallel->release; every defer has a reason) + - INCLUDE: FLAI mapping table (which methodology stage = which flow: command / existing skill) + - INCLUDE: "Durable-source split" section (docs/flow-pack/** tracked vs .claude/** local install) + - INCLUDE: "Fresh-clone recovery" section with the exact cp command + - INCLUDE: "Portability manifest" — the named params to change to reuse in another repo + - HEADER: provenance comment -> flow-analysis/{01,02,03}.md + +Task 2 — CREATE docs/flow-pack/commands/flow-prime.md (tracked command template): + - MIRROR structure of: .claude/commands/core_piv_loop/prime.md (frontmatter? 
section headings, $ARGUMENTS) + - SPEC the /flow-prime behavior: (a) delegate to core_piv_loop:prime; (b) gather gh state + (gh issue list, milestones, releases, project); (c) emit current-workflow-map + you-are-here; + (d) write/append .flow/state.md; (e) print gate result + next command (/flow-analyze folded in or + /flow-brainstorm) per the self-guiding convention + - INCLUDE dry-run note + provenance header + +Task 3 — INSTALL .claude/commands/flow/flow-prime.md (local runtime copy): + - REGENERATE as a byte-copy of docs/flow-pack/commands/flow-prime.md (cp), so they never diverge + - VERIFY: diff docs/flow-pack/commands/flow-prime.md .claude/commands/flow/flow-prime.md == empty + +Task 4 — CREATE .claude/rules/umbrella-issue.md (local rule): + - MIRROR anatomy of: .claude/rules/commit-format.md + - CONTENT (from Phase 4 draft in .flow + 01-decomposition.md): when to create an umbrella; + 7-field body contract; epic shape + phase contract; hierarchy-as-data (gh api sub_issues, no native + gh cmd); labels+milestone; write discipline (dry-run/idempotent/approval-gated); the command + source-of-truth & install split + recovery; validation (run audit-rules-drift) + - HEADER: provenance + "durable copy: docs/flow-pack-methodology.md" + +Task 5 — DOCUMENT fresh-clone recovery (in Task 1 doc, verify here): + - The methodology doc's recovery section reproduces .claude/commands/flow/ from docs/flow-pack/commands/ + +Task 6 (OPTIONAL) — MODIFY .gitignore: + - ADD `.flow/` so local working state is not accidentally committed + - PRESERVE existing entries; do NOT touch uv.lock / docker-compose.lan.yml in the worktree +``` + +### Per-task notes +```text +# Task 2 — /flow-prime spec is a DELEGATION wrapper, not a reimplementation: +# "Run core_piv_loop:prime for codebase priming, THEN gather gh state, THEN synthesize the +# current-workflow-map + you-are-here, write .flow/state.md, and print the next-command pointer." 
+# Task 4 — the rule is the LOCAL agent contract; keep it short and point to docs/flow-pack-methodology.md +# for the full narrative (avoid duplicating, since the rule isn't committed). +``` + +### Integration Points +```yaml +DOCS (tracked): + - add: docs/flow-pack-methodology.md, docs/flow-pack/commands/flow-prime.md +CLAUDE (local, gitignored): + - add: .claude/commands/flow/flow-prime.md, .claude/rules/umbrella-issue.md +GITIGNORE (optional): + - add: ".flow/" +HAND-OFF: + - on E1 completion, E2–E4 each go to base_prp:prp-create (separate, later) +``` + +## Validation Loop + +### Level 1: File presence + durable-source split +```bash +# tracked source of truth exists +test -f docs/flow-pack-methodology.md && test -f docs/flow-pack/commands/flow-prime.md && echo "OK tracked" +# local install exists and is gitignored (NOT durable) +test -f .claude/commands/flow/flow-prime.md && test -f .claude/rules/umbrella-issue.md && echo "OK local" +git check-ignore .claude/commands/flow/flow-prime.md .claude/rules/umbrella-issue.md # both must print (ignored) +# local install == tracked template (no drift) +diff -q docs/flow-pack/commands/flow-prime.md .claude/commands/flow/flow-prime.md && echo "OK no drift" +# only docs/flow-pack/** are NEW tracked files (plus this PRP); .claude/** not staged +git status --short +``` + +### Level 2: Fresh-clone recovery reproduction +```bash +# simulate recovery: blow away the local install, regenerate from tracked template, confirm identical +rm -f .claude/commands/flow/flow-prime.md +cp docs/flow-pack/commands/*.md .claude/commands/flow/ +diff -q docs/flow-pack/commands/flow-prime.md .claude/commands/flow/flow-prime.md && echo "OK recovery reproduces local" +``` + +### Level 3: /flow-prime smoke +```bash +# In a Claude Code session, run the command and confirm it produces the two artifacts: +# - a current-workflow-map (existing commands/skills/rules/workflows) +# - a you-are-here snapshot (branch, version, open issues, gap) +# and 
writes/appends .flow/state.md. Confirm it ends by printing the next-command pointer. +# (No automated assertion — this is an interactive command; verify the output sections exist.) +``` + +## Tests / checks required +- [ ] Level 1 file-presence + gitignore + no-drift checks all pass. +- [ ] Level 2 recovery reproduces the local install byte-for-byte. +- [ ] Level 3 `/flow-prime` produces both required output sections + writes `.flow/state.md`. +- [ ] `audit-rules-drift` skill run against the new `umbrella-issue.md` reports no drift from AGENTS.md/CLAUDE.md. +- [ ] Provenance header present in all 4 created files (`grep -l provenance` finds each). +- [ ] No standard repo gate is broken (markdown-only change → ruff/mypy/pyright/pytest unaffected; run them to confirm green if any tooling touches docs). + +## Final Validation Checklist +- [ ] All 4 files created + recovery path documented and verified. +- [ ] Durable-source split holds: `docs/flow-pack/**` tracked; `.claude/**` ignored + regenerable. +- [ ] `/flow-prime` runs and writes `.flow/state.md`. +- [ ] `umbrella-issue.md` mirrors house rule anatomy; `audit-rules-drift` clean. +- [ ] E2–E5 NOT implemented; no new GitHub issues created. +- [ ] Branch is `feat/flow-pack-e1-foundation` off `dev`; commit (when the user authorizes) references `(#369)`. +- [ ] No commit/push performed by this PRP execution unless explicitly requested; `uv.lock` + `docker-compose.lan.yml` left untouched. + +## Anti-Patterns to Avoid +- ❌ Treating `.claude/commands/flow/` as the source of truth (it's gitignored — use `docs/flow-pack/**`). +- ❌ Reimplementing codebase priming in `/flow-prime` instead of delegating to `core_piv_loop:prime`. +- ❌ Building flow-analyze/brainstorm/umbrella/epics here — E1 is `/flow-prime` only. +- ❌ Letting the local install drift from the tracked template. +- ❌ Creating GitHub issues, committing, or pushing as part of E1 implementation. 
+- ❌ Staging `uv.lock` / `docker-compose.lan.yml` (pre-existing dirty worktree — leave alone). + +--- + +## Confidence Score: 8/10 +One-pass likelihood is high: the methodology is fully reverse-engineered and dogfooded (`.flow/`), +all dependencies are confirmed, and the work is markdown-only (no runtime/type risk). −2 for the +two judgement-heavy authoring tasks (the `/flow-prime` delegation spec and the rule's fidelity to +house style) and the gitignore split, which a careless executor can get subtly wrong. diff --git a/docs/flow-pack-methodology.md b/docs/flow-pack-methodology.md new file mode 100644 index 00000000..852331b3 --- /dev/null +++ b/docs/flow-pack-methodology.md @@ -0,0 +1,270 @@ + + +# flow-pack methodology + +> Turn "what should we do next?" into a dependency-aware, parallel-friendly, release-gated +> GitHub hierarchy — one pipeline from baseline reality to executable epics ready for PRPs. + +## Pipeline overview + +```mermaid +flowchart LR + P["/flow-prime<br/>baseline reality<br/>writes .flow/state.md"] + B["/flow-brainstorm<br/>V1 naive → critique<br/>3 agents → score → V2"] + U["/flow-umbrella<br/>umbrella issue #N<br/>7-field body"] + E["/flow-epics<br/>epic issues #M–N<br/>phase-linked sub-issues"] + X["base_prp:prp-create<br/>per epic<br/>→ PRP → implementation"] + + P -->|"baseline + .flow/state.md"| B + B -->|"approved V2 ship list + defer"| U + U -->|"umbrella #N created"| E + E -->|"epics #M–N linked"| X +``` + +**Research fan-out inside `/flow-brainstorm`** (not shown above to keep diagram readable): after +the critique gate, exactly 3 read-only subagents run in parallel — Agent A (Known Issues), +Agent B (Best Practices), Agent C (Dependencies) — then synthesize into the score table. + +--- + +## Stage 1 — Plan + +### /flow-prime — baseline reality + +1. Delegates to `core_piv_loop:prime` for codebase priming (never re-implements it). +2. Gathers GitHub state: open issues, milestones, labels, open PRs, recent releases. +3. Maps the current workflow: `.claude/commands/`, `.claude/rules/`, `.github/workflows/`. +4. Synthesizes a **current-workflow-map** (installed commands, rules, CI workflows, hooks) and a + **you-are-here snapshot** (branch, version, open issues, milestone, flow-namespace gap). +5. Writes or updates `.flow/state.md` with both sections. +6. Prints gate result and next-command pointer: `→ Next: /flow-brainstorm`. + +### /flow-brainstorm — V1 → score → V2 + +1. Produces **V1** — a flat bullet list of 5–10 candidate items, from baseline alone, unscored, + labeled "V1" explicitly. +2. Applies the **critique gate**: tags each item with zero or more flags + `{assumption, scope-creep, no-evidence}`. Does not fix V1; labels it for research. +3. Spawns **exactly 3 read-only research subagents** in parallel: + + | Agent | Mandate | + |-------|---------| + | A — Known Issues | Open bugs, prior incidents relevant to V1 items | + | B — Best Practices | Current docs, SDK, framework changes | + | C — Dependencies | Upstream changes, blockers, API availability | + +4. 
**Scores** every V1 item on 5 dimensions (1–10 each, max 50): + + | Dimension | What it captures | + |-----------|-----------------| + | Value | Outcome impact for users / stakeholders | + | Risk | Probability of failure or rework | + | Readiness | Upstreams done; decisions made | + | Complexity | Size of the work chunk | + | Evidence | Grounding in research agent reports | + +5. Applies the **score-band rule**: + - **≥ 40** → V2 ship list + - **< 36** → defer list (each item carries an explicit one-clause reason) + - **36–39** → negotiation zone; surfaces to human before any GitHub write + +6. Prints a `X/10` one-pass confidence score for the V2 ship list. +7. Waits at the explicit human-approval gate before any GitHub write. +8. Hands off to `/flow-umbrella`. + +--- + +## Stage 2 — Decompose + +### /flow-umbrella — umbrella issue + +1. Creates the umbrella GitHub issue with the **7-field body contract** (see § Umbrella contract). +2. Attaches labels `umbrella` + `flow` and the project milestone. +3. Dry-run echoes the issue body; waits for approval before writing. +4. Prints gate + next-command pointer: `→ Next: /flow-epics`. + +### /flow-epics — epic issues + +1. Creates N epic issues (one per V2 ship item), phase-annotated + (Foundation / Parallel after Foundation / Release gate). +2. Links each epic as a sub-issue of the umbrella via the REST API + (`gh api repos/{owner}/{repo}/issues/{umbrella_id}/sub_issues -X POST -F sub_issue_id={epic_id}`). + No native `gh` sub-issue command — always `gh api` directly. +3. Dry-run echo + idempotent check + ~1 s rate-delay per write (GitHub API courtesy). +4. Hands off to `base_prp:prp-create` per epic. + +--- + +## Stage 3 — Execute (delegated) + +> Execution is fully handled by existing tools. The flow: suite stops at epic creation. 
+ +| Epic-level work | Tool | +|-----------------|------| +| Write a PRP for the epic | `base_prp:prp-create` | +| Execute the PRP | `base_prp:prp-execute` | +| Decompose an epic into 5 subtasks | `issue-to-subtasks` skill | +| Session continuity across contexts | `writing-session-handoffs` / `HANDOFF.md` | +| Validate rule adherence | `audit-rules-drift` skill | + +--- + +## Invariants + +Every flow: command and every agent enforces these; violations must be flagged, not silently +bypassed: + +1. **Read-only until approval.** No GitHub write (issue create, label, sub-issue link) before an + explicit human "approve." Dry-run echo always precedes write. +2. **Hierarchy as data.** Every parent/child link is materialized via the REST sub-issue endpoint, + not just mentioned in body text. Closure rolls up natively; project board grouping is automatic. +3. **Exactly 3 research agents.** /flow-brainstorm always spawns Known Issues + Best Practices + + Dependencies. Never fewer (shallow research); never more (bloated). Additional domain agents are + allowed on top when a critique flag demands a specialist, but the 3 baseline mandates stay. +4. **Score bands are hard.** ≥ 40 ships, < 36 defers (with written reason), 36–39 goes to human. + No item ships without a complete 5-dimension row. +5. **7-field umbrella.** Every umbrella issue body must contain all 7 sections (Summary, Approach, + Decomposition, Out of scope, Success criteria, Risks, Tracking). Missing fields = not done. +6. **Foundation → Parallel → Release gate.** Exactly one Foundation epic (blocks all); N parallel + epics (feed release); exactly one Release-gate epic (closes only after Foundation + Parallel). +7. **Every defer has a reason.** A defer item with no written reason is a process failure. +8. **V1 is transient.** V1, the 3 agent reports, and the score table are working-state artifacts + (`.flow/`). Only V2 (the umbrella body) and the defer list survive as durable records. 
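The score-band rule in invariant 4 can be sketched as a small shell function. This is an illustrative sketch only: the function name `flow_band` and its output words (`ship`, `negotiate`, `defer`, `invalid`) are assumptions, not part of the flow: suite. The thresholds (≥ 40, 36–39, < 36) and the 5–50 range (five dimensions, 1–10 each) come from the rule itself.

```bash
# Hypothetical helper: classify a 5-dimension total into a score band.
# flow_band is an illustrative name, not an existing flow: command.
flow_band() {
  total="$1"
  # Reject empty input or anything that is not a whole number.
  case "$total" in
    ''|*[!0-9]*) echo "invalid"; return 1 ;;
  esac
  # Five dimensions scored 1-10 each give a valid range of 5..50.
  if [ "$total" -lt 5 ] || [ "$total" -gt 50 ]; then
    echo "invalid"; return 1
  fi
  if [ "$total" -ge 40 ]; then
    echo "ship"        # goes to the V2 ship list
  elif [ "$total" -ge 36 ]; then
    echo "negotiate"   # 36-39: surface to a human before any GitHub write
  else
    echo "defer"       # below 36: defer, and record a written reason
  fi
}
```

Keeping the 36–39 band as its own outcome, rather than rounding it into ship or defer, mirrors the requirement that those items go to a human.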
+ +--- + +## Umbrella contract (7-field body) + +Required sections in every umbrella issue body (verified against live umbrella `#55` in +`w7-mgfcode/w7-base`, the reference project): + +| Field | Content | +|-------|---------| +| **Summary** | What's wrong with the current state, one paragraph | +| **Approach** | The architectural delta, one paragraph (no router / zero new runtime deps / etc.) | +| **Decomposition** | Bulleted epic list with `#N` refs + phase markers (Foundation / Parallel / Release gate) | +| **Out of scope (explicit)** | Items NOT closing this umbrella; each has a `#N` ref or a one-sentence reason it isn't tracked | +| **Success criteria** | Checkbox list (`- [ ]`) an outside reviewer can use as the close-or-not decision | +| **Risks** | Table with one mitigation per row | +| **Tracking** | Links to the project board, plan file, source-of-truth contract, and a `X/10` one-pass confidence score | + +--- + +## Epic contract + +Each epic issue body must contain: + +- Opening blockquote: `Sub-issue of #N (umbrella: )` + phase declaration + (`Foundation — blocks epics #M, #M+1 …` / `Parallel after Foundation` / `Release gate`). +- `## Purpose` — what this delivery surface gives the user. +- `## Sub-tasks` — bulleted list with `#N` references to child sub-issues. +- Labels ⊇ umbrella label set (plus the `epic` label). + +--- + +## Hierarchy-as-data (REST API) + +```bash +# Link an epic under the umbrella +gh api repos/{owner}/{repo}/issues/{umbrella_id}/sub_issues \ + -X POST -F sub_issue_id={epic_id} \ + --header "GitHub-Next-Preview: true" + +# Read sub-issues (GraphQL) +gh api graphql -f query=' + { repository(owner:"{owner}", name:"{repo}") { + issue(number: {umbrella_id}) { + subIssues(first: 20) { nodes { number title state } } + } + } }' +``` + +There is **no native `gh` sub-issue command** (cli/cli#10298). Always use `gh api` directly. +No GitHub CLI extension required. 
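The write discipline (dry-run echo before any write) can be wrapped around the link call shown above. A minimal sketch, assuming a `DRY_RUN` environment variable and a helper named `link_epic`; neither appears in the methodology, and setting `DRY_RUN=0` stands in for the human approval step. The endpoint and the `sub_issue_id` field are the ones used in this section.

```bash
# Hypothetical wrapper: print the link command unless DRY_RUN=0 is set explicitly.
# link_epic and DRY_RUN are illustrative names, not part of the flow: suite.
link_epic() {
  owner_repo="$1"; umbrella_id="$2"; epic_id="$3"
  endpoint="repos/${owner_repo}/issues/${umbrella_id}/sub_issues"
  if [ "${DRY_RUN:-1}" != "0" ]; then
    # Default is dry-run: show what would be sent and write nothing.
    echo "DRY-RUN: gh api ${endpoint} -X POST -F sub_issue_id=${epic_id}"
    return 0
  fi
  # Approved write, followed by the ~1 s courtesy delay between writes.
  gh api "${endpoint}" -X POST -F "sub_issue_id=${epic_id}" && sleep 1
}
```

Defaulting to dry-run means a forgotten flag prints a command instead of creating a link, which is the safer failure mode for an approval-gated workflow.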
+ +--- + +## FLAI mapping table + +Mapping from flow-pack methodology stages to ForecastLabAI-specific commands, skills, and tools. + +| Methodology stage (KB source) | flow: command | Delegated to / reuses | +|-------------------------------|---------------|----------------------| +| Baseline reality (03) | `/flow-prime` | `core_piv_loop:prime` (codebase), `gh` CLI (GitHub state) | +| V1 naive plan (03) | `/flow-brainstorm` | — (authored by Claude from baseline) | +| 3 read-only research agents (03) | `/flow-brainstorm` | plain subagents via Agent tool | +| Score + rerank (03) | `/flow-brainstorm` | — (5-dim table, bands ≥40/<36) | +| Human V2 approval (03) | `/flow-brainstorm` | — (explicit gate before GitHub write) | +| Umbrella issue creation (01) | `/flow-umbrella` | `gh issue create` | +| Epic creation + linking (01) | `/flow-epics` | `gh issue create` + `gh api` sub-issues | +| Sub-issue decomposition (02) | — (delegate) | `issue-to-subtasks` skill | +| PRP creation per epic | — (delegate) | `base_prp:prp-create` | +| PRP execution | — (delegate) | `base_prp:prp-execute` | +| Session continuity | `/flow-prime` | `writing-session-handoffs`, `HANDOFF.md` | +| Rules audit | `/flow-prime` | `audit-rules-drift` | + +--- + +## Durable-source split + +`.claude/` is gitignored in ForecastLabAI (see `CLAUDE.md` → Learnings). Any file placed only +in `.claude/` is lost on a fresh clone and cannot be the source of truth. + +| Layer | Location | Committed? 
| Purpose | +|-------|----------|------------|---------| +| Durable contract | `docs/flow-pack-methodology.md` | ✅ tracked | Narrative, invariants, API contract | +| Durable command templates | `docs/flow-pack/commands/*.md` | ✅ tracked | Source of truth for each command | +| Local runtime install | `.claude/commands/flow/*.md` | ❌ gitignored | Used by Claude Code slash-commands | +| Local agent rule | `.claude/rules/umbrella-issue.md` | ❌ gitignored | Agent contract during a session | + +**Invariant:** the local install is a faithful byte-copy of the tracked template. If they drift, +the tracked template wins. Recovery = `cp` (see § Fresh-clone recovery). + +--- + +## Fresh-clone recovery + +After a fresh clone (or after `.claude/` is wiped), regenerate the local install from the tracked +source: + +```bash +# Regenerate command(s) from tracked templates +mkdir -p .claude/commands/flow +cp docs/flow-pack/commands/*.md .claude/commands/flow/ + +# Verify no drift +diff docs/flow-pack/commands/flow-prime.md .claude/commands/flow/flow-prime.md \ + && echo "OK — no drift" + +# The umbrella-issue.md rule has no tracked template file (the durable content lives +# in this methodology doc, § Umbrella contract). Write a fresh copy from that section +# to .claude/rules/umbrella-issue.md when needed. 
+``` + +--- + +## Portability manifest + +To reuse the flow: suite in another repository, change these named parameters: + +| Parameter | ForecastLabAI value | What to change | +|-----------|--------------------|--------------------| +| `owner/repo` | `w7-mgfcode/ForecastLabAI` | Your GitHub org/repo | +| PRP hand-off command | `base_prp:prp-create` | Your equivalent PRP/issue command | +| Codebase prime command | `core_piv_loop:prime` | Your codebase prime command or equivalent | +| Label set | `umbrella`, `epic`, `flow` | Must be created in the target repo first | +| Command namespace | `flow` (`.claude/commands/flow/`) | Any namespace not already in use | +| Docs root | `docs/flow-pack/` | Wherever you track command templates | +| Working state dir | `.flow/` | Any local-only dir added to `.gitignore` | +| Milestone name | project-specific | Your target project's milestone | + +Nothing in the flow-pack commands is ForecastLabAI-specific except the `owner/repo` value and the +references to `base_prp:prp-create` and `core_piv_loop:prime`. Swap those two and it runs +anywhere. diff --git a/docs/flow-pack/commands/flow-prime.md b/docs/flow-pack/commands/flow-prime.md new file mode 100644 index 00000000..4ba91d72 --- /dev/null +++ b/docs/flow-pack/commands/flow-prime.md @@ -0,0 +1,154 @@ +--- +description: Capture baseline reality — delegate to core_piv_loop:prime, gather GitHub state, write .flow/state.md +--- + +<!-- provenance: flow-pack methodology stage 1 (continuation-discipline baseline step). + Source of truth: docs/flow-pack/commands/flow-prime.md (tracked). + Local install: .claude/commands/flow/flow-prime.md (gitignored, regenerable from this file). 
+ Recovery: cp docs/flow-pack/commands/flow-prime.md .claude/commands/flow/flow-prime.md + Full methodology: docs/flow-pack-methodology.md --> + +# flow-prime: Baseline Reality Capture + +## Objective + +Capture the five baseline categories (repo state, docs, rules, issues, current state) needed to +plan a feature initiative. Produces two artifacts: + +1. **Current-workflow-map** — inventory of installed commands, rules, CI workflows, hooks, and + available skills. +2. **You-are-here snapshot** — branch, version, open issues, milestones, label gap, and a plain + "what's missing for the flow: suite" summary. + +Both artifacts are written to `.flow/state.md` (created if absent; phase sections updated if +present). The command ends by printing the gate result and the next-command pointer. + +**DELEGATION: do NOT re-implement codebase priming.** +Run the `core_piv_loop:prime` skill for all codebase reading. This command adds only the GitHub +state gathering, workflow mapping, and `.flow/state.md` writing on top of it. + +## Process + +### 1. Codebase priming (delegate) + +Run the `core_piv_loop:prime` skill. Let it produce the full project overview — purpose, +architecture, tech stack, core principles, and current state. Do not repeat that work here. + +Supplement with: + +!`git log -5 --oneline` + +!`git status --short` + +### 2. GitHub state + +Gather the five GitHub categories: + +!`gh issue list --state open --limit 20 --json number,title,labels --jq '.[] | "#\(.number): \(.title) [\(.labels | map(.name) | join(","))]"'` + +!`gh api repos/{owner}/{repo}/milestones --jq '.[] | "#\(.number) \(.title) (\(.state))"'` + +!`gh label list --json name --jq '[.[].name] | sort | join(", ")'` + +!`gh pr list --state open --json number,title,headRefName --jq '.[] | "#\(.number): \(.title) → \(.headRefName)"'` + +!`gh release list --limit 3 --json tagName,isPrerelease --jq '.[] | "\(.tagName)\(if .isPrerelease then " [pre]" else "" end)"'` + +### 3. 
Workflow map + +Inventory the installed flow: tooling without reading individual file contents. + +!`ls .claude/commands/ 2>/dev/null && echo "---" && ls .claude/rules/ 2>/dev/null && echo "---" && ls .github/workflows/ 2>/dev/null` + +!`ls .claude/commands/flow/ 2>/dev/null || echo "(no flow/ namespace yet)"` + +!`ls .claude/hooks/ 2>/dev/null || echo "(no hooks dir)"` + +Record: +- **Commands** — list of namespace dirs + top-level `.md` files under `.claude/commands/`. Note + whether a `flow/` namespace exists. +- **Rules** — list of `.md` files under `.claude/rules/`. Note whether `umbrella-issue.md` exists. +- **Workflows** — list of `.github/workflows/*.yml` files. +- **Hooks** — files in `.claude/hooks/`. +- **Skills** — list the user-invocable skills visible in your context (from the system prompt or + CLAUDE.md). Note reuse candidates for the flow: suite. + +### 4. Synthesize and write .flow/state.md + +Produce the two required sections: + +**Current-workflow-map** (what exists today): +- Commands: `<namespace-dir>` → `[file1.md, file2.md, …]`; top-level `.md` files listed +- Rules: `[file1.md, …]`; highlight `umbrella-issue.md` if present +- Workflows: `[ci.yml, cd-release.yml, …]` +- Hooks: `[hook-file, …]` +- Skills (reuse candidates): `[skill-name: purpose, …]` +- flow/ namespace: ✅ installed / ❌ missing + +**You-are-here snapshot** (current state): +- Branch: `<branch>` | Version: `<version from .release-please-manifest.json or CHANGELOG>` +- In-progress issues: `[#N title, …]` +- Active milestone: `<name>` or none +- Labels: umbrella=`[✅/❌]` epic=`[✅/❌]` flow=`[✅/❌]` +- Gap: concise plain-language statement of what the flow: suite still needs in this repo + +Write these two sections to `.flow/state.md`: +- If the file **does not exist**, create it with a provenance HTML comment at the top and a + `## Phase status` header, then append the two sections under `## Current workflow map` and + `## You are here`. 
+- If the file **already exists**, find and update the two matching `##` sections in place; preserve + all other content (Phase status, Gate decisions, Chosen workflow, Open epics, etc.). + +### 5. Gate and next-command + +Print the gate result using the output format below, then the next-command pointer. + +Gate is ✅ **BASELINE CAPTURED** when all five categories are present: +- repo state (branch, recent commits, git status) +- docs (CLAUDE.md / AGENTS.md / README read) +- rules (`.claude/rules/` inventory) +- issues (open issues listed) +- current state (version, active milestone, label set) + +Gate is ❌ **INCOMPLETE** if any category is missing or uncertain — list what's missing and why +before printing the next-command pointer (still print it so the user knows where to go next). + +## Output Format + +``` +━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ + 🗺️ flow-prime: Baseline Reality +━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ + +📋 Codebase (core_piv_loop:prime) + [project overview summary from the delegated skill] + +📋 GitHub State + Open issues: N | Open PRs: M + Milestones: <name or none> + Labels: umbrella=[✅/❌] epic=[✅/❌] flow=[✅/❌] + Recent release: <tag> + +📋 Workflow Map + Commands: [namespaces / top-level files] + Rules: [files] umbrella-issue.md=[✅/❌] + Workflows: [files] + flow/ namespace: [✅/❌] + +📋 You Are Here + Branch: <branch> | Version: <version> + In-progress: [#N title, …] + Gap: <plain-language description of what's missing> + +──────────────────────────────────────────── + [✅/❌] BASELINE CAPTURED → .flow/state.md updated +──────────────────────────────────────────── + +→ Next: /flow-brainstorm <initiative description> +``` + +## Arguments + +`$ARGUMENTS` — optional free-text initiative description passed through to the gate result and +the next-command pointer (e.g., `/flow-prime integrate flow-pack methodology`). If omitted, the +you-are-here snapshot stands on its own. 
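The five-category gate described in step 5 of `flow-prime.md` can be sketched in shell. This is only an illustration under stated assumptions: the function name `baseline_gate` and the `name=ok` argument convention are hypothetical and do not appear in the command template.

```shell
# Hypothetical helper: each argument is "category=status".
# A category counts as present only when its status is exactly "ok".
baseline_gate() {
  missing=""
  for entry in "$@"; do
    name=${entry%%=*}      # text before the first "="
    status=${entry#*=}     # text after the first "="
    [ "$status" = "ok" ] || missing="$missing $name"
  done
  if [ -z "$missing" ]; then
    printf 'BASELINE CAPTURED\n'
  else
    # List what is missing; the next-command pointer is still printed by the caller.
    printf 'INCOMPLETE: missing%s\n' "$missing"
    return 1
  fi
}

# Example call (assumed category names):
#   baseline_gate repo=ok docs=ok rules=ok issues=ok state=ok
```

The non-zero return on an incomplete baseline lets a wrapper script branch on the result while the missing categories stay visible in the output.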
From 2cb87b594e9bf774d0dcd6be04aef6ca1d35ce60 Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Mon, 1 Jun 2026 16:23:33 +0200 Subject: [PATCH 02/44] docs(repo): harden flow-prime state markers (#369) --- docs/flow-pack/commands/flow-prime.md | 30 +++++++++++++++++++++------ 1 file changed, 24 insertions(+), 6 deletions(-) diff --git a/docs/flow-pack/commands/flow-prime.md b/docs/flow-pack/commands/flow-prime.md index 4ba91d72..579aff41 100644 --- a/docs/flow-pack/commands/flow-prime.md +++ b/docs/flow-pack/commands/flow-prime.md @@ -92,12 +92,30 @@ Produce the two required sections: - Labels: umbrella=`[✅/❌]` epic=`[✅/❌]` flow=`[✅/❌]` - Gap: concise plain-language statement of what the flow: suite still needs in this repo -Write these two sections to `.flow/state.md`: -- If the file **does not exist**, create it with a provenance HTML comment at the top and a - `## Phase status` header, then append the two sections under `## Current workflow map` and - `## You are here`. -- If the file **already exists**, find and update the two matching `##` sections in place; preserve - all other content (Phase status, Gate decisions, Chosen workflow, Open epics, etc.). +Write these two sections to `.flow/state.md` using **HTML marker pairs** so the update is +deterministic and never corrupts unrelated content: + +``` +<!-- FLOW-PRIME:CURRENT-WORKFLOW-MAP:START --> +## Current workflow map +<content> +<!-- FLOW-PRIME:CURRENT-WORKFLOW-MAP:END --> + +<!-- FLOW-PRIME:YOU-ARE-HERE:START --> +## You are here +<content> +<!-- FLOW-PRIME:YOU-ARE-HERE:END --> +``` + +Update rules: +- **File does not exist** — create it with a provenance HTML comment at the top, then write both + marker blocks (with their `## ` headings inside). +- **File exists, markers present** — replace only the content between each `START` / `END` pair; + leave every byte outside the markers unchanged. 
+- **File exists, markers absent** — append both marker blocks at the end of the file; do not + rewrite or delete any existing content. + +Never match on bare `##` headings to locate sections — the markers are the authoritative delimiters. ### 5. Gate and next-command From cde4dffa38d29eadf7ccc0e794ace40742a2e155 Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Mon, 1 Jun 2026 16:26:19 +0200 Subject: [PATCH 03/44] docs(repo): fix review typos in methodology and PRP (#369) --- PRPs/PRP-flow-pack-E1-foundation.md | 2 +- docs/flow-pack-methodology.md | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/PRPs/PRP-flow-pack-E1-foundation.md b/PRPs/PRP-flow-pack-E1-foundation.md index f9e7db50..8a343f9f 100644 --- a/PRPs/PRP-flow-pack-E1-foundation.md +++ b/PRPs/PRP-flow-pack-E1-foundation.md @@ -220,7 +220,7 @@ diff -q docs/flow-pack/commands/flow-prime.md .claude/commands/flow/flow-prime.m # In a Claude Code session, run the command and confirm it produces the two artifacts: # - a current-workflow-map (existing commands/skills/rules/workflows) # - a you-are-here snapshot (branch, version, open issues, gap) -# and writes/append .flow/state.md. Confirm it ends by printing the next-command pointer. +# and writes or appends to `.flow/state.md`. Confirm it ends by printing the next-command pointer. # (No automated assertion — this is an interactive command; verify the output sections exist.) ``` diff --git a/docs/flow-pack-methodology.md b/docs/flow-pack-methodology.md index 852331b3..4966c242 100644 --- a/docs/flow-pack-methodology.md +++ b/docs/flow-pack-methodology.md @@ -87,7 +87,7 @@ Agent B (Best Practices), Agent C (Dependencies) — then synthesize into the sc 1. Creates the umbrella GitHub issue with the **7-field body contract** (see § Umbrella contract). 2. Attaches labels `umbrella` + `flow` and the project milestone. -3. Dry-run echos the issue body; waits for approval before writing. +3. 
Dry-run echoes the issue body; waits for approval before writing. 4. Prints gate + next-command pointer: `→ Next: /flow-epics`. ### /flow-epics — epic issues @@ -118,7 +118,7 @@ Agent B (Best Practices), Agent C (Dependencies) — then synthesize into the sc ## Invariants -Every flow: command and every agent enforces these; violations must be flagged, not silently +Every flow: command and every agent enforce these; violations must be flagged, not silently bypassed: 1. **Read-only until approval.** No GitHub write (issue create, label, sub-issue link) before an From d023a20ec0e7a732a978e03b856459485326b9d3 Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Mon, 1 Jun 2026 18:07:12 +0200 Subject: [PATCH 04/44] =?UTF-8?q?docs(repo):=20add=20/flow-brainstorm=20co?= =?UTF-8?q?mmand=20=E2=80=94=20E2=20of=20flow-pack=20suite=20(#371)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- docs/flow-pack/commands/flow-brainstorm.md | 305 +++++++++++++++++++++ 1 file changed, 305 insertions(+) create mode 100644 docs/flow-pack/commands/flow-brainstorm.md diff --git a/docs/flow-pack/commands/flow-brainstorm.md b/docs/flow-pack/commands/flow-brainstorm.md new file mode 100644 index 00000000..60a653ba --- /dev/null +++ b/docs/flow-pack/commands/flow-brainstorm.md @@ -0,0 +1,305 @@ +--- +description: V1 naive plan → 3-read-only-agent research → 5-dim score → V2 ship/defer list +--- + +<!-- provenance: flow-pack methodology stage 2 (V1 → V2 planning pipeline). + Source of truth: docs/flow-pack/commands/flow-brainstorm.md (tracked). + Local install: .claude/commands/flow/flow-brainstorm.md (gitignored, regenerable from this file). 
+ Recovery: cp docs/flow-pack/commands/flow-brainstorm.md .claude/commands/flow/flow-brainstorm.md + Full methodology: docs/flow-pack-methodology.md --> + +# flow-brainstorm: V1 → Score → V2 + +## Objective + +Turn a baseline initiative description into a scored, human-approved V2 ship/defer list ready +for `/flow-umbrella`. Produces three outputs: + +1. **V1** — flat bullet list of 5–10 candidate items, from baseline alone, unscored, labeled "V1". +2. **V2** — approved ship list + explicit defer list + X/10 one-pass confidence score. +3. **Log entry** — full decision trail appended to `.flow/brainstorm-log.md`. + +The three read-only research subagents are the engine of this command. Claude spawns exactly 3 +(Agent A — Known Issues, Agent B — Best Practices, Agent C — Dependencies) via the Agent tool, +waits for all three, then synthesizes their findings into the score table. + +This command makes NO GitHub writes. It ends by printing the approved V2 list and the next-command +pointer. All GitHub writes (issue creation, labeling, linking) belong to E3 `/flow-umbrella`. + +**DELEGATION:** Do not re-implement codebase priming. If the baseline context needs refreshing, +run `/flow-prime` first. + +## Process + +### 1. Read baseline context + +!`ls .flow/ 2>/dev/null || echo "(no .flow/ directory yet)"` + +Determine the initiative description: +- If `$ARGUMENTS` is non-empty → use it. +- Else → read `.flow/state.md` and extract the "Gap" line from the "You are here" section. +- Else → ask the user: "What initiative should I brainstorm? Provide 1–3 sentences." + +Read `.flow/brainstorm-log.md` (if it exists) to determine the current round count. The new +round will be Round N+1 (or Round 1 if the file does not exist yet). + +!`if [ -f .flow/brainstorm-log.md ]; then grep -c "^## Round" .flow/brainstorm-log.md || true; else echo "0"; fi` + +### 2. Produce V1 — naive plan (UNSCORED) + +Generate a flat bullet list of 5–10 candidate items **from baseline knowledge only** — no research +yet. 
Every item must be: + +- **Unscored** — no dimension scores; plain text only. +- **Labeled "V1"** — the section heading must read `## V1 — Naive Plan (N items, unscored)`. +- **Descriptive** — format: `- <item title>: <one-sentence description of what and why>`. + +Coverage heuristics: include obvious high-value items, known technical debt, upstreams that may +be blocked, and at least one item that is likely out of scope (to stress-test the critique gate). + +### 3. Critique gate — tag V1 items (do NOT fix them) + +For each V1 item, attach zero or more flags. Flags are labels only — do not change V1 text. + +| Flag | When to apply | +|------|---------------| +| `assumption` | Relies on a fact not verified against the codebase or docs | +| `scope-creep` | Touches E3/E4/E5 behavior or an out-of-scope system | +| `no-evidence` | No concrete codebase grounding for the stated need | + +Present as: `- <item title> [assumption, scope-creep]` or `- <item title> [none]`. + +The flags guide the research agents. An `assumption`-flagged item means "Agent A should verify +this claim." A `scope-creep` flag means "Agent B should confirm boundaries." + +### 4. Spawn 3 read-only research subagents in parallel + +Invoke the **Agent tool** to spawn all three concurrently. Each subagent is read-only — it MUST +NOT write files or make GitHub writes. Pass the V1 items + critique flags in the prompt. + +**Agent A — Known Issues** + +Prompt: +``` +You are a read-only research agent. You MUST NOT write files or make GitHub writes. + +Initiative: <initiative-description> +V1 items (with critique flags): <paste V1 list with flags> + +Task: Read the open GitHub issues, recent git log, and .flow/state.md. +Report: + 1. Which V1 items are blocked by or related to open issues? (cite #N) + 2. Which V1 items are partially done (recent branches/PRs touching them)? + 3. Which V1 `assumption` flags are contradicted by known incidents or bugs? + +Output: concise bullet list, #N refs where applicable. 
Read-only. +``` + +**Agent B — Best Practices** + +Prompt: +``` +You are a read-only research agent. You MUST NOT write files or make GitHub writes. + +Initiative: <initiative-description> +V1 items (with critique flags): <paste V1 list with flags> + +Task: Read CLAUDE.md, AGENTS.md, docs/flow-pack-methodology.md, and .claude/rules/. +Report: + 1. Which V1 items align with or contradict current best practices? + 2. Which V1 items are already covered by an existing skill or command? (reuse opportunity) + 3. Which V1 `scope-creep` flags are confirmed — item truly belongs to E3/E4/E5? + +Output: concise bullet list. Read-only. +``` + +**Agent C — Dependencies** + +Prompt: +``` +You are a read-only research agent. You MUST NOT write files or make GitHub writes. + +Initiative: <initiative-description> +V1 items (with critique flags): <paste V1 list with flags> + +Task: Read pyproject.toml, frontend/package.json, docker-compose.yml, +and docs/_base/API_CONTRACTS.md. +Report: + 1. Which V1 items have unresolved upstream dependencies or API blockers? + 2. Which V1 `no-evidence` flags are confirmed — no codebase grounding found? + 3. Any dependency pinning or version conflicts that affect V1 items? + +Output: concise bullet list. Read-only. +``` + +Wait for all three agents before proceeding. + +### 5. Score V1 items on 5 dimensions + +Use agent findings as evidence for the Evidence dimension. 
Score each item 1–10 per dimension: + +| Dimension | Score 1 | Score 10 | Note | +|-----------|---------|----------|------| +| **Value** | Cosmetic / irrelevant | Core user outcome | — | +| **Risk** | High risk, many unknowns | Low risk, well-understood | Inverted scale: lower risk scores higher | +| **Readiness** | Many blockers open | All upstreams clear | Blocked = lower score | +| **Complexity** | Enormous effort | Trivial | Inverted scale: lower complexity scores higher | +| **Evidence** | Pure assumption | Fully verified by agents | Directly from agent reports | + +Note: Risk and Complexity score INVERSELY — a low-risk, low-complexity item scores 9–10, not 1–2. +(A high-risk item is less desirable, so it scores lower on the Risk dimension.) + +Present the score table: + +``` +| Item | Value | Risk | Readiness | Complexity | Evidence | Total | Band | +|------|-------|------|-----------|------------|----------|-------|------| +| ... | 8 | 7 | 9 | 6 | 9 | 39 | 🟡 NEGOTIATE | +``` + +Band indicators: +- `✅ SHIP` — total ≥ 40 +- `🟡 NEGOTIATE` — total 36–39 (requires human decision before V2) +- `❌ DEFER` — total < 36 (requires explicit one-clause written reason) + +### 6. Handle negotiation zone (items scoring 36–39) + +If any items score 36–39, **STOP and surface to human** before constructing V2: + +``` +N item(s) are in the negotiation zone (score 36–39): + + - <item>: score 38. Rationale: <one sentence from agent reports>. + Research note: Agent B flagged this as covered by an existing skill (reuse potential). + +Decision needed for each item — respond 'ship', 'defer', or 'defer: <reason>': +``` + +Wait for human response for each negotiate item. Record the decision in the round log. + +If all items are SHIP or DEFER, skip this step. + +### 7. Produce V2 — ship list and defer list + +**V2 ship list** (items scoring ≥ 40, plus negotiate items the human shipped): + +``` +## V2 — Ship List + +1. 
<item title> (score: X/50): <one-sentence rationale drawing on agent evidence> +2. ... +``` + +**Defer list** (items scoring < 36, plus negotiate items the human deferred): + +``` +## Defer List + +- <item title> (score: X/50): DEFER — <explicit one-clause reason> +``` + +Every defer item MUST have an explicit reason. "DEFER — not needed now" is not acceptable. +Good example: "DEFER — overlaps the existing `analyzing-ai-repos` skill; fold into /flow-prime +if deep external analysis is needed." + +**One-pass confidence score** on the V2 ship list: + +``` +One-pass confidence: X/10 — <one sentence: what gives confidence and what remains uncertain> +``` + +### 8. Append to `.flow/brainstorm-log.md` + +Update rules: +- **File absent** → create with provenance header + `# /flow-brainstorm — decision log` + first round section. +- **File exists** → count existing `## Round` headings, append `## Round (N+1) — <date>`. +- **NEVER overwrite previous rounds.** The log is append-only. + +Provenance header (write only on creation): +``` +<!-- provenance: /flow-brainstorm decision trail. Append-only. NOT committed. --> +# /flow-brainstorm — decision log +``` + +Round section format (exact fields — one paragraph per field, bold label): + +```markdown +## Round N — YYYY-MM-DD + +**Initiative:** <initiative description> +**V1 (N items, unscored):** (1) <item1> (2) <item2> ... +**Critique flags:** <"item title [flags]" for flagged items, or "none"> +**Research:** spawned 3 read-only subagents (A Known Issues, B Best Practices, C Dependencies) +**Agent findings (evidence-backed):** +- A: <key findings, one line> +- B: <key findings, one line> +- C: <key findings, one line> +**5-dim scores (Value/Risk/Readiness/Complexity/Evidence, ≥40 ship):** +- <item title> V/R/Re/C/E=total ✅ SHIP / 🟡 NEGOTIATE → <decision> / ❌ DEFER +**V2 SHIP:** <item1>, <item2>, ... **DEFER:** <item> — <reason>; ... 
+**One-pass confidence:** X/10 — <rationale> +**User response:** <what the human decided at the approval gate> +``` + +### 9. Human approval gate + +Print V2 ship list and defer list in full. Print the one-pass confidence score. + +``` +──────────────────────────────────────────── + Approve V2 ship list? + 'approve' → write log entry + print next-command pointer + 'revise: <instruction>' → adjust scores or categorizations +──────────────────────────────────────────── +``` + +After human approves, write the log entry (Step 8) with `User response: approved`. + +### 10. Gate result and next-command + +Print using the Output Format below. + +## Output Format + +``` +━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ + 💡 flow-brainstorm: V1 → Score → V2 +━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ + +📋 Baseline Context + Initiative: <description> + Source: [.flow/state.md gap | $ARGUMENTS] + Brainstorm round: N (log entry Round N appended) + +📋 V1 — Naive Plan (N items, unscored) + 1. <item title>: <one-sentence description> [flags or none] + 2. ... + +📋 Research (3 agents — parallel) + Agent A (Known Issues): <2-line summary> + Agent B (Best Practices): <2-line summary> + Agent C (Dependencies): <2-line summary> + +📋 Scoring + | Item | V | R | Re | C | E | Total | Band | + |------|----|----|----|----|----|-------|------| + ... + +📋 V2 — Approved List + Ship (N items): <item1>, <item2>, ... + Defer (M items): <item> — <reason>; ... + One-pass confidence: X/10 + +──────────────────────────────────────────── + ✅ V2 APPROVED → .flow/brainstorm-log.md updated (Round N) +──────────────────────────────────────────── + +→ Next: /flow-umbrella <initiative> +``` + +## Arguments + +`$ARGUMENTS` — the initiative description, passed as free text +(e.g., `/flow-brainstorm add batch forecasting to the system`). +If omitted, the command falls back to `.flow/state.md` Gap line; if state.md is absent, +asks the user directly. Passed through to the gate result and the next-command pointer. 
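The band thresholds from step 5 of `flow-brainstorm.md` (SHIP at 40 or more, NEGOTIATE at 36 to 39, DEFER below 36) reduce to a small sum-and-compare. The sketch below assumes the five dimension scores are passed in the order Value, Risk, Readiness, Complexity, Evidence; the function name `score_band` is hypothetical.

```shell
# Hypothetical helper: sums five 1-10 dimension scores and prints "<total> <band>".
score_band() {
  if [ "$#" -ne 5 ]; then
    printf 'usage: score_band VALUE RISK READINESS COMPLEXITY EVIDENCE\n' >&2
    return 2
  fi
  total=0
  for score in "$@"; do
    total=$((total + score))
  done
  if [ "$total" -ge 40 ]; then
    band=SHIP
  elif [ "$total" -ge 36 ]; then
    band=NEGOTIATE   # needs a human decision before V2
  else
    band=DEFER       # needs an explicit one-clause reason
  fi
  printf '%s %s\n' "$total" "$band"
}

# Example call, using the sample row from the score table:
#   score_band 8 7 9 6 9
```

Risk and Complexity are expected to arrive already inverted, as the scoring note requires, so the function does no inversion of its own.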
From 9679ce87c225349f1e1f882fa6d51ad2af3aaf24 Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Mon, 1 Jun 2026 18:22:32 +0200 Subject: [PATCH 05/44] =?UTF-8?q?feat(docs,repo):=20add=20/flow-umbrella?= =?UTF-8?q?=20command=20=E2=80=94=20E3=20of=20flow-pack=20suite=20(#372)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- docs/flow-pack/commands/flow-umbrella.md | 273 +++++++++++++++++++++++ 1 file changed, 273 insertions(+) create mode 100644 docs/flow-pack/commands/flow-umbrella.md diff --git a/docs/flow-pack/commands/flow-umbrella.md b/docs/flow-pack/commands/flow-umbrella.md new file mode 100644 index 00000000..8b55cc59 --- /dev/null +++ b/docs/flow-pack/commands/flow-umbrella.md @@ -0,0 +1,273 @@ +--- +description: Generate and create umbrella GitHub issue from V2 ship list +--- + +<!-- provenance: flow-pack methodology stage 2 (umbrella issue creation). + Source of truth: docs/flow-pack/commands/flow-umbrella.md (tracked). + Local install: .claude/commands/flow/flow-umbrella.md (gitignored, regenerable from this file). + Recovery: cp docs/flow-pack/commands/flow-umbrella.md .claude/commands/flow/flow-umbrella.md + Full methodology: docs/flow-pack-methodology.md (§ Stage 2 — Decompose) + Agent contract: .claude/rules/umbrella-issue.md --> + +# flow-umbrella: Umbrella Issue Creation + +## Objective + +Transform the approved V2 ship list (from `/flow-brainstorm`) into a GitHub umbrella issue with +the **7-field body contract** (`umbrella-issue.md`). The umbrella becomes the root of the GitHub +issue hierarchy that `/flow-epics` populates with child epic issues. + +**DELEGATION: approval gate required before any GitHub write.** +Steps 1–5 are read-only. Step 6 blocks on explicit human approval. No issue is created until +the user types "approve." + +## Process + +### 1. 
Read context + +Load the working-state files produced by `/flow-prime` and `/flow-brainstorm`: + +!`cat .flow/brainstorm-log.md` + +!`cat .flow/state.md` + +Extract: +- **Initiative title** — from `$ARGUMENTS` if provided; otherwise the initiative title from the + V2 ship list header in `.flow/brainstorm-log.md` (or the `In-progress` line in the + `FLOW-PRIME:YOU-ARE-HERE` block of `.flow/state.md`). +- **V2 ship items** — the approved set from the `/flow-brainstorm` output (`## V2 ship list` + section). +- **Defer items + reasons** — items from the defer list; each needs a written one-clause reason + for the Out-of-scope section. +- **Milestone name** — from the `FLOW-PRIME:YOU-ARE-HERE` marker block (`Active milestone:` line). +- **Type label** — first token before `(` in a conventional-commit title + (`feat(repo): …` → `feat`); default `feat` if the title is not in conventional-commit format. + +**Missing brainstorm log:** if `.flow/brainstorm-log.md` is absent or has no `## V2 ship list` +section, print: +``` +ERROR: No V2 ship list found in .flow/brainstorm-log.md. +Run /flow-brainstorm first, then re-run /flow-umbrella. +``` +and stop. + +### 2. Validate prerequisites + +Check that all required labels and the active milestone exist in the repo: + +!`gh label list --json name --jq '[.[].name] | sort | join(", ")'` + +!`gh api repos/{owner}/{repo}/milestones --jq '.[] | select(.state=="open") | .title'` + +Required labels: `umbrella`, `flow`, and the type label (e.g., `feat`). 
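The type-label rule from step 1 (first token before `(` in a conventional-commit title, defaulting to `feat`) could look like the following. This is a sketch, not the command's implementation: the name `type_label` is hypothetical, and treating only `type(scope): subject` titles with a lowercase type as conventional-commit format is an assumption.

```shell
# Hypothetical helper: derive the type label from an issue title.
type_label() {
  title=$1
  prefix=${title%%\(*}   # text before the first "("
  case $title in
    *"("*"): "*)
      case $prefix in
        ""|*[!a-z]*) printf 'feat\n' ;;         # not a bare lowercase token
        *)           printf '%s\n' "$prefix" ;;
      esac
      ;;
    *)
      printf 'feat\n'                           # not in type(scope): form
      ;;
  esac
}

# Example call:
#   type_label "docs(repo): harden flow-prime state markers"
```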
+ +If any label or milestone is **missing**, print the exact remediation commands and **stop** — do +not proceed to draft until all prerequisites are satisfied: +```bash +# Create missing labels (run only the ones that are absent) +gh label create umbrella --color "0052CC" --description "Multi-week initiative scope owner" +gh label create flow --color "BFD4F2" --description "Managed by the flow: command suite" +gh label create epic --color "1D76DB" --description "Delivery surface within an umbrella" + +# Create missing milestone +gh api repos/{owner}/{repo}/milestones -X POST \ + -F title="<milestone-name>" -F state="open" +``` + +### 3. Idempotency check + +Search for an open issue with the same title before drafting: + +```bash +gh issue list --state open \ + --search "<proposed issue title>" \ + --json number,title \ + --jq '.[0] // empty' +``` + +If an open issue with the same title exists: +- Print: `Umbrella #N already exists: <url> — skipping create.` +- Jump to step 9 with the existing number `N`. + +Note: a **closed** umbrella with the same title does NOT block creation — a closed umbrella means +the initiative is complete; a new one with the same name may legitimately start. + +### 4. Draft the 7-field body + +Synthesize from the context loaded in step 1. **All 7 sections are required** — a body missing +any section is not done (invariant from `umbrella-issue.md`). Use the exact section headings +below: + +```markdown +## Summary +<One paragraph: what is wrong or missing in the current state. Cite baseline artifacts +(branch, existing files, gap). Describe the problem only — not the solution.> + +## Approach +<One paragraph: architectural delta only. No new services, no new routers, no new runtime +dependencies unless explicitly justified. State what will NOT change. 
Reference the +durable-source split if the command-file pattern is involved.> + +## Decomposition +<Phase taxonomy — exactly ONE Foundation (blocks all), N Parallel (run concurrently after +Foundation), exactly ONE Release gate (closes ONLY after Foundation + all Parallel). +Epics do not exist yet — use "not yet created" suffixes; never invent fake issue numbers.> + +- [ ] **E1 — Foundation** (blocks E2–EN): <one-line description> — not yet created +- [ ] **E2 — Parallel**: <one-line description> — not yet created +- [ ] **EN — Release gate** (closes after Foundation + all Parallel): <description> — not yet created + +## Out of scope (explicit) +<Items from the defer list and any scope boundary. +Every line MUST end with " — reason: <one sentence>". +A blank reason is a process failure (invariant: every defer has a reason).> + +- <item> — reason: <one sentence> + +## Success criteria +<Checkbox list. Each criterion must be independently verifiable by an outside reviewer. +Specific and measurable — not "everything works".> + +- [ ] <specific, measurable outcome> + +## Risks +| Risk | Mitigation | +|------|------------| +| <risk> | <mitigation> | + +## Tracking +- Source of truth: `docs/flow-pack-methodology.md` + working state `.flow/state.md` +- Milestone: <milestone name> +- **One-pass confidence: X/10** (<one-sentence rationale>) +``` + +### 5. Dry-run echo + +Print the **exact commands to be executed** — **do not run them yet**: + +```bash +cat > /tmp/umbrella-body.md << 'BODY_EOF' +[full 7-field body — show every line exactly as it will be submitted] +BODY_EOF + +gh issue create \ + --title "<initiative title>" \ + --body-file /tmp/umbrella-body.md \ + --label "umbrella" --label "flow" --label "<type>" \ + --milestone "<milestone-name>" +``` + +> **Why `--body-file`:** multi-line markdown bodies containing backticks, code fences, and pipe +> characters cause shell quoting failures with `--body "..."`. 
The `--body-file` approach is +> required for umbrella bodies (available since `gh` v1.x). + +### 6. Approval gate + +Print: +``` +──────────────────────────────────────────── +Awaiting approval. Type 'approve' to create the umbrella issue. +Any other response = abort (no write). +──────────────────────────────────────────── +``` + +Wait for user input. +- **"approve"** (case-insensitive) → proceed to step 7. +- **Anything else** → print `Aborted — no issue created.` and stop. + +### 7. Execute + +Create the umbrella issue using `--body-file` for safe multi-line body handling: + +```bash +cat > /tmp/umbrella-body.md << 'BODY_EOF' +[7-field body] +BODY_EOF + +gh issue create \ + --title "<initiative title>" \ + --body-file /tmp/umbrella-body.md \ + --label "umbrella" --label "flow" --label "<type>" \ + --milestone "<milestone-name>" +``` + +Capture the output URL. Extract issue number `N` from the URL +(e.g., `https://github.com/…/issues/42` → `N=42`). + +### 8. Confirm + +Verify labels and milestone were attached: + +```bash +gh issue view <N> --json number,title,labels,milestone \ + --jq '"#\(.number): \(.title) [\(.labels | map(.name) | join(","))] milestone=\(.milestone.title // "none")"' +``` + +If any label or milestone is missing, print the remediation commands: +```bash +gh issue edit <N> --add-label <missing-label> +gh issue edit <N> --milestone "<milestone-name>" +``` + +### 9. Gate and next-command + +Gate is ✅ **UMBRELLA CREATED** when: +- Issue number `N` returned from step 7. +- All 7 body sections present. +- Labels `umbrella`, `flow`, and the type label attached. +- Milestone attached. + +Gate is ❌ **FAILED** when: +- `gh issue create` returned non-zero, OR +- Confirm step shows missing labels or milestone (and remediation was not applied). 
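The "all 7 body sections present" condition can be checked mechanically before the gate is printed. The sketch below uses the exact section headings from step 4 of `flow-umbrella.md`; the function name `check_umbrella_body` and the idea of validating the body file on disk are assumptions, not part of the command template.

```shell
# Hypothetical helper: verify that a drafted umbrella body contains every
# required section heading as a whole line. Prints each missing heading.
check_umbrella_body() {
  body_file=$1
  status=0
  for heading in \
    "## Summary" \
    "## Approach" \
    "## Decomposition" \
    "## Out of scope (explicit)" \
    "## Success criteria" \
    "## Risks" \
    "## Tracking"
  do
    # -x: whole-line match, -F: fixed string, -q: quiet
    if ! grep -qxF -- "$heading" "$body_file"; then
      printf 'missing section: %s\n' "$heading"
      status=1
    fi
  done
  return "$status"
}

# Example call:
#   check_umbrella_body /tmp/umbrella-body.md
```

Running this against `/tmp/umbrella-body.md` between the dry-run echo and the approval gate would surface an incomplete draft before any GitHub write.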
+ +Print the gate result, then the next-command pointer regardless of outcome: + +``` +→ Next: /flow-epics #<N> +``` + +## Output Format + +``` +━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ + 🏗️ flow-umbrella: Umbrella Issue +━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ + +📋 Context + V2 ship items: N | Defer items: M + Initiative: <title> + Milestone: <name> | Labels: umbrella ✅ flow ✅ <type> ✅ + +📋 Prerequisite check + umbrella label: [✅/❌] | flow label: [✅/❌] | <type> label: [✅/❌] + Milestone <name>: [✅/❌] | Existing umbrella: [#N / none] + +📋 Dry-run + cat > /tmp/umbrella-body.md << 'BODY_EOF' + [full 7-field body] + BODY_EOF + gh issue create --title "..." --body-file /tmp/umbrella-body.md --label ... --milestone ... + +──────────────────────────────────────────── + Awaiting approval. Type "approve" to create. +──────────────────────────────────────────── + +[After approval:] + +📋 Created + ✅ gh issue create → #N: <title> + Labels: umbrella, flow, <type> | Milestone: <name> + +──────────────────────────────────────────── + [✅/❌] UMBRELLA CREATED → #N +──────────────────────────────────────────── + +→ Next: /flow-epics #N +``` + +## Arguments + +`$ARGUMENTS` — optional initiative description. Overrides the title extracted from +`.flow/brainstorm-log.md`. Example: `/flow-umbrella integrate flow-pack as the flow: command suite`. +If omitted, the initiative title is derived from the V2 ship list header in the brainstorm log. 
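Step 7 of `flow-umbrella.md` extracts the issue number `N` from the URL that `gh issue create` prints. A minimal sketch with shell parameter expansion follows; the function name `issue_number_from_url` is hypothetical, and the example URL is a placeholder rather than a real issue.

```shell
# Hypothetical helper: take the last path segment of an issue URL and
# accept it only if it is purely numeric.
issue_number_from_url() {
  url=${1%/}           # tolerate a trailing slash
  number=${url##*/}    # keep everything after the last "/"
  case $number in
    ""|*[!0-9]*) return 1 ;;               # not an issue number
    *)           printf '%s\n' "$number" ;;
  esac
}

# Example call (placeholder URL):
#   issue_number_from_url "https://github.com/example-owner/example-repo/issues/42"
```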
From ae1c2017719dc727f749db3bbeabd9a4e768bf5a Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Mon, 1 Jun 2026 20:29:32 +0200 Subject: [PATCH 06/44] =?UTF-8?q?feat(docs,repo):=20flow-pack=20E4=20?= =?UTF-8?q?=E2=80=94=20/flow-epics=20command=20template=20+=20local=20inst?= =?UTF-8?q?all=20(#373)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- docs/flow-pack/commands/flow-epics.md | 264 ++++++++++++++++++++++++++ 1 file changed, 264 insertions(+) create mode 100644 docs/flow-pack/commands/flow-epics.md diff --git a/docs/flow-pack/commands/flow-epics.md b/docs/flow-pack/commands/flow-epics.md new file mode 100644 index 00000000..94c592a5 --- /dev/null +++ b/docs/flow-pack/commands/flow-epics.md @@ -0,0 +1,264 @@ +--- +description: Create phase-ordered epic issues from an umbrella decomposition, link via REST sub-issues API, and hand off to base_prp:prp-create per epic +--- + +<!-- provenance: docs/flow-pack-methodology.md §"/flow-epics — epic issues" + §"Epic contract" + + §"Hierarchy-as-data (REST API)". Source of truth for .claude/commands/flow/flow-epics.md. + Recovery: cp docs/flow-pack/commands/flow-epics.md .claude/commands/flow/flow-epics.md + Full methodology: docs/flow-pack-methodology.md --> + +# flow-epics: Epic Decomposition + +## Objective + +Read the umbrella issue's Decomposition section, create N phase-ordered epic issues +(Foundation → Parallel → Release gate) with idempotent guards, link each as a sub-issue of +the umbrella via the GitHub REST API, and hand off to `base_prp:prp-create` per open epic. +Skips epics that already exist and/or are already linked. + +## Arguments + +`$ARGUMENTS` — umbrella issue number (e.g. `368` or `#368`). Required. +If omitted, reads the active umbrella number from the "In-progress issues" block in +`.flow/state.md` (looks for the `[umbrella,flow]`-labeled entry). + +## Process + +### 1. 
Parse argument + +Strip `#` prefix from `$ARGUMENTS` if present. Use the result as the umbrella issue number. + +If empty, read the umbrella number from the you-are-here snapshot: + +!`grep "umbrella,flow" .flow/state.md | head -1` + +Abort with ❌ if no umbrella number can be resolved. + +### 2. Fetch umbrella + +!`gh issue view <N> --json number,title,body,labels,milestone` + +Abort with ❌ if: +- The issue is not found. +- The `umbrella` label is absent from the labels list. + +Capture: `umbrella_title`, `body`, `labels[]`, `milestone.title`. + +### 3. Extract decomposition + +Parse the body: find lines between the `## Decomposition` heading and the next `##` heading. + +For each bullet line: +- Detect phase marker from bold text: `**Foundation**` / `**Parallel**` / `**Release gate**` +- Extract scope description (used to construct epic title + Purpose body paragraph) +- Detect any embedded `#N` ref in the line (pre-existing issue pointer — use as EXISTS hint) +- Flag as `SKIP` if the line contains `(deferred)` OR if the phase is Release gate AND the + scope mentions "not yet created" or "deferred" + +**Epic title pattern** — must match existing issues exactly for idempotent search: + +``` +feat(<scope>): flow-pack <phase-label> — <scope description> +``` + +Example from live epics: `"feat(repo): flow-pack E1 — foundation (/flow-prime + tracked contract + rule + labels/milestone)"` + +### 4. Pre-flight checks + +Verify required labels exist. Use `gh api` with `--paginate`: the REST endpoint returns 30 labels per page +by default, and `gh label list` likewise stops at 30 unless `--limit` is raised, silently missing labels created later: + +!`gh api repos/{owner}/{repo}/labels --paginate --jq '.[].name' | grep -E "^(epic|flow|feat)$" | wc -l` + +Abort with ❌ if result is not `3` (one or more labels missing — create before retrying). + +Verify milestone exists: + +!`gh api repos/{owner}/{repo}/milestones --jq '[.[].title] | contains(["flow-pack-suite"])'` + +Abort with ❌ if `false`. 
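Steps 1 and 3 of `flow-epics.md` (normalizing the umbrella number and pulling the bullets out of the `## Decomposition` section) can be sketched with parameter expansion and `awk`. Both function names are hypothetical, and the sketch assumes the umbrella body arrives on standard input, for example from `gh issue view <N> --json body --jq .body`.

```shell
# Hypothetical helper: strip one leading "#" so "368" and "#368" both work.
normalize_issue_number() {
  printf '%s\n' "${1#\#}"
}

# Hypothetical helper: print the bullet lines between the "## Decomposition"
# heading and the next "## " heading. Reads the issue body from stdin.
extract_decomposition() {
  awk '
    /^## Decomposition *$/ { inside = 1; next }   # start of the section
    inside && /^## /       { exit }               # next section ends it
    inside && /^- /        { print }              # keep bullet lines only
  '
}

# Example calls:
#   normalize_issue_number "#368"
#   gh issue view 368 --json body --jq .body | extract_decomposition
```

Phase detection and `SKIP` flagging would then run per printed line, which keeps the section boundary logic separate from the per-epic parsing.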
+ +Compute epic label set: take umbrella labels → remove `"umbrella"` → add `"epic"`. For umbrella +`#368` (labels: `umbrella`, `flow`, `feat`) → epic labels: `epic`, `flow`, `feat`. + +### 5. Idempotent inventory + +Fetch current sub-issues of the umbrella via GraphQL: + +!`gh api graphql -f query=' + { repository(owner:"w7-mgfcode", name:"ForecastLabAI") { + issue(number: <N>) { + subIssues(first: 20) { nodes { number title state } } + } + } }'` + +For each epic in the decomposition (except SKIP items): + +**a. Search for existing issue by exact title:** + +!`gh issue list --search "<exact epic title>" --json number,title \ + --jq '.[0] | "\(.number // "none") \(.title // "")"'` + +→ Record as `EXISTS #M` / `NOT_FOUND`. Verify the returned title matches character-for-character +(GitHub search is fuzzy — reject partial matches). + +**b. Check GraphQL sub-issues list:** If `#M` appears in `subIssues.nodes` → `LINKED`. Otherwise → `UNLINKED`. + +Print inventory table before any writes: + +``` +┌──────────────────────────────────────────────────────────────────────┐ +│ Phase Title summary Exists Linked │ +│ Foundation E1 — foundation ✅ #369 ✅ linked │ +│ Parallel E2 — /flow-brainstorm ✅ #371 ❌ unlinked │ +│ Parallel E3 — /flow-umbrella ✅ #372 ❌ unlinked │ +│ Parallel E4 — /flow-epics ✅ #373 ❌ unlinked │ +│ Release gate E5 — dogfood + portability ⏭️ defer — │ +└──────────────────────────────────────────────────────────────────────┘ +``` + +### 6. Create + link loop + +For each epic that is not `LINKED` AND whose phase is NOT `SKIP`: + +**6a. If `NOT_FOUND` → CREATE the issue:** + +Compose body using the epic body template (§ Epic body template below).
+ +Echo dry-run: + +``` +┌─ DRY-RUN ───────────────────────────────────────────────────────────┐ +│ gh issue create \ │ +│ --title "<exact epic title>" \ │ +│ --body "<body from template>" \ │ +│ --label epic --label flow --label feat \ │ +│ --milestone "flow-pack-suite" │ +└─────────────────────────────────────────────────────────────────────┘ +``` + +APPROVAL GATE: "Type 'approve' to create, anything else to skip." + +If approved: execute `gh issue create`; capture the returned issue number as `M`. + +RATE-DELAY: `sleep 1` + +**6b. Link to umbrella** (whether newly created or already `EXISTS` but `UNLINKED`). The REST endpoint takes the sub-issue's numeric `id` (`gh api repos/w7-mgfcode/ForecastLabAI/issues/<M> --jq .id`), not its issue number `<M>`: + +Echo dry-run: + +``` +┌─ DRY-RUN ───────────────────────────────────────────────────────────┐ +│ gh api repos/w7-mgfcode/ForecastLabAI/issues/<N>/sub_issues \ │ +│ -X POST -F sub_issue_id=<M_id> \ │ +│ --header "GitHub-Next-Preview: true" │ +└─────────────────────────────────────────────────────────────────────┘ +``` + +APPROVAL GATE: "Type 'approve' to link, anything else to skip." + +If approved: execute `gh api POST`. Confirm: + +!`gh issue view <M> --json number,title,labels` + +RATE-DELAY: `sleep 1` + +### 7. Verify + gate + +Re-fetch sub-issues via GraphQL (same query as step 5) to confirm final state. + +Count linked epics (excluding SKIP items). Print gate result and per-epic handoff. + +## Epic body template + +Use the template matching the epic's phase. Fill `<angle-bracket>` placeholders. + +**FOUNDATION epic:** + +``` +> Sub-issue of #<umbrella_N> (umbrella: <umbrella_title>). Foundation — blocks Epics #<P1>, #<P2>, … + +## Purpose + +<One-paragraph scope description extracted from the umbrella decomposition line.> + +## Sub-tasks + +_To be decomposed via `issue-to-subtasks` when this epic is picked up._ +``` + +**PARALLEL epic:** + +``` +> Sub-issue of #<umbrella_N> (umbrella: <umbrella_title>). Parallel after Foundation (E1 #<foundation_N>).
+ +## Purpose + +<One-paragraph scope description extracted from the umbrella decomposition line.> + +## Sub-tasks + +_To be decomposed via `issue-to-subtasks` when this epic is picked up._ +``` + +**RELEASE GATE epic:** + +``` +> Sub-issue of #<umbrella_N> (umbrella: <umbrella_title>). Release gate — closes only after Foundation + all Parallel epics close. + +## Purpose + +<One-paragraph scope description extracted from the umbrella decomposition line.> + +## Sub-tasks + +_To be decomposed via `issue-to-subtasks` when this epic is picked up._ +``` + +## Output format + +``` +━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ + 🔗 flow-epics: Epic Decomposition +━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ + +📋 Umbrella #<N> — <title> + Phase structure: 1 Foundation · M Parallel · 1 Release gate (deferred) + +📋 Epic inventory + [phase] [exists] [linked] + ✅ #369 E1 Foundation exists linked + ✅ #371 E2 Parallel exists ❌ not linked + ✅ #372 E3 Parallel exists ❌ not linked + ✅ #373 E4 Parallel exists ❌ not linked + ⏭️ E5 Release gate deferred — + +📋 Dry-run: writes pending (awaiting approval) + [dry-run block per epic needing a link or create] + +📋 Actions taken (after approvals) + ✅ #371 linked under #<N> + ✅ #372 linked under #<N> + ✅ #373 linked under #<N> + ⏭️ E5 deferred — skipped + +──────────────────────────────────────────── + ✅ EPICS LINKED — 4/5 epics under #<N> (E5 deferred) +──────────────────────────────────────────── + +→ Next: base_prp:prp-create per open epic: + - /base_prp:prp-create (#371 — /flow-brainstorm) + - /base_prp:prp-create (#372 — /flow-umbrella) + (E4 #373 = this PRP, currently executing) +``` + +## Reuse-map + +| Need | Tool | +|------|------| +| Codebase + context priming | `core_piv_loop:prime` | +| Epic → 5 executable subtasks | `issue-to-subtasks` skill | +| PRP authoring per epic | `base_prp:prp-create` | +| Session continuity | `writing-session-handoffs` | +| Rules audit | `audit-rules-drift` | +| Umbrella creation | `/flow-umbrella` | From 
7799d530806a7f74cb90a3b1e8dcc996f926ed27 Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Fri, 5 Jun 2026 00:33:11 +0200 Subject: [PATCH 07/44] docs(repo): track flow-pack E2-E4 PRPs (#368) --- PRPs/PRP-flow-pack-E3-flow-umbrella.md | 577 ++++++++++++++++++++ PRPs/PRP-flow-pack-E4-flow-epics.md | 594 +++++++++++++++++++++ PRPs/flow-brainstorm.md | 704 +++++++++++++++++++++++++ 3 files changed, 1875 insertions(+) create mode 100644 PRPs/PRP-flow-pack-E3-flow-umbrella.md create mode 100644 PRPs/PRP-flow-pack-E4-flow-epics.md create mode 100644 PRPs/flow-brainstorm.md diff --git a/PRPs/PRP-flow-pack-E3-flow-umbrella.md b/PRPs/PRP-flow-pack-E3-flow-umbrella.md new file mode 100644 index 00000000..4a388fd1 --- /dev/null +++ b/PRPs/PRP-flow-pack-E3-flow-umbrella.md @@ -0,0 +1,577 @@ +name: "PRP — flow-pack E3 (/flow-umbrella: rule-driven umbrella issue creation)" +description: | + E3 parallel epic of the flow: command-suite integration. Lands the tracked + docs/flow-pack/commands/flow-umbrella.md template + the local .claude runtime install. + The command generates a 7-field umbrella body from the approved V2 ship list, echoes a + dry-run, waits for approval, and creates the umbrella GitHub issue. Parallel with E2/E4; + depends only on E1 (foundation/labels/milestone) being merged. 
+ +<!-- provenance: + Methodology: docs/flow-pack-methodology.md (§ Stage 2 — Decompose, § Umbrella contract) + Rule source: .claude/rules/umbrella-issue.md + E1 pattern: PRPs/PRP-flow-pack-E1-foundation.md ← mirror this PRP's structure exactly + Command pattern: docs/flow-pack/commands/flow-prime.md ← mirror this command's style + Live umbrella example: gh issue view 368 (7-field body verified in .flow/state.md) + Working analysis: .flow/state.md (Phase 4) + .flow/brainstorm-log.md (Round 2) --> + +## Issue links +- Umbrella: **#368** — feat(repo): integrate flow-pack methodology as the flow: command suite +- This epic: **#372** — flow-pack E3 — /flow-umbrella (7-field umbrella body, approval-gated write) +- Milestone: **#1 flow-pack-suite** · labels: `epic`, `flow` +- Depends on: **E1 #369** (merged via PR #370) — labels/milestone/tracked-docs foundation + +--- + +## Goal + +Implement the **E3 /flow-umbrella command**: the tracked template plus its local runtime install +that generates and creates a GitHub umbrella issue from the approved V2 ship list. The end state: +a user runs `/flow-umbrella <initiative>` (after `/flow-brainstorm` has produced a V2 list), +inspects the dry-run echo of the full 7-field body, types "approve," and the umbrella issue is +created with `umbrella` + `flow` + type labels and the active milestone attached. + +**Deliverable:** 2 files (tracked template + local runtime install). No epic creation. No +milestone/label creation (must exist from E1). No commit/push. Parallel with E2 (#371) and E4 +(#373). + +## Why + +- The flow: suite's Stage 2 ("Decompose") has no umbrella-creation command yet. + `/flow-brainstorm` produces a V2 ship list but has nowhere to hand it off. +- E3 fills that gap: it wires the approved V2 list into the 7-field umbrella contract + (`umbrella-issue.md`) and materializes it as a GitHub issue that `/flow-epics` can then + decompose into child epics. 
+- The durable-source split (`.claude/` gitignored) requires a tracked template. Without E3's + `docs/flow-pack/commands/flow-umbrella.md`, a fresh clone loses the command. + +## What + +A docs-first delivery: tracked command template → byte-copy local install → working `/flow-umbrella`. + +### Success Criteria +- [ ] `docs/flow-pack/commands/flow-umbrella.md` exists (tracked canonical spec/template). +- [ ] `.claude/commands/flow/flow-umbrella.md` present, byte-identical to the tracked template + (confirmed by `diff -q`). +- [ ] `git check-ignore .claude/commands/flow/flow-umbrella.md` prints the path (confirms it's + gitignored — not the durable artifact). +- [ ] Fresh-clone recovery works: `cp docs/flow-pack/commands/*.md .claude/commands/flow/` + reproduces the local command. +- [ ] `/flow-umbrella` validates prerequisites (labels/milestone) before drafting. +- [ ] `/flow-umbrella` performs idempotency check before dry-run. +- [ ] `/flow-umbrella` dry-run echoes the full `gh issue create` command + body before any write. +- [ ] `/flow-umbrella` approval gate prevents write on any response other than "approve." +- [ ] The created umbrella issue carries all 7 required sections, labels `umbrella`+`flow`+type, + and the active milestone. +- [ ] Every created artifact carries a provenance header linking to its source. +- [ ] E2/E4/E5 NOT implemented here. No epic creation, no sub-issue linking. + +--- + +## All Needed Context + +### Documentation & References +```yaml +# THE PATTERN TO MIRROR — read these before writing anything +- file: PRPs/PRP-flow-pack-E1-foundation.md + why: The E3 PRP's sibling; mirror its exact PRP structure. This implementation follows the + same two-file pattern (tracked template + local install). Read every section heading. + +- file: docs/flow-pack/commands/flow-prime.md + why: The only existing flow: command. 
Mirror its file structure exactly: + frontmatter YAML (description:) → provenance HTML comment → # Title → ## Objective → + ## Process (numbered steps with bash blocks) → ## Output Format (fenced block) → + ## Arguments ($ARGUMENTS line). flow-umbrella.md must follow this layout. + +- file: docs/flow-pack-methodology.md + sections: + - "## Stage 2 — Decompose — /flow-umbrella" (steps 1–4 + next-pointer spec) + - "## Umbrella contract (7-field body)" (exact field names + content rules) + - "## Durable-source split" (table: tracked vs local vs purpose) + - "## Fresh-clone recovery" (the cp command + diff check) + why: The authoritative spec for what /flow-umbrella does and the 7-field contract. + +- file: .claude/rules/umbrella-issue.md + sections: + - "## Umbrella body — 7-field contract" (field table with rules) + - "## Write discipline" (dry-run echo / idempotent check / approval gate / rate-delay) + - "## Labels and milestone" (required labels + milestone policy) + - "## Source-of-truth split (CRITICAL)" (the durable vs local split + recovery) + why: The agent contract for umbrella creation; flow-umbrella.md must implement every clause. + +# LIVE UMBRELLA EXAMPLE (for 7-field body reference) +- bash: "gh issue view 368 --json body --jq '.body'" + why: A real 7-field umbrella body created for this exact project. Use as a reference for + tone, section depth, and the "not yet created" placeholder pattern in Decomposition. + +# WORKING STATE +- file: .flow/brainstorm-log.md + why: Contains the V2 ship list, defer list with reasons, and the 5-dim scores that + /flow-umbrella reads to synthesize the 7-field body. Shows V2 items that became + the E1–E5 epics of #368. + +- file: .flow/state.md + sections: "FLOW-PRIME:YOU-ARE-HERE" marker block + why: Contains the active milestone name, label status, and current branch/version that + /flow-umbrella uses to validate prerequisites. 
+ +# HAND-OFF SPEC +- file: docs/flow-pack-methodology.md + section: "## Stage 3 — Execute (delegated)" and the FLAI mapping table rows for + "Umbrella issue creation (01)" and "Epic creation + linking (01)" + why: Confirms /flow-umbrella ends with "→ Next: /flow-epics #N" and that epic creation + belongs entirely to /flow-epics (E4 #373). Do not blur the boundary. +``` + +### Current Codebase tree (relevant slice) +```bash +docs/ + flow-pack-methodology.md # ✅ tracked; § Stage 2 = /flow-umbrella spec + flow-pack/ + commands/ + flow-prime.md # ✅ tracked; MIRROR this structure + # flow-umbrella.md does NOT exist yet — to create +.claude/ + commands/flow/ + flow-prime.md # ✅ local install (gitignored); byte-copy of tracked + # flow-umbrella.md does NOT exist yet — to create + rules/ + umbrella-issue.md # ✅ local rule; contains 7-field contract + write discipline +.flow/ + state.md # working state; has You-Are-Here with milestone/labels + brainstorm-log.md # V2 ship list + defer list +PRPs/ + PRP-flow-pack-E1-foundation.md # ✅ the sibling PRP — mirror its layout + PRP-flow-pack-E3-flow-umbrella.md # this PRP +``` + +### Desired Codebase tree (files to add + responsibility) +```bash +docs/ + flow-pack/ + commands/ + flow-umbrella.md # TRACKED durable template/spec for /flow-umbrella + # Contains: frontmatter, provenance, full 9-step process, + # output format, $ARGUMENTS spec. + # Source of truth for the command. +.claude/ + commands/flow/ + flow-umbrella.md # LOCAL install — byte-copy of the tracked template (gitignored). + # Claude Code reads this when the user types /flow-umbrella. + # Recovery: cp docs/flow-pack/commands/flow-umbrella.md .claude/commands/flow/ +``` + +### Known Gotchas & Quirks +```text +# CRITICAL: .claude/ is gitignored (confirmed: /.claude and .claude in .gitignore). +# The local command at .claude/commands/flow/flow-umbrella.md is NEVER the durable +# artifact. Durable truth = docs/flow-pack/commands/flow-umbrella.md. 
The same +# gitignore split from E1 applies here — never treat the local copy as the source of truth. + +# CRITICAL: Local install must be byte-identical to tracked template. Verify: +# diff -q docs/flow-pack/commands/flow-umbrella.md \ +# .claude/commands/flow/flow-umbrella.md && echo "OK no drift" +# If they drift, the tracked template wins. Recovery = cp. + +# CRITICAL: The 7-field body must contain ALL 7 sections with their exact headings: +# ## Summary / ## Approach / ## Decomposition / ## Out of scope (explicit) / +# ## Success criteria / ## Risks / ## Tracking +# A body missing any section = not done (umbrella-issue.md invariant). + +# CRITICAL: Write discipline order — the command MUST do these in sequence: +# 1. prerequisites check (labels + milestone exist) ← fail fast before any draft +# 2. idempotency check (existing issue title search) ← skip create if already exists +# 3. draft 7-field body +# 4. dry-run echo (full body + gh command) +# 5. approval gate (block on user input) +# 6. execute gh issue create (only on "approve") +# 7. confirm (gh issue view) +# 8. print gate + next pointer +# Never swap 1 and 2. Never skip 4 or 5. + +# CRITICAL: gh issue create --body with multi-line content. +# Use --body-file <tempfile> (NOT --body "...") to avoid shell quoting issues +# with multi-line markdown bodies. Pattern: +# cat > /tmp/umbrella-body.md << 'BODY_EOF' +# [body content] +# BODY_EOF +# gh issue create --title "..." --body-file /tmp/umbrella-body.md --label ... --milestone ... +# gh CLI --body-file has been available since gh v1.x; verified against E1 groundwork. + +# CRITICAL: Type label derivation. +# The title follows conventional-commit format: "feat(repo): <initiative>". +# The type label is the first token before "(" — default "feat" if ambiguous. +# The type label must exist in the repo (e.g., gh label list | grep feat). 
+# Umbrella-issue.md says: "Labels ⊇ umbrella label set (plus the `epic` label)" for epics, +# but for the UMBRELLA issue itself: umbrella + flow + type label. + +# GOTCHA: Decomposition section — epic #N refs. +# When /flow-umbrella runs, epic issues don't exist yet. Use PROPOSED descriptions +# with "(not yet created)" suffixes. Pattern from live #368: +# - [ ] **E1 — Foundation** (blocks all): <description> — not yet created +# - [ ] **E2 — Parallel**: <description> — not yet created +# Do NOT put fake "#N" refs for unborn issues. /flow-epics will assign real numbers. + +# GOTCHA: Idempotency check searches open issues only (--state open). +# A closed umbrella with the same title won't block creation. This is intentional: +# a closed umbrella = finished initiative; a new one may legitimately start. + +# GOTCHA: Milestone name matching in gh CLI. +# gh issue create --milestone "<name>" requires the EXACT milestone title string +# (case-sensitive). Always read the milestone name from .flow/state.md You-Are-Here +# or from `gh api repos/{owner}/{repo}/milestones --jq '.[0].title'` rather than +# hard-coding it. + +# GOTCHA: commit-format.md requires every commit to reference an open issue. +# If a commit is needed (authorized by user), reference #372 — that's E3's issue. +# Branch = feat/flow-pack-e3-flow-umbrella. But NO commit/push happens in this PRP. + +# SCOPE: Do NOT create flow-brainstorm, flow-epics, or any other command here. +# E3 ships /flow-umbrella only. E2 (#371) and E4 (#373) are separate parallel epics. +# E5 is the release gate and remains deferred. 
+``` + +--- + +## Implementation Blueprint + +### list of tasks (dependency order) +```yaml +Task 1 — CREATE docs/flow-pack/commands/flow-umbrella.md (tracked canonical template): + - MIRROR structure of: docs/flow-pack/commands/flow-prime.md + (frontmatter YAML → provenance HTML comment → # Title → ## Objective → ## Process + (9 numbered steps, bash blocks) → ## Output Format (fenced block) → ## Arguments) + - INCLUDE frontmatter: "description: Generate and create umbrella GitHub issue from V2 ship list" + - INCLUDE provenance header naming: docs/flow-pack-methodology.md (§ Stage 2), umbrella-issue.md + - SPEC the 9-step process (see § Per-task notes below for full content) + - INCLUDE the output format block (gate + next-command pointer) + - INCLUDE "$ARGUMENTS" line (initiative description; derived from brainstorm log if omitted) + - HEADER: provenance comment → docs/flow-pack-methodology.md, .claude/rules/umbrella-issue.md + +Task 2 — INSTALL .claude/commands/flow/flow-umbrella.md (local runtime copy): + - GENERATE as a byte-copy: + cp docs/flow-pack/commands/flow-umbrella.md .claude/commands/flow/flow-umbrella.md + - VERIFY no drift: + diff -q docs/flow-pack/commands/flow-umbrella.md .claude/commands/flow/flow-umbrella.md + && echo "OK no drift" + - DO NOT hand-edit the local copy — copy only. + +Task 3 — VERIFY the durable-source split: + - git check-ignore .claude/commands/flow/flow-umbrella.md (must print the path) + - diff -q docs/flow-pack/commands/flow-umbrella.md .claude/commands/flow/flow-umbrella.md + - git status --short (only docs/flow-pack/commands/flow-umbrella.md should be a new tracked file) + - Confirm .claude/commands/flow/flow-umbrella.md is NOT staged. 
+ +Task 4 — VERIFY fresh-clone recovery (optional, proves robustness): + - Simulate: rm -f .claude/commands/flow/flow-umbrella.md + - Regenerate: cp docs/flow-pack/commands/*.md .claude/commands/flow/ + - Confirm: diff -q docs/flow-pack/commands/flow-umbrella.md .claude/commands/flow/flow-umbrella.md +``` + +### Per-task notes — full /flow-umbrella command content spec + +**Task 1 is the high-value task.** The command file must contain the following process, verbatim +or faithful to this spec. Each step maps to a clause in `umbrella-issue.md` § Write discipline. + +```text +## Process + +### 1. Read context + - Load .flow/brainstorm-log.md: extract V2 ship list, defer list (with reasons), initiative title. + - Load .flow/state.md FLOW-PRIME:YOU-ARE-HERE block: extract milestone name, type label, branch. + - $ARGUMENTS overrides the initiative description if provided. + - If .flow/brainstorm-log.md is missing or has no V2 section, print: + "ERROR: No V2 ship list found. Run /flow-brainstorm first, then re-run /flow-umbrella." + and stop. + +### 2. Validate prerequisites + Commands to run: + gh label list --json name --jq '[.[].name]' # check for umbrella, flow, <type> + gh api repos/{owner}/{repo}/milestones \ + --jq '.[] | select(.state=="open") | .title' # check milestone exists + If any required label or milestone is missing: + print the exact remediation commands: gh label create for labels; gh api repos/{owner}/{repo}/milestones -f title=<name> for the milestone (gh has no built-in milestone command). + STOP — do not proceed to draft. + +### 3. Idempotency check + Command: + gh issue list --state open \ + --search "<proposed issue title>" \ + --json number,title \ + --jq '.[0] // empty' + If an open issue with the same title exists: + print "Umbrella #N already exists: <url> — skipping create." + jump to step 9 (print gate + next-pointer with the existing number). + +### 4. Draft the 7-field body + Synthesize from context (step 1). ALL 7 sections required: + + ## Summary + <One paragraph: what is wrong/missing in the current state.
Cite baseline artifacts + (e.g., "ForecastLabAI has X but lacks Y …"). Do not describe the solution here.> + + ## Approach + <One paragraph: architectural delta only. No new runtime deps, no new services, no new + routers unless justified. Describe the shape of the change ("thin commands that delegate + to existing primitives …").> + + ## Decomposition + Phase taxonomy (invariant from docs/flow-pack-methodology.md § Invariants): + - Exactly ONE Foundation epic (blocks all others) + - N Parallel epics (run concurrently after Foundation) + - Exactly ONE Release-gate epic (closes ONLY after Foundation + all Parallel) + Format per entry: + - [ ] **EN — <Phase>** (<phase-note>): <one-line description> — not yet created + Use "not yet created" because /flow-epics hasn't run. Do NOT invent fake #N refs. + + ## Out of scope (explicit) + <Every item from the defer list + any scope boundary.> + Each line must end with " — reason: <one sentence>". + Never leave a blank reason (umbrella-issue.md invariant: every defer has a reason). + + ## Success criteria + Checkbox list. Each criterion must be independently verifiable by an outside reviewer: + - [ ] <Specific, measurable outcome — not "everything works"> + + ## Risks + Markdown table. One row per risk, one mitigation per row: + | Risk | Mitigation | + |------|------------| + | <risk> | <mitigation> | + + ## Tracking + - Source of truth: `docs/flow-pack-methodology.md` + working state `.flow/state.md` + - Milestone: <milestone name> + - **One-pass confidence: X/10** (<one-sentence rationale for the score>) + +### 5. Dry-run echo + Print the EXACT commands to be executed (do not run them yet): + cat > /tmp/umbrella-body.md << 'BODY_EOF' + [full 7-field body as it will be submitted — show every line] + BODY_EOF + gh issue create \ + --title "<title>" \ + --body-file /tmp/umbrella-body.md \ + --label "umbrella" --label "flow" --label "<type>" \ + --milestone "<milestone-name>" + +### 6. 
Approval gate + Print: + "────────────────────────────────────────── + Awaiting approval. Type 'approve' to create the umbrella issue. + Any other response = abort (no write). + ──────────────────────────────────────────" + Wait for user input. On "approve" (case-insensitive): proceed to step 7. + On anything else: print "Aborted — no issue created." and stop. + +### 7. Execute + Run: + cat > /tmp/umbrella-body.md << 'BODY_EOF' + [7-field body] + BODY_EOF + gh issue create \ + --title "<title>" \ + --body-file /tmp/umbrella-body.md \ + --label "umbrella" --label "flow" --label "<type>" \ + --milestone "<milestone-name>" + Capture the issue URL; extract the issue number N from the URL. + +### 8. Confirm + Run: + gh issue view <N> --json number,title,labels,milestone \ + --jq '"#\(.number): \(.title) [\(.labels | map(.name) | join(","))] milestone=\(.milestone.title // "none")"' + If labels or milestone are missing, print remediation: + gh issue edit <N> --add-label <missing-label> + gh issue edit <N> --milestone "<milestone-name>" + +### 9. Gate and next-command + Gate ✅ UMBRELLA CREATED when: created + all 7 sections present + labels ✅ + milestone ✅ + Gate ❌ FAILED when: gh issue create returned non-zero or confirm shows missing labels/milestone. + Always print the next-command pointer, even on failure (so the user knows where to go): + → Next: /flow-epics #<N> +``` + +**Output format block to include at the end of the command file:** +```text +## Output Format + +━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ + 🏗️ flow-umbrella: Umbrella Issue +━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ + +📋 Context + V2 ship items: N | Defer items: M + Initiative: <title> + Milestone: <name> | Labels: umbrella ✅ flow ✅ <type> ✅ + +📋 Prerequisite check + umbrella label: [✅/❌] | flow label: [✅/❌] | <type> label: [✅/❌] + Milestone <name>: [✅/❌] | Existing umbrella: [#N / none] + +📋 Dry-run + cat > /tmp/umbrella-body.md ... + gh issue create --title "..." 
--body-file /tmp/umbrella-body.md --label ... --milestone ... + [full body printed] + +──────────────────────────────────────────── + Awaiting approval. Type "approve" to create. +──────────────────────────────────────────── + +[After approval:] + +📋 Created + ✅ gh issue create → #N: <title> + Labels: umbrella, flow, <type> | Milestone: <name> + +──────────────────────────────────────────── + [✅/❌] UMBRELLA CREATED → #N +──────────────────────────────────────────── + +→ Next: /flow-epics #N +``` + +### Integration Points +```yaml +DOCS (tracked): + - add: docs/flow-pack/commands/flow-umbrella.md + - no modifications to docs/flow-pack-methodology.md required + (it already describes /flow-umbrella in § Stage 2; no new content needed) + +CLAUDE (local, gitignored): + - add: .claude/commands/flow/flow-umbrella.md (byte-copy via cp) + +HAND-OFF: + - /flow-umbrella ends with "→ Next: /flow-epics #N" + - /flow-epics is E4 (#373); /flow-umbrella does NOT call it — it only prints the pointer + - base_prp:prp-create is invoked per epic AFTER /flow-epics, not by /flow-umbrella +``` + +--- + +## Validation Loop + +### Level 1: File presence + durable-source split +```bash +# tracked source of truth exists +test -f docs/flow-pack/commands/flow-umbrella.md && echo "OK tracked" + +# local install exists and is gitignored (NOT durable) +test -f .claude/commands/flow/flow-umbrella.md && echo "OK local" +git check-ignore .claude/commands/flow/flow-umbrella.md # must print the path + +# local install == tracked template (no drift) +diff -q docs/flow-pack/commands/flow-umbrella.md \ + .claude/commands/flow/flow-umbrella.md && echo "OK no drift" + +# only docs/flow-pack/commands/flow-umbrella.md is a new tracked file; .claude/** not staged +git status --short +``` + +Expected output: +- `OK tracked`, `OK local`, path printed by gitignore check, `OK no drift` +- `git status` shows one new `A` entry: `docs/flow-pack/commands/flow-umbrella.md` +- `.claude/commands/flow/flow-umbrella.md` 
does NOT appear in `git status` + +### Level 2: Fresh-clone recovery reproduction +```bash +# simulate recovery: blow away the local install, regenerate, confirm identical +rm -f .claude/commands/flow/flow-umbrella.md +cp docs/flow-pack/commands/*.md .claude/commands/flow/ +diff -q docs/flow-pack/commands/flow-umbrella.md \ + .claude/commands/flow/flow-umbrella.md && echo "OK recovery reproduces local" +``` + +### Level 3: Command structure smoke (manual inspection) +```bash +# Confirm all 9 process steps are present in the tracked template +grep -c "^### [0-9]\." docs/flow-pack/commands/flow-umbrella.md +# Expected: 9 + +# Confirm all 7 body section headings appear in the spec +grep "## Summary\|## Approach\|## Decomposition\|## Out of scope\|## Success criteria\|## Risks\|## Tracking" \ + docs/flow-pack/commands/flow-umbrella.md | wc -l +# Expected: 7 + +# Confirm dry-run and approval-gate keywords present +grep -c "dry.run\|DRY RUN\|approve\|approval" docs/flow-pack/commands/flow-umbrella.md +# Expected: >= 4 (dry-run in step 5, approval in step 6) + +# Confirm the next-command pointer spec is present +grep "flow-epics" docs/flow-pack/commands/flow-umbrella.md +# Expected: at least one match showing "→ Next: /flow-epics #N" +``` + +### Level 4: Interactive smoke (post-install, manual) +```text +# In a Claude Code session, type: /flow-umbrella test-initiative +# Verify the command: +# - Reads .flow/brainstorm-log.md (or notes its absence with a helpful error) +# - Runs prerequisite checks (labels + milestone) and reports results +# - Performs idempotency check +# - Prints a dry-run echo with full 7-field body +# - Blocks on approval gate (does NOT create without "approve") +# - On "approve": creates the issue and prints #N + gate result +# - Ends with "→ Next: /flow-epics #N" +# (No automated assertion — interactive command; verify output sections manually.) 
+``` + +--- + +## Tests / checks required +- [ ] Level 1 file-presence + gitignore + no-drift checks all pass (4 assertions green). +- [ ] Level 2 recovery reproduces the local install byte-for-byte. +- [ ] Level 3 structure smoke: 9 steps present, 7 section headings, ≥4 dry-run/approval hits, + flow-epics pointer present. +- [ ] Provenance header present in both created files + (`grep -l "provenance" docs/flow-pack/commands/flow-umbrella.md .claude/commands/flow/flow-umbrella.md`). +- [ ] `git status --short` shows `docs/flow-pack/commands/flow-umbrella.md` as the ONLY new tracked + file; `.claude/commands/flow/flow-umbrella.md` does NOT appear. +- [ ] No standard repo gate is broken (markdown-only change → ruff/mypy/pyright/pytest unaffected; + run `uv run ruff check . && uv run pytest -v -m "not integration"` to confirm green). +- [ ] `docs/flow-pack/commands/flow-prime.md` structure matches `flow-umbrella.md` structure + (frontmatter → provenance → title → Objective → Process → Output Format → Arguments). +- [ ] E2/E4/E5 NOT implemented; no GitHub issues created. + +--- + +## Final Validation Checklist +- [ ] Both files created + byte-identical (diff clean). +- [ ] Durable-source split holds: `docs/flow-pack/commands/flow-umbrella.md` tracked; `.claude/commands/flow/flow-umbrella.md` gitignored + regenerable. +- [ ] Command spec contains all 9 process steps in the correct order (prereq → idempotent → draft → dry-run → approval → execute → confirm → gate → next-pointer). +- [ ] 7-field body headings all present in the spec with correct names and field rules. +- [ ] Write discipline clauses all present: dry-run echo, approval gate, idempotency check, no write without "approve." +- [ ] --body-file approach used (not --body "...") for multi-line body safety. +- [ ] Type label derivation documented in the command spec. +- [ ] "→ Next: /flow-epics #N" pointer present in output format. +- [ ] E2 (#371) and E4 (#373) NOT touched; no epic creation in this scope. 
+- [ ] No commit/push performed; `uv.lock` + `docker-compose.lan.yml` left untouched. +- [ ] Branch for implementation: `feat/flow-pack-e3-flow-umbrella` off `dev`; commit (when user authorizes) references `(#372)`. + +--- + +## Anti-Patterns to Avoid +- ❌ Treating `.claude/commands/flow/` as the source of truth (it's gitignored — durable truth is `docs/flow-pack/commands/flow-umbrella.md`). +- ❌ Hand-editing the local install instead of copying from the tracked template. +- ❌ Swapping the write-discipline order (prerequisite check must come before idempotency check, which must come before dry-run, which must come before execute). +- ❌ Using `--body "..."` instead of `--body-file` for multi-line gh issue create bodies. +- ❌ Putting fake `#N` refs for unborn epic issues in the Decomposition section. +- ❌ Creating epic issues inside /flow-umbrella — that is /flow-epics (E4 #373). +- ❌ Creating milestone or labels inside /flow-umbrella — those must exist from E1; the command validates and fails fast if missing. +- ❌ Diverging from the `docs/flow-pack/commands/flow-prime.md` file structure (frontmatter, section headings, output-format block, Arguments line). +- ❌ Staging `uv.lock` / `docker-compose.lan.yml` (pre-existing dirty worktree — leave alone). +- ❌ Implementing any part of E2 (/flow-brainstorm), E4 (/flow-epics), or E5 here. + +--- + +## Confidence Score: 9/10 + +One-pass likelihood is high: +- Pattern is fully established by the E1 PRP + flow-prime.md (mirror exactly). +- All 9 process steps are specified to the line level, including exact `gh` commands. +- Methodology is fully reverse-engineered and dogfooded (`.flow/` state docs). +- Work is markdown-only: zero Python/TS runtime risk, no type system, no DB. +- Durable-source split is already understood by the E1 precedent. 
+ +−1 for two authoring judgment calls: (a) the quality of the synthesized 7-field body (depends on +what the V2 ship list says — must be coherent and match the live umbrella #368 style), and (b) +correctly handling the case where `.flow/brainstorm-log.md` is absent or partially formed (the +command must degrade gracefully with a clear error + `→ /flow-brainstorm` pointer). diff --git a/PRPs/PRP-flow-pack-E4-flow-epics.md b/PRPs/PRP-flow-pack-E4-flow-epics.md new file mode 100644 index 00000000..77b9b720 --- /dev/null +++ b/PRPs/PRP-flow-pack-E4-flow-epics.md @@ -0,0 +1,594 @@ +name: "PRP — flow-pack E4 /flow-epics (tracked template + local install + epic decomposition + sub-issue linking)" +description: | + E4 of the flow: command-suite integration. Ships the /flow-epics command: reads an umbrella + issue decomposition, creates phase-ordered epic issues with idempotent guards, links them as + sub-issues via the GitHub REST API, and hands off to base_prp:prp-create per epic. + Parallel epic — runs after E1 (#369 merged). Docs-only: no app/backend/frontend changes. + +<!-- provenance: docs/flow-pack-methodology.md § "/flow-epics — epic issues" + § "Epic contract" + + § "Hierarchy-as-data (REST API)" + § "Durable-source split". + Epic scope: GitHub issue #373 body. + Structural pattern: PRPs/PRP-flow-pack-E1-foundation.md --> + +## Issue links +- Umbrella: **#368** — feat(repo): integrate flow-pack methodology as the flow: command suite +- This epic: **#373** — flow-pack E4 — /flow-epics (epic decomposition, sub-issue linking, prp-create handoff) +- Milestone: **#1 flow-pack-suite** · labels: `epic`, `flow` + +--- + +## Goal + +Implement the **E4 deliverable** of the `flow:` command suite: the `/flow-epics` command. + +End state: a user can run `/flow-epics 368` (or `/flow-epics #368`) and the command will: +1. Read umbrella #368's decomposition section to build an epic inventory +2. 
Check which epics already exist (title search) and which are already linked (GraphQL) +3. For each epic not yet a sub-issue: dry-run echo → user "approve" → `gh issue create` → rate-delay → `gh api` sub-issue link +4. Skip E5 Release gate (deferred per #373 scope) +5. Verify the final sub-issue list via GraphQL +6. Print gate result + per-epic handoff to `base_prp:prp-create` + +**Deliverable:** 2 files + 1 documented recovery path (see Desired tree). No E2/E3/E5 behavior. +No app/backend/frontend code changes. No commit/push without explicit user authorization. + +## Why + +- The `flow:` suite's value ends at `/flow-umbrella` unless epics exist as linked sub-issues. + `/flow-epics` closes the loop: umbrella → epics → base_prp:prp-create per epic. +- Hierarchy-as-data (REST sub-issue API) is required so project-board grouping, closure rollup, + and dependency ordering work natively. Body `#N` mentions alone are documentation, not data. +- E4 is parallel with E2/E3; it can ship independently once E1 is merged. +- The write-discipline invariants (dry-run/idempotent/approval/rate-delay/confirm) are already + fully specified in `docs/flow-pack-methodology.md` and `.claude/rules/umbrella-issue.md` — + this PRP operationalises them into a reusable slash-command. + +## What + +A docs-first deliverable: tracked canonical template → local runtime install → working `/flow-epics`. + +### Success Criteria +- [ ] Tracked `docs/flow-pack/commands/flow-epics.md` exists with a complete, self-contained + command spec (all 8 required sections — see Task 1). +- [ ] Local `.claude/commands/flow/flow-epics.md` present, byte-regenerable from the tracked template. +- [ ] Fresh-clone recovery documented and verified: + `cp docs/flow-pack/commands/*.md .claude/commands/flow/` reproduces both command files. +- [ ] Running `/flow-epics 368` shows a correct inventory (E1 exists/linked, E2–E4 exist, E5 deferred) + without writing anything until the user approves. 
+- [ ] Every dry-run echo shows the exact `gh issue create` + `gh api sub_issues POST` commands before + any execution. +- [ ] All created/linked epics carry labels `epic` + `flow` + `feat` + milestone `flow-pack-suite`. +- [ ] Sub-issue links verified via GraphQL after each write batch. +- [ ] No GitHub write without explicit user "approve" for each write operation. +- [ ] E5 Release gate shown as ⏭️ SKIP (deferred) — never auto-created. +- [ ] No app/backend/frontend code touched; `uv.lock` / `docker-compose.lan.yml` / uncommitted + `flow-prime.md` left untouched. + +## All Needed Context + +### Documentation & References +```yaml +# SOURCE OF TRUTH — read these first +- file: docs/flow-pack-methodology.md + section: > + "/flow-epics — epic issues" (Step 1–4 of Stage 2 — Decompose), + "Epic contract" (phase blockquote + Purpose + Sub-tasks), + "Hierarchy-as-data (REST API)" (exact gh api calls), + "Durable-source split" (tracked vs gitignored) + why: > + Authoritative spec; exact gh api patterns; epic body contract; + portability invariants the command must uphold + +- file: .claude/rules/umbrella-issue.md + section: > + "Epic body — phase contract" (blockquote templates), + "Hierarchy-as-data" (POST endpoint, no native gh cmd), + "Write discipline" (5-step dry-run/idempotent/approval/rate-delay/confirm), + "Labels and milestone" (label superset rule) + why: Rule-level contract the command encodes; exact write discipline steps + +# PATTERN TO MIRROR — match structure exactly +- file: docs/flow-pack/commands/flow-prime.md + why: > + The only other tracked command template; mirror its YAML frontmatter, + HTML provenance comment, section headings, $ARGUMENTS convention, + inline !`bash` commands, and output-format block word-for-word in style + +# LIVE UMBRELLA BEING SERVED +- issue: "#368" + command: "gh issue view 368 --json number,title,body,labels,milestone" + why: > + Live umbrella whose Decomposition section defines which epics to create; + its current 
sub-issues list is the ground truth for idempotency + +# EXISTING EPICS (idempotency ground truth) +- issue: "#369 E1 Foundation — merged/linked" +- issue: "#371 E2 Parallel — /flow-brainstorm" +- issue: "#372 E3 Parallel — /flow-umbrella" +- issue: "#373 E4 Parallel — /flow-epics (this epic)" + why: All four exist; command must detect them and skip create, only link if not yet a sub-issue + +# E1 PRP STRUCTURAL REFERENCE +- file: PRPs/PRP-flow-pack-E1-foundation.md + why: > + Exact PRP structure pattern — frontmatter, issue links, goal/why/what, + context YAML, current+desired trees, Known Gotchas block, task list format, + integration points YAML, validation levels, final checklist, confidence score + +# CONSTRAINTS +- file: CLAUDE.md + section: "Learnings — .claude/ is gitignored" + critical: > + Local install is NOT the durable artifact. docs/flow-pack/** is tracked. + .claude/commands/flow/flow-epics.md must be gitignored (verify with git check-ignore). + +- file: .claude/rules/output-formatting.md + why: > + Emoji status indicators (✅ ❌ ⏭️ ⚠️ 🔄) + box-drawing separators (━━━/────) + the command's printed output must match exactly + +- file: .claude/rules/commit-format.md + why: Branch name (feat/flow-pack-e4-flow-epics) + commit format (feat(docs,repo): ... 
(#373)) +``` + +### Current Codebase tree (relevant slice) +```bash +docs/ + flow-pack-methodology.md # TRACKED — source-of-truth spec + flow-pack/ + commands/ + flow-prime.md # TRACKED — structural pattern to mirror (MODIFIED unstaged) +.claude/ + commands/flow/ + flow-prime.md # LOCAL — byte-copy of tracked template + rules/ + umbrella-issue.md # LOCAL — write-discipline + sub-issue API contract +PRPs/ + PRP-flow-pack-E1-foundation.md # REFERENCE — PRP structural pattern + PRP-flow-pack-E4-flow-epics.md # THIS PRP (being executed) +``` + +### Desired Codebase tree (files to add + responsibility) +```bash +docs/ + flow-pack/ + commands/ + flow-epics.md # TRACKED — canonical template/spec for /flow-epics + # source of truth; the committed, portable contract +.claude/ + commands/flow/ + flow-epics.md # LOCAL install — regenerable byte-copy of the tracked + # template (gitignored; NOT durable) +``` + +### Known Gotchas & Quirks +```text +# CRITICAL: No native `gh` sub-issue command exists (cli/cli#10298). ALWAYS use: +# gh api repos/w7-mgfcode/ForecastLabAI/issues/{umbrella_N}/sub_issues \ +# -X POST -F sub_issue_id={epic_N} \ +# --header "GitHub-Next-Preview: true" +# The --header "GitHub-Next-Preview: true" is REQUIRED for the REST POST write. +# It is NOT required for GraphQL read queries. + +# CRITICAL: Idempotent check BEFORE create — search by exact issue title: +# gh issue list --search "<exact title>" --json number,title \ +# --jq '.[0].number // "none"' +# If result != "none" → issue already exists → skip create, proceed to link check only. +# GitHub search is fuzzy — verify the returned title matches character-for-character. + +# CRITICAL: Idempotent link check — fetch current sub-issues BEFORE any POST: +# gh api graphql -f query=' +# { repository(owner:"w7-mgfcode", name:"ForecastLabAI") { +# issue(number: N) { subIssues(first: 20) { nodes { number title } } } +# } }' +# If epic_N already appears in subIssues.nodes → skip POST (already linked). 
+ +# GOTCHA: sleep 1 (rate-delay) between consecutive gh api WRITE calls — mandatory, not advisory. +# This applies between (create + link) pairs AND between two consecutive creates. +# Do NOT sleep between a read and a write, only between consecutive writes. + +# GOTCHA: Epic title naming convention — must match the umbrella decomposition text exactly so +# idempotent search finds existing issues: +# Pattern from live E1–E4: "feat(repo): flow-pack <phase-label> — <scope description>" +# Example: "feat(repo): flow-pack E5 — release gate (end-to-end dogfood + portability manifest)" + +# GOTCHA: Label superset rule — epic labels must include ALL umbrella labels minus "umbrella", +# plus "epic". Umbrella #368 has: umbrella, flow → epic labels: epic, flow, feat. +# Read umbrella labels from: gh issue view <N> --json labels --jq '[.labels[].name]' +# Then remove "umbrella" from the list and add "epic". + +# GOTCHA: docs/flow-pack/commands/flow-prime.md is MODIFIED (not staged) in the current worktree. +# Run `git diff docs/flow-pack/commands/flow-prime.md` before branching to see the delta. +# Do NOT stage or commit that file as part of this PRP — it belongs to a separate fix. +# Only stage docs/flow-pack/commands/flow-epics.md (the new tracked file). + +# SCOPE BOUNDARY — E5 is DEFERRED: +# Issue #373 Out-of-scope: "Release-gate epic (E5) — deferred until E2–E4 are implemented." +# When the command encounters the Release-gate line in the umbrella decomposition, it MUST +# show it as ⏭️ SKIP (deferred) and never auto-create it. This is a hard scope boundary. 
+ +# SCOPE BOUNDARY — command creates epics, nothing else: +# - Does NOT author PRP content (→ base_prp:prp-create) +# - Does NOT create sub-tasks within epics (→ issue-to-subtasks) +# - Does NOT implement any epic's feature code +# - Does NOT run any validation gates (ruff/mypy/pytest) — markdown-only change +``` + +## Implementation Blueprint + +### Tasks (dependency order) + +```yaml +Task 1 — CREATE docs/flow-pack/commands/flow-epics.md (tracked canonical template): + + MIRROR structure of: docs/flow-pack/commands/flow-prime.md + The file must contain ALL 8 sections in this exact order: + + ── Section A: YAML frontmatter ── + --- + description: Create phase-ordered epic issues from an umbrella decomposition, link via REST + sub-issues API, and hand off to base_prp:prp-create per epic + --- + + ── Section B: HTML provenance comment ── + <!-- provenance: docs/flow-pack-methodology.md §"/flow-epics — epic issues" + §"Epic contract" + + §"Hierarchy-as-data (REST API)". Source of truth for .claude/commands/flow/flow-epics.md. + Recovery: cp docs/flow-pack/commands/flow-epics.md .claude/commands/flow/flow-epics.md + Full methodology: docs/flow-pack-methodology.md --> + + ── Section C: Title + Objective ── + # flow-epics: Epic Decomposition + ## Objective + One-paragraph prose: "Read the umbrella issue's Decomposition section, create N phase-ordered + epic issues (Foundation → Parallel → Release gate) with idempotent guards, link each as a + sub-issue of the umbrella via the GitHub REST API, and hand off to base_prp:prp-create per + open epic. Skips epics that already exist and/or are already linked." + + ── Section D: Arguments ── + ## Arguments + "$ARGUMENTS — umbrella issue number (e.g. 368 or #368). Required. + If omitted, reads the active umbrella number from the 'In-progress issues' block in + .flow/state.md (looks for the [umbrella,flow]-labeled entry)." + + ── Section E: Process (7 numbered steps with inline !`bash` commands) ── + ## Process + + ### 1. 
Parse argument + Strip '#' prefix from $ARGUMENTS if present. + If empty: read .flow/state.md and find the [umbrella,flow] open issue number. + !`cat .flow/state.md | grep "umbrella,flow" | head -1` + + ### 2. Fetch umbrella + !`gh issue view <N> --json number,title,body,labels,milestone` + Abort with ❌ if not found or if the `umbrella` label is absent. + + ### 3. Extract decomposition + Parse the body: find lines between "## Decomposition" and the next "##" heading. + For each bullet line: + - Detect phase marker: "Foundation" / "Parallel" / "Release gate" (from the bold label) + - Extract scope description (used to construct the epic title and Purpose) + - Detect any embedded "#N" ref already in the line (pre-existing issue pointer) + - Flag as SKIP if the line contains "(deferred)" OR the phase is Release gate + AND the scope mentions "not yet created" or "deferred" + + ### 4. Pre-flight checks + !`gh label list --json name --jq '[.[].name] | contains(["epic","flow","feat"])'` + Abort with ❌ if result is false (missing labels). + !`gh api repos/w7-mgfcode/ForecastLabAI/milestones --jq '.[].title'` + Abort with ❌ if "flow-pack-suite" not present. + + ### 5. Idempotent inventory + Fetch current sub-issues of the umbrella: + !`gh api graphql -f query=' + { repository(owner:"w7-mgfcode", name:"ForecastLabAI") { + issue(number: <N>) { + subIssues(first: 20) { nodes { number title state } } + } + } }'` + + For each epic in the decomposition (except SKIP items): + a. Search for existing issue by title: + !`gh issue list --search "<exact epic title>" --json number,title \ + --jq '.[0] | "\(.number // "none") \(.title // "")"'` + → Record as EXISTS #M / NOT_FOUND + b. Check if already in the GraphQL subIssues list above → LINKED / UNLINKED + + Print inventory table: + [phase] [title-summary] [exists] [linked] + Foundation E1 foundation ... ✅ #369 ✅ linked + Parallel E2 /flow-brainstorm ... ✅ #371 ✅/❌ ? + Parallel E3 /flow-umbrella ... ✅ #372 ✅/❌ ? 
+ Parallel E4 /flow-epics ... ✅ #373 ✅/❌ ? + Release gate E5 dogfood ... ⏭️ deferred — + + ### 6. Create + link loop (write-discipline, per epic not yet linked) + For each epic where LINKED=false AND phase != SKIP: + + 6a. If NOT_FOUND → CREATE the issue: + Compose body using the epic body template (see § Epic body template). + Echo dry-run: + ┌─ DRY-RUN ──────────────────────────────────────────────┐ + │ gh issue create \ │ + │ --title "<title>" \ │ + │ --body "<body>" \ │ + │ --label epic --label flow --label feat \ │ + │ --milestone "flow-pack-suite" │ + └────────────────────────────────────────────────────────┘ + APPROVAL GATE: "Type 'approve' to create, anything else to skip." + If approved: execute gh issue create; capture issue number M from output. + RATE-DELAY: sleep 1 + + 6b. Link to umbrella (whether newly created or already existing but UNLINKED): + Echo dry-run: + ┌─ DRY-RUN ──────────────────────────────────────────────┐ + │ gh api repos/w7-mgfcode/ForecastLabAI/issues/<N>/sub_issues \│ + │ -X POST -F sub_issue_id=<M> \ │ + │ --header "GitHub-Next-Preview: true" │ + └────────────────────────────────────────────────────────┘ + APPROVAL GATE: "Type 'approve' to link, anything else to skip." + If approved: execute gh api POST; confirm: + !`gh issue view <M> --json number,title,labels` + RATE-DELAY: sleep 1 + + ### 7. Verify + gate + Re-fetch sub-issues via GraphQL (same query as step 5) to confirm final state. + Print gate result + handoff (see § Output format). + + ── Section F: Epic body template ── + ## Epic body template + + Use one of these three templates based on phase. Fill <angle-bracket> placeholders. + + FOUNDATION epic: + ``` + > Sub-issue of #<umbrella_N> (umbrella: <umbrella_title>). 
Foundation — blocks Epics #<P1>, #<P2>, … + + ## Purpose + + <One-paragraph scope description extracted from the umbrella decomposition line.> + + ## Sub-tasks + + _To be decomposed via `issue-to-subtasks` when this epic is picked up._ + ``` + + PARALLEL epic: + ``` + > Sub-issue of #<umbrella_N> (umbrella: <umbrella_title>). Parallel after Foundation (E1 #<foundation_N>). + + ## Purpose + + <One-paragraph scope description.> + + ## Sub-tasks + + _To be decomposed via `issue-to-subtasks` when this epic is picked up._ + ``` + + RELEASE GATE epic: + ``` + > Sub-issue of #<umbrella_N> (umbrella: <umbrella_title>). Release gate — closes only after Foundation + all Parallel epics close. + + ## Purpose + + <One-paragraph scope description.> + + ## Sub-tasks + + _To be decomposed via `issue-to-subtasks` when this epic is picked up._ + ``` + + ── Section G: Output format ── + ## Output format + + ``` + ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ + 🔗 flow-epics: Epic Decomposition + ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ + + 📋 Umbrella #<N> — <title> + Phase structure: 1 Foundation · M Parallel · 1 Release gate (deferred) + + 📋 Epic inventory + [phase] [exists] [linked] + ✅ #369 E1 Foundation exists linked + ✅ #371 E2 Parallel exists ❌ not linked + ✅ #372 E3 Parallel exists ❌ not linked + ✅ #373 E4 Parallel exists ❌ not linked + ⏭️ E5 Release gate deferred — + + 📋 Dry-run: writes pending (awaiting approval) + [dry-run block per epic needing a link] + + 📋 Actions taken (after approvals) + ✅ #371 linked under #<N> + ✅ #372 linked under #<N> + ✅ #373 linked under #<N> + ⏭️ E5 deferred — skipped + + ──────────────────────────────────────────── + ✅ EPICS LINKED — 4/5 epics under #<N> (E5 deferred) + ──────────────────────────────────────────── + + → Next: base_prp:prp-create per open epic: + - /base_prp:prp-create (#371 — /flow-brainstorm) + - /base_prp:prp-create (#372 — /flow-umbrella) + (E4 #373 = this PRP, currently executing) + ``` + + ── Section H: Reuse-map (match 
umbrella-issue.md style) ── + ## Reuse-map + + | Need | Tool | + |------|------| + | Codebase + context priming | core_piv_loop:prime | + | Epic → 5 executable subtasks | issue-to-subtasks skill | + | PRP authoring per epic | base_prp:prp-create | + | Session continuity | writing-session-handoffs | + | Rules audit | audit-rules-drift | + | Umbrella creation | /flow-umbrella | + + +Task 2 — INSTALL .claude/commands/flow/flow-epics.md (local runtime copy): + + REGENERATE as a byte-copy: cp docs/flow-pack/commands/flow-epics.md .claude/commands/flow/ + + VERIFY no drift: + diff -q docs/flow-pack/commands/flow-epics.md .claude/commands/flow/flow-epics.md + → must produce no output (identical files) + + CONFIRM gitignored: + git check-ignore .claude/commands/flow/flow-epics.md + → must print the path (confirms it is ignored and will never appear in git status as tracked) +``` + +### Integration Points +```yaml +DOCS (tracked — the only committed change): + - add: docs/flow-pack/commands/flow-epics.md + +CLAUDE (local, gitignored — never staged or committed): + - add: .claude/commands/flow/flow-epics.md + +DIRTY WORKTREE (handle carefully): + - INSPECT before branching: git diff docs/flow-pack/commands/flow-prime.md + - DO NOT STAGE: uv.lock, docker-compose.lan.yml, docs/flow-pack/commands/flow-prime.md + - IF flow-prime.md modification is a bug fix relevant to this work, commit it separately first + +BRANCH: + - create off dev: feat/flow-pack-e4-flow-epics + - verify: git branch --show-current + +COMMIT (only when user explicitly authorizes — no auto-commit): + - format: feat(docs,repo): flow-pack E4 — /flow-epics command template + local install (#373) + - stage only: docs/flow-pack/commands/flow-epics.md + - .claude/commands/flow/flow-epics.md is gitignored — it will NOT appear in git status + - verify pre-commit hook passes: .claude/hooks/check-commit-format.sh + +METHODOLOGY DOC (no changes needed): + - docs/flow-pack-methodology.md already documents /flow-epics 
fully — do NOT edit it +``` + +## Validation Loop + +### Level 1: File presence + durable-source split +```bash +# tracked source of truth exists and has content +test -f docs/flow-pack/commands/flow-epics.md \ + && wc -l docs/flow-pack/commands/flow-epics.md \ + && echo "OK: tracked file present" + +# local install exists +test -f .claude/commands/flow/flow-epics.md && echo "OK: local install present" + +# local install is gitignored (CRITICAL — must print the file path, not empty) +git check-ignore .claude/commands/flow/flow-epics.md +# expected output: .claude/commands/flow/flow-epics.md + +# local install == tracked template (no drift) +diff -q docs/flow-pack/commands/flow-epics.md .claude/commands/flow/flow-epics.md \ + && echo "OK: no drift between tracked and local" + +# only docs/flow-pack/commands/flow-epics.md is a new tracked addition +# .claude/** must NOT appear as staged or tracked +git status --short +# expected: A docs/flow-pack/commands/flow-epics.md (and possibly M flow-prime.md unstaged) +``` + +### Level 2: Fresh-clone recovery reproduction +```bash +# simulate a fresh-clone by removing the local install, then regenerate +rm -f .claude/commands/flow/flow-epics.md +cp docs/flow-pack/commands/*.md .claude/commands/flow/ + +# verify recovery reproduces the file byte-for-byte +diff -q docs/flow-pack/commands/flow-epics.md .claude/commands/flow/flow-epics.md \ + && echo "OK: recovery reproduces local install" + +# both flow-prime and flow-epics should be present after cp +ls .claude/commands/flow/ +# expected: flow-epics.md flow-prime.md +``` + +### Level 3: Smoke test — dry-run against live umbrella #368 +```bash +# In a Claude Code session, invoke: +# /flow-epics 368 +# +# Verify the printed output contains ALL of the following (no gh writes yet): +# ✅ Inventory table is shown with correct issue numbers for E1–E4 +# ✅ E5 Release gate shown as ⏭️ SKIP (deferred) — NOT created, NOT in dry-run queue +# ✅ Any UNLINKED epics (E2/E3/E4) shown in the 
dry-run pending section +# ✅ Dry-run block echoes the exact gh api POST command with --header "GitHub-Next-Preview: true" +# ✅ "Type 'approve' to link" gate appears before any write executes +# ✅ Command ends with "→ Next: base_prp:prp-create" for each open epic +# +# This is interactive verification — no automated assertion. +# After confirming the dry-run, optionally approve the link operations for E2/E3/E4. +``` + +## Tests / checks required +- [ ] Level 1: file-presence + gitignore + no-drift — all assertions pass. +- [ ] Level 2: recovery reproduces local install from tracked template. +- [ ] Level 3: `/flow-epics 368` dry-run shows correct inventory; E5 shows ⏭️ SKIP. +- [ ] `docs/flow-pack/commands/flow-epics.md` contains provenance HTML comment: + `grep "provenance:" docs/flow-pack/commands/flow-epics.md` +- [ ] All 8 required sections present in the command spec: + `grep -E "^## (Objective|Arguments|Process|Epic body template|Output format|Reuse-map)" docs/flow-pack/commands/flow-epics.md | wc -l` + → must print 6 (+ frontmatter and provenance = 8 total) +- [ ] Sub-issue link command uses `gh api ... --header "GitHub-Next-Preview: true"`: + `grep "GitHub-Next-Preview" docs/flow-pack/commands/flow-epics.md` +- [ ] Idempotent check uses `--jq '.[0].number // "none"'`: + `grep '"none"' docs/flow-pack/commands/flow-epics.md` +- [ ] No mention of E2/E3/E5 implementation logic in flow-epics.md — E5 is only shown as SKIP. +- [ ] Standard repo gates unaffected (markdown-only change): + `uv run ruff check . && uv run mypy app/ && uv run pytest -v -m "not integration"` + → all must be green (unchanged by this PRP) + +## Final Validation Checklist +- [ ] 2 files created: `docs/flow-pack/commands/flow-epics.md` (tracked) and + `.claude/commands/flow/flow-epics.md` (local, gitignored). +- [ ] Durable-source split holds: docs tracked, .claude ignored + regenerable from `cp`. 
+- [ ] Command spec is self-contained: an agent reading only `docs/flow-pack/commands/flow-epics.md` + and codebase can implement `/flow-epics` correctly without additional context. +- [ ] All write-discipline invariants encoded in the Process section: + dry-run echo → idempotent check → approval gate → rate-delay → confirm. +- [ ] Sub-issue REST API used correctly: + `gh api ... -X POST -F sub_issue_id=N --header "GitHub-Next-Preview: true"` for writes, + GraphQL for reads. +- [ ] E5 Release gate hard-coded as ⏭️ SKIP (never auto-created). +- [ ] E2/E3 behavior not included in flow-epics.md. +- [ ] Branch is `feat/flow-pack-e4-flow-epics` off `dev`; commit references `(#373)`. +- [ ] No commit/push performed by this PRP execution unless explicitly requested by the user. +- [ ] `uv.lock` + `docker-compose.lan.yml` + unstaged `flow-prime.md` left untouched. +- [ ] Provenance header present in `docs/flow-pack/commands/flow-epics.md`. + +## Anti-Patterns to Avoid +- ❌ Using `gh` CLI native sub-issue support or any undocumented extension — `gh api POST` directly. +- ❌ Omitting `--header "GitHub-Next-Preview: true"` from the sub-issue POST call. +- ❌ Skipping the dry-run echo before any `gh` write. +- ❌ Auto-proceeding after dry-run without waiting for "approve". +- ❌ Auto-creating E5 (Release gate) — explicitly deferred per #373 scope. +- ❌ Writing app/backend/frontend code (this is a docs-only PRP). +- ❌ Staging `uv.lock` / `docker-compose.lan.yml` / `flow-prime.md`. +- ❌ Treating `.claude/commands/flow/flow-epics.md` as the committed source of truth. +- ❌ Letting the local install drift from the tracked template after writing. +- ❌ Implementing E2 (/flow-brainstorm) or E3 (/flow-umbrella) behavior here. + +--- + +## Confidence Score: 7/10 + +One-pass likelihood is moderate-to-high: methodology is fully documented, `gh api` patterns are +verified in the umbrella rule and methodology doc, and this is markdown-only (no runtime/type +risk). −3 for: + +1. 
**Decomposition parser complexity** — the umbrella body uses varied text patterns (bold labels, + embedded `#N` refs, "(deferred)" markers, "not yet created" prose); the implementing agent + must handle all variants without getting any text extraction subtly wrong. +2. **Two-phase idempotency** — `exists?` + `linked?` are independent checks that must compose + correctly (four states: exists+linked, exists+unlinked, not-exists, deferred). Getting one + case wrong silently skips a link or attempts a duplicate create. +3. **Authoring a ~150-line markdown spec** — the command file is long and the implementing agent + must maintain section ordering, template exactness (epic blockquote wording), and style + consistency with `flow-prime.md` throughout. diff --git a/PRPs/flow-brainstorm.md b/PRPs/flow-brainstorm.md new file mode 100644 index 00000000..2060abc2 --- /dev/null +++ b/PRPs/flow-brainstorm.md @@ -0,0 +1,704 @@ +name: "flow-pack E2 — /flow-brainstorm command" +description: | + Implement the /flow-brainstorm command: the V1-naive-plan → 3-read-only-agent-research → + 5-dimensional-score → V2-ship/defer pipeline for the flow-pack methodology suite. + Delivers two files (tracked template + local install); no backend, frontend, + migration, or runtime changes. + +**Issue:** #371 | **Umbrella:** #368 | **Branch:** `feat/flow-brainstorm-command` +**Depends on:** E1 #369 merged (flow-prime live, `docs/flow-pack/commands/` exists, labels/milestone created). +**Working-tree caveat:** `docker-compose.lan.yml` (untracked) + `uv.lock` (M) are pre-existing — do NOT stage either. + +--- + +## Goal + +Implement `/flow-brainstorm` as E2 of the flow-pack suite. 
Deliverables are two files: + +| File | Action | Role | +|------|--------|------| +| `docs/flow-pack/commands/flow-brainstorm.md` | CREATE | Tracked canonical template — committed, source of truth | +| `.claude/commands/flow/flow-brainstorm.md` | CREATE | Local runtime install — gitignored, byte-copy of tracked template | + +No `app/`, `frontend/`, `alembic/`, or any runtime code is touched. No GitHub writes. +No E3 (`/flow-umbrella`) or E4 (`/flow-epics`) behavior is implemented. + +### Success Criteria + +- [ ] `docs/flow-pack/commands/flow-brainstorm.md` exists, committed under `docs(repo): add /flow-brainstorm command — E2 of flow-pack suite (#371)` +- [ ] `.claude/commands/flow/flow-brainstorm.md` is a byte-copy (`diff` exits 0) +- [ ] Command file follows the exact structure of `docs/flow-pack/commands/flow-prime.md` (frontmatter, provenance comment, numbered steps, output format block, $ARGUMENTS section) +- [ ] All 5 scoring dimensions defined: Value, Risk, Readiness, Complexity, Evidence (1–10 each, max 50) +- [ ] Score bands enforced: ≥ 40 → SHIP, 36–39 → NEGOTIATE (human gate), < 36 → DEFER with written reason +- [ ] Exactly 3 read-only subagents: A (Known Issues), B (Best Practices), C (Dependencies) +- [ ] `.flow/brainstorm-log.md` append rules documented (append-only, round-numbered) +- [ ] Human approval gate printed before next-command pointer +- [ ] Next-command pointer: `→ Next: /flow-umbrella <initiative>` +- [ ] Only `docs/flow-pack/commands/flow-brainstorm.md` committed; `.claude/` never staged + +--- + +## Why + +The flow-pack pipeline (`docs/flow-pack-methodology.md`) is a 4-command chain: + +``` +/flow-prime → /flow-brainstorm → /flow-umbrella → /flow-epics +``` + +E1 (flow-prime, PR #370) is merged. E2 delivers the second link: it turns a baseline snapshot +into a scored, human-approved V2 ship/defer list ready for `/flow-umbrella`. Without it, the +pipeline breaks at the first handoff — a user can prime but cannot plan. 
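
The score-band rule from the success criteria above (≥ 40 SHIP, 36–39 NEGOTIATE, < 36 DEFER) can be sketched in shell as follows. This is a minimal illustration only: `score_band` is a hypothetical helper name, and the real command is prose that the agent follows, not a script. It assumes the five dimension scores (Value, Risk, Readiness, Complexity, Evidence) arrive as integers from 1 to 10.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the flow-pack suite): classify one V1 item.
# Arguments: five dimension scores in the order
#   Value Risk Readiness Complexity Evidence   (each 1-10, max total 50)
score_band() {
  local total=0 dim

  if [ "$#" -ne 5 ]; then
    echo "usage: score_band VALUE RISK READINESS COMPLEXITY EVIDENCE" >&2
    return 2
  fi

  for dim in "$@"; do
    # Accept only the integers 1..10; reject blanks, leading zeros, and text.
    case "$dim" in
      [1-9]|10) ;;
      *) echo "invalid score: $dim (expected 1-10)" >&2; return 2 ;;
    esac
    total=$((total + dim))
  done

  # Bands are hard cut-offs: the 36-39 zone always stops for a human decision.
  if [ "$total" -ge 40 ]; then
    echo "SHIP $total"
  elif [ "$total" -ge 36 ]; then
    echo "NEGOTIATE $total"
  else
    echo "DEFER $total"
  fi
}
```

Calling `score_band 8 7 7 7 7` would put the item in the negotiate zone, which under the methodology means stopping for the human rather than shipping automatically. A DEFER result still needs its one-clause written reason, which this sketch does not generate.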
+ +The command is pure tooling: no application code, no database changes, no new runtime +dependencies. It encodes the "V1 → critique → 3-agent research → 5-dim score → V2" pattern +defined in `docs/flow-pack-methodology.md § /flow-brainstorm`. + +--- + +## What + +### Behavior summary (what the command does when invoked) + +1. Reads initiative description from `$ARGUMENTS` or falls back to `.flow/state.md` "Gap" +2. Produces **V1** — flat bullet list of 5–10 items, from baseline alone, unscored, labeled "V1" +3. Applies **critique gate** — tags each V1 item with `{assumption, scope-creep, no-evidence}`; does NOT modify V1 +4. Spawns **exactly 3 read-only research subagents** (Agent tool) in parallel: + - Agent A — Known Issues: open bugs and prior incidents relevant to V1 + - Agent B — Best Practices: docs, existing skills, reuse candidates + - Agent C — Dependencies: blockers, upstream availability, API confirmation +5. **Scores** every V1 item on 5 dimensions (1–10 each, max 50) +6. Applies **score-band rule**: ≥ 40 SHIP · 36–39 NEGOTIATE (stop for human) · < 36 DEFER +7. Waits for human approval on negotiate-zone items before constructing V2 +8. Produces **V2**: ship list + defer list with explicit one-clause reasons + X/10 confidence +9. **Appends** decision trail (V1, scores, defer reasons) to `.flow/brainstorm-log.md` +10. Waits for **human approval gate** on the V2 list +11. 
Prints gate result + `→ Next: /flow-umbrella <initiative>` + +### What the command does NOT do + +- Does not create GitHub issues (E3 /flow-umbrella) +- Does not generate 7-field umbrella bodies (E3) +- Does not link sub-issues (E4 /flow-epics) +- Does not write to `.flow/state.md` (that is owned by /flow-prime) +- Does not make any GitHub writes before explicit human approval + +--- + +## All Needed Context + +### Documentation & References + +```yaml +- file: docs/flow-pack-methodology.md + section: "§ /flow-brainstorm — V1 → score → V2" and "§ Invariants" + why: The AUTHORITATIVE spec. Contains the 5 dimensions, score bands, 3-agent mandates, + and invariants. Read this section before writing the command — the PRP quotes the + key facts but the methodology doc is the single source of truth. + +- file: docs/flow-pack/commands/flow-prime.md + why: The CANONICAL PATTERN for flow: command files. Mirror its structure exactly: + YAML frontmatter (description:), HTML provenance comment block, ## heading, + ## Objective paragraph, ## Process numbered steps, !-prefix bash commands, + ## Output Format fenced block, ## Arguments section. Do not invent structure. + +- file: .flow/brainstorm-log.md + why: The EXISTING append log (created during E1 dogfood). Shows the exact format to + replicate: provenance comment on creation, ## Round N — YYYY-MM-DD heading, + V1 bullets, critique flags, 5-dim score notation "Value/Risk/Readiness/Complexity/Evidence", + SHIP/NEGOTIATE/DEFER notation, user-response line. The command must append in exactly + this format. + +- file: .flow/state.md + why: Input to the command at runtime. "You are here" and "Gap" sections provide initiative + context when $ARGUMENTS is absent. Also holds the phase status the command should + update to "[x] Phase N — /flow-brainstorm done" (see Update rules below). + +- file: .claude/rules/umbrella-issue.md + why: Downstream contract — shows what /flow-umbrella expects as input from /flow-brainstorm. 
+ Confirms the V2 ship list is the ONLY durable output; V1 + scores are transient + working-state artifacts (invariant: "V1 is transient"). +``` + +### Desired codebase tree + +``` +docs/flow-pack/commands/ + flow-prime.md ← existing (pattern reference, DO NOT modify) + flow-brainstorm.md ← CREATE (tracked canonical template) + +.claude/commands/flow/ + flow-prime.md ← existing + flow-brainstorm.md ← CREATE (byte-copy of tracked template) + +.flow/ ← existing (runtime working dir, NOT committed) + state.md ← existing (input at runtime) + brainstorm-log.md ← existing (append target at runtime) +``` + +### Known Gotchas + +``` +# GOTCHA 1: .claude/ is gitignored — NEVER commit the local install +# docs/flow-pack/commands/flow-brainstorm.md = durable source of truth (commit this) +# .claude/commands/flow/flow-brainstorm.md = regenerable local install (do NOT commit) +# Source: docs/flow-pack-methodology.md § "Durable-source split" +# Recovery line: cp docs/flow-pack/commands/flow-brainstorm.md .claude/commands/flow/flow-brainstorm.md +# This MUST appear in the provenance comment of the command file. + +# GOTCHA 2: Subagents are PROSE instructions, NOT ! bash commands +# The 3 research subagents are invoked via the Agent tool, described as prose in the +# command file: "Spawn 3 read-only research subagents in parallel (Agent tool): ..." +# They are NOT `!`-prefixed lines. A `!` prefix runs bash; Agent tool invocations +# are instructional prose that Claude follows when executing the command. +# Example from flow-prime.md: !`git log -5 --oneline` is bash. +# "Spawn Agent A (Known Issues) with the following prompt: ..." is agent invocation prose. + +# GOTCHA 3: brainstorm-log.md is APPEND-ONLY (NOT HTML markers like state.md) +# state.md uses <!-- FLOW-PRIME:...:START/END --> marker pairs (replacement strategy). +# brainstorm-log.md is different — each run appends a NEW ## Round N section. +# NEVER overwrite or replace previous rounds. 
+# Update rule in the command file must say: +# - File absent → create with provenance header + ## Round 1 section +# - File exists → count current max N, append ## Round (N+1) — <date> + +# GOTCHA 4: Score-band NEGOTIATE (36–39) requires a HARD STOP +# Items in 36–39 MUST be surfaced to the human before constructing V2. +# The command must STOP and present the negotiate items with their scores and +# one-sentence rationale, asking the human to decide: ship or defer. +# Only after the human responds does V2 get constructed. +# Do NOT auto-ship negotiate items. Per methodology § Invariants: "Score bands are hard." + +# GOTCHA 5: Every DEFER item MUST have an explicit one-clause written reason +# Per methodology § Invariants: "Every defer has a reason. A defer item with no +# written reason is a process failure." +# Format: "<item title> (score: X/50): DEFER — <one clause reason>" +# Acceptable reason: "DEFER — overlaps existing analyzing-ai-repos skill; revisit if a +# future initiative needs deep reverse-engineering." +# NOT acceptable: "DEFER" alone, or "DEFER — not needed now" + +# GOTCHA 6: V1 must be explicitly labeled "V1" and UNSCORED +# Per methodology: "labeled 'V1' explicitly" and the item list is "unscored". +# Do not add dimension scores in V1. V1 is the raw brainstorm before research. +# The critique gate TAGS items (flags) but does not score them. + +# GOTCHA 7: Critique flags are LABELS, NOT FIXES +# The critique gate attaches zero or more flags to each V1 item: +# assumption — relies on an unverified fact +# scope-creep — touches E3/E4/E5 or out-of-scope systems +# no-evidence — no concrete codebase grounding for the need +# The flags focus research agents but do NOT change V1 text. + +# GOTCHA 8: $ARGUMENTS fallback chain +# 1. If $ARGUMENTS is non-empty, use it as the initiative description. +# 2. Else, read .flow/state.md "Gap" line. +# 3. Else, ask the user: "What initiative should I brainstorm? 
(1–3 sentences)" + +# GOTCHA 9: The command file IS the agent instruction — no code runs +# Unlike a Python module or TypeScript file, the command file is read by Claude Code +# and followed as instructions. "Implementation" = writing the markdown correctly. +# The PRP's task is to specify the exact content of that markdown file. +``` + +--- + +## Implementation Blueprint + +There are three tasks: (1) write the tracked template file, (2) create the local install copy, (3) commit the tracked template only. + +--- + +### Task 1: Write `docs/flow-pack/commands/flow-brainstorm.md` + +Write this exact file. Every section is specified below. Mirror the structure of +`docs/flow-pack/commands/flow-prime.md` — do not invent new structural conventions. + +--- + +**File content spec** (write verbatim, substituting `<YYYY-MM-DD>` with today): + +```markdown +--- +description: V1 naive plan → 3-read-only-agent research → 5-dim score → V2 ship/defer list +--- + +<!-- provenance: flow-pack methodology stage 2 (V1 → V2 planning pipeline). + Source of truth: docs/flow-pack/commands/flow-brainstorm.md (tracked). + Local install: .claude/commands/flow/flow-brainstorm.md (gitignored, regenerable from this file). + Recovery: cp docs/flow-pack/commands/flow-brainstorm.md .claude/commands/flow/flow-brainstorm.md + Full methodology: docs/flow-pack-methodology.md --> + +# flow-brainstorm: V1 → Score → V2 + +## Objective + +Turn a baseline initiative description into a scored, human-approved V2 ship/defer list ready +for `/flow-umbrella`. Produces three outputs: + +1. **V1** — flat bullet list of 5–10 candidate items, from baseline alone, unscored, labeled "V1". +2. **V2** — approved ship list + explicit defer list + X/10 one-pass confidence score. +3. **Log entry** — full decision trail appended to `.flow/brainstorm-log.md`. + +The three read-only research subagents are the engine of this command. 
Claude spawns exactly 3 +(Agent A — Known Issues, Agent B — Best Practices, Agent C — Dependencies) via the Agent tool, +waits for all three, then synthesizes their findings into the score table. + +This command makes NO GitHub writes. It ends by printing the approved V2 list and the next-command +pointer. All GitHub writes (issue creation, labeling, linking) belong to E3 `/flow-umbrella` and E4 `/flow-epics`. + +**DELEGATION:** Do not re-implement codebase priming. If the baseline context needs refreshing, +run `/flow-prime` first. + +## Process + +### 1. Read baseline context + +!`ls .flow/ 2>/dev/null || echo "(no .flow/ directory yet)"` + +Determine the initiative description: +- If `$ARGUMENTS` is non-empty → use it. +- Else → read `.flow/state.md` and extract the "Gap" line from the "You are here" section. +- Else (no `.flow/state.md`, or no "Gap" line in it) → ask the user: "What initiative should I brainstorm? Provide 1–3 sentences." + +Read `.flow/brainstorm-log.md` (if it exists) to determine the current round count. The new +round will be Round N+1 (or Round 1 if the file does not exist yet). + +!`cat .flow/brainstorm-log.md 2>/dev/null | grep -c "^## Round" || true` + +### 2. Produce V1 — naive plan (UNSCORED) + +Generate a flat bullet list of 5–10 candidate items **from baseline knowledge only** — no research +yet. Every item must be: + +- **Unscored** — no dimension scores; plain text only. +- **Labeled "V1"** — the section heading must read `## V1 — Naive Plan (N items, unscored)`. +- **Descriptive** — format: `- <item title>: <one-sentence description of what and why>`. + +Coverage heuristics: include obvious high-value items, known technical debt, upstreams that may +be blocked, and at least one item that is likely out of scope (to stress-test the critique gate). + +### 3. Critique gate — tag V1 items (do NOT fix them) + +For each V1 item, attach zero or more flags. Flags are labels only — do not change V1 text. 
+ +| Flag | When to apply | +|------|---------------| +| `assumption` | Relies on a fact not verified against the codebase or docs | +| `scope-creep` | Touches E3/E4/E5 behavior or an out-of-scope system | +| `no-evidence` | No concrete codebase grounding for the stated need | + +Present as: `- <item title> [assumption, scope-creep]` or `- <item title> [none]`. + +The flags guide the research agents. An `assumption`-flagged item means "Agent A should verify +this claim." A `scope-creep` flag means "Agent B should confirm boundaries." + +### 4. Spawn 3 read-only research subagents in parallel + +Invoke the **Agent tool** to spawn all three concurrently. Each subagent is read-only — it MUST +NOT write files or make GitHub writes. Pass the V1 items + critique flags in the prompt. + +**Agent A — Known Issues** + +Prompt: +``` +You are a read-only research agent. You MUST NOT write files or make GitHub writes. + +Initiative: <initiative-description> +V1 items (with critique flags): <paste V1 list with flags> + +Task: Read the open GitHub issues, recent git log, and .flow/state.md. +Report: + 1. Which V1 items are blocked by or related to open issues? (cite #N) + 2. Which V1 items are partially done (recent branches/PRs touching them)? + 3. Which V1 `assumption` flags are contradicted by known incidents or bugs? + +Output: concise bullet list, #N refs where applicable. Read-only. +``` + +**Agent B — Best Practices** + +Prompt: +``` +You are a read-only research agent. You MUST NOT write files or make GitHub writes. + +Initiative: <initiative-description> +V1 items (with critique flags): <paste V1 list with flags> + +Task: Read CLAUDE.md, AGENTS.md, docs/flow-pack-methodology.md, and .claude/rules/. +Report: + 1. Which V1 items align with or contradict current best practices? + 2. Which V1 items are already covered by an existing skill or command? (reuse opportunity) + 3. Which V1 `scope-creep` flags are confirmed — item truly belongs to E3/E4/E5? 
+ +Output: concise bullet list. Read-only. +``` + +**Agent C — Dependencies** + +Prompt: +``` +You are a read-only research agent. You MUST NOT write files or make GitHub writes. + +Initiative: <initiative-description> +V1 items (with critique flags): <paste V1 list with flags> + +Task: Read pyproject.toml, frontend/package.json, docker-compose.yml, +and docs/_base/API_CONTRACTS.md. +Report: + 1. Which V1 items have unresolved upstream dependencies or API blockers? + 2. Which V1 `no-evidence` flags are confirmed — no codebase grounding found? + 3. Any dependency pinning or version conflicts that affect V1 items? + +Output: concise bullet list. Read-only. +``` + +Wait for all three agents before proceeding. + +### 5. Score V1 items on 5 dimensions + +Use agent findings as evidence for the Evidence dimension. Score each item 1–10 per dimension: + +| Dimension | Score 1 | Score 10 | Note | +|-----------|---------|-----------|------------------------| +| **Value** | Cosmetic / irrelevant | Core user outcome | — | +| **Risk** | High risk, many unknowns | Low risk, well-understood | Inverted: higher risk = lower score | +| **Readiness** | Many blockers open | All upstreams clear | Blocked = lower score | +| **Complexity** | Enormous effort | Trivial | Inverted: higher complexity = lower score | +| **Evidence** | Pure assumption | Fully verified by agents | Directly from agent reports | + +Note: Risk and Complexity score INVERSELY — a low-risk, low-complexity item scores 9–10, not 1–2. +(A high-risk item is less desirable, so it scores lower on the Risk dimension.) + +Present the score table: + +``` +| Item | Value | Risk | Readiness | Complexity | Evidence | Total | Band | +|------|-------|------|-----------|------------|----------|-------|------| +| ... 
| 8 | 7 | 9 | 6 | 9 | 39 | 🟡 NEGOTIATE | +``` + +Band indicators: +- `✅ SHIP` — total ≥ 40 +- `🟡 NEGOTIATE` — total 36–39 (requires human decision before V2) +- `❌ DEFER` — total < 36 (requires explicit one-clause written reason) + +### 6. Handle negotiation zone (36–39 items) + +If any items score 36–39, **STOP and surface to human** before constructing V2: + +``` +N item(s) are in the negotiation zone (score 36–39): + + - <item>: score 38. Rationale: <one sentence from agent reports>. + Research note: Agent B flagged this as covered by an existing skill (reuse potential). + +Decision needed for each item — respond 'ship', 'defer', or 'defer: <reason>': +``` + +Wait for human response for each negotiate item. Record the decision in the round log. + +If all items are SHIP or DEFER, skip this step. + +### 7. Produce V2 — ship list and defer list + +**V2 ship list** (items scoring ≥ 40, plus negotiate items the human shipped): + +``` +## V2 — Ship List + +1. <item title> (score: X/50): <one-sentence rationale drawing on agent evidence> +2. ... +``` + +**Defer list** (items scoring < 36, plus negotiate items the human deferred): + +``` +## Defer List + +- <item title> (score: X/50): DEFER — <explicit one-clause reason> +``` + +Every defer item MUST have an explicit reason. "DEFER — not needed now" is not acceptable. +Good example: "DEFER — overlaps the existing `analyzing-ai-repos` skill; fold into /flow-prime +if deep external analysis is needed." + +**One-pass confidence score** on the V2 ship list: + +``` +One-pass confidence: X/10 — <one sentence: what gives confidence and what remains uncertain> +``` + +### 8. Append to `.flow/brainstorm-log.md` + +Update rules: +- **File absent** → create with provenance header + `# /flow-brainstorm — decision log` + first round section. +- **File exists** → count existing `## Round` headings, append `## Round (N+1) — <date>`. +- **NEVER overwrite previous rounds.** The log is append-only. 
+ +Provenance header (write only on creation): +``` +<!-- provenance: /flow-brainstorm decision trail. Append-only. NOT committed. --> +# /flow-brainstorm — decision log +``` + +Round section format (exact fields — one paragraph per field, bold label): + +```markdown +## Round N — YYYY-MM-DD + +**Initiative:** <initiative description> +**V1 (N items, unscored):** (1) <item1> (2) <item2> ... +**Critique flags:** <"item title [flags]" for flagged items, or "none"> +**Research:** spawned 3 read-only subagents (A Known Issues, B Best Practices, C Dependencies) +**Agent findings (evidence-backed):** +- A: <key findings, one line> +- B: <key findings, one line> +- C: <key findings, one line> +**5-dim scores (Value/Risk/Readiness/Complexity/Evidence, ≥40 ship):** +- <item title> V/R/Re/C/E=total ✅ SHIP / 🟡 NEGOTIATE → <decision> / ❌ DEFER +**V2 SHIP:** <item1>, <item2>, ... **DEFER:** <item> — <reason>; ... +**One-pass confidence:** X/10 — <rationale> +**User response:** <what the human decided at the approval gate> +``` + +### 9. Human approval gate + +Print V2 ship list and defer list in full. Print the one-pass confidence score. + +``` +──────────────────────────────────────────── + Approve V2 ship list? + 'approve' → write log entry + print next-command pointer + 'revise: <instruction>' → adjust scores or categorizations +──────────────────────────────────────────── +``` + +After human approves, write the log entry (Step 8) with `User response: approved`. + +### 10. Gate result and next-command + +Print using the Output Format below. + +## Output Format + +``` +━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ + 💡 flow-brainstorm: V1 → Score → V2 +━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ + +📋 Baseline Context + Initiative: <description> + Source: [.flow/state.md gap | $ARGUMENTS] + Brainstorm round: N (log entry Round N appended) + +📋 V1 — Naive Plan (N items, unscored) + 1. <item title>: <one-sentence description> [flags or none] + 2. ... 
+ +📋 Research (3 agents — parallel) + Agent A (Known Issues): <2-line summary> + Agent B (Best Practices): <2-line summary> + Agent C (Dependencies): <2-line summary> + +📋 Scoring + | Item | V | R | Re | C | E | Total | Band | + |------|----|----|----|----|----|-------|------| + ... + +📋 V2 — Approved List + Ship (N items): <item1>, <item2>, ... + Defer (M items): <item> — <reason>; ... + One-pass confidence: X/10 + +──────────────────────────────────────────── + ✅ V2 APPROVED → .flow/brainstorm-log.md updated (Round N) +──────────────────────────────────────────── + +→ Next: /flow-umbrella <initiative> +``` + +## Arguments + +`$ARGUMENTS` — the initiative description, passed as free text +(e.g., `/flow-brainstorm add batch forecasting to the system`). +If omitted, the command falls back to `.flow/state.md` Gap line; if state.md is absent, +asks the user directly. Passed through to the gate result and the next-command pointer. +``` + +--- + +### Task 2: Create `.claude/commands/flow/flow-brainstorm.md` (local runtime install) + +After writing Task 1, create the local install as a byte-copy: + +```bash +cp docs/flow-pack/commands/flow-brainstorm.md .claude/commands/flow/flow-brainstorm.md +``` + +Verify no drift: + +```bash +diff docs/flow-pack/commands/flow-brainstorm.md .claude/commands/flow/flow-brainstorm.md \ + && echo "OK — identical" || echo "DRIFT DETECTED — fix before proceeding" +``` + +The local install MUST NOT be committed (`.claude/` is gitignored). Its only purpose is to +make `/flow:flow-brainstorm` available in Claude Code for the current working session. 
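The byte-copy and drift check in Task 2 could be generalized to every tracked command template. The following is a minimal sketch, assuming a POSIX shell; the helper name `sync_flow_commands` is hypothetical and not part of this PRP, and its two default paths are the ones listed in the Desired codebase tree.

```shell
# Hypothetical helper (an illustration, not part of the PRP): regenerate the
# gitignored local install from the tracked templates and report any drift.
sync_flow_commands() {
    src="${1:-docs/flow-pack/commands}"    # tracked source of truth
    dest="${2:-.claude/commands/flow}"     # regenerable local install

    mkdir -p "$dest"                       # a fresh clone has no .claude/ tree yet
    for f in "$src"/*.md; do
        [ -e "$f" ] || continue            # glob did not expand: no templates found
        name=$(basename "$f")
        cp "$f" "$dest/$name"
        if cmp -s "$f" "$dest/$name"; then
            echo "synced: $name"
        else
            echo "DRIFT: $name" >&2
            return 1
        fi
    done
}
```

Calling `sync_flow_commands` with no arguments from the repository root would use the default paths. The `mkdir -p` step covers the fresh-clone case, where `.claude/commands/flow/` does not exist and a bare `cp` would fail.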
+ +--- + +### Task 3: Commit the tracked template only + +Stage ONLY the tracked template: + +```bash +# Verify issue #371 is open before committing +gh issue view 371 --json state --jq '.state' # must return "OPEN" + +# Stage ONLY the tracked template +git add docs/flow-pack/commands/flow-brainstorm.md + +# Verify staged files — must show only the tracked template +git diff --cached --name-only + +# Commit +git commit -m "docs(repo): add /flow-brainstorm command — E2 of flow-pack suite (#371)" +``` + +**Do NOT stage:** +- `.claude/commands/flow/flow-brainstorm.md` (gitignored; `git add` correctly skips it) +- `uv.lock` (pre-existing modification unrelated to this PRP) +- `docker-compose.lan.yml` (local-only untracked file) + +--- + +## Validation Loop + +### Level 1: File existence and byte-identity + +```bash +# Both files must exist +test -f docs/flow-pack/commands/flow-brainstorm.md && echo "tracked: OK" || echo "tracked: MISSING" +test -f .claude/commands/flow/flow-brainstorm.md && echo "local: OK" || echo "local: MISSING" + +# No drift +diff docs/flow-pack/commands/flow-brainstorm.md .claude/commands/flow/flow-brainstorm.md \ + && echo "identical: OK" || echo "DRIFT: fix with cp" +``` + +### Level 2: Content completeness + +```bash +F=docs/flow-pack/commands/flow-brainstorm.md + +# Frontmatter +head -3 "$F" | grep -q "description:" && echo "frontmatter: OK" || echo "frontmatter: MISSING" + +# Provenance comment — must match flow-prime pattern +grep -q "Source of truth: docs/flow-pack/commands/flow-brainstorm.md" "$F" && echo "provenance: OK" || echo "provenance: MISSING" + +# All 5 scoring dimensions (bold name, table cell, or "Name:" label) +for dim in Value Risk Readiness Complexity Evidence; do + grep -qE "\*\*${dim}\*\*|${dim} \||${dim}:" "$F" && echo "dim-${dim}: OK" || echo "dim-${dim}: MISSING" +done + +# Score bands +grep -q "≥ 40\|>= 40\|≥40" "$F" && echo "band-40: OK" || echo "band-40: MISSING" +grep -q "36–39\|36-39" "$F" && echo "band-36-39: OK" || 
echo "band-36-39: MISSING" +grep -q "< 36\|<36" "$F" && echo 
"band-36: OK" || echo "band-36: MISSING" + +# Exactly 3 named agents +grep -c "Agent [ABC]" "$F" | xargs -I{} sh -c '[ {} -ge 3 ] && echo "3-agents: OK" || echo "3-agents: MISSING"' + +# Append-only rule for brainstorm-log +grep -q "append\|Append-only" "$F" && echo "append-rule: OK" || echo "append-rule: MISSING" + +# Next-command pointer +grep -q "flow-umbrella" "$F" && echo "next-cmd: OK" || echo "next-cmd: MISSING" + +# $ARGUMENTS section +grep -q "ARGUMENTS\|\$ARGUMENTS" "$F" && echo "args: OK" || echo "args: MISSING" +``` + +### Level 3: Commit integrity + +```bash +# Commit message format +git log -1 --format='%s' | grep -E "^docs\(repo\): add /flow-brainstorm command" \ + && echo "commit-msg: OK" || echo "commit-msg: WRONG" + +# Issue reference +git log -1 --format='%s' | grep -q "#371" && echo "issue-ref: OK" || echo "issue-ref: MISSING" + +# .claude/ not committed +git show --name-only HEAD | grep ".claude/" \ + && echo "ERROR: .claude/ committed — must not be" || echo ".claude/-clean: OK" + +# uv.lock not committed +git show --name-only HEAD | grep "uv.lock" \ + && echo "ERROR: uv.lock committed — unstage it" || echo "uv.lock-clean: OK" + +# Only the tracked template committed +git show --name-only HEAD | grep -v "^commit\|^Author\|^Date\|^$\|^ " \ + | grep -v "^docs/flow-pack/commands/flow-brainstorm.md$" \ + && echo "UNEXPECTED FILES in commit" || echo "commit-scope: OK" +``` + +### Level 4: Functional smoke test (manual, after commit) + +In Claude Code, run: +``` +/flow:flow-brainstorm test initiative +``` + +Verify sequentially: +1. ✅ V1 produced: 5–10 items, labeled "V1 — Naive Plan", no dimension scores visible +2. ✅ Critique flags applied: each item annotated with `[assumption|scope-creep|no-evidence|none]` +3. ✅ 3 subagents spawned: Agent A, Agent B, Agent C appear in Agent tool output +4. ✅ Score table produced: all 5 columns (V, R, Re, C, E) + Total + Band +5. ✅ Score bands applied: items labeled SHIP / NEGOTIATE / DEFER +6. 
✅ Negotiate gate fires (if any 36–39 items): human asked before V2 constructed +7. ✅ Defer items carry explicit one-clause reasons +8. ✅ `.flow/brainstorm-log.md` appended with new `## Round N` section +9. ✅ Approval gate prints before next-command pointer +10. ✅ Gate result ends with `→ Next: /flow-umbrella <initiative>` + +--- + +## Final Validation Checklist + +- [ ] `docs/flow-pack/commands/flow-brainstorm.md` exists, committed, correct branch +- [ ] `.claude/commands/flow/flow-brainstorm.md` is a byte-copy (`diff` exits 0) +- [ ] Frontmatter `description:` present in tracked template +- [ ] Provenance comment matches flow-prime.md pattern (source-of-truth, recovery cp line) +- [ ] All 5 scoring dimensions named: Value, Risk, Readiness, Complexity, Evidence +- [ ] Score bands documented: ≥ 40 SHIP · 36–39 NEGOTIATE · < 36 DEFER +- [ ] Negotiate zone requires human stop — not auto-shipped +- [ ] Every DEFER item carries explicit one-clause reason (invariant from methodology) +- [ ] Exactly 3 subagents: A (Known Issues), B (Best Practices), C (Dependencies) +- [ ] Subagent prompts are read-only — "MUST NOT write files or make GitHub writes" +- [ ] `.flow/brainstorm-log.md` append rules documented (append-only, round-numbered) +- [ ] Brainstorm log: provenance header only on creation; round section on every run +- [ ] Human approval gate documented before the next-command pointer +- [ ] Next-command pointer: `→ Next: /flow-umbrella <initiative>` +- [ ] `$ARGUMENTS` fallback chain documented (args → state.md gap → ask user) +- [ ] Only `docs/flow-pack/commands/flow-brainstorm.md` committed; `.claude/` never staged +- [ ] `uv.lock` and `docker-compose.lan.yml` NOT staged +- [ ] Commit message: `docs(repo): add /flow-brainstorm command — E2 of flow-pack suite (#371)` +- [ ] `gh issue view 371 --json state --jq '.state'` returns `"OPEN"` before commit +- [ ] Level 1–3 validation scripts all pass + +--- + +## Anti-Patterns to Avoid + +- ❌ Don't commit 
`.claude/commands/flow/flow-brainstorm.md` — it is gitignored and local-only +- ❌ Don't implement GitHub writes (issue create / label / sub-issue link) — E3/E4 scope +- ❌ Don't auto-ship negotiate-zone items (36–39) — always stop and ask the human +- ❌ Don't write a defer item without a one-clause explicit reason — process failure per invariants +- ❌ Don't spawn 2 agents or 4 agents — exactly 3, always +- ❌ Don't use HTML START/END markers in brainstorm-log.md — that pattern belongs to state.md +- ❌ Don't score V1 items before running the research agents — scoring depends on Evidence +- ❌ Don't stage `uv.lock` or `docker-compose.lan.yml` +- ❌ Don't invent new structural conventions for the command file — mirror flow-prime.md exactly +- ❌ Don't use `!` prefix for Agent tool invocations — `!` is for bash, not for subagent spawning From 1f7fd82f1081ef5d0b494ac9e586462c2acb6735 Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Fri, 5 Jun 2026 00:33:36 +0200 Subject: [PATCH 08/44] fix(repo): correct flow-prime milestone query (#371) --- docs/flow-pack/commands/flow-prime.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/flow-pack/commands/flow-prime.md b/docs/flow-pack/commands/flow-prime.md index 579aff41..c46d4432 100644 --- a/docs/flow-pack/commands/flow-prime.md +++ b/docs/flow-pack/commands/flow-prime.md @@ -46,7 +46,7 @@ Gather the five GitHub categories: !`gh issue list --state open --limit 20 --json number,title,labels --jq '.[] | "#\(.number): \(.title) [\(.labels | map(.name) | join(","))]"'` -!`gh milestone list --json number,title,state --jq '.[] | "#\(.number) \(.title) (\(.state))"'` +!`gh api repos/{owner}/{repo}/milestones --jq '.[] | "#\(.number) \(.title) (\(.state))"'` !`gh label list --json name --jq '[.[].name] | sort | join(", ")'` From 49847133fe6b4efda29f5fcf7703a8481eaed9e2 Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Sun, 7 Jun 2026 17:11:29 +0200 Subject: [PATCH 09/44] 
docs(repo): add onboarding guide, role deep-dives, and architecture diagrams (#368) --- .gitignore | 3 + docs/ONBOARDING.md | 146 ++++++++ docs/_repoKB-deepdive/deepdive-ai-engineer.md | 256 +++++++++++++ .../deepdive-product-manager.md | 230 ++++++++++++ .../deepdive-software-architect.md | 292 +++++++++++++++ .../deepdive-software-developer.md | 281 ++++++++++++++ docs/arch-dia.md | 346 ++++++++++++++++++ docs/arch-techstack.md | 247 +++++++++++++ 8 files changed, 1801 insertions(+) create mode 100644 docs/ONBOARDING.md create mode 100644 docs/_repoKB-deepdive/deepdive-ai-engineer.md create mode 100644 docs/_repoKB-deepdive/deepdive-product-manager.md create mode 100644 docs/_repoKB-deepdive/deepdive-software-architect.md create mode 100644 docs/_repoKB-deepdive/deepdive-software-developer.md create mode 100644 docs/arch-dia.md create mode 100644 docs/arch-techstack.md diff --git a/.gitignore b/.gitignore index f7432f39..062e8379 100644 --- a/.gitignore +++ b/.gitignore @@ -48,3 +48,6 @@ HANDOFF.md # Local CI / dogfood logs and screenshots (per-session, never committed) .ci-logs/ docs/manual_hun/ + +# Understand-Anything knowledge-graph generator state (local-only, multi-MB) +.understand-anything/ diff --git a/docs/ONBOARDING.md b/docs/ONBOARDING.md new file mode 100644 index 00000000..5b814928 --- /dev/null +++ b/docs/ONBOARDING.md @@ -0,0 +1,146 @@ +# ForecastLabAI — Onboarding Guide + +> Generated from the Understand-Anything knowledge graph (`.understand-anything/knowledge-graph.json`). +> Snapshot: commit `1f7fd82` · 860 files · 2,434 graph nodes · 5,011 edges · 8 layers. +> This is a navigational map, not a substitute for `README.md`, `AGENTS.md`, or `docs/_base/*`. + +--- + +## 1. Project Overview + +**ForecastLabAI** is a portfolio-grade, **single-host retail demand-forecasting system** that exercises the full ML lifecycle and runs end-to-end with one `docker compose up`. 
It covers: data platform → ingest → time-safe feature engineering → forecasting → backtesting → model registry → RAG → agentic layer → React dashboard. + +| | | +|---|---| +| **Backend** | Python 3.12 · FastAPI · SQLAlchemy 2.0 (async) · Pydantic v2 · Alembic · structlog | +| **Database** | PostgreSQL 16 + **pgvector** (vector store lives in the same container — no separate service) | +| **ML / AI** | pandas · NumPy · scikit-learn · LightGBM/XGBoost (opt-in) · PydanticAI · OpenAI · Anthropic · tiktoken | +| **Frontend** | React 19 · TypeScript · Vite · Tailwind CSS · shadcn/ui · TanStack Query/Table · React Router · Recharts | +| **Tooling** | uv (Python) · pnpm (JS) · Docker/Compose · GitHub Actions + release-please · ruff · mypy/pyright `--strict` · pytest/Vitest | + +**Defining traits to internalize early:** +- **Vertical-slice architecture** — every domain lives under `app/features/<slice>/{models,schemas,service,routes,tests}.py`. A slice may **not** import another slice; cross-cutting code goes through `app/core/` or `app/shared/`. +- **Time-safety is the load-bearing invariant** — feature engineering must never read past the caller's `cutoff_date`. `app/features/featuresets/tests/test_leakage.py` is the executable spec; it must never be weakened. +- **Single-host by design** — no managed-cloud SDK in the core path; `docker compose up` is the only prerequisite besides Python + Node. +- **Docs-first** — work flows `INITIAL-*.md` → `PRPs/PRP-*.md` → vertical-slice implementation → CI gates. + +--- + +## 2. Architecture Layers + +The graph groups all 864 file-level nodes into **8 layers**: + +| Layer | Files | What lives here | +|-------|------:|-----------------| +| **Backend Core & Infrastructure** | 67 | `app/core/*` (config, db engine, logging, middleware, problem-details, health) + `app/shared/*` (cross-slice ORM, seeder "The Forge"). The cross-cutting foundation every slice depends on. 
| +| **Backend Feature Slices** | 261 | The 17+ vertical domain slices under `app/features/` (forecasting, agents, registry, rag, scenarios, backtesting, analytics, batch, demo, …), each self-contained. | +| **Data & Migrations** | 23 | Alembic `versions/*` (forward-only migration chain) + SQL example queries that define/evolve the Postgres+pgvector schema. | +| **Frontend (React SPA)** | 240 | `frontend/src/` — pages, shadcn/ui components, TanStack Query hooks, API client/lib, `types/api.ts`, build config. | +| **Documentation & PRPs** | 202 | `docs/`, ADRs, phase guides, and the `PRPs/` / `INITIAL-*` requirement plans that gate every slice. | +| **CI/CD & Containerization** | 26 | `.github/workflows/*`, Dockerfiles, `docker-compose*.yml`, devcontainer. | +| **Scripts & Demos** | 36 | CLI utilities + demo drivers (`scripts/`, `examples/`) outside the app/frontend trees. | +| **Project Configuration** | 9 | Root tooling/env config (`pyproject.toml`, lockfiles, pre-commit, release-please, `.env.example`). | + +--- + +## 3. Key Concepts & Patterns + +- **The vertical slice (read one to learn all).** `models.py` (SQLAlchemy ORM) → `schemas.py` (Pydantic v2 boundary) → `service.py` (business logic) → `routes.py` (HTTP) → `tests/`. The **registry** slice is the cleanest exemplar. +- **RFC 7807 errors everywhere.** All errors return `application/problem+json` via `app/core/problem_details.py` / `app/core/exceptions.py` — never a bare 500, never an ad-hoc error shape. +- **Config through one door.** Feature code reads `app/core/config.get_settings()` (cached singleton) — never `os.environ` directly. Use `pathlib.Path`, never `os.path`. +- **Async ORM.** `app/core/database.py` owns the async engine, session-maker, `get_db` dependency, and declarative `Base`. Every model inherits `Base`; every service opens a session through `get_db`. +- **Time-safe features.** Lags via `shift(k)`, rolling via `shift(1).rolling(...)`, entity-aware `groupby` — enforced by `test_leakage.py`. 
+- **Forward-only migrations.** Once an Alembic migration merges, never edit it — add a new one. CI's `migration-check` replays the chain on a fresh DB every PR. +- **HITL agent gate.** Mutating PydanticAI tools (`create_alias`, `archive_run`, `save_scenario`) block on human approval via `agent_require_approval`. Never widen the agent's mutation surface without adding the tool there. +- **Registry trust model.** A run moves `pending → running → success/failed → archived`; an alias may point only to a `success` run; artifacts are SHA-256-verified with path-traversal prevention. + +--- + +## 4. Guided Tour (recommended reading order) + +A 14-step path from entry point to single-host deploy. Each step names the files to open. + +1. **Project Overview** — `README.md` + `AGENTS.md`. The roadmap, stack, validation gates, and vertical-slice brief every later step assumes. +2. **Application Entry Point** — `app/main.py`. FastAPI factory: wires every slice's router, CORS, request-ID middleware, RFC 7807 handlers, lifespan. The bird's-eye map of the backend surface. +3. **Core: Config, DB, Errors** — `app/core/config.py` (cached `get_settings()`), `app/core/database.py` (async engine, `get_db`, `Base`), `app/core/problem_details.py`. Highest-fan-in backend files — breakage cascades. +4. **The Data Platform (Domain Model)** — `app/features/data_platform/models.py`. The 7-table retail core (store/product/calendar/sales_daily/price_history/promotion/inventory) + Phase-2 tables. The vocabulary the whole system speaks; grain = one `sales_daily` row per store × product × date. +5. **Time-Safe Feature Engineering** — `app/features/featuresets/service.py` + `tests/test_leakage.py`. Leakage-prevented lag/rolling/calendar/exogenous/lifecycle features; the test is the spec. +6. **Forecasting & Backtesting** — `forecasting/service.py` + `models.py` (model zoo), `backtesting/splitter.py` (expanding/sliding folds), `backtesting/metrics.py` (MAE/sMAPE/WAPE/bias/RMSE + per-bucket). +7. 
**A Slice End-to-End: the Model Registry** — `registry/{models,schemas,service,routes}.py` + `storage.py`. Run state machine, comparable-run/feature-frame invariants, aliases, SHA-256 artifact integrity. +8. **Database Migrations** — `alembic/versions/`. Forward-only chain applied via `alembic upgrade head` at container start; CI replays it every PR. +9. **RAG Knowledge Base (pgvector)** — `rag/service.py`. Idempotent (content-hash) indexing + HNSW retrieval inside the same Postgres container; embedding dim is fixed per provider. +10. **The Agentic Layer with HITL** — `agents/service.py`, `deps.py`, `websocket.py`. PydanticAI sessions, streaming, and the human-in-the-loop approval gate for mutating tools. +11. **Frontend Contract & Data Layer** — `frontend/src/types/api.ts` (mirrors backend schemas; most-depended-on file in the repo), `lib/api.ts` (fetch + RFC 7807 → typed `ApiError`), `hooks/use-demo-pipeline.ts`. +12. **A Key Page: the Showcase** — `frontend/src/pages/showcase.tsx` (drives the live demo pipeline in-browser) + `knowledge.tsx` (RAG corpus + semantic search). +13. **The End-to-End Demo Pipeline** — `app/features/demo/pipeline.py`. Capstone: seed → features → train → backtest → register → alias → RAG → agent in-process; mirrors `scripts/run_demo.py`. +14. **Containerization, CI, Config** — `docker-compose.yml`, `Dockerfile.backend`, `.github/workflows/ci.yml` (4 blocking gates), `pyproject.toml`. + +--- + +## 5. 
File Map — the highest-leverage files + +**Most-depended-on (fan-in) — change these carefully:** + +| File | Importers | Role | +|------|----------:|------| +| `frontend/src/types/api.ts` | 116 | Single source of truth for backend schema types | +| `app/core/config.py` | 68 | Cached settings singleton | +| `app/core/database.py` | 51 | Async engine / session / `Base` | +| `frontend/src/components/ui/button.tsx` | 47 | shadcn primitive | +| `frontend/src/components/ui/card.tsx` | 46 | shadcn primitive | +| `frontend/src/lib/utils.ts` | 44 | FE utility helpers (`cn`, etc.) | +| `app/features/data_platform/models.py` | 43 | De-facto shared ORM layer (all fact-table FKs) | +| `frontend/src/lib/api.ts` | 42 | Fetch wrapper + RFC 7807 parsing | +| `app/core/logging.py` | 41 | structlog setup | +| `app/features/forecasting/schemas.py` | 34 | Forecast train/predict contracts | +| `app/shared/seeder/config.py` | 33 | Seeder scenario presets ("The Forge") | +| `app/main.py` | 28 | Router/middleware wiring hub | + +**By layer (entry points to start reading):** +- **Backend Core** → `app/main.py`, `app/core/{config,database,problem_details,exceptions,logging,middleware,health}.py` +- **Feature Slices** → pick one and read M→S→S→R→T; `registry/` is the model slice, `forecasting/` and `agents/` are the richest +- **Data & Migrations** → `alembic/versions/*` (newest = current schema), `examples/*.sql` +- **Frontend** → `types/api.ts` → `lib/api.ts` → `hooks/*` → `pages/showcase.tsx` +- **Scripts & Demos** → `scripts/run_demo.py`, `scripts/seed_random.py` + +--- + +## 6. Complexity Hotspots — approach carefully + +Files the analyzer rated **complex**. Concentrated in **batch**, **forecasting**, **analytics**, **agents/tools**, and **backtesting**: + +- **Batch slice** — `batch/runner.py` (bounded-concurrency async runner w/ cancel/drain), `batch/service.py`, `batch/models.py`, `batch/tests/test_runner.py`. Concurrency + cancellation semantics; read the tests alongside the code. 
+- **Forecasting** — `forecasting/models.py` (baseline→regression→LightGBM/XGBoost/prophet-like factory), `forecasting/service.py` (leakage-safe regression matrices), `forecasting/schemas.py` (config union), plus its test suite (`test_service`, `test_models`, `test_feature_metadata`, `test_persistence`, `test_schemas`). +- **Analytics** — `analytics/{service,routes,schemas}.py` + integration tests (SQL GROUP-BY aggregation; date-range validation). +- **Agents / tools** — `agents/tools/registry_tools.py` & `backtesting_tools.py` (HITL-gated mutations), `agents/tests/test_tools.py`. +- **Backtesting** — `backtesting/metrics.py` (metric math + per-bucket aggregation), `backtesting/schemas.py`. +- **Core** — `app/core/exceptions.py` (domain exception hierarchy → RFC 7807 handlers). + +> Gotcha worth flagging: Pydantic v2 **strict mode** on request bodies 422s ISO-string values for `date`/`datetime`/`UUID`/`Decimal` unless the field has `Field(strict=False, ...)` — see `forecasting/tests/test_schemas.py` and `app/core/tests/test_strict_mode_policy.py`. + +--- + +## 7. Getting Started (validation gates) + +```bash +cp .env.example .env # set OPENAI_API_KEY / ANTHROPIC_API_KEY +docker compose up -d # Postgres+pgvector on :5433 +uv sync --extra dev # backend deps (Python 3.12) +uv run alembic upgrade head # migrations +uv run uvicorn app.main:app --reload --port 8123 +cd frontend && corepack enable pnpm && pnpm install && pnpm dev # UI on :5173 +``` + +Run before every commit (all five gate merge in CI): + +```bash +uv run ruff check . && uv run ruff format --check . +uv run mypy app/ && uv run pyright app/ # both --strict +uv run pytest -v -m "not integration" +``` + +`make demo` runs the full end-to-end pipeline; the **Showcase** page (`/showcase`) drives it live in-browser. 
+ +--- + +*Explore interactively:* `/understand-anything:understand-dashboard` · *Ask questions:* `/understand-anything:understand-chat` · *Deep-dive a file:* `/understand-anything:understand-explain`. diff --git a/docs/_repoKB-deepdive/deepdive-ai-engineer.md b/docs/_repoKB-deepdive/deepdive-ai-engineer.md new file mode 100644 index 00000000..4c8ef8b0 --- /dev/null +++ b/docs/_repoKB-deepdive/deepdive-ai-engineer.md @@ -0,0 +1,256 @@ +# Deep Dive: AI Engineer + +## Scope + +This document studies ForecastLabAI through the AI engineer lens: forecasting mechanics, feature safety, model execution, registry semantics, scenario logic, RAG pipeline, agent orchestration, and AI risk controls. + +## 1. Research + +### AI surface area in the repo + +ForecastLabAI has four distinct AI/ML layers: + +1. classical forecasting and feature-engineered prediction +2. evaluation and model selection +3. retrieval-augmented generation +4. tool-using agents with approval-gated mutations + +That separation is important. This is not one generic "AI layer." It is several different reasoning and execution systems sharing one product. + +### Forecasting system + +`app/features/forecasting/service.py` is the central forecasting orchestrator. It handles: + +- training data loading +- model instantiation +- feature-frame handling +- artifact persistence +- prediction +- feature metadata extraction + +The service explicitly documents time-safety and contains version-aware feature-frame logic. That is the right shape for a forecasting platform where leakage is a core product risk. + +### Feature safety model + +The repo treats time-safety as a first-class invariant: + +- lagged features +- shifted rolling windows +- feature-frame contracts in shared code +- leakage tests treated as specification + +This is stronger than many ML repos that mention leakage conceptually but do not operationalize it. Here, time-safety is both an implementation rule and a testing rule. 
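
The lag and shifted-rolling rules above can be illustrated with a minimal, standard-library-only sketch. It is not the repository's implementation: the helper names `lag` and `rolling_mean_past` are hypothetical, and the input is assumed to be one entity's series already sorted by date. The intent mirrors `shift(k)` and `shift(1).rolling(window)`, in that a feature at position `i` only ever reads values from positions before `i`.

```python
from statistics import fmean
from typing import Optional, Sequence


def lag(values: Sequence[float], k: int) -> list[Optional[float]]:
    """Return the value observed k steps earlier; None where no history exists.

    Hypothetical helper for illustration only.
    """
    if k < 1:
        # k = 0 would expose the current (target) value to the feature.
        raise ValueError("k must be >= 1 so the feature never sees the current row")
    return [values[i - k] if i >= k else None for i in range(len(values))]


def rolling_mean_past(values: Sequence[float], window: int) -> list[Optional[float]]:
    """Mean of the `window` values strictly before each position.

    Hypothetical helper; positions with fewer than `window` prior
    observations yield None instead of a partial average.
    """
    out: list[Optional[float]] = []
    for i in range(len(values)):
        if i < window:
            out.append(None)
        else:
            # The slice ends at i (exclusive), so the current value is excluded.
            out.append(fmean(values[i - window:i]))
    return out


# Assumed toy data: daily sales for a single store/product pair, in date order.
sales = [10.0, 12.0, 11.0, 15.0, 14.0]
lag_1 = lag(sales, 1)
roll_3 = rolling_mean_past(sales, 3)
```

In a multi-entity table the same functions would be applied per store and product series rather than across the whole column, which is the role the entity-aware grouping plays in the description above.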
+ +### Model inventory and execution + +The forecasting layer supports a mix of: + +- naive and seasonal baselines +- moving average variants +- regression-style feature-aware models +- optional heavier models such as LightGBM and XGBoost behind flags + +This is a sensible AI-engineering tradeoff: + +- cheap baselines for control and comparability +- feature-aware models for richer behavior +- optional advanced learners so the core install stays light + +### Backtesting and model selection + +The repo goes beyond training into disciplined evaluation: + +- time-series CV +- fold metrics +- horizon-bucket metrics +- candidate ranking +- champion workflows +- promotion to aliases + +This matters because the AI story is not "we can forecast," but "we can evaluate, compare, and operationalize forecasts." + +### Scenario simulation + +`app/features/scenarios/service.py` introduces two different planning methods: + +1. heuristic post-forecast adjustment +2. `model_exogenous` re-forecasting for feature-aware models + +That distinction is product-important and technically honest. The code explicitly labels which path is heuristic and which path is model-driven. + +### Registry and artifact lifecycle + +Model runs live in registry tables while artifacts live on disk. The registry tracks: + +- configs +- metrics +- runtime info +- feature frame version and metadata +- aliasing and compare flows + +This is the core reproducibility seam of the ML system. Without it, scenario planning, explainability, and promotion would be much weaker. + +### RAG pipeline + +`app/features/rag/service.py` shows a standard but careful retrieval pipeline: + +- source ingest from content or path +- content hashing for idempotency +- path traversal protection +- chunking by source type +- embedding generation +- pgvector storage +- semantic retrieval with thresholds + +The local/provider-switchable design is especially practical for this repo's single-host identity. 
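
Two of the ingestion safeguards listed above, content hashing for idempotency and path traversal protection, can be sketched as follows. This is a hedged illustration under stated assumptions, not the code in `app/features/rag/service.py`: `content_hash`, `resolve_inside`, and `InMemoryIndex` are hypothetical names, and an in-memory dictionary stands in for the pgvector-backed tables.

```python
import hashlib
from pathlib import Path


def content_hash(text: str) -> str:
    """SHA-256 hex digest of the source text, used as an idempotency key."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def resolve_inside(base_dir: Path, user_path: str) -> Path:
    """Resolve user_path under base_dir and reject anything that escapes it.

    Hypothetical helper: covers both `../` traversal and absolute paths.
    """
    base = base_dir.resolve()
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes the allowed directory: {user_path!r}")
    return candidate


class InMemoryIndex:
    """Stand-in for a vector store; remembers which content was already indexed."""

    def __init__(self) -> None:
        self._by_hash: dict[str, str] = {}

    def index(self, source_id: str, text: str) -> bool:
        """Return True if the text was newly indexed, False if it was a no-op."""
        digest = content_hash(text)
        if digest in self._by_hash:
            # Same content seen before: skip chunking and embedding entirely.
            return False
        self._by_hash[digest] = source_id
        return True


idx = InMemoryIndex()
first = idx.index("guide.md", "hello world")
second = idx.index("guide.md", "hello world")
```

Checking the hash before chunking and embedding is what makes re-indexing cheap in this sketch: unchanged content never reaches the embedding provider a second time.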
+ +### Agent system + +`app/features/agents/service.py` orchestrates: + +- session lifecycle +- agent selection +- streaming +- tool execution +- approval state +- token and tool-call accounting + +The service intentionally forces sequential tool execution because all tools share one DB session and concurrent use would violate SQLAlchemy session constraints. That is a good example of AI engineering being informed by infrastructure reality. + +### AI provider control + +`app/core/config.py` and the config slice expose runtime control over: + +- agent default and fallback model +- embedding provider +- embedding dimensions +- Ollama host/model +- approval requirements +- session limits and timeouts + +The ability to switch providers live from the product is one of the stronger AI-platform features in the repo. + +## 2. Compose A Role-Based Plan + +### AI engineer reading plan + +Recommended order: + +1. `app/core/config.py` +2. `app/features/featuresets/*` +3. `app/features/forecasting/service.py` +4. `app/features/backtesting/*` +5. `app/features/model_selection/*` +6. `app/features/registry/*` +7. `app/features/scenarios/*` +8. `app/features/rag/service.py` +9. `app/features/agents/service.py` +10. `app/features/agents/tools/*` +11. `frontend/src/pages/knowledge.tsx` +12. `frontend/src/pages/chat.tsx` + +### AI systems review plan + +Review the repo in these four layers: + +1. Predictive ML + - feature safety + - train/predict contract + - artifact compatibility +2. Evaluation and governance + - backtest output shape + - candidate ranking + - alias lifecycle +3. Retrieval + - ingestion safety + - chunking strategy + - embedding/provider constraints +4. Agents + - tool exposure + - approval gate correctness + - session and streaming behavior + +### High-value improvement plan + +An AI engineer would likely prioritize: + +1. clearer model-bundle versioning and compatibility guarantees +2. 
stronger observability around token use, retrieval quality, and tool outcomes +3. offline evaluation harnesses for retrieval and agent quality +4. explicit latency and cost reporting per provider and workflow +5. tighter provenance reporting across agent answers and scenario-save actions + +## 3. Validate + +### AI engineering strengths + +1. Leakage is treated as a real systems constraint. +2. Model governance is not an afterthought. +3. Scenario simulation is transparent about heuristic versus model-driven logic. +4. RAG indexing is idempotent and includes path safety checks. +5. Agent mutation is approval-gated. +6. Sequential tool execution avoids unsafe session concurrency. + +### AI engineering risks + +1. Single-host execution couples ML latency to API runtime. +2. Artifact compatibility can become subtle as feature-frame versions evolve. +3. Retrieval quality depends on provider/model settings that can change at runtime. +4. Agent reliability depends on tool schemas, provider behavior, and approval UX all staying aligned. +5. There is limited built-in evaluation telemetry for retrieval and agent quality compared with the maturity of the forecasting layer. + +### AI risk controls already present + +- strict Pydantic validation at boundaries +- explicit provider allow-lists +- approval-required mutation tools +- request/session limits +- timeout and retry controls +- content-hash idempotency for RAG +- no direct unsafe execution of model output + +## 4. Generate + +## Generated AI Engineering Findings + +### What kind of AI system this is + +ForecastLabAI is an applied AI product with two very different forms of intelligence: + +1. deterministic statistical/ML forecasting +2. probabilistic LLM-based reasoning and tool use + +The repo handles them separately enough to stay sane, while still exposing them inside one product. + +### Strongest AI design choices + +1. treating time-safety as non-negotiable +2. keeping baseline models in the product +3. 
preserving registry and artifact provenance +4. making scenario methods explicit +5. showing the RAG corpus directly in the UI +6. forcing human approval for mutating agent actions + +### Where the AI system is most mature + +The predictive ML and governance story feels the most mature: + +- forecasting +- backtesting +- registry +- model selection +- scenario compatibility awareness + +The RAG and agent layers are credible and well integrated, but they still have more room for evaluation and observability depth than the classical forecasting side. + +### Recommended AI engineering priorities + +1. Add retrieval-quality and agent-quality evaluation fixtures. +2. Surface token, provider, retrieval, and tool-call metrics more explicitly. +3. Formalize artifact and feature-frame compatibility in one canonical contract. +4. Keep expanding provenance in agent-visible and user-visible outputs. +5. Protect the distinction between deterministic model workflows and LLM-generated reasoning; that clarity is a strength. + +### Final AI engineer view + +This is a serious applied AI repository because it does not collapse every intelligence problem into "call an LLM." It uses the right tool for each layer: statistical models for forecasting, retrieval for grounded context, and agents for guided orchestration. The next maturity step is better evaluation and observability around the LLM-powered layers, not more raw capability. diff --git a/docs/_repoKB-deepdive/deepdive-product-manager.md b/docs/_repoKB-deepdive/deepdive-product-manager.md new file mode 100644 index 00000000..ec71810a --- /dev/null +++ b/docs/_repoKB-deepdive/deepdive-product-manager.md @@ -0,0 +1,230 @@ +# Deep Dive: Product Manager + +## Scope + +This document studies ForecastLabAI as a product manager would: product surface, audience, workflows, differentiation, maturity, roadmap pressure, and delivery implications. + +## 1. 
Research + +### Product identity + +ForecastLabAI is a portfolio-grade retail demand forecasting product, not just an ML service and not just a dashboard. The repo combines: + +- synthetic retail data generation +- exploratory analytics +- forecasting workflows +- backtesting +- model governance +- scenario planning +- knowledge retrieval +- chat-driven AI assistance +- an operator-facing UI + +That matters because the product story is end-to-end value, not one isolated feature. + +### Implied target audience + +The product appears aimed at four overlapping audiences: + +1. reviewers hiring for platform, ML, or applied AI roles +2. technical stakeholders evaluating architecture breadth +3. operators exploring demand, model quality, and forecast actions +4. builders who want a local-first forecasting reference system + +It is not aimed at: + +- multi-tenant enterprise admins +- external consumers via public auth flows +- real-time decisioning users +- non-technical retail end users + +### Main user-facing workflows + +The route map in `frontend/src/App.tsx` exposes the product's real information architecture: + +- `dashboard`: KPI snapshot and top performers +- `showcase`: guided end-to-end live pipeline +- `ops`: operational system state +- `explorer/*`: drill into stores, products, runs, jobs, sales +- `visualize/forecast`: forecast execution +- `visualize/backtest`: model evaluation +- `visualize/demand`: demand planning view +- `visualize/planner`: what-if scenario planning +- `visualize/batch`: batch execution +- `visualize/champion`: champion selection +- `knowledge`: RAG corpus and semantic retrieval +- `chat`: agent interaction +- `guide`: agent education +- `admin`: AI model and provider controls + +This is a mature workflow map for a pre-1.0 product. + +### Core value propositions + +The repo currently offers these strong product claims: + +1. "See the whole forecasting lifecycle in one local product." +2. 
"Move from raw retail data to model governance and AI assistance." +3. "Inspect not just predictions, but provenance, aliases, backtests, scenario deltas, and knowledge sources." +4. "Switch AI providers and embeddings without restarting the app." +5. "Run a complete live showcase from the browser or CLI." + +### Product differentiation + +The differentiator is not raw forecasting novelty. The differentiator is integration quality: + +- forecasting plus governance +- planning plus agent workflows +- RAG plus live system state +- demoability plus local reproducibility + +Many projects demonstrate one of those. This repo demonstrates their connection. + +### Product maturity signals + +Signals that the product is beyond an internal prototype: + +- dedicated guide and user-guide docs +- multiple specialized visual workflows +- champion-selection and batch-runner surfaces +- scenario library and compare flow +- admin UI for AI provider management +- knowledge page exposing RAG sources and live retrieval +- approval-gated agent actions + +### Product constraints + +The explicit product guardrails are strong: + +- single-host +- no auth/RBAC +- no multi-tenancy +- no streaming architecture +- retail demand forecasting only + +These constraints narrow the addressable market, but sharpen the product identity. + +## 2. Compose A Role-Based Plan + +### Product manager reading plan + +A PM should read the product through these artifacts: + +1. `README.md` +2. `docs/user-guide/*` +3. `frontend/src/App.tsx` +4. `frontend/src/pages/showcase.tsx` +5. `frontend/src/pages/knowledge.tsx` +6. `frontend/src/pages/chat.tsx` +7. `docs/_base/API_CONTRACTS.md` +8. `.claude/rules/product-vision.md` + +### Product analysis plan + +Analyze the repo through these questions: + +1. What problem story is the product telling? +2. Which workflows are most polished today? +3. Which workflows are demonstrably complete, versus technically available but less integrated? +4. 
Which user journeys require too much prior knowledge? +5. Which capabilities are platform foundations versus presentation layers? + +### Product segmentation plan + +The current product can be segmented into four capability groups: + +1. Retail analytics and exploration +2. Forecasting and model operations +3. Planning and decision support +4. AI-assisted knowledge and action + +That grouping is useful for roadmap and documentation, because the repo now spans more than one narrative. + +### Product roadmap framing + +Near-term roadmap should probably focus on deepening coherence more than widening scope: + +1. tighter cross-linking between pages and workflows +2. stronger in-product explanations of model and scenario outputs +3. better operational visibility for long-running jobs and AI provider state +4. smoother "happy path" narrative for first-time reviewers +5. less conceptual separation between forecast intelligence, governance, knowledge, and agent actions + +## 3. Validate + +### Evidence that the product story is real + +- `showcase.tsx` turns the demo pipeline into a live product experience. +- `knowledge.tsx` exposes both indexed corpus and live system state. +- `chat.tsx` supports session creation, streaming, and approval workflows. +- `use-model-selection.ts` shows a full operator workflow, not just a static page. +- `README.md` and `docs/user-guide/*` describe practical usage, not theoretical future plans. + +### Product strengths + +1. Strong breadth with credible implementation +2. Good local demo story +3. Clear relationship between analytics, forecasting, governance, and AI +4. Frontend route structure reflects actual user jobs to be done +5. Admin/provider management keeps AI from feeling bolted on + +### Product weaknesses + +1. The breadth can dilute first-time comprehension. +2. There are multiple advanced surfaces competing for attention. +3. Some product stories are better documented than they are narratively unified in the UI. +4. 
No auth means some enterprise-flavored workflows remain intentionally absent. +5. The portfolio identity can mask which capabilities are intended as "hero" features. + +### Product risks + +1. Scope creep beyond the single-host product identity +2. More features without stronger onboarding hierarchy +3. AI features outpacing explanation and trust framing +4. operational complexity becoming visible before operational tooling catches up + +## 4. Generate + +## Generated Product Findings + +### What product this really is + +ForecastLabAI is best understood as a ForecastOps workbench with built-in AI and evidence surfaces. It is not merely a forecasting API and not merely an AI chat wrapper around docs. + +### Strongest product narratives + +The strongest narratives today are: + +1. "End-to-end forecasting platform on one machine" +2. "Forecast plus compare plus promote plus monitor" +3. "What-if planning tied to real runs" +4. "Knowledge-aware assistant with visible corpus and guarded actions" + +### Best current hero experiences + +1. Showcase +2. Champion selector +3. What-if planner +4. Knowledge page +5. Chat plus approval flow + +These are the places where the product demonstrates differentiated value rather than just plumbing. + +### Product opportunities + +1. Stronger first-run narrative across Dashboard -> Showcase -> Explorer -> Planner -> Knowledge -> Chat +2. Better opinionated defaults and guidance around model-choice workflows +3. More surfaced trust signals around scenario quality and AI answer provenance +4. Better role-oriented views for analyst, operator, and reviewer personas + +### Recommended PM priorities + +1. Clarify primary personas and map each page to one. +2. Define two or three canonical demos instead of one broad capability inventory. +3. Tighten in-product copy around decision support and limitations. +4. Make cross-page navigation reinforce the product story. +5. Preserve the local-first identity; it is part of the differentiation. 
+ +### Final PM view + +The product is already rich enough that the next challenge is curation, not raw capability count. The codebase proves that the platform can do a lot. Product management now needs to decide what the user should understand first, second, and third so the strongest value is obvious in a five-minute walkthrough. diff --git a/docs/_repoKB-deepdive/deepdive-software-architect.md b/docs/_repoKB-deepdive/deepdive-software-architect.md new file mode 100644 index 00000000..2930b36b --- /dev/null +++ b/docs/_repoKB-deepdive/deepdive-software-architect.md @@ -0,0 +1,292 @@ +# Deep Dive: Software Architect + +## Scope + +This document studies ForecastLabAI as a systems architect would: platform boundaries, component ownership, runtime topology, persistence model, coupling, scaling limits, and evolution paths. It is grounded in `app/main.py`, `app/core/*`, `app/features/*`, `frontend/src/*`, `docker-compose.yml`, `pyproject.toml`, `frontend/package.json`, `alembic/versions/*`, `Makefile`, and the base docs under `docs/_base/`. + +## 1. Research + +### System identity + +ForecastLabAI is a single-host, end-to-end retail demand forecasting product. The repository intentionally owns the full loop: + +1. Data platform +2. Batch ingest +3. Leakage-safe feature engineering +4. Forecast training and prediction +5. Backtesting +6. Registry and aliases +7. Scenario planning +8. RAG knowledge base +9. Agentic workflows with approval gates +10. React dashboard surfaces + +That identity is enforced both socially and structurally: + +- `app/features/<slice>/` defines vertical slices. +- `docker-compose.yml` keeps runtime local and single-host. +- `.claude/rules/product-vision.md` rejects multi-tenant SaaS, streaming infra, and managed-cloud-first expansion. + +### Architecture style + +The backend is a modular monolith with vertical-slice boundaries. 
Each slice usually exposes: + +- `models.py` +- `schemas.py` +- `service.py` +- `routes.py` +- `tests/` + +Cross-cutting concerns live in: + +- `app/core/` for config, database, middleware, exceptions, health, problem-details, logging +- `app/shared/` for reusable data-model and feature-frame logic + +The frontend is a route-driven SPA with: + +- page composition in `frontend/src/pages/*` +- reusable domain components in `frontend/src/components/*` +- server-state access in `frontend/src/hooks/*` +- a thin fetch wrapper in `frontend/src/lib/api.ts` + +### Runtime topology + +The production-like local topology is small by design: + +- Postgres 16 + pgvector +- one FastAPI process +- one Vite/React UI +- optional Ollama container for local embeddings + +This yields strong demo portability and a low operational surface, but it also centralizes all CPU-bound training, backtesting, and agent orchestration onto one host. + +### Entrypoints and wiring + +`app/main.py` is the composition root. It wires: + +- middleware +- exception handling +- router registration for all slices +- startup config override replay + +This makes `app/main.py` a high-blast-radius file. Any architectural shift that changes router composition, middleware order, or startup behavior lands here. + +### Persistence model + +The data plane mixes three persistence styles: + +1. relational warehouse-like retail data in `app/features/data_platform/models.py` +2. JSONB-heavy operational metadata in registry, jobs, sessions, scenarios +3. pgvector-backed chunk storage in RAG tables + +That is a pragmatic design for a portfolio system: + +- relational where grain and joins matter +- JSONB where flexibility matters +- vector columns where retrieval matters + +The tradeoff is schema readability: business-critical semantics live partly in migrations and partly in JSON payload conventions. + +### API surface + +The API is broad and coherent. 
Key groups are: + +- exploratory read APIs: `/dimensions`, `/analytics`, `/ops` +- operational execution APIs: `/forecasting`, `/backtesting`, `/jobs`, `/batch`, `/model-selection` +- model governance APIs: `/registry`, `/config` +- AI APIs: `/rag`, `/agents` +- demo and seeding APIs: `/seeder`, `/demo` +- planning APIs: `/scenarios` + +Error handling is normalized through RFC 7807 problem details, which is a strong contract decision for a system with many slices. + +### Frontend information architecture + +The frontend is organized around user intent rather than backend slices: + +- Dashboard +- Showcase +- Ops +- Explorer +- Visualize +- Knowledge +- Chat +- Guide +- Admin + +This is the right abstraction. The backend is phase-oriented; the UI is workflow-oriented. + +### Testing and governance + +The repository is heavy on validation: + +- `ruff` +- `mypy --strict` +- `pyright --strict` +- unit tests +- integration tests +- migration checks + +Observed footprint from the repo: + +- 328 Python files under `app/` +- 176 backend test files +- 229 frontend TS/TSX files +- 57 frontend test files +- 18 Alembic migrations + +That is a meaningful sign of architecture discipline for a pre-1.0 system. + +## 2. Compose A Role-Based Plan + +### Architectural reading plan + +For an architect onboarding to this codebase, the minimum effective reading order is: + +1. `AGENTS.md` +2. `docs/_base/ARCHITECTURE.md` +3. `docs/_base/API_CONTRACTS.md` +4. `docs/_base/DOMAIN_MODEL.md` +5. `app/main.py` +6. `app/core/config.py` +7. `app/core/database.py` +8. `app/features/data_platform/models.py` +9. `app/features/forecasting/service.py` +10. `app/features/rag/service.py` +11. `app/features/agents/service.py` +12. `frontend/src/App.tsx` + +### Architecture review plan + +Review the system in these lenses: + +1. Boundary integrity + - verify slices depend on `core` and `shared`, not freely on each other + - pay attention to lazy-import seams already used to break cycles +2. 
Runtime concentration + - identify CPU-heavy paths that still run inline on the API host + - compare jobs, batch, model selection, demo pipeline, and agent activity +3. Data durability + - map what is canonical in tables versus in JSONB versus on disk artifacts +4. Contract stability + - inspect how frontend hooks depend on backend shapes and polling behavior +5. AI safety posture + - inspect where retrieval, tool calling, approval gates, and provider switches can fail + +### Architecture decisions already present + +The codebase has already made these strategic decisions: + +- modular monolith over microservices +- async FastAPI over sync API server +- Postgres as both OLTP-ish store and vector store +- file-based model artifacts instead of external artifact services +- local-first provider switching rather than cloud orchestration +- workflow visibility in-product rather than in external ops tooling + +### Near-term architecture planning topics + +An architect would likely focus next on: + +1. formalizing cross-slice dependency rules with automated checks +2. isolating CPU-heavy training/backtesting from request latency +3. making artifact and JSONB conventions easier to inspect and evolve +4. strengthening app-level observability beyond logs and request IDs +5. reducing hidden coupling between demo orchestration and slice APIs + +## 3. Validate + +### Evidence that the current architecture is coherent + +- The slice map in `app/main.py` matches the product lifecycle. +- `app/core/config.py` centralizes runtime control instead of scattering env reads. +- `app/core/database.py` keeps session creation standardized. +- Multiple services use documented lazy imports to avoid import-cycle collapse. +- `frontend/src/App.tsx` is route-structured and uses lazy loading, which fits the breadth of the UI. +- `docker-compose.yml` keeps the full stack reproducible on one machine. 
+- `docs/_base/API_CONTRACTS.md` and `docs/_base/DOMAIN_MODEL.md` already track core system invariants. + +### Architectural strengths + +1. Strong vertical-slice organization +2. Clear local deploy story +3. Typed boundaries everywhere +4. Explicit anti-leakage posture in forecasting/featuresets +5. Practical AI safety guardrails with approval-required mutating tools +6. Good UX-to-backend alignment through workflow-based frontend pages + +### Architectural tensions + +1. Single-host simplicity versus CPU-heavy ML workflows +2. Slice purity versus necessary cross-slice orchestration +3. JSONB flexibility versus discoverability and query clarity +4. Broad product scope versus maintainability for one repo and one host +5. Local-first AI flexibility versus provider-specific runtime drift + +### Main risks + +1. `app/main.py` as central blast radius +2. long-running work inside the application process +3. file artifact lifecycle complexity +4. limited observability for concurrency and performance debugging +5. increasing product breadth without a stronger architecture map of ownership and dependency budgets + +## 4. Generate + +## Generated Architectural Findings + +### High-level assessment + +ForecastLabAI is a well-shaped modular monolith. It has enough structure to feel like a real platform, but it still preserves a single-machine demo story. That balance is the repository's main architectural achievement. + +### What the architecture optimizes for + +It optimizes for: + +- demonstrability +- local reproducibility +- architectural breadth +- typed boundaries +- explainable workflows + +It does not optimize for: + +- horizontal scale +- high-throughput asynchronous execution +- multi-tenant isolation +- cloud-native elasticity + +Those are intentional non-goals, not omissions. + +### Primary architectural seams + +The most important seams in the system are: + +1. `core` vs feature slices +2. relational facts/dimensions vs JSONB operational state +3. 
artifact-on-disk vs metadata-in-registry +4. backend phase APIs vs frontend workflow pages +5. deterministic ML pipeline logic vs probabilistic LLM/agent flows + +### Best-fit mental model + +Treat the repo as four systems sharing one host: + +1. a retail analytics API +2. an ML execution engine +3. an AI retrieval-and-agent layer +4. an operator-facing product shell + +The design works because those systems are colocated but not completely blended. + +### Recommended architectural priorities + +1. Add dependency-graph enforcement for slice boundaries. +2. Make long-running model work more explicitly job-owned and easier to isolate. +3. Introduce richer observability around durations, failures, queue-like backlogs, and artifact usage. +4. Publish a canonical artifact contract covering model bundle versions, registry metadata, and scenario compatibility. +5. Continue treating time-safety, RFC 7807, and approval-gated mutation as non-negotiable architectural invariants. + +### Final architect view + +This repository is already beyond a toy demo. Its value is not just that it has many features, but that those features are connected through consistent contracts. The next architectural challenge is no longer "can it do the whole flow?" but "can the whole flow keep growing without hidden coupling and host saturation?" diff --git a/docs/_repoKB-deepdive/deepdive-software-developer.md b/docs/_repoKB-deepdive/deepdive-software-developer.md new file mode 100644 index 00000000..3fc787e0 --- /dev/null +++ b/docs/_repoKB-deepdive/deepdive-software-developer.md @@ -0,0 +1,281 @@ +# Deep Dive: Software Developer + +## Scope + +This document studies ForecastLabAI as a working developer would: how to navigate it, how the backend and frontend are wired, which files matter first, how to add features safely, and where the sharp edges are. + +## 1. Research + +### First impression of the repo + +ForecastLabAI is not a starter template. 
It is a broad working application with: + +- a slice-based FastAPI backend +- a route-rich React frontend +- an end-to-end demo path +- ML, RAG, and agent workflows +- real migrations and tests + +The repo is large, but its structure is disciplined enough that a developer can move with confidence if they follow local patterns. + +### High-value entrypoints + +For a developer, these are the first files worth reading: + +- `AGENTS.md` +- `pyproject.toml` +- `frontend/package.json` +- `app/main.py` +- `app/core/config.py` +- `app/core/database.py` +- `frontend/src/App.tsx` +- `frontend/src/lib/api.ts` + +These files define the rules, the stack, the app wiring, and the transport contract. + +### Backend development model + +The backend is organized around vertical slices under `app/features/`. Each slice typically owns: + +- route handlers +- schemas +- business/service logic +- persistence models when needed +- tests + +This is the key implementation rule: do not start by asking "which helper can I invent?" Start by asking "which slice owns this behavior?" + +### Backend slice inventory + +The repo currently contains these major backend feature areas: + +- analytics +- agents +- backtesting +- batch +- config +- data_platform +- demo +- dimensions +- explainability +- featuresets +- forecasting +- ingest +- jobs +- model_selection +- ops +- rag +- registry +- scenarios +- seeder + +### Frontend development model + +The frontend is structured by workflow: + +- pages under `frontend/src/pages/*` +- reusable components under `frontend/src/components/*` +- API/state hooks under `frontend/src/hooks/*` +- transport and utility helpers under `frontend/src/lib/*` + +That means the usual page implementation path is: + +1. page composes the workflow +2. hooks fetch and mutate data +3. components render domain-specific UI +4. helpers format or transform data + +### API access pattern + +`frontend/src/lib/api.ts` is the frontend transport seam. 
It: + +- builds URLs from `VITE_API_BASE_URL` +- serializes JSON request bodies +- parses JSON and `application/problem+json` +- throws typed `ApiError` failures + +This is important because frontend fixes should generally use this helper instead of raw fetches. + +### React Query pattern + +Hooks such as `frontend/src/hooks/use-model-selection.ts` show the preferred pattern: + +- one hook per query or mutation +- stable `queryKey`s +- mutation success invalidation or cache seeding +- polling only where workflow state requires it + +This keeps page components focused on state transitions and rendering. + +### WebSocket pattern + +`frontend/src/hooks/use-websocket.ts` wraps: + +- connection status +- JSON message parsing +- reconnect logic +- send/disconnect/reconnect helpers + +That avoids scattering WebSocket lifecycle logic through pages like `chat.tsx` and `showcase.tsx`. + +### Testing footprint + +Observed from repo inspection: + +- 328 Python files under `app/` +- 176 backend test files +- 229 frontend TS/TSX files +- 57 frontend test files + +This is a repo where tests are part of the implementation surface. A developer should expect to touch them. + +## 2. Compose A Role-Based Plan + +### Developer onboarding plan + +Recommended order: + +1. read `AGENTS.md` +2. read `README.md` +3. inspect `app/main.py` +4. inspect `frontend/src/App.tsx` +5. pick one backend slice end to end +6. pick one frontend workflow end to end +7. inspect the matching tests + +### Safe change plan + +When implementing anything non-trivial: + +1. find the owner slice or page workflow +2. read route, schema, service, and tests together +3. inspect adjacent frontend code if the change is user-visible +4. reuse existing helper patterns +5. add or update tests before calling the work done + +### Backend change workflow + +For endpoint or service changes: + +1. inspect `routes.py` +2. inspect `schemas.py` +3. inspect `service.py` +4. inspect relevant `models.py` +5. inspect `tests/` +6. 
patch the narrowest owning surface + +### Frontend change workflow + +For UI or workflow changes: + +1. inspect page component +2. inspect relevant hook +3. inspect domain component +4. inspect utility/helper module if any +5. inspect matching tests +6. patch the narrowest surface + +### When to use `core` or `shared` + +Move code to `app/core/` only when it is truly cross-cutting platform behavior: + +- config +- logging +- database/session +- middleware +- error handling + +Move code to `app/shared/` when multiple slices need the same pure or semi-pure logic. Forecast feature-frame logic is a good example of this pattern. + +### Developer risk map + +Handle these areas carefully: + +- `app/main.py` +- `app/core/database.py` +- `app/core/problem_details.py` +- `app/features/featuresets/tests/test_leakage.py` +- `alembic/versions/*` + +These are high-blast-radius files or rules. + +## 3. Validate + +### Evidence that the repo is developer-friendly + +- Commands are clearly documented. +- Stack configuration is centralized. +- Slice structure is consistent. +- There are many examples of the preferred patterns. +- Frontend transport is standardized. +- WebSocket behavior is abstracted. +- Quality gates are explicit and strict. + +### Backend sharp edges + +1. Import cycles between slices can happen; some services already use lazy imports to avoid them. +2. Long-running work may be triggered from API-managed workflows. +3. Artifact and run compatibility rules span multiple slices. +4. Time-safety requirements make "small" ML changes riskier than they first appear. + +### Frontend sharp edges + +1. Polling workflows can hide backend state assumptions. +2. Route-level UX often depends on specific backend response fields. +3. WebSocket flows need careful streaming and terminal-state handling. +4. Multiple advanced pages can share subtle utility logic. 
+ +### Practical verification habits + +For backend: + +- run relevant slice tests first +- run integration tests if schema or DB behavior changed +- verify error paths still return RFC 7807 responses + +For frontend: + +- run the nearest component or utility tests first +- verify page behavior against the real endpoint contract +- check loading, empty, success, and error states + +For cross-stack: + +- verify request/response field names exactly +- verify polling and WebSocket state transitions +- verify any new config field is reflected end to end + +## 4. Generate + +## Generated Developer Findings + +### Best mental model + +The repo is easiest to work in when you think in workflows, not just files. The backend slices and frontend pages are different projections of the same product flows. + +### Biggest strengths for day-to-day development + +1. consistent backend slice architecture +2. consistent frontend route and hook layering +3. strong validation gates +4. real examples of nearly every pattern you need +5. explicit local-first runtime story + +### Biggest developer risks + +1. changing shared forecasting assumptions without updating downstream consumers +2. breaking import-order or dependency assumptions in backend slices +3. drifting frontend expectations away from backend contracts +4. under-testing changes that touch AI, ML, or orchestration surfaces + +### Recommended developer heuristics + +1. Stay inside the owner slice until forced out. +2. Treat tests as part of the design. +3. Prefer additive schema changes over broad rewrites. +4. Inspect the workflow end to end before patching. +5. Respect the repo's invariants: time-safety, migrations, strict typing, RFC 7807, approval-gated mutation. + +### Final developer view + +ForecastLabAI is broad but navigable. It rewards disciplined developers who follow established seams and punishes casual cross-cutting edits. 
The fastest correct path is to read the owner slice, read its tests, and change the smallest coherent unit. diff --git a/docs/arch-dia.md b/docs/arch-dia.md new file mode 100644 index 00000000..e0407f6a --- /dev/null +++ b/docs/arch-dia.md @@ -0,0 +1,346 @@ +# ForecastLabAI Architecture Diagrams + +This document provides diagram-first views of the repository's logic, workflows, stack, APIs, architecture, and reusable patterns. The diagrams are grounded in the inspected code under `app/`, `frontend/src/`, `docker-compose.yml`, and the base docs. + +## 1. System context + +```mermaid +flowchart LR + User[User or Reviewer] + UI[React SPA<br/>Vite + React Query + Router] + API[FastAPI App<br/>app/main.py] + DB[(PostgreSQL 16 + pgvector)] + Artifacts[(Local Artifacts<br/>models backtests registry)] + Providers[OpenAI / Anthropic / Gemini / Ollama] + + User --> UI + UI --> API + API --> DB + API --> Artifacts + API --> Providers +``` + +## 2. Backend slice map + +```mermaid +flowchart TB + Main[app/main.py] + Core[app/core] + Shared[app/shared] + + Main --> Core + Main --> Dimensions + Main --> Analytics + Main --> Ingest + Main --> Featuresets + Main --> Forecasting + Main --> Backtesting + Main --> Registry + Main --> Scenarios + Main --> RAG + Main --> Agents + Main --> Jobs + Main --> Batch + Main --> ModelSelection + Main --> Ops + Main --> Seeder + Main --> Demo + Main --> Config + Main --> Explainability + + Featuresets --> Shared + Forecasting --> Shared + Scenarios --> Shared + Backtesting --> Shared +``` + +## 3. 
Request handling pattern + +```mermaid +sequenceDiagram + participant Browser + participant Route as FastAPI route + participant Schema as Pydantic schema + participant Service as Slice service + participant DB as AsyncSession + + Browser->>Route: HTTP request + Route->>Schema: Validate request body/query + Route->>Service: Call typed service method + Service->>DB: Read/write data + DB-->>Service: Rows/state + Service-->>Route: Response model + Route-->>Browser: JSON or problem+json +``` + +## 4. Retail data model + +```mermaid +erDiagram + STORE ||--o{ SALES_DAILY : sells + PRODUCT ||--o{ SALES_DAILY : sold_as + CALENDAR ||--o{ SALES_DAILY : dated_by + STORE ||--o{ PRICE_HISTORY : prices + PRODUCT ||--o{ PRICE_HISTORY : price_subject + STORE ||--o{ PROMOTION : promotes + PRODUCT ||--o{ PROMOTION : promotion_subject + STORE ||--o{ INVENTORY_SNAPSHOT_DAILY : stocks + PRODUCT ||--o{ INVENTORY_SNAPSHOT_DAILY : stocked_item + CALENDAR ||--o{ INVENTORY_SNAPSHOT_DAILY : snapshot_date +``` + +## 5. Forecast training flow + +```mermaid +flowchart LR + Sales[Sales and retail history] + Features[Time-safe feature assembly] + Train[ForecastingService.train_model] + Model[Trained model bundle] + Registry[Registry run metadata] + + Sales --> Features + Features --> Train + Train --> Model + Train --> Registry +``` + +## 6. Prediction and planning flow + +```mermaid +flowchart LR + Run[Registered run or artifact] + Predict[Predict endpoint] + Scenario[Scenario simulation] + Planner[Planner UI] + + Run --> Predict + Predict --> Planner + Run --> Scenario + Scenario --> Planner +``` + +## 7. 
Backtesting and champion selection + +```mermaid +flowchart TD + Availability[Pair availability] + Candidates[Candidate model configs] + Backtests[Backtest each candidate] + Rank[Rank by metrics] + Winner[Winner summary] + Train[Train selected or winner] + Promote[Promote alias] + + Availability --> Candidates + Candidates --> Backtests + Backtests --> Rank + Rank --> Winner + Winner --> Train + Train --> Promote +``` + +## 8. Registry and artifact governance + +```mermaid +flowchart LR + Train[Training workflow] + Artifact[Model artifact on disk] + Run[(model_run)] + Alias[(run_alias)] + Compare[Compare and verify APIs] + + Train --> Artifact + Train --> Run + Run --> Alias + Run --> Compare + Artifact --> Compare +``` + +## 9. RAG indexing workflow + +```mermaid +flowchart LR + Source[Markdown / OpenAPI / docs file] + Hash[Content hash] + Chunk[Chunker] + Embed[Embedding provider] + Store[(rag_source + rag_chunk)] + + Source --> Hash + Hash --> Chunk + Chunk --> Embed + Embed --> Store +``` + +## 10. RAG retrieval workflow + +```mermaid +sequenceDiagram + participant UI as Knowledge page or agent + participant API as /rag/retrieve + participant VDB as pgvector chunks + participant Provider as Embedding provider + + UI->>API: query text + API->>Provider: embed query + Provider-->>API: query vector + API->>VDB: similarity search + VDB-->>API: ranked chunks + API-->>UI: citations and excerpts +``` + +## 11. 
Agent chat and approval flow + +```mermaid +sequenceDiagram + participant User + participant ChatUI as Chat page + participant WS as /agents/stream + participant Agent as AgentService + participant Tools as Agent tools + participant Approve as /agents/sessions/{id}/approve + + User->>ChatUI: send message + ChatUI->>WS: session_id + message + WS->>Agent: invoke agent + Agent->>Tools: tool call + alt approval required + Agent-->>ChatUI: approval_required + User->>ChatUI: approve or reject + ChatUI->>Approve: decision + Approve-->>Agent: continue or stop + end + Agent-->>ChatUI: text_delta / complete +``` + +## 12. Demo pipeline orchestration + +```mermaid +flowchart LR + Start[Showcase or make demo] + Seed[Seeder] + Features[Featuresets] + Train[Train models] + Backtest[Backtest] + Register[Register winner] + Alias[Create alias] + Knowledge[RAG probe] + Agent[Agent probe] + Finish[Summary] + + Start --> Seed + Seed --> Features + Features --> Train + Train --> Backtest + Backtest --> Register + Register --> Alias + Alias --> Knowledge + Knowledge --> Agent + Agent --> Finish +``` + +## 13. Frontend route topology + +```mermaid +flowchart TD + App[frontend/src/App.tsx] + Dashboard[Dashboard] + Showcase[Showcase] + Ops[Ops] + Explorer[Explorer pages] + Visualize[Visualize pages] + Knowledge[Knowledge] + Chat[Chat] + Guide[Guide] + Admin[Admin] + + App --> Dashboard + App --> Showcase + App --> Ops + App --> Explorer + App --> Visualize + App --> Knowledge + App --> Chat + App --> Guide + App --> Admin +``` + +## 14. Frontend data-flow pattern + +```mermaid +flowchart LR + Page[Page] + Hook[React Query hook] + Api[api helper] + Backend[FastAPI endpoint] + + Page --> Hook + Hook --> Api + Api --> Backend +``` + +## 15. 
Runtime deployment topology + +```mermaid +flowchart TB + subgraph Compose + Postgres[postgres<br/>pgvector/pg16] + Backend[backend<br/>uvicorn + FastAPI] + Frontend[frontend<br/>Vite] + Ollama[ollama<br/>optional GPU profile] + end + + Frontend --> Backend + Backend --> Postgres + Backend --> Ollama +``` + +## 16. CI/CD flow + +```mermaid +flowchart LR + Dev[Feature branch] + PR[PR to dev] + CI[lint + typecheck + tests + migration check] + MergeDev[Merge to dev] + MainPR[PR dev to main] + ReleasePR[release-please Release PR] + Tag[Tag and release artifacts] + + Dev --> PR + PR --> CI + CI --> MergeDev + MergeDev --> MainPR + MainPR --> CI + CI --> ReleasePR + ReleasePR --> Tag +``` + +## 17. Reusable architectural patterns + +```mermaid +mindmap + root((Reusable patterns)) + Vertical slice + routes + schemas + service + models + tests + Shared backend contracts + settings + db session + problem details + logging + Frontend workflow pattern + page + hook + component + lib helper + AI safety pattern + schema validated tools + approval gate + provider allow-lists + timeouts and caps +``` diff --git a/docs/arch-techstack.md b/docs/arch-techstack.md new file mode 100644 index 00000000..f5c4c81b --- /dev/null +++ b/docs/arch-techstack.md @@ -0,0 +1,247 @@ +# ForecastLabAI Technical Concepts And Tech Stack + +## Overview + +ForecastLabAI is a single-host retail demand forecasting platform implemented as a modular monolith with a React SPA frontend. It combines data-platform, ML, RAG, and agentic capabilities in one repository and one local runtime topology. 
+ +## Layered technical model + +| Layer | Main technology | Responsibility | +|---|---|---| +| UI | React 19, TypeScript 5.9, Vite 7, Tailwind 4, shadcn/ui | Workflow surfaces, charts, controls, streaming UX | +| API | FastAPI, Pydantic v2 | Typed HTTP and WebSocket contracts | +| Services | Python service modules per slice | Business logic and orchestration | +| Persistence | SQLAlchemy 2.0 async, PostgreSQL 16, pgvector | Relational data, JSONB state, vector retrieval | +| ML | pandas, numpy, scikit-learn, joblib, optional LightGBM/XGBoost | Forecast training, prediction, evaluation | +| AI | PydanticAI, OpenAI, Anthropic, optional Gemini and Ollama | RAG embeddings, agent reasoning, tool use | +| Tooling | uv, pnpm, Alembic, Ruff, mypy, pyright, pytest | Development, quality, migration, release | + +## Backend stack + +### Runtime and framework + +- Python 3.12 +- FastAPI +- Uvicorn +- Pydantic v2 +- Pydantic Settings v2 + +Key concepts: + +- async request handling +- typed request and response models +- RFC 7807 error contracts +- startup lifecycle hooks +- WebSocket streaming for agents and demo pipeline + +### Data access + +- SQLAlchemy 2.0 async ORM +- `asyncpg` driver +- session creation via `app/core/database.py` +- migration management via Alembic + +Key concepts: + +- `Mapped[]` ORM typing +- `mapped_column()` +- async session dependency injection +- commit/rollback at request scope + +### Persistence patterns + +ForecastLabAI uses three persistence patterns: + +1. relational dimensions and facts + - `store`, `product`, `calendar`, `sales_daily`, `price_history`, `promotion`, `inventory_snapshot_daily` +2. operational JSONB-rich entities + - runs, jobs, sessions, scenarios, config +3. 
vector-backed retrieval entities + - RAG sources and chunks with embeddings + +### Cross-cutting backend concepts + +- centralized settings in `app/core/config.py` +- structured logging via `structlog` +- request correlation via middleware +- problem-details serialization for failures +- strict type-checking as a design constraint, not just a lint step + +## Frontend stack + +### Core libraries + +- React 19 +- TypeScript 5.9 +- Vite 7 +- React Router +- TanStack Query +- TanStack Table +- Recharts +- Tailwind CSS 4 +- shadcn/ui and Radix primitives +- Lucide icons + +### Frontend concepts + +- route-oriented application shell +- lazy-loaded page modules +- React Query hooks for API state +- reusable domain components +- helper libraries for formatting and transform logic +- dedicated WebSocket hook for streaming flows + +### Major frontend domains + +- dashboard and KPI summaries +- explorer pages for stores, products, jobs, runs, and sales +- visualize pages for forecasting, backtesting, batch, champion selection, demand, and planning +- showcase page for the full demo pipeline +- knowledge page for RAG state and retrieval +- chat page for agent interaction +- admin page for AI provider and model settings + +## ML and forecasting stack + +### Core packages + +- pandas +- numpy +- scikit-learn +- joblib +- optional LightGBM +- optional XGBoost + +### Core concepts + +- time-safe feature engineering +- train/predict split by service boundary +- backtesting with time-series folds +- model-family and feature metadata +- persisted model bundles +- registry-backed governance and aliases + +### ML design choices + +- baselines remain first-class +- advanced models are optional extras +- artifact persistence is local filesystem based +- scenario simulation differentiates heuristic and model-driven methods +- model selection is a distinct workflow, not a side effect of training + +## RAG and agent stack + +### RAG + +- pgvector +- OpenAI embeddings or Ollama embeddings +- 
chunkers by source type +- similarity retrieval with thresholding +- idempotent indexing using content hashes + +### Agents + +- PydanticAI +- Anthropic and OpenAI as main hosted providers +- optional Gemini identifiers supported in config +- tool-calling with schema validation +- approval gate for mutating actions +- session persistence in Postgres +- streaming token/tool events over WebSocket + +### AI control-plane concepts + +- live provider switching through config APIs +- fallback model support +- session TTL and tool-call caps +- timeout and retry controls +- explicit allow-lists for model identifier providers + +## Database and schema stack + +### Database + +- PostgreSQL 16 +- pgvector extension +- local port `5433` to container `5432` + +### Migration management + +- Alembic +- forward-only migration policy after merge +- 18 migrations observed in the repo at inspection time + +### Data model concepts + +- star-schema-like retail data platform +- JSONB for flexible operational entities +- vector embeddings inside Postgres instead of a separate vector store + +## Development and quality stack + +### Backend package and environment tooling + +- `uv` +- `.env` + Pydantic settings + +### Frontend package tooling + +- `pnpm` +- corepack-enabled workflow + +### Quality gates + +- Ruff +- mypy `--strict` +- pyright `--strict` +- pytest + +### CI/CD + +- GitHub Actions +- release-please + +Key pipeline concepts: + +- blocking lint, typecheck, test, and migration jobs +- Release PR flow from `dev` to `main` +- wheel and sdist build on release creation + +## Runtime topology + +### Core local services + +1. Postgres +2. backend API +3. frontend dev server +4. optional Ollama + +### Container strategy + +- Docker Compose for local orchestration +- bind mounts for hot reload +- shared named volume for artifacts +- health checks for all main services + +## Architectural conventions enforced by the stack + +1. Vertical slices own their business logic. +2. 
`core` and `shared` are the sanctioned cross-cutting surfaces. +3. Schema changes require migrations. +4. API boundaries require Pydantic validation. +5. Time-safe feature engineering is mandatory. +6. AI mutation tools require approval. +7. The product must remain single-host runnable. + +## Why this stack fits the repo + +The stack fits because the product needs: + +- a fast local development loop +- typed API and schema boundaries +- strong data tooling for forecasting +- one database that can handle relational and vector workloads +- a modern dashboard frontend +- enough AI flexibility to compare hosted and local providers + +The stack would be a poor fit for a high-scale multi-tenant SaaS, but that is not the repository's goal. From 7a34b3b6c7bf7555b8b8d62e3f9e49f6170bade7 Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Sun, 7 Jun 2026 17:52:46 +0200 Subject: [PATCH 10/44] docs(repo): correct slice count 11 to 19 in base docs (#376) --- docs/_base/ARCHITECTURE.md | 4 ++-- docs/_base/REPO_MAP_INDEX.md | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/_base/ARCHITECTURE.md b/docs/_base/ARCHITECTURE.md index 2c087069..01804385 100644 --- a/docs/_base/ARCHITECTURE.md +++ b/docs/_base/ARCHITECTURE.md @@ -7,7 +7,7 @@ ### What This Repo Owns - The entire stack: FastAPI backend (`app/`), React 19 SPA (`frontend/`), Alembic migrations (`alembic/`), data seeder (`app/shared/seeder/` + `scripts/seed_random.py`), `.claude/` policy + skills + hooks, docs (`docs/`, `PRPs/` incl. `PRPs/INITIAL/`). - 7-table retail data platform (`store`, `product`, `calendar`, `sales_daily`, `price_history`, `promotion`, `inventory_snapshot_daily`) + registry, jobs, RAG sources/chunks, agent sessions. -- 11 backend vertical slices under `app/features/` + cross-cutting `app/core/` + `app/shared/`. +- 19 backend vertical slices under `app/features/` + cross-cutting `app/core/` + `app/shared/`. 
### What This Repo Depends On | Dependency | Interface | Owner | Change Process | @@ -34,7 +34,7 @@ ForecastLabAI repo ├── app/ # FastAPI process (uvicorn :8123) │ ├── core/ # config, db engine, logging, middleware, problem-details, health │ ├── shared/ # cross-slice models + seeder ("The Forge") -│ └── features/<slice>/ # vertical slices (11 of them) +│ └── features/<slice>/ # vertical slices (19 of them) └── frontend/ # Vite dev server :5173 (proxies → :8123) ``` diff --git a/docs/_base/REPO_MAP_INDEX.md b/docs/_base/REPO_MAP_INDEX.md index 5300f157..f43f3440 100644 --- a/docs/_base/REPO_MAP_INDEX.md +++ b/docs/_base/REPO_MAP_INDEX.md @@ -6,7 +6,7 @@ ## System at a Glance -ForecastLabAI is a portfolio-grade, single-host retail-demand-forecasting system. One developer maintains it; one `docker-compose up` brings it up. The backend is FastAPI + SQLAlchemy 2.0 async against PostgreSQL 16 + pgvector; the frontend is React 19 + Vite + Tailwind 4 + shadcn/ui. Eleven vertical slices under `app/features/` cover the full lifecycle (data platform → ingest → features → forecasting → backtesting → registry → RAG → agents → dashboard surfaces). Pre-1.0; release-please drives SemVer; merges flow `dev` → `main`. +ForecastLabAI is a portfolio-grade, single-host retail-demand-forecasting system. One developer maintains it; one `docker-compose up` brings it up. The backend is FastAPI + SQLAlchemy 2.0 async against PostgreSQL 16 + pgvector; the frontend is React 19 + Vite + Tailwind 4 + shadcn/ui. Nineteen vertical slices under `app/features/` cover the full lifecycle (data platform → ingest → features → forecasting → backtesting → registry → RAG → agents → dashboard surfaces). Pre-1.0; release-please drives SemVer; merges flow `dev` → `main`. 
## Document Index From d15622a7fed97f8b1ab58100633d5d05fb030f49 Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Thu, 11 Jun 2026 20:53:16 +0200 Subject: [PATCH 11/44] fix(api): reject doubled provider prefixes in agent model ids (#334) --- app/core/config.py | 21 +++++++- .../agents/tests/test_config_validation.py | 13 +++++ app/features/config/service.py | 18 ++++++- app/features/config/tests/test_routes.py | 12 +++++ app/features/config/tests/test_schemas.py | 50 +++++++++++++++++++ app/features/config/tests/test_service.py | 43 ++++++++++++++++ 6 files changed, 154 insertions(+), 3 deletions(-) diff --git a/app/core/config.py b/app/core/config.py index e2d76a85..033c77d9 100644 --- a/app/core/config.py +++ b/app/core/config.py @@ -32,8 +32,12 @@ def validate_model_identifier(v: str) -> str: The validated model identifier, unchanged. Raises: - ValueError: If the format is invalid, the model name is blank, or the - provider is not in :data:`VALID_MODEL_PROVIDERS`. + ValueError: If the format is invalid, the model name is blank, the + provider is not in :data:`VALID_MODEL_PROVIDERS`, or the model + name nests another provider prefix (e.g. + ``google-gla:google-gla:gemini-3-flash-preview``). Multi-colon + Ollama tag identifiers stay valid (``ollama:llama3.1:8b`` — + ``llama3.1`` is not a provider). """ if ":" not in v: raise ValueError( @@ -56,6 +60,19 @@ def validate_model_identifier(v: str) -> str: raise ValueError( f"Unknown provider '{provider}'. Valid providers: {list(VALID_MODEL_PROVIDERS)}" ) + + # Reject a nested/doubled provider prefix inside the model name + # (issue #334: 'google-gla:google-gla:gemini-…' → Gemini 404 at run time). + # Ollama tags ('ollama:llama3.1:8b') stay valid: 'llama3.1' is not a provider. 
+ if ":" in model_name: + nested = model_name.split(":", 1)[0] + if nested in VALID_MODEL_PROVIDERS: + suggested = f"{provider}:{model_name.split(':', 1)[1]}" + raise ValueError( + f"Nested provider prefix '{nested}:' inside the model name of '{v}'. " + f"Did you mean '{suggested}'? " + "Expected format: 'provider:model-name' with the provider given once." + ) return v diff --git a/app/features/agents/tests/test_config_validation.py b/app/features/agents/tests/test_config_validation.py index fd559b12..229514b1 100644 --- a/app/features/agents/tests/test_config_validation.py +++ b/app/features/agents/tests/test_config_validation.py @@ -38,6 +38,19 @@ def test_invalid_model_identifier_empty_provider(self): with pytest.raises(ValidationError, match="Unknown provider"): Settings(agent_default_model=":model-name") + def test_doubled_prefix_rejected_at_settings_boot(self): + """A doubled provider prefix fails Settings construction (issue #334).""" + with pytest.raises(ValidationError, match="Nested provider"): + Settings( + agent_default_model="anthropic:anthropic:claude-sonnet-4-5", + _env_file=None, + ) + + def test_ollama_tag_accepted_at_settings_boot(self): + """A multi-colon Ollama name:tag identifier constructs Settings fine.""" + settings = Settings(agent_default_model="ollama:llama3.1:8b", _env_file=None) + assert settings.agent_default_model == "ollama:llama3.1:8b" + class TestAPIKeyValidation: """Test API key validation for models.""" diff --git a/app/features/config/service.py b/app/features/config/service.py index b4967ef8..7242eec3 100644 --- a/app/features/config/service.py +++ b/app/features/config/service.py @@ -24,7 +24,7 @@ from sqlalchemy.dialects.postgresql import insert as pg_insert from sqlalchemy.ext.asyncio import AsyncSession -from app.core.config import get_settings +from app.core.config import get_settings, validate_model_identifier from app.core.logging import get_logger from app.features.agents.agents.base import reset_agent_caches from 
app.features.config.models import AppConfig @@ -46,6 +46,10 @@ # Scalar types an ``app_config`` override value may hold. OverrideValue = str | int | float | bool +# Override keys holding a "provider:model-name" identifier — re-validated at +# startup so a malformed persisted row never goes live (issue #334). +_MODEL_ID_KEYS = frozenset({"agent_default_model", "agent_fallback_model"}) + # ============================================================================= # Helpers @@ -190,6 +194,18 @@ async def apply_overrides_on_startup(db: AsyncSession) -> None: for key, value in overrides.items(): if key not in ALLOWED_OVERRIDE_KEYS: continue + if key in _MODEL_ID_KEYS and isinstance(value, str): + try: + validate_model_identifier(value) + except ValueError as exc: + # Skip-and-warn: leave the env/default value in effect — the + # function's no-crash-on-startup contract stands. + logger.warning( + "config.override_invalid_model_id", + key=key, + error=str(exc), + ) + continue setattr(settings, key, value) if key in SECRET_KEYS and isinstance(value, str): os.environ[SECRET_ENV_NAMES[key]] = value diff --git a/app/features/config/tests/test_routes.py b/app/features/config/tests/test_routes.py index 4675d145..4d3fb96a 100644 --- a/app/features/config/tests/test_routes.py +++ b/app/features/config/tests/test_routes.py @@ -117,6 +117,18 @@ def test_patch_rejects_invalid_model(self, client): response = client.patch("/config/ai", json={"agent_default_model": "nope"}) assert response.status_code == 422 + def test_patch_rejects_doubled_provider_prefix(self, client): + """A doubled provider prefix is a 422 problem+json naming the field (#334).""" + response = client.patch( + "/config/ai", + json={"agent_default_model": "google-gla:google-gla:gemini-3-flash-preview"}, + ) + assert response.status_code == 422 + body = response.json() + assert body["code"] == "VALIDATION_ERROR" + assert body["errors"][0]["field"] == "agent_default_model" + assert "Nested provider prefix" in 
body["errors"][0]["message"] + def test_patch_surfaces_dimension_conflict(self, client): """A 409 from the dimension guard propagates to the caller.""" with patch( diff --git a/app/features/config/tests/test_schemas.py b/app/features/config/tests/test_schemas.py index c1fd8712..6b3847af 100644 --- a/app/features/config/tests/test_schemas.py +++ b/app/features/config/tests/test_schemas.py @@ -42,6 +42,44 @@ def test_validate_model_identifier_rejects_unknown_provider(self): with pytest.raises(ValueError, match="Unknown provider"): validate_model_identifier("pinecone:model") + @pytest.mark.parametrize( + "identifier", + [ + "google-gla:google-gla:gemini-3-flash-preview", + "anthropic:anthropic:claude-sonnet-4-5", + "openai:openai:gpt-4o", + "google-vertex:google-vertex:gemini-2.0", + "ollama:ollama:llama3.1", + ], + ) + def test_rejects_doubled_provider_prefix(self, identifier): + """A doubled provider prefix inside the model name is rejected (issue #334).""" + with pytest.raises(ValueError, match="Nested provider prefix"): + validate_model_identifier(identifier) + + def test_rejects_mixed_provider_prefix(self): + """A different known provider nested in the model name is also rejected.""" + with pytest.raises(ValueError, match="Nested provider prefix"): + validate_model_identifier("openai:anthropic:claude-x") + + def test_error_message_suggests_correction(self): + """The rejection message tells the operator the likely intended id.""" + with pytest.raises(ValueError) as exc: + validate_model_identifier("google-gla:google-gla:gemini-3-flash-preview") + assert "Did you mean 'google-gla:gemini-3-flash-preview'?" 
in str(exc.value) + + @pytest.mark.parametrize( + "identifier", + ["ollama:llama3.1:8b", "ollama:gemma4:e2b", "ollama:qwen3:8b"], + ) + def test_accepts_ollama_tag_identifiers(self, identifier): + """Multi-colon Ollama name:tag identifiers stay valid.""" + assert validate_model_identifier(identifier) == identifier + + def test_accepts_model_named_like_provider(self): + """A model literally named like a provider (no colon after it) stays valid.""" + assert validate_model_identifier("ollama:openai") == "ollama:openai" + class TestAIModelConfigUpdate: """Tests for the PATCH /config/ai request body.""" @@ -93,6 +131,18 @@ def test_model_validate_json_path(self): assert upd.agent_temperature == 0.5 assert upd.agent_default_model == "ollama:llama3.1" + def test_rejects_doubled_prefix_via_model_validate(self): + """A doubled prefix fails through the validate_python (JSON) path too.""" + with pytest.raises(ValidationError, match="Nested provider prefix"): + AIModelConfigUpdate.model_validate({"agent_default_model": "google-gla:google-gla:x"}) + + def test_fallback_model_also_guarded(self): + """The fallback-model field is guarded by the same nested-prefix rule.""" + with pytest.raises(ValidationError, match="Nested provider prefix"): + AIModelConfigUpdate.model_validate( + {"agent_fallback_model": "anthropic:anthropic:claude-sonnet-4-5"} + ) + class TestResponseSchemas: """Tests for the response-only schemas.""" diff --git a/app/features/config/tests/test_service.py b/app/features/config/tests/test_service.py index 1bcdfd99..1c95c77d 100644 --- a/app/features/config/tests/test_service.py +++ b/app/features/config/tests/test_service.py @@ -193,6 +193,49 @@ async def test_update_config_secret_mirrored_to_environment(self): assert os.environ["OPENAI_API_KEY"] == "sk-new-openai-key-123" +# ============================================================================= +# Unit tests — apply_overrides_on_startup +# 
============================================================================= + + +class TestApplyOverridesOnStartup: + """Tests for the startup override loader's model-id guard (issue #334).""" + + @pytest.mark.asyncio + async def test_startup_skips_invalid_model_id_override(self): + """A malformed persisted model id is skipped with a warning; valid keys apply.""" + settings = get_settings() + # Snapshot mutated fields and restore in finally so the mutation never + # leaks into another test (settings-singleton precedent, 24ed5cd). + original_model = settings.agent_default_model + original_temperature = settings.agent_temperature + try: + with ( + patch( + "app.features.config.service._load_overrides", + new=AsyncMock( + return_value={ + "agent_default_model": "google-gla:google-gla:x", + "agent_temperature": 0.5, + } + ), + ), + patch.object(service.logger, "warning") as mock_warning, + ): + await service.apply_overrides_on_startup(_mock_db()) + + # Malformed model id skipped — env/default value stays in effect. + assert settings.agent_default_model == original_model + # Valid keys in the same batch are still applied. 
+ assert settings.agent_temperature == 0.5 + mock_warning.assert_called_once() + assert mock_warning.call_args.args[0] == "config.override_invalid_model_id" + assert mock_warning.call_args.kwargs["key"] == "agent_default_model" + finally: + settings.agent_default_model = original_model + settings.agent_temperature = original_temperature + + # ============================================================================= # Unit tests — provider health + ollama models # ============================================================================= From 07fdee4a26f19f10a12b204a5d9d0b07f84c4d5b Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Thu, 11 Jun 2026 20:53:23 +0200 Subject: [PATCH 12/44] docs(repo): track reliability E1 prp for doubled provider prefixes (#334) --- ...eliability-E1-doubled-provider-prefixes.md | 494 ++++++++++++++++++ 1 file changed, 494 insertions(+) create mode 100644 PRPs/PRP-reliability-E1-doubled-provider-prefixes.md diff --git a/PRPs/PRP-reliability-E1-doubled-provider-prefixes.md b/PRPs/PRP-reliability-E1-doubled-provider-prefixes.md new file mode 100644 index 00000000..0ca54bc4 --- /dev/null +++ b/PRPs/PRP-reliability-E1-doubled-provider-prefixes.md @@ -0,0 +1,494 @@ +name: "PRP — Reliability E1: reject doubled provider prefixes in agent model ids" +description: | + Foundation epic of umbrella #380 (platform reliability hardening). + Issue: #334 · Branch: `fix/config-reject-doubled-provider-prefix` off `dev` · Commit scope: `api` + (allow-list has no `config` scope; `app/core/*` + cross-feature config wiring = `api`, + precedent: `test(api)` 24ed5cd, `feat(api,ui)` db530d5 — both touched `app/features/config/`). + +--- + +## Goal + +Harden `validate_model_identifier` (`app/core/config.py:20`) so an agent model identifier whose +model-name part *starts with another known provider prefix* — e.g. 
+`google-gla:google-gla:gemini-3-flash-preview` — is **rejected at config time** (Pydantic +`ValueError` → RFC 7807 422 at the HTTP boundary, `ValidationError` at `Settings` boot), while +legitimate multi-colon Ollama tag identifiers (`ollama:llama3.1:8b`, `ollama:gemma4:e2b`) keep +validating. Additionally, `apply_overrides_on_startup` must *skip-and-warn* (never crash, never +apply) a malformed model-id override already persisted in the `app_config` table. + +**Deliverable:** one modified validator, one guarded startup loader, and regression tests at all +three layers the issue demands (schema-level, route-level 422, Settings-level) plus a +service-level startup-skip test. + +**Success definition:** every test in the Final Validation Checklist passes; the exact failure +from the issue (`PATCH /config/ai` returning 200 for a doubled-prefix id, then a Gemini 404 at +agent-run time) is impossible to reproduce. + +## Why + +- **Root cause removal.** A doubled prefix persisted via `PATCH /config/ai` produced + `404 NOT_FOUND — models/google-gla:gemini-3-flash-preview is not found` from Google *at + agent-run time* — the failure surfaced far from its cause and required hand-correcting the id + (issue #334). +- **Foundation for E2 (#335).** The fallback-failure-surfacing epic builds an error-classification + matrix; this fix removes one whole failure class first so E2 tests against a stable surface + (umbrella #380, Foundation ordering). +- **Upstream will never catch it.** Verified against installed pydantic-ai 1.96.0: + `parse_model_id` does `model.split(':', maxsplit=1)` and passes everything after the first + colon **verbatim** as the model name to the provider class. The validation burden is entirely + on this repo's config layer. 
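The "upstream will never catch it" point above can be illustrated with plain `str.split`. This is only a sketch of the split behaviour the PRP describes: `split_model_id` is a hypothetical stand-in written for this example, not pydantic-ai's own `parse_model_id`.

```python
# Hypothetical helper mirroring the described split: everything before the
# FIRST colon is the provider, the remainder is passed on as the model name.
def split_model_id(model: str) -> tuple[str, str]:
    provider_name, model_name = model.split(":", maxsplit=1)
    return provider_name, model_name


# Doubled prefix: the nested "google-gla:" survives inside the model name,
# which is the string the provider would later be asked to resolve.
doubled = split_model_id("google-gla:google-gla:gemini-3-flash-preview")

# Ollama name:tag id: the second colon belongs to the tag and is legitimate.
ollama_tag = split_model_id("ollama:llama3.1:8b")

print(doubled)
print(ollama_tag)
```

Both inputs contain two colons, so a split on the first colon alone cannot tell the malformed id from the valid Ollama tag. That is why the check has to live in this repo's validator.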
+
+## What
+
+### Behavior change
+
+| Input | Today | After |
+|-------|-------|-------|
+| `google-gla:google-gla:gemini-3-flash-preview` | PASS (bug) | **REJECT** — ValueError naming the nested prefix + suggesting `google-gla:gemini-3-flash-preview` |
+| `anthropic:anthropic:claude-sonnet-4-5` | PASS (bug) | **REJECT** |
+| `ollama:ollama:llama3.1` | PASS (bug) | **REJECT** |
+| `openai:anthropic:claude-x` (mixed prefix) | PASS (bug) | **REJECT** (same malformation class; "Apply to all supported providers" per issue) |
+| `ollama:llama3.1:8b` (Ollama `name:tag`) | PASS | PASS (unchanged) |
+| `ollama:gemma4:e2b` | PASS | PASS (unchanged) |
+| `ollama:openai` (model literally named like a provider, **no** colon after it) | PASS | PASS — narrow rule fires only on `provider:`-shaped *prefixes* inside the model name |
+| `anthropic:claude-sonnet-4-5`, `openai:gpt-4o` | PASS | PASS (unchanged) |
+
+**Rejection rule (exact):** after `provider, model_name = v.split(":", 1)` and the existing
+checks, reject iff `":" in model_name and model_name.split(":", 1)[0] in VALID_MODEL_PROVIDERS`.
+
+**Design decision — REJECT, not normalize.** The issue allows either. Rejection is chosen
+because the real-world bug vector is the Settings UI: `ai-models-panel.tsx` *concatenates*
+`${agentProvider}:${agentModel}` on save (`frontend/src/components/admin/ai-models-panel.tsx:99-111`),
+so a user pasting a full `provider:model` id into the model field produces the doubled prefix.
+Silent normalization would mask that input-bug class forever; a 422 with a "did you mean
+`<corrected>`" message surfaces it immediately. (Frontend input sanitization = out of scope,
+see below.)
+
+### Startup guard
+
+`apply_overrides_on_startup` (`app/features/config/service.py:166-202`) currently `setattr`s
+persisted overrides with **no re-validation**. A doubled-prefix row persisted before this fix
+would still go live in memory on every boot.
Add a guard for exactly the two model-id keys: +validate, and on `ValueError` skip the key + `logger.warning("config.override_invalid_model_id", …)` +— the env/default value stays in effect. Never raise (the function's "never let config crash +startup" contract at `service.py:176-182` stands). + +### Success Criteria + +- [ ] `validate_model_identifier` rejects `<p1>:<p2>:rest` for every `p2 ∈ VALID_MODEL_PROVIDERS`, with an error message that names the offending id, identifies the nested prefix, and suggests the corrected id +- [ ] `ollama:llama3.1:8b`, `ollama:gemma4:e2b`, `ollama:openai` still validate +- [ ] `PATCH /config/ai` with `{"agent_default_model": "google-gla:google-gla:x"}` → **422** `application/problem+json` with `code="VALIDATION_ERROR"` and `errors[0].field == "agent_default_model"`; happy-path PATCH still 200 +- [ ] `Settings(agent_default_model="anthropic:anthropic:x", _env_file=None)` raises `ValidationError` +- [ ] `apply_overrides_on_startup` with a malformed persisted `agent_default_model` boots clean, leaves the `Settings` value untouched, and logs `config.override_invalid_model_id` +- [ ] All five validation gates green; no existing test modified to weaken (only extended) + +## All Needed Context + +### Documentation & References + +```yaml +# ── The defect and its single fix point ────────────────────────────────────── +- file: app/core/config.py + lines: 11-17, 20-59, 214-218 + why: | + VALID_MODEL_PROVIDERS tuple (the allow-list the new rule reuses), the + validate_model_identifier function to modify (split(":", 1) at line 44; the new check + slots after the provider check at line 55-58), and the Settings field_validator that + gives the fix Settings-boot coverage for free. Mirror the existing ValueError message + style: full offending id + expected format + concrete examples. 
+ +# ── Second call site (free coverage, do not duplicate logic) ───────────────── +- file: app/features/config/schemas.py + lines: 93-125 + why: | + AIModelConfigUpdate._check_model_identifier delegates to the same function — DO NOT + add slice-local logic; the fix in app/core/config.py covers PATCH /config/ai + automatically. Note ConfigDict(strict=True) at line 99: both model fields are + `str | None` (JSON-native), so NO Field(strict=False) override is needed and + app/core/tests/test_strict_mode_policy.py stays green. + +# ── Startup loader to guard ────────────────────────────────────────────────── +- file: app/features/config/service.py + lines: 166-202 + why: | + apply_overrides_on_startup: the setattr loop to guard (only for keys + agent_default_model / agent_fallback_model). Mirror the existing structlog pattern: + logger.warning("config.overrides_load_failed", error=str(exc), error_type=...) at + 176-182 and logger.info("config.overrides_applied", keys=...) at 201. Also read + 204-266 (update_config): the service does NOT re-validate — the schema boundary owns + validation; keep it that way. + +# ── Error shape a route test must assert ───────────────────────────────────── +- file: app/core/exceptions.py + lines: 293-336 + why: | + RequestValidationError → RFC 7807 handler. Pydantic ValueError inside a field + validator surfaces as 422 problem+json: + {type:"/errors/validation", title:"Validation Error", status:422, + code:"VALIDATION_ERROR", errors:[{field:"agent_default_model", + message:"Value error, …", type:"value_error"}], request_id:"…"} + +# ── Test patterns to mirror (extend, never weaken) ─────────────────────────── +- file: app/features/config/tests/test_schemas.py + lines: 16-95 + why: | + TestValidateModelIdentifier (direct validator tests via pytest.raises(ValueError, + match=...)) — add the new cases here. 
TestAIModelConfigUpdate + the + test_model_validate_json_path pattern (strict-mode policy: at least one case through + Model.model_validate({...}) — FastAPI's validate_python path). +- file: app/features/config/tests/test_routes.py + lines: 90-129 + why: | + TestUpdateAIConfig.test_patch_rejects_invalid_model — the PATCH→422 pattern + the + conftest client fixture. Add doubled-prefix 422 case asserting the problem-details + fields, and keep one 200 happy path. +- file: app/features/agents/tests/test_config_validation.py + lines: 10-48 + why: | + TestModelIdentifierValidation — Settings-level coverage pattern: + Settings(agent_default_model=..., _env_file=None) + pytest.raises(ValidationError, + match=...). The _env_file=None kwarg is MANDATORY (see gotcha below). +- file: app/features/config/tests/test_service.py + lines: 139-194 + why: | + Async service-test pattern (_mock_db style) for the new startup-skip test. ALSO: + settings-singleton teardown precedent (commit 24ed5cd "test(api): scope teardown + deletes and restore mutated settings singleton") — any test that mutates the cached + Settings must restore it. + +# ── Consumers that must keep working unchanged (read-only context) ─────────── +- file: app/features/agents/agents/base.py + lines: 132-165, 329 + why: | + build_agent_model splits provider/model with split(":", 1) — for ollama it passes + "qwen3:8b"-style names to OpenAIChatModel. The fix changes NOTHING here; these are + the call sites the validator protects. Fixtures "ollama:qwen3:8b" at + app/features/agents/tests/test_base.py:458, test_service.py:539/647, and + app/features/demo/tests/test_pipeline.py:1815 MUST still pass. +- file: frontend/src/components/admin/ai-models-panel.tsx + lines: 43-58, 99-111 + why: | + Read-only context — the bug vector. deriveForm splits provider off the stored id; + saveAgent reconstructs `${provider}:${model}`. No frontend change in this PRP; the + 422 message is what the operator sees. 
+ +# ── External references (verified 2026-06-11) ──────────────────────────────── +- url: https://pydantic.dev/docs/ai/models/overview/ + section: "Models and Providers" + why: documents the `<provider>:<model>` identifier contract PydanticAI consumes +- url: https://docs.pydantic.dev/latest/concepts/validators/#field-validators + why: field_validator semantics — the existing validators run mode="after" on typed str values +- url: https://github.com/ollama/ollama/blob/main/docs/api.md + section: "Model names" + why: | + "Model names follow a model:tag format … tag optional, defaults to latest" — the + documented reason `ollama:<name>:<tag>` is a LEGITIMATE 2-colon identifier and a + blanket "no second colon" rule is WRONG. +``` + +### Current Codebase tree (relevant subset) + +``` +app/core/config.py # validate_model_identifier + Settings ← MODIFY +app/core/exceptions.py # 422 problem-details handler (read) +app/core/tests/test_config.py # Settings defaults tests (read) +app/features/config/ + schemas.py # AIModelConfigUpdate (delegates) (no change) + service.py # apply_overrides_on_startup ← MODIFY + routes.py # PATCH /config/ai (no change) + tests/ + test_schemas.py # TestValidateModelIdentifier ← EXTEND + test_routes.py # TestUpdateAIConfig ← EXTEND + test_service.py # service/startup tests ← EXTEND +app/features/agents/tests/ + test_config_validation.py # Settings-level validator tests ← EXTEND +``` + +### Desired Codebase tree + +No new files. Four test files extended, two source files modified. No migration (the +`app_config` table schema is untouched; malformed *rows* are handled by skip-and-warn, not +backfill). + +### Known Gotchas & Library Quirks + +```python +# ── VERIFIED LIBRARY CLAIM #1: pydantic-ai passes the post-colon remainder verbatim ── +# pydantic-ai 1.96.0: parse_model_id does model.split(':', maxsplit=1) — provider is +# everything before the FIRST colon; the rest is the model name, sent to the provider +# unchecked. 
There is NO upstream guard for doubled prefixes (checked pydantic-ai issues; +# none open). Re-verify on pydantic-ai upgrade with: +# uv run python -c " +# import inspect, pydantic_ai, pydantic_ai.models as m +# print(pydantic_ai.__version__) +# print([l.strip() for l in inspect.getsource(m.parse_model_id).splitlines() if 'split' in l])" +# # → ["provider_name, model_name = model.split(':', maxsplit=1)"] + +# ── VERIFIED BUG REPRO (pre-fix baseline, run 2026-06-11) ──────────────────────────── +# uv run python -c " +# from app.core.config import validate_model_identifier as v +# v('google-gla:google-gla:gemini-3-flash-preview') # currently returns — the bug +# v('ollama:llama3.1:8b') # returns — MUST STAY VALID" +# After the fix the first call must raise ValueError; the second must not. + +# ── GOTCHA: Ollama tags make "reject any 2nd colon" WRONG ──────────────────────────── +# ollama model ids are name:tag (docs/api.md "Model names"), so ollama:llama3.1:8b and +# ollama:gemma4:e2b are real, working ids used in tests/fixtures TODAY +# (agents/tests/test_base.py:458, agents/tests/test_service.py:539,647, +# demo/tests/test_pipeline.py:1815). The rule must check whether the FIRST SEGMENT of +# model_name is a KNOWN PROVIDER followed by ':' — nothing broader. + +# ── GOTCHA: .env bleed into Settings tests ─────────────────────────────────────────── +# Settings() reads .env via SettingsConfigDict(env_file=".env"). Every Settings-level +# test MUST pass _env_file=None (RUNBOOKS "Settings tests fail because they pick up the +# local .env"; existing pattern in agents/tests/test_config_validation.py). + +# ── GOTCHA: get_settings() is lru_cached; tests that mutate it must restore it ─────── +# Precedent commit 24ed5cd. The startup-skip test mutates/reads the singleton — use the +# existing teardown/fixture pattern in app/features/config/tests/ (conftest.py) so later +# tests see pristine settings. 
+ +# ── GOTCHA: strict-mode policy on request bodies ───────────────────────────────────── +# AIModelConfigUpdate keeps ConfigDict(strict=True). Both model-id fields are str|None → +# JSON-native, no Field(strict=False) needed. Include ≥1 new case through +# AIModelConfigUpdate.model_validate({...}) (the validate_python path) per +# .claude/rules/security-patterns.md § strict mode. + +# ── GOTCHA: error-message style is load-bearing for UX ─────────────────────────────── +# Existing messages embed the full offending id + expected format + examples +# (config.py:38-58). The new message must also tell the operator the LIKELY FIX: +# "Nested provider prefix 'google-gla:' inside the model name of +# 'google-gla:google-gla:gemini-3-flash-preview'. Did you mean +# 'google-gla:gemini-3-flash-preview'?" +# The suggestion = v with the first duplicated segment dropped (provider + rest after the +# nested prefix). Keep it deterministic — tests match on substrings of this message. + +# ── GOTCHA: repo has mixed CRLF/LF line endings ────────────────────────────────────── +# Check `git diff --stat` after editing: if a file shows ~all lines changed, your editor +# rewrote line endings — re-edit preserving the file's existing endings. + +# ── GOTCHA: never raise from apply_overrides_on_startup ───────────────────────────── +# Its contract is "config must never crash startup" (service.py docstring + main.py +# lifespan try/except). The guard SKIPS the bad key and warns — it must not raise. +``` + +## Implementation Blueprint + +### Data models and structure + +No new models. One pure-function change + one loader guard: + +```python +# app/core/config.py — inside validate_model_identifier, AFTER the existing +# unknown-provider check (line 55-58), add: + + # Reject a nested/doubled provider prefix inside the model name + # (issue #334: 'google-gla:google-gla:gemini-…' → Gemini 404 at run time). + # Ollama tags ('ollama:llama3.1:8b') stay valid: 'llama3.1' is not a provider. 
+ if ":" in model_name: + nested = model_name.split(":", 1)[0] + if nested in VALID_MODEL_PROVIDERS: + suggested = f"{provider}:{model_name.split(':', 1)[1]}" + raise ValueError( + f"Nested provider prefix '{nested}:' inside the model name of '{v}'. " + f"Did you mean '{suggested}'? " + "Expected format: 'provider:model-name' with the provider given once." + ) +``` + +```python +# app/features/config/service.py — apply_overrides_on_startup, inside the +# `for key, value in overrides.items():` loop, before setattr: + +_MODEL_ID_KEYS = frozenset({"agent_default_model", "agent_fallback_model"}) # module level + + if key in _MODEL_ID_KEYS and isinstance(value, str): + try: + validate_model_identifier(value) + except ValueError as exc: + logger.warning( + "config.override_invalid_model_id", + key=key, + error=str(exc), + ) + continue # leave env/default in effect; do NOT crash startup +``` + +(`validate_model_identifier` is already imported in the slice via `schemas.py`; `service.py` +needs its own import from `app.core.config`.) 
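The fragments above can be tried in isolation with a self-contained sketch like the following. It is not the repo's `validate_model_identifier`: `check_model_id`, `outcome`, and `ASSUMED_PROVIDERS` are hypothetical names, the provider tuple is inferred from the ids used in this PRP's test lists, and the messages of the two pre-existing checks are placeholders. Only the nested-prefix branch follows the blueprint.

```python
# Assumed allow-list, inferred from the test ids in this PRP; the real
# VALID_MODEL_PROVIDERS tuple in app/core/config.py may contain more entries.
ASSUMED_PROVIDERS = ("openai", "anthropic", "google-gla", "google-vertex", "ollama")


def check_model_id(v: str) -> str:
    """Hypothetical standalone version of the nested-prefix rule."""
    if ":" not in v:
        # Placeholder for the existing format check (real message not shown here).
        raise ValueError(f"Expected format 'provider:model-name', got '{v}'.")
    provider, model_name = v.split(":", 1)
    if provider not in ASSUMED_PROVIDERS:
        # Provider validity is checked before shape, as Task 1 requires.
        raise ValueError(f"Unknown provider '{provider}' in '{v}'.")
    if ":" in model_name:
        nested, rest = model_name.split(":", 1)
        # Only a KNOWN provider as the first segment is rejected, so Ollama
        # name:tag ids such as 'llama3.1:8b' pass through untouched.
        if nested in ASSUMED_PROVIDERS:
            raise ValueError(
                f"Nested provider prefix '{nested}:' inside the model name of '{v}'. "
                f"Did you mean '{provider}:{rest}'?"
            )
    return v


def outcome(v: str) -> str:
    """Return 'PASS' or 'REJECT: <message>' for one row of the behavior table."""
    try:
        check_model_id(v)
        return "PASS"
    except ValueError as exc:
        return f"REJECT: {exc}"


for candidate in (
    "google-gla:google-gla:gemini-3-flash-preview",
    "openai:anthropic:claude-x",
    "ollama:llama3.1:8b",
    "ollama:openai",
):
    print(candidate, "->", outcome(candidate))
```

Running the loop is a quick way to compare the rule against the behavior table before touching `app/core/config.py`. The real validator and its existing error messages remain the source of truth.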
+ +### Tasks (in order) + +```yaml +Task 1: +MODIFY app/core/config.py: + - FIND: validate_model_identifier, the `if provider not in VALID_MODEL_PROVIDERS:` block + - INJECT the nested-prefix check AFTER it (so 'pinecone:pinecone:x' still reports + "Unknown provider" first — provider validity outranks shape) + - UPDATE the docstring: add the new ValueError condition + an ollama-tag example + ("ollama:llama3.1:8b stays valid") so the docstring documents the carve-out + +Task 2: +MODIFY app/features/config/service.py: + - ADD import: from app.core.config import validate_model_identifier + - ADD module-level _MODEL_ID_KEYS frozenset next to ALLOWED_OVERRIDE_KEYS usage + - INJECT the skip-and-warn guard in apply_overrides_on_startup per pseudocode + - PRESERVE: the function must never raise; logger event name + "config.override_invalid_model_id" (structlog kwargs style, key NAMES only — never + log secret values, per security-patterns.md) + +Task 3: +EXTEND app/features/config/tests/test_schemas.py (class TestValidateModelIdentifier): + - test_rejects_doubled_provider_prefix: parametrize over + "google-gla:google-gla:gemini-3-flash-preview", + "anthropic:anthropic:claude-sonnet-4-5", + "openai:openai:gpt-4o", + "google-vertex:google-vertex:gemini-2.0", + "ollama:ollama:llama3.1" + → pytest.raises(ValueError, match="Nested provider prefix") + - test_rejects_mixed_provider_prefix: "openai:anthropic:claude-x" → same match + - test_error_message_suggests_correction: match the literal suggested id + "google-gla:gemini-3-flash-preview" in the message + - test_accepts_ollama_tag_identifiers: parametrize "ollama:llama3.1:8b", + "ollama:gemma4:e2b", "ollama:qwen3:8b" → returns input unchanged + - test_accepts_model_named_like_provider: "ollama:openai" → returns unchanged + EXTEND class TestAIModelConfigUpdate: + - test_rejects_doubled_prefix_via_model_validate: AIModelConfigUpdate.model_validate( + {"agent_default_model": "google-gla:google-gla:x"}) → pydantic.ValidationError + 
(this is the strict-mode JSON-path case) + - test_fallback_model_also_guarded: same via agent_fallback_model + +Task 4: +EXTEND app/features/agents/tests/test_config_validation.py (TestModelIdentifierValidation): + - test_doubled_prefix_rejected_at_settings_boot: + Settings(agent_default_model="anthropic:anthropic:claude-sonnet-4-5", + _env_file=None) → pytest.raises(ValidationError, match="Nested provider") + - test_ollama_tag_accepted_at_settings_boot: + Settings(agent_default_model="ollama:llama3.1:8b", _env_file=None) constructs + +Task 5: +EXTEND app/features/config/tests/test_routes.py (TestUpdateAIConfig): + - test_patch_rejects_doubled_provider_prefix: PATCH /config/ai + {"agent_default_model": "google-gla:google-gla:gemini-3-flash-preview"} → + assert 422; body["code"] == "VALIDATION_ERROR"; + body["errors"][0]["field"] == "agent_default_model"; + "Nested provider prefix" in body["errors"][0]["message"] + (mirror the existing test_patch_rejects_invalid_model fixture usage exactly) + - keep/confirm one 200 happy-path PATCH with a valid id in the same class + +Task 6: +EXTEND app/features/config/tests/test_service.py: + - test_startup_skips_invalid_model_id_override: + mock _load_overrides → {"agent_default_model": "google-gla:google-gla:x", + "agent_temperature": 0.5} + run apply_overrides_on_startup; assert settings.agent_default_model UNCHANGED, + settings.agent_temperature == 0.5 (valid keys still applied), + and the warning event fired (capture structlog or patch logger). + RESTORE the settings singleton in teardown (follow conftest/24ed5cd pattern). + +Task 7 (docs, same commit or follow-up in the PR): + - docs/_base/RUNBOOKS.md gets NO new incident entry (the failure class is now impossible + at the boundary), but IF the PR reviewer wants operator visibility, add one line to + the existing agent-incident notes pointing at the 422 message. OPTIONAL — skip by + default to keep the diff minimal. 
+``` + +### Integration Points + +```yaml +DATABASE: none — no migration; malformed persisted rows handled by skip-and-warn +ROUTES: none — PATCH /config/ai unchanged; validation fires in the existing schema boundary +CONFIG: app/core/config.py validator change propagates to BOTH call sites automatically +FRONTEND: none in this PRP — the 422 problem-details message is the operator surface +``` + +## Validation Loop + +### Level 1: Syntax & Style + +```bash +uv run ruff check app/core/config.py app/features/config/ app/features/agents/tests/test_config_validation.py +uv run ruff format --check . +uv run mypy app/ && uv run pyright app/ # both --strict; zero new errors +``` + +### Level 2: Unit tests (no DB) + +```bash +uv run pytest -v \ + app/features/config/tests/test_schemas.py \ + app/features/config/tests/test_routes.py \ + app/features/config/tests/test_service.py \ + app/features/agents/tests/test_config_validation.py +# Then the full unit gate — proves no consumer fixture broke (ollama:qwen3:8b et al.): +uv run pytest -v -m "not integration" +``` + +### Level 3: Integration (live API) + +```bash +docker compose up -d && uv run uvicorn app.main:app --port 8123 & # or existing stack +# Doubled prefix → 422 problem+json naming the field: +curl -si -X PATCH http://localhost:8123/config/ai \ + -H 'Content-Type: application/json' \ + -d '{"agent_default_model":"google-gla:google-gla:gemini-3-flash-preview"}' \ + | head -20 # expect: HTTP/1.1 422, application/problem+json, "Nested provider prefix" +# Happy path still works: +curl -si -X PATCH http://localhost:8123/config/ai \ + -H 'Content-Type: application/json' \ + -d '{"agent_default_model":"anthropic:claude-sonnet-4-5"}' | head -5 # expect 200 +# Ollama tag still accepted: +curl -si -X PATCH http://localhost:8123/config/ai \ + -H 'Content-Type: application/json' \ + -d '{"agent_default_model":"ollama:llama3.1:8b"}' | head -5 # expect 200 +``` + +### Level 4 (optional dogfood): Settings UI + +Open `/settings` 
(AI Models panel), paste a full `google-gla:gemini-…` id into the model field +with provider already selected, save → the UI should surface the 422 with the "Did you mean" +message instead of silently persisting a broken id. + +## Final Validation Checklist + +- [ ] `uv run ruff check . && uv run ruff format --check .` clean +- [ ] `uv run mypy app/ && uv run pyright app/` clean (strict) +- [ ] `uv run pytest -v -m "not integration"` green — including the pre-existing + `ollama:qwen3:8b` fixtures in agents/demo tests (untouched) +- [ ] New tests cover: 5 doubled-prefix rejections, mixed-prefix rejection, message + suggestion, 3 ollama-tag acceptances, `ollama:openai` edge, JSON-path schema case, + Settings boot rejection + acceptance, route 422 + 200, startup skip-and-warn +- [ ] Level-3 curl matrix matches expected statuses +- [ ] `git diff --stat` shows surgical diffs (no whole-file line-ending churn) +- [ ] Commits: `fix(api): reject doubled provider prefixes in agent model ids (#334)` + (+ `test(api): …` if tests are split out); no AI trailers +- [ ] PR into `dev` from `fix/config-reject-doubled-provider-prefix`; CI green + +--- + +## Out of Scope (this PRP) + +- **Frontend input sanitization** in `ai-models-panel.tsx` (stripping a pasted provider + prefix client-side) — the 422 message now surfaces the mistake; promote to its own issue + if dogfood shows operators hitting it often. +- **#335 fallback-failure surfacing** — E2 of umbrella #380, builds on this foundation. +- **Normalization mode** (silently collapsing the prefix) — rejected by design decision above. +- **Backfill/migration of existing malformed `app_config` rows** — skip-and-warn at startup + + a corrective PATCH is sufficient for a single-host system. + +## Anti-Patterns to Avoid + +- ❌ Don't reject every id containing a second colon — Ollama tags are legitimate. +- ❌ Don't duplicate the check in `app/features/config/schemas.py` — one source of truth. 
+- ❌ Don't let `apply_overrides_on_startup` raise — its no-crash contract is load-bearing. +- ❌ Don't log override *values* alongside secrets handling paths — key names + error text only. +- ❌ Don't forget `_env_file=None` in Settings tests — .env bleed is a known incident class. +- ❌ Don't weaken or rewrite existing validator tests — extend the existing classes. + +--- + +**One-pass confidence score: 9/10** — single pure-function change with two pre-wired call +sites, all consumers verified tolerant (`split(":", 1)` everywhere), bug + carve-out both +runtime-verified, test patterns mirrored from named existing classes. The startup-guard test +is the only piece with moderate friction (settings-singleton restore discipline). From a060ff66a1bf43b422018a30d2356e628d8b19a8 Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Thu, 11 Jun 2026 22:02:40 +0200 Subject: [PATCH 13/44] fix(agents,api): surface fallback model failures with classified details (#335) --- app/core/exceptions.py | 46 ++++- app/core/problem_details.py | 27 ++- app/core/tests/test_problem_details.py | 77 ++++++++ app/features/agents/failures.py | 156 +++++++++++++++ app/features/agents/schemas.py | 30 +++ app/features/agents/service.py | 48 ++++- app/features/agents/tests/test_failures.py | 212 +++++++++++++++++++++ app/features/agents/tests/test_routes.py | 51 +++++ app/features/agents/tests/test_service.py | 178 ++++++++++++++++- docs/_base/API_CONTRACTS.md | 4 +- 10 files changed, 823 insertions(+), 6 deletions(-) create mode 100644 app/core/tests/test_problem_details.py create mode 100644 app/features/agents/failures.py create mode 100644 app/features/agents/tests/test_failures.py diff --git a/app/core/exceptions.py b/app/core/exceptions.py index 1e6279ea..d67d9a0b 100644 --- a/app/core/exceptions.py +++ b/app/core/exceptions.py @@ -10,6 +10,7 @@ from app.core.logging import get_logger from app.core.problem_details import ( + AGENT_FALLBACK_EXHAUSTED_CODE, 
EMBEDDING_AUTH_CODE, ERROR_TYPES, ProblemDetailResponse, @@ -40,6 +41,7 @@ def __init__( code: str = "INTERNAL_ERROR", status_code: int = 500, details: dict[str, Any] | None = None, + extensions: dict[str, Any] | None = None, ) -> None: """Initialize application error. @@ -47,13 +49,19 @@ def __init__( message: Human-readable error message. code: Machine-readable error code. status_code: HTTP status code. - details: Additional error context. + details: Additional error context. LOG-ONLY — the exception + handler never copies it into the response body (it may carry + internals). + extensions: RFC 7807 extension members the handler DOES merge + into the problem+json response body (#335). Only put + client-safe, already-sanitized data here. """ super().__init__(message) self.message = message self.code = code self.status_code = status_code self.details = details or {} + self.extensions = extensions or {} @property def title(self) -> str: @@ -254,6 +262,41 @@ def __init__( ) +class AgentFallbackExhaustedError(ForecastLabError): + """502 — every model in the agent's fallback chain failed (issue #335). + + Raised when the PydanticAI ``FallbackModel`` chain (or a single configured + model) fails with provider-API errors on every leg. Mirrors + :class:`EmbeddingProviderAuthError`: keeps the public status at 502 (an + upstream failure from the caller's perspective) and emits a + *machine-readable* ``AGENT_FALLBACK_EXHAUSTED`` problem ``type``/``code`` + so clients can classify it. The per-model classified failures ride the + response-visible ``extensions`` channel as a ``failures`` member — + ``details`` stays log-only by design. + """ + + error_type_uri: str = ERROR_TYPES[AGENT_FALLBACK_EXHAUSTED_CODE] + + def __init__( + self, + message: str, + failures: list[dict[str, Any]], + ) -> None: + """Initialize with the human summary and classified per-model legs. + + Args: + message: Human-actionable summary (already secret-safe). 
+ failures: Serialized ``ModelFailureDetail`` dicts — sanitized + upstream by the classifier; surfaced verbatim to the client. + """ + super().__init__( + message=message, + code=AGENT_FALLBACK_EXHAUSTED_CODE, + status_code=502, + extensions={"failures": failures}, + ) + + # ============================================================================= # Exception Handlers (RFC 7807) # ============================================================================= @@ -287,6 +330,7 @@ async def forecastlab_exception_handler( title=exc.title, detail=exc.message, error_code=exc.code, + extensions=exc.extensions or None, ) diff --git a/app/core/problem_details.py b/app/core/problem_details.py index f8bba455..1078a1b9 100644 --- a/app/core/problem_details.py +++ b/app/core/problem_details.py @@ -29,6 +29,11 @@ # demo pipeline's classifier) so the marker never drifts between the two. EMBEDDING_AUTH_CODE = "EMBEDDING_AUTH" +# Machine-readable code for an exhausted agent model fallback chain (#335). +# Single source of truth shared by the producer (AgentFallbackExhaustedError) +# and any consumer classifying the 502 — mirrors EMBEDDING_AUTH_CODE. +AGENT_FALLBACK_EXHAUSTED_CODE = "AGENT_FALLBACK_EXHAUSTED" + ERROR_TYPES = { "NOT_FOUND": f"{ERROR_TYPE_BASE}/not-found", "VALIDATION_ERROR": f"{ERROR_TYPE_BASE}/validation", @@ -43,8 +48,16 @@ "SERVICE_UNAVAILABLE": f"{ERROR_TYPE_BASE}/service-unavailable", "GATEWAY_TIMEOUT": f"{ERROR_TYPE_BASE}/gateway-timeout", EMBEDDING_AUTH_CODE: f"{ERROR_TYPE_BASE}/embedding-auth", + AGENT_FALLBACK_EXHAUSTED_CODE: f"{ERROR_TYPE_BASE}/agent-fallback-exhausted", } +# RFC 7807 extension members may never shadow the spec/base fields the +# ProblemDetail schema already owns — reserved keys are dropped from any +# `extensions` merge in problem_response (#335). 
+_RESERVED_PROBLEM_KEYS = frozenset( + {"type", "title", "status", "detail", "instance", "errors", "code", "request_id"} +) + # ============================================================================= # Problem Detail Schema @@ -172,6 +185,7 @@ def problem_response( detail: str | None = None, error_code: str = "INTERNAL_ERROR", errors: list[dict[str, Any]] | None = None, + extensions: dict[str, Any] | None = None, ) -> ProblemDetailResponse: """Create a ProblemDetailResponse with proper content type. @@ -181,6 +195,9 @@ def problem_response( detail: Detailed explanation (optional). error_code: Internal error code for type URI lookup. errors: Field-level validation errors (optional). + extensions: Additional RFC 7807 extension members merged into the + response body (optional, #335). Reserved base-field keys are + silently dropped — extensions can never shadow type/status/etc. Returns: JSONResponse with problem+json content type. """ @@ -192,9 +209,17 @@ def problem_response( errors=errors, ) + # Merge on the serialized dict (not ProblemDetail(**extensions)) so + # arbitrary extension payloads never fight the pydantic constructor. + content = problem.model_dump(exclude_none=True) + if extensions: + content.update( + {key: value for key, value in extensions.items() if key not in _RESERVED_PROBLEM_KEYS} + ) + return ProblemDetailResponse( status_code=status, - content=problem.model_dump(exclude_none=True), + content=content, ) diff --git a/app/core/tests/test_problem_details.py b/app/core/tests/test_problem_details.py new file mode 100644 index 00000000..af00213f --- /dev/null +++ b/app/core/tests/test_problem_details.py @@ -0,0 +1,77 @@ +"""Unit tests for RFC 7807 problem_response extension members (issue #335). + +The `extensions` channel lets a ForecastLabError surface client-safe data +(e.g. classified per-model failures) in the problem+json body without going +through the log-only `details` attribute. 
+""" + +import json +from typing import Any + +from app.core.problem_details import problem_response + + +def _body(response: Any) -> dict[str, Any]: + """Decode a ProblemDetailResponse body.""" + decoded: dict[str, Any] = json.loads(response.body) + return decoded + + +def test_problem_response_without_extensions_unchanged() -> None: + """Default (no extensions) output keeps the existing shape exactly.""" + response = problem_response( + status=404, + title="Not Found", + detail="Resource not found", + error_code="NOT_FOUND", + ) + + body = _body(response) + assert response.status_code == 404 + assert body["status"] == 404 + assert body["code"] == "NOT_FOUND" + assert body["type"] == "/errors/not-found" + assert "failures" not in body + + +def test_problem_response_merges_extensions() -> None: + """Extension members are merged into the serialized body.""" + response = problem_response( + status=502, + title="Agent Fallback Exhausted", + detail="All configured agent models failed", + error_code="AGENT_FALLBACK_EXHAUSTED", + extensions={"failures": [{"model_name": "m1", "reason": "model_not_found"}]}, + ) + + body = _body(response) + assert body["code"] == "AGENT_FALLBACK_EXHAUSTED" + assert body["type"] == "/errors/agent-fallback-exhausted" + assert body["failures"] == [{"model_name": "m1", "reason": "model_not_found"}] + + +def test_problem_response_extensions_cannot_override_reserved() -> None: + """Reserved base-field keys in extensions are silently dropped.""" + response = problem_response( + status=502, + title="Agent Fallback Exhausted", + detail="real detail", + error_code="AGENT_FALLBACK_EXHAUSTED", + extensions={ + "status": 200, + "code": "HACK", + "detail": "spoofed", + "type": "about:blank", + "title": "spoofed", + "safe_key": "kept", + }, + ) + + body = _body(response) + assert response.status_code == 502 + assert body["status"] == 502 + assert body["code"] == "AGENT_FALLBACK_EXHAUSTED" + assert body["detail"] == "real detail" + assert body["type"] == 
"/errors/agent-fallback-exhausted" + assert body["title"] == "Agent Fallback Exhausted" + assert body["safe_key"] == "kept" diff --git a/app/features/agents/failures.py b/app/features/agents/failures.py new file mode 100644 index 00000000..57b56803 --- /dev/null +++ b/app/features/agents/failures.py @@ -0,0 +1,156 @@ +"""Classify provider-API model failures into secret-safe, actionable details. + +When every model in the PydanticAI ``FallbackModel`` chain fails (or a single +configured model fails with a provider error), the raw exception surface is an +opaque one-liner (``All models from FallbackModel failed (2 sub-exceptions)``) +and ``str(ModelHTTPError)`` embeds the provider response body verbatim — a +secret-leak risk. This module turns that exception tree into a list of +:class:`ModelFailureDetail` entries plus a deterministic human summary that the +chat UI renders as-is (issue #335). + +Pure functions only — fully unit-testable without a DB or network. +""" + +from __future__ import annotations + +import re + +from pydantic_ai.exceptions import ModelAPIError, ModelHTTPError +from pydantic_ai.models.fallback import ResponseRejected + +from app.features.agents.schemas import FailureReason, ModelFailureDetail + +# Secret-shaped substrings scrubbed from any surfaced provider message. +# Issue #335 hard constraint: no API keys / Bearer tokens, ever. +_SECRET_PATTERNS: tuple[re.Pattern[str], ...] = ( + re.compile(r"AIza[0-9A-Za-z_\-]{10,}"), # Google API keys + re.compile(r"sk-[A-Za-z0-9_\-]{10,}"), # OpenAI/Anthropic-style keys + re.compile(r"(?i)bearer\s+[A-Za-z0-9._\-]+"), # Authorization bearer tokens + re.compile(r"(?i)(api[_-]?key|token|authorization)[=:]\s*\S+"), +) + +# Cap on the surfaced per-model detail string. +_MAX_DETAIL_LEN = 300 + +# Placeholder model names for failures that carry none. +_RESPONSE_REJECTED_MODEL = "(response rejected)" +_UNKNOWN_MODEL = "(unknown model)" + +# Human labels for the summary string (rendered verbatim by the chat UI). 
+_REASON_LABELS: dict[FailureReason, str] = { + "model_not_found": "model not found / invalid model name", + "quota_exhausted": "quota or rate limit exhausted", + "auth_error": "authentication/permission error", + "provider_unavailable": "provider unavailable", + "provider_error": "provider error", + "response_rejected": "response rejected", + "unknown": "unexpected failure", +} + + +def _sanitize(text: str) -> str: + """Scrub secret-shaped substrings, then truncate to the detail cap.""" + for pattern in _SECRET_PATTERNS: + text = pattern.sub("[redacted]", text) + return text[:_MAX_DETAIL_LEN] + + +def _provider_message(body: object | None) -> str: + """Extract the provider's human message from an HTTP error body. + + Handles the Google/OpenAI ``{"error": {"message": ...}}`` shape; a plain + string passes through; anything else is stringified. Callers MUST pass the + result through :func:`_sanitize` before surfacing it. + """ + if body is None: + return "" + if isinstance(body, dict): + error = body.get("error") + if isinstance(error, dict): + message = error.get("message") + if isinstance(message, str): + return message + if isinstance(body, str): + return body + return str(body) + + +def _classify_http_status(status_code: int) -> FailureReason: + """Map an HTTP status to the issue #335 reason taxonomy.""" + if status_code == 404: + return "model_not_found" + if status_code == 429: + return "quota_exhausted" + if status_code in (401, 403): + return "auth_error" + if status_code >= 500: + return "provider_unavailable" + return "provider_error" + + +def classify_model_failures(exc: BaseException) -> list[ModelFailureDetail]: + """Flatten an exception (group) into classified per-model failures. + + ``FallbackExceptionGroup.exceptions`` is a tuple and sub-groups can nest — + recurse into groups and classify only the leaves, preserving leg order. 
+ """ + if isinstance(exc, BaseExceptionGroup): + details: list[ModelFailureDetail] = [] + for member in exc.exceptions: + details.extend(classify_model_failures(member)) + return details + if isinstance(exc, ModelHTTPError): + return [ + ModelFailureDetail( + model_name=exc.model_name, + status_code=exc.status_code, + reason=_classify_http_status(exc.status_code), + detail=_sanitize(_provider_message(exc.body)), + ) + ] + if isinstance(exc, ResponseRejected): + return [ + ModelFailureDetail( + model_name=_RESPONSE_REJECTED_MODEL, + status_code=None, + reason="response_rejected", + detail=_sanitize(str(exc)), + ) + ] + if isinstance(exc, ModelAPIError): + return [ + ModelFailureDetail( + model_name=exc.model_name, + status_code=None, + reason="provider_error", + detail=_sanitize(str(exc)), + ) + ] + return [ + ModelFailureDetail( + model_name=_UNKNOWN_MODEL, + status_code=None, + reason="unknown", + detail=_sanitize(str(exc)), + ) + ] + + +def summarize_model_failures(failures: list[ModelFailureDetail]) -> str: + """Build the deterministic human summary the chat UI renders verbatim. + + One failure → ``The configured agent model failed — {leg}``; several → + ``All configured agent models failed — {leg}; {leg}; …`` where each leg is + ``{model_name}: {label} (HTTP {status_code})`` (HTTP part omitted when the + failure was not HTTP-shaped). 
+ """ + legs: list[str] = [] + for failure in failures: + leg = f"{failure.model_name}: {_REASON_LABELS[failure.reason]}" + if failure.status_code is not None: + leg = f"{leg} (HTTP {failure.status_code})" + legs.append(leg) + joined = "; ".join(legs) + if len(failures) == 1: + return f"The configured agent model failed — {joined}" + return f"All configured agent models failed — {joined}" diff --git a/app/features/agents/schemas.py b/app/features/agents/schemas.py index 69b74261..f6f02724 100644 --- a/app/features/agents/schemas.py +++ b/app/features/agents/schemas.py @@ -301,6 +301,33 @@ class CompleteEvent(BaseModel): tool_calls_count: int +FailureReason = Literal[ + "model_not_found", + "quota_exhausted", + "auth_error", + "provider_unavailable", + "provider_error", + "response_rejected", + "unknown", +] + + +class ModelFailureDetail(BaseModel): + """One classified per-model failure from a FallbackModel chain (issue #335). + + Args: + model_name: Provider-prefixed model identifier that failed. + status_code: HTTP status from the provider, when the failure was HTTP. + reason: Machine-readable failure classification. + detail: Sanitized + truncated provider message — NEVER the raw body. + """ + + model_name: str + status_code: int | None = None + reason: FailureReason + detail: str = "" + + class ErrorEvent(BaseModel): """Error event. @@ -308,11 +335,14 @@ class ErrorEvent(BaseModel): error: Error message. error_type: Type of error. recoverable: Whether the session can continue. + failures: Classified per-model failures when ``error_type`` is + ``fallback_exhausted`` (issue #335); ``None`` otherwise. 
""" error: str error_type: str recoverable: bool = True + failures: list[ModelFailureDetail] | None = None # ============================================================================= diff --git a/app/features/agents/service.py b/app/features/agents/service.py index 6372fd9c..ba865e9f 100644 --- a/app/features/agents/service.py +++ b/app/features/agents/service.py @@ -22,13 +22,15 @@ import structlog from pydantic_ai import Agent, capture_run_messages -from pydantic_ai.exceptions import UnexpectedModelBehavior +from pydantic_ai.exceptions import FallbackExceptionGroup, ModelAPIError, UnexpectedModelBehavior from pydantic_ai.messages import ModelMessage, ModelMessagesTypeAdapter, ToolReturnPart from sqlalchemy import select from sqlalchemy.ext.asyncio import AsyncSession from app.core.config import get_settings +from app.core.exceptions import AgentFallbackExhaustedError from app.features.agents.deps import AgentDeps +from app.features.agents.failures import classify_model_failures, summarize_model_failures from app.features.agents.models import AgentSession, AgentType, SessionStatus from app.features.agents.schemas import ( ApprovalResponse, @@ -310,6 +312,23 @@ async def chat( raise TimeoutError( f"Agent response timed out after {self.settings.agent_timeout_seconds} seconds" ) from e + except (FallbackExceptionGroup, ModelAPIError) as e: + # Every model in the fallback chain failed (or the single + # configured model failed) with a provider-API error before any + # output was produced — nothing to salvage. Classify each leg into + # a secret-safe detail and surface the summary as a 502 + # problem+json via the global handler (issue #335). 
+ failures = classify_model_failures(e) + logger.warning( + "agents.chat_fallback_exhausted", + session_id=session_id, + failure_count=len(failures), + reasons=[f.reason for f in failures], + ) + raise AgentFallbackExhaustedError( + summarize_model_failures(failures), + failures=[f.model_dump(mode="json") for f in failures], + ) from e except UnexpectedModelBehavior as e: # The model misbehaved (e.g. a tool call exceeded its retry budget). # This is recoverable from the user's perspective — surface a clean @@ -694,6 +713,33 @@ async def stream_chat( raise TimeoutError( f"Agent response timed out after {self.settings.agent_timeout_seconds} seconds" ) from e + except (FallbackExceptionGroup, ModelAPIError) as e: + # Every model in the fallback chain failed (or the single + # configured model failed) with a provider-API error before any + # output was produced — nothing to salvage. Yield ONE classified, + # secret-safe `error` event instead of letting the raw exception + # reach the generic WebSocket backstop (issue #335). + failures = classify_model_failures(e) + logger.warning( + "agents.stream_chat_fallback_exhausted", + session_id=session_id, + failure_count=len(failures), + reasons=[f.reason for f in failures], + ) + fallback_now = datetime.now(UTC) + session.last_activity = fallback_now + await db.flush() + yield StreamEvent( + event_type="error", + data={ + "error": summarize_model_failures(failures), + "error_type": "fallback_exhausted", + "recoverable": True, + "failures": [f.model_dump(mode="json") for f in failures], + }, + timestamp=fallback_now, + ) + return except UnexpectedModelBehavior as e: # The model misbehaved (e.g. a tool call exceeded its retry budget). 
# Emit a clean, recoverable `error` event rather than letting the raw diff --git a/app/features/agents/tests/test_failures.py b/app/features/agents/tests/test_failures.py new file mode 100644 index 00000000..386fc969 --- /dev/null +++ b/app/features/agents/tests/test_failures.py @@ -0,0 +1,212 @@ +"""Unit tests for the model-failure classifier (issue #335). + +Covers the status-code classification matrix, exception-group recursion, +secret scrubbing, detail truncation, and the deterministic human summary. +""" + +import pytest +from pydantic_ai.exceptions import FallbackExceptionGroup, ModelAPIError, ModelHTTPError +from pydantic_ai.models.fallback import ResponseRejected + +from app.features.agents.failures import ( + classify_model_failures, + summarize_model_failures, +) +from app.features.agents.schemas import ModelFailureDetail + + +class TestClassifyModelFailures: + """Classification matrix for classify_model_failures.""" + + @pytest.mark.parametrize( + ("status_code", "expected_reason"), + [ + (404, "model_not_found"), + (429, "quota_exhausted"), + (401, "auth_error"), + (403, "auth_error"), + (500, "provider_unavailable"), + (503, "provider_unavailable"), + (418, "provider_error"), + ], + ) + def test_http_status_matrix(self, status_code: int, expected_reason: str) -> None: + """Each HTTP status maps to its issue #335 reason.""" + failures = classify_model_failures(ModelHTTPError(status_code, "test:model")) + + assert len(failures) == 1 + assert failures[0].model_name == "test:model" + assert failures[0].status_code == status_code + assert failures[0].reason == expected_reason + + def test_fallback_group_preserves_leg_order(self) -> None: + """A 404 + 429 group yields two details in model order.""" + group = FallbackExceptionGroup( + "All models from FallbackModel failed", + [ + ModelHTTPError(404, "google-gla:gemini-3-flash-preview"), + ModelHTTPError(429, "google-gla:gemini-2.5-flash"), + ], + ) + + failures = classify_model_failures(group) + + assert 
len(failures) == 2 + assert failures[0].model_name == "google-gla:gemini-3-flash-preview" + assert failures[0].reason == "model_not_found" + assert failures[1].model_name == "google-gla:gemini-2.5-flash" + assert failures[1].reason == "quota_exhausted" + + def test_nested_group_flattens_leaves(self) -> None: + """Sub-groups inside the group are recursed into, not classified as legs.""" + inner = FallbackExceptionGroup( + "inner", + [ModelHTTPError(429, "inner:model")], + ) + outer = FallbackExceptionGroup( + "outer", + [ModelHTTPError(404, "outer:model"), inner], + ) + + failures = classify_model_failures(outer) + + assert [f.model_name for f in failures] == ["outer:model", "inner:model"] + assert [f.reason for f in failures] == ["model_not_found", "quota_exhausted"] + + def test_bare_model_api_error_is_provider_error(self) -> None: + """A non-HTTP ModelAPIError (connection failure) → provider_error, no status.""" + failures = classify_model_failures( + ModelAPIError("ollama:gemma4-agent", "connection refused") + ) + + assert len(failures) == 1 + assert failures[0].model_name == "ollama:gemma4-agent" + assert failures[0].status_code is None + assert failures[0].reason == "provider_error" + + def test_response_rejected_member(self) -> None: + """A ResponseRejected group member classifies as response_rejected.""" + group = FallbackExceptionGroup( + "All models from FallbackModel failed", + [ResponseRejected(2)], + ) + + failures = classify_model_failures(group) + + assert len(failures) == 1 + assert failures[0].reason == "response_rejected" + assert failures[0].status_code is None + + def test_unknown_exception_is_unknown(self) -> None: + """Anything else inside the group classifies as unknown.""" + failures = classify_model_failures(RuntimeError("boom")) + + assert len(failures) == 1 + assert failures[0].reason == "unknown" + assert failures[0].status_code is None + assert "boom" in failures[0].detail + + @pytest.mark.parametrize( + "secret", + [ + 
"AIzaFakeKey1234567890abcdef", + "sk-fakekey1234567890abcdef", + "Bearer xyz.abc-123", + "api_key=supersecretvalue", + ], + ) + def test_secret_scrubbed_from_detail(self, secret: str) -> None: + """Secret-shaped substrings in the provider body never reach the detail.""" + exc = ModelHTTPError( + 429, + "test:model", + body={"error": {"message": f"quota exceeded for {secret} retry later"}}, + ) + + failures = classify_model_failures(exc) + + assert "[redacted]" in failures[0].detail + assert "AIzaFake" not in failures[0].detail + assert "sk-fake" not in failures[0].detail + assert "xyz.abc-123" not in failures[0].detail + assert "supersecretvalue" not in failures[0].detail + + def test_detail_truncated_to_cap(self) -> None: + """A 1000-char provider message is truncated to the 300-char cap.""" + exc = ModelHTTPError( + 500, + "test:model", + body={"error": {"message": "x" * 1000}}, + ) + + failures = classify_model_failures(exc) + + assert len(failures[0].detail) <= 300 + + def test_provider_message_string_body(self) -> None: + """A plain-string body passes through (sanitized).""" + failures = classify_model_failures(ModelHTTPError(404, "test:model", body="not found")) + + assert failures[0].detail == "not found" + + def test_provider_message_none_body(self) -> None: + """A missing body yields an empty detail.""" + failures = classify_model_failures(ModelHTTPError(404, "test:model")) + + assert failures[0].detail == "" + + +class TestSummarizeModelFailures: + """Deterministic summary shapes (rendered verbatim by the chat UI).""" + + def test_single_leg_shape(self) -> None: + failures = [ + ModelFailureDetail( + model_name="anthropic:claude-test", + status_code=401, + reason="auth_error", + ) + ] + + summary = summarize_model_failures(failures) + + assert summary == ( + "The configured agent model failed — " + "anthropic:claude-test: authentication/permission error (HTTP 401)" + ) + + def test_two_leg_shape(self) -> None: + failures = [ + ModelFailureDetail( + 
model_name="google-gla:gemini-3-flash-preview", + status_code=404, + reason="model_not_found", + ), + ModelFailureDetail( + model_name="google-gla:gemini-2.5-flash", + status_code=429, + reason="quota_exhausted", + ), + ] + + summary = summarize_model_failures(failures) + + assert summary == ( + "All configured agent models failed — " + "google-gla:gemini-3-flash-preview: model not found / invalid model name (HTTP 404); " + "google-gla:gemini-2.5-flash: quota or rate limit exhausted (HTTP 429)" + ) + + def test_non_http_leg_omits_status(self) -> None: + failures = [ + ModelFailureDetail( + model_name="ollama:gemma4-agent", + status_code=None, + reason="provider_error", + ) + ] + + summary = summarize_model_failures(failures) + + assert "(HTTP" not in summary + assert "ollama:gemma4-agent: provider error" in summary diff --git a/app/features/agents/tests/test_routes.py b/app/features/agents/tests/test_routes.py index d53bb914..12ff4711 100644 --- a/app/features/agents/tests/test_routes.py +++ b/app/features/agents/tests/test_routes.py @@ -7,6 +7,7 @@ import pytest from httpx import AsyncClient +from pydantic_ai.exceptions import FallbackExceptionGroup, ModelHTTPError from app.features.agents.schemas import ExperimentReport @@ -162,6 +163,56 @@ async def test_chat_session_not_found(self, client: AsyncClient) -> None: assert response.status_code == 404 + @pytest.mark.asyncio + async def test_chat_fallback_exhausted_returns_502_problem_json( + self, client: AsyncClient + ) -> None: + """Both fallback legs failing → 502 problem+json with classified + per-model failures (#335, umbrella #380 route-level criterion).""" + with patch("app.features.agents.agents.experiment.get_experiment_agent") as mock_get: + group = FallbackExceptionGroup( + "All models from FallbackModel failed", + [ + ModelHTTPError( + 404, + "google-gla:gemini-3-flash-preview", + body={"error": {"message": "models/gemini-3-flash-preview is not found"}}, + ), + ModelHTTPError( + 429, + 
"google-gla:gemini-2.5-flash", + body={"error": {"message": "RESOURCE_EXHAUSTED key AIzaFakeKey123456789"}}, + ), + ], + ) + mock_agent = MagicMock() + mock_agent.run = AsyncMock(side_effect=group) + mock_get.return_value = mock_agent + + create_response = await client.post( + "/agents/sessions", + json={"agent_type": "experiment"}, + ) + session_id = create_response.json()["session_id"] + + response = await client.post( + f"/agents/sessions/{session_id}/chat", + json={"message": "hello"}, + ) + + assert response.status_code == 502 + assert response.headers["content-type"].startswith("application/problem+json") + body = response.json() + assert body["code"] == "AGENT_FALLBACK_EXHAUSTED" + assert body["type"].endswith("/errors/agent-fallback-exhausted") + assert len(body["failures"]) == 2 + assert body["failures"][0]["reason"] == "model_not_found" + assert body["failures"][1]["reason"] == "quota_exhausted" + assert "request_id" in body + # The opaque group string and secrets must never reach the client. 
+ assert "sub-exceptions" not in body["detail"] + assert "AIza" not in response.text + @pytest.mark.integration class TestApprovalRoutes: diff --git a/app/features/agents/tests/test_service.py b/app/features/agents/tests/test_service.py index 759e0284..cf751bf2 100644 --- a/app/features/agents/tests/test_service.py +++ b/app/features/agents/tests/test_service.py @@ -8,7 +8,11 @@ import pytest from pydantic_ai import Agent -from pydantic_ai.exceptions import UnexpectedModelBehavior +from pydantic_ai.exceptions import ( + FallbackExceptionGroup, + ModelHTTPError, + UnexpectedModelBehavior, +) from pydantic_ai.messages import ( ModelMessage, ModelRequest, @@ -18,6 +22,7 @@ UserPromptPart, ) +from app.core.exceptions import AgentFallbackExhaustedError from app.features.agents.deps import AgentDeps from app.features.agents.models import AgentSession, AgentType, SessionStatus from app.features.agents.schemas import ExperimentReport @@ -422,6 +427,58 @@ async def test_chat_runs_tools_sequentially( mock_mode.assert_called_once_with("sequential") + @pytest.mark.asyncio + async def test_chat_fallback_exhausted_raises_classified_error( + self, + sample_active_session: AgentSession, + ) -> None: + """A FallbackExceptionGroup from agent.run must raise the classified + 502 AgentFallbackExhaustedError, not bubble the raw group (#335).""" + service = AgentService() + mock_db = AsyncMock() + + mock_result = MagicMock() + mock_result.scalar_one_or_none.return_value = sample_active_session + mock_db.execute.return_value = mock_result + + group = FallbackExceptionGroup( + "All models from FallbackModel failed", + [ + ModelHTTPError( + 404, + "google-gla:gemini-3-flash-preview", + body={"error": {"message": "models/gemini-3-flash-preview is not found"}}, + ), + ModelHTTPError( + 429, + "google-gla:gemini-2.5-flash", + body={"error": {"message": "RESOURCE_EXHAUSTED key AIzaFakeKey123456789"}}, + ), + ], + ) + mock_agent = MagicMock() + mock_agent.run = AsyncMock(side_effect=group) + + 
with patch.object(service, "_get_agent", return_value=mock_agent): + with pytest.raises(AgentFallbackExhaustedError) as exc_info: + await service.chat( + db=mock_db, + session_id=sample_active_session.session_id, + message="Hello", + ) + + exc = exc_info.value + assert exc.status_code == 502 + assert exc.code == "AGENT_FALLBACK_EXHAUSTED" + failures = exc.extensions["failures"] + assert len(failures) == 2 + assert failures[0]["reason"] == "model_not_found" + assert failures[1]["reason"] == "quota_exhausted" + assert "sub-exceptions" not in exc.message + # Issue #335 hard constraint: no secret-like material anywhere. + serialized = json.dumps({"message": exc.message, "extensions": exc.extensions}) + assert "AIza" not in serialized + class TestAgentServiceStreamChat: """Tests for streaming chat functionality.""" @@ -478,6 +535,125 @@ async def __aexit__(self, *exc: object) -> bool: assert events[0].data["error_type"] == "model_behavior_error" assert "exceeded max retries" not in events[0].data["error"] + @pytest.mark.asyncio + async def test_stream_chat_fallback_exhausted_yields_classified_error( + self, + sample_active_session: AgentSession, + monkeypatch: pytest.MonkeyPatch, + ) -> None: + """All fallback legs failing must yield ONE classified `error` event + with per-model failures — never the raw group string (#335).""" + service = AgentService() + # Pin a streaming-capable (cloud) provider so this exercises the + # run_stream path regardless of the local .env (#342). 
+ monkeypatch.setattr(service.settings, "agent_default_model", "anthropic:claude-test") + mock_db = AsyncMock() + + mock_result = MagicMock() + mock_result.scalar_one_or_none.return_value = sample_active_session + mock_db.execute.return_value = mock_result + + class _RaisingStream: + """Async context manager that fails on entry like an exhausted chain.""" + + async def __aenter__(self) -> Any: + raise FallbackExceptionGroup( + "All models from FallbackModel failed", + [ + ModelHTTPError( + 404, + "google-gla:gemini-3-flash-preview", + body={ + "error": {"message": "models/gemini-3-flash-preview is not found"} + }, + ), + ModelHTTPError( + 429, + "google-gla:gemini-2.5-flash", + body={ + "error": {"message": "RESOURCE_EXHAUSTED key AIzaFakeKey123456789"} + }, + ), + ], + ) + + async def __aexit__(self, *exc: object) -> bool: + return False + + mock_agent = MagicMock() + mock_agent.run_stream = MagicMock(return_value=_RaisingStream()) + + with patch.object(service, "_get_agent", return_value=mock_agent): + events = [ + event + async for event in service.stream_chat( + db=mock_db, + session_id=sample_active_session.session_id, + message="Hello", + ) + ] + + assert len(events) == 1 + assert events[0].event_type == "error" + assert events[0].data["error_type"] == "fallback_exhausted" + assert events[0].data["recoverable"] is True + failures = events[0].data["failures"] + assert len(failures) == 2 + assert failures[0]["reason"] == "model_not_found" + assert failures[1]["reason"] == "quota_exhausted" + assert "google-gla:gemini-3-flash-preview" in events[0].data["error"] + assert "google-gla:gemini-2.5-flash" in events[0].data["error"] + # The opaque group string must never reach the client. + assert "sub-exceptions" not in events[0].data["error"] + # Issue #335 hard constraint: no secret-like material anywhere. 
+ assert "AIza" not in json.dumps(events[0].model_dump(mode="json")) + + @pytest.mark.asyncio + async def test_stream_chat_bare_model_api_error_classified( + self, + sample_active_session: AgentSession, + monkeypatch: pytest.MonkeyPatch, + ) -> None: + """A bare ModelAPIError (single-model config, no fallback wired) gets + the same classified treatment as a 1-element failures list (#335).""" + service = AgentService() + monkeypatch.setattr(service.settings, "agent_default_model", "anthropic:claude-test") + mock_db = AsyncMock() + + mock_result = MagicMock() + mock_result.scalar_one_or_none.return_value = sample_active_session + mock_db.execute.return_value = mock_result + + class _RaisingStream: + """Async context manager that fails on entry like a provider 401.""" + + async def __aenter__(self) -> Any: + raise ModelHTTPError(401, "anthropic:claude-test") + + async def __aexit__(self, *exc: object) -> bool: + return False + + mock_agent = MagicMock() + mock_agent.run_stream = MagicMock(return_value=_RaisingStream()) + + with patch.object(service, "_get_agent", return_value=mock_agent): + events = [ + event + async for event in service.stream_chat( + db=mock_db, + session_id=sample_active_session.session_id, + message="Hello", + ) + ] + + assert len(events) == 1 + assert events[0].event_type == "error" + assert events[0].data["error_type"] == "fallback_exhausted" + failures = events[0].data["failures"] + assert len(failures) == 1 + assert failures[0]["reason"] == "auth_error" + assert failures[0]["model_name"] == "anthropic:claude-test" + @pytest.mark.asyncio async def test_chat_surfaces_pending_action_on_model_misbehavior( self, diff --git a/docs/_base/API_CONTRACTS.md b/docs/_base/API_CONTRACTS.md index 232b7da3..27d75ea1 100644 --- a/docs/_base/API_CONTRACTS.md +++ b/docs/_base/API_CONTRACTS.md @@ -52,7 +52,7 @@ All endpoints serve JSON; error responses use `application/problem+json` (RFC 78 | rag | DELETE | `/rag/sources/{source_id}` | Delete source + cascaded 
chunks | | agents | POST | `/agents/sessions` | Create session (`agent_type`: `experiment` or `rag_assistant`) | | agents | GET | `/agents/sessions/{session_id}` | Status + message history (Postgres JSONB) | -| agents | POST | `/agents/sessions/{session_id}/chat` | Send user message; returns full response | +| agents | POST | `/agents/sessions/{session_id}/chat` | Send user message; returns full response. **#335** — when every model in the agent's fallback chain fails with a provider error, returns **502** `application/problem+json` with `code="AGENT_FALLBACK_EXHAUSTED"`, `type=/errors/agent-fallback-exhausted`, and an additive `failures: [{model_name, status_code, reason, detail}]` extension member (secret-scrubbed, 300-char-capped details) | | agents | POST | `/agents/sessions/{session_id}/approve` | Approve/reject a pending tool call (HITL gate) | | agents | DELETE | `/agents/sessions/{session_id}` | Close session | | agents | WS | `/agents/stream` | Token-by-token streaming + tool-call events | @@ -77,7 +77,7 @@ Verified against `app/features/agents/websocket.py` and `app/features/agents/sch - `tool_call_end` — `data: {"tool_name": str, "tool_call_id": str, "result": Any, "duration_ms": float}` (`ToolCallEndEvent`) - `approval_required` — emitted when a tool in `agent_require_approval` is pending; the chat REST `/agents/sessions/{id}/approve` endpoint releases it - `complete` — `data: {"message": str, "tokens_used": int, "tool_calls_count": int}` (`CompleteEvent`) - - `error` — `data: {"error": str, "error_type": str, "recoverable": bool}` (`ErrorEvent`). On `recoverable: false` (e.g., `session_not_found`, `session_expired`), the client should close. + - `error` — `data: {"error": str, "error_type": str, "recoverable": bool}` (`ErrorEvent`). On `recoverable: false` (e.g., `session_not_found`, `session_expired`), the client should close. 
**#335** — when every model in the agent's fallback chain fails with a provider error, the event carries `error_type="fallback_exhausted"`, `recoverable=true`, a human-actionable per-leg summary in `error`, and an additive Optional `failures: [{model_name, status_code: int|null, reason, detail}]` key (`reason` ∈ `model_not_found` / `quota_exhausted` / `auth_error` / `provider_unavailable` / `provider_error` / `response_rejected` / `unknown`; `detail` is secret-scrubbed and 300-char-capped). ## WebSocket Events (`/demo/stream`) From 7c5764118c5a68755cc115130e7c110755144652 Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Thu, 11 Jun 2026 22:02:48 +0200 Subject: [PATCH 14/44] docs(repo): track reliability E2 prp for surfacing fallback failures (#335) --- ...eliability-E2-surface-fallback-failures.md | 647 ++++++++++++++++++ 1 file changed, 647 insertions(+) create mode 100644 PRPs/PRP-reliability-E2-surface-fallback-failures.md diff --git a/PRPs/PRP-reliability-E2-surface-fallback-failures.md b/PRPs/PRP-reliability-E2-surface-fallback-failures.md new file mode 100644 index 00000000..55a7855c --- /dev/null +++ b/PRPs/PRP-reliability-E2-surface-fallback-failures.md @@ -0,0 +1,647 @@ +name: "PRP — Reliability E2: surface fallback model failures with classified, actionable details" +description: | + Parallel epic of umbrella #380 (platform reliability hardening), after Foundation E1 (#334). + Issue: #335 · Branch: `fix/agents-surface-fallback-failures` off `dev` · Commit scope: `agents,api` + (primary surface is `app/features/agents/`; the additive RFC 7807 extension plumbing touches + `app/core/{exceptions,problem_details}.py` = `api`, mirroring E1's scope reasoning). 
+ +--- + +## Goal + +When every model in the PydanticAI `FallbackModel` chain fails (or a single configured model +fails with a provider error), the client must receive a **classified, secret-safe summary of +each per-model failure** — `{model_name, status_code, reason, detail}` — instead of today's +generic `Stream error: All models from FallbackModel failed (2 sub-exceptions)`: + +- **WebSocket `/agents/stream`** — one `error` StreamEvent with `error_type="fallback_exhausted"`, + a human-actionable `error` summary string, and a structured `failures` list. +- **REST `POST /agents/sessions/{id}/chat`** — a **502** `application/problem+json` with + `code="AGENT_FALLBACK_EXHAUSTED"` and a `failures` extension member. + +**Deliverable:** one new classifier module (`app/features/agents/failures.py`), one new schema +(`ModelFailureDetail`), two new `except` arms in `AgentService.chat` / `stream_chat`, one new +core exception (`AgentFallbackExhaustedError`) riding a new additive `extensions` pass-through +in the RFC 7807 helpers, plus tests at classifier / service / route levels. + +**Success definition:** the exact failure from the issue (primary `404` model-not-found + +fallback `429` quota-exhausted) renders in the chat UI as a readable two-leg diagnosis with +zero frontend changes, and a route test proves the REST 502 carries both classified legs. +No secret-like material (API keys, bearer tokens) can appear in any surfaced payload. + +## Why + +- **Diagnosability from the UI.** The 2026-06-01 incident (issue #335) required reading + container logs to learn that the primary leg was a 404 (bad model name) and the fallback leg + a 429 (free-tier quota). Both causes were sitting in `agents.websocket_stream_error`; the + client got an opaque one-liner. 
+- **E1 (#334) stabilized the surface.** The doubled-prefix 404 class is now rejected at config + time (PR #382), so the classification matrix built here tests against a stable failure + surface (umbrella #380 Foundation ordering). +- **Zero-frontend-change win.** `frontend/src/pages/chat.tsx:95-108` renders + `Error: ${event.data.error}` verbatim — making the backend's `error` string itself the + classified human summary upgrades the UI for free; the structured `failures` list is the + additive machine-readable layer for future UI work. + +## What + +### Behavior change + +| Surface | Today | After | +|---------|-------|-------| +| WS `error` event, both models fail | `error="Stream error: All models from FallbackModel failed (2 sub-exceptions)"`, `error_type="ExceptionGroup"`-ish class name (from `websocket.py` generic catch) | `error_type="fallback_exhausted"`, `error="All configured agent models failed — google-gla:gemini-3-flash-preview: model not found / invalid model name (HTTP 404); google-gla:gemini-2.5-flash: quota or rate limit exhausted (HTTP 429)"`, `failures=[{model_name, status_code, reason, detail}, …]`, `recoverable=true` | +| REST chat, both models fail | uncaught `FallbackExceptionGroup` → generic 500 `INTERNAL_ERROR` problem+json | **502** problem+json, `code="AGENT_FALLBACK_EXHAUSTED"`, `type="/errors/agent-fallback-exhausted"`, `detail=<same human summary>`, `failures=[…]` extension | +| Single-model config (no fallback wired), provider error | same generic surfaces | same classified treatment (a bare `ModelAPIError` is classified as a 1-element `failures` list) | +| Model misbehavior (`UnexpectedModelBehavior`) | salvage → friendly message / `error_type="model_behavior_error"` | **unchanged** — the new arm catches only provider-API failures | +| Secrets in provider response bodies | `str(ModelHTTPError)` embeds `body` verbatim (leak risk if echoed) | surfaced `detail` is extracted → scrubbed (`AIza…`, `sk-…`, `Bearer …`, `api_key=…`) → 
truncated to 300 chars | + +### Reason classification (exact) + +| Evidence | `reason` | +|----------|----------| +| `ModelHTTPError.status_code == 404` | `model_not_found` | +| `status_code == 429` | `quota_exhausted` | +| `status_code in (401, 403)` | `auth_error` | +| `status_code >= 500` | `provider_unavailable` | +| any other `ModelHTTPError` | `provider_error` | +| non-HTTP `ModelAPIError` (connection, etc.) | `provider_error` (status_code `null`) | +| `pydantic_ai.models.fallback.ResponseRejected` member | `response_rejected` | +| anything else inside the group | `unknown` | + +### Success Criteria + +- [ ] `classify_model_failures` maps 404/429/401/403/5xx/other-HTTP/non-HTTP/`ResponseRejected`/unknown and recurses into nested `ExceptionGroup`s +- [ ] Stream path: a `FallbackExceptionGroup(404 + 429)` raised by `agent.run_stream` yields exactly ONE `error` event with `error_type="fallback_exhausted"`, `recoverable=True`, a 2-entry `failures` list, and a summary naming both models — and the raw group string (`"sub-exceptions"`) does NOT appear +- [ ] REST path: the same failure → 502 `application/problem+json` with `code="AGENT_FALLBACK_EXHAUSTED"` and `failures` extension (route test covers both legs — umbrella #380 criterion) +- [ ] A planted secret (`AIzaFakeKey123…` / `sk-fake…` / `Bearer xyz`) in `ModelHTTPError.body` never appears in any serialized event/response payload (regression test asserts on the full JSON dump) +- [ ] Single bare `ModelAPIError` (no FallbackModel) gets the same classified treatment +- [ ] Existing `model_behavior_error` behavior and tests untouched (only extended) +- [ ] All five validation gates green; `docs/_base/API_CONTRACTS.md` updated additively + +## All Needed Context + +### Documentation & References + +```yaml +# ── Where the failures escape today (the two catch points to add) ──────────── +- file: app/features/agents/service.py + lines: 24-26, 295-354, 520-570, 693-771 + why: | + Imports (line 25 already pulls 
UnexpectedModelBehavior from pydantic_ai.exceptions — + extend it). chat(): the try at 298-308 wraps agent.run; excepts at 309 (TimeoutError) + and 313 (UnexpectedModelBehavior) — the NEW arm slots between them. stream_chat(): + try at 525 wraps run_stream (533, streaming) AND agent.run (560-568, #342 ollama + non-streaming fallback) — one new arm covers both; excepts at 693/697; the + misbehavior error-yield at 759-770 is the EXACT yield pattern to mirror + (data dict with error/error_type/recoverable, datetime.now(UTC) timestamp, + session.last_activity update + db.flush() before yielding, then `return`). + +# ── The generic backstop that produced the bad UX (do NOT remove — keep as backstop) +- file: app/features/agents/websocket.py + lines: 96-123, 132-158 + why: | + The `except Exception` at 109-123 is what stringified the group today + (f"Stream error: {e}", error_type=type(e).__name__) and logged + "agents.websocket_stream_error". After this PRP the service yields the classified + event BEFORE the exception reaches here; the handler stays as the backstop for + everything else. NO changes in this file. + +# ── Schema home for the new detail model + additive ErrorEvent field ───────── +- file: app/features/agents/schemas.py + lines: 145-163, 229-248, 304-316 + why: | + ChatResponse (no error field — REST errors go through problem+json, NOT this model), + StreamEvent (data is dict[str, Any] — the failures list rides inside data), + ErrorEvent (error/error_type/recoverable) — add Optional `failures` here so the + documented event shape matches what the service emits. Define ModelFailureDetail + in this file (schemas.py is the slice's schema home). 
+ +# ── FallbackModel construction (read-only — explains when a group vs bare error escapes) +- file: app/features/agents/agents/base.py + lines: 168-176, 201-249 + why: | + build_agent_model_with_fallback returns a bare primary model when no distinct + key-backed fallback exists (→ bare ModelAPIError escapes, no group) and + FallbackModel(primary, fallback) otherwise (→ FallbackExceptionGroup escapes when + BOTH legs fail). reset_agent_caches (168) is why PATCH /config/ai applies live — + used by the Level-3 plan. + +# ── RFC 7807 plumbing: the precedent and the two additive core edits ───────── +- file: app/core/exceptions.py + lines: 27-61, 227-254, 262-290 + why: | + ForecastLabError base (gains optional `extensions` kwarg; note `details` is + LOG-ONLY — the handler at 279-288 drops it from the response body, which is WHY + the new extensions channel exists). EmbeddingProviderAuthError (227-254) is the + EXACT precedent to mirror for AgentFallbackExhaustedError: module-level code + constant, error_type_uri from ERROR_TYPES, fixed status 502, narrow __init__. + forecastlab_exception_handler (262-290) passes title=exc.title (derived from code: + "AGENT_FALLBACK_EXHAUSTED" → "Agent Fallback Exhausted") — add extensions pass-through. +- file: app/core/problem_details.py + lines: 28-46, 54-114, 135-199 + why: | + EMBEDDING_AUTH_CODE constant pattern (30) + ERROR_TYPES dict (32-46) — add + AGENT_FALLBACK_EXHAUSTED. ProblemDetail has ConfigDict(extra="allow") (RFC 7807 + extension members are sanctioned). problem_response (169-199) serializes via + model_dump(exclude_none=True) — merge extensions into the serialized dict there + (NOT via ProblemDetail(**extensions); see gotcha on the mypy/pydantic-plugin trap). 
+ +# ── Test patterns to mirror (extend, never weaken) ─────────────────────────── +- file: app/features/agents/tests/test_service.py + lines: 426-480 + why: | + test_stream_chat_model_misbehavior_yields_error_event — THE pattern for the new + stream test: AgentService() + monkeypatch settings.agent_default_model to + "anthropic:claude-test" (line 444 — pins the run_stream path, #342), mock_db AsyncMock with + scalar_one_or_none → sample_active_session fixture, _RaisingStream async CM that + raises on __aenter__, patch.object(service, "_get_agent"), collect events, assert + on events[0].data. Note it asserts the LITERAL "model_behavior_error" (line 478) — + error_type strings are load-bearing; pick "fallback_exhausted" once and keep it stable. +- file: app/features/agents/tests/test_routes.py + lines: 1-60 + why: | + Route tests are @pytest.mark.integration (real Postgres via conftest `client` + fixture). Pattern: create a session via POST /agents/sessions with the agent + factory patched, then exercise the endpoint. The new 502 test patches + AgentService.chat (or _get_agent with an agent whose run raises the group) and + asserts status/content-type/code/failures on the problem+json body. +- file: app/features/agents/tests/conftest.py + why: sample_active_session fixture used by the service tests; client fixture for routes. + +# ── Frontend consumer (READ-ONLY — proves no frontend change is needed) ────── +- file: frontend/src/pages/chat.tsx + lines: 95-108 + why: | + case 'error' renders `Error: ${event.data.error}` verbatim into the transcript. + The human summary string IS the UI improvement. AgentStreamEvent.data is + Record<string, unknown> (frontend/src/types/api.ts:601-605) so the additive + failures key needs no type change. 
+ +# ── External references (verified against installed pydantic-ai 1.96.0, 2026-06-11) +- url: https://pydantic.dev/docs/ai/models/overview/ + section: "Fallback Model" + why: FallbackModel semantics — falls back on ModelAPIError; raises FallbackExceptionGroup when all legs fail +- url: https://pydantic.dev/docs/ai/api/pydantic-ai/exceptions/ + why: ModelHTTPError / ModelAPIError / FallbackExceptionGroup API reference +- url: https://docs.python.org/3/library/exceptions.html#exception-groups + why: | + ExceptionGroup.exceptions is a TUPLE; sub-groups can nest — the classifier must + recurse. A plain `except FallbackExceptionGroup:` works (it subclasses Exception); + `except*` syntax is NOT needed and would complicate the single-yield contract. +``` + +### Current Codebase tree (relevant subset) + +``` +app/core/ + exceptions.py # ForecastLabError + handler ← MODIFY (additive) + problem_details.py # ERROR_TYPES + problem_response ← MODIFY (additive) + tests/ # (no problem_details test file today) ← ADD test file +app/features/agents/ + agents/base.py # FallbackModel construction (read-only) + service.py # chat() / stream_chat() except arms ← MODIFY + websocket.py # generic backstop (no change) + schemas.py # StreamEvent / ErrorEvent ← MODIFY (additive) + routes.py # chat endpoint (no change — global handler covers 502) + tests/ + test_service.py # stream/chat error tests ← EXTEND + test_routes.py # integration route tests ← EXTEND +docs/_base/API_CONTRACTS.md # WS ErrorEvent + chat endpoint docs ← EXTEND +``` + +### Desired Codebase tree + +``` +app/features/agents/failures.py # NEW — classify_model_failures / summarize_model_failures / _sanitize +app/features/agents/tests/test_failures.py # NEW — classification matrix + secret-scrub + summary tests +app/core/tests/test_problem_details.py # NEW — extensions merge + reserved-key guard + no-extensions unchanged +``` + +No migration (nothing persisted changes). No frontend changes. No new dependencies. 
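A minimal stand-alone sketch of the two pure behaviors planned for `app/features/agents/failures.py`, assuming only the standard library. `reason_for_status` and `sanitize` are hypothetical names used for illustration; the planned entry points (`classify_model_failures`, `_sanitize`) operate on pydantic-ai exceptions and are not reproduced here. The status mapping follows the reason classification table, and the patterns and 300-character cap follow the secret-scrubbing row of the behavior table.

```python
from __future__ import annotations

import re

# Illustrative stand-ins only; these names are assumptions, not the repo's API.
SECRET_PATTERNS = (
    re.compile(r"AIza[0-9A-Za-z_\-]{10,}"),        # Google-style API keys
    re.compile(r"sk-[A-Za-z0-9_\-]{10,}"),         # sk-prefixed keys
    re.compile(r"(?i)bearer\s+[A-Za-z0-9._\-]+"),  # bearer tokens
    re.compile(r"(?i)(api[_-]?key|token|authorization)[=:]\s*\S+"),
)
MAX_DETAIL_LEN = 300


def sanitize(text: str) -> str:
    """Redact secret-like substrings first, then cap the length."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[redacted]", text)
    return text[:MAX_DETAIL_LEN]


def reason_for_status(status_code: int | None) -> str:
    """Map an HTTP status (None for non-HTTP provider errors) to a reason."""
    if status_code is None:
        return "provider_error"
    if status_code == 404:
        return "model_not_found"
    if status_code == 429:
        return "quota_exhausted"
    if status_code in (401, 403):
        return "auth_error"
    if status_code >= 500:
        return "provider_unavailable"
    return "provider_error"


print(reason_for_status(404), reason_for_status(429), reason_for_status(None))
print(sanitize("RESOURCE_EXHAUSTED key AIzaFakeKey123456789"))
```

Redacting before truncating matters in this sketch: a secret that straddles the 300-character cap is replaced whole rather than cut in half and partially exposed.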
+ +### Known Gotchas & Library Quirks + +```python +# ── VERIFIED LIBRARY CLAIM #1: the exception family (pydantic-ai 1.96.0) ────────────── +# uv run python -c " +# from pydantic_ai.exceptions import FallbackExceptionGroup, ModelHTTPError, ModelAPIError +# print(FallbackExceptionGroup.__mro__) # → ExceptionGroup → BaseExceptionGroup → Exception +# print(ModelHTTPError.__mro__) # → ModelAPIError → AgentRunError → RuntimeError +# import inspect; print(inspect.signature(ModelHTTPError.__init__))" +# # → (self, status_code: 'int', model_name: 'str', body: 'object | None' = None) +# ModelHTTPError IS a ModelAPIError → FallbackModel's default fallback_on=(ModelAPIError,) +# catches it per-leg; the group only escapes when ALL legs fail. Re-verify on upgrade. + +# ── VERIFIED LIBRARY CLAIM #2: group anatomy ────────────────────────────────────────── +# uv run python -c " +# from pydantic_ai.exceptions import FallbackExceptionGroup, ModelHTTPError +# g = FallbackExceptionGroup('All models from FallbackModel failed', +# [ModelHTTPError(404, 'm1'), ModelHTTPError(429, 'm2')]) +# print(type(g.exceptions), g.message)" +# # → <class 'tuple'> All models from FallbackModel failed +# .exceptions is an immutable TUPLE (not list). The constructor REJECTS an empty list. +# The message literal 'All models from FallbackModel failed' is what users saw — assert +# it does NOT leak into the new surfaced error string. + +# ── VERIFIED LIBRARY CLAIM #3: str(ModelHTTPError) embeds body VERBATIM ─────────────── +# uv run python -c " +# from pydantic_ai.exceptions import ModelHTTPError +# print(str(ModelHTTPError(404, 'gemini-x', body={'error': {'message': 'nope'}})))" +# # → status_code: 404, model_name: gemini-x, body: {'error': {'message': 'nope'}} +# NEVER put str(exc) or exc.body raw into a client payload. Extract the provider message +# (Google/OpenAI shape body['error']['message'], else str(body)), scrub, truncate (300). 
+# Issue #335 hard constraint: no API keys / Bearer tokens / AIza… values, ever. + +# ── VERIFIED LIBRARY CLAIM #4: ResponseRejected can be a group member ───────────────── +# uv run python -c " +# from pydantic_ai.models.fallback import ResponseRejected; print(str(ResponseRejected(2)))" +# # → 2 model response(s) rejected by fallback_on handler +# It carries NO model_name → classify with model_name="(response rejected)" or similar +# deterministic placeholder, reason="response_rejected". + +# ── GOTCHA: classification arm placement & non-overlap ──────────────────────────────── +# UnexpectedModelBehavior is NOT a ModelAPIError (separate AgentRunError branches), so +# `except (FallbackExceptionGroup, ModelAPIError) as e:` cannot shadow the existing +# misbehavior arm. Place the new arm AFTER TimeoutError, BEFORE UnexpectedModelBehavior +# in BOTH chat() and stream_chat(). Do NOT attempt _salvage_* in the new arm — nothing +# ran, there is nothing to salvage. + +# ── GOTCHA: inner `except Exception` at service.py:545 ─────────────────────────────── +# stream_text() iteration errors are swallowed by an inner handler (structured-output +# agents can't stream deltas); a mid-stream provider failure re-raises from +# result.get_output() and still hits the OUTER except arms. Put the new arm on the +# OUTER try only — do not touch the inner handler. + +# ── GOTCHA: forecastlab_exception_handler DROPS exc.details from the response ───────── +# app/core/exceptions.py:279-288 logs details but problem_response never receives them. +# That is BY DESIGN (details may carry internals). Do NOT stuff failures into details — +# add the parallel `extensions` channel (default None ⇒ zero behavior change for every +# existing raiser) and pass it through explicitly. 
+ +# ── GOTCHA: merge extensions on the SERIALIZED dict, not via ProblemDetail(**ext) ────── +# ProblemDetail has extra="allow", but unpacking arbitrary **dict[str, Any] into a +# pydantic-plugin-checked constructor risks mypy/pyright --strict errors. problem_response +# already does problem.model_dump(exclude_none=True) — update THAT dict, guarded by a +# reserved-key frozenset {type,title,status,detail,instance,errors,code,request_id}. + +# ── GOTCHA: error_type strings are load-bearing test/UI contracts ───────────────────── +# test_service.py:478 asserts the literal "model_behavior_error". The new literal is +# "fallback_exhausted" — used in service.py, asserted in tests, documented in +# API_CONTRACTS.md. Pick once; never rename casually. + +# ── GOTCHA: StreamEvent.data must stay JSON-serializable ───────────────────────────── +# websocket.py sends event.model_dump(mode="json"). Put PLAIN DICTS in data: +# failures=[f.model_dump(mode="json") for f in details] — not BaseModel instances. + +# ── GOTCHA: .env bleed + settings singleton (only if a test touches Settings) ───────── +# Service tests monkeypatch service.settings fields (see test_service.py:443) — that +# pattern self-restores. If any new test constructs Settings(...), pass _env_file=None +# (RUNBOOKS incident class). + +# ── GOTCHA: Level-3 mutates the operator's persisted config — snapshot/restore ──────── +# PATCH /config/ai persists to app_config AND applies live (reset_agent_caches, +# config/service.py:214-216). The local operator override is agent_default_model= +# ollama:gemma4-agent — GET /config/ai first, restore the exact values after the curl +# matrix (E1 session precedent). + +# ── GOTCHA: repo has mixed CRLF/LF line endings ─────────────────────────────────────── +# Check `git diff --stat` after editing: if a file shows ~all lines changed, your editor +# rewrote line endings — re-edit preserving the file's existing endings. 
+``` + +## Implementation Blueprint + +### Data models and structure + +```python +# app/features/agents/schemas.py — new model + additive ErrorEvent field + +FailureReason = Literal[ + "model_not_found", "quota_exhausted", "auth_error", + "provider_unavailable", "provider_error", "response_rejected", "unknown", +] + +class ModelFailureDetail(BaseModel): + """One classified per-model failure from a FallbackModel chain (issue #335).""" + model_name: str + status_code: int | None = None + reason: FailureReason + detail: str = "" # sanitized + truncated provider message — NEVER raw body + +class ErrorEvent(BaseModel): + error: str + error_type: str + recoverable: bool = True + failures: list[ModelFailureDetail] | None = None # additive (issue #335) +``` + +```python +# app/features/agents/failures.py — NEW module (pure functions, fully unit-testable) + +_SECRET_PATTERNS = ( + re.compile(r"AIza[0-9A-Za-z_\-]{10,}"), # Google API keys + re.compile(r"sk-[A-Za-z0-9_\-]{10,}"), # OpenAI/Anthropic-style keys + re.compile(r"(?i)bearer\s+[A-Za-z0-9._\-]+"), # Authorization bearer tokens + re.compile(r"(?i)(api[_-]?key|token|authorization)[=:]\s*\S+"), +) +_MAX_DETAIL_LEN = 300 + +def _sanitize(text: str) -> str: + # sub each pattern with "[redacted]", then truncate to _MAX_DETAIL_LEN + +def _provider_message(body: object | None) -> str: + # dict with Google/OpenAI shape → body["error"]["message"]; str → as-is; + # anything else → str(body) or "" — ALWAYS through _sanitize at the call site + +def classify_model_failures(exc: BaseException) -> list[ModelFailureDetail]: + # ExceptionGroup (incl. 
FallbackExceptionGroup): recurse over exc.exceptions (tuple) + # ModelHTTPError: status map per the reason table; detail=_sanitize(_provider_message(body)) + # ResponseRejected: reason="response_rejected", model_name placeholder + # other ModelAPIError: reason="provider_error", status None, detail=_sanitize(str(exc)) + # fallback: reason="unknown", detail=_sanitize(str(exc)) + +def summarize_model_failures(failures: list[ModelFailureDetail]) -> str: + # deterministic (tests match substrings): + # 1 failure → "The configured agent model failed — {leg}" + # n failures → "All configured agent models failed — {leg}; {leg}; …" + # leg = "{model_name}: {human label} (HTTP {status_code})" (omit HTTP part when None) + # labels: model_not_found→"model not found / invalid model name", + # quota_exhausted→"quota or rate limit exhausted", + # auth_error→"authentication/permission error", + # provider_unavailable→"provider unavailable", + # provider_error→"provider error", response_rejected→"response rejected", + # unknown→"unexpected failure" +``` + +```python +# app/core/problem_details.py — additive +AGENT_FALLBACK_EXHAUSTED_CODE = "AGENT_FALLBACK_EXHAUSTED" # next to EMBEDDING_AUTH_CODE +ERROR_TYPES[AGENT_FALLBACK_EXHAUSTED_CODE] = f"{ERROR_TYPE_BASE}/agent-fallback-exhausted" + +_RESERVED_PROBLEM_KEYS = frozenset( + {"type", "title", "status", "detail", "instance", "errors", "code", "request_id"} +) + +def problem_response(..., extensions: dict[str, Any] | None = None) -> ProblemDetailResponse: + content = problem.model_dump(exclude_none=True) + if extensions: + content.update({k: v for k, v in extensions.items() if k not in _RESERVED_PROBLEM_KEYS}) + return ProblemDetailResponse(status_code=status, content=content) +``` + +```python +# app/core/exceptions.py — additive +class ForecastLabError(Exception): + def __init__(self, message, code=..., status_code=..., details=None, + extensions: dict[str, Any] | None = None) -> None: + ... 
+ self.extensions = extensions or {} # RESPONSE-VISIBLE (details stays log-only) + +# handler: problem_response(..., extensions=exc.extensions or None) + +class AgentFallbackExhaustedError(ForecastLabError): + """502 — every model in the agent's fallback chain failed (issue #335). + + Mirrors EmbeddingProviderAuthError: machine-readable code so clients can + classify; carries the per-model failures as an RFC 7807 extension member. + """ + error_type_uri = ERROR_TYPES[AGENT_FALLBACK_EXHAUSTED_CODE] + def __init__(self, message: str, failures: list[dict[str, Any]]) -> None: + super().__init__(message=message, code=AGENT_FALLBACK_EXHAUSTED_CODE, + status_code=502, extensions={"failures": failures}) +``` + +### Tasks (in order) + +```yaml +Task 1: +MODIFY app/features/agents/schemas.py: + - ADD FailureReason Literal alias + ModelFailureDetail near ErrorEvent (line ~304) + - ADD `failures: list[ModelFailureDetail] | None = None` to ErrorEvent + - PRESERVE every existing field and Literal value on StreamEvent/ErrorEvent + +Task 2: +CREATE app/features/agents/failures.py: + - Pure module: _SECRET_PATTERNS, _sanitize, _provider_message, + classify_model_failures, summarize_model_failures per blueprint + - Imports: pydantic_ai.exceptions (ModelAPIError, ModelHTTPError), + pydantic_ai.models.fallback (ResponseRejected), app.features.agents.schemas + - Recursion guard: ExceptionGroup members may nest — recurse; classify leaves only + +Task 3: +MODIFY app/core/problem_details.py: + - ADD AGENT_FALLBACK_EXHAUSTED_CODE constant next to EMBEDDING_AUTH_CODE (line 30) + - ADD ERROR_TYPES entry "/errors/agent-fallback-exhausted" + - ADD optional `extensions` param to problem_response; merge on the serialized dict + guarded by _RESERVED_PROBLEM_KEYS (see gotcha — do NOT ProblemDetail(**extensions)) + - PRESERVE the no-extensions output byte-for-byte (default None) + +Task 4: +MODIFY app/core/exceptions.py: + - ADD optional `extensions` kwarg on ForecastLabError.__init__ (stored 
attribute) + - ADD AgentFallbackExhaustedError mirroring EmbeddingProviderAuthError (lines 227-254) + - MODIFY forecastlab_exception_handler: pass extensions=exc.extensions or None + - PRESERVE: details stays log-only; every existing subclass signature unchanged + +Task 5: +MODIFY app/features/agents/service.py: + - EXTEND import line 25: from pydantic_ai.exceptions import ( + FallbackExceptionGroup, ModelAPIError, UnexpectedModelBehavior) + - ADD import: classify_model_failures, summarize_model_failures from .failures; + AgentFallbackExhaustedError from app.core.exceptions + - chat(): NEW arm between TimeoutError (309) and UnexpectedModelBehavior (313): + except (FallbackExceptionGroup, ModelAPIError) as e: + failures = classify_model_failures(e) + logger.warning("agents.chat_fallback_exhausted", session_id=session_id, + failure_count=len(failures), + reasons=[f.reason for f in failures]) # safe fields only + raise AgentFallbackExhaustedError( + summarize_model_failures(failures), + failures=[f.model_dump(mode="json") for f in failures]) from e + - stream_chat(): NEW arm between TimeoutError (693) and UnexpectedModelBehavior (697), + mirroring the misbehavior tail at 759-770: + except (FallbackExceptionGroup, ModelAPIError) as e: + failures = classify_model_failures(e) + logger.warning("agents.stream_chat_fallback_exhausted", ...) 
# same safe fields + now = datetime.now(UTC); session.last_activity = now; await db.flush() + yield StreamEvent(event_type="error", data={ + "error": summarize_model_failures(failures), + "error_type": "fallback_exhausted", + "recoverable": True, + "failures": [f.model_dump(mode="json") for f in failures], + }, timestamp=now) + return + - PRESERVE: no _salvage_* calls in the new arms; misbehavior arms byte-identical + +Task 6: +CREATE app/features/agents/tests/test_failures.py: + - Classification matrix: parametrize ModelHTTPError statuses + (404→model_not_found, 429→quota_exhausted, 401/403→auth_error, + 500/503→provider_unavailable, 418→provider_error) + - Group of (404 + 429) → 2 details preserving model_name order + - Nested group (group inside group) → flattened leaves + - Bare ModelAPIError (construct a minimal subclass or ModelHTTPError-free instance) + → provider_error, status None + - ResponseRejected member → response_rejected + - Unknown exception → unknown + - Secret scrub: body={"error": {"message": "key AIzaFakeKey1234567890abcdef leaked"}} + → "[redacted]" in detail, "AIza" not in detail; same for "sk-fake…" and "Bearer x.y.z" + - Truncation: 1000-char provider message → len(detail) <= 300 + - summarize_model_failures: exact-substring asserts for 1-leg and 2-leg shapes + +Task 7: +EXTEND app/features/agents/tests/test_service.py: + - TestAgentServiceStreamChat.test_stream_chat_fallback_exhausted_yields_classified_error: + MIRROR test_stream_chat_model_misbehavior_yields_error_event (426-480) exactly, + but _RaisingStream.__aenter__ raises FallbackExceptionGroup( + "All models from FallbackModel failed", + [ModelHTTPError(404, "google-gla:gemini-3-flash-preview", + body={"error": {"message": "models/... is not found"}}), + ModelHTTPError(429, "gemini-2.5-flash", + body={"error": {"message": "RESOURCE_EXHAUSTED ... 
AIzaFakeKey123456789"}})]) + ASSERT: len(events)==1; event_type=="error"; data["error_type"]=="fallback_exhausted"; + data["recoverable"] is True; len(data["failures"])==2; + failures[0]["reason"]=="model_not_found"; failures[1]["reason"]=="quota_exhausted"; + "sub-exceptions" not in data["error"]; + "AIza" not in json.dumps(events[0].model_dump(mode="json")) + - TestAgentServiceStreamChat.test_stream_chat_bare_model_api_error_classified: + same harness, __aenter__ raises ModelHTTPError(401, "anthropic:claude-test") → + 1 error event, failures==1, reason=="auth_error" + - TestAgentServiceChat.test_chat_fallback_exhausted_raises_classified_error: + MIRROR the chat misbehavior test harness; agent.run = AsyncMock(side_effect=<group>); + pytest.raises(AgentFallbackExhaustedError) → exc.status_code==502, + exc.code=="AGENT_FALLBACK_EXHAUSTED", len(exc.extensions["failures"])==2 + +Task 8: +EXTEND app/features/agents/tests/test_routes.py (integration): + - test_chat_fallback_exhausted_returns_502_problem_json: + create session (patched agent factory, existing pattern), then patch the service + agent so run raises the 404+429 group; POST /agents/sessions/{id}/chat → + ASSERT status 502; headers content-type startswith "application/problem+json"; + body["code"]=="AGENT_FALLBACK_EXHAUSTED"; + body["type"].endswith("/errors/agent-fallback-exhausted"); + len(body["failures"])==2 with both reasons; "request_id" present + +Task 9: +CREATE app/core/tests/test_problem_details.py: + - test_problem_response_without_extensions_unchanged: no extensions → body has no + "failures" key; code/type/status as before + - test_problem_response_merges_extensions: extensions={"failures":[{"a":1}]} → in body + - test_problem_response_extensions_cannot_override_reserved: + extensions={"status": 200, "code": "HACK"} → body keeps the real status/code + +Task 10 (docs, same PR): +EXTEND docs/_base/API_CONTRACTS.md: + - WS `/agents/stream` error bullet: document `error_type="fallback_exhausted"` and 
the + additive Optional `failures: [{model_name, status_code, reason, detail}]` data key + - agents chat row: note the 502 AGENT_FALLBACK_EXHAUSTED problem+json (additive) +``` + +### Integration Points + +```yaml +DATABASE: none — nothing persisted changes; no migration +ROUTES: none — REST surface comes via the global ForecastLabError handler (502) +WEBSOCKET: service-level yield only; websocket.py generic handler untouched (backstop) +CONFIG: none — no new settings; no change to agent_require_approval (no new mutation surface) +FRONTEND: none — chat.tsx renders the summary string as-is; failures key is additive +DOCS: docs/_base/API_CONTRACTS.md (Task 10) +``` + +## Validation Loop + +### Level 1: Syntax & Style + +```bash +uv run ruff check app/features/agents/ app/core/ && uv run ruff format --check . +uv run mypy app/ && uv run pyright app/ # both --strict; zero new errors +``` + +### Level 2: Unit tests (no DB) + +```bash +uv run pytest -v \ + app/features/agents/tests/test_failures.py \ + app/features/agents/tests/test_service.py \ + app/core/tests/test_problem_details.py +# Full unit gate — proves misbehavior/salvage paths and every other consumer untouched: +uv run pytest -v -m "not integration" +``` + +### Level 3: Integration (live API; snapshot config FIRST — see gotcha) + +```bash +docker compose up -d +uv run pytest -v -m integration app/features/agents/tests/test_routes.py + +# Live REST leg (fresh uvicorn; snapshot + restore the operator's persisted config!): +curl -s http://localhost:8123/config/ai # SNAPSHOT current model ids +curl -si -X PATCH http://localhost:8123/config/ai -H 'Content-Type: application/json' \ + -d '{"agent_default_model":"openai:gpt-nonexistent-e2","agent_fallback_model":"openai:gpt-also-nonexistent"}' +SID=$(curl -s -X POST http://localhost:8123/agents/sessions \ + -H 'Content-Type: application/json' -d '{"agent_type":"experiment"}' | python3 -c 'import sys,json;print(json.load(sys.stdin)["session_id"])') +curl -si -X POST 
http://localhost:8123/agents/sessions/$SID/chat \ + -H 'Content-Type: application/json' -d '{"message":"hello"}' | head -30 +# expect: HTTP/1.1 502, application/problem+json, code AGENT_FALLBACK_EXHAUSTED, +# failures[] with reason "model_not_found" on both legs +curl -si -X PATCH http://localhost:8123/config/ai -H 'Content-Type: application/json' \ + -d '{"agent_default_model":"<snapshot>","agent_fallback_model":"<snapshot>"}' # RESTORE +``` + +### Level 4 (optional dogfood): chat UI over WebSocket + +With the broken model pair patched in, open `/chat` (localhost:5173), send a message → +the transcript should show the classified two-leg summary (`model not found … (HTTP 404); …`) +instead of `Stream error: All models from FallbackModel failed`. Restore config after. + +## Final Validation Checklist + +- [ ] `uv run ruff check . && uv run ruff format --check .` clean +- [ ] `uv run mypy app/ && uv run pyright app/` clean (strict) +- [ ] `uv run pytest -v -m "not integration"` green — including the untouched + `model_behavior_error` and salvage tests +- [ ] New tests cover: status matrix, nested group, bare ModelAPIError, ResponseRejected, + secret scrub (AIza/sk-/Bearer), truncation, summary shapes, stream 404+429 event, + stream bare-401 event, chat raise, route 502 (both legs), extensions merge + guard +- [ ] `uv run pytest -v -m integration app/features/agents/tests/test_routes.py` green +- [ ] Level-3 curl matrix matches; operator config snapshot RESTORED and re-verified +- [ ] No secret-like string in any serialized payload (asserted on full JSON dumps) +- [ ] `git diff --stat` shows surgical diffs (no whole-file line-ending churn) +- [ ] Commits: `fix(agents,api): surface fallback model failures with classified details (#335)` + (+ `docs(docs): …` for API_CONTRACTS if split); no AI trailers +- [ ] PR into `dev` from `fix/agents-surface-fallback-failures`; CI green + +--- + +## Out of Scope (this PRP) + +- **Frontend failure-detail rendering** 
(chips/expandable list from the `failures` key) — + the summary string already lands in the transcript; promote to its own `feat(ui)` issue + if dogfood demands richer rendering. +- **Retry/circuit-breaker middleware or metrics** — explicitly rejected in umbrella #380 + (violates the no-external-observability / single-host principle). +- **Classifying `UsageLimitExceeded` / `ConcurrencyLimitExceeded`** — pydantic-ai usage-cap + errors, not provider failures; today's behavior (generic backstop) stands. +- **Surfacing agent-BUILD failures** (missing API key → `ValueError` in + `build_agent_model_with_fallback`) — a config-time failure class, already log-visible; + separate concern from run-time provider failure. +- **E6 release-gate dogfood** — umbrella #380's own closing epic. + +## Anti-Patterns to Avoid + +- ❌ Don't put `str(exception)` or `exc.body` raw into any client payload — sanitize-then-truncate only. +- ❌ Don't stuff failures into `ForecastLabError.details` — the handler drops it by design; use `extensions`. +- ❌ Don't use `except*` — a plain `except FallbackExceptionGroup` keeps the single-yield contract simple. +- ❌ Don't touch `websocket.py` — the generic handler is the deliberate backstop. +- ❌ Don't salvage (`_salvage_*`) in the new arms — no model ran; there is nothing to salvage. +- ❌ Don't rename `model_behavior_error` or weaken its tests — extend alongside. +- ❌ Don't widen `agent_require_approval` or any mutation surface — this is read-path-only hardening. +- ❌ Don't forget to RESTORE the operator's persisted `ollama:gemma4-agent` override after Level 3. + +--- + +**One-pass confidence score: 8/10** — every catch point, schema, and precedent is +runtime-verified with exact line anchors, and the classifier is a pure module with a mirrored +test harness. 
Deductions: the stream-test async-CM mocking is fiddly (mitigated by mirroring +test_service.py:426-480 verbatim), and the `extensions` merge must dodge the +pydantic-plugin/strict-mypy trap (mitigated by the serialized-dict merge decision). From a6907c61dabcbcfa1715f31e1a161f3ac73ac051 Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Thu, 11 Jun 2026 22:08:18 +0200 Subject: [PATCH 15/44] test(agents): cover review feedback on failure classification edges (#335) --- app/core/tests/test_problem_details.py | 33 +++++++++++++++++++++ app/features/agents/tests/test_failures.py | 20 +++++++++++++ app/features/agents/tests/test_schemas.py | 34 ++++++++++++++++++++++ app/features/agents/tests/test_service.py | 2 ++ 4 files changed, 89 insertions(+) diff --git a/app/core/tests/test_problem_details.py b/app/core/tests/test_problem_details.py index af00213f..9db1673d 100644 --- a/app/core/tests/test_problem_details.py +++ b/app/core/tests/test_problem_details.py @@ -8,6 +8,13 @@ import json from typing import Any +import pytest +from fastapi import Request + +from app.core.exceptions import ( + AgentFallbackExhaustedError, + forecastlab_exception_handler, +) from app.core.problem_details import problem_response @@ -75,3 +82,29 @@ def test_problem_response_extensions_cannot_override_reserved() -> None: assert body["type"] == "/errors/agent-fallback-exhausted" assert body["title"] == "Agent Fallback Exhausted" assert body["safe_key"] == "kept" + + +@pytest.mark.asyncio +async def test_exception_handler_propagates_extensions() -> None: + """The full exception → handler → problem+json path carries extensions. + + Guards the wiring: ForecastLabError.extensions must reach the response + body via forecastlab_exception_handler's pass-through (issue #335). 
+ """ + failures = [ + {"model_name": "m1", "status_code": 404, "reason": "model_not_found", "detail": ""}, + {"model_name": "m2", "status_code": 429, "reason": "quota_exhausted", "detail": ""}, + ] + exc = AgentFallbackExhaustedError("All configured agent models failed", failures=failures) + request = Request(scope={"type": "http", "method": "POST", "path": "/", "headers": []}) + + response = await forecastlab_exception_handler(request, exc) + + body = _body(response) + assert response.status_code == 502 + assert body["status"] == 502 + assert body["code"] == "AGENT_FALLBACK_EXHAUSTED" + assert body["type"] == "/errors/agent-fallback-exhausted" + assert body["title"] == "Agent Fallback Exhausted" + assert body["detail"] == "All configured agent models failed" + assert body["failures"] == failures diff --git a/app/features/agents/tests/test_failures.py b/app/features/agents/tests/test_failures.py index 386fc969..cf5a218c 100644 --- a/app/features/agents/tests/test_failures.py +++ b/app/features/agents/tests/test_failures.py @@ -73,6 +73,26 @@ def test_nested_group_flattens_leaves(self) -> None: assert [f.model_name for f in failures] == ["outer:model", "inner:model"] assert [f.reason for f in failures] == ["model_not_found", "quota_exhausted"] + def test_mixed_group_classifies_unknown_members(self) -> None: + """A group mixing known and unexpected members flattens in order, + classifying the unexpected member as unknown.""" + group = FallbackExceptionGroup( + "All models from FallbackModel failed", + [ + ModelHTTPError(404, "google-gla:gemini-3-flash-preview"), + RuntimeError("boom"), + ], + ) + + failures = classify_model_failures(group) + + assert len(failures) == 2 + assert failures[0].model_name == "google-gla:gemini-3-flash-preview" + assert failures[0].reason == "model_not_found" + assert failures[1].reason == "unknown" + assert failures[1].status_code is None + assert "boom" in failures[1].detail + def test_bare_model_api_error_is_provider_error(self) -> 
None: """A non-HTTP ModelAPIError (connection failure) → provider_error, no status.""" failures = classify_model_failures( diff --git a/app/features/agents/tests/test_schemas.py b/app/features/agents/tests/test_schemas.py index 7294a294..9c50ea31 100644 --- a/app/features/agents/tests/test_schemas.py +++ b/app/features/agents/tests/test_schemas.py @@ -12,8 +12,10 @@ ChatMessage, ChatRequest, ChatResponse, + ErrorEvent, ExperimentPlan, ExperimentReport, + ModelFailureDetail, PendingAction, RAGAnswer, SessionCreateRequest, @@ -304,6 +306,38 @@ def test_error_event(self) -> None: assert event.event_type == "error" +class TestErrorEvent: + """Tests for the ErrorEvent schema (failures added by issue #335).""" + + def test_non_fallback_error_has_no_failures(self) -> None: + """Non-fallback error types must keep failures None in serialized output.""" + event = ErrorEvent( + error="The assistant produced an invalid tool call.", + error_type="model_behavior_error", + ) + + serialized = event.model_dump(mode="json") + assert serialized.get("failures") is None + + def test_fallback_exhausted_carries_failures(self) -> None: + """fallback_exhausted events carry the classified per-model failures.""" + event = ErrorEvent( + error="All configured agent models failed", + error_type="fallback_exhausted", + failures=[ + ModelFailureDetail( + model_name="google-gla:gemini-3-flash-preview", + status_code=404, + reason="model_not_found", + ) + ], + ) + + serialized = event.model_dump(mode="json") + assert serialized["failures"][0]["reason"] == "model_not_found" + assert serialized["failures"][0]["status_code"] == 404 + + class TestExperimentPlan: """Tests for ExperimentPlan schema.""" diff --git a/app/features/agents/tests/test_service.py b/app/features/agents/tests/test_service.py index cf751bf2..612783cb 100644 --- a/app/features/agents/tests/test_service.py +++ b/app/features/agents/tests/test_service.py @@ -534,6 +534,8 @@ async def __aexit__(self, *exc: object) -> bool: assert 
events[0].data["recoverable"] is True assert events[0].data["error_type"] == "model_behavior_error" assert "exceeded max retries" not in events[0].data["error"] + # failures is exclusive to fallback_exhausted events (#335). + assert "failures" not in events[0].data @pytest.mark.asyncio async def test_stream_chat_fallback_exhausted_yields_classified_error( From 1482144124073347e014e2e8a144095a348d6890 Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Thu, 11 Jun 2026 22:43:19 +0200 Subject: [PATCH 16/44] fix(ui): avoid crypto.randomUUID crash on lan http showcase (#332) --- frontend/eslint.config.js | 15 +++++++ .../components/demo/RunHistoryStrip.test.tsx | 24 +++++++++++ .../src/components/demo/RunHistoryStrip.tsx | 3 +- frontend/src/lib/uuid-utils.test.ts | 41 +++++++++++++++++++ frontend/src/lib/uuid-utils.ts | 31 ++++++++++++++ 5 files changed, 113 insertions(+), 1 deletion(-) create mode 100644 frontend/src/lib/uuid-utils.test.ts create mode 100644 frontend/src/lib/uuid-utils.ts diff --git a/frontend/eslint.config.js b/frontend/eslint.config.js index 8f1e12ab..75881f96 100644 --- a/frontend/eslint.config.js +++ b/frontend/eslint.config.js @@ -27,4 +27,19 @@ export default defineConfig([ 'react-refresh/only-export-components': 'off', }, }, + // #332 — crypto.randomUUID is undefined outside secure contexts (plain-HTTP LAN). + { + files: ['**/*.{ts,tsx}'], + rules: { + 'no-restricted-properties': [ + 'error', + { + object: 'crypto', + property: 'randomUUID', + message: + 'crypto.randomUUID is undefined outside secure contexts (plain-HTTP LAN origins). Use safeRandomUUID() from @/lib/uuid-utils instead. 
(#332)', + }, + ], + }, + }, ]) diff --git a/frontend/src/components/demo/RunHistoryStrip.test.tsx b/frontend/src/components/demo/RunHistoryStrip.test.tsx index ab5d41c0..5f74e422 100644 --- a/frontend/src/components/demo/RunHistoryStrip.test.tsx +++ b/frontend/src/components/demo/RunHistoryStrip.test.tsx @@ -8,6 +8,7 @@ const STORAGE_KEY = 'forecastlab.showcase.runs.v1' afterEach(() => { cleanup() window.localStorage.clear() + vi.unstubAllGlobals() }) beforeEach(() => { @@ -84,6 +85,29 @@ describe('RunHistoryStrip', () => { ) }) + it('appends a history entry without crashing when crypto.randomUUID is unavailable (#332)', () => { + // Non-secure contexts (plain-HTTP LAN origins) expose getRandomValues but + // NOT randomUUID. jsdom's crypto has randomUUID, so the LAN shape must be + // stubbed explicitly — an unstubbed render passes even against the bug. + const realGetRandomValues = globalThis.crypto.getRandomValues.bind(globalThis.crypto) + vi.stubGlobal('crypto', { + getRandomValues: realGetRandomValues, + } as unknown as Crypto) + + const { container } = render( + <RunHistoryStrip onReplay={() => {}} summary={summary} scenario="showcase_rich" />, + ) + + expect(container.textContent).toContain('showcase_rich') + const stored = window.localStorage.getItem(STORAGE_KEY) + expect(stored).not.toBeNull() + const items = JSON.parse(stored!) 
+ expect(items).toHaveLength(1) + expect(items[0].id).toMatch( + /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/, + ) + }) + it('Clear button empties history + localStorage', () => { const { container } = render( <RunHistoryStrip onReplay={() => {}} summary={summary} scenario="demo_minimal" />, diff --git a/frontend/src/components/demo/RunHistoryStrip.tsx b/frontend/src/components/demo/RunHistoryStrip.tsx index fce287ba..39addd31 100644 --- a/frontend/src/components/demo/RunHistoryStrip.tsx +++ b/frontend/src/components/demo/RunHistoryStrip.tsx @@ -14,6 +14,7 @@ import { useCallback, useEffect, useState } from 'react' import { Button } from '@/components/ui/button' import { Card, CardContent } from '@/components/ui/card' +import { safeRandomUUID } from '@/lib/uuid-utils' import type { DemoRunRequest, ScenarioPreset } from '@/types/api' import type { DemoSummary } from '@/hooks/use-demo-pipeline' @@ -72,7 +73,7 @@ export function RunHistoryStrip({ onReplay, summary, scenario }: RunHistoryStrip setItems((prev) => [ { - id: crypto.randomUUID(), + id: safeRandomUUID(), runId: summary.winningRunId, timestamp: new Date().toISOString(), scenario, diff --git a/frontend/src/lib/uuid-utils.test.ts b/frontend/src/lib/uuid-utils.test.ts new file mode 100644 index 00000000..32373630 --- /dev/null +++ b/frontend/src/lib/uuid-utils.test.ts @@ -0,0 +1,41 @@ +import { afterEach, describe, expect, it, vi } from 'vitest' +import { safeRandomUUID } from './uuid-utils' + +const V4_REGEX = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/ + +afterEach(() => { + vi.unstubAllGlobals() +}) + +describe('safeRandomUUID', () => { + it('delegates to crypto.randomUUID when available', () => { + vi.stubGlobal('crypto', { + randomUUID: vi.fn(() => 'fixed-uuid'), + } as unknown as Crypto) + + expect(safeRandomUUID()).toBe('fixed-uuid') + }) + + it('falls back to getRandomValues v4 when randomUUID is missing (LAN-HTTP shape)', () => { + // The real 
plain-HTTP LAN shape: getRandomValues present, randomUUID absent (#332). + vi.stubGlobal('crypto', { + getRandomValues: globalThis.crypto.getRandomValues.bind(globalThis.crypto), + } as unknown as Crypto) + + const first = safeRandomUUID() + const second = safeRandomUUID() + expect(first).toMatch(V4_REGEX) + expect(second).toMatch(V4_REGEX) + expect(first).not.toBe(second) + }) + + it('falls back to Math.random v4 when crypto is entirely absent', () => { + vi.stubGlobal('crypto', undefined) + + const first = safeRandomUUID() + const second = safeRandomUUID() + expect(first).toMatch(V4_REGEX) + expect(second).toMatch(V4_REGEX) + expect(first).not.toBe(second) + }) +}) diff --git a/frontend/src/lib/uuid-utils.ts b/frontend/src/lib/uuid-utils.ts new file mode 100644 index 00000000..81f6e3ce --- /dev/null +++ b/frontend/src/lib/uuid-utils.ts @@ -0,0 +1,31 @@ +/** + * #332 — crypto.randomUUID() exists only in secure contexts (HTTPS or localhost). + * On a plain-HTTP LAN origin (the showcase dogfood setup) it is undefined and a direct + * call TypeErrors. crypto.getRandomValues is NOT secure-context-gated, so the fallback + * keeps cryptographically-strong entropy; Math.random is a last resort for environments + * with no Web Crypto at all (ids here are React keys / history ids, not security tokens). + */ +export function safeRandomUUID(): string { + // eslint-disable-next-line no-restricted-properties -- feature-detecting the restricted member + if (typeof crypto !== 'undefined' && typeof crypto.randomUUID === 'function') { + // eslint-disable-next-line no-restricted-properties -- the one sanctioned call site + return crypto.randomUUID() + } + if (typeof crypto !== 'undefined' && typeof crypto.getRandomValues === 'function') { + const bytes = new Uint8Array(16) + crypto.getRandomValues(bytes) + bytes[6] = ((bytes[6] ?? 0) & 0x0f) | 0x40 // version 4 + bytes[8] = ((bytes[8] ?? 
0) & 0x3f) | 0x80 // variant 10xx + const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('') + return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}` + } + // No Web Crypto at all — uniqueness only, not cryptographic strength. + let uuid = '' + for (let i = 0; i < 36; i++) { + if (i === 8 || i === 13 || i === 18 || i === 23) uuid += '-' + else if (i === 14) uuid += '4' + else if (i === 19) uuid += (((Math.random() * 4) | 0) | 8).toString(16) // 8,9,a,b + else uuid += ((Math.random() * 16) | 0).toString(16) + } + return uuid +} From 3300d67f5f0b928299386e4e1c8a85290524e69e Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Thu, 11 Jun 2026 22:43:19 +0200 Subject: [PATCH 17/44] docs(repo): track reliability E3 prp for safe uuid fallback (#332) --- ...ability-E3-safe-uuid-non-secure-context.md | 438 ++++++++++++++++++ 1 file changed, 438 insertions(+) create mode 100644 PRPs/PRP-reliability-E3-safe-uuid-non-secure-context.md diff --git a/PRPs/PRP-reliability-E3-safe-uuid-non-secure-context.md b/PRPs/PRP-reliability-E3-safe-uuid-non-secure-context.md new file mode 100644 index 00000000..73466b3b --- /dev/null +++ b/PRPs/PRP-reliability-E3-safe-uuid-non-secure-context.md @@ -0,0 +1,438 @@ +name: "PRP — Reliability E3: safe-UUID fallback for non-secure contexts (Showcase LAN-HTTP white-screen)" +description: | + Parallel epic of umbrella #380 (platform reliability hardening), after Foundation E1 (#334). + Issue: #332 · Branch: `fix/ui-safe-uuid-non-secure-context` off `dev` · Commit scope: `ui` + (frontend-only — the single mutation surface is `frontend/src/`; zero backend changes). + +--- + +## Goal + +On any plain-HTTP LAN origin (e.g. `http://10.0.0.226:5173/showcase`), the Showcase page must +survive pipeline completion instead of white-screening. 
Today `RunHistoryStrip.tsx:75` calls +`crypto.randomUUID()` directly; outside a secure context (`window.isSecureContext === false`) +that property is `undefined`, the `TypeError: crypto.randomUUID is not a function` throws +**during render** (the append happens in the render-phase `setItems` updater), and React +unmounts the whole tree → blank page while the backend pipeline keeps completing fine. + +**Deliverable:** +1. A new shared utility `frontend/src/lib/uuid-utils.ts` exporting `safeRandomUUID(): string` — + delegates to `crypto.randomUUID()` when available, falls back to an RFC-4122-v4 generator + built on `crypto.getRandomValues()` (which is NOT secure-context-gated), with a final + `Math.random()`-based last resort if Web Crypto is absent entirely. +2. A colocated vitest suite `frontend/src/lib/uuid-utils.test.ts`. +3. `RunHistoryStrip.tsx:75` switched to `safeRandomUUID()` + a non-secure-context regression + case added to `RunHistoryStrip.test.tsx`. +4. An ESLint `no-restricted-properties` guard in `frontend/eslint.config.js` that bans direct + `crypto.randomUUID` member access repo-wide (exempting the helper itself), so the bug class + cannot silently come back at a future call site. + +**Success definition:** a full showcase run over a LAN-IP HTTP origin completes and the +"Recent runs" strip renders the new entry (manual dogfood, umbrella #380 risk row: "LAN-HTTP +behavior not covered by CI"); vitest covers both helper paths and the component regression; +`pnpm tsc --noEmit && pnpm lint && pnpm test --run` all green. + +## Why + +- **Demo-killer with a silent failure mode.** Anyone demoing the showcase from another device + (the whole point of the LAN setup) sees the page freeze/blank at the finish line and reads it + as "the pipeline hung at step 21" — the backend actually succeeded (runs registered, ops + snapshot computed, scenario plans persisted). 
Diagnosed in issue #332 via Playwright: + `localhost` → `isSecureContext=true` (works); LAN IP → `isSecureContext=false` (crashes). +- **Umbrella #380 acceptance** explicitly lists: "`/showcase` completes a full run over + plain-HTTP LAN origin without white-screen; UUID fallback util has a vitest (#332 closed)". +- **One-line bug, repo-wide guard.** The audit found exactly ONE call site today, but nothing + stops the next feature from re-introducing `crypto.randomUUID()`. The lint guard mirrors the + repo's "the test is the spec" guard-rail ethos (cf. `test_leakage.py`, + `test_strict_mode_policy.py`) at the cheapest possible layer. + +## What + +### Behavior change + +| Surface | Today | After | +|---------|-------|-------| +| `/showcase` on `http://<lan-ip>:5173` at `pipeline_complete` | `TypeError: crypto.randomUUID is not a function` thrown in render → React tree unmounts → white screen | `safeRandomUUID()` falls back to the `getRandomValues` v4 generator → history entry appended, "Recent runs" strip renders | +| `/showcase` on `http://localhost:5173` or HTTPS | works (`crypto.randomUUID` defined) | **unchanged** — helper delegates to the native call | +| Any future `crypto.randomUUID(` member access in `frontend/src` | compiles + lints clean, crashes at runtime on LAN HTTP | `eslint .` fails with a message pointing at `safeRandomUUID()` | + +### Out of scope + +- No backend change, no API change, no `docs/_base/` contract update (frontend-only fix). +- No new npm dependency (`uuid` / `nanoid` considered and rejected — a ~20-line util keeps the + dependency-light footprint per `.claude/rules/product-vision.md`; both packages would be + pulled in for exactly one call site). +- Other secure-context-gated APIs: audit confirmed zero uses of `crypto.subtle` and + `navigator.clipboard` in `frontend/src` — nothing else to fix. 
+ +### Success Criteria + +- [ ] `safeRandomUUID()` returns the native `crypto.randomUUID()` value when available +- [ ] With `crypto.randomUUID` absent but `getRandomValues` present (the real LAN-HTTP shape), + output matches `/^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/` + and successive calls differ +- [ ] With `crypto` absent entirely, the `Math.random` last resort still matches the v4 shape +- [ ] `RunHistoryStrip` renders a history entry without throwing when global `crypto` lacks + `randomUUID` (component regression test — would have caught the original bug) +- [ ] `eslint .` rejects a direct `crypto.randomUUID` access anywhere except `src/lib/uuid-utils.ts` +- [ ] `cd frontend && pnpm tsc --noEmit && pnpm lint && pnpm test --run` all green +- [ ] Manual dogfood: showcase run over a LAN-IP HTTP origin completes; no white screen; + "Recent runs" strip shows the new entry (documented step per umbrella risk row) + +## All Needed Context + +### Documentation & References + +```yaml +# MUST READ — secure-context semantics (the entire bug) +- url: https://developer.mozilla.org/en-US/docs/Web/API/Crypto/randomUUID + why: "Secure context: This feature is available only in secure contexts (HTTPS)" — the + method is literally undefined on the Crypto object otherwise; that is why the call + throws TypeError (not a permission error). + +- url: https://developer.mozilla.org/en-US/docs/Web/API/Crypto/getRandomValues + why: getRandomValues is NOT marked secure-context-only — it is available on plain-HTTP + origins, which is what makes it the correct fallback entropy source (cryptographically + strong, unlike Math.random). + +- url: https://developer.mozilla.org/en-US/docs/Web/Security/Secure_Contexts + why: localhost / 127.0.0.1 count as secure contexts even over HTTP — explains why the bug + only reproduces via a LAN IP and why local dev never caught it. 
+ +# MUST READ — codebase files +- file: frontend/src/components/demo/RunHistoryStrip.tsx + why: The single call site (line 75, inside a render-phase setItems updater — that placement + is WHY the throw unmounts the tree). Note the file's existing defensive ethos + (SSR guards, swallowed quota errors) — the fix matches that spirit. + +- file: frontend/src/components/demo/RunHistoryStrip.test.tsx + why: The suite to extend. Style to mirror: describe/it, @testing-library/react render, + beforeEach/afterEach localStorage cleanup, shared `summary` fixture (lines 17-25). + +- file: frontend/src/lib/api.test.ts + why: The house pattern for stubbing globals in vitest — `vi.stubGlobal(...)` + an + `afterEach(() => vi.unstubAllGlobals())`. Reuse exactly this for stubbing `crypto`. + +- file: frontend/src/lib/utils.ts + why: Export style for src/lib — small named-export functions, no default exports. + +- file: frontend/eslint.config.js + why: Flat-config layout to extend. Note the existing per-path override block for + src/components/ui/** — add the guard rule + the uuid-utils exemption as two more + config objects in the same style. + +- file: frontend/vitest.config.ts + why: jsdom environment, colocated `src/**/*.test.{ts,tsx}` include, `@` alias — no setup + file exists; do not add one for this. + +- file: PRPs/PRP-reliability-E2-surface-fallback-failures.md + why: Sibling epic of the same umbrella — house format for reliability PRPs (this file + mirrors its structure). 
+``` + +### Current Codebase tree (relevant slice) + +```bash +frontend/src/ +├── components/demo/ +│ ├── RunHistoryStrip.tsx # line 75: id: crypto.randomUUID() ← THE BUG +│ ├── RunHistoryStrip.test.tsx # 5 existing cases, none non-secure-context +│ └── demo-step-card.tsx # Date.now() for elapsed time only — NOT an id, leave alone +├── hooks/use-demo-pipeline.ts # no id generation (verified) +└── lib/ + ├── utils.ts # cn() — named-export style to follow + ├── api.test.ts # vi.stubGlobal pattern to follow + └── <17 other utils, almost all with colocated .test.ts> +frontend/eslint.config.js # flat config; gets the no-restricted-properties guard +``` + +### Desired Codebase tree + +```bash +frontend/src/lib/uuid-utils.ts # NEW — safeRandomUUID() (named export) +frontend/src/lib/uuid-utils.test.ts # NEW — 3 paths: native / getRandomValues / Math.random +frontend/src/components/demo/RunHistoryStrip.tsx # MODIFIED — import + 1-line swap +frontend/src/components/demo/RunHistoryStrip.test.tsx # MODIFIED — +1 regression case +frontend/eslint.config.js # MODIFIED — guard rule + uuid-utils exemption +``` + +### Known Gotchas & Library Quirks + +```typescript +// GOTCHA 1 — the bug does NOT reproduce naturally in vitest. +// jsdom's window.crypto (Node webcrypto) HAS randomUUID. Verified 2026-06-11: +// cd frontend && node -e "const{JSDOM}=require('jsdom');const d=new JSDOM(''); +// console.log(typeof d.window.crypto.randomUUID)" // -> "function" +// So every non-secure-context test MUST stub the global: +// vi.stubGlobal('crypto', { getRandomValues: globalThis.crypto.getRandomValues.bind(globalThis.crypto) } as Crypto) +// and restore with vi.unstubAllGlobals() in afterEach (api.test.ts pattern). +// A test that "just calls the component" passes with or without the fix — worthless. + +// GOTCHA 2 — TypeScript's lib.dom.d.ts types Crypto.randomUUID as NON-optional. 
+// `typeof crypto.randomUUID === 'function'` is statically "always true" to the checker but +// compiles fine (no strictness flag fires; @typescript-eslint/no-unnecessary-condition is +// not enabled in this repo). Do NOT use `crypto.randomUUID?.()` — optional-call on a +// non-optional member is also fine for TS but reads as if the types said it could be absent; +// the explicit typeof guard documents the runtime reality. Either compiles; use typeof. + +// GOTCHA 3 — the v4 bit-twiddling is exact, not approximate: +// bytes[6] = (bytes[6] & 0x0f) | 0x40 // version nibble -> 4 +// bytes[8] = (bytes[8] & 0x3f) | 0x80 // variant -> 10xxxxxx (8|9|a|b) +// Verified 2026-06-11 in Node 'node -e' (see PRP research log): output matches +// /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/ +// Re-verify with the same one-liner if you change the byte math. + +// GOTCHA 4 — eslint no-restricted-properties flags MEMBER ACCESS (crypto.randomUUID), +// not object-literal property definitions ({ randomUUID: vi.fn() } in test stubs is fine). +// The exemption override is needed only for src/lib/uuid-utils.ts itself. + +// GOTCHA 5 — RunHistoryStrip appends DURING render (documented at lines 66-69 as the React +// "storing information from previous renders" pattern). Keep the fix to the one-line +// id: safeRandomUUID() swap — do NOT refactor the render-phase setState; that pattern is +// deliberate (react-hooks/set-state-in-effect) and out of scope. + +// GOTCHA 6 — localhost IS a secure context even over plain HTTP. Manual verification MUST +// use a LAN IP (or any non-loopback hostname) to actually exercise the fallback path. + +// GOTCHA 7 — repo has mixed CRLF/LF line endings with no policy. New files are fine (LF); +// for the three modified files check `git diff --stat` shows only the intended hunks, not a +// whole-file line-ending rewrite. 
+ +// GOTCHA 8 — frontend/tsconfig.app.json sets "noUncheckedIndexedAccess": true, so a +// Uint8Array index READ is `number | undefined` and `bytes[6] & 0x0f` is a TS2532 error. +// The blueprint uses `(bytes[6] ?? 0) & ...` for this reason — keep the ?? 0. +// ALSO: the root frontend/tsconfig.json is solution-style (files: [] + references), so +// `pnpm tsc --noEmit` type-checks ZERO files and exits 0 vacuously. The real type gate is +// `npx tsc -b` (what `pnpm build` runs) — and on current dev it already fails with +// pre-existing errors in untouched files (ai-models-panel.tsx, forecast-chart.tsx, +// job-picker.tsx, demand-utils.test.ts — noUncheckedIndexedAccess fallout). The bar for +// THIS PRP: `npx tsc -b` introduces NO NEW errors in the files this PRP touches. The hollow +// gate + pre-existing breakage deserves its own issue — flag it in the PR, don't fix here. +``` + +## Implementation Blueprint + +### Data models and structure + +No backend models, no schemas, no migrations. One pure function: + +```typescript +// frontend/src/lib/uuid-utils.ts — complete implementation (it is small enough to spec fully) + +/** + * #332 — crypto.randomUUID() exists only in secure contexts (HTTPS or localhost). + * On a plain-HTTP LAN origin (the showcase dogfood setup) it is undefined and a direct + * call TypeErrors. crypto.getRandomValues is NOT secure-context-gated, so the fallback + * keeps cryptographically-strong entropy; Math.random is a last resort for environments + * with no Web Crypto at all (ids here are React keys / history ids, not security tokens). 
+ */ +export function safeRandomUUID(): string { + if (typeof crypto !== 'undefined' && typeof crypto.randomUUID === 'function') { + // eslint-disable-next-line no-restricted-properties -- the one sanctioned call site + return crypto.randomUUID() + } + if (typeof crypto !== 'undefined' && typeof crypto.getRandomValues === 'function') { + const bytes = new Uint8Array(16) + crypto.getRandomValues(bytes) + bytes[6] = ((bytes[6] ?? 0) & 0x0f) | 0x40 // version 4 (?? 0: Gotcha 8) + bytes[8] = ((bytes[8] ?? 0) & 0x3f) | 0x80 // variant 10xx + const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('') + return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}` + } + // No Web Crypto at all — uniqueness only, not cryptographic strength. + let uuid = '' + for (let i = 0; i < 36; i++) { + if (i === 8 || i === 13 || i === 18 || i === 23) uuid += '-' + else if (i === 14) uuid += '4' + else if (i === 19) uuid += (((Math.random() * 4) | 0) | 8).toString(16) // 8,9,a,b + else uuid += ((Math.random() * 16) | 0).toString(16) + } + return uuid +} +``` + +(The inline `eslint-disable-next-line` is the ONLY exemption — do NOT also add a config-level +`'no-restricted-properties': 'off'` override for uuid-utils.ts. With the rule disabled at the +config level the inline directive becomes unused, and ESLint 9 flat config reports unused +disable directives as warnings by default — `pnpm lint` would no longer be clean.) + +### Tasks (in order) + +```yaml +Task 1: +CREATE frontend/src/lib/uuid-utils.ts: + - CONTENT: exactly the implementation above (named export, no default — matches lib/utils.ts) + - DOC COMMENT references issue #332 and the secure-context cause + +Task 2: +CREATE frontend/src/lib/uuid-utils.test.ts: + - MIRROR stubbing pattern from: frontend/src/lib/api.test.ts (vi.stubGlobal + afterEach unstub) + - CASES (describe('safeRandomUUID')): + 1. 
"delegates to crypto.randomUUID when available" — + vi.stubGlobal('crypto', { randomUUID: vi.fn(() => 'fixed-uuid') } as unknown as Crypto) + expect(safeRandomUUID()).toBe('fixed-uuid') + 2. "falls back to getRandomValues v4 when randomUUID is missing (LAN-HTTP shape)" — + stub crypto with ONLY getRandomValues (bind the real one: + globalThis.crypto.getRandomValues.bind(globalThis.crypto)); + assert v4 regex /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/ + AND two successive calls differ + 3. "falls back to Math.random v4 when crypto is entirely absent" — + vi.stubGlobal('crypto', undefined); same regex + uniqueness assertions + - GOTCHA: case 2/3 are the tests that fail on the unfixed direct-call code path — they are + the spec for the fallback + +Task 3: +MODIFY frontend/src/components/demo/RunHistoryStrip.tsx: + - ADD import: import { safeRandomUUID } from '@/lib/uuid-utils' + - FIND: "id: crypto.randomUUID()," (line 75) + - REPLACE with: "id: safeRandomUUID()," + - PRESERVE everything else (render-phase append pattern is deliberate — Gotcha 5) + +Task 4: +MODIFY frontend/src/components/demo/RunHistoryStrip.test.tsx: + - ADD regression case to the existing describe('RunHistoryStrip'): + "appends a history entry without crashing when crypto.randomUUID is unavailable (#332)" + — vi.stubGlobal('crypto', { getRandomValues: globalThis.crypto.getRandomValues.bind(globalThis.crypto) } as unknown as Crypto) + BEFORE render (capture the real getRandomValues reference first); + — render(<RunHistoryStrip onReplay={() => {}} summary={summary} scenario="showcase_rich" />) + must not throw; + — assert localStorage entry exists, items[0].id matches the v4 regex; + — vi.unstubAllGlobals() in afterEach (extend the existing afterEach) + - REUSE the existing `summary` fixture (lines 17-25) + - VERIFY-IT-BITES: `git stash` the Task-3 change once and confirm this test fails with + "crypto.randomUUID is not a function", then unstash + +Task 5: +MODIFY 
frontend/eslint.config.js: + - APPEND one config object after the existing src/components/ui override, same style: + { + // #332 — crypto.randomUUID is undefined outside secure contexts (plain-HTTP LAN). + files: ['**/*.{ts,tsx}'], + rules: { + 'no-restricted-properties': [ + 'error', + { + object: 'crypto', + property: 'randomUUID', + message: + 'crypto.randomUUID is undefined outside secure contexts (plain-HTTP LAN origins). Use safeRandomUUID() from @/lib/uuid-utils instead. (#332)', + }, + ], + }, + }, + - The single sanctioned call site inside uuid-utils.ts uses the inline + eslint-disable-next-line from Task 1 — do NOT add a config-level off-override for that + file (it would make the inline directive unused → unused-disable-directive warning). + - VERIFY-IT-BITES: temporarily revert Task 3's swap (or add `const x = crypto.randomUUID` in + a scratch file) and confirm `pnpm lint` fails with the message above, then restore + +Task 6: +RUN validation gates (Level 1-2 below); then manual dogfood (Level 3). +``` + +### Integration Points + +```yaml +BACKEND: none — zero Python changes, no migration, no docs/_base contract change +CONFIG: frontend/eslint.config.js only (Task 5) +DOCS: none required; optional 1-line note in docs/_base/RUNBOOKS.md Showcase incident + list is NOT needed (the incident ceases to exist once fixed) +GIT: + branch: fix/ui-safe-uuid-non-secure-context (off dev; type/kebab, ≤50 chars) + commits: fix(ui): avoid crypto.randomUUID crash on lan http showcase (#332) + # the PRP file itself lands as: docs(repo): track reliability E3 prp for safe uuid fallback (#332) + # (mirrors 7c57641 "docs(repo): track reliability E2 prp ..." precedent) +``` + +## Validation Loop + +### Level 1: Syntax & Style (frontend gates) + +```bash +cd frontend +pnpm tsc --noEmit # the documented repo gate — but VACUOUS here (solution-style + # tsconfig type-checks zero files; Gotcha 8). Run it for the record. 
+npx tsc -b 2>&1 | grep uuid-utils # the REAL check for this PRP: no NEW errors in touched + # files (tsc -b already fails on dev for pre-existing, unrelated + # files — see Gotcha 8; do not fix those here) +pnpm lint # must pass WITH the new rule; see Task 5 verify-it-bites +``` + +### Level 2: Unit Tests + +```bash +cd frontend && pnpm test --run +# Expected: uuid-utils.test.ts 3 cases green, RunHistoryStrip.test.tsx 6 cases green +# (5 existing + 1 regression), zero regressions elsewhere. +# If pnpm itself fails with "cannot execute binary file" (WSL IntxLNK corruption): +# rm -rf node_modules && corepack enable pnpm && pnpm install && pnpm rebuild esbuild +``` + +### Level 3: Real-browser verification (the umbrella-mandated dogfood) + +```bash +# The bug ONLY manifests on a non-loopback HTTP origin (Gotcha 6). Two tiers: + +# 3a — quick: serve Vite LAN-reachable, verify the context shape from a real browser. +cd frontend && ./node_modules/.bin/vite --host 0.0.0.0 # (bypasses pnpm 11 depsStatusCheck) +# Playwright via native Python + snap Chromium (Playwright MCP / `playwright install` +# both fail on this host — see memory note; executable_path=/snap/bin/chromium): +# page.goto("http://<lan-ip>:5173/showcase") +# assert page.evaluate("window.isSecureContext") is False +# assert page.evaluate("typeof crypto.randomUUID") == "undefined" +# page reaches the Showcase UI with no console TypeError + +# 3b — full: end-to-end completion over LAN HTTP (backend must also be reachable from the +# browser's origin — use the docker-compose.lan.yml override pattern or point frontend/.env +# VITE_API_BASE_URL at http://<lan-ip>:8123 while uvicorn binds 0.0.0.0). 
+# Run demo_minimal from the page; on pipeline_complete expect: +# - NO white screen, no console TypeError +# - "Recent runs" strip appears with the new entry (PASS + wall-clock) +# - localStorage 'forecastlab.showcase.runs.v1' entry id matches the v4 regex +``` + +### Level 4: Backend gates untouched (sanity) + +```bash +# No app/ changes, but CI runs everything — confirm nothing leaks: +uv run ruff check . && uv run mypy app/ && uv run pyright app/ \ + && uv run pytest -v -m "not integration" +``` + +## Final Validation Checklist + +- [ ] `cd frontend && pnpm tsc --noEmit` — clean (vacuous; for the record) AND `npx tsc -b` + reports zero errors in the files this PRP touches (Gotcha 8 — pre-existing errors in + unrelated files are out of scope; flag them in the PR body) +- [ ] `cd frontend && pnpm lint` — clean, AND the rule bites on a planted `crypto.randomUUID` access +- [ ] `cd frontend && pnpm test --run` — all green, including the two new fallback specs and the + component regression (which fails when Task 3 is reverted — proven once during Task 4) +- [ ] Manual LAN-HTTP dogfood done and described in the PR body (screenshot or console excerpt + of `isSecureContext=false` + completed run) +- [ ] `git diff --stat` shows surgical hunks only (no line-ending rewrites — Gotcha 7) +- [ ] Commits formatted `fix(ui): ... 
(#332)`; no AI co-author trailer; branch off `dev` +- [ ] Backend gates green (no accidental `app/` or `uv.lock` churn in the PR — note the working + tree currently has an unrelated dirty `uv.lock`; do NOT sweep it into this branch) + +--- + +## Anti-Patterns to Avoid + +- ❌ Don't add `uuid`/`nanoid` as a dependency for one call site — the 20-line util is the fix +- ❌ Don't "fix" by gating on `window.isSecureContext` — feature-detect the function itself; + secure context is the *cause*, the absent function is the *fact* +- ❌ Don't use `Math.random` as the primary fallback — `getRandomValues` is available in + non-secure contexts and strictly better; Math.random is last-resort only +- ❌ Don't refactor RunHistoryStrip's render-phase append while you're in there (Gotcha 5) +- ❌ Don't write the component regression test without stubbing `crypto` — jsdom has + `randomUUID`, so an unstubbed test passes against the broken code (Gotcha 1) +- ❌ Don't skip the LAN-IP manual check by testing on `localhost` — localhost is a secure + context and proves nothing (Gotcha 6) + +## Confidence Score: 9/10 + +One-pass implementation is highly likely: single verified call site, complete helper spec +inline, verified byte-math, exact test stubbing recipe with the jsdom trap documented, and a +lint guard with a verify-it-bites step. The 1-point deduction is for the Level-3 dogfood, +which depends on host/LAN environment state (Vite binding, backend reachability from the LAN +origin) rather than anything in the diff itself. 
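Reviewer note on Gotcha 3 (illustration only, not part of the diff): the version and variant masks can be checked deterministically by feeding fixed bytes through the same byte math as the `getRandomValues` branch of `safeRandomUUID()`. A minimal sketch, assuming a hypothetical helper `formatV4FromBytes` that does not exist in the PRP or the repo; it copies the two mask lines and the hex formatting from the blueprint and skips the entropy source.

```typescript
// Hypothetical helper for illustration; NOT part of frontend/src/lib/uuid-utils.ts.
// It applies the Gotcha 3 masks to caller-supplied bytes so the output is reproducible.
const V4_REGEX = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/

function formatV4FromBytes(input: Uint8Array): string {
  if (input.length !== 16) throw new RangeError('expected exactly 16 bytes')
  const bytes = Uint8Array.from(input) // copy, so the caller's buffer is not mutated
  bytes[6] = ((bytes[6] ?? 0) & 0x0f) | 0x40 // high nibble forced to 4 (version)
  bytes[8] = ((bytes[8] ?? 0) & 0x3f) | 0x80 // top two bits forced to 10 (variant)
  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('')
  return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}`
}

// Worst-case inputs: every bit set, and every bit clear.
const allOnes = formatV4FromBytes(new Uint8Array(16).fill(0xff))
const allZeros = formatV4FromBytes(new Uint8Array(16))

console.log(allOnes) // ffffffff-ffff-4fff-bfff-ffffffffffff
console.log(allZeros) // 00000000-0000-4000-8000-000000000000
console.log(V4_REGEX.test(allOnes) && V4_REGEX.test(allZeros)) // true
```

With all bits set, byte 6 becomes `0x4f` and byte 8 becomes `0xbf`; with all bits clear they become `0x40` and `0x80`. Both extremes land inside the `4…` / `[89ab]…` positions that the Task 2 regex asserts, which is why the masks are described as exact rather than approximate.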
From 17d0aee1b07acf101a9c70de7b7153d633af130d Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Thu, 11 Jun 2026 23:33:34 +0200 Subject: [PATCH 18/44] refactor(forecast,registry): move model family taxonomy to shared (#268) --- app/features/batch/service.py | 19 ++- app/features/forecasting/feature_metadata.py | 49 ++------ app/features/forecasting/schemas.py | 24 ++-- app/features/forecasting/service.py | 24 ++-- app/features/model_selection/capabilities.py | 9 +- app/features/model_selection/service.py | 15 ++- .../model_selection/tests/test_service.py | 4 +- app/features/registry/schemas.py | 27 ++--- app/shared/model_taxonomy.py | 71 +++++++++++ app/shared/tests/__init__.py | 0 app/shared/tests/test_model_taxonomy.py | 110 ++++++++++++++++++ docs/_base/ARCHITECTURE.md | 13 ++- 12 files changed, 246 insertions(+), 119 deletions(-) create mode 100644 app/shared/model_taxonomy.py create mode 100644 app/shared/tests/__init__.py create mode 100644 app/shared/tests/test_model_taxonomy.py diff --git a/app/features/batch/service.py b/app/features/batch/service.py index ec657389..06a32597 100644 --- a/app/features/batch/service.py +++ b/app/features/batch/service.py @@ -2,8 +2,8 @@ Submits one ``batch_job`` and N ``batch_job_item`` rows in one transaction, then loops a partial-index-backed picker (``FOR UPDATE SKIP LOCKED``) and -delegates each item to ``JobService.create_job`` via a lazy in-method -import. The metrics JSONB is pinned to the exact five-key shape +delegates each item to ``JobService.create_job``. The metrics JSONB is +pinned to the exact five-key shape ``{wape, smape, mae, bias, sample_size}`` — every downstream PRP (parallel-execution, priority-queue, export-and-retry, champion-and-heatmap) consumes this shape directly. 
``sample_size`` is @@ -48,8 +48,11 @@ # data_platform is the de-facto shared ORM layer (see the # data-platform-shared-orm-layer memory) — module-scope import for scope -# expansion is permitted; cross-slice *service* calls stay lazy. +# expansion is permitted. from app.features.data_platform.models import Product, SalesDaily, Store +from app.features.jobs.models import JobStatus +from app.features.jobs.schemas import JobCreate +from app.features.jobs.service import JobService if TYPE_CHECKING: from app.features.jobs.schemas import JobResponse @@ -284,15 +287,7 @@ async def _pick_next(self, db: AsyncSession, batch_id: str) -> BatchJobItem | No return (await db.execute(stmt)).scalar_one_or_none() async def _execute_item(self, db: AsyncSession, item: BatchJobItem) -> None: - """Run one item: delegate to ``JobService.create_job`` and capture metrics. - - Lazy cross-slice imports break the alembic cold-boot cycle - (precedent: ``app/features/forecasting/service.py:786-787``). - """ - from app.features.jobs.models import JobStatus - from app.features.jobs.schemas import JobCreate - from app.features.jobs.service import JobService - + """Run one item: delegate to ``JobService.create_job`` and capture metrics.""" item.status = BatchItemStatus.RUNNING.value item.started_at = datetime.now(UTC) await db.commit() diff --git a/app/features/forecasting/feature_metadata.py b/app/features/forecasting/feature_metadata.py index cbfb942b..90dac845 100644 --- a/app/features/forecasting/feature_metadata.py +++ b/app/features/forecasting/feature_metadata.py @@ -27,7 +27,16 @@ RegressionForecaster, XGBoostForecaster, ) -from app.features.forecasting.schemas import FeatureImportanceItem, ModelFamily +from app.features.forecasting.schemas import FeatureImportanceItem + +# Back-compat re-export (#268): forecasting/service.py and tests import these +# from here. 
``_MODEL_FAMILY_MAP`` is re-exported because the drift-lock test +# (test_feature_metadata.py) reads it to compare against the ModelType Literal. +from app.shared.model_taxonomy import ( + _MODEL_FAMILY_MAP as _MODEL_FAMILY_MAP, # pyright: ignore[reportPrivateUsage] +) +from app.shared.model_taxonomy import ModelFamily as ModelFamily +from app.shared.model_taxonomy import model_family_for as model_family_for if TYPE_CHECKING: from app.features.forecasting.models import BaseForecaster @@ -35,44 +44,6 @@ logger = structlog.get_logger(__name__) -# Canonical map: model_type string → ModelFamily. Unknown types log a warning -# and classify as BASELINE (forward-compatible for new families before this -# map is updated). Keep in sync with the ``ModelType`` Literal in -# ``forecasting/models.py`` (line 1133-1135). -_MODEL_FAMILY_MAP: dict[str, ModelFamily] = { - "naive": ModelFamily.BASELINE, - "seasonal_naive": ModelFamily.BASELINE, - "moving_average": ModelFamily.BASELINE, - "weighted_moving_average": ModelFamily.BASELINE, - "seasonal_average": ModelFamily.BASELINE, - "trend_regression_baseline": ModelFamily.ADDITIVE, - "random_forest": ModelFamily.TREE, - "regression": ModelFamily.TREE, - "lightgbm": ModelFamily.TREE, - "xgboost": ModelFamily.TREE, - "prophet_like": ModelFamily.ADDITIVE, -} - - -def model_family_for(model_type: str) -> ModelFamily: - """Return the :class:`ModelFamily` for a given ``model_type`` string. - - Unknown types log a warning and return :attr:`ModelFamily.BASELINE` so a - new model registered in :mod:`forecasting.models` before this map is - updated does not raise — it just shows up in the dashboard as a baseline - until the map catches up. 
- """ - family = _MODEL_FAMILY_MAP.get(model_type) - if family is None: - logger.warning( - "forecasting.unknown_model_family", - model_type=model_type, - fallback=ModelFamily.BASELINE.value, - ) - return ModelFamily.BASELINE - return family - - class FeatureImportanceUnavailableError(ValueError): """The estimator does not expose a usable feature-importance vector. diff --git a/app/features/forecasting/schemas.py b/app/features/forecasting/schemas.py index 8dd06c98..c304bccf 100644 --- a/app/features/forecasting/schemas.py +++ b/app/features/forecasting/schemas.py @@ -10,13 +10,17 @@ import hashlib from datetime import date as date_type -from enum import Enum from typing import Literal from pydantic import BaseModel, ConfigDict, Field, field_validator, model_validator from app.shared.feature_frames import FeatureGroup +# Back-compat re-export (#268): downstream modules and tests import ModelFamily +# from this module; the redundant alias makes the re-export explicit for +# mypy/pyright/ruff. +from app.shared.model_taxonomy import ModelFamily as ModelFamily + # ============================================================================= # Model Configuration Schemas # ============================================================================= @@ -610,25 +614,11 @@ class PredictResponse(BaseModel): # ============================================================================= -# Model Family + Feature Metadata Schemas (MLZOO-D / PRP-31) +# Feature Metadata Schemas (MLZOO-D / PRP-31; ModelFamily moved to +# app/shared/model_taxonomy — #268) # ============================================================================= -class ModelFamily(str, Enum): - """Classifier for advanced-model UI surfacing. - - Derived from ``model_type``; not persisted in the DB. Surfaced on - ``RunResponse`` via a computed field and consumed by the dashboard for the - family Badge and the feature-importance panel routing. 
Unknown model types - classify as ``BASELINE`` (forward-compatible for new families before the - map in ``feature_metadata.py`` is updated). - """ - - BASELINE = "baseline" # naive, seasonal_naive, moving_average - TREE = "tree" # regression (HistGBR), lightgbm, xgboost - ADDITIVE = "additive" # prophet_like (Ridge pipeline) - - class FeatureImportanceItem(BaseModel): """One row of model-derived feature importance, ready for the dashboard.""" diff --git a/app/features/forecasting/service.py b/app/features/forecasting/service.py index 27cc2d1e..a4bf3b07 100644 --- a/app/features/forecasting/service.py +++ b/app/features/forecasting/service.py @@ -51,14 +51,6 @@ PredictResponse, TrainResponse, ) - -# NOTE: ``RegistryService`` / ``JobService`` and their status enums are imported -# LAZILY inside the feature-metadata methods below. Importing them at module -# scope would close a cycle with ``registry.schemas`` (which eagerly imports -# ``ModelFamily`` from the forecasting slice for the ``model_family`` computed -# field on ``RunResponse``). The explainability slice avoids the same trap by -# importing only ``registry.models`` (a read-only ORM contract); we keep the -# import-graph one-way by deferring our service-level imports. from app.features.forecasting.v2_loaders import ( assemble_v2_historical_sidecar, load_exogenous_history, @@ -68,6 +60,8 @@ load_replenishment_history, load_returns_history, ) +from app.features.registry.schemas import RunStatus +from app.features.registry.service import RegistryService from app.shared.feature_frames import ( DEFAULT_V2_GROUPS, HISTORY_TAIL_DAYS, @@ -82,6 +76,14 @@ v2_pinned_constants, ) +# NOTE: ``JobService`` and the job enums are imported LAZILY inside +# ``get_feature_metadata_for_job``: ``jobs/service.py`` lazily imports +# ``ForecastingService`` back at call time (lines ~435, ~545), so the pair is +# mutually dependent and at least one side must stay call-time-lazy to keep +# cold-boot clean. 
The registry imports above are EAGER since #268 moved +# ``ModelFamily`` to ``app/shared/model_taxonomy`` — registry no longer +# imports this slice. + if TYPE_CHECKING: pass @@ -963,10 +965,6 @@ async def get_feature_metadata_for_run( estimator does not expose ``feature_importances_`` (``HistGradientBoostingRegressor``). """ - # Lazy cross-slice imports — see module-level NOTE. - from app.features.registry.schemas import RunStatus - from app.features.registry.service import RegistryService - run = await RegistryService().get_run(db, run_id) if run is None: raise NotFoundError(message=f"Model run not found: {run_id}") @@ -1049,7 +1047,7 @@ async def get_feature_metadata_for_job( ``load_model_bundle`` can no longer find, or when the ``ml-*`` extra is missing at unpickle time. """ - # Lazy cross-slice imports — see module-level NOTE. + # Lazy by design — see the jobs↔forecasting NOTE below the module imports. from app.features.jobs.models import JobStatus, JobType from app.features.jobs.service import JobService diff --git a/app/features/model_selection/capabilities.py b/app/features/model_selection/capabilities.py index 5c513496..b01d7272 100644 --- a/app/features/model_selection/capabilities.py +++ b/app/features/model_selection/capabilities.py @@ -6,8 +6,8 @@ ``MODEL_FAMILY_MAP`` / labels never drift from the Python authority. Capability provenance (BACKEND-OWNED, verified 2026-06-01): -- ``family`` — ``forecasting.feature_metadata.model_family_for`` (lazy - cross-slice import inside the builder, per the slice's import discipline). +- ``family`` — ``app.shared.model_taxonomy.model_family_for`` (shared + taxonomy, #268). - ``feature_aware`` — the set whose forecasters set ``requires_features=True`` (RandomForest/Regression/LightGBM/XGBoost/ProphetLike), i.e. exactly the set ``ForecastingService.predict()`` rejects (``forecasting/service.py``). 
@@ -28,6 +28,7 @@ CandidateModelInfo, ModelCatalogResponse, ) +from app.shared.model_taxonomy import model_family_for # Models gated behind the matching opt-in extra (may be absent at runtime). _REQUIRES_EXTRA: frozenset[str] = frozenset({"lightgbm", "xgboost"}) @@ -130,10 +131,6 @@ def build_model_catalog() -> ModelCatalogResponse: from the module-level sets. Returns the full catalog plus the documented default candidate set. """ - # Lazy cross-slice import (mirror service.py) — avoids closing an alembic - # cold-boot import cycle through the forecasting slice. - from app.features.forecasting.feature_metadata import model_family_for - models: list[CandidateModelInfo] = [] for model_type, meta in _CATALOG.items(): feature_aware = model_type in _FEATURE_AWARE diff --git a/app/features/model_selection/service.py b/app/features/model_selection/service.py index baef2875..10220540 100644 --- a/app/features/model_selection/service.py +++ b/app/features/model_selection/service.py @@ -74,6 +74,13 @@ TrainWinnerResponse, WinnerSummary, ) +from app.features.registry.schemas import ( + AliasCreate, + RunCreate, + RunStatus, + RunUpdate, +) +from app.features.registry.service import RegistryService if TYPE_CHECKING: from app.features.backtesting.schemas import BacktestResponse @@ -1074,14 +1081,6 @@ async def promote( audit record on ``model_selection_run``. Promotion is NEVER automatic and performs NO comparison. 
""" - from app.features.registry.schemas import ( # lazy - AliasCreate, - RunCreate, - RunStatus, - RunUpdate, - ) - from app.features.registry.service import RegistryService # lazy - row = await self._load(db, selection_id) if not row.final_model_path or not row.trained_model_type: raise UnprocessableEntityError(message="Train the model before promoting.") diff --git a/app/features/model_selection/tests/test_service.py b/app/features/model_selection/tests/test_service.py index c60080b1..67f60a60 100644 --- a/app/features/model_selection/tests/test_service.py +++ b/app/features/model_selection/tests/test_service.py @@ -623,8 +623,10 @@ def _patch_registry(monkeypatch: pytest.MonkeyPatch) -> dict[str, AsyncMock]: create_run = AsyncMock(return_value=run_resp) update_run = AsyncMock(return_value=run_resp) create_alias = AsyncMock(return_value=alias_resp) + # Patch the binding promote() actually uses — module-scope since #268 + # promoted the registry imports out of the method body. monkeypatch.setattr( - "app.features.registry.service.RegistryService", + "app.features.model_selection.service.RegistryService", lambda: SimpleNamespace( create_run=create_run, update_run=update_run, create_alias=create_alias ), diff --git a/app/features/registry/schemas.py b/app/features/registry/schemas.py index 9ef5417b..1de40ddb 100644 --- a/app/features/registry/schemas.py +++ b/app/features/registry/schemas.py @@ -17,14 +17,11 @@ from pydantic import BaseModel, ConfigDict, Field, computed_field, field_validator -# Pydantic v2 resolves a ``@computed_field``'s return-type annotation at -# validation time, so ``ModelFamily`` must be a real runtime import here. -# To avoid the cycle this introduces with the forecasting slice (whose -# ``service.py`` imports ``RegistryService``), the forecasting slice's -# cross-slice imports of ``RegistryService`` / ``JobService`` / status enums -# are LAZY (inside the methods that use them). 
See -# ``app/features/forecasting/service.py`` for the matching contract. -from app.features.forecasting.schemas import ModelFamily +# ``ModelFamily`` / ``model_family_for`` live in ``app/shared/model_taxonomy`` +# (#268) so this module never imports from another feature slice. Pydantic v2 +# resolves a ``@computed_field``'s return-type annotation at schema-build time, +# so ``ModelFamily`` must be a real runtime import (never TYPE_CHECKING-gated). +from app.shared.model_taxonomy import ModelFamily, model_family_for class RunStatus(str, Enum): @@ -131,9 +128,8 @@ class RunResponse(BaseModel): ``model_family`` is a computed field derived from ``model_type`` at serialization time — no DB column, no Alembic migration, no backfill. - See ``app/features/forecasting/feature_metadata.py:model_family_for`` for - the canonical map. Unknown model types log a warning and return - ``ModelFamily.BASELINE``. + See ``app/shared/model_taxonomy.py`` for the canonical map. Unknown model + types log a warning and return ``ModelFamily.BASELINE``. """ model_config = ConfigDict(from_attributes=True, populate_by_name=True) @@ -166,14 +162,7 @@ class RunResponse(BaseModel): @computed_field # type: ignore[prop-decorator] @property def model_family(self) -> ModelFamily: - """Computed family label derived from ``model_type``. - - Imported lazily to avoid a hard cycle between - ``registry.schemas`` and ``forecasting.feature_metadata`` at module - import time. - """ - from app.features.forecasting.feature_metadata import model_family_for - + """Computed family label derived from ``model_type``.""" return model_family_for(self.model_type) @computed_field # type: ignore[prop-decorator] diff --git a/app/shared/model_taxonomy.py b/app/shared/model_taxonomy.py new file mode 100644 index 00000000..7e63020c --- /dev/null +++ b/app/shared/model_taxonomy.py @@ -0,0 +1,71 @@ +"""Model-family taxonomy shared across slices (#268). 
+ +``ModelFamily`` + ``model_family_for`` moved here from the forecasting slice +(``forecasting/schemas.py:617`` / ``forecasting/feature_metadata.py:38``): +``registry.schemas`` needs ``ModelFamily`` at module scope for the +``RunResponse.model_family`` computed field, and while the enum lived inside +a feature slice that one eager import forced lazy-import workarounds across +the registry boundary (the forecasting↔registry alembic cold-boot cycle). +A neutral ``app/shared`` home keeps the import graph one-way: +``app/features/* → app/shared`` only. +""" + +from __future__ import annotations + +from enum import Enum + +from app.core.logging import get_logger + +logger = get_logger(__name__) + + +class ModelFamily(str, Enum): + """Classifier for advanced-model UI surfacing. + + Derived from ``model_type``; not persisted in the DB. Surfaced on + ``RunResponse`` via a computed field and consumed by the dashboard for the + family Badge and the feature-importance panel routing. Unknown model types + classify as ``BASELINE`` (forward-compatible for new families before the + map below is updated). + """ + + BASELINE = "baseline" # naive, seasonal_naive, moving_average + TREE = "tree" # regression (HistGBR), lightgbm, xgboost + ADDITIVE = "additive" # prophet_like (Ridge pipeline) + + +# Canonical map: model_type string → ModelFamily. Unknown types log a warning +# and classify as BASELINE. Keep in sync with the ``ModelType`` Literal in +# ``app/features/forecasting/models.py``. 
+_MODEL_FAMILY_MAP: dict[str, ModelFamily] = { + "naive": ModelFamily.BASELINE, + "seasonal_naive": ModelFamily.BASELINE, + "moving_average": ModelFamily.BASELINE, + "weighted_moving_average": ModelFamily.BASELINE, + "seasonal_average": ModelFamily.BASELINE, + "trend_regression_baseline": ModelFamily.ADDITIVE, + "random_forest": ModelFamily.TREE, + "regression": ModelFamily.TREE, + "lightgbm": ModelFamily.TREE, + "xgboost": ModelFamily.TREE, + "prophet_like": ModelFamily.ADDITIVE, +} + + +def model_family_for(model_type: str) -> ModelFamily: + """Return the :class:`ModelFamily` for a given ``model_type`` string. + + Unknown types log a warning and return :attr:`ModelFamily.BASELINE` so a + new model registered in :mod:`forecasting.models` before this map is + updated does not raise — it just shows up in the dashboard as a baseline + until the map catches up. + """ + family = _MODEL_FAMILY_MAP.get(model_type) + if family is None: + logger.warning( + "forecasting.unknown_model_family", + model_type=model_type, + fallback=ModelFamily.BASELINE.value, + ) + return ModelFamily.BASELINE + return family diff --git a/app/shared/tests/__init__.py b/app/shared/tests/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/app/shared/tests/test_model_taxonomy.py b/app/shared/tests/test_model_taxonomy.py new file mode 100644 index 00000000..bf241d0f --- /dev/null +++ b/app/shared/tests/test_model_taxonomy.py @@ -0,0 +1,110 @@ +"""Tests for the shared model-family taxonomy (#268). + +Locks the canonical mapping, the back-compat re-export identity across the +legacy forecasting import paths, the JSON-schema contract on +``RunResponse.model_family``, and — the spec for this bug class — fresh- +interpreter cold-boot probes. pytest's in-process import order masks +cross-slice import cycles (forecasting usually loads before registry), so a +subprocess importing registry FIRST (the ``alembic/env.py`` shape) is the +only honest in-suite check. 
+""" + +from __future__ import annotations + +import subprocess +import sys + +import pytest + +from app.shared.model_taxonomy import ModelFamily, model_family_for + +# --------------------------------------------------------------------------- +# model_family_for — canonical mapping (mirrors the legacy suite in +# app/features/forecasting/tests/test_feature_metadata.py, via the new path) +# --------------------------------------------------------------------------- + + +def test_model_family_for_maps_baseline_types_to_baseline() -> None: + for mt in ( + "naive", + "seasonal_naive", + "moving_average", + "weighted_moving_average", + "seasonal_average", + ): + assert model_family_for(mt) == ModelFamily.BASELINE + + +def test_model_family_for_maps_tree_types_to_tree() -> None: + for mt in ("regression", "lightgbm", "xgboost", "random_forest"): + assert model_family_for(mt) == ModelFamily.TREE + + +def test_model_family_for_maps_additive_types_to_additive() -> None: + for mt in ("prophet_like", "trend_regression_baseline"): + assert model_family_for(mt) == ModelFamily.ADDITIVE + + +def test_model_family_for_unknown_returns_baseline() -> None: + """An unknown model_type logs a warning and degrades to BASELINE.""" + assert model_family_for("future_arima_v9") == ModelFamily.BASELINE + + +# --------------------------------------------------------------------------- +# Back-compat re-exports — OBJECT IDENTITY across the legacy paths (#268). +# Enum members are str-valued, so == would pass even across distinct class +# objects; the `is` checks in forecasting/service.py demand identity. 
+# --------------------------------------------------------------------------- + + +def test_legacy_import_paths_return_the_same_objects() -> None: + from app.features.forecasting.feature_metadata import ( + ModelFamily as FMetaMF, + ) + from app.features.forecasting.feature_metadata import ( + model_family_for as legacy_fn, + ) + from app.features.forecasting.schemas import ModelFamily as SchemasMF + + assert SchemasMF is ModelFamily + assert FMetaMF is ModelFamily + assert legacy_fn is model_family_for + + +# --------------------------------------------------------------------------- +# JSON-schema lock — the move is contract-invariant (title from the class +# name, members from the values; both unchanged by relocation). +# --------------------------------------------------------------------------- + + +def test_run_response_model_family_json_schema_unchanged() -> None: + from app.features.registry.schemas import RunResponse + + schema = RunResponse.model_json_schema(mode="serialization") + definition = schema["$defs"]["ModelFamily"] + assert definition["title"] == "ModelFamily" + assert definition["enum"] == ["baseline", "tree", "additive"] + + +# --------------------------------------------------------------------------- +# Cold-boot probes — fresh interpreters, worst-case entry orders. The +# alembic-shape probe (registry first) is the one that crashed at PRP-31. 
+# --------------------------------------------------------------------------- + + +@pytest.mark.parametrize( + "stmt", + [ + "from app.features.registry import models", # alembic env.py shape + "import app.features.forecasting", # forecasting-first entry + "import app.main", # full wiring + ], +) +def test_cold_boot_import_probe(stmt: str) -> None: + result = subprocess.run( # noqa: S603 — internal command, trusted args + [sys.executable, "-c", stmt], + capture_output=True, + text=True, + timeout=120, + ) + assert result.returncode == 0, result.stderr diff --git a/docs/_base/ARCHITECTURE.md b/docs/_base/ARCHITECTURE.md index 01804385..6a2b1d95 100644 --- a/docs/_base/ARCHITECTURE.md +++ b/docs/_base/ARCHITECTURE.md @@ -87,10 +87,15 @@ When a feature slice needs to call a service method or read a schema from a Existing precedents: - `app/features/explainability/service.py:57` — read-only `ModelRun` import -- `app/features/forecasting/service.py` — lazy `RegistryService` / `JobService` / - `RunStatus` imports inside `get_feature_metadata_for_*` methods (added by - PRP-31; required because `RunResponse.model_family` computed_field closes - the cycle at alembic cold-boot) +- `app/features/forecasting/service.py` ↔ `app/features/jobs/service.py` — mutually lazy + service pair (each lazily imports the other at call time; at least one side must stay + lazy). The former ModelFamily-driven cycle here was RESOLVED by #268. + +Resolved cycle (reference case): +- #268 moved `ModelFamily` + `model_family_for` to `app/shared/model_taxonomy.py`; the + registry→forecasting eager edge disappeared and the registry-related lazy imports were + promoted to module scope. When a cycle is caused by a shared *type*, relocate the type + to `app/shared/` instead of adding lazy imports. 
## Deployment Flow (Causal Chain) From 1f4524992ba8cd53b27a642900b244db0b1922db Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Thu, 11 Jun 2026 23:33:34 +0200 Subject: [PATCH 19/44] docs(repo): track reliability E4 prp for shared model taxonomy (#268) --- ...RP-reliability-E4-shared-model-taxonomy.md | 665 ++++++++++++++++++ 1 file changed, 665 insertions(+) create mode 100644 PRPs/PRP-reliability-E4-shared-model-taxonomy.md diff --git a/PRPs/PRP-reliability-E4-shared-model-taxonomy.md b/PRPs/PRP-reliability-E4-shared-model-taxonomy.md new file mode 100644 index 00000000..1ff06df6 --- /dev/null +++ b/PRPs/PRP-reliability-E4-shared-model-taxonomy.md @@ -0,0 +1,665 @@ +name: "PRP — Reliability E4: move ModelFamily to app/shared and retire the forecasting↔registry lazy-import workarounds" +description: | + Parallel epic of umbrella #380 (platform reliability hardening), after Foundation E1 (#334). + Issue: #268 · Branch: `refactor/shared-model-taxonomy` off `dev` · Commit scope: `forecast,registry` + (backend-only — zero frontend changes, zero DB migration, zero API contract change). + +--- + +## Goal + +`ModelFamily` (enum) and `model_family_for()` (its canonical model_type→family map) live in the +forecasting slice today. `app/features/registry/schemas.py:27` must import `ModelFamily` **eagerly** +(Pydantic v2 resolves a `@computed_field`'s return-type annotation when it builds the serialization +schema), which is a direct violation of the vertical-slice invariant (`app/features/X` may NOT +import `app/features/Y`) and the proximate cause of the documented lazy-import workaround web +(6 mapped sites, NOTE comments in 4 files, and the alembic cold-boot trap that bit PRP-31). + +**Deliverable:** +1. A new neutral module `app/shared/model_taxonomy.py` owning `ModelFamily`, `_MODEL_FAMILY_MAP`, + and `model_family_for()` (moved verbatim, logger via `app.core.logging.get_logger`). +2. 
Back-compat re-exports from the two old homes (`forecasting/schemas.py`, + `forecasting/feature_metadata.py`) using the redundant-alias idiom (`import X as X`) so every + existing importer — including both untouched test suites — keeps resolving under + `mypy --strict` (no-implicit-reexport). +3. The 6 mapped lazy-import sites resolved per the verdict table below: 5 retired (promoted to + module scope), 1 kept-but-re-documented (the genuinely mutual jobs↔forecasting pair). +4. A new test suite `app/shared/tests/test_model_taxonomy.py` that locks the mapping, the enum + identity across re-export paths, the JSON-schema contract, and — the spec for this bug class — + subprocess **cold-boot probes** that import the worst-case entry points in fresh interpreters + (pytest's in-process import order masks these cycles; a subprocess is the only honest in-suite check). +5. `docs/_base/ARCHITECTURE.md` cross-slice pattern section updated: the stale forecasting + precedent replaced, this refactor recorded as the "resolved cycle" example. + +**Success definition (umbrella #380 acceptance row):** `ModelFamily` imports resolve from +`app/shared/`; **zero** lazy-import NOTE comments reference the forecasting↔registry ModelFamily +cycle; `alembic upgrade head` cold-boots clean (CI migration-check + local smoke); all five +validation gates green; both pre-existing test suites (`test_feature_metadata.py`, +`test_schemas.py`) pass **untouched**. + +## Why + +- **The cycle is real and armed.** Verified 2026-06-11: `alembic/env.py:23` does + `from app.features.registry import models` → `registry/__init__.py` eagerly imports + `registry.schemas` → `registry/schemas.py:27` imports `app.features.forecasting.schemas`, + which first executes `forecasting/__init__.py` → which eagerly imports `forecasting/service.py`. 
+ If anyone promotes `forecasting/service.py`'s registry imports to module scope today, + `registry.service:29` re-enters the **partially initialized** `registry.schemas` → + `ImportError` at alembic cold-boot. PRP-31 hit exactly this; the lazy imports were the patch, + not the fix (memory anchor: `[[computed-field-cross-slice-cycle]]`, tracking issue #260). +- **The workaround metastasized.** The discipline got cargo-culted into model_selection and batch + ("mirror service.py", "precedent: forecasting/service.py:786-787" — a line number that is + already stale), so every new slice pays a tax for a cycle only one edge causes. +- **Umbrella #380 acceptance** explicitly lists: "`ModelFamily` imports resolve from `app/shared/`; + zero lazy-import NOTE comments reference the forecasting↔registry ModelFamily cycle; `alembic + upgrade head` cold-boots clean in CI migration-check (#268 closed)". +- **Origin:** CodeRabbit #3270767554 on PR #266 flagged the slice-invariant violation; issue #268 + deferred it behind this PRP. + +## What + +### The cycle topology (verified, this is the whole bug) + +``` +alembic/env.py:23 ──► from app.features.registry import models + └─► app/features/registry/__init__.py (eagerly imports schemas + service + storage) + └─► registry/schemas.py:27 from app.features.forecasting.schemas import ModelFamily + └─► app/features/forecasting/__init__.py (eagerly imports forecasting.service!) + └─► forecasting/service.py + └─► [IF EAGER] registry.service:29 → from registry.schemas import ... + └─► registry.schemas is PARTIALLY INITIALIZED (still on line 27) + └─► ImportError: cannot import name ... from partially + initialized module ← the alembic cold-boot crash +``` + +Post-move: `registry/schemas.py` imports only `app.shared.model_taxonomy` → the +registry→forecasting edge disappears entirely → every chain terminates in `app/shared` / +`app/core` → `forecasting/service.py` may import `RegistryService` eagerly. 
Verified by tracing +every entry order (alembic-first, forecasting-first, main) against the module-scope import lists +of all involved slices (see verdict table). + +### The 6 mapped import sites — verdicts + +| # | Site | Today | Verdict | Why | +|---|------|-------|---------|-----| +| S1 | `registry/schemas.py:27` (eager `ModelFamily` from forecasting) + `:175` (lazy `model_family_for` inside the computed field) | THE cycle-causing edge | **RETIRE** — both become module-scope imports from `app.shared.model_taxonomy` | Removes the only registry→forecasting edge | +| S2 | `forecasting/service.py:967-968` — lazy `RunStatus` + `RegistryService` inside `get_feature_metadata_for_run` | workaround for S1 | **RETIRE** — promote to module scope | Post-move chain: forecasting.service → registry pkg `__init__` → schemas (shared-only) + service + storage → terminates. Verified no back-edge. | +| S3 | `forecasting/service.py:1053-1054` — lazy `JobStatus`/`JobType`/`JobService` inside `get_feature_metadata_for_job` | lumped into the same NOTE | **KEEP lazy, REWRITE comment** | Different cycle: `jobs/service.py:435` and `:545` lazily import `ForecastingService` back (call-time). The pair is mutually dependent — at least one side must stay lazy forever. NOT broken by this refactor; the comment must stop blaming ModelFamily. | +| S4 | `model_selection/capabilities.py:135` — lazy `model_family_for` inside `build_model_catalog` | "mirror service.py" | **RETIRE** — module-scope `from app.shared.model_taxonomy import model_family_for` | Pure shared import; also update the module-docstring provenance note (lines 9-10). 
| +| S5 | `model_selection/service.py:1077-1083` — lazy `AliasCreate`/`RunCreate`/`RunStatus`/`RunUpdate` + `RegistryService` inside `promote_champion_selection` | defensive mirror | **RETIRE** — promote to module scope | Verified: nothing in registry imports model_selection; `model_selection/models.py` imports only core+shared (alembic path clean); `model_selection/__init__.py` is docstring-only. Chain terminates post-move. No name collisions at module scope (checked). | +| S6 | `batch/service.py:292-294` — lazy `JobStatus`/`JobCreate`/`JobService` inside `_execute_item` | comment cites "alembic cold-boot cycle" + stale precedent line | **RETIRE** — promote to module scope | Verified: nothing anywhere imports batch except `app/main.py` (routes); jobs slice has ZERO eager cross-slice imports; jobs never imports batch. Permanently safe. The stale comment goes away with it. | + +Counterpart sites that stay lazy **by design** (outside the 6 mapped sites; do not touch the code): +`jobs/service.py:435,545` (lazy `ForecastingService` — the other half of the S3 pair) and +`model_selection/service.py:380,396,916,972,1036` (lazy `ForecastingService` — heavy module, +different dependency; eagerizing is possible post-move but is pointless churn here). + +### Out of scope + +- **No API change.** `ModelFamily`'s JSON schema derives from the class name and values + (`title: "ModelFamily"`, `enum: ["baseline","tree","additive"]`) — verified unchanged by a move + (see Gotcha 6). `RunResponse.model_family` keeps its shape; the OpenAPI *description* string + changes because the stale lazy-import docstring is rewritten (descriptions leak docstrings) — + harmless, documented here so nobody chases it. +- **No DB migration** — `ModelFamily` is never persisted (computed field; not in any ORM model). 
+- **No pickle/artifact concern** — verified `ModelFamily` instances never enter joblib bundles + (`grep -rn "ModelFamily" app/features/forecasting/persistence.py` → no hits), so relocating the + class cannot break unpickling of existing artifacts. +- **The data_platform cross-slice ORM imports** — sibling CodeRabbit flag, tracked separately + (memory anchor `[[data-platform-shared-orm-layer]]`); do NOT touch. +- **No ADR.** Issue #268 said "likely an ADR if the chosen module isn't an obvious fit" — + `app/shared/` is the documented home for cross-cutting code (AGENTS.md § Architecture; + precedent: `app/shared/feature_frames/` already owns the cross-slice `FeatureGroup`). Obvious fit. +- **jobs↔forecasting mutual dependency** — a real design wart; needs an interface/abstraction PRP + of its own if it ever matters. This PRP only re-documents it honestly. + +### Success Criteria + +- [ ] `app/shared/model_taxonomy.py` exists; `from app.shared.model_taxonomy import ModelFamily, + model_family_for` works; mapping behavior byte-identical (same 11 entries, same BASELINE + fallback + warning event `forecasting.unknown_model_family`) +- [ ] `from app.features.forecasting.schemas import ModelFamily` and + `from app.features.forecasting.feature_metadata import model_family_for` still resolve and + return **the same objects** (`is` identity) — both pre-existing test suites pass untouched +- [ ] `grep -rn "from app.features.forecasting" app/features/registry/` → zero hits +- [ ] Zero comments in `app/` reference the ModelFamily/registry cycle as a reason for laziness + (`grep -rn "ModelFamily" app/ --include="*.py" | grep -i "lazy\|cycle"` → only + model_taxonomy.py's own docstring narrating the history) +- [ ] S2/S4/S5/S6 imports are at module scope; S3 stays lazy with a comment naming + `jobs/service.py:435,545` as the counterpart +- [ ] Cold-boot probes green: fresh-interpreter `from app.features.registry import models` + (alembic shape), `import app.features.forecasting`, 
`import app.main`; plus + `uv run alembic upgrade head` against the local DB +- [ ] All gates green: ruff + ruff format, mypy --strict, pyright --strict, pytest unit; + targeted integration tests pass + +## All Needed Context + +### Documentation & References + +```yaml +# MUST READ — why each mechanism behaves the way it does +- url: https://docs.python.org/3/reference/import.html#regular-packages + why: Importing app.features.forecasting.schemas FIRST executes the package __init__.py — + this is what arms the cycle (forecasting/__init__ eagerly imports service.py). Any + "module X only imports Y" reasoning must account for package __init__ side effects. + +- url: https://mypy.readthedocs.io/en/stable/config_file.html#confval-no_implicit_reexport + why: pyproject sets mypy strict=true which enables no_implicit_reexport — a plain + `from shared import ModelFamily` in forecasting/schemas.py would NOT legally re-export; + downstream `from forecasting.schemas import ModelFamily` would be a mypy error. + The redundant alias `import X as X` is the sanctioned re-export idiom (also recognized + by pyright and by ruff F401). + +- url: https://docs.pydantic.dev/latest/concepts/fields/#the-computed_field-decorator + why: The computed_field return annotation is resolved when Pydantic builds the model's + serialization schema — ModelFamily must be a REAL runtime import in registry/schemas.py + (the existing comment at lines 20-26 says exactly this; keep that fact in the new comment). + +# MUST READ — codebase files (the mutation surface) +- file: app/features/forecasting/schemas.py + why: Lines 612-629 — section header + ModelFamily class to DELETE (move). Line 677 + FeatureMetadataResponse.model_family keeps using the name. Head shows import style + + `from __future__ import annotations`. 724 lines total. + +- file: app/features/forecasting/feature_metadata.py + why: Lines 38-73 — _MODEL_FAMILY_MAP + model_family_for() to DELETE (move verbatim). 
+ Line 30 imports FeatureImportanceItem + ModelFamily from schemas (ModelFamily becomes + re-export only — nothing else in this file uses it after the move). Line 35 shows the + feature-module logger idiom (structlog.get_logger); the shared module uses the + app.core.logging.get_logger idiom instead. 232 lines total. + +- file: app/features/registry/schemas.py + why: Lines 9 (from __future__ import annotations), 20-27 (the cycle comment + eager import + to REPLACE), 129-137 (RunResponse docstring citing the feature_metadata path — update), + 166-177 (the computed field + lazy import to simplify). + +- file: app/features/forecasting/service.py + why: Lines 34-53 (module import block — where the promoted registry imports land, isort-sorted), + 55-61 (the NOTE block to REWRITE — it currently blames ModelFamily), 966-968 (S2 lazy + imports to delete), 974/984-985 (RunStatus/model_family_for usage that keeps working), + 1052-1054 (S3 — KEEP, new comment). + +- file: app/features/model_selection/capabilities.py + why: Lines 1-22 module docstring (provenance bullet at 9-10 says "lazy cross-slice import …" — + update), 133-135 (S4 lazy import + comment to delete; promote to module scope next to the + existing `from app.features.model_selection.schemas import …` block). + +- file: app/features/model_selection/service.py + why: Lines 28-60 module import block (promotion target), 1077-1083 (S5 lazy imports to delete). + Names are used at 1095-1138 — no module-scope collisions (verified). + +- file: app/features/batch/service.py + why: Lines 286-295 (S6 — docstring with the stale "alembic cold-boot cycle / precedent + forecasting/service.py:786-787" claim + lazy imports). Promote imports; fix docstring. + +- file: app/shared/feature_frames/__init__.py + why: The house precedent for a shared package consumed cross-slice (FeatureGroup already + crosses slices from here). model_taxonomy.py follows the same "leaf-level shared module, + imported explicitly by path" style. 
Do NOT add to app/shared/__init__.py (feature_frames + isn't there either). + +- file: app/features/scenarios/feature_frame.py + why: Lines 41-94 — the house precedent for back-compat re-exports after relocating symbols + (re-export + explanatory comment). This PRP uses the redundant-alias variant because + forecasting/schemas.py has no __all__ today and only two names move. + +- file: app/shared/feature_frames/tests/test_contract.py + why: Shared-package test conventions (tests/ subdir WITH __init__.py — verified all tests + dirs in this repo have one; plain functions; S101 et al. ignored via per-file-ignores). + +- file: tests/test_e2e_demo.py + why: Lines 102, 152 — the house pattern for subprocess in tests: + `subprocess.run(...) # noqa: S603 — internal command, trusted args` (ruff S is enabled; + S603 fires on subprocess use; this is the sanctioned suppression). + +- file: app/features/forecasting/tests/test_feature_metadata.py + why: Lines 75-108 — the 6 existing model_family_for tests. They import via the OLD paths and + MUST stay untouched and green (they are the back-compat proof). Mirror their style for + the new shared suite. + +- file: app/features/registry/tests/test_schemas.py + why: Line 9 imports ModelFamily from forecasting.schemas; the TestRunResponseModelFamily class + (~lines 389-450) exercises the computed field. Untouched + green = back-compat proof #2. + +- file: docs/_base/ARCHITECTURE.md + why: Section "### Cross-slice read-only import pattern" (line 71); the "Existing precedents" + bullet at lines 88-93 cites the forecasting lazy imports as "required because + RunResponse.model_family computed_field closes the cycle at alembic cold-boot" — stale + after this PRP; rewrite per Task 10. + +- file: PRPs/PRP-reliability-E3-safe-uuid-non-secure-context.md + why: House format for the reliability-epic PRP series (this file mirrors its structure). 
+``` + +### Current Codebase tree (relevant slice) + +```bash +app/ +├── core/logging.py # get_logger(name: str|None) -> FilteringBoundLogger +├── shared/ +│ ├── __init__.py # exports ErrorResponse/Paginated*/TimestampMixin — DO NOT TOUCH +│ ├── feature_frames/ # precedent: shared leaf package w/ own tests/ +│ │ └── tests/__init__.py # tests dirs have __init__.py in this repo +│ └── (no tests/ at shared top level — created by this PRP) +├── features/ +│ ├── forecasting/ +│ │ ├── __init__.py # EAGERLY imports service.py ← arms the cycle +│ │ ├── schemas.py # :617-629 ModelFamily def (DELETE→move); :677 uses it +│ │ ├── feature_metadata.py # :38-73 map+fn (DELETE→move); :30 imports from schemas +│ │ └── service.py # :55-61 NOTE; :967-968 S2; :1053-1054 S3 +│ ├── registry/ +│ │ ├── __init__.py # EAGERLY imports schemas+service ← alembic entry +│ │ └── schemas.py # :20-27 eager import S1a; :175 lazy S1b +│ ├── model_selection/ +│ │ ├── __init__.py # docstring-only (safe) +│ │ ├── capabilities.py # :133-135 S4 +│ │ └── service.py # :1077-1083 S5 +│ ├── batch/service.py # :289-294 S6 (stale comment) +│ └── jobs/service.py # :435,:545 lazy ForecastingService — S3's counterpart, DO NOT TOUCH +└── alembic/env.py # :14-24 imports all models via packages → runs registry/__init__ +``` + +### Desired Codebase tree + +```bash +app/shared/model_taxonomy.py # NEW — ModelFamily + _MODEL_FAMILY_MAP + model_family_for +app/shared/tests/__init__.py # NEW — empty (repo convention) +app/shared/tests/test_model_taxonomy.py # NEW — mapping + identity + schema-lock + cold-boot probes +app/features/forecasting/schemas.py # MODIFIED — class deleted, re-export added +app/features/forecasting/feature_metadata.py # MODIFIED — map+fn deleted, re-exports added +app/features/forecasting/service.py # MODIFIED — S2 promoted, NOTE rewritten, S3 re-commented +app/features/registry/schemas.py # MODIFIED — S1 shared imports, comment+docstrings updated +app/features/model_selection/capabilities.py # 
MODIFIED — S4 promoted, docstring updated +app/features/model_selection/service.py # MODIFIED — S5 promoted +app/features/batch/service.py # MODIFIED — S6 promoted, stale docstring claim removed +docs/_base/ARCHITECTURE.md # MODIFIED — precedents bullet rewritten +``` + +### Known Gotchas & Library Quirks + +```python +# GOTCHA 1 — package __init__ eagerness is the trigger, not the leaf imports. +# registry/__init__.py imports schemas+service+storage; forecasting/__init__.py imports service; +# jobs/__init__.py imports routes+service. `from app.features.registry import models` therefore +# loads the ENTIRE registry slice. Never reason "alembic only imports models.py". + +# GOTCHA 2 — pytest MASKS this cycle class. The conftest import order usually loads forecasting +# before registry, so an in-process test passes against broken code. Only a FRESH interpreter +# importing registry first (the alembic shape) reproduces. Hence the subprocess probes in the new +# suite AND the mandatory step-3.5 cold-boot smoke (uv run python -c "import app.main" + +# uv run alembic upgrade head) BEFORE Level-1 validation. Memory: [[computed-field-cross-slice-cycle]]. + +# GOTCHA 3 — mypy strict ⇒ no_implicit_reexport. A plain `from app.shared.model_taxonomy import +# ModelFamily` inside forecasting/schemas.py does NOT re-export it; registry tests importing +# ModelFamily from forecasting.schemas would fail mypy with attr-defined/no-redef noise. +# Use the redundant alias: `from app.shared.model_taxonomy import ModelFamily as ModelFamily`. +# ruff F401 also honors that idiom (no unused-import finding even when the name is otherwise +# unused, as in feature_metadata.py post-move); pylint's useless-import-alias is NOT enabled +# (ruff select has no PL group). If F401 fires anyway, fall back to `# noqa: F401` with a comment. 
+ +# GOTCHA 4 — Pydantic v2 needs ModelFamily at RUNTIME in registry/schemas.py even though the +# file has `from __future__ import annotations`: the computed_field return annotation is resolved +# when the serialization schema is built. Keep a module-scope runtime import (now from shared) — +# do NOT move it under TYPE_CHECKING. + +# GOTCHA 5 — re-export must preserve OBJECT IDENTITY. forecasting.schemas.ModelFamily must BE +# app.shared.model_taxonomy.ModelFamily (same class object), or enum members would compare equal +# (str values) but `is` checks like `model_family_for(t) is ModelFamily.BASELINE` +# (forecasting/service.py:974) would silently break across paths. The new suite asserts identity. + +# GOTCHA 6 — JSON schema is move-invariant. Verified 2026-06-11: +# uv run python -c " +# from app.features.registry.schemas import RunResponse +# s = RunResponse.model_json_schema(mode='serialization') +# print(s['$defs']['ModelFamily'])" +# → {"enum": ["baseline", "tree", "additive"], "title": "ModelFamily", "type": "string", ...} +# Title comes from the class NAME, members from values — both unchanged. BUT the field +# `description` embeds the property docstring, so rewriting the lazy-import docstring changes the +# OpenAPI description text (cosmetic). Lock enum+title in the new suite; ignore description. + +# GOTCHA 7 — ModelFamily is NOT pickled. Verified: +# grep -rn "ModelFamily" app/features/forecasting/persistence.py → no hits +# so old joblib bundles keep loading (pickle stores module paths — a pickled enum WOULD have +# broken). Re-verify with the same grep if persistence.py changed since 2026-06-11. + +# GOTCHA 8 — keep the structlog event name "forecasting.unknown_model_family" verbatim in the +# moved function. It is a log contract (greppable); the module moved, the event didn't. 
+# Logger idiom in app/shared: `from app.core.logging import get_logger` then +# `logger = get_logger(__name__)` (precedent app/shared/seeder/core.py:15,48) — NOT the +# feature-module `structlog.get_logger` idiom. + +# GOTCHA 9 — ruff S603 fires on subprocess in the new test. Use the house suppression +# (tests/test_e2e_demo.py:152): `subprocess.run(...) # noqa: S603 — internal command, trusted args` +# and pass [sys.executable, "-c", stmt] (list form, no shell). + +# GOTCHA 10 — isort (ruff I001) will want the promoted imports sorted inside the +# app-imports block. In forecasting/service.py the NOTE block (lines 55-61) currently sits in the +# MIDDLE of the import block (between schemas and v2_loaders) — when editing, move the rewritten +# NOTE to AFTER the import block ends (line ~80) or ruff format/I001 placement gets ugly. +# Run `uv run ruff check --fix . && uv run ruff format .` after each file edit. + +# GOTCHA 11 — the cold-boot probes spawn fresh interpreters that import pandas/sklearn (~2-4 s +# each). Cap at 3 probes (alembic shape, forecasting-first, app.main) — ~10 s added to the unit +# suite. Do NOT parametrize over all 19 slices. + +# GOTCHA 12 — repo has mixed CRLF/LF line endings with no policy. New files are fine (LF); for +# the 8 modified files check `git diff --stat` shows surgical hunks, not whole-file rewrites. + +# GOTCHA 13 — the working tree may carry an unrelated dirty `uv.lock` and a local-only +# `docker-compose.lan.yml`. Do NOT sweep either into this branch. +``` + +## Implementation Blueprint + +### Data models and structure + +No ORM, no migration. One moved enum + map + function — complete spec: + +```python +# app/shared/model_taxonomy.py — NEW (move, not rewrite; bodies verbatim from the old homes) +"""Model-family taxonomy shared across slices (#268). 
+ +``ModelFamily`` + ``model_family_for`` moved here from the forecasting slice +(``forecasting/schemas.py:617`` / ``forecasting/feature_metadata.py:38``): +``registry.schemas`` needs ``ModelFamily`` at module scope for the +``RunResponse.model_family`` computed field, and while the enum lived inside +a feature slice that one eager import forced lazy-import workarounds across +the registry boundary (the forecasting↔registry alembic cold-boot cycle). +A neutral ``app/shared`` home keeps the import graph one-way: +``app/features/* → app/shared`` only. +""" + +from __future__ import annotations + +from enum import Enum + +from app.core.logging import get_logger + +logger = get_logger(__name__) + + +class ModelFamily(str, Enum): + """Classifier for advanced-model UI surfacing. + + Derived from ``model_type``; not persisted in the DB. Surfaced on + ``RunResponse`` via a computed field and consumed by the dashboard for the + family Badge and the feature-importance panel routing. Unknown model types + classify as ``BASELINE`` (forward-compatible for new families before the + map below is updated). + """ + + BASELINE = "baseline" # naive, seasonal_naive, moving_average + TREE = "tree" # regression (HistGBR), lightgbm, xgboost + ADDITIVE = "additive" # prophet_like (Ridge pipeline) + + +# Canonical map: model_type string → ModelFamily. Unknown types log a warning +# and classify as BASELINE. Keep in sync with the ``ModelType`` Literal in +# ``app/features/forecasting/models.py``. +_MODEL_FAMILY_MAP: dict[str, ModelFamily] = { + # ... the 11 entries VERBATIM from feature_metadata.py:42-54 ... +} + + +def model_family_for(model_type: str) -> ModelFamily: + # ... body VERBATIM from feature_metadata.py:57-73, event name + # "forecasting.unknown_model_family" KEPT (Gotcha 8) ... 
+``` + +### Tasks (in order) + +```yaml +Task 1: +CREATE app/shared/model_taxonomy.py: + - CONTENT: exactly the spec above; map entries + function body moved VERBATIM from + app/features/forecasting/feature_metadata.py:42-73; enum moved VERBATIM from + app/features/forecasting/schemas.py:617-629 (docstring's "the map in feature_metadata.py" + → "the map below") + - Logger via app.core.logging.get_logger (Gotcha 8) + +Task 2: +MODIFY app/features/forecasting/schemas.py: + - DELETE lines 617-629 (class ModelFamily) and trim the section-header comment at 612-614 to + "Feature Metadata Schemas (MLZOO-D / PRP-31; ModelFamily moved to app/shared/model_taxonomy — #268)" + - ADD after the `from app.shared.feature_frames import FeatureGroup` import (line 18): + # Back-compat re-export (#268): downstream modules and tests import ModelFamily from this + # module; the redundant alias makes the re-export explicit for mypy/pyright/ruff. + from app.shared.model_taxonomy import ModelFamily as ModelFamily + - PRESERVE FeatureImportanceItem + FeatureMetadataResponse untouched (they stay here; + FeatureMetadataResponse.model_family keeps the ModelFamily annotation, now via re-export) + +Task 3: +MODIFY app/features/forecasting/feature_metadata.py: + - DELETE lines 38-73 (_MODEL_FAMILY_MAP + model_family_for) + - REPLACE line 30 `from app.features.forecasting.schemas import FeatureImportanceItem, ModelFamily` + with: + from app.features.forecasting.schemas import FeatureImportanceItem + # Back-compat re-export (#268): forecasting/service.py and tests import these from here. 
+ from app.shared.model_taxonomy import ModelFamily as ModelFamily + from app.shared.model_taxonomy import model_family_for as model_family_for + - VERIFY nothing else in the file references _MODEL_FAMILY_MAP (it doesn't — verified) + - Module docstring: no change needed (it describes importance extraction, which stays) + +Task 4: +MODIFY app/features/registry/schemas.py: + - REPLACE lines 20-27 (comment block + eager forecasting import) with: + # ``ModelFamily`` / ``model_family_for`` live in ``app/shared/model_taxonomy`` (#268) so + # this module never imports from another feature slice. Pydantic v2 resolves a + # ``@computed_field``'s return-type annotation at schema-build time, so ``ModelFamily`` + # must be a real runtime import (never TYPE_CHECKING-gated). + from app.shared.model_taxonomy import ModelFamily, model_family_for + - SIMPLIFY the computed field (lines 166-177): delete the in-method import and the + "Imported lazily…" docstring paragraph → + @computed_field # type: ignore[prop-decorator] + @property + def model_family(self) -> ModelFamily: + """Computed family label derived from ``model_type``.""" + return model_family_for(self.model_type) + - UPDATE RunResponse class docstring (lines 132-137): the canonical-map pointer + "app/features/forecasting/feature_metadata.py:model_family_for" → "app/shared/model_taxonomy.py" + +Task 5: +MODIFY app/features/forecasting/service.py: + - ADD to the module import block (isort position — `registry` sorts AFTER `forecasting.*`, + i.e. between the v2_loaders import and the app.shared.feature_frames import; let + `uv run ruff check --fix .` settle the exact slot): + from app.features.registry.schemas import RunStatus + from app.features.registry.service import RegistryService + - DELETE lines 966-968 (the "# Lazy cross-slice imports — see module-level NOTE." 
comment + + both imports inside get_feature_metadata_for_run) + - REWRITE the NOTE block (lines 55-61) and MOVE it below the import block (Gotcha 10): + # NOTE: ``JobService`` and the job enums are imported LAZILY inside + # ``get_feature_metadata_for_job``: ``jobs/service.py`` lazily imports + # ``ForecastingService`` back at call time (lines ~435, ~545), so the pair is mutually + # dependent and at least one side must stay call-time-lazy to keep cold-boot clean. + # The registry imports above are EAGER since #268 moved ``ModelFamily`` to + # ``app/shared/model_taxonomy`` — registry no longer imports this slice. + - UPDATE the inline comment at line 1052 ("# Lazy cross-slice imports — see module-level NOTE.") + → "# Lazy by design — see the jobs↔forecasting NOTE below the module imports." + - PRESERVE everything else, incl. line 974 `model_family_for(run.model_type) is ModelFamily.BASELINE` + (works unchanged via the existing module-scope imports from feature_metadata/schemas re-exports) + +Task 6: +MODIFY app/features/model_selection/capabilities.py: + - ADD module-scope import (sorted before the model_selection.schemas import): + from app.shared.model_taxonomy import model_family_for + - DELETE lines 133-135 (the lazy-import comment + import inside build_model_catalog) + - UPDATE module docstring provenance bullet (lines 9-10): + "- ``family`` — ``app.shared.model_taxonomy.model_family_for`` (shared taxonomy, #268)." 
+ +Task 7: +MODIFY app/features/model_selection/service.py: + - ADD to the module import block (isort position: `registry` sorts after `model_selection` + alphabetically, so AFTER the model_selection imports): + from app.features.registry.schemas import AliasCreate, RunCreate, RunStatus, RunUpdate + from app.features.registry.service import RegistryService + - DELETE lines 1077-1083 (both lazy imports + their `# lazy` comments inside + promote_champion_selection) + +Task 8: +MODIFY app/features/batch/service.py: + - ADD to the module import block (sorted: data_platform < jobs, so after line 52): + from app.features.jobs.models import JobStatus + from app.features.jobs.schemas import JobCreate + from app.features.jobs.service import JobService + - DELETE lines 292-294 (the three lazy imports) and the docstring sentence at 289-290 + ("Lazy cross-slice imports break the alembic cold-boot cycle (precedent: …786-787).") + - CHECK no name collisions: batch.models defines BatchStatus/BatchItemStatus, not JobStatus — + verify with grep before promoting + +Task 9: +CREATE app/shared/tests/__init__.py (empty) and app/shared/tests/test_model_taxonomy.py: + - MIRROR style: app/features/forecasting/tests/test_feature_metadata.py:75-108 + - CASES: + 1. mapping: the 5 baseline types → BASELINE; regression/lightgbm/xgboost/random_forest → + TREE; prophet_like/trend_regression_baseline → ADDITIVE (canonical import path) + 2. unknown type → BASELINE fallback (no raise) + 3. IDENTITY (Gotcha 5): + from app.shared.model_taxonomy import ModelFamily, model_family_for + from app.features.forecasting.schemas import ModelFamily as SchemasMF + from app.features.forecasting.feature_metadata import ( + ModelFamily as FMetaMF, model_family_for as legacy_fn) + assert SchemasMF is ModelFamily and FMetaMF is ModelFamily + assert legacy_fn is model_family_for + 4. 
JSON-schema lock (Gotcha 6): + s = RunResponse.model_json_schema(mode="serialization") + d = s["$defs"]["ModelFamily"] + assert d["title"] == "ModelFamily" and d["enum"] == ["baseline", "tree", "additive"] + 5. cold-boot probes (Gotchas 1+2, the spec for the bug class), parametrized over EXACTLY: + "from app.features.registry import models", # alembic env.py shape + "import app.features.forecasting", # forecasting-first entry + "import app.main", # full wiring + each run as: + result = subprocess.run( # noqa: S603 — internal command, trusted args + [sys.executable, "-c", stmt], capture_output=True, text=True, timeout=120) + assert result.returncode == 0, result.stderr + - DO NOT touch app/features/forecasting/tests/test_feature_metadata.py or + app/features/registry/tests/test_schemas.py — green-untouched is the back-compat proof + +Task 10: +MODIFY docs/_base/ARCHITECTURE.md (section "### Cross-slice read-only import pattern", line ~71): + - REPLACE the "Existing precedents" forecasting bullet (lines ~91-93: "lazy RegistryService / + JobService / RunStatus imports … required because RunResponse.model_family computed_field + closes the cycle at alembic cold-boot") with: + - `app/features/forecasting/service.py` ↔ `app/features/jobs/service.py` — mutually lazy + service pair (each lazily imports the other at call time; at least one side must stay + lazy). The former ModelFamily-driven cycle here was RESOLVED by #268. + - ADD a short "Resolved cycle (reference case)" line after the precedents: + - #268 moved `ModelFamily` + `model_family_for` to `app/shared/model_taxonomy.py`; the + registry→forecasting eager edge disappeared and the registry-related lazy imports were + promoted to module scope. When a cycle is caused by a shared *type*, relocate the type + to `app/shared/` instead of adding lazy imports. + +Task 11: +RUN the validation loop below (Level 0 cold-boot smoke FIRST — prp-execute step 3.5). 
+``` + +### Integration Points + +```yaml +BACKEND: 8 files modified + 3 created — no router change, no schema change, no migration +DATABASE: none (ModelFamily never persisted) +CONFIG: none +DOCS: docs/_base/ARCHITECTURE.md only (Task 10); API_CONTRACTS/DOMAIN_MODEL grep-verified clean +FRONTEND: none (enum values + JSON-schema title unchanged) +GIT: + branch: refactor/shared-model-taxonomy (off dev; type/kebab, ≤50 chars) + commits: refactor(forecast,registry): move model family taxonomy to shared (#268) + # the PRP file itself lands as: + docs(repo): track reliability E4 prp for shared model taxonomy (#268) + # (mirrors the E2/E3 "docs(repo): track reliability EN prp …" precedent) + note: refactor type → no version bump (correct; this is internal) +``` + +## Validation Loop + +### Level 0: Cold-boot smoke (MANDATORY FIRST — prp-execute step 3.5; pytest masks this class) + +```bash +uv run python -c "import app.main; print('app.main ok')" +uv run python -c "from app.features.registry import models; print('alembic-shape ok')" +uv run python -c "import app.features.forecasting; print('forecasting-first ok')" +docker compose up -d && uv run alembic upgrade head # real env.py execution +# Baseline before edits (verified 2026-06-11): all four pass. They must STILL pass after. +``` + +### Level 1: Syntax & Style + +```bash +uv run ruff check . && uv run ruff format --check . +uv run mypy app/ # strict; watch for no_implicit_reexport errors → Gotcha 3 +uv run pyright app/ # strict; 197 pre-existing warnings are baseline noise, 0 errors is the bar +``` + +### Level 2: Unit Tests + +```bash +# Targeted first (fast signal): +uv run pytest -v app/shared/tests/test_model_taxonomy.py \ + app/features/forecasting/tests/test_feature_metadata.py \ + app/features/registry/tests/test_schemas.py +# Then the full unit suite (baseline 2026-06-11: 1920 passed): +uv run pytest -v -m "not integration" +# Expected: old suites green UNTOUCHED; new suite adds ~10 cases incl. 
3 subprocess probes (~10 s). +``` + +### Level 3: Integration (touched slices; DB must be up) + +```bash +uv run pytest -v -m integration app/features/registry app/features/forecasting \ + app/features/model_selection app/features/batch +# Memory [[integration-suite-shared-state-pollution]]: do NOT run the FULL integration suite as +# the gate — destructive seeder tests pollute the shared DB mid-suite; targeted slices suffice. + +# Live contract sanity (uvicorn running): +curl -s "http://localhost:8123/registry/runs?page=1&page_size=1" | python3 -c \ + "import sys,json; r=json.load(sys.stdin)['runs']; print('model_family:', r[0]['model_family'] if r else 'empty-table-ok')" +# Expected: one of baseline|tree|additive (or empty-table-ok on a fresh DB). +``` + +### Level 4: Full gates + issue sanity + +```bash +uv run ruff check . && uv run ruff format --check . && uv run mypy app/ && uv run pyright app/ \ + && uv run pytest -v -m "not integration" +gh issue view 268 --json state # OPEN until the PR lands +grep -rn "from app.features.forecasting" app/features/registry/ # MUST be empty +grep -rn "lazy" app/ --include="*.py" -i | grep -i "modelfamily\|model_family" # MUST be empty +``` + +## Final Validation Checklist + +- [ ] Level 0 cold-boot: all four probes pass AFTER the refactor (the alembic-shape one is the bug) +- [ ] `uv run pytest -v -m "not integration"` — green; both pre-existing suites UNTOUCHED +- [ ] New suite: mapping + identity (`is`) + JSON-schema lock + 3 subprocess probes green +- [ ] `grep -rn "from app.features.forecasting" app/features/registry/` → empty +- [ ] Zero comments blame the ModelFamily cycle for laziness (umbrella #380 acceptance) +- [ ] S3 kept lazy with the corrected jobs↔forecasting comment; `jobs/service.py:435,545` untouched +- [ ] `docs/_base/ARCHITECTURE.md` precedents bullet rewritten (Task 10) +- [ ] `git diff --stat` surgical (Gotcha 12); no `uv.lock` / `docker-compose.lan.yml` churn (Gotcha 13) +- [ ] Commits 
`refactor(forecast,registry): … (#268)`; no AI trailer; branch off `dev` + +--- + +## Anti-Patterns to Avoid + +- ❌ Don't REWRITE the enum/map/function while moving — move verbatim; behavior change is scope creep +- ❌ Don't gate the registry import under `TYPE_CHECKING` — Pydantic needs ModelFamily at runtime (Gotcha 4) +- ❌ Don't use a plain import as the re-export — mypy no_implicit_reexport breaks downstream (Gotcha 3) +- ❌ Don't eagerize `jobs/service.py:435,545` or forecasting's S3 to "finish the job" — that pair is + a genuinely mutual dependency; both-eager = the cycle returns (S3 verdict) +- ❌ Don't add ModelFamily to `app/shared/__init__.py` — feature_frames precedent imports by full path +- ❌ Don't trust an in-process pytest pass as cycle proof — only fresh-interpreter probes count (Gotcha 2) +- ❌ Don't rename the `forecasting.unknown_model_family` log event (Gotcha 8) +- ❌ Don't touch `app/features/data_platform` cross-slice imports — separately tracked sibling flag + +## Confidence Score: 9/10 + +One-pass success is highly likely: the cycle topology is fully traced and runtime-verified (all +entry orders), every mutation site is mapped with verbatim current text and a per-site verdict, +the re-export idiom is checked against the actual mypy/ruff config, and the regression suite +includes the one test shape (fresh-interpreter probes) that this bug class cannot hide from. +The 1-point deduction: isort/ruff-format placement of the promoted imports and the relocated +NOTE block in `forecasting/service.py` (Gotcha 10) may need one fix-up iteration, and the +subprocess probes' wall-clock cost could need trimming if the unit-suite budget complains. 
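The fresh-interpreter probes named in Task 9 (case 5) and in Level 0 could be wrapped in a small helper along the following lines. This is only a sketch under stated assumptions: `cold_boot_probe` and `failed_probes` are hypothetical names that do not exist in the repository, and the statements in `STAND_IN_PROBES` are stdlib stand-ins so the block runs anywhere; inside the repo they would be replaced by the three real probe statements listed in Task 9.

```python
from __future__ import annotations

import subprocess
import sys


def cold_boot_probe(statement: str, timeout: float = 120.0) -> subprocess.CompletedProcess[str]:
    """Run one import statement in a brand-new interpreter.

    A fresh process has an empty ``sys.modules``, so an import cycle that an
    already-warmed pytest process would mask shows up as a non-zero exit code.
    """
    return subprocess.run(  # noqa: S603 - internally built, trusted args
        [sys.executable, "-c", statement],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,  # the caller inspects returncode instead of catching
    )


def failed_probes(statements: list[str]) -> list[tuple[str, str]]:
    """Return ``(statement, stderr)`` for every probe that did not exit 0."""
    failures: list[tuple[str, str]] = []
    for statement in statements:
        result = cold_boot_probe(statement)
        if result.returncode != 0:
            failures.append((statement, result.stderr))
    return failures


# ASSUMPTION: stdlib stand-ins so the sketch is runnable outside the repo.
# In the repo these would be the Task 9 statements, e.g.
# "from app.features.registry import models".
STAND_IN_PROBES = ["import json", "import decimal"]

if __name__ == "__main__":
    for statement, stderr in failed_probes(STAND_IN_PROBES):
        print(f"FAILED: {statement}\n{stderr}")
```

Each statement runs in its own subprocess on purpose: probing all entry orders in one interpreter would let the first successful import warm the module cache for the rest, which is the masking effect Gotcha 2 warns about.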
From 5d4b8aa7668231e2dd6a0c3ed8abe1282142461b Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Fri, 12 Jun 2026 10:15:32 +0200 Subject: [PATCH 20/44] chore(repo): sync uv.lock with 0.2.21 release version (#268) --- uv.lock | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/uv.lock b/uv.lock index 2de76781..61725802 100644 --- a/uv.lock +++ b/uv.lock @@ -821,7 +821,7 @@ wheels = [ [[package]] name = "forecastlabai" -version = "0.2.19" +version = "0.2.21" source = { editable = "." } dependencies = [ { name = "alembic" }, From f7fd09cc9229aa237c79a867d30e1e94db54a5a5 Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Fri, 12 Jun 2026 10:39:41 +0200 Subject: [PATCH 21/44] docs(repo): address review notes on taxonomy map provenance and prp typo (#268) --- PRPs/PRP-reliability-E4-shared-model-taxonomy.md | 2 +- app/shared/model_taxonomy.py | 4 ++++ 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/PRPs/PRP-reliability-E4-shared-model-taxonomy.md b/PRPs/PRP-reliability-E4-shared-model-taxonomy.md index 1ff06df6..cca732d1 100644 --- a/PRPs/PRP-reliability-E4-shared-model-taxonomy.md +++ b/PRPs/PRP-reliability-E4-shared-model-taxonomy.md @@ -21,7 +21,7 @@ import `app/features/Y`) and the proximate cause of the documented lazy-import w 2. Back-compat re-exports from the two old homes (`forecasting/schemas.py`, `forecasting/feature_metadata.py`) using the redundant-alias idiom (`import X as X`) so every existing importer — including both untouched test suites — keeps resolving under - `mypy --strict` (no-implicit-reexport). + `mypy --strict` (no_implicit_reexport). 3. The 6 mapped lazy-import sites resolved per the verdict table below: 5 retired (promoted to module scope), 1 kept-but-re-documented (the genuinely mutual jobs↔forecasting pair). 4. 
A new test suite `app/shared/tests/test_model_taxonomy.py` that locks the mapping, the enum diff --git a/app/shared/model_taxonomy.py b/app/shared/model_taxonomy.py index 7e63020c..a42f10e1 100644 --- a/app/shared/model_taxonomy.py +++ b/app/shared/model_taxonomy.py @@ -37,6 +37,10 @@ class ModelFamily(str, Enum): # Canonical map: model_type string → ModelFamily. Unknown types log a warning # and classify as BASELINE. Keep in sync with the ``ModelType`` Literal in # ``app/features/forecasting/models.py``. +# NOTE: re-exported (private) via ``forecasting/feature_metadata.py`` solely for +# the drift-lock test ``forecasting/tests/test_feature_metadata.py:: +# test_model_family_map_covers_every_known_model_type`` — keep that re-export when +# moving this symbol. _MODEL_FAMILY_MAP: dict[str, ModelFamily] = { "naive": ModelFamily.BASELINE, "seasonal_naive": ModelFamily.BASELINE, From 151611ca2e1395b2a3e7ef0a47885d9dec649cca Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Fri, 12 Jun 2026 11:32:21 +0200 Subject: [PATCH 22/44] fix(data): couple seeded sales prices to price_history windows (#237) --- app/shared/seeder/config.py | 5 + app/shared/seeder/core.py | 10 ++ app/shared/seeder/generators/__init__.py | 4 + app/shared/seeder/generators/facts.py | 99 ++++++++++- app/shared/seeder/tests/test_generators.py | 186 ++++++++++++++++++++- 5 files changed, 300 insertions(+), 4 deletions(-) diff --git a/app/shared/seeder/config.py b/app/shared/seeder/config.py index 135e68ec..75129e19 100644 --- a/app/shared/seeder/config.py +++ b/app/shared/seeder/config.py @@ -82,6 +82,10 @@ class RetailPatternConfig: promotion_lift: Sales multiplier during promotions. stockout_behavior: How to handle stockouts (zero sales or backlog). price_elasticity: % demand change per % price change (negative = inverse). + price_sales_coupling: When True, sales_daily.unit_price follows the + generated price_history windows (incl. 
markdowns) and demand + responds via price_elasticity. When False, legacy behavior: + unit_price = base_price always. new_product_ramp_days: Days to reach full demand for new products. weekend_spike: Additional weekend multiplier on top of weekly seasonality. promotion_probability: Probability of a product having a promotion per period. @@ -91,6 +95,7 @@ class RetailPatternConfig: promotion_lift: float = 1.3 stockout_behavior: Literal["zero", "backlog"] = "zero" price_elasticity: float = -0.5 + price_sales_coupling: bool = True new_product_ramp_days: int = 30 weekend_spike: float = 1.0 # Already in weekly_seasonality, this is additional promotion_probability: float = 0.05 diff --git a/app/shared/seeder/core.py b/app/shared/seeder/core.py index 1d97b4cf..27259471 100644 --- a/app/shared/seeder/core.py +++ b/app/shared/seeder/core.py @@ -39,6 +39,7 @@ ReturnsGenerator, SalesDailyGenerator, StoreGenerator, + build_price_lookup, ) from app.shared.seeder.generators.exogenous import WEATHER_SIGNAL_NAME @@ -431,6 +432,14 @@ async def _generate_facts( if weather_lookup and self.config.exogenous.weather_temperature_sensitivity != 0.0 else None ) + # Price/sales coupling (issue #237): thread the generated price + # windows (incl. Phase 2 markdowns merged above) into the sales + # generator so unit_price follows price_history and demand responds + # via retail.price_elasticity. None preserves the legacy path + # byte-for-byte — same gating convention as weather_lookup_for_sales. 
+ price_lookup_for_sales = ( + build_price_lookup(price_records) if self.config.retail.price_sales_coupling else None + ) sales_gen = SalesDailyGenerator( self.rng, self.config.time_series, @@ -453,6 +462,7 @@ async def _generate_facts( promo_dates, stockout_dates, product_lifecycle_data=product_lifecycle_data, + price_lookup=price_lookup_for_sales, ) logger.info( diff --git a/app/shared/seeder/generators/__init__.py b/app/shared/seeder/generators/__init__.py index ad1e8bd7..f5aee033 100644 --- a/app/shared/seeder/generators/__init__.py +++ b/app/shared/seeder/generators/__init__.py @@ -8,6 +8,8 @@ PriceHistoryGenerator, PromotionGenerator, SalesDailyGenerator, + build_price_lookup, + resolve_price, ) from app.shared.seeder.generators.lifecycle import LifecycleGenerator from app.shared.seeder.generators.markdowns import MarkdownGenerator @@ -30,4 +32,6 @@ "ReturnsGenerator", "SalesDailyGenerator", "StoreGenerator", + "build_price_lookup", + "resolve_price", ] diff --git a/app/shared/seeder/generators/facts.py b/app/shared/seeder/generators/facts.py index 68438b7b..873c8f73 100644 --- a/app/shared/seeder/generators/facts.py +++ b/app/shared/seeder/generators/facts.py @@ -6,7 +6,7 @@ import random from datetime import date, timedelta from decimal import Decimal -from typing import TYPE_CHECKING +from typing import TYPE_CHECKING, cast if TYPE_CHECKING: from app.shared.seeder.config import ( @@ -26,6 +26,83 @@ """Mirrors the SQL CHECK on ``sales_daily.channel`` (see PRP-12 §schema).""" +PriceLookup = dict[tuple[int, int | None], list[tuple[date, date | None, Decimal]]] +"""Price windows indexed by ``(product_id, store_id)``. + +``store_id=None`` keys hold chain-wide windows; each value is a list of +``(valid_from, valid_to, price)`` tuples sorted by ``valid_from`` ascending +(``valid_to=None`` means open-ended). 
+""" + + +def build_price_lookup( + price_records: list[dict[str, date | int | Decimal | None]], +) -> PriceLookup: + """Index price-history records by ``(product_id, store_id)`` scope. + + Pure and rng-free — the seeder's byte-stability contract forbids any + rng draw here. Accepts the row shape both :class:`PriceHistoryGenerator` + and the Phase 2 ``MarkdownGenerator`` emit. Within a scope, windows are + sorted by ``valid_from`` ascending (stable, so a markdown appended after + the base window it overlaps wins on a ``valid_from`` tie). + + Args: + price_records: Price-history row dicts carrying ``product_id``, + ``store_id``, ``price``, ``valid_from``, ``valid_to``. + + Returns: + The :data:`PriceLookup` mapping :func:`resolve_price` consumes. + """ + lookup: PriceLookup = {} + for record in price_records: + product_id = cast("int", record["product_id"]) + store_id = cast("int | None", record["store_id"]) + price = cast("Decimal", record["price"]) + valid_from = cast("date", record["valid_from"]) + valid_to = cast("date | None", record["valid_to"]) + lookup.setdefault((product_id, store_id), []).append((valid_from, valid_to, price)) + for windows in lookup.values(): + windows.sort(key=lambda window: window[0]) + return lookup + + +def resolve_price( + lookup: PriceLookup, + product_id: int, + store_id: int, + day: date, + base_price: Decimal, +) -> Decimal: + """Resolve the active price for ``(product_id, store_id)`` on ``day``. + + Precedence (deterministic): a store-specific window beats a chain-wide + one; within a scope the latest ``valid_from`` covering ``day`` wins + (markdowns fire later than the base window they cut). ``valid_to=None`` + is open-ended. A day no window covers falls back to ``base_price``. + + Args: + lookup: The index built by :func:`build_price_lookup`. + product_id: Product the sales row belongs to. + store_id: Store the sales row belongs to. + day: The sales date to resolve. + base_price: Fallback when no window covers ``day``. 
+ + Returns: + The active price as a :class:`~decimal.Decimal`. + """ + for scope in ((product_id, store_id), (product_id, None)): + windows = lookup.get(scope) + if not windows: + continue + # Sorted by valid_from ASC at build time; scan from the latest so the + # most recent window covering ``day`` wins (markdown overlaps cut the + # base window they were appended after). + for valid_from, valid_to, price in reversed(windows): + if valid_from <= day and (valid_to is None or day <= valid_to): + return price + return base_price + + class SalesDailyGenerator: """Generator for daily sales fact data with realistic time-series patterns. @@ -420,6 +497,8 @@ def generate( promotions: dict[tuple[int, int], set[date]], # (store_id, product_id) -> promo dates stockouts: dict[tuple[int, int], set[date]], # (store_id, product_id) -> stockout dates product_lifecycle_data: dict[int, tuple[date | None, date | None]] | None = None, + *, + price_lookup: PriceLookup | None = None, ) -> list[dict[str, date | int | Decimal]]: """Generate sales daily records. @@ -435,6 +514,12 @@ def generate( :meth:`__init__`. Missing entries fall back to ``(None, None)`` so the lifecycle multiplier evaluates to 1.0 for that product. + price_lookup: Optional :data:`PriceLookup` built from the + generated price-history records (issue #237). When supplied, + each row's ``unit_price`` is the day's resolved price and + demand responds via ``retail_config.price_elasticity``. + When ``None`` (the default) the legacy path is byte-identical: + ``unit_price = base_price`` and no elasticity effect. Returns: List of sales dictionaries ready for database insertion. @@ -518,11 +603,19 @@ def generate( stockouts_today = stockouts_by_store_date.get((store_id, current_date)) + # Resolved day price (issue #237): None on the legacy path + # so the elasticity term in _compute_demand stays dormant. 
+ resolved_price = ( + resolve_price(price_lookup, product_id, store_id, current_date, base_price) + if price_lookup is not None + else None + ) + quantity = self._compute_demand( current_date=current_date, base_date=base_date, base_price=base_price, - current_price=None, # Simplified: use base price + current_price=resolved_price, is_promotion=is_promotion, is_stockout=is_stockout, product_launch_date=launch_date_for_product, @@ -541,7 +634,7 @@ def generate( quantity, chosen_channel = self._maybe_apply_channel(quantity, is_promotion) # Calculate total amount - unit_price = base_price + unit_price = resolved_price if resolved_price is not None else base_price total_amount = unit_price * quantity row: dict[str, date | int | Decimal | str] = { diff --git a/app/shared/seeder/tests/test_generators.py b/app/shared/seeder/tests/test_generators.py index 1f468e2e..0df423ae 100644 --- a/app/shared/seeder/tests/test_generators.py +++ b/app/shared/seeder/tests/test_generators.py @@ -7,7 +7,12 @@ from datetime import date from decimal import Decimal -from app.shared.seeder.config import HolidayConfig, SparsityConfig +from app.shared.seeder.config import ( + HolidayConfig, + RetailPatternConfig, + SparsityConfig, + TimeSeriesConfig, +) from app.shared.seeder.generators import ( CalendarGenerator, InventorySnapshotGenerator, @@ -16,6 +21,8 @@ PromotionGenerator, SalesDailyGenerator, StoreGenerator, + build_price_lookup, + resolve_price, ) @@ -258,6 +265,183 @@ def test_sparsity_reduces_combinations(self, rng, time_series_config, retail_con assert len(sales) < max_sales +class TestPriceLookupResolver: + """Tests for build_price_lookup / resolve_price (issue #237). + + Pure, rng-free functions — the seeder's byte-stability contract forbids + any rng draw in the resolver. 
+ """ + + def test_store_specific_beats_chain_wide(self): + """A store-specific window takes precedence over a chain-wide one.""" + lookup = build_price_lookup( + [ + { + "product_id": 1, + "store_id": None, + "price": Decimal("10.00"), + "valid_from": date(2024, 1, 1), + "valid_to": None, + }, + { + "product_id": 1, + "store_id": 7, + "price": Decimal("8.50"), + "valid_from": date(2024, 1, 1), + "valid_to": None, + }, + ] + ) + + assert resolve_price(lookup, 1, 7, date(2024, 2, 1), Decimal("9.99")) == Decimal("8.50") + # A different store only sees the chain-wide window. + assert resolve_price(lookup, 1, 3, date(2024, 2, 1), Decimal("9.99")) == Decimal("10.00") + + def test_latest_valid_from_wins_on_overlap(self): + """A later window (e.g. a markdown) overrides the base window it cuts.""" + lookup = build_price_lookup( + [ + { + "product_id": 1, + "store_id": None, + "price": Decimal("10.00"), + "valid_from": date(2024, 1, 1), + "valid_to": date(2024, 3, 31), + }, + { + "product_id": 1, + "store_id": None, + "price": Decimal("6.00"), + "valid_from": date(2024, 2, 1), + "valid_to": date(2024, 2, 14), + }, + ] + ) + + assert resolve_price(lookup, 1, 1, date(2024, 1, 15), Decimal("9.99")) == Decimal("10.00") + assert resolve_price(lookup, 1, 1, date(2024, 2, 7), Decimal("9.99")) == Decimal("6.00") + # After the markdown window closes, the base window resumes. 
+ assert resolve_price(lookup, 1, 1, date(2024, 3, 1), Decimal("9.99")) == Decimal("10.00") + + def test_open_ended_window_covers_any_later_day(self): + """valid_to=None means the window stays active indefinitely.""" + lookup = build_price_lookup( + [ + { + "product_id": 1, + "store_id": None, + "price": Decimal("11.25"), + "valid_from": date(2024, 6, 1), + "valid_to": None, + }, + ] + ) + + assert resolve_price(lookup, 1, 1, date(2030, 1, 1), Decimal("9.99")) == Decimal("11.25") + + def test_uncovered_day_falls_back_to_base_price(self): + """A day before the first window (or an unknown product) → base_price.""" + lookup = build_price_lookup( + [ + { + "product_id": 1, + "store_id": None, + "price": Decimal("10.00"), + "valid_from": date(2024, 3, 1), + "valid_to": None, + }, + ] + ) + + assert resolve_price(lookup, 1, 1, date(2024, 1, 15), Decimal("9.99")) == Decimal("9.99") + assert resolve_price(lookup, 99, 1, date(2024, 3, 15), Decimal("4.49")) == Decimal("4.49") + + +class TestSalesDailyGeneratorPriceCoupling: + """Tests for SalesDailyGenerator's price_lookup coupling (issue #237).""" + + @staticmethod + def _deterministic_generator(seed: int = 42) -> SalesDailyGenerator: + """A generator with zero noise/anomaly so demand is exact.""" + return SalesDailyGenerator( + random.Random(seed), + TimeSeriesConfig( + base_demand=100, + trend="none", + weekly_seasonality=[1.0] * 7, + monthly_seasonality={}, + noise_sigma=0.0, + anomaly_probability=0.0, + ), + RetailPatternConfig(price_elasticity=-0.5), + SparsityConfig(), + [], + ) + + def test_coupled_unit_price_follows_lookup_and_demand_responds(self): + """On a cut day unit_price is the cut price AND elasticity lifts demand.""" + base_price = Decimal("10.00") + cut_day = date(2024, 1, 3) + lookup = build_price_lookup( + [ + { + "product_id": 1, + "store_id": None, + "price": Decimal("8.00"), # a -20% cut + "valid_from": cut_day, + "valid_to": cut_day, + }, + ] + ) + dates = [date(2024, 1, d) for d in range(1, 6)] + + 
coupled = self._deterministic_generator().generate( + [1], [(1, base_price)], dates, {}, {}, price_lookup=lookup + ) + legacy = self._deterministic_generator().generate([1], [(1, base_price)], dates, {}, {}) + + coupled_by_date = {row["date"]: row for row in coupled} + legacy_by_date = {row["date"]: row for row in legacy} + + # Cut day: unit_price follows the lookup; -20% price x -0.5 elasticity + # lifts demand by +10% over the legacy twin (same seed, zero noise). + assert coupled_by_date[cut_day]["unit_price"] == Decimal("8.00") + assert legacy_by_date[cut_day]["unit_price"] == base_price + assert coupled_by_date[cut_day]["quantity"] > legacy_by_date[cut_day]["quantity"] + + # Uncovered days: identical to the legacy twin. + for day in dates: + if day == cut_day: + continue + assert coupled_by_date[day] == legacy_by_date[day] + + # total_amount = unit_price * quantity holds on every coupled row. + for row in coupled: + assert row["total_amount"] == row["unit_price"] * row["quantity"] + + def test_none_lookup_is_byte_identical_to_legacy_call(self): + """price_lookup=None reproduces the no-kwarg output byte-for-byte.""" + store_ids = [1, 2] + product_data = [(1, Decimal("9.99")), (2, Decimal("4.99"))] + dates = [date(2024, 1, d) for d in range(1, 31)] + + def _noisy_generator() -> SalesDailyGenerator: + return SalesDailyGenerator( + random.Random(42), + TimeSeriesConfig(), + RetailPatternConfig(), + SparsityConfig(random_gaps_per_series=1), + [], + ) + + legacy = _noisy_generator().generate(store_ids, product_data, dates, {}, {}) + explicit_none = _noisy_generator().generate( + store_ids, product_data, dates, {}, {}, price_lookup=None + ) + + assert legacy == explicit_none + + class TestInventorySnapshotGenerator: """Tests for InventorySnapshotGenerator.""" From 9449ac4f3d4641048b0ecaf6ad06cb20edd60c71 Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Fri, 12 Jun 2026 11:32:21 +0200 Subject: [PATCH 23/44] test(forecast): pin model_exogenous 
price-inertia discriminators (#237) --- .../tests/test_regression_forecaster.py | 59 +++++ .../scenarios/tests/test_feature_frame.py | 64 +++++ .../tests/test_routes_integration.py | 236 +++++++++++++++++- 3 files changed, 356 insertions(+), 3 deletions(-) diff --git a/app/features/forecasting/tests/test_regression_forecaster.py b/app/features/forecasting/tests/test_regression_forecaster.py index 22caae79..e7d03fea 100644 --- a/app/features/forecasting/tests/test_regression_forecaster.py +++ b/app/features/forecasting/tests/test_regression_forecaster.py @@ -98,6 +98,65 @@ def test_handles_nan_features() -> None: assert bool(np.all(np.isfinite(predictions))) +def test_constant_price_column_is_inert_to_future_price() -> None: + """A model fit on a constant price column ignores future prices EXACTLY. + + Discriminator test for issue #237 (hypothesis 2, the verdict): a + ``RegressionForecaster`` trained on a matrix whose price column is the + constant ``1.0`` — exactly what the pre-fix seeder produced, since + ``sales_daily.unit_price`` never moved off ``base_price`` and so + ``price_factor = unit_price / median(unit_price) ≡ 1.0`` — predicts + byte-identically for ANY future price value. ``HistGradientBoostingRegressor`` + never splits on a constant training column, so the scenario delta is + exactly 0.0, not merely small. The 0.0 the issue reproduces is + zero-learned-elasticity, not lost wiring (the wiring twin lives in + ``app/features/scenarios/tests/test_feature_frame.py``). 
+ """ + rng = np.random.default_rng(42) + n = 300 + features = rng.normal(10.0, 2.0, size=(n, 5)).astype(np.float64) + features[:, 4] = 1.0 # the price column is CONSTANT, as on pre-fix seeded data + target = (40.0 + 0.5 * features[:, 0] + rng.normal(scale=0.5, size=n)).astype(np.float64) + + model = RegressionForecaster(random_state=42).fit(target, features) + + future = features[:14].copy() + future[:, 4] = 1.0 + baseline_prediction = model.predict(14, future) + future[:, 4] = 0.60 # a deep -40% price cut + cut_prediction = model.predict(14, future) + + np.testing.assert_array_equal(cut_prediction, baseline_prediction) + + +def test_elastic_price_column_responds_to_future_price() -> None: + """The converse discriminator: trained price variance → a real response. + + Fit on price-elastic synthetic data (price column uniform in [0.7, 1.1], + demand falling with price) and the same forecaster's prediction moves + when the future price moves — proving the inertia in the constant-column + twin is a property of the *training data*, not of the model or the + predict path (issue #237). + """ + rng = np.random.default_rng(42) + n = 300 + features = rng.normal(10.0, 2.0, size=(n, 5)).astype(np.float64) + features[:, 4] = rng.uniform(0.7, 1.1, size=n) + target = (40.0 - 20.0 * features[:, 4] + rng.normal(scale=0.5, size=n)).astype(np.float64) + + model = RegressionForecaster(random_state=42).fit(target, features) + + future = features[:14].copy() + future[:, 4] = 1.0 + baseline_prediction = model.predict(14, future) + future[:, 4] = 0.85 # a -15% cut inside the training range + cut_prediction = model.predict(14, future) + + assert not np.array_equal(cut_prediction, baseline_prediction) + # Demand falls with price, so a cut lifts every horizon day's forecast. 
+ assert bool(np.all(cut_prediction > baseline_prediction)) + + def test_get_and_set_params() -> None: """``get_params`` reflects construction; ``set_params`` mutates in place.""" model = RegressionForecaster(max_iter=150, learning_rate=0.03, max_depth=4) diff --git a/app/features/scenarios/tests/test_feature_frame.py b/app/features/scenarios/tests/test_feature_frame.py index 3f14b09b..946cc509 100644 --- a/app/features/scenarios/tests/test_feature_frame.py +++ b/app/features/scenarios/tests/test_feature_frame.py @@ -201,6 +201,70 @@ def test_assemble_future_frame_shape_and_order() -> None: assert all(len(row) == len(columns) for row in frame.matrix) +def test_scenario_vs_baseline_frame_differs_only_in_price_factor() -> None: + """A price assumption reaches X_future, and ONLY the price_factor column. + + Discriminator test for issue #237 (hypothesis 1 falsification): the + scenario frame and an assumptions-stripped baseline frame — built from + identical dates/columns/history_tail/holiday_dates/launch_date — differ + exactly in the ``price_factor`` cells inside the assumption window + (``1 + change_pct`` vs ``1.0``) and nowhere else. The future-frame wiring + therefore does NOT drop the price assumption; the observed 0.0 delta must + come from the trained model itself (see the inert-mechanism twin in + ``app/features/forecasting/tests/test_regression_forecaster.py``). + """ + columns = canonical_feature_columns() + history_tail = [float(value) for value in range(HISTORY_TAIL_DAYS)] + launch = _ORIGIN - timedelta(days=100) + holiday_dates = {_HORIZON_DATES[0]} + # Window covering days 3..7 (indices 2..6) of the 14-day horizon. 
+ assumptions = ScenarioAssumptions( + price=PriceAssumption( + change_pct=-0.15, + start_date=_HORIZON_DATES[2], + end_date=_HORIZON_DATES[6], + ) + ) + + scenario_frame = assemble_future_frame( + dates=_HORIZON_DATES, + feature_columns=columns, + history_tail=history_tail, + assumptions=assumptions, + holiday_dates=holiday_dates, + launch_date=launch, + ) + baseline_frame = assemble_future_frame( + dates=_HORIZON_DATES, + feature_columns=columns, + history_tail=history_tail, + assumptions=ScenarioAssumptions(), + holiday_dates=holiday_dates, + launch_date=launch, + ) + + price_index = columns.index("price_factor") + for row_index in range(_HORIZON): + scenario_row = scenario_frame.matrix[row_index] + baseline_row = baseline_frame.matrix[row_index] + for col_index in range(len(columns)): + scenario_cell = scenario_row[col_index] + baseline_cell = baseline_row[col_index] + if col_index == price_index: + if 2 <= row_index <= 6: + assert scenario_cell == 0.85 + assert baseline_cell == 1.0 + else: + assert scenario_cell == 1.0 + assert baseline_cell == 1.0 + continue + # Every non-price cell is identical (NaN-aware compare — lag + # cells whose source target lies in the horizon are NaN). + if math.isnan(scenario_cell) and math.isnan(baseline_cell): + continue + assert scenario_cell == baseline_cell + + def test_assemble_future_frame_unknown_column_is_nan() -> None: """A requested column the builders do not produce becomes an all-NaN column.""" columns = [*canonical_feature_columns(), "mystery_feature"] diff --git a/app/features/scenarios/tests/test_routes_integration.py b/app/features/scenarios/tests/test_routes_integration.py index 79092115..8be8735d 100644 --- a/app/features/scenarios/tests/test_routes_integration.py +++ b/app/features/scenarios/tests/test_routes_integration.py @@ -5,20 +5,37 @@ and persistence. Requires ``docker compose up -d``. 
""" +import random import uuid +from collections.abc import AsyncGenerator +from datetime import date, timedelta +from decimal import Decimal +from pathlib import Path import pytest from httpx import AsyncClient +from sqlalchemy import delete, select from sqlalchemy.exc import IntegrityError from sqlalchemy.ext.asyncio import AsyncSession +from app.features.data_platform.models import Calendar, Product, SalesDaily, Store from app.features.scenarios.models import ScenarioPlan +from app.shared.seeder.config import RetailPatternConfig, SparsityConfig, TimeSeriesConfig +from app.shared.seeder.generators import SalesDailyGenerator, build_price_lookup # A price window covering the test bundle's 14-day horizon (train_end 2026-06-30). _PRICE_ASSUMPTION = { "price": {"change_pct": -0.15, "start_date": "2026-07-01", "end_date": "2026-07-14"}, } +# Grain for the seeded end-to-end repro (issue #237) — deliberately high IDs no +# seeder uses, mirroring the TEST_STORE_ID convention in conftest.py. +_SEEDED_STORE_ID = 990101 +_SEEDED_PRODUCT_ID = 990102 +_SEEDED_BASE_PRICE = Decimal("10.00") +_SEEDED_START = date(2026, 2, 1) +_SEEDED_END = date(2026, 6, 30) # the training origin T; horizon is July 1..14 + @pytest.mark.integration @pytest.mark.asyncio @@ -150,7 +167,14 @@ class TestSimulateModelExogenous: async def test_regression_baseline_returns_model_exogenous( self, client: AsyncClient, trained_regression_model: str ) -> None: - """A regression baseline re-forecasts — method is 'model_exogenous'.""" + """A regression baseline re-forecasts — and the delta is non-zero. + + The conftest bundle is trained price-sensitive (demand falls with + ``price_factor``), so a price cut MUST lift the re-forecast. This is + the assertion gap issue #237 closed: the LightGBM twin always pinned + ``units_delta > 0.0``; the regression twin — the exact model type from + the issue's repro — only pinned the method. 
+ """ response = await client.post( "/scenarios/simulate", json={ @@ -163,6 +187,44 @@ async def test_regression_baseline_returns_model_exogenous( data = response.json() assert data["method"] == "model_exogenous" + assert data["disclaimer"], "every comparison must carry a non-empty disclaimer" + assert len(data["points"]) == 14 + # A price cut moves the re-forecast — the deltas are model-driven, not + # a fixed multiplier, and the modelled price response lifts demand. + assert data["units_delta"] > 0.0 + + async def test_regression_deeper_cut_moves_at_least_as_much( + self, client: AsyncClient, trained_regression_model: str + ) -> None: + """A -40% cut moves demand at least as much as -15% — never less. + + ``>=`` (not ``>``) is deliberate: HistGBR bins features by TRAINING + quantiles, so a scenario price below the trained range saturates at + the lowest bin instead of extrapolating linearly (issue #237). + """ + deltas: dict[float, float] = {} + for change_pct in (-0.15, -0.40): + response = await client.post( + "/scenarios/simulate", + json={ + "run_id": trained_regression_model, + "horizon": 14, + "assumptions": { + "price": { + "change_pct": change_pct, + "start_date": "2026-07-01", + "end_date": "2026-07-14", + }, + }, + }, + ) + assert response.status_code == 200 + data = response.json() + assert data["method"] == "model_exogenous" + deltas[change_pct] = data["units_delta"] + + assert deltas[-0.15] > 0.0 + assert deltas[-0.40] >= deltas[-0.15] async def test_lightgbm_baseline_returns_model_exogenous( self, client: AsyncClient, trained_lightgbm_model: str @@ -187,8 +249,6 @@ async def test_lightgbm_baseline_returns_model_exogenous( assert data["method"] == "model_exogenous" assert data["disclaimer"], "every comparison must carry a non-empty disclaimer" assert len(data["points"]) == 14 - assert data["disclaimer"], "every comparison must carry a non-empty disclaimer" - assert len(data["points"]) == 14 # A price cut moves the re-forecast — the deltas are 
model-driven, not # a fixed multiplier, and the modelled price response lifts demand. assert data["units_delta"] > 0.0 @@ -291,6 +351,176 @@ async def test_model_exogenous_plan_persists( assert fetched.json()["comparison"]["method"] == "model_exogenous" +@pytest.fixture +async def seeded_price_elastic_grain( + db_session: AsyncSession, +) -> AsyncGenerator[tuple[int, int], None]: + """Seed one (store, product) grain whose sales carry real price variance. + + Inserts the dimension rows, the calendar days the grain needs (skipping + any that already exist — CI shares the Postgres service), and sales rows + generated by ``SalesDailyGenerator`` WITH a price lookup: three multi-week + -20% price windows, so a trained regression model sees ``price_factor`` + spanning {0.8, 1.0} and can learn the demand response (issue #237). + Everything inserted here is removed on teardown. + """ + days = [ + _SEEDED_START + timedelta(days=offset) + for offset in range((_SEEDED_END - _SEEDED_START).days + 1) + ] + + db_session.add(Store(id=_SEEDED_STORE_ID, code=f"S{_SEEDED_STORE_ID}", name="Seeded E2E Store")) + db_session.add( + Product( + id=_SEEDED_PRODUCT_ID, + sku=f"SKU{_SEEDED_PRODUCT_ID}", + name="Seeded E2E Product", + base_price=_SEEDED_BASE_PRICE, + ) + ) + + existing_dates = set( + ( + await db_session.execute( + select(Calendar.date).where( + Calendar.date >= _SEEDED_START, Calendar.date <= _SEEDED_END + ) + ) + ) + .scalars() + .all() + ) + inserted_dates = [day for day in days if day not in existing_dates] + for day in inserted_dates: + db_session.add( + Calendar( + date=day, + day_of_week=day.weekday(), + month=day.month, + quarter=(day.month - 1) // 3 + 1, + year=day.year, + ) + ) + + # Three -20% windows of varying, non-week-aligned lengths so price_factor + # — not a quantity lag — is the cleanest predictor of the demand lift. 
+ cut_price = Decimal("8.00") + price_lookup = build_price_lookup( + [ + { + "product_id": _SEEDED_PRODUCT_ID, + "store_id": None, + "price": cut_price, + "valid_from": window_start, + "valid_to": window_end, + } + for window_start, window_end in ( + (date(2026, 3, 1), date(2026, 3, 18)), + (date(2026, 4, 10), date(2026, 4, 30)), + (date(2026, 5, 20), date(2026, 6, 5)), + ) + ] + ) + generator = SalesDailyGenerator( + random.Random(42), + TimeSeriesConfig( + base_demand=100, + trend="none", + weekly_seasonality=[1.0] * 7, + monthly_seasonality={}, + noise_sigma=0.05, + anomaly_probability=0.0, + ), + # A strong elasticity so the learnable signal dwarfs the 5% noise: + # -20% price x -2.0 elasticity = +40% demand inside each window. + RetailPatternConfig(price_elasticity=-2.0), + SparsityConfig(), + [], + ) + sales_rows = generator.generate( + [_SEEDED_STORE_ID], + [(_SEEDED_PRODUCT_ID, _SEEDED_BASE_PRICE)], + days, + {}, + {}, + price_lookup=price_lookup, + ) + for row in sales_rows: + db_session.add(SalesDaily(**row)) + await db_session.commit() + + try: + yield (_SEEDED_STORE_ID, _SEEDED_PRODUCT_ID) + finally: + await db_session.execute( + delete(SalesDaily).where( + SalesDaily.store_id == _SEEDED_STORE_ID, + SalesDaily.product_id == _SEEDED_PRODUCT_ID, + ) + ) + if inserted_dates: + await db_session.execute(delete(Calendar).where(Calendar.date.in_(inserted_dates))) + await db_session.execute(delete(Product).where(Product.id == _SEEDED_PRODUCT_ID)) + await db_session.execute(delete(Store).where(Store.id == _SEEDED_STORE_ID)) + await db_session.commit() + + +@pytest.mark.integration +@pytest.mark.asyncio +class TestModelExogenousOnSeededData: + """Issue #237's repro, automated: seed → train regression → simulate. + + This is the end-to-end proof the fix targets. On pre-fix dev the seeder + emitted a constant ``unit_price``, the trained model learned zero price + elasticity, and any price assumption returned an exact 0.0 delta. 
With + the price/sales coupling on, a freshly seeded grain trains a model whose + re-forecast genuinely responds to a price cut. + """ + + async def test_seeded_train_simulate_price_cut_moves_demand( + self, client: AsyncClient, seeded_price_elastic_grain: tuple[int, int] + ) -> None: + """A price cut on a model trained on coupled seeded data → non-zero delta.""" + store_id, product_id = seeded_price_elastic_grain + + train = await client.post( + "/forecasting/train", + json={ + "store_id": store_id, + "product_id": product_id, + "train_start_date": _SEEDED_START.isoformat(), + "train_end_date": _SEEDED_END.isoformat(), + "config": {"model_type": "regression"}, + }, + ) + assert train.status_code == 200, train.text + model_path = Path(train.json()["model_path"]) + run_id = model_path.stem.removeprefix("model_") + + try: + response = await client.post( + "/scenarios/simulate", + json={ + "run_id": run_id, + "horizon": 14, + "assumptions": { + "price": { + "change_pct": -0.20, + "start_date": "2026-07-01", + "end_date": "2026-07-14", + }, + }, + }, + ) + assert response.status_code == 200, response.text + data = response.json() + + assert data["method"] == "model_exogenous" + assert data["units_delta"] != 0.0 + finally: + model_path.unlink(missing_ok=True) + + @pytest.mark.integration @pytest.mark.asyncio class TestScenarioPlanModel: From 82300eb51660f3c32e58f2f507825afbe2556459 Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Fri, 12 Jun 2026 11:32:21 +0200 Subject: [PATCH 24/44] docs(repo): track reliability E5 prp and document seeder price coupling (#237) --- ...bility-E5-model-exogenous-price-inertia.md | 609 ++++++++++++++++++ docs/DATA-SEEDER.md | 20 + 2 files changed, 629 insertions(+) create mode 100644 PRPs/PRP-reliability-E5-model-exogenous-price-inertia.md diff --git a/PRPs/PRP-reliability-E5-model-exogenous-price-inertia.md b/PRPs/PRP-reliability-E5-model-exogenous-price-inertia.md new file mode 100644 index 00000000..8aaf2956 --- 
/dev/null +++ b/PRPs/PRP-reliability-E5-model-exogenous-price-inertia.md @@ -0,0 +1,609 @@ +name: "PRP reliability-E5 — model_exogenous price inertia: discriminator tests + seeder price coupling" +description: | + Issue #237 (epic E5 of umbrella #380, milestone reliability-hardening). + Investigate-first epic: the `model_exogenous` scenario re-forecast returns an exact + 0.0 units/revenue delta for any price assumption. This PRP lands the discriminating + reproduction tests (wiring-bug vs zero-learned-elasticity), records the verdict the + research already established with runtime evidence, and ships the root-cause fix: + the seeder never couples its generated price changes into `sales_daily`, so every + trained regression model sees a constant `price_factor ≡ 1.0` and cannot learn a + price response. + +--- + +## Goal + +Make `method="model_exogenous"` scenario simulations genuinely respond to price +assumptions on seeded data, and pin the discriminating evidence as permanent tests: + +1. **Discriminator tests** (the issue's primary deliverable) that prove, in CI, which + hypothesis from #237 holds: (a) the future-frame/predict wiring drops the price + assumption, or (b) the trained model genuinely learned zero price elasticity. +2. **The fix per the verdict** — the verdict is (b), with a precise mechanical cause: + `app/shared/seeder/generators/facts.py` hardcodes `current_price=None` (line 525) + and `unit_price = base_price` (line 544), so seeded `sales_daily.unit_price` is + constant per product even though `price_history` (and Phase-2 markdowns) say the + price moved. Training-time `price_factor = unit_price / median(unit_price) ≡ 1.0` + is a constant column; `HistGradientBoostingRegressor` never splits on a constant, + so the re-forecast is invariant to any future `price_factor` — delta exactly 0.0 + (runtime-verified below). 
The fix threads the already-generated price series into + `SalesDailyGenerator` so `unit_price` varies truthfully AND demand responds via the + existing (currently dormant) `RetailPatternConfig.price_elasticity` machinery. +3. **End-to-end proof**: an integration test that seeds → trains a regression model → + simulates a price cut → asserts a non-zero delta. This is the test that fails on + `dev` today (with coupling off) and passes with the fix. + +**End state**: a freshly seeded DB + `POST /forecasting/train` (`regression`) + +`POST /scenarios/simulate` with a price assumption produces `units_delta != 0.0`, and +the What-If Planner's model_exogenous path is demonstrably model-driven. + +## Why + +- `method="model_exogenous"` is the scenarios slice's headline capability (PRP-27): a + *model-driven* what-if, distinct from the deterministic heuristic multiplier. While + it returns 0.0 for every price assumption it is indistinguishable from a no-op and + actively misleading in the Planner UI (#229/#236 made the path reachable; #237 is + the follow-up they explicitly deferred). +- The dormant seeder elasticity machinery (`RetailPatternConfig.price_elasticity = -0.5`, + `_compute_demand` lines 367-370) was designed for exactly this and is one wiring + step away from working. +- Data consistency: today `price_history` says a product's price changed while every + `sales_daily` row for the same days carries the unchanged `base_price` — the two + fact tables contradict each other. The fix removes a falsehood from "The Forge". 
+ +## What + +### Investigation verdict (pre-established — encode as tests, don't re-litigate) + +Research for this PRP traced the full chain and verified each link at runtime: + +| # | Link | Evidence | Verdict | +|---|------|----------|---------| +| 1 | Scenario price assumption → future frame | `build_exogenous_columns` emits `price_factor = 1 + change_pct` inside the window (`app/features/scenarios/feature_frame.py:155-158`); unit test `test_exogenous_price_window` (`tests/test_feature_frame.py:133`) passes | ✅ wired | +| 2 | Two frames → two predicts | `_simulate_model_exogenous` builds a scenario frame and an assumptions-stripped baseline frame, feeds both to `bundle.model.predict(h, X)` (`app/features/scenarios/service.py:250-288`) | ✅ wired | +| 3 | `RegressionForecaster.predict` honors X | passes X straight to `HistGradientBoostingRegressor.predict` (`app/features/forecasting/models.py:1129`) | ✅ wired | +| 4 | E2E with a price-sensitive bundle | integration test asserts `units_delta > 0.0` for a LightGBM bundle trained on elastic synthetic data (`app/features/scenarios/tests/test_routes_integration.py:194`) — but the **regression**-bundle twin (line 150) asserts only `method`, not the delta (gap to close) | ✅ wired (gap: assertion missing for `regression`) | +| 5 | Seeded training data | `SalesDailyGenerator` hardcodes `current_price=None` ("Simplified: use base price", `facts.py:525`) and `unit_price = base_price` (`facts.py:544`); `PriceHistoryGenerator` output (`facts.py:562-653`) + markdown price records (`core.py:403`) never reach sales rows | ❌ **root cause** | +| 6 | Constant column → exact 0.0 | runtime-verified (Known Gotchas, VERIFIED CLAIM #1): HistGBR trained with a constant `price_factor` produces byte-identical predictions for price_factor 1.0 / 0.85 / 0.60 | ❌ consequence | + +**Verdict: hypothesis 2 from #237 — zero learned elasticity — with a deterministic +mechanical cause in the seeder, NOT a scenarios-slice wiring bug.** The 0.0 is 
exact +(not merely small) because a tree ensemble never splits on a constant training column. + +### Behavior change + +- `app/shared/seeder/`: `SalesDailyGenerator.generate` accepts an optional per-day + price lookup built from the already-generated `price_history` records (including + Phase-2 markdown price drops). When supplied, each sales row's `unit_price` is the + resolved day price and `_compute_demand` receives it as `current_price`, activating + the existing elasticity multiplier (`demand *= 1 + price_elasticity * change_pct`). + When omitted (`None`, the default) the legacy path is byte-identical — the seeder's + established disabled-path convention. +- `RetailPatternConfig` gains `price_sales_coupling: bool = True` — the orchestrator + (`core.py`) threads the lookup only when True. Default True so every scenario + (`demo_minimal`, `showcase_rich`, …) produces learnable price signal out of the box. +- Scenarios slice: **no production-code change** (wiring is sound). Tests only. +- Forecasting slice: **no production-code change**. Tests only. + +### Success Criteria + +- [ ] Frame discriminator test: scenario frame vs assumptions-stripped baseline frame + differ ONLY in the `price_factor` column, exactly inside the assumption window + (pure unit test, no DB). +- [ ] Inert-mechanism test: a `RegressionForecaster` fit on a constant-`price_factor` + matrix predicts identically for any future `price_factor` (delta exactly 0.0) — + pins hypothesis-2 mechanics so the verdict is executable documentation. +- [ ] The regression-bundle integration test asserts `units_delta > 0.0` (closing the + gap vs its LightGBM twin) and that a −40% cut moves demand at least as much as + −15%. +- [ ] Seeder: with coupling on, `sales_daily.unit_price` agrees with the active + `price_history` window per (store, product, date); with the lookup omitted the + output is byte-identical to today (regression-pinned). 
+- [ ] E2E integration test: seed (coupling on) → train `regression` → simulate a + price cut → `units_delta != 0.0` and `method == "model_exogenous"`. +- [ ] All five validation gates green; seeder + scenarios + forecasting suites pass; + `make demo` still goes green. +- [ ] Issue #237 closed with the verdict + evidence; `docs/DATA-SEEDER.md` documents + the coupling flag. + +## All Needed Context + +### Documentation & References + +```yaml +# ── The symptom and its scope ──────────────────────────────────────────────── +- issue: "#237 — gh issue view 237" + why: Exact repro (store 332/product 456, run dc6dc4aaea18, -0.15 and -0.40 both → 0.0), + the two hypotheses, and the investigate-first mandate. Epic E5 of umbrella #380. + +- file: app/features/scenarios/service.py + why: "_simulate_model_exogenous (lines 184-350) — the two-frame compare. READ-ONLY: + scenario_frame (line 250) carries request.assumptions; baseline_frame (line 265) + carries ScenarioAssumptions(); both go to bundle.model.predict(h, X) (283-288). + Wiring is sound — do NOT change this file." + +- file: app/features/scenarios/feature_frame.py + why: "build_exogenous_columns (114-181): price_factor = 1.0 + change_pct inside the + inclusive window (155-158), 1.0 outside. assemble_future_frame (184-232) is the + pure assembler the frame discriminator test (T1) drives — no DB needed." + +- file: app/features/forecasting/models.py + why: "RegressionForecaster (1020-1159): requires_features=True, fit validates X + (1084-1091), HistGBR estimator (1092-1099), predict passes X straight through + (1123-1133). T2 (inert-mechanism test) fits this class directly." + +# ── Root cause: the seeder's price/sales decoupling ───────────────────────── +- file: app/shared/seeder/generators/facts.py + why: "THE FIX SITE. _compute_demand (303+) already applies price elasticity when + current_price is not None (367-370: demand *= 1 + price_elasticity*change_pct). 
+ The sales loop nulls it: current_price=None at 525 ('Simplified: use base + price') and unit_price = base_price at 544. PriceHistoryGenerator (562-653) + emits validity-window records: mostly chain-wide (store_id=None, line 604), + ~10% store-specific (607-608), final open-ended record valid_to=None (643-651)." + +- file: app/shared/seeder/core.py + why: "Orchestration order proves feasibility: price_records materialized at 335-341, + markdown price drops merged into the same list at 403, sales_gen.generate called + AFTER at 449-456 — the lookup can be built in between. Mirror the existing + optional-input convention: weather_lookup_for_sales (429-433) is threaded only + when its config flag is on, None otherwise." + +- file: app/shared/seeder/config.py + why: "RetailPatternConfig.price_elasticity: float = -0.5 (line 93, documented '% demand + change per % price change'). Add price_sales_coupling: bool = True beside it. + Dataclass conventions: field + docstring Args entry." + +# ── How training consumes prices (read-only — explains the constant column) ── +- file: app/features/forecasting/service.py + why: "Regression training reads sales_daily.unit_price (line 636), baseline_price = + median of positive prices (659-660), so constant unit_price → price_factor ≡ 1.0. + Also shows promo_dates/holiday_dates sourcing — training DOES see promo flags, + which is why promo_active (unlike price_factor) has training variance today." + +- file: app/shared/feature_frames/rows.py + why: "build_historical_feature_rows line 113: row.append(prices[index] / baseline_price) + — the price_factor cell. The single source of truth shared by training and the + scenarios future frame (canonical_feature_columns)." 
+ +# ── Test patterns to mirror (extend, never weaken) ─────────────────────────── +- file: app/features/scenarios/tests/conftest.py + why: "trained_regression_model fixture (line 240): fits RegressionForecaster on + synthetic X = rng.normal(...), target = 40 - 20*price_factor + noise(0.5), + full PRP-27 metadata (feature_columns/history_tail/launch_date), saves bundle + under settings.forecast_model_artifacts_dir, yields run_id, unlinks on teardown. + TEST_TRAIN_END_DATE='2026-06-30' (line 49) — horizon days are 2026-07-01..14." + +- file: app/features/scenarios/tests/test_routes_integration.py + why: "_PRICE_ASSUMPTION (18-20): change_pct=-0.15, window 2026-07-01..14 — fully + covers the 14-day horizon. TestSimulateModelExogenous (147): the regression test + (150-165) asserts ONLY method=='model_exogenous'; the LightGBM twin asserts + units_delta > 0.0 at line 194. T3 adds the missing assertions to the regression + test (and dedups the copy-pasted lines 188-191 in the LightGBM twin if touched)." + +- file: app/features/scenarios/tests/test_feature_frame.py + why: "test_exogenous_price_window (133) — the existing price-window unit test; T1 + (frame discriminator) extends this file with a scenario-vs-baseline matrix diff." + +- file: app/features/scenarios/tests/test_future_frame_leakage.py + why: "LOAD-BEARING leakage spec (with app/shared/feature_frames/tests/test_leakage.py). + NEVER weaken. The new tests must not touch these files." + +- file: app/shared/seeder/tests/test_phase1_regression.py + why: "Byte-stability conventions: no_kwargs == explicit-defaults (58-72) and + disabled-Phase-1 consumes zero rng (75-93). The price lookup must use NO rng + draws and default to None so both invariants keep passing unchanged." 
+ +- file: app/shared/seeder/tests/test_generators.py + why: "SalesDailyGenerator unit-test patterns (deterministic random.Random(seed), + direct .generate calls, dict-row assertions; total_amount == unit_price * + quantity at 229-243 — still holds after the fix since both are recomputed)." + +- docfile: PRPs/ai_docs/exogenous-regressor-forecasting.md + why: "PRP-27's research doc: the model contract, the future-frame leakage rule, and + why HistGBR was chosen. Background for the discriminator tests' design." + +# ── External references ─────────────────────────────────────────────────────── +- url: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.HistGradientBoostingRegressor.html + why: "Binning behavior — features are discretized into max_bins=255 training-quantile + bins; values outside the training range fall into the edge bins. Explains both + the exact-0.0 symptom (constant column → no splits) and the saturation gotcha." + +- url: https://github.com/w7-mgfcode/ForecastLabAI/pull/236 + why: "PR that made the model_exogenous path reachable from the Planner (#229/#228) + and explicitly deferred this 0.0-delta investigation to #237." 
+``` + +### Current Codebase tree (relevant subset) + +```bash +app/ +├── features/ +│ ├── forecasting/ +│ │ ├── models.py # RegressionForecaster (1020-1159) +│ │ ├── service.py # training reads sales_daily.unit_price (636) +│ │ └── tests/test_models.py # _synthetic_data() builder pattern +│ └── scenarios/ +│ ├── feature_frame.py # build_exogenous_columns / assemble_future_frame +│ ├── service.py # _simulate_model_exogenous (READ-ONLY here) +│ └── tests/ +│ ├── conftest.py # trained_regression_model (240), price-sensitive +│ ├── test_feature_frame.py # T1 lands here +│ └── test_routes_integration.py # T3 lands here (150-165 gap) +└── shared/ + └── seeder/ + ├── config.py # RetailPatternConfig (price_elasticity at 93) + ├── core.py # orchestration (335 price → 449 sales) + ├── generators/facts.py # SalesDailyGenerator + PriceHistoryGenerator + └── tests/ # test_generators.py, test_phase1_regression.py … +``` + +### Desired Codebase tree + +```bash +app/ +├── features/ +│ ├── forecasting/tests/ +│ │ └── test_models.py # + T2: constant-column inertia test +│ └── scenarios/tests/ +│ ├── test_feature_frame.py # + T1: scenario-vs-baseline frame diff +│ └── test_routes_integration.py # + T3: regression delta assertions +│ # + T6: seeded-E2E repro (new class) +└── shared/seeder/ + ├── config.py # + RetailPatternConfig.price_sales_coupling + ├── generators/facts.py # + build_price_lookup() resolver; + │ # SalesDailyGenerator.generate( + │ # ..., price_lookup=None) + ├── core.py # + build & thread the lookup (gated) + └── tests/ + ├── test_generators.py # + T4: resolver + coupling unit tests + └── test_phase1_regression.py # unchanged — must keep passing +docs/DATA-SEEDER.md # + coupling flag paragraph +PRPs/PRP-reliability-E5-model-exogenous-price-inertia.md # this file +``` + +### Known Gotchas & Library Quirks + +```python +# ── VERIFIED LIBRARY CLAIM #1: constant training column → EXACT 0.0 delta (sklearn 1.6+) +# uv run python -c " +# import numpy as np +# from 
sklearn.ensemble import HistGradientBoostingRegressor +# rng = np.random.default_rng(42); n = 300 +# X = rng.normal(10, 2, size=(n, 5)); X[:, 4] = 1.0 # price col CONSTANT +# y = 40.0 + 0.5 * X[:, 0] + rng.normal(0, 0.5, n) +# m = HistGradientBoostingRegressor(max_iter=200, learning_rate=0.05, +# max_depth=6, random_state=42).fit(X, y) +# xt = X[:5].copy(); p0 = m.predict(xt) +# xt[:, 4] = 0.85; p15 = m.predict(xt); xt[:, 4] = 0.60; p40 = m.predict(xt) +# print(np.abs(p15-p0).max(), np.abs(p40-p0).max())" +# # → 0.0 0.0 (verified 2026-06-12, sklearn from uv.lock) +# A tree ensemble NEVER splits on a constant training column, so predictions are +# invariant to that column at predict time — the delta is EXACTLY 0.0, matching the +# issue symptom byte-for-byte. This is the mechanism T2 pins. Re-verify on sklearn bump. + +# ── VERIFIED LIBRARY CLAIM #2: out-of-training-range price clips to the edge bin ────── +# (same script, but X[:, 4] = rng.uniform(0.7, 1.1, n); y = 40 - 20*X[:, 4] + noise) +# # → delta(-15%) ≈ 2.88, delta(-40%, i.e. 0.60 < train-min 0.7) == delta(at 0.70) +# HistGBR bins by TRAINING quantiles; a scenario price_factor below the training range +# saturates at the lowest bin. After the seeder fix, training price_factor spans roughly +# [1 - max_price_change_pct, 1 + max_price_change_pct] compounded (~±20%/step) — deltas +# for cuts deeper than the observed range will saturate, NOT extrapolate linearly. +# T3's monotonicity assertion must therefore be >= (not >) between -40% and -15%. + +# ── GOTCHA: byte-stability is the seeder's contract ─────────────────────────────────── +# app/shared/seeder/tests/test_phase1_regression.py pins (a) no-kwargs == explicit +# defaults and (b) disabled features consume ZERO rng draws. The price resolver must be +# PURE (no rng) and the new generate() parameter must default to None with the legacy +# branch byte-identical. 
core.py threads it ONLY when config.retail.price_sales_coupling +# is True — mirroring the weather_lookup_for_sales gating pattern (core.py:429-433). + +# ── GOTCHA: Decimal vs float ────────────────────────────────────────────────────────── +# base_price / PriceHistory.price are Decimal (Numeric(10,2)); _compute_demand converts +# at line 369 (float((current_price - base_price) / base_price)). The resolver must +# return Decimal (quantized .01 — PriceHistoryGenerator already quantizes at 635-637) +# and the sales row's unit_price must stay Decimal (total_amount = unit_price * quantity +# at 545 multiplies Decimal * int). + +# ── GOTCHA: price_history grain & precedence ────────────────────────────────────────── +# Records are chain-wide (store_id=None) for ~90% of products, store-specific for ~10% +# (facts.py:604-608); markdown price records (Phase 2) are appended to the SAME list +# (core.py:403) and may OVERLAP a chain-wide window. Resolver precedence (deterministic): +# store-specific match > chain-wide match; within a scope, latest valid_from wins +# (markdowns fire later than the base window they cut). valid_to=None == open-ended. +# Days before the first window (shouldn't happen — generator starts at start_date) fall +# back to base_price. + +# ── GOTCHA: tests that already isolate price effects ────────────────────────────────── +# Several seeder suites pin OTHER effects with price_elasticity=0.0 (e.g. +# test_phase1_sales_effects.py:44, test_phase2_lifecycle_sales_integration.py:50). +# With coupling ON their demand is unchanged (elasticity 0 → multiplier 1.0) but +# unit_price in emitted rows now varies — any assertion pinned to base_price-derived +# totals must be checked. Tests calling SalesDailyGenerator.generate DIRECTLY (without +# the new kwarg) are byte-identical by construction. 
+ +# ── GOTCHA: demo tuning ─────────────────────────────────────────────────────────────── +# demo_minimal is tuned to avoid the SPARSE NaN-WAPE trap (RUNBOOKS § make demo). +# Elasticity-driven demand shifts are bounded (±20% price × -0.5 → ≤ ±10% demand), so +# the tuning holds, but Level 4 MUST run `make demo` to confirm green. + +# ── GOTCHA: scenarios slice strictness ──────────────────────────────────────────────── +# ScenarioAssumptions date fields carry Field(strict=False) (JSON-path policy, +# docs/_base/SECURITY.md). New tests posting JSON bodies follow the existing +# _PRICE_ASSUMPTION dict shape — ISO strings, change_pct as float. + +# ── GOTCHA: line endings ────────────────────────────────────────────────────────────── +# The repo has mixed CRLF/LF files. Check `git diff --stat` before committing — a +# whole-file diff on facts.py/core.py means your editor rewrote line endings. +``` + +## Implementation Blueprint + +### Data models and structure + +```python +# app/shared/seeder/config.py — RetailPatternConfig (dataclass, beside price_elasticity) +price_sales_coupling: bool = True +# Args docstring: "price_sales_coupling: When True, sales_daily.unit_price follows the +# generated price_history windows (incl. markdowns) and demand responds via +# price_elasticity. When False, legacy behavior: unit_price = base_price always." 
+ +# app/shared/seeder/generators/facts.py — pure resolver (NO rng), module level +PriceRecord = dict[str, date | int | Decimal | None] # the price_history row shape + +def build_price_lookup( + price_records: list[PriceRecord], +) -> dict[tuple[int, int | None], list[tuple[date, date | None, Decimal]]]: + """Index price records by (product_id, store_id), windows sorted by valid_from.""" + +def resolve_price( + lookup: dict[tuple[int, int | None], list[tuple[date, date | None, Decimal]]], + product_id: int, + store_id: int, + day: date, + base_price: Decimal, +) -> Decimal: + """Store-specific scope first, then chain-wide (None); within a scope the + latest-valid_from window covering `day` wins; fall back to base_price.""" +``` + +### List of tasks (dependency order) + +```yaml +Task 1 — T1 frame discriminator (scenarios, pure unit test): + MODIFY app/features/scenarios/tests/test_feature_frame.py: + - ADD test_scenario_vs_baseline_frame_differs_only_in_price_factor: + build two frames via assemble_future_frame (same dates/columns/history_tail/ + holiday_dates/launch_date): one with a PriceAssumption(change_pct=-0.15, + window covering days 3..7 of a 14-day horizon), one with ScenarioAssumptions(). + Assert: for the price_factor column index, cells differ EXACTLY on days 3..7 + (1.0 vs 0.85) and match outside; for EVERY other column index, cells are + identical (NaN-aware compare: math.isnan(a) and math.isnan(b) counts as equal). + - MIRROR the import/builder style of test_assemble_future_frame_shape_and_order (186). + WHY: executable proof of hypothesis-1 falsification — the assumption reaches X_future. 
+ +Task 2 — T2 inert-mechanism test (forecasting, pure unit test): + MODIFY app/features/forecasting/tests/test_models.py: + - ADD test_regression_constant_price_column_is_inert_to_future_price (class + TestRegressionForecaster or module style — match the file's existing layout): + fit RegressionForecaster on synthetic X whose price column is constant 1.0 + (target driven by another column + noise, seeded rng); predict twice on the + same X_future with price col 1.0 vs 0.60; assert np.array_equal — delta is + EXACTLY zero. Docstring MUST state this pins issue #237's verdict: a model + trained on constant price_factor (the pre-fix seeded data) cannot respond to + price assumptions — the 0.0 delta is zero-learned-elasticity, not lost wiring. + - ADD the converse: fit on price-elastic synthetic data (price col uniform 0.7-1.1, + target = 40 - 20*price + noise); assert prediction at price 0.85 differs from 1.0. + WHY: the discriminating pair — together with Task 1 it IS the reproduction test + #237 asked for, and it documents the verdict in executable form. + +Task 3 — T3 close the regression-delta assertion gap (scenarios, integration): + MODIFY app/features/scenarios/tests/test_routes_integration.py: + - In test_regression_baseline_returns_model_exogenous (150-165): after the existing + method assertion ADD: points length == 14; non-empty disclaimer; + data["units_delta"] > 0.0 (the conftest bundle at conftest.py:240 is trained + price-sensitive: target = 40 - 20*price_factor — mirroring the LightGBM twin's + line 194). + - ADD test_regression_deeper_cut_moves_at_least_as_much: simulate change_pct=-0.15 + and -0.40 against the same trained_regression_model; assert + delta(-0.40) >= delta(-0.15) > 0.0 # >= not >: VERIFIED CLAIM #2 (bin clipping) + WHY: E2E wiring proof for the EXACT model type from the issue's repro. 
+ +Task 4 — the fix: seeder price coupling: + MODIFY app/shared/seeder/config.py: + - ADD price_sales_coupling: bool = True to RetailPatternConfig (+ Args docstring line). + MODIFY app/shared/seeder/generators/facts.py: + - ADD build_price_lookup / resolve_price (pure, NO rng — see Data models). + - MODIFY SalesDailyGenerator.generate: new keyword-only param + price_lookup: dict[...] | None = None. In the per-day loop (511-544): + resolved = resolve_price(price_lookup, product_id, store_id, current_date, + base_price) if price_lookup is not None else None + pass current_price=resolved # replaces the hardcoded None at 525 + unit_price = resolved if resolved is not None else base_price # replaces 544 + PRESERVE everything else byte-for-byte; the None path must not change any rng + draw or emitted value (test_phase1_regression.py is the watchdog). + MODIFY app/shared/seeder/core.py: + - AFTER the markdown merge (403) and BEFORE sales_gen.generate (449): build the + lookup from price_records when self.config.retail.price_sales_coupling, else None + (mirror weather_lookup_for_sales gating, 429-433); pass price_lookup=... to + sales_gen.generate. + WHY: root-cause fix — truthful unit_price + active elasticity → learnable price signal. + +Task 5 — T4/T5 seeder tests: + MODIFY app/shared/seeder/tests/test_generators.py: + - ADD resolver unit tests: store-specific beats chain-wide; latest valid_from wins + on overlap; valid_to=None open-ended; uncovered day → base_price. + - ADD coupling tests on SalesDailyGenerator.generate: + (a) with a lookup whose price cuts day D by 20% and price_elasticity=-0.5: + row(D).unit_price == cut price AND quantity uplift vs a no-lookup run + (seeded rng twins — same seed, two generators); + (b) with price_lookup=None: output equals today's byte-for-byte (freeze a + small golden run in-test: same rng seed, compare full row lists). 
+ - VERIFY total_amount == unit_price * quantity still holds (existing 229-243 + pattern covers it — extend if the row path changed). + RUN the full seeder suite; FIX any test that pinned base_price-derived totals with + coupling implicitly on (prefer setting price_sales_coupling=False in tests that + isolate OTHER effects — matching their existing price_elasticity=0.0 convention). + +Task 6 — T6 seeded end-to-end repro (integration): + MODIFY app/features/scenarios/tests/test_routes_integration.py (new class + TestModelExogenousOnSeededData, @pytest.mark.integration): + - Seed a tiny single-grain dataset DIRECTLY (insert store/product/calendar + + generate sales via SalesDailyGenerator with a price lookup containing at least + one multi-week price window — reuse the slice's DB fixtures; do NOT shell out to + the seeder API), >= _MIN_REGRESSION_TRAIN_ROWS days. + - POST /forecasting/train {model_type: regression} → run_id; + POST /scenarios/simulate with a price assumption covering the horizon. + - ASSERT method == "model_exogenous" and units_delta != 0.0. + - Idempotency: unique store/product ids (mirror TEST_STORE_ID=990001 convention) + + teardown deletes — CI shares the Postgres service. + WHY: the test that fails on pre-fix dev (constant prices) and passes post-fix — + the issue's repro, automated. + +Task 7 — docs + verdict: + MODIFY docs/DATA-SEEDER.md: paragraph on price_sales_coupling (what it couples, the + elasticity interaction, why default True, how to disable). + POST verdict comment on issue #237 (the evidence table from ## What) when the PR + merges; the PRP file itself is tracked via the standard docs(repo) commit. 
+``` + +### Per-task pseudocode (critical details only) + +```python +# Task 4 — resolver core (facts.py; pure, deterministic, no rng) +def resolve_price(lookup, product_id, store_id, day, base_price): + for scope in ((product_id, store_id), (product_id, None)): # specific → chain + windows = lookup.get(scope) + if not windows: + continue + # windows sorted by valid_from ASC at build time; scan from the latest so the + # most recent window covering `day` wins (markdown overlaps cut the base window) + for valid_from, valid_to, price in reversed(windows): + if valid_from <= day and (valid_to is None or day <= valid_to): + return price + return base_price + +# Task 4 — the two-line surgical change in the sales loop (facts.py:521-544) +resolved = ( + resolve_price(price_lookup, product_id, store_id, current_date, base_price) + if price_lookup is not None else None +) +quantity = self._compute_demand(..., current_price=resolved, ...) # was: None +... +unit_price = resolved if resolved is not None else base_price # was: base_price + +# Task 6 — the seeded-data signal (make the test data honestly elastic) +# Build sales WITH a lookup so training sees varying price_factor; the price window +# must be long enough (e.g. 3 weeks inside ~120 days) that HistGBR has bins on both +# sides of 1.0. Keep noise low; rely on RetailPatternConfig(price_elasticity=-0.5) default. 
+``` + +### Integration Points + +```yaml +DATABASE: + - migration: NONE — no schema change (sales_daily.unit_price already exists; only + its generated VALUES change) +CONFIG: + - app/shared/seeder/config.py: RetailPatternConfig.price_sales_coupling: bool = True + (dataclass field — not an env var; seeder config is request/preset-driven) +ROUTES: + - NONE — no API surface change +FRONTEND: + - NONE — the Planner already renders non-zero deltas (heuristic path proves it) +CI: + - no workflow change; integration tests ride the existing Postgres service job +``` + +## Validation Loop + +### Level 1: Syntax & Style + +```bash +uv run ruff check . && uv run ruff format --check . +uv run mypy app/ && uv run pyright app/ +# Expected: clean. The resolver's dict-tuple types must be fully annotated (--strict ×2). +``` + +### Level 2: Unit (no DB) + +```bash +# The discriminator pair + frame purity: +uv run pytest -v app/features/forecasting/tests/test_models.py -k "constant_price or elastic" +uv run pytest -v app/features/scenarios/tests/test_feature_frame.py +# Seeder: new resolver/coupling tests AND the untouched byte-stability watchdogs: +uv run pytest -v app/shared/seeder/tests/test_generators.py app/shared/seeder/tests/test_phase1_regression.py +# Then the full unit gate: +uv run pytest -v -m "not integration" +``` + +### Level 3: Integration (real Postgres) + +```bash +docker compose up -d && uv run alembic upgrade head +uv run pytest -v -m integration app/features/scenarios/tests/test_routes_integration.py +uv run pytest -v -m integration app/shared/seeder/tests/ +# CAVEAT (memory: integration-suite-shared-state-pollution): run scenario/seeder suites +# against a fresh DB (docker compose down -v first) before trusting failures. 
+``` + +### Level 4: End-to-end dogfood (the issue's original repro, inverted) + +```bash +docker compose down -v && docker compose up -d && uv run alembic upgrade head +uv run uvicorn app.main:app --port 8123 & # check no stale uvicorn holds :8123 first +uv run python scripts/seed_random.py --full-new --seed 42 --confirm +# discover a real (store_id, product_id) with sales (seeder does NOT reset sequences): +# GET /dimensions/stores, /dimensions/products +curl -s -X POST localhost:8123/forecasting/train -H 'content-type: application/json' \ + -d '{"store_id":<S>,"product_id":<P>,"model_type":"regression", ...}' # → run_id +curl -s -X POST localhost:8123/scenarios/simulate -H 'content-type: application/json' \ + -d '{"run_id":"<run_id>","horizon":14,"assumptions":{"price":{"change_pct":-0.15, + "start_date":"<T+1>","end_date":"<T+14>"}}}' +# Expected: method=model_exogenous, units_delta != 0.0 (sign per learned elasticity). +make demo # must stay green (demo_minimal tuning intact) +``` + +## Final validation Checklist + +- [ ] All five gates green: ruff, ruff format, mypy --strict, pyright --strict, pytest unit +- [ ] Integration suites green on a fresh DB (scenarios + seeder + forecasting) +- [ ] T1/T2 discriminator pair passes and is documented as #237's verdict in docstrings +- [ ] T3: regression model_exogenous integration test asserts units_delta > 0.0 +- [ ] T6 seeded E2E test passes (and demonstrably fails if coupling is forced off) +- [ ] test_phase1_regression.py passes UNCHANGED (byte-stability + zero-rng watchdogs) +- [ ] Level 4 curl shows non-zero delta; `make demo` green +- [ ] docs/DATA-SEEDER.md updated; verdict comment posted on #237 +- [ ] Commits: `test(forecast): …`, `fix(data): …`, `docs(repo): …` — all `(#237)`; + branch `fix/forecast-model-exogenous-price-inertia` off dev + +## Anti-Patterns to Avoid + +- ❌ Don't touch `app/features/scenarios/service.py` or `feature_frame.py` production + code — the wiring is verified sound; "fixing" it would 
be cargo-culting the symptom.
+- ❌ Don't weaken any leakage spec (`test_future_frame_leakage.py`,
+  `app/shared/feature_frames/tests/test_leakage.py`, `featuresets/tests/test_leakage.py`).
+- ❌ Don't draw rng in the price resolver or change rng draw order/count anywhere in
+  the legacy (lookup=None) path — byte-stability is the seeder's contract.
+- ❌ Don't assert strict `>` between the −40% and −15% deltas — bin clipping makes
+  deep cuts saturate (VERIFIED CLAIM #2); use `>=` with both `> 0`.
+- ❌ Don't make T6 shell out to the seeder HTTP API or assume 1-based ids — insert the
+  grain directly with unique ids and clean up (CI shares the DB).
+- ❌ Don't "fix" the data inconsistency from the consumer side (training joining
+  price_history instead of sales_daily.unit_price) — sales rows carrying the actual
+  transaction price is the truthful contract; fix the producer.
+- ❌ Don't add an API caveat field for zero-delta responses in this PRP — out of scope
+  (note it on #237 if reviewers want it; it would be additive schema work).
+
+## Confidence Score
+
+**8/10** for one-pass implementation success.
+
+Strong: the verdict is pre-established with runtime-verified mechanics; the fix reuses
+dormant, already-tested machinery (`price_elasticity`, validity-window records); every
+test has a named in-repo pattern to mirror; no schema/API surface change.
+
+Residual risk (the −2): the breadth of seeder-suite fallout from coupling-on-by-default
+(tests that implicitly pinned constant unit_price totals) is only discoverable by
+running the suite — Task 5 budgets for it; and T6's training step needs enough price
+variance inside one grain's window for HistGBR to bin on, which the test controls by
+constructing the price window explicitly.
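The `>=` rule and the "enough price variance to bin on" risk can be illustrated with a toy stand-in for a binned model. This is only a sketch under stated assumptions: `ToyBinnedModel` and `delta` are hypothetical names, the class is not scikit-learn, and it simply stores one mean target per training-range bin and sends out-of-range inputs to the edge bin, which is the saturation behaviour VERIFIED CLAIM #2 describes.

```python
from bisect import bisect_right


class ToyBinnedModel:
    """Hypothetical piecewise-constant predictor over bins learned from training prices."""

    def __init__(self, edges: list[float], bin_means: list[float]) -> None:
        # edges are the interior bin boundaries, so there is one more bin than edges.
        assert len(bin_means) == len(edges) + 1
        self.edges = edges
        self.bin_means = bin_means

    def predict(self, price_factor: float) -> float:
        # Values below the first edge fall into bin 0, values above the last edge
        # into the last bin: out-of-range inputs saturate instead of extrapolating.
        return self.bin_means[bisect_right(self.edges, price_factor)]


def delta(model: ToyBinnedModel, price_factor: float, baseline: float = 1.0) -> float:
    """Demand change of a scenario price_factor relative to the baseline price_factor."""
    return model.predict(price_factor) - model.predict(baseline)


# Assumed training data: prices only spanned [0.9, 1.1], giving two bins split at 1.0,
# with bin means taken from target = 40 - 20 * price at the bin midpoints.
narrow = ToyBinnedModel(edges=[1.0], bin_means=[21.0, 19.0])

d15 = delta(narrow, 0.85)  # -15% cut, below the training minimum
d40 = delta(narrow, 0.60)  # -40% cut, also below the training minimum
print(d15, d40)            # 2.0 2.0

# Both cuts land in the same edge bin, so only the non-strict comparison holds.
assert d40 >= d15 > 0.0
```

In this toy, a strict `d40 > d15` assertion would fail even though the model is price-sensitive, which is why the non-strict form is the safer regression check.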
diff --git a/docs/DATA-SEEDER.md b/docs/DATA-SEEDER.md
index 63ebe175..aca61990 100644
--- a/docs/DATA-SEEDER.md
+++ b/docs/DATA-SEEDER.md
@@ -196,6 +196,26 @@ uv run python scripts/seed_random.py --full-new --config examples/seed/config_cu
 - **Price Elasticity**: Demand adjustment based on price changes
 - **New Product Ramps**: Gradual demand increase for new launches
 
+### Price/Sales Coupling (`price_sales_coupling`, issue #237)
+
+`RetailPatternConfig.price_sales_coupling` (default **`true`**) couples the generated
+`price_history` windows — including Phase 2 markdown price drops — into `sales_daily`:
+
+- Each sales row's `unit_price` is the price actually active for that
+  `(store, product, date)` (store-specific windows beat chain-wide ones; on
+  overlapping windows the latest `valid_from` wins, so markdowns cut the base window).
+- Demand responds to the price via `price_elasticity`
+  (`demand *= 1 + price_elasticity * change_pct`) — a -20% price window with the
+  default `-0.5` elasticity lifts demand by +10% inside the window.
+
+This matters because a forecasting model trained on seeded data can only learn a price
+response when `sales_daily.unit_price` actually varies; with coupling off every trained
+`regression` model sees a constant `price_factor ≡ 1.0` and any `model_exogenous`
+scenario price assumption returns an exact 0.0 delta. Default `true` so every scenario
+produces learnable price signal out of the box; set `price_sales_coupling: false` under
+`retail:` to restore the legacy behavior (`unit_price = base_price` on every row, no
+elasticity effect).
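The coupling arithmetic above can be checked with a minimal sketch. It assumes the multiplier is applied exactly as written (`demand *= 1 + price_elasticity * change_pct`) and that `change_pct` is the relative difference between the active price and `base_price`; `price_change_pct` and `coupled_demand` are hypothetical helper names for illustration, not the seeder's API.

```python
from decimal import Decimal


def price_change_pct(current_price: Decimal, base_price: Decimal) -> float:
    """Relative price change versus the base price, e.g. -0.2 for a 20% cut."""
    return float((current_price - base_price) / base_price)


def coupled_demand(
    base_demand: float, change_pct: float, price_elasticity: float = -0.5
) -> float:
    """Apply the documented multiplier: demand *= 1 + price_elasticity * change_pct."""
    return base_demand * (1.0 + price_elasticity * change_pct)


base_price = Decimal("10.00")
markdown_price = Decimal("8.00")  # a -20% price window

pct = price_change_pct(markdown_price, base_price)
print(pct)                                               # -0.2
print(round(coupled_demand(100.0, pct), 6))              # 110.0
print(coupled_demand(100.0, pct, price_elasticity=0.0))  # 100.0
```

With the default elasticity the 20% cut lifts demand by 10%, and with `price_elasticity=0.0` demand is unchanged even though the price differs from `base_price`, which matches the note that suites isolating other effects keep their demand values.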
+ ## Phase 1 Realism Extensions Phase 1 adds opt-in realism: exogenous signals, multi-seasonality, trend changepoints, From fdf72c10e34bb48594101150609bc6c44b05e2c4 Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Fri, 12 Jun 2026 12:56:08 +0200 Subject: [PATCH 25/44] docs(repo): track reliability E6 prp (#387) --- PRPs/PRP-reliability-E6-release-gate.md | 551 ++++++++++++++++++++++++ 1 file changed, 551 insertions(+) create mode 100644 PRPs/PRP-reliability-E6-release-gate.md diff --git a/PRPs/PRP-reliability-E6-release-gate.md b/PRPs/PRP-reliability-E6-release-gate.md new file mode 100644 index 00000000..3cc5344d --- /dev/null +++ b/PRPs/PRP-reliability-E6-release-gate.md @@ -0,0 +1,551 @@ +name: "PRP reliability-E6 — release gate: showcase_rich dogfood + per-epic spot checks + umbrella close-out" +description: | + Issue #387 (epic E6 of umbrella #380, milestone reliability-hardening). + Release-gate epic: NO new production code. The deliverable is executed + verification — a green end-to-end showcase_rich dogfood run on a fresh stack, + one live spot check per closed reliability epic (E1 #334, E2 #335, E3 #332, + E4 #268, E5 #237), all five validation gates green on dev, evidence recorded + on #387, and umbrella #380 closed. If any check fails, the gate STOPS and + files a fix issue — it never fixes forward inside this epic. + +--- + +## Goal + +Prove that the five reliability fixes hold **as one system** on `dev`, not just as +isolated epic PRs, then close the reliability-hardening umbrella: + +1. **Fresh-stack showcase_rich dogfood** — `docker compose down -v` → up → migrate, + then a `/showcase` run with `scenario=showcase_rich` + **Re-seed first**: all + 24 steps / 10 phases green (PRP-41 layout). Provider-dependent steps may ⏭️ skip + or ⚠️ warn per `docs/_base/RUNBOOKS.md`; the pipeline must still end green + (`pipeline_complete`, no ❌ step). +2. 
**Five per-epic spot checks** on `dev` — each is a committed regression test + re-run PLUS (where meaningful) a live HTTP probe against the running stack. +3. **All five validation gates green** on `dev` (ruff check, ruff format --check, + mypy --strict, pyright --strict, pytest unit). +4. **Close-out** — evidence comment on #387, tick every satisfied checkbox on + #380 (the live body has drifted — see Known Gotchas), close #380 with a + close-out comment linking the evidence, close #387. + +**End state**: #387 and #380 are CLOSED with linked evidence; `dev` is demonstrably +green end-to-end; this PRP file is committed as `docs(repo)` (the E1–E5 precedent). + +## Why + +- "showcase_rich demo pipeline runs green end-to-end after E6" is the **last open + success criterion** on umbrella #380 — every other epic (#334, #335, #332, #268, + #237) is closed as of 2026-06-12. Nothing verifies their *combined* behavior yet. +- The five fixes interact: E1 changed the failure surface E2 classifies; E5's seeder + coupling changes the data every showcase step trains on; E4 moved an import the + alembic cold-boot path exercises; E3 only manifests in a real browser over LAN HTTP. + An isolated-PR-green ≠ system-green. +- The umbrella is also the flow-pack dogfood evidence (#368/#375) — a clean, + evidence-linked close-out is part of the methodology being proven. + +## What + +A verification campaign, not a feature. No `app/` or `frontend/` source change is +in scope. The only repo change this PRP itself produces is the PRP file +(`PRPs/PRP-reliability-E6-release-gate.md`) committed as +`docs(repo): track reliability E6 prp (#387)`. + +### Success Criteria (mirror of #387 exit criteria) + +- [ ] Fresh stack rebuilt: `docker compose down -v && docker compose up -d && + uv run alembic upgrade head` exits clean (this is ALSO the E4 cold-boot proof). 
+- [ ] showcase_rich dogfood green end-to-end via the `/showcase` page loaded over a + **plain-HTTP LAN origin** (covers E3 simultaneously) — evidence: final step + summary + screenshot; no white-screen, no ❌ step. +- [ ] E1 #334 spot check passes: doubled provider prefix → 422 (live PATCH + tests). +- [ ] E2 #335 spot check passes: exhausted fallback → 502 `AGENT_FALLBACK_EXHAUSTED` + with classified `failures[]` (committed route test on fresh DB + optional live probe). +- [ ] E3 #332 spot check passes: LAN-HTTP page load completes a run without + white-screen; `safeRandomUUID` vitest green. +- [ ] E4 #268 spot check passes: `ModelFamily` imports from + `app.shared.model_taxonomy`; zero lazy-import NOTEs reference the old + registry↔forecasting cycle; alembic cold-boot clean (from the fresh-stack step). +- [ ] E5 #237 spot check passes: seeded grain → train `regression` → price-cut + simulate → `method == "model_exogenous"` and `units_delta != 0.0` + (committed integration test + optional live curl chain). +- [ ] All five validation gates green on `dev`. +- [ ] Evidence comment posted on #387; all satisfied checkboxes ticked on #380; + #380 closed with close-out comment; #387 closed. + +## All Needed Context + +### Documentation & References + +```yaml +# ── The gate's contract ────────────────────────────────────────────────────── +- issue: "#387 — gh issue view 387" + why: The epic's sub-task list and exit criteria this PRP encodes verbatim. + +- issue: "#380 — gh issue view 380" + why: Umbrella. Success-criteria checklist — the LAST unchecked item + ("showcase_rich demo pipeline runs green end-to-end after E6") gets ticked + here; then the issue is closed with a close-out comment. + +# ── showcase_rich pipeline (what 'green' means) ───────────────────────────── +- file: app/features/demo/pipeline.py + why: "_phase_table() (~line 2464) is the step registry. 
showcase_rich = 24 steps / + 10 phases: data(7: precheck, reset, seed, status, features, phase2_enrichment, + historical_backfill), modeling(2: train, v2_train), decision(5: backtest, + register, champion_compat_compare, stale_alias_trigger, safer_promote_flow), + portfolio(1: batch_preset), planning(2: scenario_simulate_and_save, + multi_plan_compare), knowledge(3: embedding_provider_probe, rag_index_subset, + rag_retrieve_probe), verify(1), agents(1: agent_hitl_flow), ops(1: + ops_snapshot), cleanup(1). READ-ONLY." + +- file: docs/_base/RUNBOOKS.md + why: "'Showcase page (/showcase) pipeline fails at step X' — items 1–27 are the + per-step diagnosis table. Defines which skips/warns are ACCEPTABLE on a green + run (see Known Gotchas below). Consult before treating any non-✅ as failure." + +- file: docs/_base/API_CONTRACTS.md + why: "WS /demo/stream contract — start frame, StepEvent shape, pipeline_complete + fields (winner_model_type, winner_wape, winning_run_id, alias, wall_clock_s, + v2_run_id). The headless fallback path drives this directly." + +- file: frontend/src/pages/showcase.tsx + why: "UI controls and their request mapping (~line 110-115): + start({ seed: 42, skip_seed: !reseed, reset: resetDb, scenario }). + 'Re-seed first' checkbox → skip_seed=false. 'Reset database' → reset=true. + ScenarioPicker carries demo_minimal | showcase_rich | sparse." + +# ── E1 spot-check surface (#334) ───────────────────────────────────────────── +- file: app/core/config.py + why: "validate_model_identifier (line 20) — rejects nested provider prefix + ('google-gla:google-gla:…') with the 'Did you mean' ValueError; ollama + multi-colon tags stay valid. Settings.agent_default_model (192) / + agent_fallback_model (193) field_validator at line 231. READ-ONLY." + +- file: app/features/config/tests/test_routes.py + why: "test_patch_rejects_doubled_provider_prefix (line 120) — the live-route 422 + regression test to re-run." 
+ +- file: app/features/config/tests/test_schemas.py + why: "test_rejects_doubled_provider_prefix (55), test_rejects_mixed_provider_prefix + (60), test_rejects_doubled_prefix_via_model_validate (134)." + +- file: app/features/agents/tests/test_config_validation.py + why: "test_doubled_prefix_rejected_at_settings_boot (line 41) — the Settings-boot + validation path." + +# ── E2 spot-check surface (#335) ───────────────────────────────────────────── +- file: app/core/exceptions.py + why: "AgentFallbackExhaustedError → 502 problem+json, code=AGENT_FALLBACK_EXHAUSTED, + type=…/errors/agent-fallback-exhausted, failures[] extension (line ~272)." + +- file: app/features/agents/service.py + why: "chat fallback-exhausted path (~line 316) and stream path (~line 717, + error_type='fallback_exhausted', recoverable=true). READ-ONLY." + +- file: app/features/agents/tests/test_routes.py + why: "TestChatRoutes (integration-marked) :: + test_chat_fallback_exhausted_returns_502_problem_json (line 167) — asserts 502, + code, two classified failures (model_not_found + quota_exhausted), secret + scrubbing. This is the committed ≥2-failure-leg proof; re-run it." + +# ── E3 spot-check surface (#332) ───────────────────────────────────────────── +- file: frontend/src/lib/uuid-utils.ts + why: "safeRandomUUID — crypto.randomUUID → getRandomValues-v4 → Math.random-v4 + fallback chain." + +- file: frontend/src/lib/uuid-utils.test.ts + why: "vitest incl. the explicit 'LAN-HTTP shape' case (randomUUID undefined). + Run: cd frontend && pnpm test --run src/lib/uuid-utils.test.ts" + +- file: frontend/eslint.config.js + why: "no-restricted-properties guard (~lines 30-44) banning raw crypto.randomUUID." + +# ── E4 spot-check surface (#268) ───────────────────────────────────────────── +- file: app/shared/model_taxonomy.py + why: "Exports ModelFamily (str Enum: BASELINE/TREE/ADDITIVE) + model_family_for + + _MODEL_FAMILY_MAP. Module docstring documents the resolved cycle. READ-ONLY." 
+ +- file: docs/_base/ARCHITECTURE.md + why: "'Cross-slice read-only import pattern' section — records #268 as RESOLVED; + the ONLY legitimately remaining lazy pair is forecasting↔jobs." + +# ── E5 spot-check surface (#237) ───────────────────────────────────────────── +- file: app/features/scenarios/tests/test_routes_integration.py + why: "TestModelExogenousOnSeededData::test_seeded_train_simulate_price_cut_moves_demand + (line 480) — THE committed end-to-end proof: seeded elastic grain → train + regression → simulate -20% price cut → method=='model_exogenous' && + units_delta != 0.0. Re-run it on the fresh DB. Also shows the exact live-curl + request bodies (train: lines 486-496, simulate: lines 503-516)." + +- file: PRPs/PRP-reliability-E5-model-exogenous-price-inertia.md + why: "The E5 verdict + fix narrative — context for interpreting a failure here + (seeder coupling flag RetailPatternConfig.price_sales_coupling=True)." + +# ── Close-out mechanics ────────────────────────────────────────────────────── +- file: .claude/rules/umbrella-issue.md + why: "Write discipline for gh mutations: dry-run echo → idempotent check → + approval gate → confirm. Applies to the #380 body edit + closes." + +- file: .claude/rules/output-formatting.md + why: "Evidence comment format: emoji status indicators, box separators, ≤40 lines." 
+``` + +### Current Codebase tree (verification-relevant subset) + +```bash +app/core/config.py # validate_model_identifier (E1) +app/core/exceptions.py # AGENT_FALLBACK_EXHAUSTED (E2) +app/shared/model_taxonomy.py # ModelFamily home (E4) +app/features/demo/pipeline.py # _phase_table — 24-step showcase_rich registry +app/features/config/tests/ # E1 regression tests +app/features/agents/tests/ # E2 route test (integration), E1 boot test +app/features/scenarios/tests/test_routes_integration.py # E5 e2e test (integration) +frontend/src/lib/uuid-utils.{ts,test.ts} # E3 +frontend/src/pages/showcase.tsx # dogfood entry point +scripts/run_demo.py # legacy CLI pipeline (NOT the dogfood target) +``` + +### Desired Codebase tree + +```bash +PRPs/PRP-reliability-E6-release-gate.md # this file — the ONLY tracked change +# No app/, frontend/, alembic/, or docs/_base/ source change is in scope. +``` + +### Known Gotchas & Environment Quirks + +```python +# ── STOP RULE (governs the whole epic) ─────────────────────────────────────── +# If ANY spot check or the dogfood FAILS: capture evidence (response body / +# screenshot / log excerpt), open a NEW fix issue referencing #380 + the failed +# epic issue, comment the failure on #387, and STOP. The release gate never +# fixes forward — a fix is a new branch/PR through the normal flow, and the +# gate re-runs after it merges. + +# ── Fresh stack / processes ────────────────────────────────────────────────── +# GOTCHA: a stale uvicorn from a prior session can hold :8123 — curl then hits +# OLD code while you think you're testing dev. Before starting the backend: +# lsof -iTCP:8123 -sTCP:LISTEN # kill any stale PID first +# GOTCHA: `docker compose down -v` ERASES the DB incl. RAG corpus and app_config +# runtime overrides (agent model settings revert to .env values on next boot). +# That's desired here (clean gate), but means: re-check GET /config/ai after boot. 
+# GOTCHA: run the BACKEND AS LOCAL UVICORN (uv run uvicorn app.main:app --port +# 8123), NOT the compose backend container — model artifacts must land on the +# host filesystem for verify/feature-metadata steps, and the docker-compose.yml +# default brings up Postgres only on :5433 anyway. +# GOTCHA: pnpm 11 depsStatusCheck can stall `pnpm dev` — start Vite directly: +# cd frontend && ./node_modules/.bin/vite --host 0.0.0.0 + +# ── Dogfood / browser ──────────────────────────────────────────────────────── +# CRITICAL (E3): crypto.randomUUID is undefined only in NON-SECURE contexts. +# http://localhost:5173 IS a secure context — it cannot reproduce #332. Load the +# page via a real LAN IP: http://$(hostname -I | awk '{print $1}'):5173/showcase. +# frontend/.env VITE_API_BASE_URL=http://localhost:8123 still works when browsing +# from this same host (the browser resolves localhost locally), and the backend +# CORS dev regex already allows 10.x/192.168.x/172.16-31.x origins. +# GOTCHA: Playwright MCP and `playwright install` both fail on this host. Use +# native Python Playwright with executable_path="/snap/bin/chromium", or the +# agent-browser skill. Verify the chromium path exists before relying on it. 
+# ACCEPTABLE NON-GREEN STEPS on showcase_rich (RUNBOOKS items 9-26): per #387, +# "provider-dependent steps may ⏭️ skip per RUNBOOKS, pipeline still green": +# - agent_hitl_flow ⏭️ — no key for agent_default_model provider / approval +# timeout / model didn't call save_scenario (known recurring skip on this host) +# - rag_index_subset / rag_retrieve_probe ⏭️ — embedding provider unreachable +# or rejected credentials (#329); embedding_provider_probe ✅ even when +# reachable=False +# - verify ⏭️ — expected on a prophet_like (V2) winner: artifact roots differ +# - champion_compat_compare / safer_promote_flow ⏭️ — missing V2 run or V1 +# baseline (should NOT happen with Re-seed first ticked — investigate if hit) +# - batch_preset ⚠️ — 90 s poll timeout on a loaded laptop (non-fatal) +# - ops_snapshot ⚠️ — /ops/* unavailable (warn, never fail) +# ANY ❌ step = gate failure → STOP RULE. +# GOTCHA: only one pipeline runs at a time (module asyncio.Lock); a second start +# gets one `error` event / POST gets 409. Stop button releases the lock in ~5 s. +# Wall-clock: budget ~5-10 min for showcase_rich on this laptop; per-step HTTP +# timeout is 120 s, batch poll 90 s, HITL approval 90 s. + +# ── Spot-check mechanics ───────────────────────────────────────────────────── +# E2 (integration test): TestChatRoutes is @pytest.mark.integration — needs the +# compose Postgres up + migrations applied. Run TARGETED tests, NOT the full +# integration suite: the full suite is known to pollute shared DB state mid-run +# (destructive seeder tests) and produce false negatives. Run the E2 + E5 tests +# individually, E5 BEFORE anything that mutates seeded data, or on a fresh DB. +# E2 (optional live probe): PATCH /config/ai persists overrides to app_config +# AND applies live. 
To provoke real exhaustion: GET /config/ai (record current +# agent_default_model/agent_fallback_model), PATCH both to +# "ollama:nonexistent-model-e6" (valid format — passes E1 validation; Ollama at +# localhost:11434 returns 404 → reason="model_not_found"), create session, chat, +# expect 502; then PATCH the recorded values BACK. NEVER leave the override in +# place — it would break the showcase agent step on the next run. +# E5 (live curl variant): the /scenarios/* run_id is the ARTIFACT KEY parsed +# from TrainResponse.model_path ("model_{key}.joblib" → stem minus "model_"), +# NOT the registry model_run.run_id. Different ID spaces. +# E5 (live curl variant): the seeder does NOT reset Postgres ID sequences — +# discover real store/product IDs + date window via GET /dimensions/stores, +# GET /dimensions/products, and the seeded calendar range; never assume id=1. +# (Fresh `down -v` stack makes IDs 1-based again, but discover anyway.) +# E4: the ONLY remaining lazy-import NOTE in app/ must be the forecasting↔jobs +# pair (app/features/forecasting/service.py:~1050). Anything mentioning a +# ModelFamily / registry↔forecasting cycle = E4 regression → STOP RULE. + +# ── Validation gates / frontend ────────────────────────────────────────────── +# GOTCHA: `pnpm tsc --noEmit` is VACUOUS here (solution-style tsconfig checks 0 +# files) and `tsc -b` has known pre-existing failures on dev — frontend +# type-check is NOT one of this gate's five criteria. Frontend evidence = the +# uuid-utils vitest + the browser dogfood. +# The five gates (#387 wording): ruff check, ruff format --check, mypy app/, +# pyright app/, pytest -m "not integration". +# GOTCHA: app/core/tests/test_config.py settings tests can fail if they pick up +# the local .env — known issue, fixed via Settings(_env_file=None) in the tests +# already; if a gate failure looks like .env-bleed, see RUNBOOKS before STOPping. 
+ +# ── GitHub close-out ───────────────────────────────────────────────────────── +# Write discipline (.claude/rules/umbrella-issue.md): echo each gh mutation +# before running it. +# DRIFT WARNING (verified 2026-06-12): #380's live body has ALL 12 checkboxes +# unticked — the five per-epic success criteria were never ticked when E1-E5 +# closed, and the E6 Decomposition line still says "not yet created". Closing +# the umbrella with unticked boxes contradicts umbrella-issue.md ("checkbox list +# an outside reviewer uses as the close-or-not decision"). So: tick EVERY +# satisfied box (5 success criteria + 5 E1-E5 decomposition lines + the final +# showcase_rich criterion + the E6 line), and update the E6 line's "not yet +# created" → "#387". Do NOT pattern-match checkbox text literally — the live +# body contains backticks (`showcase_rich`) the issue text elsewhere omits; +# fetch with `gh issue view 380 --json body`, edit the markdown, push back via +# `gh issue edit 380 --body-file`. Preserve everything else byte-identical — +# the body carries an HTML provenance comment. +# Close order: evidence comment on #387 → tick #380 → close #380 (comment links +# #387 evidence) → close #387 last (it's the epic doing the closing). +``` + +## Implementation Blueprint + +### Data models and structure + +None. This epic ships zero schemas, zero migrations, zero source changes. + +### List of tasks in execution order + +```yaml +Task 0 — Preflight: + VERIFY branch: git switch dev && git pull → clean, up to date with origin/dev. + VERIFY no stale server: lsof -iTCP:8123 -sTCP:LISTEN → kill stale PIDs. + VERIFY chromium for dogfood: ls /snap/bin/chromium (else plan agent-browser skill). + RECORD: git rev-parse HEAD → the SHA all evidence refers to. 
+ +Task 1 — Fresh stack (E4 cold-boot proof rides along): + RUN: docker compose down -v + RUN: docker compose up -d # Postgres+pgvector on :5433 + RUN: uv run alembic upgrade head # MUST exit 0 on the EMPTY db — E4 evidence + RUN: uv run python scripts/check_db.py # connectivity sanity + START backend: uv run uvicorn app.main:app --port 8123 (background, log to file) + VERIFY: curl -s http://localhost:8123/health → {"status":"ok"} + START frontend: cd frontend && ./node_modules/.bin/vite --host 0.0.0.0 (background) + VERIFY: curl -sI http://localhost:5173 → 200. + +Task 2 — showcase_rich dogfood over LAN origin (primary deliverable; covers E3): + DISCOVER LAN IP: hostname -I | awk '{print $1}' + DRIVE BROWSER (native Python Playwright, executable_path=/snap/bin/chromium): + - goto http://<LAN_IP>:5173/showcase # NON-secure context — E3 surface + - assert page renders (no white-screen), zero console errors mentioning + randomUUID / crypto + - select scenario "showcase_rich"; tick "Re-seed first" + (→ {seed:42, skip_seed:false, reset:false, scenario:"showcase_rich"}) + - click Run; poll up to ~10 min for the completion banner + - if the HITL step card shows an Approve button within its 90 s window, + click it (a ⏭️ skip on agent_hitl_flow is acceptable per RUNBOOKS 23-25) + - capture: full-page screenshot + the per-step status list (24 rows) + ASSERT: pipeline green — every step ✅/⏭️/⚠️ per the acceptable-list in Known + Gotchas; zero ❌. Record winner_model_type / winner_wape / v2_run_id from the + summary if surfaced. + FALLBACK (only if browser automation is unusable): drive WS /demo/stream + headlessly with start frame {"seed":42,"reset":false,"skip_seed":false, + "scenario":"showcase_rich"}, assert pipeline_complete + zero fail events — + THEN still do a LAN-origin page load + one demo_minimal UI run for E3. + ON ANY ❌ STEP: STOP RULE (RUNBOOKS items 1-27 give the diagnosis per step). 
+ +Task 3 — E1 #334 spot check (doubled provider prefix → 422): + LIVE: curl -s -o /dev/null -w '%{http_code}' -X PATCH \ + http://localhost:8123/config/ai -H 'Content-Type: application/json' \ + -d '{"agent_default_model":"google-gla:google-gla:gemini-2.0-flash"}' + → expect 422; re-run without -o to capture the problem+json body + (RFC 7807, mentions nested provider prefix). NOTE: a 422 means nothing + was persisted — no restore needed. + TESTS: uv run pytest \ + app/features/config/tests/test_schemas.py \ + app/features/config/tests/test_routes.py::TestUpdateAIConfig \ + "app/features/agents/tests/test_config_validation.py::TestModelIdentifierValidation::test_doubled_prefix_rejected_at_settings_boot" \ + -v -k "doubled or mixed or prefix" + +Task 4 — E2 #335 spot check (fallback exhaustion classified): + TEST (the committed ≥2-leg proof; integration-marked, fresh DB is up): + uv run pytest "app/features/agents/tests/test_routes.py::TestChatRoutes::test_chat_fallback_exhausted_returns_502_problem_json" -v -m integration + OPTIONAL LIVE PROBE (only if Ollama responds on localhost:11434): + - GET /config/ai → record agent_default_model + agent_fallback_model + - PATCH /config/ai {"agent_default_model":"ollama:nonexistent-model-e6", + "agent_fallback_model":"ollama:nonexistent-model-e6"} + - POST /agents/sessions {"agent_type":"experiment"} → session_id + - POST /agents/sessions/{id}/chat {"message":"hello"} + → expect 502 application/problem+json, code=AGENT_FALLBACK_EXHAUSTED, + failures[] with reason model_not_found, no secret values in body + - DELETE the session; PATCH /config/ai back to the recorded values; GET to + confirm restore. (MANDATORY restore — see Known Gotchas.) 
+ +Task 5 — E4 #268 spot check (taxonomy home + no stale cycle NOTEs): + RUN: uv run python -c "from app.shared.model_taxonomy import ModelFamily, model_family_for; print(model_family_for('regression'), model_family_for('prophet_like'), model_family_for('naive'))" + → "ModelFamily.TREE ModelFamily.ADDITIVE ModelFamily.BASELINE" + RUN: grep -rn "ModelFamily" app/ --include="*.py" | grep -v "model_taxonomy" \ + | grep -iE "lazy|cycle|circular|NOTE" → MUST be empty + RUN: grep -rn "NOTE" app/ --include="*.py" | grep -iE "lazy|cycle|circular" + → ONLY the forecasting↔jobs pair (app/features/forecasting/service.py). + EVIDENCE: alembic cold-boot already proven in Task 1 (upgrade head on empty DB). + +Task 6 — E5 #237 spot check (price cut moves model_exogenous demand): + TEST (the committed e2e proof; run BEFORE anything further mutates seed data — + Task 2's run is fine, the test seeds its own isolated grain and cleans up): + uv run pytest "app/features/scenarios/tests/test_routes_integration.py::TestModelExogenousOnSeededData::test_seeded_train_simulate_price_cut_moves_demand" -v -m integration + OPTIONAL LIVE CURL CHAIN (mirrors the test, against the showcase-seeded data): + - GET /dimensions/stores + /dimensions/products → pick a real (store_id, + product_id) with sales (never assume id=1) + - POST /forecasting/train {"store_id":S,"product_id":P, + "train_start_date":"<window start>","train_end_date":"<window end>", + "config":{"model_type":"regression"}} → 200; model_path + - run_id = basename(model_path) minus "model_" prefix minus ".joblib" + - POST /scenarios/simulate {"run_id":run_id,"horizon":14,"assumptions": + {"price":{"change_pct":-0.20,"start_date":"<D+1>","end_date":"<D+14>"}}} + → 200, method=="model_exogenous", units_delta != 0.0 + +Task 7 — Five validation gates on dev: + RUN: uv run ruff check . && uv run ruff format --check . 
+ RUN: uv run mypy app/ && uv run pyright app/ + RUN: uv run pytest -v -m "not integration" + PLUS frontend E3 unit evidence: cd frontend && pnpm test --run src/lib/uuid-utils.test.ts + ALL must pass. A failure here on untouched dev = regression → STOP RULE. + +Task 8 — Evidence + close-out (gh write discipline: echo each command first): + COMMIT this PRP file FIRST (before any close): branch docs/reliability-e6-prp + off dev, `docs(repo): track reliability E6 prp (#387)`, PR into dev (E5 + precedent: commit 82300eb). NOTE: the PR needs 1 approving review + CI — + it will NOT merge autonomously; opening it is enough to proceed, the merge + lands through the normal flow. + COMMENT on #387: evidence block per .claude/rules/output-formatting.md — + HEAD SHA, fresh-stack proof, dogfood result table (24 steps with ✅/⏭️/⚠️ and + skip reasons), the five spot-check results with the exact commands run, + gate results, screenshot attached or path referenced. + EDIT #380 body (see DRIFT WARNING in Known Gotchas): tick ALL satisfied + checkboxes — the 5 per-epic success criteria, the 5 E1-E5 Decomposition + lines, the E6 Decomposition line (updating "not yet created" → "#387"), and + the final "...showcase_rich demo pipeline runs green end-to-end after E6" + criterion. Preserve everything else byte-identical. + CLOSE #380: gh issue close 380 --comment "<close-out linking the #387 evidence + comment + per-epic issue list #334 #335 #332 #268 #237>" + CLOSE #387: gh issue close 387 --comment "<gate complete — evidence above>" + +Task 9 — Teardown: + STOP the background uvicorn + vite processes started in Task 1. + LEAVE the seeded DB in place (operator-visible artefacts are fine post-gate). 
+``` + +### Integration Points + +```yaml +GITHUB: + - issue #387: evidence comment + close + - issue #380: body checkbox tick + close-out comment + close + - PR: docs(repo) commit of this PRP file into dev + +RUNTIME (no code integration — consumers only): + - docker compose Postgres :5433, local uvicorn :8123, Vite :5173 (LAN-bound) + - Ollama localhost:11434 (optional, E2 live probe + agent/knowledge steps) +``` + +## Validation Loop + +### Level 1 — environment sanity (before anything else) + +```bash +git -C . status --short && git rev-parse --abbrev-ref HEAD # dev, clean +lsof -iTCP:8123 -sTCP:LISTEN # must be empty +docker compose ps # postgres healthy +curl -s http://localhost:8123/health # {"status":"ok"} after Task 1 +``` + +### Level 2 — targeted regression tests (the per-epic committed proofs) + +```bash +# E1 +uv run pytest app/features/config/tests/ app/features/agents/tests/test_config_validation.py -v -k "doubled or mixed or prefix" +# E2 (integration — fresh DB) +uv run pytest "app/features/agents/tests/test_routes.py::TestChatRoutes::test_chat_fallback_exhausted_returns_502_problem_json" -v -m integration +# E3 +cd frontend && pnpm test --run src/lib/uuid-utils.test.ts && cd .. +# E4 +uv run python -c "from app.shared.model_taxonomy import ModelFamily, model_family_for; print(model_family_for('regression'))" +# E5 (integration — self-seeding, self-cleaning) +uv run pytest "app/features/scenarios/tests/test_routes_integration.py::TestModelExogenousOnSeededData::test_seeded_train_simulate_price_cut_moves_demand" -v -m integration +``` + +### Level 3 — live system (dogfood + probes) + +```bash +# Dogfood: browser at http://<LAN_IP>:5173/showcase, scenario=showcase_rich, +# Re-seed first ticked → green pipeline, screenshot captured. (Task 2.) 
+ +# E1 live: +curl -s -X PATCH http://localhost:8123/config/ai -H 'Content-Type: application/json' \ + -d '{"agent_default_model":"google-gla:google-gla:gemini-2.0-flash"}' | head -c 400 +# → 422 problem+json mentioning the nested provider prefix + +# E5 live: train→simulate chain per Task 6 (IDs discovered, never assumed). +``` + +### Level 4 — repo gates + +```bash +uv run ruff check . && uv run ruff format --check . +uv run mypy app/ && uv run pyright app/ +uv run pytest -v -m "not integration" +``` + +## Final validation Checklist + +- [ ] Fresh stack: `down -v` → `up -d` → `alembic upgrade head` clean (E4 cold-boot) +- [ ] showcase_rich dogfood: 24 steps / 10 phases, zero ❌, over plain-HTTP LAN + origin, screenshot + step table captured (E3 white-screen proof included) +- [ ] E1: live PATCH → 422; doubled/mixed-prefix tests green +- [ ] E2: `test_chat_fallback_exhausted_returns_502_problem_json` green + (+ optional live 502 probe, config RESTORED afterwards) +- [ ] E3: uuid-utils vitest green; LAN page load clean +- [ ] E4: taxonomy import one-liner correct; zero stale cycle NOTEs + (only forecasting↔jobs remains) +- [ ] E5: `test_seeded_train_simulate_price_cut_moves_demand` green + (+ optional live chain: method=model_exogenous, units_delta != 0.0) +- [ ] Five gates green: ruff, format, mypy, pyright, unit pytest +- [ ] Evidence comment on #387; all satisfied #380 checkboxes ticked; #380 closed; #387 closed +- [ ] This PRP committed via `docs(repo): track reliability E6 prp (#387)` +- [ ] Background servers stopped; no config overrides left in app_config + +--- + +## Anti-Patterns to Avoid + +- ❌ Don't fix forward inside the gate — a failed check files a new issue and STOPS +- ❌ Don't treat a RUNBOOKS-sanctioned ⏭️/⚠️ as failure — but don't hand-wave a ❌ either +- ❌ Don't verify E3 on localhost — it's a secure context; #332 only manifests on LAN IP +- ❌ Don't run the FULL integration suite as a gate — known shared-state pollution; + run the targeted tests listed above +- ❌ 
Don't leave `ollama:nonexistent-model-e6` (or any probe override) in app_config +- ❌ Don't assume store/product IDs or date windows — discover via /dimensions/* +- ❌ Don't rewrite #380's body beyond ticking satisfied checkboxes + the E6 line update +- ❌ Don't `gh pr merge --merge` anything dev→main here — this epic ends at `dev`; + the release cut is a separate decision (stop-and-ask gate) + +## Confidence Score: 8.5/10 + +One-pass success likelihood is high: every spot check maps to a committed, +named regression test plus an exact live command; the dogfood path, acceptable +skip list, and environment traps (stale uvicorn, LAN secure-context, ID +discovery, config restore) are all pinned with file:line grounding. Residual +risk (−1.5): the showcase_rich browser run has non-deterministic legs +(agent_hitl_flow, provider reachability, batch timing on a loaded laptop) that +may force a re-run or RUNBOOKS triage, and host browser automation has a known +fragile setup (snap chromium path). From 62a2463cde67ef5142a7167d21e041f5c4da2669 Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Fri, 12 Jun 2026 12:59:56 +0200 Subject: [PATCH 26/44] docs(repo): address review wording nits on e6 prp (#387) --- PRPs/PRP-reliability-E6-release-gate.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/PRPs/PRP-reliability-E6-release-gate.md b/PRPs/PRP-reliability-E6-release-gate.md index 3cc5344d..834c9033 100644 --- a/PRPs/PRP-reliability-E6-release-gate.md +++ b/PRPs/PRP-reliability-E6-release-gate.md @@ -1,8 +1,8 @@ name: "PRP reliability-E6 — release gate: showcase_rich dogfood + per-epic spot checks + umbrella close-out" description: | Issue #387 (epic E6 of umbrella #380, milestone reliability-hardening). - Release-gate epic: NO new production code. The deliverable is executed - verification — a green end-to-end showcase_rich dogfood run on a fresh stack, + Release-gate epic: NO new production code. 
The deliverable is an executed + verification: a green end-to-end showcase_rich dogfood run on a fresh stack, one live spot check per closed reliability epic (E1 #334, E2 #335, E3 #332, E4 #268, E5 #237), all five validation gates green on dev, evidence recorded on #387, and umbrella #380 closed. If any check fails, the gate STOPS and @@ -442,7 +442,7 @@ Task 8 — Evidence + close-out (gh write discipline: echo each command first): Task 9 — Teardown: STOP the background uvicorn + vite processes started in Task 1. - LEAVE the seeded DB in place (operator-visible artefacts are fine post-gate). + LEAVE the seeded DB in place (operator-visible artifacts are fine post-gate). ``` ### Integration Points From 1bc887f03984e8eb1d426496f5807de0ecc5d5af Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Fri, 12 Jun 2026 14:05:11 +0200 Subject: [PATCH 27/44] feat(api): add showcase_workspace model and migration (#390) --- alembic/env.py | 1 + ...fa37fcc_create_showcase_workspace_table.py | 103 ++++++++++++++++++ app/features/demo/models.py | 89 +++++++++++++++ 3 files changed, 193 insertions(+) create mode 100644 alembic/versions/324a2fa37fcc_create_showcase_workspace_table.py create mode 100644 app/features/demo/models.py diff --git a/alembic/env.py b/alembic/env.py index 2cadd971..4f40a1ee 100644 --- a/alembic/env.py +++ b/alembic/env.py @@ -16,6 +16,7 @@ from app.features.batch import models as batch_models # noqa: F401 from app.features.config import models as config_models # noqa: F401 from app.features.data_platform import models as data_platform_models # noqa: F401 +from app.features.demo import models as demo_models # noqa: F401 from app.features.explainability import models as explainability_models # noqa: F401 from app.features.jobs import models as jobs_models # noqa: F401 from app.features.model_selection import models as model_selection_models # noqa: F401 diff --git a/alembic/versions/324a2fa37fcc_create_showcase_workspace_table.py 
b/alembic/versions/324a2fa37fcc_create_showcase_workspace_table.py new file mode 100644 index 00000000..a3dd4bc2 --- /dev/null +++ b/alembic/versions/324a2fa37fcc_create_showcase_workspace_table.py @@ -0,0 +1,103 @@ +"""create showcase_workspace table + +Revision ID: 324a2fa37fcc +Revises: e4f5a6b7c8d9 +Create Date: 2026-06-12 10:00:00.000000 + +E1 of the showcase-workspace initiative (umbrella #389, epic #390). First +table owned by the demo slice: one row per preserved showcase run -- its +configuration (replay inputs) plus the soft-reference ids of every object the +pipeline created. Deliberately NO ForeignKey to ``model_run`` / +``scenario_plan`` / ``batch_job`` / ``agent_session`` -- recorded ids are +opaque soft references so cross-slice schema coupling stays zero and the +referenced rows remain independently deletable. +""" + +from collections.abc import Sequence + +import sqlalchemy as sa +from sqlalchemy.dialects import postgresql + +from alembic import op + +# revision identifiers, used by Alembic. 
+revision: str = "324a2fa37fcc" +down_revision: str | None = "e4f5a6b7c8d9" +branch_labels: str | Sequence[str] | None = None +depends_on: str | Sequence[str] | None = None + + +def upgrade() -> None: + """Apply migration -- create the showcase_workspace table.""" + op.create_table( + "showcase_workspace", + sa.Column("id", sa.Integer(), nullable=False), + sa.Column("workspace_id", sa.String(length=32), nullable=False), + sa.Column("name", sa.String(length=100), nullable=True), + sa.Column("status", sa.String(length=20), nullable=False), + sa.Column("seed", sa.Integer(), nullable=False), + sa.Column("scenario", sa.String(length=40), nullable=False), + sa.Column("reset", sa.Boolean(), nullable=False), + sa.Column("skip_seed", sa.Boolean(), nullable=False), + sa.Column("store_id", sa.Integer(), nullable=True), + sa.Column("product_id", sa.Integer(), nullable=True), + sa.Column("date_start", sa.Date(), nullable=True), + sa.Column("date_end", sa.Date(), nullable=True), + sa.Column( + "created_objects", + postgresql.JSONB(astext_type=sa.Text()), + server_default=sa.text("'{}'::jsonb"), + nullable=False, + ), + sa.Column("result_summary", postgresql.JSONB(astext_type=sa.Text()), nullable=True), + sa.Column( + "created_at", + sa.DateTime(timezone=True), + server_default=sa.text("now()"), + nullable=False, + ), + sa.Column( + "updated_at", + sa.DateTime(timezone=True), + server_default=sa.text("now()"), + nullable=False, + ), + sa.CheckConstraint( + "status IN ('running', 'completed', 'failed')", + name="ck_showcase_workspace_status", + ), + sa.PrimaryKeyConstraint("id"), + ) + op.create_index( + op.f("ix_showcase_workspace_workspace_id"), + "showcase_workspace", + ["workspace_id"], + unique=True, + ) + op.create_index( + op.f("ix_showcase_workspace_name"), + "showcase_workspace", + ["name"], + unique=False, + ) + op.create_index( + op.f("ix_showcase_workspace_status"), + "showcase_workspace", + ["status"], + unique=False, + ) + op.create_index( + 
"ix_showcase_workspace_status_created", + "showcase_workspace", + ["status", "created_at"], + unique=False, + ) + + +def downgrade() -> None: + """Revert migration -- drop the showcase_workspace table.""" + op.drop_index("ix_showcase_workspace_status_created", table_name="showcase_workspace") + op.drop_index(op.f("ix_showcase_workspace_status"), table_name="showcase_workspace") + op.drop_index(op.f("ix_showcase_workspace_name"), table_name="showcase_workspace") + op.drop_index(op.f("ix_showcase_workspace_workspace_id"), table_name="showcase_workspace") + op.drop_table("showcase_workspace") diff --git a/app/features/demo/models.py b/app/features/demo/models.py new file mode 100644 index 00000000..30ad586e --- /dev/null +++ b/app/features/demo/models.py @@ -0,0 +1,89 @@ +"""Showcase workspace ORM model. + +First table owned by the demo slice (precedent: ``app/features/batch/models.py``). +A row = one preserved showcase run: its configuration (replay inputs) plus the +ids of every object the pipeline created. All recorded ids are OPAQUE SOFT +REFERENCES -- deliberately NO ForeignKey to ``model_run`` / ``scenario_plan`` / +``batch_job`` / ``agent_session``: a cross-slice FK would couple the demo +slice's schema to four other slices and break independent deletion (e.g. +``DELETE /registry/runs/{id}`` must keep working while a workspace row still +references the run). E1 of the showcase-workspace initiative (umbrella #389, +epic #390). + +GOTCHA: SQLAlchemy reserves the declarative attribute name ``metadata``; the +JSONB columns are therefore named ``created_objects`` and ``result_summary``. 
+""" + +from __future__ import annotations + +import datetime as _dt +from typing import Any + +from sqlalchemy import CheckConstraint, Date, Index, Integer, String, text +from sqlalchemy.dialects.postgresql import JSONB +from sqlalchemy.orm import Mapped, mapped_column + +from app.core.database import Base +from app.shared.models import TimestampMixin + +# Workspace lifecycle states -- guarded by a CHECK constraint. ``running`` is +# written at creation (before the first pipeline step executes); the finalize +# hook settles the row to ``completed`` or ``failed``. +WORKSPACE_STATUS_RUNNING = "running" +WORKSPACE_STATUS_COMPLETED = "completed" +WORKSPACE_STATUS_FAILED = "failed" + + +class ShowcaseWorkspace(TimestampMixin, Base): + """A preserved showcase run. + + Attributes: + id: Surrogate primary key. + workspace_id: Unique external identifier (UUID hex, 32 chars). + name: Optional human label from ``DemoRunRequest.workspace_name``. + status: Lifecycle state -- running / completed / failed (CHECK-constrained). + seed: Seeder seed the run was started with (replay input). + scenario: Seeder scenario preset value (replay input). + reset: Whether the run wiped the database before seeding (replay input). + skip_seed: Whether the run skipped the seed step (replay input). + store_id: Showcase grain store id; NULL when the run failed early. + product_id: Showcase grain product id; NULL when the run failed early. + date_start: Seeded data window start; NULL when unknown. + date_end: Seeded data window end; NULL when unknown. + created_objects: Soft-reference ids of everything the run created (JSONB). + result_summary: Winner / WAPE / wall-clock display payload (JSONB). 
+ """ + + __tablename__ = "showcase_workspace" + + id: Mapped[int] = mapped_column(Integer, primary_key=True) + workspace_id: Mapped[str] = mapped_column(String(32), unique=True, index=True) + name: Mapped[str | None] = mapped_column(String(100), nullable=True, index=True) + status: Mapped[str] = mapped_column( + String(20), default=WORKSPACE_STATUS_RUNNING, nullable=False, index=True + ) + # Run configuration -- replay inputs (E4 restore/replay reads these verbatim). + seed: Mapped[int] = mapped_column(Integer, nullable=False) + scenario: Mapped[str] = mapped_column(String(40), nullable=False) + reset: Mapped[bool] = mapped_column(nullable=False, default=False) + skip_seed: Mapped[bool] = mapped_column(nullable=False, default=True) + # Grain + window discovered by the status/seed steps (NULL on early failure). + store_id: Mapped[int | None] = mapped_column(Integer, nullable=True) + product_id: Mapped[int | None] = mapped_column(Integer, nullable=True) + date_start: Mapped[_dt.date | None] = mapped_column(Date, nullable=True) + date_end: Mapped[_dt.date | None] = mapped_column(Date, nullable=True) + # Everything the run created -- flexible JSONB of soft references (see the + # module docstring for the deliberate no-FK decision). + created_objects: Mapped[dict[str, Any]] = mapped_column( + JSONB, nullable=False, default=dict, server_default=text("'{}'::jsonb") + ) + # winner_model_type / winner_wape / wall_clock_s -- display payload. 
+ result_summary: Mapped[dict[str, Any] | None] = mapped_column(JSONB, nullable=True) + + __table_args__ = ( + CheckConstraint( + "status IN ('running', 'completed', 'failed')", + name="ck_showcase_workspace_status", + ), + Index("ix_showcase_workspace_status_created", "status", "created_at"), + ) From e41643fe61f3d5c960ee0227a37c99fdcfd04b8d Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Fri, 12 Jun 2026 14:05:11 +0200 Subject: [PATCH 28/44] feat(api): record demo run objects into showcase workspace (#390) --- app/features/demo/pipeline.py | 19 +++ app/features/demo/schemas.py | 40 ++++- app/features/demo/service.py | 2 + app/features/demo/tests/conftest.py | 28 ++++ app/features/demo/tests/test_models.py | 111 ++++++++++++ app/features/demo/tests/test_pipeline.py | 149 +++++++++++++++++ app/features/demo/tests/test_routes.py | 87 ++++++++++ app/features/demo/tests/test_schemas.py | 47 ++++++ app/features/demo/tests/test_workspace.py | 166 ++++++++++++++++++ app/features/demo/workspace.py | 195 ++++++++++++++++++++++ 10 files changed, 839 insertions(+), 5 deletions(-) create mode 100644 app/features/demo/tests/test_models.py create mode 100644 app/features/demo/tests/test_workspace.py create mode 100644 app/features/demo/workspace.py diff --git a/app/features/demo/pipeline.py b/app/features/demo/pipeline.py index 041d5361..9af07a3f 100644 --- a/app/features/demo/pipeline.py +++ b/app/features/demo/pipeline.py @@ -40,6 +40,7 @@ from app.core.config import get_settings from app.core.logging import get_logger from app.core.problem_details import EMBEDDING_AUTH_CODE, ERROR_TYPES +from app.features.demo import workspace from app.features.demo.schemas import DemoRunRequest, StepEvent, StepStatus from app.shared.seeder.config import ScenarioPreset @@ -254,6 +255,9 @@ class DemoContext: # step_agent_hitl_flow on SHOWCASE_RICH. Remain None on every other path. 
approval_action_id: str | None = None agent_approval_decision: str | None = None # "executed"|"rejected"|"expired"|"timed_out" + # E1 (#390) -- workspace persistence. Set only on preservation="keep" runs + # (and only when the row insert succeeded); None on ephemeral runs. + workspace_id: str | None = None # ============================================================================= @@ -2585,6 +2589,11 @@ async def run_pipeline(app: FastAPI, req: DemoRunRequest) -> AsyncIterator[StepE reset=req.reset, scenario=req.scenario, ) + # E1 (#390) -- create the workspace row BEFORE the first step executes so + # even an early failure records the run config. create_workspace is + # warn-and-continue: a DB failure returns None and the run proceeds. + if req.preservation == "keep": + ctx.workspace_id = await workspace.create_workspace(req) wall_start = time.monotonic() any_fail = False # PRP-41 — buffer for intermediate events the HITL step emits via @@ -2668,6 +2677,13 @@ async def run_pipeline(app: FastAPI, req: DemoRunRequest) -> AsyncIterator[StepE break wall = time.monotonic() - wall_start + # E1 (#390) -- settle the workspace row BEFORE the final yield so the + # mid-run-failure path records partial created_objects too. + # finalize_workspace is warn-and-continue: it never raises. + if ctx.workspace_id is not None: + await workspace.finalize_workspace( + ctx.workspace_id, ctx, failed=any_fail, wall_clock_s=wall + ) yield StepEvent( event_type="pipeline_complete", step_name="summary", @@ -2687,5 +2703,8 @@ async def run_pipeline(app: FastAPI, req: DemoRunRequest) -> AsyncIterator[StepE # PRP-38 — expose the V2 run id when set so the Inspect deep # link can target /explorer/runs/{v2_run_id}. "v2_run_id": ctx.v2_run_id, + # E1 (#390) -- additive; a string on preservation='keep' runs, + # None otherwise (legacy clients ignore unknown keys). 
+ "workspace_id": ctx.workspace_id, }, ) diff --git a/app/features/demo/schemas.py b/app/features/demo/schemas.py index e7f0aa4b..e02738af 100644 --- a/app/features/demo/schemas.py +++ b/app/features/demo/schemas.py @@ -11,7 +11,7 @@ from datetime import UTC, datetime from typing import Any, Literal -from pydantic import BaseModel, ConfigDict, Field +from pydantic import BaseModel, ConfigDict, Field, model_validator from app.shared.seeder.config import ScenarioPreset @@ -29,10 +29,12 @@ def _utc_now() -> datetime: class DemoRunRequest(BaseModel): """Request body for ``POST /demo/run`` and the ``WS /demo/stream`` start frame. - Every field is JSON-native (``int`` / ``bool``), so ``ConfigDict(strict=True)`` - is safe with no ``Field(strict=False)`` override -- there is no - ``date`` / ``datetime`` / ``UUID`` / ``Decimal`` field (see - ``.claude/rules/security-patterns.md`` and ``test_strict_mode_policy.py``). + Every field is JSON-native (``int`` / ``bool`` / ``str`` / ``Literal``), so + ``ConfigDict(strict=True)`` is safe with no ``Field(strict=False)`` + override -- there is no ``date`` / ``datetime`` / ``UUID`` / ``Decimal`` + field (see ``.claude/rules/security-patterns.md`` and + ``test_strict_mode_policy.py``). The sole exception is ``scenario``, whose + enum-on-the-wire form carries its own override (PRP-38). """ model_config = ConfigDict(strict=True) @@ -59,6 +61,28 @@ class DemoRunRequest(BaseModel): strict=False, description="Seeder scenario preset that drives the pipeline shape.", ) + # E1 (#390): preservation policy. Default "ephemeral" keeps legacy + # behaviour byte-identical (no workspace row). Both new fields are + # JSON-native (Literal[str] / str), so the model-level ``strict=True`` + # needs no per-field override. 
+ preservation: Literal["ephemeral", "keep"] = Field( + default="ephemeral", + description="'keep' records this run as a showcase_workspace row.", + ) + workspace_name: str | None = Field( + default=None, + max_length=100, + # Same pattern as the registry alias_name (registry/schemas.py). + pattern=r"^[a-z0-9][a-z0-9\-_]*$", + description="Optional workspace label; requires preservation='keep'.", + ) + + @model_validator(mode="after") + def _workspace_name_requires_keep(self) -> DemoRunRequest: + """Reject a workspace_name on a run that does not keep a workspace.""" + if self.workspace_name is not None and self.preservation != "keep": + raise ValueError("workspace_name requires preservation='keep'") + return self class StepEvent(BaseModel): @@ -134,3 +158,9 @@ class DemoRunResult(BaseModel): default=0.0, description="Total pipeline wall-clock in seconds.", ) + # E1 (#390): additive Optional field mirroring ``winning_run_id`` -- + # ``None`` on ephemeral runs, the workspace_id on preservation='keep' runs. + workspace_id: str | None = Field( + default=None, + description="showcase_workspace id recorded for this run, if kept.", + ) diff --git a/app/features/demo/service.py b/app/features/demo/service.py index cc3dd8a6..514d3b3f 100644 --- a/app/features/demo/service.py +++ b/app/features/demo/service.py @@ -77,4 +77,6 @@ async def run_pipeline_sync(app: FastAPI, req: DemoRunRequest) -> DemoRunResult: winning_run_id=final.data.get("winning_run_id"), alias=final.data.get("alias"), wall_clock_s=float(wall_clock) if isinstance(wall_clock, (int, float)) else 0.0, + # E1 (#390) -- additive; mirrors the winning_run_id passthrough. 
+        workspace_id=final.data.get("workspace_id"),
     )
diff --git a/app/features/demo/tests/conftest.py b/app/features/demo/tests/conftest.py
index c4653ff7..607eb163 100644
--- a/app/features/demo/tests/conftest.py
+++ b/app/features/demo/tests/conftest.py
@@ -4,7 +4,11 @@

 import pytest
 from httpx import ASGITransport, AsyncClient
+from sqlalchemy import delete
+from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine

+from app.core.config import get_settings
+from app.features.demo.models import ShowcaseWorkspace
 from app.main import app


@@ -20,3 +24,27 @@ async def client() -> AsyncGenerator[AsyncClient, None]:
         base_url="http://demo-test",
     ) as ac:
         yield ac
+
+
+@pytest.fixture
+async def db_session() -> AsyncGenerator[AsyncSession, None]:
+    """Yield an async session; wipe every showcase_workspace row on teardown.
+
+    E1 (#390) integration fixture (pattern:
+    ``app/features/scenarios/tests/conftest.py``). ``showcase_workspace`` is a
+    slice-private table -- no seeder or other slice writes it -- so the
+    teardown safely wipes it whole rather than relying on a row marker.
+    """
+    settings = get_settings()
+    engine = create_async_engine(settings.database_url, echo=False)
+    session_maker = async_sessionmaker(engine, class_=AsyncSession, expire_on_commit=False)
+
+    async with session_maker() as session:
+        try:
+            yield session
+        finally:
+            await session.rollback()
+            await session.execute(delete(ShowcaseWorkspace))
+            await session.commit()
+
+    await engine.dispose()
diff --git a/app/features/demo/tests/test_models.py b/app/features/demo/tests/test_models.py
new file mode 100644
index 00000000..91c9d0e5
--- /dev/null
+++ b/app/features/demo/tests/test_models.py
@@ -0,0 +1,111 @@
+"""Integration tests for the ShowcaseWorkspace ORM model (E1, #390).
+
+Run against the real docker-compose Postgres (``docker compose up -d`` +
+``uv run alembic upgrade head`` required). Constraint tests assert the
+DB-level guarantees the migration created (unique workspace_id, status CHECK).
+"""
+
+from __future__ import annotations
+
+import uuid
+from datetime import date
+
+import pytest
+from sqlalchemy.exc import IntegrityError
+from sqlalchemy.ext.asyncio import AsyncSession
+
+from app.features.demo.models import (
+    WORKSPACE_STATUS_COMPLETED,
+    WORKSPACE_STATUS_RUNNING,
+    ShowcaseWorkspace,
+)
+from app.features.demo.workspace import get_workspace
+
+pytestmark = pytest.mark.integration
+
+
+def _make_row(**overrides: object) -> ShowcaseWorkspace:
+    """Build a valid ShowcaseWorkspace row; keyword overrides win."""
+    values: dict[str, object] = {
+        "workspace_id": uuid.uuid4().hex,
+        "name": "it-row",
+        "seed": 42,
+        "scenario": "demo_minimal",
+        "reset": False,
+        "skip_seed": True,
+    }
+    values.update(overrides)
+    return ShowcaseWorkspace(**values)
+
+
+async def test_showcase_workspace_crud_roundtrip(db_session: AsyncSession) -> None:
+    """Insert a full row incl. JSONB payloads and read it back intact."""
+    created = {
+        "winning_run_id": "run-abc123",
+        "scenario_plan_ids": ["scn-1", "scn-2"],
+    }
+    summary = {"winner_model_type": "seasonal_naive", "winner_wape": 0.15}
+    row = _make_row(
+        status=WORKSPACE_STATUS_COMPLETED,
+        store_id=7,
+        product_id=3,
+        date_start=date(2026, 1, 1),
+        date_end=date(2026, 3, 31),
+        created_objects=created,
+        result_summary=summary,
+    )
+    db_session.add(row)
+    await db_session.commit()
+
+    loaded = await get_workspace(db_session, row.workspace_id)
+    assert loaded is not None
+    assert loaded.status == WORKSPACE_STATUS_COMPLETED
+    assert loaded.name == "it-row"
+    assert loaded.seed == 42
+    assert loaded.scenario == "demo_minimal"
+    assert loaded.store_id == 7
+    assert loaded.product_id == 3
+    assert loaded.date_start == date(2026, 1, 1)
+    assert loaded.date_end == date(2026, 3, 31)
+    assert loaded.created_objects == created
+    assert loaded.result_summary == summary
+    assert loaded.created_at is not None
+    assert loaded.updated_at is not None
+
+
+async def test_showcase_workspace_defaults_applied(db_session: AsyncSession) -> None:
+    """A minimal insert gets running status + empty created_objects."""
+    row = _make_row(name=None)
+    db_session.add(row)
+    await db_session.commit()
+
+    loaded = await get_workspace(db_session, row.workspace_id)
+    assert loaded is not None
+    assert loaded.status == WORKSPACE_STATUS_RUNNING
+    assert loaded.name is None
+    assert loaded.created_objects == {}
+    assert loaded.result_summary is None
+    assert loaded.store_id is None
+    assert loaded.product_id is None
+
+
+async def test_showcase_workspace_duplicate_workspace_id_rejected(
+    db_session: AsyncSession,
+) -> None:
+    """The unique index on workspace_id rejects a duplicate insert."""
+    workspace_id = uuid.uuid4().hex
+    db_session.add(_make_row(workspace_id=workspace_id))
+    await db_session.commit()
+
+    db_session.add(_make_row(workspace_id=workspace_id))
+    with pytest.raises(IntegrityError):
+        await db_session.commit()
+    await db_session.rollback()
+
+
+async def test_showcase_workspace_status_check_violation(db_session: AsyncSession) -> None:
+    """The status CHECK constraint rejects values outside the state set."""
+    db_session.add(_make_row(status="archived"))
+    with pytest.raises(IntegrityError):
+        await db_session.commit()
+    await db_session.rollback()
diff --git a/app/features/demo/tests/test_pipeline.py b/app/features/demo/tests/test_pipeline.py
index 5f73a8c8..b9b37f07 100644
--- a/app/features/demo/tests/test_pipeline.py
+++ b/app/features/demo/tests/test_pipeline.py
@@ -2110,3 +2110,152 @@ async def request(self, step: str, method: str, path: str, **_kw: Any) -> dict[s
                 "total_aliases": 0,
                 "degrading_health_count": 0,
             }
+
+
+# =============================================================================
+# E1 (#390) -- workspace persistence hooks
+# =============================================================================
+
+
+class _WorkspaceSpy:
+    """Recording stand-in for the workspace module's create/finalize hooks."""
+
+    def __init__(self, create_returns: str | None = "ws-e1-test") -> None:
+        self.create_calls: list[Any] = []
+        self.finalize_calls: list[dict[str, Any]] = []
+        self._create_returns = create_returns
+
+    async def create_workspace(self, req: Any) -> str | None:
+        self.create_calls.append(req)
+        return self._create_returns
+
+    async def finalize_workspace(
+        self,
+        workspace_id: str,
+        ctx: Any,
+        *,
+        failed: bool,
+        wall_clock_s: float | None = None,
+    ) -> None:
+        self.finalize_calls.append(
+            {"workspace_id": workspace_id, "failed": failed, "wall_clock_s": wall_clock_s}
+        )
+
+
+async def test_run_pipeline_keep_creates_and_finalizes_workspace(monkeypatch, tmp_path):
+    """E1 (#390) -- keep run: create before steps, finalize before the yield."""
+    artifact = tmp_path / "m.joblib"
+    artifact.write_bytes(b"x")
+    monkeypatch.setattr(pipeline, "get_settings", lambda: _fake_settings(str(tmp_path / "reg")))
+    wapes = {"naive": 0.3, "seasonal_naive": 0.1, "moving_average": 0.2}
+    monkeypatch.setattr(pipeline, "_Client", _build_fake_client(str(artifact), wapes))
+    spy = _WorkspaceSpy()
+    monkeypatch.setattr(pipeline, "workspace", spy)
+
+    req = DemoRunRequest.model_validate({"preservation": "keep", "workspace_name": "e1-test"})
+    events = [e async for e in pipeline.run_pipeline(app=_FAKE_APP, req=req)]
+
+    assert len(spy.create_calls) == 1
+    assert spy.create_calls[0] is req
+    assert len(spy.finalize_calls) == 1
+    assert spy.finalize_calls[0]["workspace_id"] == "ws-e1-test"
+    assert spy.finalize_calls[0]["failed"] is False
+    assert spy.finalize_calls[0]["wall_clock_s"] is not None
+
+    final = events[-1]
+    assert final.event_type == "pipeline_complete"
+    assert final.status == "pass"
+    assert final.data["workspace_id"] == "ws-e1-test"
+
+
+async def test_run_pipeline_ephemeral_touches_no_workspace(monkeypatch, tmp_path):
+    """E1 (#390) -- default (ephemeral) run issues zero workspace calls."""
+    artifact = tmp_path / "m.joblib"
+    artifact.write_bytes(b"x")
+    monkeypatch.setattr(pipeline, "get_settings", lambda: _fake_settings(str(tmp_path / "reg")))
+    wapes = {"naive": 0.3, "seasonal_naive": 0.1, "moving_average": 0.2}
+    monkeypatch.setattr(pipeline, "_Client", _build_fake_client(str(artifact), wapes))
+    spy = _WorkspaceSpy()
+    monkeypatch.setattr(pipeline, "workspace", spy)
+
+    events = [e async for e in pipeline.run_pipeline(app=_FAKE_APP, req=DemoRunRequest())]
+
+    assert spy.create_calls == []
+    assert spy.finalize_calls == []
+    final = events[-1]
+    assert final.event_type == "pipeline_complete"
+    # The key is additive and present, with a null value on ephemeral runs.
+    assert "workspace_id" in final.data
+    assert final.data["workspace_id"] is None
+
+
+async def test_run_pipeline_workspace_create_failure_does_not_break_run(monkeypatch, tmp_path):
+    """E1 (#390) -- create_workspace's warn path (None) leaves the run green."""
+    artifact = tmp_path / "m.joblib"
+    artifact.write_bytes(b"x")
+    monkeypatch.setattr(pipeline, "get_settings", lambda: _fake_settings(str(tmp_path / "reg")))
+    wapes = {"naive": 0.3, "seasonal_naive": 0.1, "moving_average": 0.2}
+    monkeypatch.setattr(pipeline, "_Client", _build_fake_client(str(artifact), wapes))
+    spy = _WorkspaceSpy(create_returns=None)
+    monkeypatch.setattr(pipeline, "workspace", spy)
+
+    req = DemoRunRequest.model_validate({"preservation": "keep"})
+    events = [e async for e in pipeline.run_pipeline(app=_FAKE_APP, req=req)]
+
+    assert len(spy.create_calls) == 1
+    # No row was created, so there is nothing to finalize.
+    assert spy.finalize_calls == []
+    final = events[-1]
+    assert final.event_type == "pipeline_complete"
+    assert final.status == "pass"
+    assert final.data["workspace_id"] is None
+
+
+async def test_run_pipeline_keep_finalizes_failed_on_step_failure(monkeypatch):
+    """E1 (#390) -- a mid-run step failure still finalizes, with failed=True."""
+
+    class _FailingClient:
+        def __init__(self, _app: Any, *, event_sink: list[Any] | None = None) -> None:
+            self._event_sink = event_sink
+
+        async def __aenter__(self) -> _FailingClient:
+            return self
+
+        async def __aexit__(self, *_exc: object) -> None:
+            return None
+
+        def yield_event(self, event: Any) -> None:
+            if self._event_sink is None:
+                return
+            self._event_sink.append(event)
+
+        async def request(
+            self,
+            step: str,
+            method: str,
+            path: str,
+            *,
+            json_body: dict[str, Any] | None = None,
+        ) -> dict[str, Any]:
+            if path == "/health":
+                return {"status": "ok"}
+            if path == "/seeder/status":
+                raise pipeline._StepError(
+                    "status", 500, {"title": "Database Error", "detail": "db down"}
+                )
+            raise AssertionError(f"unexpected request after failure: {path}")
+
+    monkeypatch.setattr(pipeline, "_Client", _FailingClient)
+    spy = _WorkspaceSpy()
+    monkeypatch.setattr(pipeline, "workspace", spy)
+
+    req = DemoRunRequest.model_validate({"preservation": "keep"})
+    events = [e async for e in pipeline.run_pipeline(app=_FAKE_APP, req=req)]
+
+    assert len(spy.finalize_calls) == 1
+    assert spy.finalize_calls[0]["workspace_id"] == "ws-e1-test"
+    assert spy.finalize_calls[0]["failed"] is True
+    final = events[-1]
+    assert final.event_type == "pipeline_complete"
+    assert final.status == "fail"
+    assert final.data["workspace_id"] == "ws-e1-test"
diff --git a/app/features/demo/tests/test_routes.py b/app/features/demo/tests/test_routes.py
index caad2d64..5158d1ca 100644
--- a/app/features/demo/tests/test_routes.py
+++ b/app/features/demo/tests/test_routes.py
@@ -112,3 +112,90 @@ async def fake_stream(_app, _params: DemoRunRequest) -> AsyncIterator[StepEvent]
         event = ws.receive_json()
         assert event["event_type"] == "error"
         assert "in progress" in event["detail"]
+
+
+# =============================================================================
+# E1 (#390) -- preservation / workspace_name passthrough
+# =============================================================================
+
+
+async def test_run_demo_accepts_preservation_fields(
+    client, monkeypatch, canned_result: DemoRunResult
+):
+    """E1 (#390) -- the new optional fields validate and reach the service."""
+    seen: dict[str, DemoRunRequest] = {}
+
+    async def fake_run_sync(_app, params: DemoRunRequest) -> DemoRunResult:
+        seen["params"] = params
+        return canned_result
+
+    monkeypatch.setattr(service, "run_pipeline_sync", fake_run_sync)
+
+    resp = await client.post(
+        "/demo/run",
+        json={"skip_seed": True, "preservation": "keep", "workspace_name": "e1-route"},
+    )
+    assert resp.status_code == 200
+    assert seen["params"].preservation == "keep"
+    assert seen["params"].workspace_name == "e1-route"
+    # The additive DemoRunResult field rides on the response (None here --
+    # the canned result doesn't set it).
+    assert resp.json()["workspace_id"] is None
+
+
+async def test_run_demo_rejects_name_without_keep_422(client):
+    """E1 (#390) -- workspace_name without preservation='keep' is a 422."""
+    resp = await client.post("/demo/run", json={"workspace_name": "bad"})
+    assert resp.status_code == 422
+    assert resp.headers["content-type"].startswith("application/problem+json")
+
+
+def test_demo_stream_websocket_accepts_preservation_fields(monkeypatch):
+    """E1 (#390) -- the WS start frame accepts the new fields end-to-end."""
+    seen: dict[str, DemoRunRequest] = {}
+
+    async def fake_stream(_app, params: DemoRunRequest) -> AsyncIterator[StepEvent]:
+        seen["params"] = params
+        yield StepEvent(
+            event_type="pipeline_complete",
+            step_name="summary",
+            step_index=11,
+            total_steps=11,
+            status="pass",
+            data={"workspace_id": "ws-route-test"},
+        )
+
+    monkeypatch.setattr(service, "stream_pipeline", fake_stream)
+
+    with TestClient(app).websocket_connect("/demo/stream") as ws:
+        ws.send_json({"preservation": "keep", "workspace_name": "ws-frame"})
+        event = ws.receive_json()
+        assert event["event_type"] == "pipeline_complete"
+        assert event["data"]["workspace_id"] == "ws-route-test"
+        assert seen["params"].preservation == "keep"
+        assert seen["params"].workspace_name == "ws-frame"
+
+
+def test_demo_stream_websocket_legacy_frame_ignores_unknown_keys(monkeypatch):
+    """E1 (#390) -- unknown start-frame keys stay ignored (the WS forward/
+    backward compatibility contract; no extra='forbid')."""
+    seen: dict[str, DemoRunRequest] = {}
+
+    async def fake_stream(_app, params: DemoRunRequest) -> AsyncIterator[StepEvent]:
+        seen["params"] = params
+        yield StepEvent(
+            event_type="pipeline_complete",
+            step_name="summary",
+            step_index=11,
+            total_steps=11,
+            status="pass",
+        )
+
+    monkeypatch.setattr(service, "stream_pipeline", fake_stream)
+
+    with TestClient(app).websocket_connect("/demo/stream") as ws:
+        ws.send_json({"seed": 7, "future_key_from_a_newer_client": True})
+        event = ws.receive_json()
+        assert event["event_type"] == "pipeline_complete"
+        assert seen["params"].seed == 7
+        assert seen["params"].preservation == "ephemeral"
diff --git a/app/features/demo/tests/test_schemas.py b/app/features/demo/tests/test_schemas.py
index ed390ee7..bdbfaac3 100644
--- a/app/features/demo/tests/test_schemas.py
+++ b/app/features/demo/tests/test_schemas.py
@@ -47,6 +47,51 @@ def test_demo_run_request_scenario_rejects_unknown():
         DemoRunRequest.model_validate({"scenario": "not_a_preset"})


+def test_demo_run_request_new_field_defaults():
+    """E1 (#390) -- defaults preserve legacy behaviour (ephemeral, unnamed)."""
+    req = DemoRunRequest()
+    assert req.preservation == "ephemeral"
+    assert req.workspace_name is None
+
+
+def test_demo_run_request_json_path_keep_with_name():
+    """E1 (#390) -- the JSON wire form (validate_python on a parsed dict, the
+    path FastAPI uses) accepts keep + a named workspace."""
+    req = DemoRunRequest.model_validate({"preservation": "keep", "workspace_name": "bf-demo"})
+    assert req.preservation == "keep"
+    assert req.workspace_name == "bf-demo"
+
+
+def test_demo_run_request_legacy_frame_still_validates():
+    """E1 (#390) -- a legacy start frame without the new keys still validates."""
+    req = DemoRunRequest.model_validate({"seed": 7})
+    assert req.seed == 7
+    assert req.preservation == "ephemeral"
+    assert req.workspace_name is None
+
+
+def test_demo_run_request_workspace_name_requires_keep():
+    """E1 (#390) -- workspace_name without preservation='keep' is rejected."""
+    with pytest.raises(ValidationError):
+        DemoRunRequest.model_validate({"workspace_name": "x"})
+    with pytest.raises(ValidationError):
+        DemoRunRequest.model_validate({"preservation": "ephemeral", "workspace_name": "x"})
+
+
+def test_demo_run_request_workspace_name_pattern_rejected():
+    """E1 (#390) -- names violating the alias-style pattern are rejected."""
+    with pytest.raises(ValidationError):
+        DemoRunRequest.model_validate({"preservation": "keep", "workspace_name": "Black Friday!"})
+    with pytest.raises(ValidationError):
+        DemoRunRequest.model_validate({"preservation": "keep", "workspace_name": "-leading-dash"})
+
+
+def test_demo_run_request_rejects_unknown_preservation():
+    """E1 (#390) -- preservation is a closed Literal; unknown values 422."""
+    with pytest.raises(ValidationError):
+        DemoRunRequest.model_validate({"preservation": "archive"})
+
+
 def test_step_event_json_round_trip():
     event = StepEvent(
         event_type="step_complete",
@@ -134,3 +179,5 @@ def test_demo_run_result_defaults():
     assert result.winner_model_type is None
     assert result.winner_wape is None
     assert result.wall_clock_s == 0.0
+    # E1 (#390) -- additive Optional field defaults to None (ephemeral runs).
+    assert result.workspace_id is None
diff --git a/app/features/demo/tests/test_workspace.py b/app/features/demo/tests/test_workspace.py
new file mode 100644
index 00000000..110254c4
--- /dev/null
+++ b/app/features/demo/tests/test_workspace.py
@@ -0,0 +1,166 @@
+"""Integration tests for the workspace persistence helpers (E1, #390).
+
+``create_workspace`` / ``finalize_workspace`` open their OWN sessions via
+``get_session_maker()`` -- these tests exercise that real write path against
+the docker-compose Postgres; the ``db_session`` fixture is used only to read
+back and to wipe rows on teardown.
+"""
+
+from __future__ import annotations
+
+from datetime import date
+
+import pytest
+from sqlalchemy.ext.asyncio import AsyncSession
+
+from app.features.demo import workspace
+from app.features.demo.models import (
+    WORKSPACE_STATUS_COMPLETED,
+    WORKSPACE_STATUS_FAILED,
+    WORKSPACE_STATUS_RUNNING,
+)
+from app.features.demo.pipeline import DemoContext
+from app.features.demo.schemas import DemoRunRequest
+from app.shared.seeder.config import ScenarioPreset
+
+pytestmark = pytest.mark.integration
+
+
+def _keep_request(**overrides: object) -> DemoRunRequest:
+    """Build a preservation='keep' request; keyword overrides win."""
+    payload: dict[str, object] = {
+        "seed": 7,
+        "reset": False,
+        "skip_seed": True,
+        "preservation": "keep",
+        "workspace_name": "it-keep",
+    }
+    payload.update(overrides)
+    return DemoRunRequest.model_validate(payload)
+
+
+def _finished_ctx() -> DemoContext:
+    """Build a DemoContext as a green showcase run would leave it."""
+    ctx = DemoContext(
+        seed=7,
+        skip_seed=True,
+        reset=False,
+        scenario=ScenarioPreset.DEMO_MINIMAL,
+    )
+    ctx.store_id = 7
+    ctx.product_id = 3
+    ctx.date_start = date(2026, 1, 1)
+    ctx.date_end = date(2026, 3, 31)
+    ctx.winner_model_type = "seasonal_naive"
+    ctx.winner_wape = 0.15
+    ctx.winning_run_id = "run-abc123def456"
+    ctx.train_results = {"naive": {}, "seasonal_naive": {}, "moving_average": {}}
+    ctx.session_id = "sess-0123abcd"
+    return ctx
+
+
+async def test_create_workspace_persists_config(db_session: AsyncSession) -> None:
+    """create_workspace inserts a running row carrying the request config."""
+    workspace_id = await workspace.create_workspace(_keep_request())
+    assert workspace_id is not None
+
+    row = await workspace.get_workspace(db_session, workspace_id)
+    assert row is not None
+    assert row.status == WORKSPACE_STATUS_RUNNING
+    assert row.name == "it-keep"
+    assert row.seed == 7
+    assert row.scenario == "demo_minimal"
+    assert row.reset is False
+    assert row.skip_seed is True
+    assert row.created_objects == {}
+    assert row.result_summary is None
+
+
+async def test_finalize_workspace_completed(db_session: AsyncSession) -> None:
+    """finalize(failed=False) settles to completed with collected ids."""
+    workspace_id = await workspace.create_workspace(_keep_request())
+    assert workspace_id is not None
+
+    await workspace.finalize_workspace(
+        workspace_id, _finished_ctx(), failed=False, wall_clock_s=12.5
+    )
+
+    row = await workspace.get_workspace(db_session, workspace_id)
+    assert row is not None
+    assert row.status == WORKSPACE_STATUS_COMPLETED
+    assert row.store_id == 7
+    assert row.product_id == 3
+    assert row.date_start == date(2026, 1, 1)
+    assert row.date_end == date(2026, 3, 31)
+    assert row.created_objects["winning_run_id"] == "run-abc123def456"
+    assert row.created_objects["alias"] == "demo-production"
+    assert row.created_objects["agent_session_id"] == "sess-0123abcd"
+    assert row.created_objects["train_model_types"] == [
+        "moving_average",
+        "naive",
+        "seasonal_naive",
+    ]
+    # None-valued accumulators are dropped, not stored as nulls.
+    assert "v2_run_id" not in row.created_objects
+    assert "batch_id" not in row.created_objects
+    assert row.result_summary == {
+        "winner_model_type": "seasonal_naive",
+        "winner_wape": 0.15,
+        "wall_clock_s": 12.5,
+    }
+
+
+async def test_finalize_workspace_failed(db_session: AsyncSession) -> None:
+    """finalize(failed=True) settles to failed, still recording partial ids."""
+    workspace_id = await workspace.create_workspace(_keep_request(workspace_name="it-fail"))
+    assert workspace_id is not None
+
+    ctx = _finished_ctx()
+    ctx.winning_run_id = None  # run died before register
+    ctx.winner_model_type = None
+    ctx.winner_wape = None
+    await workspace.finalize_workspace(workspace_id, ctx, failed=True, wall_clock_s=3.0)
+
+    row = await workspace.get_workspace(db_session, workspace_id)
+    assert row is not None
+    assert row.status == WORKSPACE_STATUS_FAILED
+    assert "winning_run_id" not in row.created_objects
+    assert "alias" not in row.created_objects
+    # Partial state still recorded -- the agent session + trained models.
+    assert row.created_objects["agent_session_id"] == "sess-0123abcd"
+    assert row.created_objects["train_model_types"] == [
+        "moving_average",
+        "naive",
+        "seasonal_naive",
+    ]
+
+
+async def test_finalize_workspace_missing_id_is_noop(db_session: AsyncSession) -> None:
+    """Finalizing an unknown workspace_id neither raises nor inserts."""
+    await workspace.finalize_workspace(
+        "deadbeef" * 4, _finished_ctx(), failed=False, wall_clock_s=1.0
+    )
+    rows = await workspace.list_workspaces(db_session)
+    assert rows == []
+
+
+async def test_list_workspaces_newest_first_limit_offset(db_session: AsyncSession) -> None:
+    """list_workspaces orders newest first and honours limit/offset."""
+    ids: list[str] = []
+    for index in range(3):
+        workspace_id = await workspace.create_workspace(
+            _keep_request(workspace_name=f"it-list-{index}")
+        )
+        assert workspace_id is not None
+        ids.append(workspace_id)
+
+    rows = await workspace.list_workspaces(db_session)
+    assert [r.workspace_id for r in rows] == list(reversed(ids))
+
+    page = await workspace.list_workspaces(db_session, limit=1, offset=1)
+    assert [r.workspace_id for r in page] == [ids[1]]
+
+
+async def test_get_workspace_missing_returns_none(db_session: AsyncSession) -> None:
+    """get_workspace returns None for an unknown id."""
+    assert await workspace.get_workspace(db_session, "0" * 32) is None
diff --git a/app/features/demo/workspace.py b/app/features/demo/workspace.py
new file mode 100644
index 00000000..44e8b475
--- /dev/null
+++ b/app/features/demo/workspace.py
@@ -0,0 +1,195 @@
+"""Showcase workspace persistence helpers (E1, issue #390).
+
+Create/finalize the ``showcase_workspace`` row a ``preservation="keep"`` demo
+run records itself into. The write helpers open their OWN sessions via
+``app.core.database.get_session_maker()`` -- ``run_pipeline`` is not
+request-scoped, so no FastAPI dependency is available (precedent: the lifespan
+config-override load in ``app/main.py`` and the agents websocket per-message
+sessions).
+
+CONTRACT -- warn-and-continue: a workspace DB failure must NEVER break the
+demo pipeline. :func:`create_workspace` returns ``None`` on any error;
+:func:`finalize_workspace` swallows any error. Both log a structured warning
+(pattern: the ``app/main.py`` lifespan config-override load).
+
+:func:`get_workspace` / :func:`list_workspaces` are unrouted in E1 -- consumed
+by the integration tests now and by the E4 restore/replay routes later
+(epic #393).
+"""
+
+from __future__ import annotations
+
+import uuid
+from typing import TYPE_CHECKING, Any
+
+from sqlalchemy import select
+from sqlalchemy.ext.asyncio import AsyncSession
+
+from app.core.database import get_session_maker
+from app.core.logging import get_logger
+from app.features.demo.models import (
+    WORKSPACE_STATUS_COMPLETED,
+    WORKSPACE_STATUS_FAILED,
+    ShowcaseWorkspace,
+)
+from app.features.demo.schemas import DemoRunRequest
+
+if TYPE_CHECKING:
+    # NOTE: pipeline imports this module at runtime; importing DemoContext
+    # eagerly here would close an import cycle. The type-only import is safe.
+    from app.features.demo.pipeline import DemoContext
+
+logger = get_logger(__name__)
+
+
+async def create_workspace(req: DemoRunRequest) -> str | None:
+    """Insert a ``running`` workspace row for a ``preservation="keep"`` run.
+
+    Args:
+        req: The validated demo run request (config recorded verbatim).
+
+    Returns:
+        The new row's ``workspace_id``, or ``None`` when the insert failed
+        (warn-and-continue -- the pipeline proceeds without a workspace).
+    """
+    workspace_id = uuid.uuid4().hex
+    try:
+        session_maker = get_session_maker()
+        async with session_maker() as db:
+            db.add(
+                ShowcaseWorkspace(
+                    workspace_id=workspace_id,
+                    name=req.workspace_name,
+                    seed=req.seed,
+                    scenario=req.scenario.value,
+                    reset=req.reset,
+                    skip_seed=req.skip_seed,
+                )
+            )
+            await db.commit()
+    except Exception as exc:  # workspace must never break the demo
+        logger.warning(
+            "demo.workspace_create_failed",
+            error=str(exc),
+            error_type=type(exc).__name__,
+        )
+        return None
+    logger.info("demo.workspace_created", workspace_id=workspace_id, name=req.workspace_name)
+    return workspace_id
+
+
+def _collect_created_objects(ctx: DemoContext) -> dict[str, Any]:
+    """Map ``DemoContext`` accumulator fields to the ``created_objects`` JSONB.
+
+    Every value is already a plain ``str`` / ``None`` on ``ctx`` (HTTP response
+    payloads). ``None`` and empty values are dropped so the JSONB stays sparse
+    and greppable.
+    """
+    raw: dict[str, Any] = {
+        "winning_run_id": ctx.winning_run_id,
+        "v2_run_id": ctx.v2_run_id,
+        "v2_model_path": ctx.v2_model_path,
+        # Literal mirrors ``pipeline.DEMO_ALIAS`` -- importing pipeline here
+        # would close an import cycle (pipeline imports this module).
+        "alias": "demo-production" if ctx.winning_run_id else None,
+        "agent_session_id": ctx.session_id,
+        "batch_id": ctx.batch_id,
+        "scenario_plan_ids": [s for s in (ctx.price_cut_scenario_id, ctx.holiday_scenario_id) if s],
+        "scenario_artifact_key": ctx.scenario_artifact_key,
+        "train_model_types": sorted(ctx.train_results),
+        "stale_alias_run_id": ctx.stale_alias_run_id,
+    }
+    return {key: value for key, value in raw.items() if value not in (None, [])}
+
+
+async def finalize_workspace(
+    workspace_id: str,
+    ctx: DemoContext,
+    *,
+    failed: bool,
+    wall_clock_s: float | None = None,
+) -> None:
+    """Settle a workspace row to ``completed`` / ``failed`` with collected ids.
+
+    Called by ``run_pipeline`` BEFORE the final ``pipeline_complete`` yield --
+    including the mid-run-failure path, so a partial run still records what it
+    created. Finalizing a missing ``workspace_id`` (its create failed earlier)
+    is a silent no-op.
+
+    Args:
+        workspace_id: The row to finalize (from :func:`create_workspace`).
+        ctx: The pipeline's cross-step accumulator.
+        failed: Whether any step failed.
+        wall_clock_s: Total pipeline wall-clock, recorded in ``result_summary``.
+    """
+    try:
+        session_maker = get_session_maker()
+        async with session_maker() as db:
+            result = await db.execute(
+                select(ShowcaseWorkspace).where(ShowcaseWorkspace.workspace_id == workspace_id)
+            )
+            row = result.scalar_one_or_none()
+            if row is None:  # create failed earlier -- nothing to finalize
+                return
+            row.status = WORKSPACE_STATUS_FAILED if failed else WORKSPACE_STATUS_COMPLETED
+            row.store_id = ctx.store_id
+            row.product_id = ctx.product_id
+            row.date_start = ctx.date_start
+            row.date_end = ctx.date_end
+            row.created_objects = _collect_created_objects(ctx)
+            row.result_summary = {
+                "winner_model_type": ctx.winner_model_type,
+                "winner_wape": ctx.winner_wape,
+                "wall_clock_s": wall_clock_s,
+            }
+            await db.commit()
+    except Exception as exc:  # workspace must never break the demo
+        logger.warning(
+            "demo.workspace_finalize_failed",
+            workspace_id=workspace_id,
+            error=str(exc),
+            error_type=type(exc).__name__,
+        )
+        return
+    logger.info("demo.workspace_finalized", workspace_id=workspace_id, failed=failed)
+
+
+async def get_workspace(db: AsyncSession, workspace_id: str) -> ShowcaseWorkspace | None:
+    """Load a workspace row by its external id.
+
+    Args:
+        db: An open async session (caller-owned).
+        workspace_id: The external id to look up.
+
+    Returns:
+        The row, or ``None`` when missing.
+    """
+    result = await db.execute(
+        select(ShowcaseWorkspace).where(ShowcaseWorkspace.workspace_id == workspace_id)
+    )
+    return result.scalar_one_or_none()
+
+
+async def list_workspaces(
+    db: AsyncSession,
+    *,
+    limit: int = 50,
+    offset: int = 0,
+) -> list[ShowcaseWorkspace]:
+    """List workspace rows, newest first (tie-broken by id, descending).
+
+    Args:
+        db: An open async session (caller-owned).
+        limit: Maximum rows to return.
+        offset: Rows to skip from the newest end.
+
+    Returns:
+        The matching rows, newest first.
+    """
+    result = await db.execute(
+        select(ShowcaseWorkspace)
+        .order_by(ShowcaseWorkspace.created_at.desc(), ShowcaseWorkspace.id.desc())
+        .limit(limit)
+        .offset(offset)
+    )
+    return list(result.scalars().all())

From 9a406699c77fee8b55203d6aef23ec23467fbb2d Mon Sep 17 00:00:00 2001
From: Gabor Szabo <shellsnake@icloud.com>
Date: Fri, 12 Jun 2026 14:05:11 +0200
Subject: [PATCH 29/44] docs(api): document preservation and workspace_name fields (#390)

---
 docs/_base/API_CONTRACTS.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/_base/API_CONTRACTS.md b/docs/_base/API_CONTRACTS.md
index 27d75ea1..abcebd1a 100644
--- a/docs/_base/API_CONTRACTS.md
+++ b/docs/_base/API_CONTRACTS.md
@@ -58,7 +58,7 @@ All endpoints serve JSON; error responses use `application/problem+json` (RFC 78
 | agents | WS | `/agents/stream` | Token-by-token streaming + tool-call events |
 | seeder | (see `app/features/seeder/routes.py`) | `/seeder/*` | Trigger scenarios, status, customization |
 | seeder | POST | `/seeder/phase2-enrichment` | PRP-38 — run Phase 2 generators (lifecycle, replenishment, exogenous, returns) against the existing seeded data. `422 application/problem+json` on an empty database. |
-| demo | POST | `/demo/run` | Run the end-to-end demo pipeline in-process; returns a `DemoRunResult`. `409 application/problem+json` if a run is already active. **PRP-38** — body accepts an Optional `scenario: 'demo_minimal' \| 'showcase_rich' \| 'sparse'` field; default `'demo_minimal'` (back-compat). |
+| demo | POST | `/demo/run` | Run the end-to-end demo pipeline in-process; returns a `DemoRunResult`. `409 application/problem+json` if a run is already active. **PRP-38** — body accepts an Optional `scenario: 'demo_minimal' \| 'showcase_rich' \| 'sparse'` field; default `'demo_minimal'` (back-compat). **E1 (#390)** — body accepts additive Optional `preservation: 'ephemeral' \| 'keep'` (default `'ephemeral'`, today's no-row behavior) and `workspace_name: str \| null` (pattern `^[a-z0-9][a-z0-9\-_]*$`, ≤100 chars); `workspace_name` without `preservation='keep'` → `422 application/problem+json`. `preservation='keep'` records the run as a `showcase_workspace` row; `DemoRunResult` gains an additive Optional `workspace_id: str \| null`. |
 | demo | WS | `/demo/stream` | Stream one `StepEvent` per pipeline step for the live Showcase page |
 | config | GET | `/config/ai` | Effective AI-model config (agent LLM + RAG embeddings); API keys masked, never raw |
 | config | PATCH | `/config/ai` | Persist + apply AI-model changes live (no restart). `409` if an embedding-dimension change would orphan indexed RAG chunks (resend with `force=true`) |
@@ -83,12 +83,12 @@ Verified against `app/features/agents/websocket.py` and `app/features/agents/sch

 Drives the end-to-end demo pipeline for the dashboard Showcase page. Verified against `app/features/demo/routes.py` and `app/features/demo/schemas.py` (`StepEvent`).

-- **Client → server (one start frame):** `{"seed": int, "reset": bool, "skip_seed": bool, "scenario"?: "demo_minimal" | "showcase_rich" | "sparse"}` — all fields optional (`DemoRunRequest` supplies defaults `seed=42`, `reset=false`, `skip_seed=true`, `scenario="demo_minimal"`). The pipeline runs once, then the server closes.
+- **Client → server (one start frame):** `{"seed": int, "reset": bool, "skip_seed": bool, "scenario"?: "demo_minimal" | "showcase_rich" | "sparse", "preservation"?: "ephemeral" | "keep", "workspace_name"?: str}` — all fields optional (`DemoRunRequest` supplies defaults `seed=42`, `reset=false`, `skip_seed=true`, `scenario="demo_minimal"`, `preservation="ephemeral"`, `workspace_name=null`). E1 (#390) — `workspace_name` requires `preservation="keep"` (else one `error` event from validation); unknown start-frame keys remain ignored (forward/backward compat). The pipeline runs once, then the server closes. - **Server → client (every frame):** Pydantic-serialized `StepEvent` — `{"event_type", "step_name", "step_index", "total_steps", "status", "detail", "duration_ms", "data", "timestamp", "phase_name"?, "phase_index"?, "phase_total"?}`. PRP-38 — the three `phase_*` fields are Optional + Nullable so legacy clients that don't render phases keep working. - **`event_type` values (Literal in `StepEvent`):** - `step_start` — a step began; `status` is `null`. - `step_complete` — a step finished; `status ∈ {pass, fail, skip, warn}`, `data` carries structured payload (backtest `per_model` WAPE + `winner` + `bucketed_aggregated_metrics` on PRP-36/38 feature-aware runs; register `run_id` + `alias`; PRP-38 `v2_train` → `v2_run_id` + `feature_frame_version` + `feature_columns_count` + `feature_groups` + `artifact_uri_full`). - - `pipeline_complete` — final event; `data` carries `winner_model_type`, `winner_wape`, `winning_run_id`, `alias`, `wall_clock_s`, `v2_run_id` (PRP-38; null when no V2 run was registered). + - `pipeline_complete` — final event; `data` carries `winner_model_type`, `winner_wape`, `winning_run_id`, `alias`, `wall_clock_s`, `v2_run_id` (PRP-38; null when no V2 run was registered), and `workspace_id` (E1 #390; additive — a string on `preservation="keep"` runs, null otherwise). 
- `error` — bad start frame or a concurrent run already in progress; one event, then the server closes. - Concurrency: a module-level `asyncio.Lock` allows one pipeline at a time. A second `POST /demo/run` returns `409`; a second `WS /demo/stream` receives one `error` event. - PRP-38 — `scenario="showcase_rich"` extends the data phase with `phase2_enrichment` + `historical_backfill` steps and the modeling phase with `v2_train` (one V2 `prophet_like` run). Phase ids are `data` / `modeling` / `decision` / `verify` / `agent` / `cleanup` (6 phases). From 85f0151b3313bf6d3d52f1a8f968fe9c5e944607 Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Fri, 12 Jun 2026 14:05:11 +0200 Subject: [PATCH 30/44] docs(repo): track showcase workspace e1 prp (#390) --- ...wcase-workspace-E1-persistence-backbone.md | 666 ++++++++++++++++++ 1 file changed, 666 insertions(+) create mode 100644 PRPs/PRP-showcase-workspace-E1-persistence-backbone.md diff --git a/PRPs/PRP-showcase-workspace-E1-persistence-backbone.md b/PRPs/PRP-showcase-workspace-E1-persistence-backbone.md new file mode 100644 index 00000000..729d6e62 --- /dev/null +++ b/PRPs/PRP-showcase-workspace-E1-persistence-backbone.md @@ -0,0 +1,666 @@ +name: "PRP — Showcase Workspace E1: Persistence Backbone (issue #390)" +description: | + +## Purpose + +Implement the Foundation epic of the showcase-workspace initiative (umbrella #389): +the demo slice gains its first persistence — a `showcase_workspace` table + Alembic +migration + additive Optional `preservation`/`workspace_name` fields on +`DemoRunRequest` + pipeline recording of every created object id into the workspace +row. Blocks epics #391 (presets), #392 (tags), #393 (restore/replay). + +## Core Principles + +1. **Context is King**: every reference below was verified against the live code on 2026-06-12. +2. **Validation Loops**: each level is executable as written. +3. **Information Dense**: patterns cite exact file:line. +4. 
**Progressive Success**: schema fields → model+migration → service → pipeline hook → tests. +5. **Global rules**: follow CLAUDE.md / AGENTS.md; all five CI gates must pass. + +--- + +## Goal + +A demo/showcase run started with `preservation="keep"` creates exactly one +`showcase_workspace` row that records the run configuration (seed, scenario, reset, +skip_seed, name) and — when the pipeline finishes — every object the run created +(winning/V2 registry run ids, alias, scenario plan ids, batch id, agent session id, +artifact paths, store/product grain, date window) plus a result summary. A run +without the new fields behaves **byte-identically to today** (no row, same events, +same responses). Legacy WS/HTTP clients keep working unchanged. + +**Deliverable** (all additive, backend-only — no frontend changes in E1): + +- `app/features/demo/models.py` — new `ShowcaseWorkspace` ORM model (first table owned by the demo slice). +- `alembic/versions/<new>_create_showcase_workspace_table.py` — forward migration + clean downgrade. +- `alembic/env.py` — one added model-registration import. +- `app/features/demo/schemas.py` — `DemoRunRequest` gains `preservation` + `workspace_name`. +- `app/features/demo/workspace.py` — new module: create/finalize (+ get/list helpers for tests and E4). +- `app/features/demo/pipeline.py` — `DemoContext.workspace_id` field + create/finalize hooks in `run_pipeline`; `pipeline_complete.data` gains additive `workspace_id`. +- Tests: schema unit tests, model constraint/CRUD integration tests, workspace-service integration tests, pipeline unit tests, route passthrough tests. +- `docs/_base/API_CONTRACTS.md` — additive contract notes for the two request fields and the `workspace_id` summary key. 
+ +**Success definition**: all Success Criteria below check off, the five CI gates are +green, and a manual `POST /demo/run` with `{"preservation": "keep", "workspace_name": +"e1-smoke"}` against a seeded local stack produces a `completed` workspace row whose +`created_objects` JSONB contains the run's real ids. + +## Why + +- The cleanup step deletes nothing (`app/features/demo/pipeline.py:2045` `step_cleanup` only closes the agent session and restores the `demo-production` alias), so showcase objects already persist — but unlabeled and unfindable. E1 gives that de-facto preservation explicit semantics and discoverability. +- Umbrella #389 decomposition: E1 is the Foundation; #391/#392/#393 all build on the table and the request fields added here. +- The only run memory today is a localStorage FIFO-5 in the frontend (`frontend/src/pages/showcase.tsx:166`) — server-side workspace rows are the prerequisite for restore/replay (E4). + +## What + +### User-visible behavior + +- `POST /demo/run` and the `WS /demo/stream` start frame accept two new **optional** fields: + - `preservation`: `"ephemeral"` (default — today's behavior, no row) or `"keep"` (create + finalize a workspace row). + - `workspace_name`: optional human label, `^[a-z0-9][a-z0-9\-_]*$`, ≤100 chars (same pattern as registry alias names, `app/features/registry/schemas.py:213`). Only allowed with `preservation="keep"` — supplying it with `"ephemeral"` is a 422. +- The final `pipeline_complete` event's `data` dict gains an additive `workspace_id` key (`null` on ephemeral runs). +- No new public endpoints in E1 (list/load is epic #393/E4). `workspace.py` ships `get_workspace`/`list_workspaces` helpers for tests and E4 reuse, unrouted. 
+ +### Technical requirements + +- Workspace row is created (status `running`) before the first step executes and finalized (status `completed`/`failed` + collected ids) before `pipeline_complete` is yielded — including the mid-run-failure path, so a partial run still records what it created. +- Workspace DB writes are **warn-and-continue**: a DB failure must never break the demo pipeline (mirror the lifespan pattern at `app/main.py:62-71`). +- **No ForeignKeys** to `model_run` / `scenario_plan` / `batch_job` / `agent_session` — recorded ids are opaque soft references. A cross-slice FK would couple the demo slice's schema to four other slices and break independent deletion. This is a deliberate design decision; document it in the model docstring. +- The demo slice still never imports another feature slice (`app/features/demo/` imports only `app.core.*`, `app.shared.*`, and stdlib/3rd-party — verified: the pipeline drives everything through ASGITransport). + +### Success Criteria + +- [ ] `DemoRunRequest()` (no args) serializes identically to today's defaults plus `preservation="ephemeral"`, `workspace_name=None`; a start frame without the new keys validates (legacy compatibility). +- [ ] `preservation="keep"` run → exactly one `showcase_workspace` row: status `completed` on a green run, `failed` when a step fails; `created_objects` carries the ids the run produced; `result_summary` carries winner/wape/wall-clock. +- [ ] `preservation="ephemeral"` (or omitted) → zero rows written, zero workspace queries issued. +- [ ] `workspace_name` with `preservation="ephemeral"` → 422 `application/problem+json`. +- [ ] `pipeline_complete.data.workspace_id` present (string on keep runs, `null` otherwise). +- [ ] Migration applies AND downgrades cleanly on a fresh DB; `schema-validation.yml` autogenerate drift check sees the model (env.py import added). +- [ ] `uv run ruff check . && uv run ruff format --check . 
&& uv run mypy app/ && uv run pyright app/ && uv run pytest -v -m "not integration"` all green; integration suite green against docker-compose Postgres. + +## All Needed Context + +### Documentation & References + +```yaml +# MUST READ — codebase patterns (all verified 2026-06-12, branch dev @ 2c71928) + +- file: app/features/demo/schemas.py + why: | + DemoRunRequest lives at lines 29-61. ConfigDict(strict=True) at line 38. + The `scenario` field (line 57) shows the Field(strict=False) override pattern + for enum-on-the-wire; the NEW fields are JSON-native (str/Literal) so they + need NO strict=False. Copy the comment style used for the PRP-38 scenario field. + +- file: app/features/demo/pipeline.py + why: | + DemoContext dataclass at line 212 (add `workspace_id: str | None = None` after + the PRP-41 fields at line 256). Orchestrator run_pipeline at line 2554: ctx is + built at 2582-2587; the _Client context opens at 2595; the fail-path alias + restore at 2661-2668; pipeline_complete is yielded at 2671-2691 — finalize the + workspace BEFORE this yield and add "workspace_id" to its data dict (line 2681). + The orchestrator MUST NEVER raise (contract in docstring, lines 2557-2558). + +- file: app/features/demo/service.py + why: | + Single-flight asyncio.Lock at line 19 — only one pipeline runs at a time, so + workspace-row writes have no concurrency races. run_pipeline_sync (line 46) + builds DemoRunResult from the pipeline_complete event — no change needed there + unless you surface workspace_id on DemoRunResult (optional, recommended: + additive `workspace_id: str | None = None` field mirroring `winning_run_id`). + +- file: app/features/demo/routes.py + why: | + POST /demo/run (line 38) and WS /demo/stream (line 57). The WS start frame is + validated via DemoRunRequest.model_validate(raw) at line 73 — pydantic default + (no extra="forbid") IGNORES unknown keys, so old/new clients interoperate. 
+ Routes have NO DB dependency today and need none — the workspace module opens + its own sessions. + +- file: app/features/batch/models.py + why: | + THE precedent for "a slice owns its own table": Base + TimestampMixin imports + (lines 42-43), Mapped[]/mapped_column patterns, String(32) unique external id + (line 143), JSONB columns (lines 145-146, 159-160), CheckConstraint + + composite Index in __table_args__ (lines 166-180). Mirror this file's shape. + +- file: app/features/scenarios/models.py + why: | + Second precedent: JSONB with server_default text("'[]'::jsonb") (lines 74-76), + CHECK constraint naming convention ck_<table>_<col> (lines 102-115). + GOTCHA in its docstring: SQLAlchemy reserves attribute name `metadata` — + never name a column/attribute that. + +- file: app/shared/models.py + why: TimestampMixin (created_at/updated_at, server_default=func.now()) — use it. + +- file: alembic/env.py + why: | + Lines 15-24: every slice with models registers via + `from app.features.<slice> import models as <slice>_models # noqa: F401`. + ADD `from app.features.demo import models as demo_models # noqa: F401` + in alphabetical position (after data_platform, before explainability). + +- file: alembic/versions/e4f5a6b7c8d9_add_model_selection_decision_promotion.py + why: | + CURRENT HEAD revision is e4f5a6b7c8d9 (verified `uv run alembic heads`). + Your new migration's down_revision = "e4f5a6b7c8d9". Copy the header/docstring + format, the typing (`revision: str`, `down_revision: str | None`), and the + upgrade()/downgrade() docstring style. + +- file: alembic/versions/43e35957a248_create_scenario_plan_table.py + why: | + create_table + named CheckConstraint + op.create_index (incl. GIN with + postgresql_using='gin', lines 62-70) — the create-table migration to mirror. + +- file: app/core/database.py + why: | + Base class + get_session_maker(). 
The workspace module opens sessions via + get_session_maker() (NOT a request dependency) because run_pipeline is not + request-scoped. Precedent: app/main.py:63-65 (lifespan) and the agents + websocket per-message sessions. + +- file: app/main.py + why: | + Lines 62-71 — the warn-and-continue pattern ("config must never block + startup"): try/except Exception + logger.warning with error & error_type. + Workspace writes use exactly this pattern ("workspace must never break the demo"). + +- file: app/features/scenarios/service.py + why: | + create_plan (line 354) — canonical async service write: build ORM object, + db.add, await db.commit() (line 423), await db.refresh (line 424). + Follow for create_workspace/finalize_workspace. + +- file: app/core/exceptions.py + why: | + ForecastLabError subclasses → RFC 7807 via registered handlers. The 422 on + workspace_name+ephemeral comes FREE from pydantic validation at the boundary + (FastAPI → 422 problem+json via the validation handler); no manual raise needed. + +- file: app/features/demo/tests/conftest.py + why: | + The demo test client fixture (ASGITransport over app.main.app); route tests + monkeypatch the demo service so the real pipeline never runs. + +- file: app/features/demo/tests/test_schemas.py + why: | + Existing DemoRunRequest tests INCLUDING the JSON-path convention + (Model.model_validate({json-shaped dict}) — mandated by + .claude/rules/security-patterns.md § strict mode). Extend this file. + +- file: app/features/scenarios/tests/conftest.py + why: | + Integration DB fixture precedent (async_sessionmaker over create_async_engine, + lines 52-59) — copy for the workspace/model integration tests. + +- file: docs/_base/API_CONTRACTS.md + why: | + The POST /demo/run row and "WebSocket Events (/demo/stream)" section document + the start-frame fields — add the two new Optional fields + the additive + pipeline_complete data.workspace_id key, in the same additive-note style as + the PRP-38 scenario field.
+ +# Issue / initiative context +- url: https://github.com/w7-mgfcode/ForecastLabAI/issues/390 + why: The epic this PRP implements (Foundation; blocks #391 #392 #393). +- url: https://github.com/w7-mgfcode/ForecastLabAI/issues/389 + why: Umbrella — success criteria + out-of-scope list (no export, no per-phase config, no endpoints beyond recording in E1). +``` + +### Current Codebase tree (relevant subset) + +```bash +app/features/demo/ +├── __init__.py +├── pipeline.py # 2692 lines; DemoContext @212; run_pipeline @2554 +├── routes.py # POST /demo/run @38; WS /demo/stream @57 +├── schemas.py # DemoRunRequest @29; StepEvent @64; DemoRunResult @106 +├── service.py # asyncio.Lock single-flight @19 +└── tests/ + ├── conftest.py # ASGITransport client fixture + ├── test_pipeline.py + ├── test_routes.py + └── test_schemas.py +alembic/ +├── env.py # model imports @15-24 (NO demo import yet) +└── versions/ # head: e4f5a6b7c8d9 +``` + +### Desired Codebase tree (files added/modified) + +```bash +app/features/demo/ +├── models.py # NEW — ShowcaseWorkspace ORM (+ status constants) +├── workspace.py # NEW — create/finalize/get/list (session-maker based) +├── schemas.py # MOD — DemoRunRequest +preservation +workspace_name (+model_validator); +│ # DemoRunResult +workspace_id (additive Optional) +├── pipeline.py # MOD — DemoContext.workspace_id; create/finalize hooks in run_pipeline +├── service.py # MOD — surface workspace_id on DemoRunResult (1 line in the final build) +└── tests/ + ├── test_schemas.py # MOD — new-field defaults, JSON path, pattern, ephemeral+name=422, legacy frame + ├── test_models.py # NEW — constraint + CRUD (integration) + ├── test_workspace.py # NEW — create/finalize/get/list (integration) + ├── test_pipeline.py # MOD — keep-mode creates+finalizes (workspace fns monkeypatched); ephemeral writes nothing + └── test_routes.py # MOD — passthrough of new fields (service monkeypatched); WS legacy frame +alembic/ +├── env.py # MOD — +demo models import +└── 
versions/a1b2c3d4e5f6_create_showcase_workspace_table.py # NEW (id illustrative — generate your own 12-hex) +docs/_base/API_CONTRACTS.md # MOD — additive contract notes +``` + +### Known Gotchas & Library Quirks + +```python +# CRITICAL — strict mode: DemoRunRequest has ConfigDict(strict=True) (schemas.py:38). +# The new fields are JSON-native (Literal[str] / str|None) → NO Field(strict=False) +# needed. test_strict_mode_policy.py (AST walker) only fires on +# date/datetime/time/UUID/Decimal — neither new field triggers it. + +# CRITICAL — the orchestrator must NEVER raise (pipeline.py:2557 contract). +# Wrap every workspace DB call in try/except Exception + logger.warning +# (pattern: app/main.py:62-71). A dead Postgres must not kill the demo stream. + +# CRITICAL — pipeline_complete is ALWAYS emitted (even on step failure via the +# break at pipeline.py:2668). Finalize the workspace row BEFORE the final yield +# at 2671 so the failure path records partial created_objects too. + +# CRITICAL — NO ForeignKeys on showcase_workspace. ids are soft references. +# ctx.winning_run_id et al. are plain strings produced by HTTP responses; the +# referenced rows can be deleted independently (e.g. DELETE /registry/runs/{id}). + +# GOTCHA — SQLAlchemy reserves the declarative attr name `metadata` +# (scenarios/models.py:9-10). Use `created_objects` / `result_summary`. + +# GOTCHA — external-id convention is uuid.uuid4().hex (32 chars, python-side), +# String(32) unique+index — NOT server-side gen_random_uuid(). Matches +# batch_job.batch_id (batch/models.py:143) and scenario_plan.scenario_id. + +# GOTCHA — alembic/env.py MUST import the new models module (noqa: F401) or the +# schema-validation autogenerate drift check will not see the table and a later +# autogenerate would try to DROP it. + +# GOTCHA — alembic revision ids in this repo are hand-written 12-hex strings +# continuing the chain (head = e4f5a6b7c8d9). 
Either run +# `uv run alembic revision -m "create showcase_workspace table"` and keep the +# generated id, or hand-write one — but down_revision MUST be "e4f5a6b7c8d9". + +# GOTCHA — WS start frame: DemoRunRequest.model_validate(raw) at routes.py:73 with +# default model_config IGNORES unknown keys. Do NOT add extra="forbid" — that +# would break forward/backward compatibility deliberately relied upon. + +# GOTCHA — repo has mixed CRLF/LF line endings; check `git diff --stat` before +# committing to avoid whole-file noise diffs (Write/Edit emit LF — fine for NEW +# files; for schemas.py/pipeline.py edits, verify the diff is surgical). + +# GOTCHA — mypy --strict AND pyright --strict both gate merge. New modules need +# full annotations incl. return types on fixtures and `-> None` on tests. + +# CONVENTION — commits: `feat(api): ... (#390)`; branch off dev: +# feat/showcase-workspace-persistence-backbone (≤50 chars, kebab). +# NO AI co-author trailer (hook-enforced). + +# RUNTIME-VERIFICATION LOG (per prp-create step 3): +# - `uv run alembic heads` → e4f5a6b7c8d9 (verified 2026-06-12) +# - DemoRunRequest strict config + scenario strict=False → schemas.py:38,57 (read) +# - No FastAPI/SQLAlchemy/Pydantic API is cited here beyond patterns already +# working in-repo (JSONB, CheckConstraint, async_sessionmaker) — no external +# library claims requiring a one-off import probe. +``` + +## Implementation Blueprint + +### Data models and structure + +```python +# app/features/demo/models.py (NEW — mirror batch/models.py shape) +"""Showcase workspace ORM model. + +First table owned by the demo slice (precedent: app/features/batch/models.py). +A row = one preserved showcase run: its configuration and the ids of every +object the pipeline created. All recorded ids are OPAQUE SOFT REFERENCES — +deliberately no ForeignKey to model_run / scenario_plan / batch_job / +agent_session, so cross-slice schema coupling stays zero and referenced rows +remain independently deletable. 
+""" +from __future__ import annotations +import datetime as _dt +from typing import Any +from sqlalchemy import CheckConstraint, Date, Index, Integer, String, text +from sqlalchemy.dialects.postgresql import JSONB +from sqlalchemy.orm import Mapped, mapped_column +from app.core.database import Base +from app.shared.models import TimestampMixin + +WORKSPACE_STATUS_RUNNING = "running" +WORKSPACE_STATUS_COMPLETED = "completed" +WORKSPACE_STATUS_FAILED = "failed" + +class ShowcaseWorkspace(TimestampMixin, Base): + __tablename__ = "showcase_workspace" + + id: Mapped[int] = mapped_column(Integer, primary_key=True) + workspace_id: Mapped[str] = mapped_column(String(32), unique=True, index=True) # uuid4().hex + name: Mapped[str | None] = mapped_column(String(100), nullable=True, index=True) + status: Mapped[str] = mapped_column( + String(20), default=WORKSPACE_STATUS_RUNNING, nullable=False, index=True + ) + # Run configuration — replay inputs (E4 reads these verbatim). + seed: Mapped[int] = mapped_column(Integer, nullable=False) + scenario: Mapped[str] = mapped_column(String(40), nullable=False) # ScenarioPreset.value + reset: Mapped[bool] = mapped_column(nullable=False, default=False) + skip_seed: Mapped[bool] = mapped_column(nullable=False, default=True) + # Grain + window discovered by the status/seed steps (nullable: unknown on early failure). + store_id: Mapped[int | None] = mapped_column(Integer, nullable=True) + product_id: Mapped[int | None] = mapped_column(Integer, nullable=True) + date_start: Mapped[_dt.date | None] = mapped_column(Date, nullable=True) + date_end: Mapped[_dt.date | None] = mapped_column(Date, nullable=True) + # Everything the run created — flexible JSONB (soft references, see module docstring). + created_objects: Mapped[dict[str, Any]] = mapped_column( + JSONB, nullable=False, default=dict, server_default=text("'{}'::jsonb") + ) + # winner_model_type / winner_wape / wall_clock_s / any_fail — display payload. 
+ result_summary: Mapped[dict[str, Any] | None] = mapped_column(JSONB, nullable=True) + + __table_args__ = ( + CheckConstraint( + "status IN ('running', 'completed', 'failed')", + name="ck_showcase_workspace_status", + ), + Index("ix_showcase_workspace_status_created", "status", "created_at"), + ) +``` + +```python +# app/features/demo/schemas.py — DemoRunRequest additions (after `scenario`, line 61) + # E1 (#390): preservation policy. Default "ephemeral" keeps legacy behaviour + # byte-identical (no workspace row). Both fields are JSON-native, so the + # model-level strict=True needs no per-field override. + preservation: Literal["ephemeral", "keep"] = Field( + default="ephemeral", + description="'keep' records this run as a showcase_workspace row.", + ) + workspace_name: str | None = Field( + default=None, + max_length=100, + pattern=r"^[a-z0-9][a-z0-9\-_]*$", # same pattern as registry alias_name + description="Optional workspace label; requires preservation='keep'.", + ) + + @model_validator(mode="after") + def _workspace_name_requires_keep(self) -> DemoRunRequest: + if self.workspace_name is not None and self.preservation != "keep": + raise ValueError("workspace_name requires preservation='keep'") + return self +``` + +### List of tasks (dependency order) + +```yaml +Task 1 — branch & issue hygiene: + RUN: git switch dev && git pull && git switch -c feat/showcase-workspace-persistence-backbone + VERIFY: gh issue view 390 --json state # open + +Task 2 — CREATE app/features/demo/models.py: + - MIRROR shape: app/features/batch/models.py (Base+TimestampMixin, __table_args__) + - CONTENT: ShowcaseWorkspace + 3 status constants (see blueprint above) + - DOCSTRING: state the no-FK soft-reference decision explicitly + +Task 3 — MODIFY alembic/env.py: + - INSERT (alphabetical, after data_platform import at line 18): + from app.features.demo import models as demo_models # noqa: F401 + +Task 4 — CREATE migration alembic/versions/<rev>_create_showcase_workspace_table.py: 
+ - down_revision = "e4f5a6b7c8d9" + - MIRROR: 43e35957a248_create_scenario_plan_table.py (create_table + named CHECK + + op.create_index incl. unique index on workspace_id + composite status/created_at) + - downgrade(): drop indexes then op.drop_table("showcase_workspace") + - VERIFY locally: uv run alembic upgrade head && uv run alembic downgrade -1 && uv run alembic upgrade head + +Task 5 — MODIFY app/features/demo/schemas.py: + - ADD model_validator import from pydantic + - ADD the two fields + validator to DemoRunRequest (blueprint above) + - ADD to DemoRunResult: workspace_id: str | None = Field(default=None, description=...) + - UPDATE DemoRunRequest docstring (the "every field is JSON-native" claim still holds — say so) + +Task 6 — CREATE app/features/demo/workspace.py: + - Module docstring: warn-and-continue contract; session-maker (not request-scoped) + - async def create_workspace(req: DemoRunRequest) -> str | None + # opens get_session_maker()() session; inserts row (uuid4().hex, status=running, + # config from req); commit; returns workspace_id. On ANY Exception: + # logger.warning("demo.workspace_create_failed", error=..., error_type=...); return None + - async def finalize_workspace(workspace_id: str, ctx: DemoContext, *, failed: bool) -> None + # loads row by workspace_id, sets status, store_id/product_id/date_start/date_end, + # created_objects (see pseudocode), result_summary; commit. Warn-and-continue. + # NOTE: import DemoContext under TYPE_CHECKING to avoid runtime import cycles + # (pipeline imports workspace; workspace needs only the ctx type). 
+ - async def get_workspace(db: AsyncSession, workspace_id: str) -> ShowcaseWorkspace | None + - async def list_workspaces(db: AsyncSession, *, limit: int = 50, offset: int = 0) -> list[ShowcaseWorkspace] + # newest-first; unrouted in E1 — consumed by tests now, E4 routes later + +Task 7 — MODIFY app/features/demo/pipeline.py: + - DemoContext: ADD `workspace_id: str | None = None` after line 256 (PRP-41 block), + with an `# E1 (#390)` comment matching the per-PRP comment convention + - run_pipeline: AFTER ctx construction (line 2587): + if req.preservation == "keep": + ctx.workspace_id = await workspace.create_workspace(req) + - run_pipeline: BEFORE the pipeline_complete yield (line 2671): + if ctx.workspace_id is not None: + await workspace.finalize_workspace(ctx.workspace_id, ctx, failed=any_fail) + - pipeline_complete data dict: ADD "workspace_id": ctx.workspace_id + - import: from app.features.demo import workspace (module import, monkeypatch-friendly) + +Task 8 — MODIFY app/features/demo/service.py: + - run_pipeline_sync: thread workspace_id from final.data into DemoRunResult + (mirror the winning_run_id line at service.py:77) + +Task 9 — tests (see Validation Loop for the full matrix): + - MODIFY tests/test_schemas.py (unit) + - CREATE tests/test_models.py (@pytest.mark.integration) + - CREATE tests/test_workspace.py(@pytest.mark.integration) + - MODIFY tests/test_pipeline.py (unit — monkeypatch workspace.create_workspace/finalize_workspace) + - MODIFY tests/test_routes.py (unit — service monkeypatched) + +Task 10 — MODIFY docs/_base/API_CONTRACTS.md: + - POST /demo/run row: append "E1 (#390) — body accepts additive Optional + `preservation: 'ephemeral'|'keep'` (default 'ephemeral') and `workspace_name`; + `workspace_name` without `preservation='keep'` → 422." + - WS /demo/stream section: same note on the start frame + "`pipeline_complete.data` + gains additive `workspace_id` (string|null)." 
+ +Task 11 — gates, commit, PR: + - RUN the five gates + integration suite (Validation Loop) + - git diff --stat # confirm surgical diffs (CRLF noise check) + - COMMITS (reference #390, no AI trailer), e.g.: + feat(api): add showcase_workspace model and migration (#390) + feat(api): record demo run objects into showcase workspace (#390) + docs(api): document preservation and workspace_name fields (#390) + - PR into dev; title `feat(api): showcase workspace persistence backbone (#390)` +``` + +### Per-task pseudocode — the finalize payload (Task 6) + +```python +def _collect_created_objects(ctx: DemoContext) -> dict[str, Any]: + """Map DemoContext accumulator fields -> created_objects JSONB. + + Every value is already a plain str/None on ctx (HTTP response payloads). + Drop None values so the JSONB stays sparse and greppable. + """ + raw: dict[str, Any] = { + "winning_run_id": ctx.winning_run_id, # pipeline.py:234 + "v2_run_id": ctx.v2_run_id, # :237 + "v2_model_path": ctx.v2_model_path, # :238 (artifact path) + "alias": "demo-production" if ctx.winning_run_id else None, # DEMO_ALIAS + "agent_session_id": ctx.session_id, # :235 + "batch_id": ctx.batch_id, # :245 + "scenario_plan_ids": [ + s for s in (ctx.price_cut_scenario_id, ctx.holiday_scenario_id) if s + ], # :250-251 + "scenario_artifact_key": ctx.scenario_artifact_key, # :249 + "train_model_types": sorted(ctx.train_results), # :230 (keys only) + "stale_alias_run_id": ctx.stale_alias_run_id, # :243 (PRP-39 controlled row) + } + return {k: v for k, v in raw.items() if v not in (None, [])} + +# finalize_workspace core (warn-and-continue wrapper around ALL of it): +async def finalize_workspace(workspace_id: str, ctx: "DemoContext", *, failed: bool) -> None: + try: + session_maker = get_session_maker() + async with session_maker() as db: + row = (await db.execute( + select(ShowcaseWorkspace).where(ShowcaseWorkspace.workspace_id == workspace_id) + )).scalar_one_or_none() + if row is None: # create failed earlier — 
nothing to finalize + return + row.status = WORKSPACE_STATUS_FAILED if failed else WORKSPACE_STATUS_COMPLETED + row.store_id, row.product_id = ctx.store_id, ctx.product_id + row.date_start, row.date_end = ctx.date_start, ctx.date_end + row.created_objects = _collect_created_objects(ctx) + row.result_summary = { + "winner_model_type": ctx.winner_model_type, + "winner_wape": ctx.winner_wape, + } + await db.commit() + except Exception as exc: # workspace must never break the demo (app/main.py:62 pattern) + logger.warning("demo.workspace_finalize_failed", + workspace_id=workspace_id, error=str(exc), error_type=type(exc).__name__) +``` + +### Integration Points + +```yaml +DATABASE: + - migration: create showcase_workspace (PK id, unique workspace_id, CHECK status, + composite ix status+created_at, JSONB created_objects/result_summary) + - registration: alembic/env.py demo models import (Task 3) + +CONFIG: none — no new settings, no env vars. + +ROUTES: none added in E1. Existing /demo/run + /demo/stream gain fields via schema only. + +FRONTEND: none in E1 (epic #393/E4 wires the UI; adding the optional fields to + frontend/src/types/api.ts DemoRunRequest interface is additive whenever needed). + +DOCS: docs/_base/API_CONTRACTS.md additive notes (Task 10). RUNBOOKS/DOMAIN_MODEL + sweeps belong to the E5 release gate — do not scope-creep them here. +``` + +## Validation Loop + +### Level 1: Syntax & Style + +```bash +uv run ruff check . && uv run ruff format --check . +uv run mypy app/ && uv run pyright app/ +# Expected: clean. Both type checkers are --strict and gate merge. +``` + +### Level 2: Unit Tests (no DB) + +```python +# tests/test_schemas.py — add: +def test_demo_run_request_new_field_defaults() -> None: ... + # DemoRunRequest() -> preservation == "ephemeral", workspace_name is None + +def test_demo_run_request_json_path_keep_with_name() -> None: ... 
+ # DemoRunRequest.model_validate({"preservation": "keep", "workspace_name": "bf-demo"}) + # — the MANDATORY json-dict path per security-patterns.md + +def test_demo_run_request_legacy_frame_still_validates() -> None: ... + # model_validate({"seed": 7}) — no new keys — passes; defaults applied + +def test_demo_run_request_workspace_name_requires_keep() -> None: ... + # pytest.raises(ValidationError): model_validate({"workspace_name": "x"}) + +def test_demo_run_request_workspace_name_pattern_rejected() -> None: ... + # "Black Friday!" and "-leading-dash" both raise ValidationError + +# tests/test_pipeline.py — add (monkeypatch app.features.demo.pipeline.workspace): +async def test_run_pipeline_keep_creates_and_finalizes_workspace(...) -> None: ... + # stub create_workspace -> "ws123"; run with canned _Client responses; + # assert finalize called once with failed matching outcome; + # assert pipeline_complete data["workspace_id"] == "ws123" + +async def test_run_pipeline_ephemeral_touches_no_workspace(...) -> None: ... + # stubs assert_not_called + +async def test_run_pipeline_workspace_create_failure_does_not_break_run(...) -> None: ... + # create_workspace returns None (its warn path) -> pipeline still completes, + # data["workspace_id"] is None + +# tests/test_routes.py — add (service monkeypatched per existing pattern): +async def test_run_demo_accepts_preservation_fields(client) -> None: ... +async def test_run_demo_rejects_name_without_keep_422(client) -> None: ... + # response.status_code == 422; content-type application/problem+json +``` + +```bash +uv run pytest app/features/demo -v -m "not integration" +uv run pytest app/core/tests/test_strict_mode_policy.py -v # AST walker still green +``` + +### Level 3: Integration (real Postgres) + +```python +# tests/test_models.py + tests/test_workspace.py — @pytest.mark.integration, +# session fixture copied from app/features/scenarios/tests/conftest.py:52-59. +# Cases: insert/read roundtrip incl. 
JSONB; duplicate workspace_id -> IntegrityError; +# status CHECK violation -> IntegrityError; create_workspace persists config; +# finalize_workspace(failed=True/False) sets status + payloads; finalize on a +# missing id is a silent no-op; list_workspaces newest-first + limit/offset. +``` + +```bash +docker compose up -d +uv run alembic upgrade head +uv run alembic downgrade -1 && uv run alembic upgrade head # downgrade is clean +uv run pytest app/features/demo -v -m integration +``` + +### Level 4: Manual smoke (seeded local stack, uvicorn on :8123) + +```bash +curl -s -X POST http://localhost:8123/demo/run \ + -H 'Content-Type: application/json' \ + -d '{"skip_seed": true, "preservation": "keep", "workspace_name": "e1-smoke"}' | python3 -m json.tool | head -20 +# Expect overall_status pass + workspace_id non-null. Then: +docker exec forecastlab-postgres psql -U forecastlab -d forecastlab -c \ + "SELECT workspace_id, name, status, created_objects FROM showcase_workspace ORDER BY created_at DESC LIMIT 1;" +# Expect: status=completed, created_objects with winning_run_id etc. +curl -s -X POST http://localhost:8123/demo/run -H 'Content-Type: application/json' \ + -d '{"workspace_name": "bad"}' | python3 -m json.tool # 422 problem+json +``` + +## Final validation Checklist + +- [ ] All five gates green: `uv run ruff check . && uv run ruff format --check . 
&& uv run mypy app/ && uv run pyright app/ && uv run pytest -v -m "not integration"` +- [ ] Integration suite green: `uv run pytest -v -m integration` (fresh docker-compose DB) +- [ ] Migration upgrade + downgrade clean on a fresh DB; env.py imports demo models +- [ ] Legacy start frame (`{"seed": 42}`) behaves byte-identically except for one additive key (no workspace row is written; `workspace_id: null` in pipeline_complete data is the ONLY delta) +- [ ] Manual smoke (Level 4) passes: keep→row recorded, ephemeral→no row, name-without-keep→422 +- [ ] `git diff --stat` shows surgical diffs (no CRLF whole-file noise) +- [ ] docs/_base/API_CONTRACTS.md updated additively +- [ ] Commits formatted `feat(api)/docs(api): ... (#390)`, no AI trailer; PR into dev + +--- + +## Anti-Patterns to Avoid + +- ❌ Don't add ForeignKeys from showcase_workspace to other slices' tables — soft references only. +- ❌ Don't let a workspace DB error propagate out of run_pipeline — warn-and-continue, always. +- ❌ Don't add `extra="forbid"` to DemoRunRequest — unknown-key tolerance is the WS compat contract. +- ❌ Don't add list/get HTTP routes — that's epic #393 (E4); E1 ships the helpers unrouted. +- ❌ Don't touch the localStorage history or any frontend file — E1 is backend-only. +- ❌ Don't edit existing migrations — new revision off head e4f5a6b7c8d9. +- ❌ Don't import another feature slice from app/features/demo/ — core/shared only. + +## Confidence Score + +**9/10** for one-pass implementation success. Every pattern has a verified in-repo +precedent (batch models, scenarios migration, lifespan warn-and-continue, demo test +monkeypatching); the two open judgment calls (exact `created_objects` key set and +whether `DemoRunResult.workspace_id` is surfaced) are both specified above and both +additive — a wrong guess costs a follow-up field, not a rework. 
The −1 is for the +pipeline-unit-test fixtures: canned `_Client` response sequences are fiddly and may +need iteration against the existing `test_pipeline.py` harness. From c99b21720a2a363e32d8bb0cfa71ae0f567ecf8d Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Fri, 12 Jun 2026 14:43:43 +0200 Subject: [PATCH 31/44] feat(api): extend demo seed profiles to all scenario presets (#391) --- app/features/demo/pipeline.py | 57 +++++++++++-- app/features/demo/tests/test_pipeline.py | 102 +++++++++++++++++++++-- 2 files changed, 144 insertions(+), 15 deletions(-) diff --git a/app/features/demo/pipeline.py b/app/features/demo/pipeline.py index 9af07a3f..66caa76b 100644 --- a/app/features/demo/pipeline.py +++ b/app/features/demo/pipeline.py @@ -32,7 +32,7 @@ from dataclasses import dataclass, field from datetime import UTC, date, datetime, timedelta from pathlib import Path -from typing import Any +from typing import Any, NamedTuple import httpx from fastapi import FastAPI @@ -476,12 +476,47 @@ async def step_reset(ctx: DemoContext, client: _Client) -> StepResult: ) -_SCENARIO_SEED_PROFILE: dict[ScenarioPreset, tuple[int, int, int]] = { - ScenarioPreset.DEMO_MINIMAL: (DEMO_SEED_STORES, DEMO_SEED_PRODUCTS, DEMO_SEED_SPAN_DAYS), +class _SeedProfile(NamedTuple): + """Demo-scaled seed profile for one scenario preset. + + The /seeder/generate request overrides preset dims/window by design + (app/features/seeder/service.py:_build_config_from_params) while + preserving the preset's behavioral character (noise, promos, stockouts, + sparsity, launch ramps). ``window`` pins a fixed calendar range + (holiday_rush); when None the window runs ``span_days`` back from today. 
+ """ + + stores: int + products: int + span_days: int + window: tuple[date, date] | None = None + + +_SCENARIO_SEED_PROFILE: dict[ScenarioPreset, _SeedProfile] = { + ScenarioPreset.DEMO_MINIMAL: _SeedProfile( + DEMO_SEED_STORES, DEMO_SEED_PRODUCTS, DEMO_SEED_SPAN_DAYS + ), # PRP-38 — SHOWCASE_RICH profile mirrors app/shared/seeder/config.py:from_scenario. - ScenarioPreset.SHOWCASE_RICH: (5, 15, 180), + ScenarioPreset.SHOWCASE_RICH: _SeedProfile(5, 15, 180), # PRP-38 — SPARSE picker option exercises the data-shape edge case. - ScenarioPreset.SPARSE: (DEMO_SEED_STORES, DEMO_SEED_PRODUCTS, DEMO_SEED_SPAN_DAYS), + ScenarioPreset.SPARSE: _SeedProfile(DEMO_SEED_STORES, DEMO_SEED_PRODUCTS, DEMO_SEED_SPAN_DAYS), + # E2 (#391) — demo-scaled profiles for the remaining presets; the preset's + # character comes from SeederConfig.from_scenario, dims/window from this + # request (precedence contract: app/features/seeder/service.py). All + # windows stay >= 75 days so a later showcase_rich run with + # skip_seed=true clears the historical_backfill gate. + ScenarioPreset.RETAIL_STANDARD: _SeedProfile(5, 15, 180), + ScenarioPreset.HIGH_VARIANCE: _SeedProfile(5, 15, 180), + ScenarioPreset.STOCKOUT_HEAVY: _SeedProfile(5, 15, 180), + # Extra products for launch variety (the native preset seeds 100). + ScenarioPreset.NEW_LAUNCHES: _SeedProfile(5, 25, 180), + # Calendar-pinned: the preset's HolidayConfig spikes are fixed 2024 dates + # (app/shared/seeder/config.py:from_scenario) — a today-anchored window + # would never contain them. span_days is dead data when window is set + # (the pinned range is 92 days inclusive, delta 91). 
+ ScenarioPreset.HOLIDAY_RUSH: _SeedProfile( + 5, 15, 91, window=(date(2024, 10, 1), date(2024, 12, 31)) + ), } @@ -489,12 +524,16 @@ async def step_seed(ctx: DemoContext, client: _Client) -> StepResult: """Seed the active scenario (skipped when ``skip_seed`` is set).""" if ctx.skip_seed: return ("skip", "skip_seed=true (assuming a seeded database)", {}) - stores, products, span_days = _SCENARIO_SEED_PROFILE.get( + profile = _SCENARIO_SEED_PROFILE.get( ctx.scenario, - (DEMO_SEED_STORES, DEMO_SEED_PRODUCTS, DEMO_SEED_SPAN_DAYS), + _SeedProfile(DEMO_SEED_STORES, DEMO_SEED_PRODUCTS, DEMO_SEED_SPAN_DAYS), ) - seed_end = datetime.now(UTC).date() - seed_start = seed_end - timedelta(days=span_days) + stores, products = profile.stores, profile.products + if profile.window is not None: + seed_start, seed_end = profile.window + else: + seed_end = datetime.now(UTC).date() + seed_start = seed_end - timedelta(days=profile.span_days) body = await client.request( "seed", "POST", diff --git a/app/features/demo/tests/test_pipeline.py b/app/features/demo/tests/test_pipeline.py index b9b37f07..cfd692d0 100644 --- a/app/features/demo/tests/test_pipeline.py +++ b/app/features/demo/tests/test_pipeline.py @@ -8,9 +8,11 @@ from __future__ import annotations +from datetime import date, timedelta from types import SimpleNamespace from typing import Any, cast +import pytest from fastapi import FastAPI from app.features.demo import pipeline @@ -675,11 +677,26 @@ def test_phase_table_showcase_rich_emits_24_steps_with_agents_hitl_and_ops_snaps ] -def test_phase_table_sparse_matches_demo_minimal_shape(): - """PRP-38 — SPARSE is offered in the picker but does not extend the pipeline.""" - sparse_rows = pipeline._phase_table(ScenarioPreset.SPARSE) +@pytest.mark.parametrize( + "preset", + [ + ScenarioPreset.RETAIL_STANDARD, + ScenarioPreset.HOLIDAY_RUSH, + ScenarioPreset.HIGH_VARIANCE, + ScenarioPreset.STOCKOUT_HEAVY, + ScenarioPreset.NEW_LAUNCHES, + ScenarioPreset.SPARSE, + ], +) +def 
test_phase_table_non_showcase_presets_match_demo_minimal_shape(preset: ScenarioPreset): + """PRP-38 / E2 (#391) — only SHOWCASE_RICH extends the pipeline. + + Every other preset (incl. the 5 newly exposed by E2) runs the legacy + 11-step flow; the picker offers them as data-shape variations only. + """ + rows = pipeline._phase_table(preset) minimal_rows = pipeline._phase_table(ScenarioPreset.DEMO_MINIMAL) - assert [(p, s) for p, s, _ in sparse_rows] == [(p, s) for p, s, _ in minimal_rows] + assert [(p, s) for p, s, _ in rows] == [(p, s) for p, s, _ in minimal_rows] def test_legacy_step_table_adapter_returns_11_pairs(): @@ -1001,8 +1018,6 @@ def test_parse_artifact_key_v2_artifacts_models_path(): def test_parse_artifact_key_rejects_unparseable(): """PRP-40 — a malformed artifact_uri raises ValueError (not a silent miss).""" - import pytest - with pytest.raises(ValueError, match="Cannot parse artifact-key"): pipeline._parse_artifact_key("not-a-model-uri.bin") @@ -2259,3 +2274,78 @@ async def request( assert final.event_type == "pipeline_complete" assert final.status == "fail" assert final.data["workspace_id"] == "ws-e1-test" + + +# ============================================================================= +# E2 (#391) — per-preset demo seed profiles +# ============================================================================= + + +def test_scenario_seed_profile_covers_every_preset(): + """E2 (#391) — every ScenarioPreset member has an explicit seed profile. + + The ``.get`` fallback in step_seed stays (a future 9th member must not + crash), but no CURRENT member may silently fall back to demo_minimal — + the picker cards promise per-preset seed shapes. 
+ """ + assert set(pipeline._SCENARIO_SEED_PROFILE) == set(ScenarioPreset) + + +def test_scenario_seed_profile_windows_clear_backfill_gate(): + """E2 (#391) — every profile window is >= 75 days so a later + showcase_rich run with skip_seed=true clears the historical_backfill gate.""" + for preset, profile in pipeline._SCENARIO_SEED_PROFILE.items(): + if profile.window is not None: + span = (profile.window[1] - profile.window[0]).days + else: + span = profile.span_days + assert span >= 75, f"{preset.value} window spans only {span} days" + + +async def test_step_seed_holiday_rush_posts_pinned_window(): + """E2 (#391) — holiday_rush MUST seed the calendar-pinned 2024 window. + + The preset's HolidayConfig spikes are fixed 2024 dates; a today-anchored + window would never contain them and the preset silently degrades. + """ + ctx = pipeline.DemoContext( + seed=42, skip_seed=False, reset=False, scenario=ScenarioPreset.HOLIDAY_RUSH + ) + client = _RecordingClient( + None, + responses={("POST", "/seeder/generate"): {"records_created": {"sales": 1}}}, + ) + status, detail, _data = await pipeline.step_seed(ctx, _as_client(client)) + assert status == "pass" + body = client.calls[0][2] + assert body is not None + assert body["scenario"] == "holiday_rush" + assert body["start_date"] == "2024-10-01" + assert body["end_date"] == "2024-12-31" + assert body["stores"] == 5 + assert body["products"] == 15 + assert "holiday_rush: 5 stores x 15 products" in detail + + +async def test_step_seed_retail_standard_posts_demo_scaled_profile(): + """E2 (#391) — retail_standard seeds 5x15 over a 180-day today-anchored window.""" + ctx = pipeline.DemoContext( + seed=42, skip_seed=False, reset=False, scenario=ScenarioPreset.RETAIL_STANDARD + ) + client = _RecordingClient( + None, + responses={("POST", "/seeder/generate"): {"records_created": {"sales": 1}}}, + ) + status, _detail, _data = await pipeline.step_seed(ctx, _as_client(client)) + assert status == "pass" + body = client.calls[0][2] + 
assert body is not None + assert body["scenario"] == "retail_standard" + assert body["stores"] == 5 + assert body["products"] == 15 + start = date.fromisoformat(body["start_date"]) + end = date.fromisoformat(body["end_date"]) + assert end - start == timedelta(days=180) + # sparsity stays 0.0 — the seeder override fires only when > 0, which is + # what preserves the sparse preset's 50%-missing character. + assert body["sparsity"] == 0.0 From f6e86c98b268d3e2c1369aa9603c09ef2e19bf48 Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Fri, 12 Jun 2026 14:43:43 +0200 Subject: [PATCH 32/44] feat(ui): expose all eight scenario presets as guided cards (#391) --- .../src/components/demo/PHASE_DEFS.test.ts | 14 ++ .../components/demo/ScenarioPicker.test.tsx | 63 +++++++-- .../src/components/demo/ScenarioPicker.tsx | 128 ++++++++++++------ frontend/src/pages/showcase.tsx | 5 +- frontend/src/types/api.ts | 15 +- 5 files changed, 166 insertions(+), 59 deletions(-) diff --git a/frontend/src/components/demo/PHASE_DEFS.test.ts b/frontend/src/components/demo/PHASE_DEFS.test.ts index 74f0665c..803cc16e 100644 --- a/frontend/src/components/demo/PHASE_DEFS.test.ts +++ b/frontend/src/components/demo/PHASE_DEFS.test.ts @@ -72,6 +72,20 @@ describe('PHASE_DEFS lockstep with backend _phase_table', () => { expect(sparse).toEqual(minimal) }) + // E2 (#391) — lockstep with the backend parametrized test + // test_phase_table_non_showcase_presets_match_demo_minimal_shape: only + // showcase_rich extends the pipeline; the 5 newly exposed presets run the + // legacy 11-step flow. 
+ it.each([ + 'retail_standard', + 'holiday_rush', + 'high_variance', + 'stockout_heavy', + 'new_launches', + ] as const)('%s -> matches the demo_minimal shape (E2 #391)', (preset) => { + expect(phaseDefsForScenario(preset)).toEqual(phaseDefsForScenario('demo_minimal')) + }) + it('PHASE_ORDER contains exactly the ten canonical phases (PRP-41 swaps agent->agents and adds ops)', () => { expect(PHASE_ORDER).toEqual([ 'data', diff --git a/frontend/src/components/demo/ScenarioPicker.test.tsx b/frontend/src/components/demo/ScenarioPicker.test.tsx index 15bc8ac6..044f137a 100644 --- a/frontend/src/components/demo/ScenarioPicker.test.tsx +++ b/frontend/src/components/demo/ScenarioPicker.test.tsx @@ -1,26 +1,63 @@ -import { afterEach, describe, expect, it } from 'vitest' -import { cleanup, render, screen } from '@testing-library/react' +import { afterEach, describe, expect, it, vi } from 'vitest' +import { cleanup, fireEvent, render, screen } from '@testing-library/react' import { ScenarioPicker } from './ScenarioPicker' afterEach(cleanup) +const ALL_PRESETS = [ + 'retail_standard', + 'holiday_rush', + 'high_variance', + 'stockout_heavy', + 'new_launches', + 'sparse', + 'demo_minimal', + 'showcase_rich', +] as const + describe('ScenarioPicker', () => { - it('renders the current value on the trigger', () => { + it('renders all 8 preset cards with their monospace ids', () => { render(<ScenarioPicker value="demo_minimal" onChange={() => undefined} />) - const trigger = screen.getByRole('combobox') - expect(trigger).toBeTruthy() - expect(trigger.textContent ?? 
'').toContain('demo_minimal') + const cards = screen.getAllByRole('button') + expect(cards.length).toBe(8) + for (const preset of ALL_PRESETS) { + expect(screen.getByText(preset)).toBeTruthy() + } }) - it('is disabled when the run is in flight', () => { - render(<ScenarioPicker value="demo_minimal" onChange={() => undefined} disabled />) - const trigger = screen.getByRole('combobox') as HTMLButtonElement - expect(trigger.disabled).toBe(true) + it('fires onChange with the preset value when a card is clicked', () => { + const onChange = vi.fn() + render(<ScenarioPicker value="demo_minimal" onChange={onChange} />) + fireEvent.click(screen.getByText('retail_standard').closest('button')!) + expect(onChange).toHaveBeenCalledWith('retail_standard') }) - it('renders the showcase_rich label when that is the selected value', () => { + it('marks the selected card with aria-pressed=true and all others false', () => { render(<ScenarioPicker value="showcase_rich" onChange={() => undefined} />) - const trigger = screen.getByRole('combobox') - expect(trigger.textContent ?? '').toContain('showcase_rich') + const pressed = screen.getAllByRole('button', { pressed: true }) + expect(pressed.length).toBe(1) + expect(pressed[0]!.textContent ?? '').toContain('showcase_rich') + expect(screen.getAllByRole('button', { pressed: false }).length).toBe(7) + }) + + it('disables every card while a run is in flight', () => { + render(<ScenarioPicker value="demo_minimal" onChange={() => undefined} disabled />) + const cards = screen.getAllByRole('button') as HTMLButtonElement[] + expect(cards.length).toBe(8) + for (const card of cards) { + expect(card.disabled).toBe(true) + } + }) + + it('shows the expected-fail caveat on the sparse card', () => { + render(<ScenarioPicker value="demo_minimal" onChange={() => undefined} />) + const sparseCard = screen.getByText('sparse').closest('button')! + expect(sparseCard.textContent ?? 
'').toContain('expected') + }) + + it('shows the pinned-2024-window caveat on the holiday_rush card', () => { + render(<ScenarioPicker value="demo_minimal" onChange={() => undefined} />) + const holidayCard = screen.getByText('holiday_rush').closest('button')! + expect(holidayCard.textContent ?? '').toContain('2024') }) }) diff --git a/frontend/src/components/demo/ScenarioPicker.tsx b/frontend/src/components/demo/ScenarioPicker.tsx index b19a239c..bbd028e7 100644 --- a/frontend/src/components/demo/ScenarioPicker.tsx +++ b/frontend/src/components/demo/ScenarioPicker.tsx @@ -1,38 +1,74 @@ -import { - Select, - SelectContent, - SelectGroup, - SelectItem, - SelectTrigger, - SelectValue, -} from '@/components/ui/select' +import { Badge } from '@/components/ui/badge' +import { cn } from '@/lib/utils' import type { ScenarioPreset } from '@/types/api' interface ScenarioOption { value: ScenarioPreset - label: string + title: string description: string estimatedWallClock: string + caveat?: string + caveatKind?: 'expected-skip' | 'info' } +// E2 (#391) — single source of card copy. Descriptions are truthful to the +// demo-scaled _SeedProfile the pipeline's seed step posts +// (app/features/demo/pipeline.py:_SCENARIO_SEED_PROFILE), NOT to the preset's +// native full-size config. 
const SCENARIO_OPTIONS: ScenarioOption[] = [ { value: 'demo_minimal', - label: 'demo_minimal', + title: 'Demo minimal', description: '3 stores × 10 products × 92 days — fast smoke loop', estimatedWallClock: '~60 s', }, { value: 'showcase_rich', - label: 'showcase_rich', - description: '5 stores × 15 products × 180 days — V1+V2 modeling', + title: 'Showcase rich', + description: '5 × 15 × 180 days — full 24-step flow, V1+V2 modeling', estimatedWallClock: '~3 min', + caveat: 'Knowledge/agent steps skip without provider keys', + caveatKind: 'info', + }, + { + value: 'retail_standard', + title: 'Retail standard', + description: '5 × 15 × 180 days — steady demand, light promos', + estimatedWallClock: '~90 s', + }, + { + value: 'holiday_rush', + title: 'Holiday rush', + description: '5 × 15 × Oct–Dec 2024 — Black Friday/Christmas spikes', + estimatedWallClock: '~90 s', + caveat: 'Seeds a pinned 2024 window (calendar-pinned holidays)', + caveatKind: 'info', + }, + { + value: 'high_variance', + title: 'High variance', + description: '5 × 15 × 180 days — noisy demand with anomaly spikes', + estimatedWallClock: '~90 s', + }, + { + value: 'stockout_heavy', + title: 'Stockout heavy', + description: '5 × 15 × 180 days — 25% stockout days zero the sales', + estimatedWallClock: '~90 s', + }, + { + value: 'new_launches', + title: 'New launches', + description: '5 × 25 × 180 days — 45-day product launch ramps', + estimatedWallClock: '~2 min', }, { value: 'sparse', - label: 'sparse', - description: 'Sparse + gappy time series — edge-case data shape', + title: 'Sparse', + description: '3 × 10 × 92 days — 50% missing grains + random gaps', estimatedWallClock: '~90 s', + caveat: '⏭️ May fail at features/backtest (NaN WAPE) — expected; see runbook', + caveatKind: 'expected-skip', }, ] @@ -43,39 +79,49 @@ interface ScenarioPickerProps { } /** - * PRP-38 — shadcn `<Select>` for the demo pipeline's scenario preset. + * E2 (#391) — guided card grid for the demo pipeline's scenario preset. 
* - * Composition rule: `<SelectItem>` lives inside `<SelectGroup>` per - * `.claude/rules/shadcn-ui.md`. Three headline options; default - * `demo_minimal` keeps wire-compat with prior clients. + * All 8 backend ScenarioPreset values are exposed as aria-pressed toggle + * buttons (W3C APG button pattern — no roving tabindex needed, unlike + * role="radio"). Default `demo_minimal` keeps wire-compat with prior clients. */ export function ScenarioPicker({ value, onChange, disabled }: ScenarioPickerProps) { return ( <div className="flex flex-col gap-2"> <label className="text-sm font-medium">Scenario</label> - <Select - value={value} - onValueChange={(v) => onChange(v as ScenarioPreset)} - disabled={disabled} - > - <SelectTrigger className="w-[280px]"> - <SelectValue placeholder="Pick a scenario" /> - </SelectTrigger> - <SelectContent> - <SelectGroup> - {SCENARIO_OPTIONS.map((opt) => ( - <SelectItem key={opt.value} value={opt.value}> - <div className="flex flex-col"> - <span className="font-mono">{opt.label}</span> - <span className="text-xs text-muted-foreground"> - {opt.description} · {opt.estimatedWallClock} - </span> - </div> - </SelectItem> - ))} - </SelectGroup> - </SelectContent> - </Select> + <div role="group" aria-label="Scenario" className="grid grid-cols-2 gap-2 xl:grid-cols-4"> + {SCENARIO_OPTIONS.map((opt) => ( + <button + key={opt.value} + type="button" + aria-pressed={opt.value === value} + disabled={disabled} + onClick={() => onChange(opt.value)} + className={cn( + 'rounded-lg border p-3 text-left transition-colors', + 'hover:bg-muted/50 disabled:pointer-events-none disabled:opacity-50', + opt.value === value ? 
'border-primary ring-1 ring-primary' : 'border-border' + )} + > + <div className="flex items-center justify-between gap-2"> + <span className="text-sm font-medium">{opt.title}</span> + <span className="font-mono text-xs text-muted-foreground">{opt.value}</span> + </div> + <p className="mt-1 text-xs text-muted-foreground"> + {opt.description} · {opt.estimatedWallClock} + </p> + {opt.caveat && ( + <Badge variant="outline" className="mt-2 whitespace-normal text-xs text-muted-foreground"> + {opt.caveat} + </Badge> + )} + </button> + ))} + </div> + <p className="text-xs text-muted-foreground"> + Tick <span className="font-medium">Re-seed first</span> when switching presets — without + it the run reuses the currently seeded dataset. + </p> </div> ) } diff --git a/frontend/src/pages/showcase.tsx b/frontend/src/pages/showcase.tsx index 0f590218..6f6e38ed 100644 --- a/frontend/src/pages/showcase.tsx +++ b/frontend/src/pages/showcase.tsx @@ -155,8 +155,9 @@ export default function ShowcasePage() { <p className="mt-1 text-muted-foreground"> Run the full forecasting pipeline live — phase by phase. The same flow as{' '} <code className="rounded bg-muted px-1 py-0.5 text-sm">make demo</code>, streamed to - the browser. Pick a scenario to control depth (demo_minimal stays fast; - showcase_rich exercises V1+V2 modeling). + the browser. Pick a scenario to control depth and data shape — all eight seeder + presets are available (demo_minimal stays fast; showcase_rich exercises V1+V2 + modeling). </p> </div> diff --git a/frontend/src/types/api.ts b/frontend/src/types/api.ts index 88a204e1..54ff956a 100644 --- a/frontend/src/types/api.ts +++ b/frontend/src/types/api.ts @@ -742,9 +742,18 @@ export interface VerifyResult { export type DemoStepStatus = 'running' | 'pass' | 'fail' | 'skip' | 'warn' export type DemoEventType = 'step_start' | 'step_complete' | 'pipeline_complete' | 'error' -// PRP-38 — seeder scenario presets the picker offers. 
Mirrors the backend -// app/shared/seeder/config.py:ScenarioPreset enum's string values. -export type ScenarioPreset = 'demo_minimal' | 'showcase_rich' | 'sparse' +// PRP-38 / E2 (#391) — seeder scenario presets the picker offers. Mirrors +// the backend app/shared/seeder/config.py:ScenarioPreset enum's string +// values — all 8 members. +export type ScenarioPreset = + | 'retail_standard' + | 'holiday_rush' + | 'high_variance' + | 'stockout_heavy' + | 'new_launches' + | 'sparse' + | 'demo_minimal' + | 'showcase_rich' // One streamed pipeline event from WS /demo/stream (matches the backend // StepEvent Pydantic model; snake_case on the wire). From ce7033dbc8a1387d873a07070e32f8ac081491e6 Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Fri, 12 Jun 2026 14:43:43 +0200 Subject: [PATCH 33/44] docs(api): document full scenario union and preset outcomes (#391) --- docs/_base/API_CONTRACTS.md | 4 ++-- docs/_base/RUNBOOKS.md | 7 ++++++- 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/docs/_base/API_CONTRACTS.md b/docs/_base/API_CONTRACTS.md index abcebd1a..68d73c5d 100644 --- a/docs/_base/API_CONTRACTS.md +++ b/docs/_base/API_CONTRACTS.md @@ -58,7 +58,7 @@ All endpoints serve JSON; error responses use `application/problem+json` (RFC 78 | agents | WS | `/agents/stream` | Token-by-token streaming + tool-call events | | seeder | (see `app/features/seeder/routes.py`) | `/seeder/*` | Trigger scenarios, status, customization | | seeder | POST | `/seeder/phase2-enrichment` | PRP-38 — run Phase 2 generators (lifecycle, replenishment, exogenous, returns) against the existing seeded data. `422 application/problem+json` on an empty database. | -| demo | POST | `/demo/run` | Run the end-to-end demo pipeline in-process; returns a `DemoRunResult`. `409 application/problem+json` if a run is already active. **PRP-38** — body accepts an Optional `scenario: 'demo_minimal' \| 'showcase_rich' \| 'sparse'` field; default `'demo_minimal'` (back-compat). 
**E1 (#390)** — body accepts additive Optional `preservation: 'ephemeral' \| 'keep'` (default `'ephemeral'`, today's no-row behavior) and `workspace_name: str \| null` (pattern `^[a-z0-9][a-z0-9\-_]*$`, ≤100 chars); `workspace_name` without `preservation='keep'` → `422 application/problem+json`. `preservation='keep'` records the run as a `showcase_workspace` row; `DemoRunResult` gains an additive Optional `workspace_id: str \| null`. | +| demo | POST | `/demo/run` | Run the end-to-end demo pipeline in-process; returns a `DemoRunResult`. `409 application/problem+json` if a run is already active. **PRP-38** — body accepts an Optional `scenario: 'demo_minimal' \| 'showcase_rich' \| 'sparse'` field; default `'demo_minimal'` (back-compat). **E1 (#390)** — body accepts additive Optional `preservation: 'ephemeral' \| 'keep'` (default `'ephemeral'`, today's no-row behavior) and `workspace_name: str \| null` (pattern `^[a-z0-9][a-z0-9\-_]*$`, ≤100 chars); `workspace_name` without `preservation='keep'` → `422 application/problem+json`. `preservation='keep'` records the run as a `showcase_workspace` row; `DemoRunResult` gains an additive Optional `workspace_id: str \| null`. **E2 (#391)** — `scenario` accepts all 8 `ScenarioPreset` values (`retail_standard` / `holiday_rush` / `high_variance` / `stockout_heavy` / `new_launches` / `sparse` / `demo_minimal` / `showcase_rich`); only `showcase_rich` changes the step table (24 rows), every other preset runs the legacy 11-row flow. | | demo | WS | `/demo/stream` | Stream one `StepEvent` per pipeline step for the live Showcase page | | config | GET | `/config/ai` | Effective AI-model config (agent LLM + RAG embeddings); API keys masked, never raw | | config | PATCH | `/config/ai` | Persist + apply AI-model changes live (no restart). 
`409` if an embedding-dimension change would orphan indexed RAG chunks (resend with `force=true`) | @@ -83,7 +83,7 @@ Verified against `app/features/agents/websocket.py` and `app/features/agents/sch Drives the end-to-end demo pipeline for the dashboard Showcase page. Verified against `app/features/demo/routes.py` and `app/features/demo/schemas.py` (`StepEvent`). -- **Client → server (one start frame):** `{"seed": int, "reset": bool, "skip_seed": bool, "scenario"?: "demo_minimal" | "showcase_rich" | "sparse", "preservation"?: "ephemeral" | "keep", "workspace_name"?: str}` — all fields optional (`DemoRunRequest` supplies defaults `seed=42`, `reset=false`, `skip_seed=true`, `scenario="demo_minimal"`, `preservation="ephemeral"`, `workspace_name=null`). E1 (#390) — `workspace_name` requires `preservation="keep"` (else one `error` event from validation); unknown start-frame keys remain ignored (forward/backward compat). The pipeline runs once, then the server closes. +- **Client → server (one start frame):** `{"seed": int, "reset": bool, "skip_seed": bool, "scenario"?: "demo_minimal" | "showcase_rich" | "sparse", "preservation"?: "ephemeral" | "keep", "workspace_name"?: str}` — all fields optional (`DemoRunRequest` supplies defaults `seed=42`, `reset=false`, `skip_seed=true`, `scenario="demo_minimal"`, `preservation="ephemeral"`, `workspace_name=null`). E1 (#390) — `workspace_name` requires `preservation="keep"` (else one `error` event from validation); unknown start-frame keys remain ignored (forward/backward compat). E2 (#391) — `scenario` accepts all 8 `ScenarioPreset` values (`retail_standard` / `holiday_rush` / `high_variance` / `stockout_heavy` / `new_launches` / `sparse` / `demo_minimal` / `showcase_rich`); only `showcase_rich` changes the step table (24 rows), every other preset runs the legacy 11-row flow. The pipeline runs once, then the server closes. 
- **Server → client (every frame):** Pydantic-serialized `StepEvent` — `{"event_type", "step_name", "step_index", "total_steps", "status", "detail", "duration_ms", "data", "timestamp", "phase_name"?, "phase_index"?, "phase_total"?}`. PRP-38 — the three `phase_*` fields are Optional + Nullable so legacy clients that don't render phases keep working. - **`event_type` values (Literal in `StepEvent`):** - `step_start` — a step began; `status` is `null`. diff --git a/docs/_base/RUNBOOKS.md b/docs/_base/RUNBOOKS.md index 4ba53dca..df636648 100644 --- a/docs/_base/RUNBOOKS.md +++ b/docs/_base/RUNBOOKS.md @@ -134,10 +134,15 @@ uv run python scripts/run_demo.py --seed 42 --quiet 2>&1 | tee demo.log 25. **`agent_hitl_flow` step shows ⏭️ with `agent did not trigger save_scenario` (PRP-41, `showcase_rich` only)** — the agent answered the prompt directly (no `tool_save_scenario` call) so `pending_approval=false` came back on the chat response. Cause: model picked a different tool / answered in chat. Pipeline still greens. Fix: re-run; the model's response is non-deterministic. If the model ALWAYS skips the tool, raise the temperature in `agent_default_model` or re-prompt. 26. **`ops_snapshot` step shows ⚠️ with `/ops/* all 4xx/5xx -- ops snapshot unavailable` (PRP-41, `showcase_rich` only)** — all three of `GET /ops/summary`, `/ops/retraining-candidates`, `/ops/model-health` returned non-2xx. Cause: DB unreachable, alembic migration drift, OpsService change broke the schema. Pipeline still warn (NEVER fail). Fix: `docker compose ps`; `uv run alembic upgrade head`; re-run. 27. **Stop button used mid-run** — the Stop button on `/showcase` closes the WebSocket; the backend's `WebSocketDisconnect` handler at `app/features/demo/routes.py:74` releases `_pipeline_lock`. Page returns to `idle` within ~5 s with banner "Pipeline cancelled by user.". To resume, click Run again. 
Half-finished registry rows / scenario plans persist (the backend doesn't roll them back — they're operator-visible artefacts of a partial run). +28. **A newly exposed preset run ends red/skipped (E2 #391)** — the scenario card grid exposes all 8 `ScenarioPreset` values; some presets have documented non-green outcomes. Cause: a re-seed posts a demo-scaled dims/window override while keeping the preset's behavioral character from `SeederConfig.from_scenario` (noise, promos, stockouts, gaps), and some characters legitimately break pipeline steps. Per-preset expected-outcome matrix: + - `sparse` — **may FAIL** at `features`/`backtest`: 50% missing (store, product) grains + random 2–10-day gaps can leave the discovered demo grain with too-thin history, or produce an all-NaN WAPE which the `step_backtest` NaN gate fails by design (a graceful-skip would mask real regressions on healthy presets). The card carries an expected-fail badge; either green or this fail is a documented outcome. + - `holiday_rush` — seeds a **pinned Oct–Dec 2024 window** (the preset's `HolidayConfig` spikes are fixed 2024 dates; a today-anchored window would never contain them). Re-seeding ADDS rows without wiping prior data, so after a holiday_rush re-seed `/seeder/status` reports the union range (e.g. `2024-10-01..today`); tick **Reset database** together with **Re-seed first** for a clean pinned window, and again when switching back to a today-anchored preset. Expected green on the 11-step flow. + - `retail_standard` / `high_variance` / `stockout_heavy` — demo-scaled 5×15×180d, today-anchored; `new_launches` — 5×25×180d. All expected **green** on the legacy 11-step flow (only `showcase_rich` runs the 24-step table). + Fix: none for the documented outcomes above. If a normally-green preset fails, make sure **Re-seed first** was ticked (without it the run reuses the currently seeded dataset, whatever preset produced it), then re-run. 
> ⚠️ **RAG embedding-dim mismatch can orphan chunks (R4).** PRP-40 indexes a curated 5-file subset; if the operator switches the embedding provider mid-showcase, indexed chunks orphan (pgvector assumes one fixed dimension per column). PRP-40 does NOT ship a `clear_rag` UI toggle — that's a future PRP. Stick to one provider for the showcase run. -**Notes:** the `POST /demo/run` body and `WS /demo/stream` events are documented in `docs/_base/API_CONTRACTS.md`. The pipeline mirrors `scripts/run_demo.py`; the per-step diagnosis for `make demo` above applies to the same steps. PRP-38 added the `scenario` field on `DemoRunRequest` (defaults to `demo_minimal`) and the additive `phase_name` / `phase_index` / `phase_total` fields on every `StepEvent`. PRP-39 added four new steps (`champion_compat_compare`, `stale_alias_trigger`, `safer_promote_flow`, `batch_preset`) and a new `portfolio` phase between `decision` and `verify`. PRP-40 added the `planning` + `knowledge` phases (5 steps inserted after `portfolio`, before `verify`) and the additive `IndexProjectDocsRequest.path_prefix` field on the RAG slice. PRP-41 — design Z renames the legacy `agent` phase to `agents`, swaps the legacy `step_agent` for `agent_hitl_flow` (HITL approval round-trip), and appends a new `ops` phase carrying `ops_snapshot` immediately before `cleanup`. Total: 24 rows / 10 phases on `showcase_rich`; demo_minimal / sparse keep the 11-row layout under the unified `agents` phase id. The frontend's `DemoPhasePanel.tsx` now carries `onValueChange` (issue #311) and the Showcase page adds a KPI strip + Run-history strip + Stop button + Inspect-Artifacts panel + one-click Approve button on the HITL step card. +**Notes:** the `POST /demo/run` body and `WS /demo/stream` events are documented in `docs/_base/API_CONTRACTS.md`. The pipeline mirrors `scripts/run_demo.py`; the per-step diagnosis for `make demo` above applies to the same steps. 
PRP-38 added the `scenario` field on `DemoRunRequest` (defaults to `demo_minimal`) and the additive `phase_name` / `phase_index` / `phase_total` fields on every `StepEvent`. PRP-39 added four new steps (`champion_compat_compare`, `stale_alias_trigger`, `safer_promote_flow`, `batch_preset`) and a new `portfolio` phase between `decision` and `verify`. PRP-40 added the `planning` + `knowledge` phases (5 steps inserted after `portfolio`, before `verify`) and the additive `IndexProjectDocsRequest.path_prefix` field on the RAG slice. PRP-41 — design Z renames the legacy `agent` phase to `agents`, swaps the legacy `step_agent` for `agent_hitl_flow` (HITL approval round-trip), and appends a new `ops` phase carrying `ops_snapshot` immediately before `cleanup`. Total: 24 rows / 10 phases on `showcase_rich`; demo_minimal / sparse keep the 11-row layout under the unified `agents` phase id. The frontend's `DemoPhasePanel.tsx` now carries `onValueChange` (issue #311) and the Showcase page adds a KPI strip + Run-history strip + Stop button + Inspect-Artifacts panel + one-click Approve button on the HITL step card. E2 (#391) — the Scenario control is a card grid exposing all 8 `ScenarioPreset` values with per-preset demo seed profiles (`_SCENARIO_SEED_PROFILE` is exhaustive over the enum; `holiday_rush` seeds a pinned Oct–Dec 2024 window); the 5 newly exposed presets keep the legacy 11-row layout. ### release-please skipped the bump after a dev → main merge **Symptoms:** `dev → main` PR is merged, `CD Release` workflow on `main` completes in ~10s, **no Release PR** is opened. release-please log shows `No user facing commits found since <sha> - skipping`. 
From 39eb42964f073f3c1bf8008986d8e15662836353 Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Fri, 12 Jun 2026 14:43:43 +0200 Subject: [PATCH 34/44] docs(repo): track showcase workspace e2 prp (#391) --- ...P-showcase-workspace-E2-preset-exposure.md | 620 ++++++++++++++++++ 1 file changed, 620 insertions(+) create mode 100644 PRPs/PRP-showcase-workspace-E2-preset-exposure.md diff --git a/PRPs/PRP-showcase-workspace-E2-preset-exposure.md b/PRPs/PRP-showcase-workspace-E2-preset-exposure.md new file mode 100644 index 00000000..e20eb7a5 --- /dev/null +++ b/PRPs/PRP-showcase-workspace-E2-preset-exposure.md @@ -0,0 +1,620 @@ +name: "PRP — Showcase Workspace E2: Full Preset Exposure (issue #391)" +description: | + +## Purpose + +Implement the first Parallel epic of the showcase-workspace initiative (umbrella #389): +surface all 8 `ScenarioPreset` values as guided, business-friendly cards in the +frontend `ScenarioPicker`, give per-preset demo seed profiles to the pipeline's +seed step, and attach expected-skip semantics (card caveat + runbook entry) to +presets that cannot complete every pipeline step. Frontend-mostly; the backend +already accepts the full enum. + +## Core Principles + +1. **Context is King**: every reference below was verified against the live code on 2026-06-12 (branch dev @ 0493192, post-E1 merge). +2. **Validation Loops**: each level is executable as written. +3. **Information Dense**: patterns cite exact file:line. +4. **Progressive Success**: backend seed profiles → types → picker cards → lockstep tests → docs → browser dogfood. +5. **Global rules**: follow CLAUDE.md / AGENTS.md; all five backend CI gates must pass; UI work follows `.claude/rules/ui-design.md` + `.claude/rules/shadcn-ui.md`. 
+ +--- + +## Goal + +A user on `/showcase` can pick any of the 8 seeder presets (`retail_standard`, +`holiday_rush`, `high_variance`, `stockout_heavy`, `new_launches`, `sparse`, +`demo_minimal`, `showcase_rich`) from a card grid that explains, per preset: +what data it seeds (stores × products × window), its business character (promos, +stockouts, launches, noise), an estimated wall-clock, and — where applicable — +an **expected-skip/fail caveat** so a non-green outcome reads as documented +behavior, not a bug. Re-seeding with any preset produces a pipeline run that +either goes green or fails/skips exactly as its card predicts. + +**Deliverable** (all additive): + +- `app/features/demo/pipeline.py` — `_SCENARIO_SEED_PROFILE` extended from 3 to all 8 presets via a `_SeedProfile` NamedTuple that supports an optional calendar-pinned window (needed by `holiday_rush`); `step_seed` honors the pinned window. +- `frontend/src/types/api.ts` — `ScenarioPreset` union widened from 3 to all 8 string values. +- `frontend/src/components/demo/ScenarioPicker.tsx` — shadcn `<Select>` replaced by an 8-card grid (existing `value`/`onChange`/`disabled` props preserved so `showcase.tsx` wiring does not change shape). +- Tests: backend pipeline unit tests (profiles exhaustive, pinned window posted, phase-table shape for the 5 new presets), frontend `ScenarioPicker.test.tsx` rewrite + `PHASE_DEFS.test.ts` additions. +- Docs: `docs/_base/API_CONTRACTS.md` scenario-union correction; `docs/_base/RUNBOOKS.md` showcase entry #28 (per-preset expected-outcome matrix, sparse NaN-WAPE trap). + +**Success definition**: all Success Criteria check off, the five backend gates + +frontend lint/test are green, and a real-browser dogfood shows all 8 cards, +runs `retail_standard` (re-seeded) to a green 11-step pipeline, and shows the +documented caveat on the `sparse` card. 
+ +## Why + +- Umbrella #389: the UI exposes only 3 of 8 presets even though `DemoRunRequest.scenario` is typed as the full enum (`app/features/demo/schemas.py:59-63`) and `/seeder/generate` validates any of the 8 names (`app/features/seeder/service.py:59-71`). +- Without per-preset seed profiles, the 5 unmapped presets silently fall back to the demo_minimal profile (3×10×92d) (`app/features/demo/pipeline.py:479-485` + `.get` default at `:492-495`) — cards could not be truthful about what a re-seed generates. +- `sparse` already ships in the picker with NO caveat; its 50% missing grains + random gaps can produce a NaN-WAPE backtest **fail** (`pipeline.py:763-765`) that looks like a bug. E2 makes that an expected, documented outcome. +- E2 is Parallel after Foundation (E1 #390, merged as PR #394); it does not touch the workspace table and can land independently of E3 (#392) / E4 (#393). + +## What + +### User-visible behavior + +- The Scenario control on `/showcase` becomes a card grid with all 8 presets. Each card shows: a business-friendly title, the monospace preset id, a one-line data/character description, an estimated wall-clock, and (where applicable) a caveat badge — `sparse` gets an "expected fail/skip" badge, `holiday_rush` a "pinned 2024 window" badge. +- Selection behavior, default (`demo_minimal`), the Run/Stop buttons, Re-seed/Reset checkboxes, and the WS start frame are unchanged in shape — only the picker widget and the set of accepted values change. +- A hint line under the grid: switching presets only changes the data when **Re-seed first** is ticked (otherwise the run reuses the currently seeded dataset). +- Re-seeding with any of the 5 newly exposed presets seeds a demo-scaled dataset (5×15×180d profile; `new_launches` 5×25×180d; `holiday_rush` the calendar-pinned Oct–Dec 2024 window) that carries the preset's character (noise, promos, stockouts, launch ramps, gaps) from `SeederConfig.from_scenario`. 
+ +### Technical requirements + +- All 5 newly exposed presets run the legacy 11-step phase table — `_phase_table` branches only on `SHOWCASE_RICH` (`pipeline.py:2510-2533`) and `phaseDefsForScenario` mirrors that (`frontend/src/components/demo/PHASE_DEFS.ts:113-119`). NO phase-table change in E2. +- The pipeline's seed request keeps overriding preset dims/window by design — `/seeder/generate` applies explicit `stores`/`products`/`start_date`/`end_date` over the preset and preserves the preset's behavioral configs (`app/features/seeder/service.py:213-226`; `sparsity` is preserved because the pipeline sends `0.0` and the override fires only `if params.sparsity > 0` at `:225-226`). +- Every demo seed window stays ≥ 75 days so a follow-up `showcase_rich` run with `skip_seed=true` clears the `historical_backfill` gate (`pipeline.py:829-833`; gate = `3*(14+1)+30 = 75`). +- No pipeline behavior change for sparse: a NaN-WAPE backtest still FAILS (`pipeline.py:763-765`); E2 ships labeling + docs, not a graceful-skip rework (that would mask real regressions on healthy presets). +- `ScenarioPicker` keeps its exact props interface (`value: ScenarioPreset`, `onChange`, `disabled?` — `ScenarioPicker.tsx:39-43`) so `showcase.tsx:187` is untouched. + +### Success Criteria + +- [ ] `frontend/src/types/api.ts` `ScenarioPreset` lists all 8 values; `pnpm lint && pnpm test --run` green; no NEW `tsc -b` errors in touched files. +- [ ] The picker renders 8 cards; clicking one fires `onChange` with the preset value; the selected card is visually + aria-marked; all cards disable while a run is in flight. +- [ ] `sparse` card carries an expected-fail/skip caveat; `holiday_rush` card carries the pinned-window caveat. +- [ ] `_SCENARIO_SEED_PROFILE` covers ALL 8 enum members (exhaustiveness test) and `step_seed` posts: `holiday_rush` → `start_date=2024-10-01`, `end_date=2024-12-31`; `retail_standard` → 5 stores, 15 products, 180-day today-anchored window. 
+- [ ] `_phase_table(p)` for each of the 5 new presets equals the DEMO_MINIMAL shape (backend parametrized test) and `phaseDefsForScenario(p)` matches (frontend lockstep test). +- [ ] `docs/_base/API_CONTRACTS.md` documents the full 8-value union on POST /demo/run + the WS start frame; `docs/_base/RUNBOOKS.md` gains showcase entry #28 (preset expected-outcome matrix). +- [ ] Backend gates green: `uv run ruff check . && uv run ruff format --check . && uv run mypy app/ && uv run pyright app/ && uv run pytest -v -m "not integration"`. +- [ ] Real-browser dogfood (Level 4): 8 cards visible; `retail_standard` re-seed run goes green (11 steps); legacy behavior unchanged (`demo_minimal` default). + +## All Needed Context + +### Documentation & References + +```yaml +# MUST READ — codebase patterns (all verified 2026-06-12, dev @ 0493192) + +- file: app/shared/seeder/config.py + why: | + ScenarioPreset enum at lines 37-47 (8 members, string values are the wire + values). Window constants at 10-25 (DEMO_MINIMAL_SPAN_DAYS=91, + SHOWCASE_RICH_SPAN_DAYS=180). 
from_scenario at 527-695 defines each + preset's character — copy card descriptions from these configs: + retail_standard 538-551 (noise 0.15, promo 0.1, stockout 0.02) + holiday_rush 553-579 (CALENDAR-PINNED 2024-10-01..2024-12-31, the + docstring at 554-557 says "Pass an explicit + start_date/end_date to shift it" — the demo MUST send + the pinned window or the 2024 HolidayConfig spikes + never land in a today-anchored window) + high_variance 581-595 (noise 0.4, anomaly 5% x3.0) + stockout_heavy 597-610 (stockout 25%, behavior "zero") + new_launches 612-628 (45-day launch ramps, native 100 products) + sparse 630-642 (missing_combinations_pct=0.5, 3 gaps of 2-10d) + showcase_rich 644-667 (5x15x180d, tuned noise 0.10) + demo_minimal 669-692 (3x10x92d, tuned noise 0.10; >= 72d for non-NaN + WAPE with expanding/3-splits/h=14/min_train=30) + +- file: app/features/demo/pipeline.py + why: | + _SCENARIO_SEED_PROFILE at 479-485 — TODAY only 3 entries + (DEMO_MINIMAL/SHOWCASE_RICH/SPARSE as (stores, products, span_days) + 3-tuples); step_seed at 488-522 reads it with a demo_minimal fallback at + 492-495 and posts a today-anchored window at 496-497. REPLACE the tuple + with a _SeedProfile NamedTuple carrying an optional pinned window. + step_seed sends sparsity=0.0 (509) — keep; it preserves preset sparsity. + _phase_table at ~2468-2551: ONLY `scenario is ScenarioPreset.SHOWCASE_RICH` + gets the 24-row table; every other member gets the legacy 11 rows — the 5 + new presets need zero phase-table work. + Backtest NaN gate at 763-765: all-NaN WAPE -> step FAIL (this is the + sparse expected outcome documented in runbook #28). + historical_backfill window gate at 829-833 (75 days, showcase_rich-only). 
+ +- file: app/features/seeder/service.py + why: | + _build_config_from_params at 202-247 — THE precedence contract: explicit + stores/products/start_date/end_date ALWAYS override the preset (218-224); + sparsity overrides only when params.sparsity > 0 (225-226), so the + pipeline's 0.0 keeps the sparse preset's 50%-missing config. The demo can + therefore demo-scale any preset without losing its character. + _get_scenario_preset at 59-71 — any of the 8 names validates. + +- file: app/features/seeder/schemas.py + why: | + GenerateParams at 78+ — scenario: str, stores ge=1 le=100, + products ge=1 le=500, start_date/end_date. The demo seed body + (pipeline.py:502-511) maps 1:1 onto this. + +- file: frontend/src/types/api.ts + why: | + ScenarioPreset union at line 747 — widen to all 8 (keep the comment at + 745-746 pointing at the backend enum). DemoRunRequest at 769-775 needs no + change beyond the union. WARNING: this file has MIXED CRLF/LF line + endings — keep the edit surgical and check `git diff --stat` (a 1-line + union change must not become a whole-file diff). + +- file: frontend/src/components/demo/ScenarioPicker.tsx + why: | + The component to rewrite. Keep: ScenarioOption interface shape (11-16, + extend with caveat fields), SCENARIO_OPTIONS array as the single source + of card copy (18-37), the props interface verbatim (39-43), the + "Scenario" label, font-mono preset id + text-muted-foreground description + typography (69-72). Replace: the shadcn Select with a card grid. + +- file: frontend/src/components/demo/PHASE_DEFS.ts + why: | + phaseDefsForScenario at 113-119 — branches ONLY on 'showcase_rich'; + every other value (incl. the 5 new ones) returns the legacy 11 steps. + No change needed; ADD lockstep test coverage for the new values. + +- file: frontend/src/pages/showcase.tsx + why: | + Wiring stays identical: scenario state from useDemoPipeline (106-108), + start frame at 115, picker at 187. 
Optional: refresh the header copy at + 158-159 ("Pick a scenario to control depth..."). The Re-seed checkbox at + 205-215 is the trigger that makes a new preset actually take effect. + +- file: frontend/src/hooks/use-demo-pipeline.ts + why: | + Default scenario 'demo_minimal' at line 200; createInitialSteps at 43-55 + derives idle cards from phaseDefsForScenario — all generic over the + widened union; NO change needed (read to confirm, don't edit). + +- file: frontend/src/components/demo/demo-step-card.tsx + why: | + Skip-status visual language to mirror on caveat badges: '⏭️' emoji + (line 16) and muted-foreground accents (line 26). Semantic tokens only — + never raw colors (shadcn rule). + +- file: frontend/src/components/demo/ScenarioPicker.test.tsx + why: | + Current 3 tests query `getByRole('combobox')` — the rewrite REPLACES them + (card grid has no combobox). Keep the vitest + @testing-library/react + + afterEach(cleanup) harness pattern (lines 1-5). + +- file: frontend/src/components/demo/PHASE_DEFS.test.ts + why: Lockstep-test pattern to extend for the 5 new presets. + +- file: app/features/demo/tests/test_pipeline.py + why: | + Patterns to reuse: _RecordingClient (1010-1052) records (method, path, + json_body) per call — use it to assert step_seed's POST body per preset; + _as_client cast helper (1055-1062); test_phase_table_sparse_matches_ + demo_minimal_shape (678-682) — extend to a parametrized all-presets test. + +- file: frontend/src/components/ui/ (badge.tsx, card.tsx, tooltip.tsx) + why: | + Installed primitives for the card grid — compose from these; NO new + shadcn component install is required. If one becomes necessary anyway, + pin the CLI (`pnpm dlx shadcn@4.7.0 add ...`) — shadcn@latest 5.x writes + a stub pnpm-workspace.yaml and skips the component (known local trap). + +- file: docs/_base/RUNBOOKS.md + why: | + "Showcase page (/showcase) pipeline fails at step X" section — numbered + entries 1..27 (last: 27 "Stop button used mid-run"). 
Append entry 28 in + the same format: bold trigger, Cause, Fix. Also note the existing entry + pattern for expected skips (#6 historical_backfill is the model). + +- file: docs/_base/API_CONTRACTS.md + why: | + POST /demo/run row documents scenario as 'demo_minimal'|'showcase_rich'| + 'sparse' (PRP-38 note) and the WS /demo/stream start-frame line repeats + it — both must say all 8 ScenarioPreset values are accepted (E2 #391 + additive note). E1 (#390) notes on the same row/section were just added — + append, don't disturb them. + +# External references (no new libraries; a11y + testing idioms only) +- url: https://www.w3.org/WAI/ARIA/apg/patterns/button/#:~:text=aria-pressed + why: | + Toggle-button group semantics for the card grid: role="group" + + aria-label on the container, aria-pressed on each card button. Chosen + over role="radiogroup"/role="radio" because the full radio pattern + REQUIRES roving tabindex + arrow-key navigation; aria-pressed buttons + are correct without custom key handling. +- url: https://testing-library.com/docs/queries/about/#priority + why: Query the cards via getAllByRole('button', { pressed }) in the rewrite. + +# Issue / initiative context +- url: https://github.com/w7-mgfcode/ForecastLabAI/issues/391 + why: The epic this PRP implements (Parallel; E1 #390 merged via PR #394). +- url: https://github.com/w7-mgfcode/ForecastLabAI/issues/389 + why: Umbrella — out-of-scope list (NO advanced seed-config UI, NO per-phase interactive config). 
+``` + +### Current Codebase tree (relevant subset) + +```bash +app/features/demo/ +├── pipeline.py # _SCENARIO_SEED_PROFILE @479 (3 entries); step_seed @488; _phase_table @~2468 +├── schemas.py # DemoRunRequest.scenario: ScenarioPreset (full enum already) @60-64 +└── tests/test_pipeline.py # _RecordingClient @1010; phase-table shape tests @602-701 +app/shared/seeder/config.py # ScenarioPreset @37-47; from_scenario @527-695 +app/features/seeder/service.py # override precedence @202-247 +frontend/src/ +├── types/api.ts # ScenarioPreset union @747 (3 values; MIXED CRLF/LF) +├── components/demo/ +│ ├── ScenarioPicker.tsx # shadcn Select, 3 options +│ ├── ScenarioPicker.test.tsx # 3 combobox-role tests (to be replaced) +│ ├── PHASE_DEFS.ts # phaseDefsForScenario @113-119 (no change) +│ ├── PHASE_DEFS.test.ts # lockstep tests (extend) +│ └── demo-step-card.tsx # skip visual language (⏭️, muted tokens) +├── hooks/use-demo-pipeline.ts # default 'demo_minimal' @200 (no change) +└── pages/showcase.tsx # picker wiring @187 (≈no change) +``` + +### Desired Codebase tree (files added/modified) + +```bash +app/features/demo/pipeline.py # MOD — _SeedProfile NamedTuple; 8-entry _SCENARIO_SEED_PROFILE; step_seed pinned-window branch +app/features/demo/tests/test_pipeline.py # MOD — profile exhaustiveness, per-preset POST-body, parametrized phase-table shape +frontend/src/types/api.ts # MOD — 8-value ScenarioPreset union (surgical edit) +frontend/src/components/demo/ScenarioPicker.tsx # MOD — card grid, 8 SCENARIO_OPTIONS + caveats, same props +frontend/src/components/demo/ScenarioPicker.test.tsx # MOD — rewritten for the card grid +frontend/src/components/demo/PHASE_DEFS.test.ts # MOD — lockstep coverage for the 5 new presets +frontend/src/pages/showcase.tsx # MOD (optional, 1-2 lines) — header copy mentions 8 presets +docs/_base/API_CONTRACTS.md # MOD — full scenario union on /demo/run + WS start frame +docs/_base/RUNBOOKS.md # MOD — showcase entry #28 (preset expected-outcome 
matrix) +``` + +### Known Gotchas & Library Quirks + +```python +# CRITICAL — holiday_rush is CALENDAR-PINNED (config.py:553-579). Its +# HolidayConfig rows are fixed 2024 dates; a today-anchored window never +# contains them and the preset silently degrades to mild Q4 seasonality. +# The demo seed MUST send start_date=2024-10-01, end_date=2024-12-31 for +# this preset (the _SeedProfile pinned-window field exists for exactly this). + +# CRITICAL — seeder override precedence (service.py:213-226): request +# stores/products/start/end ALWAYS override the preset; sparsity only +# overrides when > 0. The pipeline sends sparsity=0.0 — DO NOT "fix" that +# to a truthy value or the sparse preset's 50%-missing config gets replaced. + +# CRITICAL — sparse can legitimately FAIL the run: the grain discovery picks +# the first store/product (pipeline.py:540-561); with 50% missing combos + +# gaps that grain can have too-thin history -> features/backtest fail, or +# all-NaN WAPE -> step_backtest FAIL (763-765). E2 documents this (card +# caveat + runbook #28); it does NOT change pipeline semantics. + +# CRITICAL — every new seed profile window must be >= 75 days, or a later +# showcase_rich run with skip_seed=true trips the historical_backfill gate +# (pipeline.py:829-833). Chosen profiles: 180d (and holiday_rush's pinned +# 92-day-inclusive window) all clear it. + +# GOTCHA — frontend type gates: `pnpm tsc --noEmit` exits 1 with NO output +# (solution-style tsconfig — vacuous), and `pnpm tsc -b` currently fails +# with 24 PRE-EXISTING errors on dev, none in demo components (verified +# 2026-06-12). Gate on `pnpm lint && pnpm test --run`; for types, require +# "no NEW tsc -b errors mentioning files you touched": +# cd frontend && pnpm tsc -b 2>&1 | grep -E "ScenarioPicker|types/api|PHASE_DEFS|showcase" # expect empty + +# GOTCHA — frontend/src/types/api.ts has MIXED CRLF/LF line endings. Edit the +# single union line only; verify `git diff --stat` shows ±1-2 lines. 
+ +# GOTCHA — the existing ScenarioPicker tests query getByRole('combobox'); +# after the card rewrite that role disappears. Rewrite the tests with +# getAllByRole('button') / aria-pressed queries; keep afterEach(cleanup). + +# GOTCHA — shadcn: compose the grid from the INSTALLED primitives (badge, +# card, tooltip — frontend/src/components/ui/). No radio-group component is +# installed; use aria-pressed buttons (see W3C APG ref) instead of +# installing one. If you DO add a component, pin `pnpm dlx shadcn@4.7.0` +# (5.x writes a stub pnpm-workspace.yaml and skips the component) and use +# per-component @radix-ui/react-X imports, not the radix barrel. + +# GOTCHA — semantic tokens only on cards (border-primary, bg-muted, +# text-muted-foreground); never raw colors (bg-blue-500). Selected state: +# border + ring with primary tokens; caveat badge mirrors the step-card +# skip language ('⏭️' + muted tokens, demo-step-card.tsx:16,26). + +# GOTCHA — mypy --strict AND pyright --strict gate the pipeline.py change. +# A NamedTuple with a default (window: tuple[date, date] | None = None) +# is fine on 3.12; annotate fully. + +# CONVENTION — commits (every one references #391, no AI trailer): +# feat(api): extend demo seed profiles to all scenario presets (#391) +# feat(ui): expose all eight scenario presets as guided cards (#391) +# docs(api): document full scenario union and preset outcomes (#391) +# docs(repo): track showcase workspace e2 prp (#391) +# Branch off dev: feat/showcase-preset-exposure (<=50 chars, kebab). + +# RUNTIME-VERIFICATION LOG (per prp-create step 3): +# - No new third-party API claims — the PRP cites only in-repo patterns +# (NamedTuple defaults are stdlib; aria-pressed is plain DOM). +# - `pnpm test --run src/components/demo/ScenarioPicker.test.tsx` → 3 passed +# (verified 2026-06-12; the vitest harness works as cited). +# - `pnpm tsc --noEmit` exit 1 (silent) / `pnpm tsc -b` 24 pre-existing +# errors, none in demo components (verified 2026-06-12). 
+# - Seeder precedence + pinned-window behavior read directly from +# service.py:202-247 and config.py:553-579 (not inferred). +``` + +## Implementation Blueprint + +### Data models and structure + +```python +# app/features/demo/pipeline.py — replace the 3-tuple profile (lines 479-485) +class _SeedProfile(NamedTuple): + """Demo-scaled seed profile for one scenario preset. + + The /seeder/generate request overrides preset dims/window by design + (app/features/seeder/service.py:213-226) while preserving the preset's + behavioral character (noise, promos, stockouts, sparsity, launch ramps). + ``window`` pins a fixed calendar range (holiday_rush); when None the + window is ``span_days`` back from today. + """ + stores: int + products: int + span_days: int + window: tuple[date, date] | None = None + +_SCENARIO_SEED_PROFILE: dict[ScenarioPreset, _SeedProfile] = { + ScenarioPreset.DEMO_MINIMAL: _SeedProfile(DEMO_SEED_STORES, DEMO_SEED_PRODUCTS, DEMO_SEED_SPAN_DAYS), + ScenarioPreset.SHOWCASE_RICH: _SeedProfile(5, 15, 180), + ScenarioPreset.SPARSE: _SeedProfile(DEMO_SEED_STORES, DEMO_SEED_PRODUCTS, DEMO_SEED_SPAN_DAYS), + # E2 (#391) — demo-scaled profiles; preset character comes from + # SeederConfig.from_scenario, dims/window from this request (precedence + # contract: app/features/seeder/service.py:213-226). All windows >= 75d + # so a later showcase_rich skip_seed run clears the backfill gate. + ScenarioPreset.RETAIL_STANDARD: _SeedProfile(5, 15, 180), + ScenarioPreset.HIGH_VARIANCE: _SeedProfile(5, 15, 180), + ScenarioPreset.STOCKOUT_HEAVY: _SeedProfile(5, 15, 180), + ScenarioPreset.NEW_LAUNCHES: _SeedProfile(5, 25, 180), # extra products for launch variety (native preset uses 100) + # Calendar-pinned: the preset's HolidayConfig spikes are fixed 2024 dates + # (config.py:553-579) — a today-anchored window would never contain them. 
+ # span_days=91 mirrors DEMO_SEED_SPAN_DAYS for symmetry; it is unused when + # window is set (the pinned range is 92 days inclusive, delta 91). + ScenarioPreset.HOLIDAY_RUSH: _SeedProfile(5, 15, 91, window=(date(2024, 10, 1), date(2024, 12, 31))), +} + +# step_seed (488-522) — window resolution becomes: +# profile = _SCENARIO_SEED_PROFILE.get(ctx.scenario, _SeedProfile(DEMO_SEED_STORES, DEMO_SEED_PRODUCTS, DEMO_SEED_SPAN_DAYS)) +# if profile.window is not None: seed_start, seed_end = profile.window +# else: seed_end = datetime.now(UTC).date(); seed_start = seed_end - timedelta(days=profile.span_days) +# Everything else in the POST body stays byte-identical. +``` + +```tsx +// frontend/src/components/demo/ScenarioPicker.tsx — extended option shape +interface ScenarioOption { + value: ScenarioPreset + title: string // business-friendly, e.g. 'Holiday rush' + description: string // dims x window + character, one line + estimatedWallClock: string + caveat?: string // expected-skip / pinned-window note + caveatKind?: 'expected-skip' | 'info' +} + +// The 8 cards (single source of card copy — descriptions are truthful to the +// _SeedProfile the seed step posts, NOT to the preset's native full-size config): +// demo_minimal 'Demo minimal' '3 stores x 10 products x 92 days — fast smoke loop' '~60 s' +// showcase_rich 'Showcase rich' '5 x 15 x 180 days — full 24-step flow, V1+V2 modeling' '~3 min' +// caveat(info): 'Knowledge/agent steps skip without provider keys' +// retail_standard 'Retail standard' '5 x 15 x 180 days — steady demand, light promos' '~90 s' +// holiday_rush 'Holiday rush' '5 x 15 x Oct-Dec 2024 — Black Friday/Christmas spikes' '~90 s' +// caveat(info): 'Seeds a pinned 2024 window (calendar-pinned holidays)' +// high_variance 'High variance' '5 x 15 x 180 days — noisy demand with anomaly spikes' '~90 s' +// stockout_heavy 'Stockout heavy' '5 x 15 x 180 days — 25% stockout days zero the sales' '~90 s' +// new_launches 'New launches' '5 x 25 x 180 days —
45-day product launch ramps' '~2 min' +// sparse 'Sparse' '3 x 10 x 92 days — 50% missing grains + random gaps' '~90 s' +// caveat(expected-skip): '⏭️ May fail at features/backtest (NaN WAPE) — expected; see runbook' +// +// Markup sketch (semantic tokens only; props interface UNCHANGED): +// <div className="flex flex-col gap-2"> +// <label className="text-sm font-medium">Scenario</label> +// <div role="group" aria-label="Scenario" className="grid grid-cols-2 gap-2 xl:grid-cols-4"> +// {SCENARIO_OPTIONS.map((opt) => ( +// <button key={opt.value} type="button" aria-pressed={opt.value === value} +// disabled={disabled} onClick={() => onChange(opt.value)} +// className={cn('rounded-lg border p-3 text-left transition-colors', +// 'hover:bg-muted/50 disabled:opacity-50 disabled:pointer-events-none', +// opt.value === value ? 'border-primary ring-1 ring-primary' : 'border-border')}> +// <div className="flex items-center justify-between gap-2"> +// <span className="text-sm font-medium">{opt.title}</span> +// <span className="font-mono text-xs text-muted-foreground">{opt.value}</span> +// </div> +// <p className="mt-1 text-xs text-muted-foreground">{opt.description} · {opt.estimatedWallClock}</p> +// {opt.caveat && <Badge variant="outline" className="mt-2 text-xs text-muted-foreground">{opt.caveat}</Badge>} +// </button> +// ))} +// </div> +// <p className="text-xs text-muted-foreground"> +// Tick <span className="font-medium">Re-seed first</span> when switching presets — without it the run reuses the currently seeded dataset. 
+// </p> +// </div> +``` + +### List of tasks (dependency order) + +```yaml +Task 1 — branch & issue hygiene: + RUN: git switch dev && git pull && git switch -c feat/showcase-preset-exposure + VERIFY: gh issue view 391 --json state # open + +Task 2 — MODIFY app/features/demo/pipeline.py (backend first — it defines the truthful card copy): + - ADD `from typing import NamedTuple` is NOT needed (typing imports exist) — check the + import block at 22-44 and extend `from typing import Any` appropriately (NamedTuple). + - REPLACE the dict at 479-485 with _SeedProfile + 8 entries (blueprint above); + keep the `.get(...)` fallback in step_seed (a future 9th enum member must not crash). + - MODIFY step_seed window resolution (pinned-window branch, blueprint above). + - PRESERVE the POST body keys byte-identically (sparsity=0.0 stays). + +Task 3 — MODIFY app/features/demo/tests/test_pipeline.py: + - ADD test_scenario_seed_profile_covers_every_preset: + assert set(pipeline._SCENARIO_SEED_PROFILE) == set(ScenarioPreset) + - ADD test_step_seed_holiday_rush_posts_pinned_window: + ctx = DemoContext(seed=42, skip_seed=False, reset=False, scenario=ScenarioPreset.HOLIDAY_RUSH) + client = _RecordingClient(None, responses={("POST", "/seeder/generate"): {"records_created": {"sales": 1}}}) + await pipeline.step_seed(ctx, _as_client(client)) + body = client.calls[0][2]; assert body["start_date"] == "2024-10-01"; assert body["end_date"] == "2024-12-31"; assert body["scenario"] == "holiday_rush" + - ADD test_step_seed_retail_standard_posts_demo_scaled_profile: + same harness; assert stores=5, products=15, and + date.fromisoformat(end) - date.fromisoformat(start) == timedelta(days=180) + - EXTEND test_phase_table_sparse_matches_demo_minimal_shape into a + @pytest.mark.parametrize over [RETAIL_STANDARD, HOLIDAY_RUSH, HIGH_VARIANCE, + STOCKOUT_HEAVY, NEW_LAUNCHES, SPARSE] asserting _phase_table(p) shape == + _phase_table(DEMO_MINIMAL) shape (keep the original test name working or + replace it 
wholesale — your call; do not lose sparse coverage). + +Task 4 — MODIFY frontend/src/types/api.ts (line 747, surgical): + - ScenarioPreset union -> all 8 values, alphabetic except keep the 3 existing + first if you prefer minimal diff; update the comment to say "all 8 members". + - VERIFY: git diff --stat frontend/src/types/api.ts # 1 file, ~2 lines + +Task 5 — REWRITE frontend/src/components/demo/ScenarioPicker.tsx: + - Blueprint above. Props interface UNCHANGED. Single SCENARIO_OPTIONS array + of 8 with caveat fields. aria-pressed button grid in role="group". + - Import Badge from '@/components/ui/badge' and cn from '@/lib/utils' + (verify the cn helper path with grep before importing). + - Remove the now-unused Select imports. + +Task 6 — REWRITE frontend/src/components/demo/ScenarioPicker.test.tsx: + - renders all 8 cards: getAllByRole('button').length === 8 and each preset id visible + - click fires onChange: render with onChange spy, click the 'retail_standard' card, + expect spy called with 'retail_standard' + - selected card aria-pressed: render value="showcase_rich"; the showcase_rich + button has aria-pressed="true", others "false" + - disabled: all 8 buttons disabled when disabled prop set + - sparse caveat: the sparse card's text contains 'expected' (caveat badge) + - holiday_rush caveat: text contains '2024' + +Task 7 — MODIFY frontend/src/components/demo/PHASE_DEFS.test.ts: + - ADD a parametrized (it.each) lockstep case: for each of the 5 new presets, + phaseDefsForScenario(p) deep-equals phaseDefsForScenario('demo_minimal'). + +Task 8 — (optional, 1-2 lines) MODIFY frontend/src/pages/showcase.tsx: + - Header copy at 158-159: "Pick a scenario to control depth and data shape — + all eight seeder presets are available." Keep the rest untouched. 
+ +Task 9 — docs: + - docs/_base/API_CONTRACTS.md: on the POST /demo/run row AND the WS + /demo/stream start-frame line, append: "E2 (#391) — `scenario` accepts all + 8 `ScenarioPreset` values (retail_standard / holiday_rush / high_variance / + stockout_heavy / new_launches / sparse / demo_minimal / showcase_rich); + only `showcase_rich` changes the step table (24 rows), every other preset + runs the legacy 11-row flow." + - docs/_base/RUNBOOKS.md: append showcase entry 28 following entry 6's + expected-skip format: "**A newly exposed preset run ends red/skipped + (E2 #391)** — per-preset expected outcomes: sparse may FAIL at + features/backtest (50% missing grains / NaN WAPE — expected, the card says + so); holiday_rush seeds a pinned Oct–Dec 2024 window (today-anchored data + disappears — re-seed to switch back); all other presets are expected + green on the 11-step flow. Cause/Fix lines per the section's format." + +Task 10 — gates, dogfood, commit, PR: + - Backend gates + frontend lint/test (Validation Loop below). + - Level 4 browser dogfood (mandatory per .claude/rules/ui-design.md — a UI + change is NOT done until exercised in a real browser). + - git diff --stat # surgical-diff check (api.ts CRLF trap) + - Commits per the convention block; PR into dev titled + "feat(ui): showcase workspace full preset exposure (#391)". +``` + +### Integration Points + +```yaml +DATABASE: none — no schema change, no migration. +CONFIG: none — no new settings or env vars. +ROUTES: none — DemoRunRequest already accepts the full enum (schemas.py:60-64). +WS CONTRACT: unchanged shape; only the accepted scenario value set is documented wider. +FRONTEND: ScenarioPicker internals + types/api.ts union; showcase.tsx wiring untouched. +DOCS: API_CONTRACTS scenario union; RUNBOOKS entry #28. (DOMAIN_MODEL/RUNBOOKS + full sweep belongs to the E5 release gate — do not scope-creep.) +``` + +## Validation Loop + +### Level 1: Syntax & Style + +```bash +uv run ruff check . 
&& uv run ruff format --check . +uv run mypy app/ && uv run pyright app/ # both --strict, gate merge +cd frontend && pnpm lint +# Types: no NEW errors mentioning touched files (24 pre-existing tsc -b errors exist on dev): +cd frontend && pnpm tsc -b 2>&1 | grep -E "ScenarioPicker|types/api|PHASE_DEFS|pages/showcase" ; echo "exit=$? (1 = no matches = good)" +``` + +### Level 2: Unit Tests + +```bash +uv run pytest app/features/demo -v -m "not integration" # incl. the new profile/seed/phase tests +cd frontend && pnpm test --run src/components/demo/ # picker rewrite + lockstep +cd frontend && pnpm test --run # full frontend suite +``` + +### Level 3: Integration (real Postgres; demo slice unaffected but run it) + +```bash +docker compose up -d && uv run alembic upgrade head +uv run pytest app/features/demo -v -m integration # E1 suites still green (no schema change) +``` + +### Level 4: Browser dogfood (uvicorn :8123 + vite :5173) + +```bash +uv run uvicorn app.main:app --port 8123 & +cd frontend && ./node_modules/.bin/vite --host 0.0.0.0 & # bypasses pnpm 11 depsStatusCheck +# In a real browser (webapp-testing skill / agent-browser; on this host Playwright +# needs executable_path=/snap/bin/chromium — see memory note in RUNBOOKS context): +# 1. /showcase shows 8 scenario cards; sparse carries the expected-skip badge, +# holiday_rush the pinned-2024 badge; cards disable while running. +# 2. Pick retail_standard + tick "Re-seed first" -> Run: 11 steps, green; the +# seed card detail says "retail_standard: 5 stores x 15 products". +# 3. Pick holiday_rush + Re-seed -> Run: green; after the run, +# GET /seeder/status shows date_range 2024-10-01..2024-12-31. +# 4. Pick demo_minimal + Re-seed -> Run: green (restores the default dataset). +# 5. (Optional, documented outcome) sparse + Re-seed: green OR a features/ +# backtest fail matching runbook #28 — either outcome is a pass for E2. +``` + +## Final validation Checklist + +- [ ] Backend gates: `uv run ruff check . 
&& uv run ruff format --check . && uv run mypy app/ && uv run pyright app/ && uv run pytest -v -m "not integration"` +- [ ] Frontend: `pnpm lint && pnpm test --run` green; no NEW tsc -b errors in touched files +- [ ] `_SCENARIO_SEED_PROFILE` exhaustive over the enum (test enforces) +- [ ] holiday_rush posts the pinned 2024 window; retail_standard posts 5×15×180d (tests enforce) +- [ ] 8 cards render; selection/disabled/aria-pressed/caveats covered by tests +- [ ] Lockstep: backend parametrized phase-table test + frontend PHASE_DEFS it.each both green +- [ ] Browser dogfood (Level 4) performed in a real browser — not just tests +- [ ] `git diff --stat` surgical (especially frontend/src/types/api.ts — mixed CRLF/LF) +- [ ] API_CONTRACTS + RUNBOOKS #28 updated additively +- [ ] Commits `feat(api)/feat(ui)/docs(api)/docs(repo): ... (#391)`, no AI trailer; PR into dev + +--- + +## Anti-Patterns to Avoid + +- ❌ Don't change pipeline semantics for sparse (no NaN→skip rework) — E2 is labeling + docs; a graceful-skip would mask real regressions on healthy presets. +- ❌ Don't touch `_phase_table` / `phaseDefsForScenario` — the 5 new presets already get the legacy 11-step flow from the existing else-branch. +- ❌ Don't seed full-size preset dims (10×50×365 ≈ 183k rows) — demo profiles stay laptop-friendly; the request-override precedence exists precisely for this. +- ❌ Don't break the ScenarioPicker props interface — showcase.tsx, use-demo-pipeline, and RunHistoryStrip are all generic over the widened union and must not need edits. +- ❌ Don't install a shadcn radio-group (or anything) when aria-pressed buttons suffice; if you must, pin shadcn@4.7.0. +- ❌ Don't hand-set raw Tailwind colors — semantic tokens only. +- ❌ Don't ship the UI without a real-browser check — `.claude/rules/ui-design.md` makes that a hard requirement. +- ❌ Don't widen the seeder HTTP schema or add seed-config knobs to the UI — explicitly out of scope per umbrella #389. 
+ +## Confidence Score + +**8.5/10** for one-pass implementation success. Every change has a verified +in-repo precedent (the profile dict + step_seed already exist; _RecordingClient +covers the POST-body assertions; the lockstep test pair already gates phase +shape; the card grid composes from installed primitives with the props +interface frozen). The two judgment calls — demo-scaled profile sizes and the +holiday_rush pinned window — are decided above with rationale and enforced by +tests, so a disagreement costs a constant tweak, not a rework. The −1.5 is +UI-surface risk: card-grid styling/dogfood may need an iteration pass, and the +sparse Level-4 outcome is intentionally non-deterministic (either result is +documented as a pass). From bf55f8621ebba8e42aa311b9d9a8eb8e91d0ce4c Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Fri, 12 Jun 2026 15:56:37 +0200 Subject: [PATCH 35/44] feat(api): tag showcase plans with workspace label (#392) --- app/features/demo/pipeline.py | 26 +++++- app/features/demo/tests/test_pipeline.py | 107 ++++++++++++++++++++++- 2 files changed, 130 insertions(+), 3 deletions(-) diff --git a/app/features/demo/pipeline.py b/app/features/demo/pipeline.py index 66caa76b..a8ae7c3c 100644 --- a/app/features/demo/pipeline.py +++ b/app/features/demo/pipeline.py @@ -258,6 +258,9 @@ class DemoContext: # E1 (#390) -- workspace persistence. Set only on preservation="keep" runs # (and only when the row insert succeeded); None on ephemeral runs. workspace_id: str | None = None + # E3 (#392) -- workspace label for plan tagging. Set alongside + # workspace_id in run_pipeline's keep-branch; None on ephemeral runs. 
+ workspace_name: str | None = None # ============================================================================= @@ -354,6 +357,21 @@ def _format_demo_artifact_key(run_id_raw: str) -> str: return run_id_raw.replace("-", "")[:_DEMO_ARTIFACT_KEY_LEN] +def _showcase_plan_tags(ctx: DemoContext, kind: str) -> list[str]: + """Build the tag list for a pipeline-saved scenario plan (E3, #392). + + Always: ["showcase", <kind>, "source:showcase"]. When the run records a + workspace (ctx.workspace_id set -- preservation="keep" AND the E1 insert + succeeded), append "workspace:<label>" where label is the human + workspace_name or, on unnamed runs, the 32-hex workspace_id -- the label + is never empty. No workspace row -> no workspace tag (nothing to find). + """ + tags = ["showcase", kind, "source:showcase"] + if ctx.workspace_id is not None: + tags.append(f"workspace:{ctx.workspace_name or ctx.workspace_id}") + return tags + + # PRP-40 — curated 5-file user-guide corpus indexed by the knowledge phase. # The path_prefix RAG indexing additive contract scopes discovery to this # subset (memory anchor: [[rag-runtime-config-and-corpus-state]] — keep the @@ -1297,6 +1315,7 @@ async def step_scenario_simulate_and_save(ctx: DemoContext, client: _Client) -> "end_date": horizon_end.isoformat(), } } + sent_tags = _showcase_plan_tags(ctx, "price") plan_body = await client.request( "scenario_simulate_and_save[save]", "POST", @@ -1306,7 +1325,7 @@ async def step_scenario_simulate_and_save(ctx: DemoContext, client: _Client) -> "run_id": artifact_key, "horizon": DEMO_HORIZON, "assumptions": assumptions, - "tags": ["showcase", "price"], + "tags": sent_tags, }, ) scenario_id_raw = plan_body.get("scenario_id") @@ -1341,6 +1360,8 @@ async def step_scenario_simulate_and_save(ctx: DemoContext, client: _Client) -> "revenue_delta": revenue_delta, "winner_run_id": winner_run_id, "artifact_key": artifact_key, + # E3 (#392) -- echo the tags sent so the UI/e2e can observe them. 
+ "tags": sent_tags, }, ) @@ -1368,7 +1389,7 @@ async def step_multi_plan_compare(ctx: DemoContext, client: _Client) -> StepResu "run_id": ctx.scenario_artifact_key, "horizon": DEMO_HORIZON, "assumptions": {"holiday": {"dates": [holiday_day]}}, - "tags": ["showcase", "holiday"], + "tags": _showcase_plan_tags(ctx, "holiday"), }, ) except _StepError as exc: @@ -2633,6 +2654,7 @@ async def run_pipeline(app: FastAPI, req: DemoRunRequest) -> AsyncIterator[StepE # warn-and-continue: a DB failure returns None and the run proceeds. if req.preservation == "keep": ctx.workspace_id = await workspace.create_workspace(req) + ctx.workspace_name = req.workspace_name # E3 (#392) -- plan-tag label wall_start = time.monotonic() any_fail = False # PRP-41 — buffer for intermediate events the HITL step emits via diff --git a/app/features/demo/tests/test_pipeline.py b/app/features/demo/tests/test_pipeline.py index cfd692d0..197c2842 100644 --- a/app/features/demo/tests/test_pipeline.py +++ b/app/features/demo/tests/test_pipeline.py @@ -1091,6 +1091,41 @@ def _make_showcase_ctx(scenario: ScenarioPreset = ScenarioPreset.SHOWCASE_RICH) return ctx +def test__showcase_plan_tags_ephemeral(): + """E3 (#392) — no workspace row -> base triple only, no workspace tag.""" + ctx = pipeline.DemoContext(seed=42, skip_seed=True, reset=False) + assert pipeline._showcase_plan_tags(ctx, "price") == [ + "showcase", + "price", + "source:showcase", + ] + + +def test__showcase_plan_tags_keep_named(): + """E3 (#392) — keep run with a name -> workspace:<name> appended.""" + ctx = pipeline.DemoContext(seed=42, skip_seed=True, reset=False) + ctx.workspace_id = "a" * 32 + ctx.workspace_name = "bf-demo" + assert pipeline._showcase_plan_tags(ctx, "holiday") == [ + "showcase", + "holiday", + "source:showcase", + "workspace:bf-demo", + ] + + +def test__showcase_plan_tags_keep_unnamed_falls_back_to_workspace_id(): + """E3 (#392) — keep run without a name -> workspace:<workspace_id>.""" + ctx = 
pipeline.DemoContext(seed=42, skip_seed=True, reset=False) + ctx.workspace_id = "f" * 32 + assert pipeline._showcase_plan_tags(ctx, "price") == [ + "showcase", + "price", + "source:showcase", + f"workspace:{'f' * 32}", + ] + + async def test_scenario_simulate_and_save_happy_path(): """PRP-40 + #324 — resolves the champion via ctx.winning_run_id -> run -> artifact_key, saves the plan. Must NOT read the demo-production alias @@ -1130,11 +1165,40 @@ async def test_scenario_simulate_and_save_happy_path(): assert body["name"] == "showcase-price-cut-10pct" assert body["run_id"] == "abc123def456" assert body["assumptions"]["price"]["change_pct"] == -0.10 - assert body["tags"] == ["showcase", "price"] + assert body["tags"] == ["showcase", "price", "source:showcase"] + # E3 (#392) — the step data echoes the tags it sent. + assert data["tags"] == ["showcase", "price", "source:showcase"] # #324 — the safer-promote-corrupted demo-production alias must NOT be read. assert all(path != "/registry/aliases/demo-production" for _m, path, _b in client.calls) +async def test_scenario_simulate_and_save_keep_run_carries_workspace_tag(): + """E3 (#392) — a keep run (workspace_id set) stamps workspace:<name>.""" + ctx = _make_showcase_ctx() + ctx.workspace_id = "a" * 32 + ctx.workspace_name = "bf-demo" + client = _RecordingClient( + None, + responses={ + ("GET", "/registry/runs/demo-run-abc123def456"): { + "run_id": "demo-run-abc123def456", + "artifact_uri": "demo/seasonal_naive-model_abc123def456.joblib", + }, + ("POST", "/scenarios"): { + "scenario_id": "scn-001", + "comparison": {"method": "heuristic"}, + }, + }, + ) + status, _detail, data = await pipeline.step_scenario_simulate_and_save(ctx, _as_client(client)) + assert status == "pass" + save_call = next(c for c in client.calls if c[0] == "POST" and c[1] == "/scenarios") + body = save_call[2] + assert body is not None + assert body["tags"] == ["showcase", "price", "source:showcase", "workspace:bf-demo"] + assert data["tags"] == 
["showcase", "price", "source:showcase", "workspace:bf-demo"] + + async def test_scenario_simulate_and_save_missing_champion_falls_back_to_alias(): """PRP-40 + #324 — with no champion recorded, fall back to the alias; an alias missing run_id -> FAIL with clear detail.""" @@ -1307,6 +1371,47 @@ async def test_multi_plan_compare_happy_path(): assert data["ranked_by"] == "revenue_delta" assert len(data["ranked"]) == 2 assert "winner=showcase-holiday-uplift" in detail + # E3 (#392) — the holiday-plan save carries the ephemeral tag triple. + save_call = next(c for c in client.calls if c[0] == "POST" and c[1] == "/scenarios") + body = save_call[2] + assert body is not None + assert body["tags"] == ["showcase", "holiday", "source:showcase"] + + +async def test_multi_plan_compare_keep_run_carries_workspace_tag(): + """E3 (#392) — the workspace tag flows to plan #2 on keep runs.""" + ctx = _make_showcase_ctx() + ctx.price_cut_scenario_id = "scn-price" + ctx.scenario_artifact_key = "abc123def456" + ctx.workspace_id = "b" * 32 + ctx.workspace_name = "bf-demo" + client = _RecordingClient( + None, + responses={ + ("POST", "/scenarios"): { + "scenario_id": "scn-holiday", + "comparison": {"method": "heuristic"}, + }, + ("POST", "/scenarios/compare"): { + "scenarios": [ + { + "scenario_id": "scn-holiday", + "name": "showcase-holiday-uplift", + "units_delta": 18.5, + "revenue_delta": 220.0, + "coverage_verdict": "ok", + "rank": 1, + }, + ], + }, + }, + ) + status, _detail, _data = await pipeline.step_multi_plan_compare(ctx, _as_client(client)) + assert status == "pass" + save_call = next(c for c in client.calls if c[0] == "POST" and c[1] == "/scenarios") + body = save_call[2] + assert body is not None + assert body["tags"] == ["showcase", "holiday", "source:showcase", "workspace:bf-demo"] async def test_multi_plan_compare_second_save_failure_emits_warn(): From 233bef54b289acc6f15b6f81d6dab751380badd1 Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Fri, 12 
Jun 2026 15:56:37 +0200 Subject: [PATCH 36/44] feat(ui): add tag filter to planner saved-plans library (#392) --- frontend/src/lib/url-params.test.ts | 25 +++++++- frontend/src/lib/url-params.ts | 17 ++++++ frontend/src/pages/visualize/planner.tsx | 74 ++++++++++++++++++++++-- 3 files changed, 111 insertions(+), 5 deletions(-) diff --git a/frontend/src/lib/url-params.test.ts b/frontend/src/lib/url-params.test.ts index fc81d889..ce78cef5 100644 --- a/frontend/src/lib/url-params.test.ts +++ b/frontend/src/lib/url-params.test.ts @@ -1,5 +1,5 @@ import { describe, it, expect } from 'vitest' -import { parseEnumParam, parseIdParam, parsePageParam } from './url-params' +import { parseEnumParam, parseIdParam, parsePageParam, parseTagsParam } from './url-params' describe('parsePageParam', () => { it('returns the integer for a valid positive page', () => { @@ -46,3 +46,26 @@ describe('parseEnumParam', () => { expect(parseEnumParam('sideways', allowed)).toBeUndefined() }) }) + +describe('parseTagsParam', () => { + it('returns an empty list for no params', () => { + expect(parseTagsParam([])).toEqual([]) + }) + + it('passes through namespaced tags untouched', () => { + expect(parseTagsParam(['workspace:bf-demo'])).toEqual(['workspace:bf-demo']) + }) + + it('trims values and drops empty or whitespace-only entries', () => { + expect(parseTagsParam([' showcase ', '', ' '])).toEqual(['showcase']) + }) + + it('dedupes repeated tags', () => { + expect(parseTagsParam(['price', 'price', ' price '])).toEqual(['price']) + }) + + it('caps the list at 20 entries', () => { + const values = Array.from({ length: 50 }, (_, i) => `tag-${i}`) + expect(parseTagsParam(values)).toHaveLength(20) + }) +}) diff --git a/frontend/src/lib/url-params.ts b/frontend/src/lib/url-params.ts index a9270ed0..d97acfec 100644 --- a/frontend/src/lib/url-params.ts +++ b/frontend/src/lib/url-params.ts @@ -46,3 +46,20 @@ export function parseEnumParam<T extends string>( ? 
(value as T) : undefined } + +/** + * Parse repeated `tags` query params into a clean filter list. + * + * Trims each value, drops empties, dedupes, and caps at 20 entries + * (matches the backend CreateScenarioRequest.tags item cap) so a + * hand-edited URL degrades to a sane query instead of a 50-param request. + */ +export function parseTagsParam(values: string[]): string[] { + const seen = new Set<string>() + for (const value of values) { + const tag = value.trim() + if (tag) seen.add(tag) + if (seen.size >= 20) break + } + return [...seen] +} diff --git a/frontend/src/pages/visualize/planner.tsx b/frontend/src/pages/visualize/planner.tsx index d532d9eb..d091f946 100644 --- a/frontend/src/pages/visualize/planner.tsx +++ b/frontend/src/pages/visualize/planner.tsx @@ -1,5 +1,6 @@ import { useState } from 'react' -import { AlertTriangle, BarChart3, Download, Loader2, Play, Save, Trash2 } from 'lucide-react' +import { useSearchParams } from 'react-router-dom' +import { AlertTriangle, BarChart3, Download, Loader2, Play, Save, Trash2, X } from 'lucide-react' import { useJob } from '@/hooks/use-jobs' import { useCompareScenarios, @@ -12,6 +13,7 @@ import { import { MultiSeriesChart } from '@/components/charts/multi-series-chart' import { TimeSeriesChart } from '@/components/charts/time-series-chart' import { JobPicker } from '@/components/common/job-picker' +import { Badge } from '@/components/ui/badge' import { Button } from '@/components/ui/button' import { Card, CardContent, CardDescription, CardHeader, CardTitle } from '@/components/ui/card' import { Checkbox } from '@/components/ui/checkbox' @@ -34,6 +36,7 @@ import { } from '@/components/ui/table' import { downloadCsv, toCsv } from '@/lib/csv-export' import { formatCurrency, formatNumber, getErrorMessage } from '@/lib/api' +import { parseTagsParam } from '@/lib/url-params' import { assumptionDateErrors, buildMultiSeries, @@ -120,6 +123,14 @@ export default function WhatIfPlannerPage() { const [runError, 
setRunError] = useState<string | null>(null) const [reloadId, setReloadId] = useState('') + // -- Saved-plans tag filter (E3, #392) ---------------------------------- + // Read ?tags= ONCE for initial state; React state stays canonical and is + // mirrored back to the URL on every change so the filter is deep-linkable. + const [searchParams, setSearchParams] = useSearchParams() + const [tagFilter, setTagFilter] = useState<string[]>(() => + parseTagsParam(searchParams.getAll('tags')), + ) + // -- Multi-scenario comparison state ----------------------------------- const [selectedPlanIds, setSelectedPlanIds] = useState<Set<string>>(new Set()) const [multiComparison, setMultiComparison] = useState<MultiScenarioComparison | null>(null) @@ -129,9 +140,28 @@ export default function WhatIfPlannerPage() { const createScenario = useCreateScenario() const deleteScenario = useDeleteScenario() const compareScenarios = useCompareScenarios() - const scenariosQuery = useScenarios() + const scenariosQuery = useScenarios(tagFilter) const reloadedPlan = useScenario(reloadId, !!reloadId) + // Mirror the active tag filter to the URL ({ replace: true } keeps chip + // toggles out of the browser history). + function applyTagFilter(next: string[]) { + setTagFilter(next) + setSearchParams( + (prev) => { + const params = new URLSearchParams(prev) + params.delete('tags') + next.forEach((tag) => params.append('tags', tag)) + return params + }, + { replace: true }, + ) + } + const addTag = (tag: string) => { + if (!tagFilter.includes(tag)) applyTagFilter([...tagFilter, tag]) + } + const removeTag = (tag: string) => applyTagFilter(tagFilter.filter((t) => t !== tag)) + // The comparison on screen is either a fresh simulation result or, when a // saved plan has been reloaded, that plan's embedded snapshot. Deriving it // (rather than copying into state inside an effect) keeps the render pure. 
@@ -674,8 +704,22 @@ export default function WhatIfPlannerPage() { <CardTitle>Saved plans</CardTitle> <CardDescription> Reload a plan to re-render its comparison, or select 2-5 plans to - compare them side by side. + compare them side by side. Click a tag to filter to plans carrying + every selected tag. </CardDescription> + {tagFilter.length > 0 && ( + <div className="mt-2 flex flex-wrap items-center gap-2"> + {tagFilter.map((tag) => ( + <Badge key={tag} className="cursor-pointer" onClick={() => removeTag(tag)}> + {tag} + <X className="h-3 w-3" /> + </Badge> + ))} + <Button variant="ghost" size="sm" onClick={() => applyTagFilter([])}> + Clear + </Button> + </div> + )} </div> <Button variant="outline" @@ -721,7 +765,22 @@ export default function WhatIfPlannerPage() { </TableCell> <TableCell className="font-medium">{plan.name}</TableCell> <TableCell className="text-xs text-muted-foreground"> - {plan.tags.length > 0 ? plan.tags.join(', ') : '—'} + {plan.tags.length > 0 ? ( + <div className="flex flex-wrap gap-1"> + {plan.tags.map((tag) => ( + <Badge + key={tag} + variant={tagFilter.includes(tag) ? 'default' : 'secondary'} + className="cursor-pointer" + onClick={() => addTag(tag)} + > + {tag} + </Badge> + ))} + </div> + ) : ( + '—' + )} </TableCell> <TableCell className="text-right"> {formatDelta(plan.units_delta)} @@ -753,6 +812,13 @@ export default function WhatIfPlannerPage() { })} </TableBody> </Table> + ) : tagFilter.length > 0 ? ( + <div className="flex flex-wrap items-center gap-2"> + <p className="text-sm text-muted-foreground">No plans match the selected tags.</p> + <Button variant="ghost" size="sm" onClick={() => applyTagFilter([])}> + Clear + </Button> + </div> ) : ( <p className="text-sm text-muted-foreground"> No saved plans yet. Run a simulation and save it above. 
From 9f989b2dcbb365312785de2de9926dee6fe15bd4 Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Fri, 12 Jun 2026 15:56:37 +0200 Subject: [PATCH 37/44] test(api): cover workspace-tag containment round trip (#392) --- .../tests/test_routes_integration.py | 60 +++++++++++++++++++ 1 file changed, 60 insertions(+) diff --git a/app/features/scenarios/tests/test_routes_integration.py b/app/features/scenarios/tests/test_routes_integration.py index 8be8735d..dc6aaf32 100644 --- a/app/features/scenarios/tests/test_routes_integration.py +++ b/app/features/scenarios/tests/test_routes_integration.py @@ -140,6 +140,66 @@ async def test_crud_round_trip(self, client: AsyncClient, trained_model: str) -> missing = await client.get(f"/scenarios/{scenario_id}") assert missing.status_code == 404 + async def test_list_scenarios_filters_by_workspace_tag( + self, client: AsyncClient, trained_model: str + ) -> None: + """E3 (#392) — plans tagged workspace:<label> are retrievable by tag. + + Proves the umbrella #389 criterion verbatim: two plans saved with a + workspace tag come back from ``GET /scenarios?tags=workspace:<label>`` + — and adding a second tag narrows by JSONB containment (AND). + The tag is unique per run so a shared/stale DB can't skew the counts. + """ + workspace_tag = f"workspace:e3-it-{uuid.uuid4().hex[:8]}" + created_ids: list[str] = [] + try: + for name, kind in ( + ("showcase-price-cut-10pct", "price"), + ("showcase-holiday-uplift", "holiday"), + ): + create = await client.post( + "/scenarios", + json={ + "name": name, + "run_id": trained_model, + "horizon": 14, + "assumptions": _PRICE_ASSUMPTION, + "tags": ["showcase", kind, "source:showcase", workspace_tag], + }, + ) + assert create.status_code == 201 + created_ids.append(create.json()["scenario_id"]) + + # Control plan WITHOUT the workspace tag — must not match the filter. 
+ control = await client.post( + "/scenarios", + json={ + "name": "ephemeral-control", + "run_id": trained_model, + "horizon": 14, + "assumptions": _PRICE_ASSUMPTION, + "tags": ["showcase", "price", "source:showcase"], + }, + ) + assert control.status_code == 201 + created_ids.append(control.json()["scenario_id"]) + + listed = await client.get("/scenarios", params={"tags": [workspace_tag]}) + assert listed.status_code == 200 + data = listed.json() + assert data["total"] == 2 + assert {item["scenario_id"] for item in data["scenarios"]} == set(created_ids[:2]) + + # Containment is AND — a second tag narrows to the price plan only. + narrowed = await client.get("/scenarios", params={"tags": [workspace_tag, "price"]}) + assert narrowed.status_code == 200 + narrowed_data = narrowed.json() + assert narrowed_data["total"] == 1 + assert narrowed_data["scenarios"][0]["scenario_id"] == created_ids[0] + finally: + for scenario_id in created_ids: + await client.delete(f"/scenarios/{scenario_id}") + async def test_list_scenarios_empty_is_200(self, client: AsyncClient) -> None: """GET /scenarios returns 200 + an empty list, never 404.""" response = await client.get("/scenarios") From d711ca0bf9d00b1c2a9f8a38c5d57037f5534afa Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Fri, 12 Jun 2026 15:56:37 +0200 Subject: [PATCH 38/44] docs(api): document workspace plan tags (#392) --- docs/_base/API_CONTRACTS.md | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/_base/API_CONTRACTS.md b/docs/_base/API_CONTRACTS.md index 68d73c5d..2ce5dc8e 100644 --- a/docs/_base/API_CONTRACTS.md +++ b/docs/_base/API_CONTRACTS.md @@ -93,6 +93,7 @@ Drives the end-to-end demo pipeline for the dashboard Showcase page. Verified ag - Concurrency: a module-level `asyncio.Lock` allows one pipeline at a time. A second `POST /demo/run` returns `409`; a second `WS /demo/stream` receives one `error` event. 
- PRP-38 — `scenario="showcase_rich"` extends the data phase with `phase2_enrichment` + `historical_backfill` steps and the modeling phase with `v2_train` (one V2 `prophet_like` run). Phase ids are `data` / `modeling` / `decision` / `verify` / `agent` / `cleanup` (6 phases). - PRP-40 — `scenario="showcase_rich"` ALSO adds two phases inserted BEFORE `verify`: `planning` (2 steps — `scenario_simulate_and_save`, `multi_plan_compare`) and `knowledge` (3 steps — `embedding_provider_probe`, `rag_index_subset`, `rag_retrieve_probe`). Total step count: 19 for `showcase_rich`, 11 for `demo_minimal` and `sparse`. Phase ids on `showcase_rich` are `data` / `modeling` / `decision` / `planning` / `knowledge` / `verify` / `agent` / `cleanup` (8 phases). The knowledge steps SKIP gracefully when the embedding provider is unreachable; the pipeline still goes green. +- E3 (#392) — the planning-phase steps tag the plans they save: pipeline-saved plans now carry `source:showcase` (alongside the legacy `showcase` + `price`/`holiday` tags), and on `preservation="keep"` runs additionally `workspace:<workspace_name|workspace_id>` — retrievable via `GET /scenarios?tags=workspace:<label>` (JSONB containment, all listed tags must match). The `scenario_simulate_and_save` step's `data` additively echoes the `tags` list it sent. 
## Async Events / Queues From d78d6b0c9ba6c7eafdfc0fff3072a350e93a958f Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Fri, 12 Jun 2026 15:56:37 +0200 Subject: [PATCH 39/44] docs(repo): track showcase workspace e3 prp (#392) --- ...ase-workspace-E3-workspace-tagged-plans.md | 613 ++++++++++++++++++ 1 file changed, 613 insertions(+) create mode 100644 PRPs/PRP-showcase-workspace-E3-workspace-tagged-plans.md diff --git a/PRPs/PRP-showcase-workspace-E3-workspace-tagged-plans.md b/PRPs/PRP-showcase-workspace-E3-workspace-tagged-plans.md new file mode 100644 index 00000000..e612a890 --- /dev/null +++ b/PRPs/PRP-showcase-workspace-E3-workspace-tagged-plans.md @@ -0,0 +1,613 @@ +name: "PRP — Showcase Workspace E3: Workspace-Tagged Scenario Plans (issue #392)" +description: | + +## Purpose + +Implement the workspace-tagging epic of the showcase-workspace initiative +(umbrella #389): the two scenario plans the showcase pipeline saves +(`showcase-price-cut-10pct`, `showcase-holiday-uplift`) gain workspace-aware +tags — `["showcase", "<kind>", "source:showcase"]` plus `workspace:<name>` on +`preservation="keep"` runs — via the existing GIN-indexed `scenario_plan.tags` +column, and the What-If Planner's saved-plans library gains a tag filter (with +deep-linkable `?tags=` URL state) so a workspace's plans are retrievable via +`GET /scenarios?tags=workspace:<name>`. Parallel epic after Foundation E1 +(#390); independent of E2 (#391, merged) and E4 (#393, PRP authored, not +started). + +## Core Principles + +1. **Context is King**: every reference below was verified against the live code on 2026-06-12 (branch `dev` @ 3194fe8, post-E1/E2 merge). +2. **Validation Loops**: each level is executable as written. +3. **Information Dense**: patterns cite exact file:line. +4. **Progressive Success**: tag helper → pipeline steps → step tests → planner filter → URL state → docs. +5. 
**Global rules**: follow CLAUDE.md / AGENTS.md; all five CI gates must pass; UI work follows `.claude/rules/ui-design.md` + `shadcn-ui.md`. + +--- + +## Goal + +A showcase run that saves scenario plans stamps them with discoverable, +namespaced tags: every pipeline-saved plan carries `source:showcase` (alongside +the existing `showcase` + `price`/`holiday` tags, kept for back-compat), and a +`preservation="keep"` run additionally stamps `workspace:<name>` (falling back +to `workspace:<workspace_id>` on unnamed keep runs). On the What-If Planner, +the operator filters the saved-plans library by tag — clicking a tag badge in +the table adds it to the filter, active filters render as removable chips, the +filter round-trips through the `?tags=` query string (so +`/visualize/planner?tags=workspace:black-friday` deep-links straight to one +workspace's plans), and the server does the filtering via the existing JSONB +containment query. Ephemeral runs and legacy plans behave exactly as today. + +**Deliverable** (all additive — no migration, no schema change, no new endpoints): + +- `app/features/demo/pipeline.py` — `DemoContext.workspace_name` field; new pure `_showcase_plan_tags()` helper; the two `POST /scenarios` bodies use it. +- `app/features/demo/tests/test_pipeline.py` — helper unit tests + updated step-body assertions (keep vs ephemeral, named vs unnamed). +- `app/features/scenarios/tests/test_routes_integration.py` — one integration test proving the umbrella criterion verbatim: plans saved with `workspace:<name>` are retrievable via `GET /scenarios?tags=workspace:<name>`. +- `frontend/src/lib/url-params.ts` — `parseTagsParam()` reader (+ tests in `url-params.test.ts`). +- `frontend/src/pages/visualize/planner.tsx` — tag-filter state wired into `useScenarios(tags)`, clickable tag badges, active-filter chips, `?tags=` URL sync. +- `docs/_base/API_CONTRACTS.md` — additive E3 note on the `WS /demo/stream` planning steps. 
+ +**Success definition**: all Success Criteria below check off, the five CI gates +are green, frontend gates green, and a manual dogfood shows a keep-run's plans +filtered by `workspace:<name>` in the planner — and reachable by pasting the +deep-link URL. + +## Why + +- E1 records *which* plan ids a workspace created (`created_objects.scenario_plan_ids`, `app/features/demo/workspace.py:97`), but the plans themselves are unfindable from the planner — the library has NO filter UI even though the backend (`GET /scenarios?tags=`, JSONB `@>` containment, `app/features/scenarios/service.py:462-465`) and the frontend hook (`useScenarios(tags)`, `frontend/src/hooks/use-scenarios.ts:28-38`) have supported tag filtering since PRP-27. +- The pipeline already tags plans — but with fixed, workspace-blind values: `["showcase", "price"]` (`app/features/demo/pipeline.py:1309`) and `["showcase", "holiday"]` (`pipeline.py:1371`). Across runs, every plan looks identical. +- Umbrella #389 success criterion: "Showcase-saved scenario plans carry `["showcase", "workspace:<name>", "source:showcase"]` and are retrievable via `GET /scenarios?tags=workspace:<name>`". +- E4 (#393, PRP authored) renders per-workspace plan deep links from `created_objects`; E3's tag filter is the complementary bulk view ("all plans of workspace X") and the `?tags=` deep link gives E4/E5 a stable URL target. + +## What + +### Designed tag taxonomy (locked decisions) + +| Run | Tags on `showcase-price-cut-10pct` | Tags on `showcase-holiday-uplift` | +|-----|-----|-----| +| Ephemeral showcase run | `["showcase", "price", "source:showcase"]` | `["showcase", "holiday", "source:showcase"]` | +| Keep run, named `bf-demo` | `[..., "source:showcase", "workspace:bf-demo"]` | same + `workspace:bf-demo` | +| Keep run, unnamed | `[..., "source:showcase", "workspace:<workspace_id>"]` | same | + +1. 
**`showcase` + `price`/`holiday` stay** — existing plans and any operator filters keep working (back-compat; tags are append-only semantics). +2. **`source:showcase` always** — every pipeline-saved plan is showcase-sourced regardless of preservation; this is the namespaced successor to the bare `showcase` tag (umbrella triple). +3. **`workspace:<label>` only when a workspace row exists** (`ctx.workspace_id` non-None — i.e. `preservation="keep"` AND the E1 insert succeeded). Label = `workspace_name` when set, else the 32-hex `workspace_id` — an unnamed workspace's plans stay findable, and the label can never be empty. +4. **The agent-HITL plan is OUT OF SCOPE** — `step_agent_hitl_flow`'s plan is saved through the agent tool path (`SaveScenarioRequest`, `app/features/scenarios/agent_tools.py:35,199`) which carries no `tags` field; threading workspace context into the agent session is a cross-slice change deferred to a future epic. Note it in the PR description. +5. **No tag editing/deleting UI** — the filter reads tags; managing them is out of scope (umbrella). + +### User-visible behavior + +- New keep-run plans carry the workspace tag; `GET /scenarios?tags=workspace:<label>` returns exactly that workspace's two plans (JSONB containment, all listed tags must match). +- Planner saved-plans library: tags render as clickable badges; clicking adds the tag to the active filter; active filters show as chips with per-chip remove and a "Clear" action; the table re-queries server-side via `useScenarios(tagFilter)`; an active filter with zero matches shows a distinct empty-state ("No plans match the selected tags") instead of the no-plans-yet message. +- The filter syncs to the URL as repeated `?tags=` params (read on mount, written on change) — shareable/deep-linkable. +- The `scenario_simulate_and_save` step's `data` payload additively carries the `tags` list it sent (UI/e2e observability). 
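
The all-tags-must-match behavior described above can be sketched in plain Python. This is a minimal model, assuming set inclusion is an adequate stand-in for the JSONB `@>` containment the server runs; `matches`, `filter_plans`, and the sample `plans` list are hypothetical names for illustration, not code from the repo.

```python
# Illustrative model only: the real filter is the JSONB containment query in
# the scenarios service. `matches`, `filter_plans`, and `plans` are
# hypothetical names used for this sketch.


def matches(plan_tags: list[str], filter_tags: list[str]) -> bool:
    """Return True when the plan carries every tag in the filter (AND)."""
    return set(filter_tags) <= set(plan_tags)


plans = [
    {
        "name": "showcase-price-cut-10pct",
        "tags": ["showcase", "price", "source:showcase", "workspace:bf-demo"],
    },
    {
        "name": "showcase-holiday-uplift",
        "tags": ["showcase", "holiday", "source:showcase", "workspace:bf-demo"],
    },
    {
        # Ephemeral run: no workspace row, so no workspace tag.
        "name": "ephemeral-control",
        "tags": ["showcase", "price", "source:showcase"],
    },
]


def filter_plans(filter_tags: list[str]) -> list[str]:
    """Names of the plans that survive the filter, in input order."""
    return [p["name"] for p in plans if matches(p["tags"], filter_tags)]
```

Under this model, filtering on `workspace:bf-demo` keeps the two keep-run plans, adding `price` narrows the result to one, and an empty filter matches every plan, which mirrors the unfiltered library view.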
+ +### Technical requirements + +- Tag derivation is one pure, unit-testable helper in `pipeline.py`; both save steps call it (no duplicated literals). +- `ctx.workspace_name` is set in `run_pipeline` alongside the E1 `create_workspace` hook — steps never see the request object (signature is `(ctx, client)`, `pipeline.py:1242`). +- `CreateScenarioRequest.tags` caps at 20 items (`app/features/scenarios/schemas.py:203-207`, `Field(max_length=20)` = max list items); E3 sends 4 — no limit risk. Items are unconstrained `str` — colons are fine (the existing `cloned_from`/tag tests and the `workspace:<name>` umbrella spec rely on this). +- No scenarios-slice production code changes — routes/service/schemas already support everything. +- Frontend: `useScenarios(tags)` already encodes repeated `tags=` params and keys the query on `{tags}` (`use-scenarios.ts:28-38`) — refetch on filter change is free. + +### Success Criteria + +- [ ] `_showcase_plan_tags` unit-tested: ephemeral → 3 tags, keep+named → +`workspace:<name>`, keep+unnamed → +`workspace:<workspace_id>`; both steps send helper output. +- [ ] Integration: two plans POSTed with `workspace:e3-it` among tags → `GET /scenarios?tags=workspace:e3-it` returns exactly them; adding a second tag (`tags=workspace:e3-it&tags=price`) narrows to one (containment semantics). +- [ ] Planner: clicking a tag badge filters the table server-side; chips removable; Clear resets; filtered-empty state distinct from no-plans state. +- [ ] `/visualize/planner?tags=workspace:x` pre-applies the filter on load; changing the filter updates the URL. +- [ ] Legacy behavior intact: plans saved before E3 (no `source:showcase`) still list unfiltered; ephemeral-run plans carry no `workspace:` tag. +- [ ] `uv run ruff check . && uv run ruff format --check . && uv run mypy app/ && uv run pyright app/ && uv run pytest -v -m "not integration"` green; integration suite green; `pnpm lint && pnpm test --run` green. 
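
The repeated `tags=` wire format that the technical requirements rely on can be illustrated with the Python standard library. This is a sketch under the assumption that `urllib.parse` encodes these particular values the same way the frontend hook does; `tag_filter` is a hypothetical stand-in for the planner's filter state.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical filter state, as the planner page would hold it.
tag_filter = ["workspace:bf-demo", "price"]

# One `tags=` pair per tag; urlencode percent-encodes the colon.
query = urlencode([("tags", tag) for tag in tag_filter])

# Parsing decodes each value exactly once and keeps repeated keys as a list,
# so the reader must not decode the values a second time.
round_tripped = parse_qs(query)["tags"]
```

Here `query` comes out as `tags=workspace%3Abf-demo&tags=price`, and `round_tripped` equals the original list, which is the property the `?tags=` deep link depends on.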
+ +## All Needed Context + +### Documentation & References + +```yaml +# MUST READ — backend (verified 2026-06-12, dev @ 3194fe8) + +- file: app/features/demo/pipeline.py + why: | + THE file E3 changes. DemoContext dataclass at 212-260 — add + `workspace_name: str | None = None` directly under `workspace_id` (260) + with an `# E3 (#392)` comment (per-PRP comment convention visible at + 238/241/249/254/258). step_scenario_simulate_and_save at 1242 — the POST + /scenarios body with `"tags": ["showcase", "price"]` is at 1300-1311 + (tags literal: 1309); its return data dict at 1337-1344 (add "tags"). + step_multi_plan_compare at 1348 — body with ["showcase", "holiday"] at + 1362-1372 (tags literal: 1371). run_pipeline ctx construction at + 2625-2630; the E1 keep-branch `ctx.workspace_id = await + workspace.create_workspace(req)` at 2634-2635 — set ctx.workspace_name + in the same branch. Steps receive ONLY (ctx, client) — never req. + +- file: app/features/demo/workspace.py + why: | + _collect_created_objects (81-102) records scenario_plan_ids — E3's tags + complement (don't replace) this linkage. READ-ONLY; cite for the + workspace_id format (uuid4().hex, 32 chars — the unnamed-fallback label). + +- file: app/features/demo/schemas.py + why: | + DemoRunRequest.workspace_name (72-78): max_length=100, pattern + ^[a-z0-9][a-z0-9\-_]*$ — so the derived tag is ≤ 110 chars of safe + charset; no sanitization needed in the helper. READ-ONLY in E3. + +- file: app/features/scenarios/schemas.py + why: | + CreateScenarioRequest.tags at 203-207: list[str], Field(max_length=20) + = max 20 ITEMS, items unconstrained str. NO schema change needed. + +- file: app/features/scenarios/service.py + why: | + list_plans tags containment at 462-465: `ScenarioPlan.tags.contains(tags)` + (JSONB @>) on both count and rows statements — "a plan matches when it + carries EVERY listed tag". This is the server-side filter the planner UI + drives. NO change needed. 
+ +- file: app/features/scenarios/routes.py + why: | + GET /scenarios `tags: list[str] | None = Query(...)` at 176-195 — repeated + query params. NO change needed; cited so the implementer trusts the wire + format useScenarios already emits. + +- file: app/features/scenarios/agent_tools.py + why: | + Lines 35, 199 — the agent save path uses SaveScenarioRequest (NO tags + field). Out-of-scope boundary for decision #4; do not modify. + +- file: app/features/demo/tests/test_pipeline.py + why: | + The canned _Client fake echoes tags ("tags": json_body.get("tags", []), + line 140). EXISTING ASSERTIONS TO UPDATE: line 1133 asserts + body["tags"] == ["showcase", "price"] in + test_scenario_simulate_and_save_happy_path (1094); multi-plan tests at + 1268/1312/1333. Test ctx factories build DemoContext directly — keep + runs are simulated by setting ctx.workspace_id/workspace_name, NOT by + running the orchestrator. + +- file: app/features/scenarios/tests/test_routes_integration.py + why: | + Target file for the round-trip integration test (currently has NO tags + coverage — verified by grep). Reuse its existing client/DB fixtures and + cleanup conventions; mark @pytest.mark.integration like its siblings. + +- file: app/features/scenarios/tests/conftest.py + why: Integration DB/client fixture precedent for the new test. + +# MUST READ — frontend (verified 2026-06-12) + +- file: frontend/src/pages/visualize/planner.tsx + why: | + 814 lines. Results/persistence state block at 117-126 — add tagFilter + state here. Hook calls at 128-133: `const scenariosQuery = useScenarios()` + at 132 — becomes useScenarios(tagFilter). Saved-plans Card at 669-762: + CardHeader 671-694 (add the active-filter chip row + Clear here), Tags + cell at 724-726 (`plan.tags.join(', ')` — replace with clickable Badge + list), empty-state at 756-760 (branch on active filter). NO + useSearchParams usage today (verified) — add it. 
+ +- file: frontend/src/hooks/use-scenarios.ts + why: | + useScenarios(tags = [], enabled = true) at 28-38: encodes repeated + `tags=` params, queryKey ['scenarios', { tags }] — the page only needs + to pass state in. NO hook change needed. + +- file: frontend/src/lib/url-params.ts + why: | + Safe query-param readers with validate-at-read-boundary docstring + pattern (1-9). parsePageParam (17), parseIdParam (29), parseEnumParam + (41) — ADD parseTagsParam following this style. Tests colocated in + url-params.test.ts. + +- file: frontend/src/pages/explorer/run-compare.tsx + why: | + useSearchParams precedent at 87 (`const [params, setParams] = + useSearchParams()`) — the read/write URL-state pattern to mirror + (explorer pages treat the query string as filter source of truth). + +- file: frontend/src/components/ui/badge.tsx + why: | + Installed shadcn Badge — use for tag chips (variant="secondary" for + inactive, "default" for active filters). Do NOT run shadcn add. + +- file: frontend/src/types/api.ts + why: | + ScenarioListItem.tags: string[] already typed (search "tags" in the + scenarios block) — NO type changes needed in E3. + +# Issue / initiative context +- url: https://github.com/w7-mgfcode/ForecastLabAI/issues/392 + why: The epic this PRP implements. +- url: https://github.com/w7-mgfcode/ForecastLabAI/issues/389 + why: Umbrella — tag-triple success criterion + out-of-scope list. +- file: PRPs/PRP-showcase-workspace-E1-persistence-backbone.md + why: Foundation PRP — workspace row lifecycle the tags piggyback on. +- file: PRPs/PRP-showcase-workspace-E4-restore-replay.md + why: | + Parallel epic (authored, not started). File-overlap check: E4 touches + demo routes/schemas/workspace.py + showcase.tsx; E3 touches pipeline.py + + planner.tsx + url-params.ts. ONLY shared file: docs/_base/API_CONTRACTS.md + (both additive). E4's WorkspaceArtifactsPanel links plans by id; + E3's ?tags= deep link is the complementary bulk view. 
+``` + +### Current Codebase tree (relevant subset) + +```bash +app/features/demo/ +├── pipeline.py # DemoContext @212-260; save steps @1242/@1348; ctx build @2626 +└── tests/test_pipeline.py # canned client echoes tags @140; assertions @1133 etc. +app/features/scenarios/ # NO production changes — tags filter fully built (PRP-27) +└── tests/test_routes_integration.py # no tags coverage yet +frontend/src/ +├── pages/visualize/planner.tsx # library card @669-762; useScenarios() @132 +├── hooks/use-scenarios.ts # useScenarios(tags) ready @28-38 +└── lib/url-params.ts (+ .test.ts) # param-reader helpers +``` + +### Desired Codebase tree (files added/modified) + +```bash +app/features/demo/ +├── pipeline.py # MOD — ctx.workspace_name; _showcase_plan_tags(); 2 call sites; step data +tags +└── tests/test_pipeline.py # MOD — helper tests; updated step assertions; keep/unnamed variants +app/features/scenarios/tests/ +└── test_routes_integration.py # MOD — +workspace-tag containment round-trip test +frontend/src/ +├── lib/url-params.ts # MOD — +parseTagsParam +├── lib/url-params.test.ts # MOD — +parseTagsParam cases +└── pages/visualize/planner.tsx # MOD — tagFilter state + URL sync + badges + chips + empty-state +docs/_base/API_CONTRACTS.md # MOD — E3 note on the planning steps +``` + +### Known Gotchas & Library Quirks + +```python +# CRITICAL — steps never see the request. step signatures are (ctx, client) +# (pipeline.py:1242). workspace_name MUST travel on DemoContext, set in +# run_pipeline's keep-branch (2634-2635). Do NOT widen step signatures. + +# CRITICAL — set ctx.workspace_name from req.workspace_name INSIDE the +# `if req.preservation == "keep":` branch BUT independent of the +# create_workspace result: if the row insert failed (workspace_id None), +# decision #3 says NO workspace tag — so the helper keys on +# ctx.workspace_id (not workspace_name) for the "is this a workspace run" +# check. 
A name with a dead DB produces NO tag (consistent with E1's +# warn-and-continue: no row -> nothing to find later anyway). + +# CRITICAL — JSONB containment filter is ALL-tags-must-match +# (service.py:462-465, .contains = @>). The planner filter is therefore an +# AND filter — say so in the UI copy ("plans carrying every selected tag"). + +# GOTCHA — tags order: append new tags AFTER the existing ones +# (["showcase", "price", "source:showcase", "workspace:..."]) so the +# existing test diffs stay minimal and human-readable. Order is irrelevant +# to containment. + +# GOTCHA — test_pipeline.py:1133 asserts the OLD exact list +# ["showcase", "price"] — it (and the multi-plan equivalents) MUST be +# updated, not deleted: assert the new exact list for an ephemeral ctx +# (the default test ctx has workspace_id=None). + +# GOTCHA — pyright/mypy --strict: _showcase_plan_tags returns list[str]; +# annotate fully. The dataclass field needs `str | None = None` (matches +# sibling style at pipeline.py:236-260). + +# GOTCHA — planner.tsx is an 814-line monolith with NO test file. Keep page +# logic thin: validation logic goes in parseTagsParam (tested in +# url-params.test.ts); the page just wires state. Do not extract a new +# component unless the diff stays smaller that way (it won't — chips and +# badges are ~30 lines inline). + +# GOTCHA — useSearchParams: write with { replace: true } to avoid polluting +# browser history on every chip toggle (run-compare.tsx:87 precedent uses +# setParams; explorer pages show the read pattern). Read ONCE for initial +# state (useState initializer) then treat React state as canonical and +# mirror to the URL in the toggle handlers — planner has heavy local state +# already; do not refactor it to URL-as-source-of-truth in this epic. 
+ +# GOTCHA — repeated params: searchParams.getAll('tags') reads them; +# useScenarios already ENCODES each tag (encodeURIComponent, +# use-scenarios.ts:31) — parseTagsParam receives DECODED values from +# getAll (URLSearchParams decodes) — do not double-decode. + +# GOTCHA — parseTagsParam hygiene: trim, drop empties, dedupe (Set), cap at +# 20 (matches CreateScenarioRequest.tags max items — a hand-edited URL +# with 50 tags must not produce a 50-param query). + +# GOTCHA — `pnpm tsc --noEmit` is VACUOUS (solution-style tsconfig); `tsc -b` +# fails on dev with PRE-EXISTING errors — don't chase them. JS gates: +# `pnpm lint && pnpm test --run`. + +# GOTCHA — repo has mixed CRLF/LF line endings; pipeline.py + planner.tsx +# edits must be surgical — check `git diff --stat` before committing. + +# COORDINATION — E4 (#393) PRP is authored but unstarted; only shared file is +# docs/_base/API_CONTRACTS.md (both additive notes). If E4 lands first, the +# E4 demo-routes diff does not collide with E3's pipeline.py diff. 
+ +# RUNTIME-VERIFICATION LOG (per prp-create step 3): +# - tags literals at pipeline.py:1309 / 1371 — read 2026-06-12 +# - DemoContext field block ends @260 (workspace_id) — read 2026-06-12 +# - ctx build @2625-2630; keep-branch @2634-2635 — read 2026-06-12 +# - CreateScenarioRequest.tags Field(max_length=20) @scenarios/schemas.py:203 +# — Pydantic v2 semantics: max_length on list = max ITEMS (in-repo +# precedent relies on it; re-verify on pydantic major bump with: +# uv run python -c "from pydantic import BaseModel, Field; +# class M(BaseModel): t: list[str] = Field(default_factory=list, max_length=2) +# import pydantic; M(t=['a','b','c'])" -> expect ValidationError) +# - list_plans containment @scenarios/service.py:462-465 — read 2026-06-12 +# - useScenarios(tags) encoding @use-scenarios.ts:28-38 — read 2026-06-12 +# - planner.tsx has NO useSearchParams; library card @669-762 — read 2026-06-12 +# - test_pipeline.py fake echoes tags @140; old assertion @1133 — read 2026-06-12 +# - agent_tools.py SaveScenarioRequest has NO tags @35,199 — grep-verified +``` + +## Implementation Blueprint + +### Data models and structure + +```python +# app/features/demo/pipeline.py — DemoContext addition (after workspace_id, line 260) + # E3 (#392) -- workspace label for plan tagging. Set alongside + # workspace_id in run_pipeline's keep-branch; None on ephemeral runs. + workspace_name: str | None = None + + +# app/features/demo/pipeline.py — pure helper (place near the other module +# helpers under "Helpers shared across steps", after _parse_artifact_key) +def _showcase_plan_tags(ctx: DemoContext, kind: str) -> list[str]: + """Build the tag list for a pipeline-saved scenario plan (E3, #392). + + Always: ["showcase", <kind>, "source:showcase"]. 
When the run records a + workspace (ctx.workspace_id set -- preservation="keep" AND the E1 insert + succeeded), append "workspace:<label>" where label is the human + workspace_name or, on unnamed runs, the 32-hex workspace_id -- the label + is never empty. No workspace row -> no workspace tag (nothing to find). + """ + tags = ["showcase", kind, "source:showcase"] + if ctx.workspace_id is not None: + tags.append(f"workspace:{ctx.workspace_name or ctx.workspace_id}") + return tags + + +# Call sites: +# pipeline.py:1309 "tags": ["showcase", "price"] -> "tags": _showcase_plan_tags(ctx, "price") +# pipeline.py:1371 "tags": ["showcase", "holiday"] -> "tags": _showcase_plan_tags(ctx, "holiday") +# Step data (1337-1344): add "tags": <the list sent> for observability. + +# run_pipeline keep-branch (2634-2635) becomes: + if req.preservation == "keep": + ctx.workspace_id = await workspace.create_workspace(req) + ctx.workspace_name = req.workspace_name # E3 (#392) -- plan-tag label +``` + +```typescript +// frontend/src/lib/url-params.ts — append (mirror parseEnumParam doc style) +/** + * Parse repeated `tags` query params into a clean filter list. + * + * Trims each value, drops empties, dedupes, and caps at 20 entries + * (matches the backend CreateScenarioRequest.tags item cap) so a + * hand-edited URL degrades to a sane query instead of a 50-param request. 
+ */ +export function parseTagsParam(values: string[]): string[] { + const seen = new Set<string>() + for (const value of values) { + const tag = value.trim() + if (tag) seen.add(tag) + if (seen.size >= 20) break + } + return [...seen] +} +``` + +```tsx +// frontend/src/pages/visualize/planner.tsx — wiring sketch +const [searchParams, setSearchParams] = useSearchParams() // + import +const [tagFilter, setTagFilter] = useState<string[]>(() => + parseTagsParam(searchParams.getAll('tags')) +) +const scenariosQuery = useScenarios(tagFilter) // line 132 + +function applyTagFilter(next: string[]) { + setTagFilter(next) + setSearchParams( + (prev) => { + const params = new URLSearchParams(prev) + params.delete('tags') + next.forEach((t) => params.append('tags', t)) + return params + }, + { replace: true } + ) +} +const addTag = (tag: string) => !tagFilter.includes(tag) && applyTagFilter([...tagFilter, tag]) +const removeTag = (tag: string) => applyTagFilter(tagFilter.filter((t) => t !== tag)) + +// CardHeader (671-694): under the CardDescription, when tagFilter.length > 0: +// <Badge> per active tag with an inline ✕ (lucide X, h-3 w-3) onClick=removeTag +// + a ghost "Clear" Button onClick={() => applyTagFilter([])} +// CardDescription suffix: "Filtering to plans carrying every selected tag." +// Tags cell (724-726): replace join(', ') with +// plan.tags.map(tag => <Badge variant="secondary" className="cursor-pointer" +// onClick={() => addTag(tag)}>{tag}</Badge>) (keep '—' when empty) +// Empty state (756-760): tagFilter.length > 0 +// ? "No plans match the selected tags." + Clear button +// : existing "No saved plans yet..." 
copy +``` + +### List of tasks (dependency order) + +```yaml +Task 1 — branch & issue hygiene: + RUN: git switch dev && git pull && git switch -c feat/showcase-workspace-tagged-plans + VERIFY: gh issue view 392 --json state # open + +Task 2 — MODIFY app/features/demo/pipeline.py: + - ADD DemoContext.workspace_name (blueprint above; after line 260, E3 comment) + - ADD _showcase_plan_tags helper (blueprint above; full docstring + annotations) + - REPLACE tags literal at 1309 with _showcase_plan_tags(ctx, "price"); + capture the list in a local (sent_tags) and ADD "tags": sent_tags to the + step's return data dict (1337-1344) + - REPLACE tags literal at 1371 with _showcase_plan_tags(ctx, "holiday") + - run_pipeline keep-branch (2634-2635): + ctx.workspace_name = req.workspace_name + +Task 3 — MODIFY app/features/demo/tests/test_pipeline.py: + - ADD test__showcase_plan_tags_ephemeral / _keep_named / _keep_unnamed + (pure-function tests; build DemoContext(seed=42, skip_seed=True, + reset=False) and set workspace_id/workspace_name directly) + - UPDATE test_scenario_simulate_and_save_happy_path (1094): assertion at + 1133 -> ["showcase", "price", "source:showcase"]; also assert + data["tags"] echoes the same list + - ADD test_scenario_simulate_and_save_keep_run_carries_workspace_tag: + ctx.workspace_id = "a"*32; ctx.workspace_name = "bf-demo"; + assert "workspace:bf-demo" in captured body["tags"] + - ADD a body-tags assertion to the multi-plan happy path (1268) — it has + NO tags assertion today: holiday body tags == + ["showcase", "holiday", "source:showcase"] (+ a keep-run variant + asserting the workspace tag flows to plan #2) + +Task 4 — MODIFY app/features/scenarios/tests/test_routes_integration.py: + - ADD @pytest.mark.integration test_list_scenarios_filters_by_workspace_tag: + # POST /scenarios twice (existing create-body fixture/pattern in this + # file) with tags ["showcase","price","source:showcase","workspace:e3-it"] + # and 
["showcase","holiday","source:showcase","workspace:e3-it"]; + # POST a third plan WITHOUT the workspace tag (control); + # GET /scenarios?tags=workspace:e3-it -> exactly the 2, total == 2; + # GET /scenarios?tags=workspace:e3-it&tags=price -> exactly 1 (AND); + # cleanup via DELETE /scenarios/{id} (or the file's fixture teardown) + +Task 5 — MODIFY frontend/src/lib/url-params.ts + url-params.test.ts: + - ADD parseTagsParam (blueprint above) + - Tests: empty array -> []; whitespace/empty entries dropped; duplicates + deduped; >20 entries capped at 20; passthrough of 'workspace:bf-demo' + +Task 6 — MODIFY frontend/src/pages/visualize/planner.tsx: + - Imports: useSearchParams (react-router-dom), Badge, X (lucide), + parseTagsParam + - State + applyTagFilter/addTag/removeTag + useScenarios(tagFilter) + (blueprint above) + - CardHeader chips + Clear; clickable Badge tags cell; filtered empty-state + - NOTE: 'scenarios' queryKey includes {tags} -> create/delete mutations + invalidate the prefix and the filtered view refetches — no extra work + +Task 7 — MODIFY docs/_base/API_CONTRACTS.md: + - WS /demo/stream section, planning-phase description: append + "E3 (#392) — pipeline-saved plans now carry source:showcase plus + workspace:<name|workspace_id> on preservation='keep' runs; retrievable + via GET /scenarios?tags=workspace:<label>. The + scenario_simulate_and_save step's data additively echoes `tags`." 
+ +Task 8 — gates, dogfood, commit, PR: + - Backend gates + integration suite (Validation Loop) + - Frontend: pnpm lint && pnpm test --run + - Browser dogfood via the webapp-testing skill (CLAUDE.md workflow step 4) + - git diff --stat (CRLF noise check on pipeline.py / planner.tsx) + - COMMITS (reference #392, no AI trailer), e.g.: + feat(api): tag showcase plans with workspace label (#392) + feat(ui): add tag filter to planner saved-plans library (#392) + test(api): cover workspace-tag containment round trip (#392) + docs(api): document workspace plan tags (#392) + - PR into dev; title `feat(api,ui): showcase workspace-tagged scenario plans (#392)` +``` + +### Integration Points + +```yaml +DATABASE: none — scenario_plan.tags (JSONB, GIN) exists since PRP-27. + +CONFIG: none. + +ROUTES: none — GET /scenarios?tags= already shipped. + +FRONTEND: planner.tsx only; no new React Router routes; ?tags= deep link + becomes a stable target for E4/E5 ("view this workspace's plans"). + +DOCS: docs/_base/API_CONTRACTS.md one additive note (Task 7). RUNBOOKS / + DOMAIN_MODEL sweeps belong to the E5 release gate. +``` + +## Validation Loop + +### Level 1: Syntax & Style + +```bash +uv run ruff check . && uv run ruff format --check . +uv run mypy app/ && uv run pyright app/ +cd frontend && pnpm lint +``` + +### Level 2: Unit Tests (no DB) + +```bash +uv run pytest app/features/demo -v -m "not integration" +cd frontend && pnpm test --run +# New/changed: _showcase_plan_tags cases; updated step-body assertions; +# parseTagsParam cases in url-params.test.ts. +``` + +### Level 3: Integration (real Postgres) + +```bash +docker compose up -d && uv run alembic upgrade head +uv run pytest app/features/scenarios -v -m integration # incl. the new tags round-trip +uv run pytest app/features/demo -v -m integration +``` + +### Level 4: Manual smoke + browser dogfood (seeded stack, uvicorn :8123) + +```bash +# 1. Keep-run produces workspace-tagged plans (showcase_rich saves the plans). 
+# NOTE: scenario_plan rows persist across showcase runs (reset does not wipe +# them — RUNBOOKS incident 27), so use a UNIQUE name per smoke run: +WS_NAME="e3-smoke-$(date +%s)" +curl -s -X POST http://localhost:8123/demo/run -H 'Content-Type: application/json' \ + -d "{\"seed\":42,\"reset\":true,\"skip_seed\":false,\"scenario\":\"showcase_rich\",\"preservation\":\"keep\",\"workspace_name\":\"$WS_NAME\"}" \ + | python3 -c "import sys,json; r=json.load(sys.stdin); print(r['overall_status'], r['workspace_id'])" +curl -s "http://localhost:8123/scenarios?tags=workspace:$WS_NAME" \ + | python3 -c "import sys,json; r=json.load(sys.stdin); print(r['total'], [s['name'] for s in r['scenarios']])" +# Expect: 2 ['showcase-holiday-uplift', 'showcase-price-cut-10pct'] (order: newest first) + +# 2. Browser dogfood (webapp-testing skill / agent-browser): +# /visualize/planner -> Saved plans table shows tag badges -> click +# workspace:e3-smoke -> table narrows to the 2 plans, chip appears, URL +# carries ?tags=workspace%3Ae3-smoke -> paste that URL in a fresh tab -> +# filter pre-applied -> remove chip -> full list returns. +``` + +## Final validation Checklist + +- [ ] All five gates green: `uv run ruff check . && uv run ruff format --check . && uv run mypy app/ && uv run pyright app/ && uv run pytest -v -m "not integration"` +- [ ] Integration suite green: `uv run pytest -v -m integration` (fresh docker-compose DB) +- [ ] Frontend gates green: `pnpm lint && pnpm test --run` +- [ ] Manual smoke (Level 4 step 1): keep-run plans retrievable via `?tags=workspace:<name>` +- [ ] Browser dogfood (Level 4 step 2) passes — UI verified in a real browser per `.claude/rules/ui-design.md` +- [ ] Ephemeral-run plans carry NO `workspace:` tag (unit-asserted); legacy plans unaffected +- [ ] `git diff --stat` shows surgical diffs (no CRLF whole-file noise) +- [ ] docs/_base/API_CONTRACTS.md updated additively +- [ ] Commits formatted `feat(api)/feat(ui)/test(api)/docs(api): ... 
(#392)`, no AI trailer; PR into dev + +--- + +## Anti-Patterns to Avoid + +- ❌ Don't widen step signatures to pass the request — workspace_name travels on DemoContext. +- ❌ Don't touch scenarios production code — routes/service/schemas already do everything. +- ❌ Don't tag the agent-HITL plan — SaveScenarioRequest has no tags field; cross-slice threading is a future epic. +- ❌ Don't drop the legacy `showcase`/`price`/`holiday` tags — append, never replace. +- ❌ Don't ship a migration — `scenario_plan.tags` + GIN index exist since PRP-27. +- ❌ Don't refactor planner.tsx to URL-as-source-of-truth — read once, mirror on change. +- ❌ Don't run `shadcn add` — Badge is installed. +- ❌ Don't chase pre-existing `tsc -b` errors — lint + vitest are the JS gates. + +## Confidence Score + +**9/10** for one-pass implementation success. The backend delta is a pure +helper + two literal replacements + one context field, with the exact existing +test assertions that must change already located (test_pipeline.py:1133 etc.); +the server-side filter and the frontend hook are fully built and verified, so +the UI work is pure wiring with installed components. The −1: planner.tsx is a +large untested page, so the chip/badge wiring is verified only by lint + +browser dogfood — a styling or state-sync slip there costs one iteration, not +a redesign. 
From f2809365082bc937cad790c676e9f6b29c1b818f Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Fri, 12 Jun 2026 16:59:40 +0200 Subject: [PATCH 40/44] feat(api): expose showcase workspace list and detail endpoints (#393) --- app/features/demo/routes.py | 83 ++++++++++- app/features/demo/schemas.py | 49 ++++++- app/features/demo/tests/test_routes.py | 175 +++++++++++++++++++++++- app/features/demo/tests/test_schemas.py | 96 ++++++++++++- app/features/demo/workspace.py | 21 ++- 5 files changed, 409 insertions(+), 15 deletions(-) diff --git a/app/features/demo/routes.py b/app/features/demo/routes.py index 660df652..6d3284c4 100644 --- a/app/features/demo/routes.py +++ b/app/features/demo/routes.py @@ -3,22 +3,35 @@ Exposes: - ``POST /demo/run`` -- synchronous; runs the whole pipeline, returns a result. - ``WS /demo/stream`` -- streams one StepEvent per step for the live UI. +- ``GET /demo/workspaces`` -- E4 (#393): list saved workspaces. +- ``GET /demo/workspaces/{workspace_id}`` -- E4 (#393): one workspace's detail. -Both obtain the live FastAPI app from ``request.app`` / ``websocket.app`` and -pass it into the pipeline -- the slice never imports ``app.main`` (circular). +The run/stream handlers obtain the live FastAPI app from ``request.app`` / +``websocket.app`` and pass it into the pipeline -- the slice never imports +``app.main`` (circular). The workspace GETs are the slice's first DB-dependent +routes (``Depends(get_db)``). 
""" from __future__ import annotations import json -from fastapi import APIRouter, Request, WebSocket, WebSocketDisconnect +from fastapi import APIRouter, Depends, Query, Request, WebSocket, WebSocketDisconnect from pydantic import ValidationError +from sqlalchemy.ext.asyncio import AsyncSession -from app.core.exceptions import ConflictError +from app.core.database import get_db +from app.core.exceptions import ConflictError, NotFoundError from app.core.logging import get_logger -from app.features.demo import service -from app.features.demo.schemas import DemoRunRequest, DemoRunResult, StepEvent +from app.features.demo import service, workspace +from app.features.demo.schemas import ( + DemoRunRequest, + DemoRunResult, + StepEvent, + WorkspaceDetailResponse, + WorkspaceListItem, + WorkspaceListResponse, +) logger = get_logger(__name__) @@ -54,6 +67,64 @@ async def run_demo_pipeline(request: Request, params: DemoRunRequest) -> DemoRun raise ConflictError(str(exc)) from exc +@router.get( + "/workspaces", + response_model=WorkspaceListResponse, + summary="List saved showcase workspaces", + description="List saved showcase workspaces, newest first. Returns 200 + " + "an empty list when no workspaces exist.", +) +async def list_showcase_workspaces( + db: AsyncSession = Depends(get_db), + limit: int = Query(default=20, ge=1, le=100, description="Maximum workspaces to return."), + offset: int = Query(default=0, ge=0, description="Number of workspaces to skip."), +) -> WorkspaceListResponse: + """List saved showcase workspaces (E4, issue #393). + + Args: + db: Async database session from dependency. + limit: Maximum workspaces to return (1-100). + offset: Number of workspaces to skip. + + Returns: + A page of saved workspaces plus the total count. 
+ """ + rows = await workspace.list_workspaces(db, limit=limit, offset=offset) + total = await workspace.count_workspaces(db) + return WorkspaceListResponse( + workspaces=[WorkspaceListItem.model_validate(row) for row in rows], + total=total, + ) + + +@router.get( + "/workspaces/{workspace_id}", + response_model=WorkspaceDetailResponse, + summary="Get a saved showcase workspace", + description="Fetch one saved workspace, including its created-object soft references.", +) +async def get_showcase_workspace( + workspace_id: str, + db: AsyncSession = Depends(get_db), +) -> WorkspaceDetailResponse: + """Get a saved showcase workspace by id (E4, issue #393). + + Args: + workspace_id: External identifier of the workspace. + db: Async database session from dependency. + + Returns: + The full workspace row including ``created_objects``. + + Raises: + NotFoundError: When no workspace matches ``workspace_id``. + """ + row = await workspace.get_workspace(db, workspace_id) + if row is None: + raise NotFoundError(message=f"Workspace not found: {workspace_id}") + return WorkspaceDetailResponse.model_validate(row) + + @router.websocket("/stream") async def stream_demo_pipeline(websocket: WebSocket) -> None: """Stream one StepEvent per pipeline step over a WebSocket. diff --git a/app/features/demo/schemas.py b/app/features/demo/schemas.py index e02738af..cad7d32e 100644 --- a/app/features/demo/schemas.py +++ b/app/features/demo/schemas.py @@ -8,7 +8,7 @@ from __future__ import annotations -from datetime import UTC, datetime +from datetime import UTC, date, datetime from typing import Any, Literal from pydantic import BaseModel, ConfigDict, Field, model_validator @@ -164,3 +164,50 @@ class DemoRunResult(BaseModel): default=None, description="showcase_workspace id recorded for this run, if kept.", ) + + +class WorkspaceListItem(BaseModel): + """A compact row in the saved-workspaces list (E4, issue #393). 
+ + Response model -- plain ``BaseModel`` with ``from_attributes`` (built from + ``ShowcaseWorkspace`` ORM rows), NOT ``ConfigDict(strict=True)``: strict + mode is a request-body policy (see the ``StepEvent`` precedent above). + """ + + model_config = ConfigDict(from_attributes=True) + + workspace_id: str = Field(..., description="Unique external identifier (UUID hex).") + name: str | None = Field(default=None, description="Optional human label.") + status: str = Field(..., description="Lifecycle state -- running / completed / failed.") + seed: int = Field(..., description="Seeder seed the run was started with.") + scenario: str = Field(..., description="Seeder scenario preset value.") + reset: bool = Field(..., description="Whether the run wiped the database first.") + skip_seed: bool = Field(..., description="Whether the run skipped the seed step.") + result_summary: dict[str, Any] | None = Field( + default=None, description="Winner / WAPE / wall-clock display payload." + ) + created_at: datetime = Field(..., description="When the run was recorded (UTC).") + + +class WorkspaceDetailResponse(WorkspaceListItem): + """Full workspace row incl. created objects (E4, issue #393).""" + + store_id: int | None = Field(default=None, description="Showcase grain store id.") + product_id: int | None = Field(default=None, description="Showcase grain product id.") + date_start: date | None = Field(default=None, description="Seeded window start.") + date_end: date | None = Field(default=None, description="Seeded window end.") + created_objects: dict[str, Any] = Field( + default_factory=dict, + description="Soft-reference ids of everything the run created.", + ) + + +class WorkspaceListResponse(BaseModel): + """A page of saved workspaces, newest first (E4, issue #393).""" + + model_config = ConfigDict(from_attributes=True) + + workspaces: list[WorkspaceListItem] = Field( + ..., description="Saved workspaces for the current page; empty when none." 
+ ) + total: int = Field(..., ge=0, description="Total saved workspaces.") diff --git a/app/features/demo/tests/test_routes.py b/app/features/demo/tests/test_routes.py index 5158d1ca..016049db 100644 --- a/app/features/demo/tests/test_routes.py +++ b/app/features/demo/tests/test_routes.py @@ -1,15 +1,20 @@ -"""Route tests for the demo slice (POST /demo/run + WS /demo/stream). +"""Route tests for the demo slice (POST /demo/run + WS /demo/stream + GETs). The demo service is monkeypatched so these tests exercise the route wiring -without a database or a real pipeline run. +without a database or a real pipeline run. The E4 (#393) workspace GET unit +tests monkeypatch the workspace helpers the same way; their integration +counterparts run against the real Postgres via the ``db_session`` fixture. """ +import datetime as _dt from collections.abc import AsyncIterator +from types import SimpleNamespace import pytest from fastapi.testclient import TestClient +from sqlalchemy.ext.asyncio import AsyncSession -from app.features.demo import service +from app.features.demo import service, workspace from app.features.demo.schemas import DemoRunRequest, DemoRunResult, StepEvent from app.main import app @@ -199,3 +204,167 @@ async def fake_stream(_app, params: DemoRunRequest) -> AsyncIterator[StepEvent]: assert event["event_type"] == "pipeline_complete" assert seen["params"].seed == 7 assert seen["params"].preservation == "ephemeral" + + +# ============================================================================= +# E4 (#393) -- GET /demo/workspaces + GET /demo/workspaces/{id} (unit) +# ============================================================================= + + +def _orm_like_row(workspace_id: str = "a" * 32, **overrides: object) -> SimpleNamespace: + """An ORM-shaped stand-in for a ShowcaseWorkspace row.""" + base: dict[str, object] = { + "workspace_id": workspace_id, + "name": "e4-route", + "status": "completed", + "seed": 42, + "scenario": "demo_minimal", + "reset": 
False, + "skip_seed": True, + "store_id": 3, + "product_id": 7, + "date_start": _dt.date(2026, 1, 1), + "date_end": _dt.date(2026, 3, 31), + "created_objects": {"winning_run_id": "run-abc"}, + "result_summary": {"winner_model_type": "naive"}, + "created_at": _dt.datetime(2026, 6, 1, 12, 0, tzinfo=_dt.UTC), + } + base.update(overrides) + return SimpleNamespace(**base) + + +async def test_list_workspaces_empty(client, monkeypatch): + """E4 (#393) -- empty table yields 200 + an empty page (no 404).""" + + async def fake_list(_db, *, limit: int, offset: int) -> list[SimpleNamespace]: + return [] + + async def fake_count(_db) -> int: + return 0 + + monkeypatch.setattr(workspace, "list_workspaces", fake_list) + monkeypatch.setattr(workspace, "count_workspaces", fake_count) + + resp = await client.get("/demo/workspaces") + assert resp.status_code == 200 + assert resp.json() == {"workspaces": [], "total": 0} + + +async def test_list_workspaces_passes_pagination(client, monkeypatch): + """E4 (#393) -- limit/offset query params reach the helper.""" + seen: dict[str, int] = {} + + async def fake_list(_db, *, limit: int, offset: int) -> list[SimpleNamespace]: + seen["limit"] = limit + seen["offset"] = offset + return [_orm_like_row()] + + async def fake_count(_db) -> int: + return 5 + + monkeypatch.setattr(workspace, "list_workspaces", fake_list) + monkeypatch.setattr(workspace, "count_workspaces", fake_count) + + resp = await client.get("/demo/workspaces", params={"limit": 2, "offset": 3}) + assert resp.status_code == 200 + assert seen == {"limit": 2, "offset": 3} + body = resp.json() + assert body["total"] == 5 + assert body["workspaces"][0]["workspace_id"] == "a" * 32 + # List items are compact -- no created_objects on the page payload. 
+ assert "created_objects" not in body["workspaces"][0] + + +async def test_list_workspaces_rejects_bad_pagination(client): + """E4 (#393) -- out-of-range limit/offset are 422 problem+json.""" + resp = await client.get("/demo/workspaces", params={"limit": 0}) + assert resp.status_code == 422 + resp = await client.get("/demo/workspaces", params={"offset": -1}) + assert resp.status_code == 422 + + +async def test_get_workspace_404(client, monkeypatch): + """E4 (#393) -- unknown workspace_id is a 404 problem+json.""" + + async def fake_get(_db, _workspace_id: str) -> None: + return None + + monkeypatch.setattr(workspace, "get_workspace", fake_get) + + resp = await client.get("/demo/workspaces/" + "0" * 32) + assert resp.status_code == 404 + assert resp.headers["content-type"].startswith("application/problem+json") + assert "Workspace not found" in resp.json()["detail"] + + +async def test_get_workspace_success(client, monkeypatch): + """E4 (#393) -- detail fields round-trip incl. created_objects + grain.""" + + async def fake_get(_db, workspace_id: str) -> SimpleNamespace: + return _orm_like_row(workspace_id=workspace_id) + + monkeypatch.setattr(workspace, "get_workspace", fake_get) + + resp = await client.get("/demo/workspaces/" + "b" * 32) + assert resp.status_code == 200 + body = resp.json() + assert body["workspace_id"] == "b" * 32 + assert body["created_objects"] == {"winning_run_id": "run-abc"} + assert body["store_id"] == 3 + assert body["product_id"] == 7 + assert body["date_start"] == "2026-01-01" + assert body["date_end"] == "2026-03-31" + + +# ============================================================================= +# E4 (#393) -- workspace GET routes against real Postgres (integration) +# ============================================================================= + + +@pytest.mark.integration +async def test_list_workspaces_integration_newest_first(client, db_session: AsyncSession): + """Seeded rows list newest-first with the right total.""" + ids: 
list[str] = [] + for index in range(3): + workspace_id = await workspace.create_workspace( + DemoRunRequest.model_validate( + {"preservation": "keep", "workspace_name": f"e4-it-{index}"} + ) + ) + assert workspace_id is not None + ids.append(workspace_id) + + resp = await client.get("/demo/workspaces") + assert resp.status_code == 200 + body = resp.json() + assert body["total"] == 3 + assert [w["workspace_id"] for w in body["workspaces"]] == list(reversed(ids)) + assert body["workspaces"][0]["name"] == "e4-it-2" + + paged = await client.get("/demo/workspaces", params={"limit": 1, "offset": 1}) + assert paged.status_code == 200 + paged_body = paged.json() + assert paged_body["total"] == 3 + assert [w["workspace_id"] for w in paged_body["workspaces"]] == [ids[1]] + + +@pytest.mark.integration +async def test_get_workspace_integration_round_trip(client, db_session: AsyncSession): + """created_objects JSONB round-trips through the detail endpoint.""" + workspace_id = await workspace.create_workspace( + DemoRunRequest.model_validate({"preservation": "keep", "workspace_name": "e4-it-detail"}) + ) + assert workspace_id is not None + + resp = await client.get(f"/demo/workspaces/{workspace_id}") + assert resp.status_code == 200 + body = resp.json() + assert body["workspace_id"] == workspace_id + assert body["name"] == "e4-it-detail" + assert body["status"] == "running" + assert body["created_objects"] == {} + assert body["result_summary"] is None + + missing = await client.get("/demo/workspaces/" + "f" * 32) + assert missing.status_code == 404 + assert missing.headers["content-type"].startswith("application/problem+json") diff --git a/app/features/demo/tests/test_schemas.py b/app/features/demo/tests/test_schemas.py index bdbfaac3..c4e120f2 100644 --- a/app/features/demo/tests/test_schemas.py +++ b/app/features/demo/tests/test_schemas.py @@ -1,9 +1,19 @@ """Unit tests for demo slice schemas.""" +import datetime as _dt +from types import SimpleNamespace + import pytest from 
pydantic import ValidationError -from app.features.demo.schemas import DemoRunRequest, DemoRunResult, StepEvent +from app.features.demo.schemas import ( + DemoRunRequest, + DemoRunResult, + StepEvent, + WorkspaceDetailResponse, + WorkspaceListItem, + WorkspaceListResponse, +) from app.shared.seeder.config import ScenarioPreset @@ -181,3 +191,87 @@ def test_demo_run_result_defaults(): assert result.wall_clock_s == 0.0 # E1 (#390) -- additive Optional field defaults to None (ephemeral runs). assert result.workspace_id is None + + +# ============================================================================= +# E4 (#393) -- workspace response models +# ============================================================================= + + +def _orm_like_workspace_row(**overrides: object) -> SimpleNamespace: + """An ORM-shaped stand-in for a ShowcaseWorkspace row (from_attributes).""" + base: dict[str, object] = { + "workspace_id": "a" * 32, + "name": "e4-demo", + "status": "completed", + "seed": 42, + "scenario": "demo_minimal", + "reset": False, + "skip_seed": True, + "store_id": 3, + "product_id": 7, + "date_start": _dt.date(2026, 1, 1), + "date_end": _dt.date(2026, 3, 31), + "created_objects": {"winning_run_id": "run-abc", "scenario_plan_ids": ["sp-1"]}, + "result_summary": {"winner_model_type": "naive", "winner_wape": 0.2}, + "created_at": _dt.datetime(2026, 6, 1, 12, 0, tzinfo=_dt.UTC), + } + base.update(overrides) + return SimpleNamespace(**base) + + +def test_workspace_list_item_from_attributes_round_trip(): + """E4 (#393) -- list item builds from an ORM-shaped row.""" + item = WorkspaceListItem.model_validate(_orm_like_workspace_row()) + assert item.workspace_id == "a" * 32 + assert item.name == "e4-demo" + assert item.status == "completed" + assert item.seed == 42 + assert item.scenario == "demo_minimal" + assert item.reset is False + assert item.skip_seed is True + assert item.result_summary == {"winner_model_type": "naive", "winner_wape": 0.2} + + +def 
test_workspace_detail_carries_created_objects_verbatim(): + """E4 (#393) -- detail model passes created_objects + grain through untouched.""" + detail = WorkspaceDetailResponse.model_validate(_orm_like_workspace_row()) + assert detail.created_objects == { + "winning_run_id": "run-abc", + "scenario_plan_ids": ["sp-1"], + } + assert detail.store_id == 3 + assert detail.product_id == 7 + assert detail.date_start == _dt.date(2026, 1, 1) + assert detail.date_end == _dt.date(2026, 3, 31) + + +def test_workspace_detail_tolerates_running_row_nulls(): + """E4 (#393) -- a still-running row (NULL grain/summary) validates.""" + detail = WorkspaceDetailResponse.model_validate( + _orm_like_workspace_row( + status="running", + name=None, + store_id=None, + product_id=None, + date_start=None, + date_end=None, + created_objects={}, + result_summary=None, + ) + ) + assert detail.status == "running" + assert detail.name is None + assert detail.created_objects == {} + assert detail.result_summary is None + + +def test_workspace_list_response_shape(): + """E4 (#393) -- page shape mirrors the scenarios list (items + total).""" + item = WorkspaceListItem.model_validate(_orm_like_workspace_row()) + page = WorkspaceListResponse(workspaces=[item], total=1) + dumped = page.model_dump(mode="json") + assert dumped["total"] == 1 + assert dumped["workspaces"][0]["workspace_id"] == "a" * 32 + # ISO serialization on the wire. + assert isinstance(dumped["workspaces"][0]["created_at"], str) diff --git a/app/features/demo/workspace.py b/app/features/demo/workspace.py index 44e8b475..40b20807 100644 --- a/app/features/demo/workspace.py +++ b/app/features/demo/workspace.py @@ -12,9 +12,9 @@ :func:`finalize_workspace` swallows any error. Both log a structured warning (pattern: the ``app/main.py`` lifespan config-override load). -:func:`get_workspace` / :func:`list_workspaces` are unrouted in E1 -- consumed -by the integration tests now and by the E4 restore/replay routes later -(epic #393). 
+:func:`get_workspace` / :func:`list_workspaces` / :func:`count_workspaces` are +routed since E4 (epic #393) by ``GET /demo/workspaces`` and +``GET /demo/workspaces/{workspace_id}`` in ``app/features/demo/routes.py``. """ from __future__ import annotations @@ -22,7 +22,7 @@ import uuid from typing import TYPE_CHECKING, Any -from sqlalchemy import select +from sqlalchemy import func, select from sqlalchemy.ext.asyncio import AsyncSession from app.core.database import get_session_maker @@ -193,3 +193,16 @@ async def list_workspaces( .offset(offset) ) return list(result.scalars().all()) + + +async def count_workspaces(db: AsyncSession) -> int: + """Count all workspace rows (E4, issue #393). + + Args: + db: An open async session (caller-owned). + + Returns: + The total number of saved workspaces. + """ + count_stmt = select(func.count()).select_from(ShowcaseWorkspace) + return int(await db.scalar(count_stmt) or 0) From 67dac81e3b97ba18271a909185b00bab2061991f Mon Sep 17 00:00:00 2001 From: Gabor Szabo <shellsnake@icloud.com> Date: Fri, 12 Jun 2026 16:59:40 +0200 Subject: [PATCH 41/44] feat(ui): add workspace restore and replay to showcase page (#393) --- .../demo/InspectArtifactsPanel.test.tsx | 1 + .../components/demo/RunHistoryStrip.test.tsx | 19 +++ .../src/components/demo/RunHistoryStrip.tsx | 4 +- .../demo/WorkspaceArtifactsPanel.test.tsx | 92 ++++++++++ .../demo/WorkspaceArtifactsPanel.tsx | 157 ++++++++++++++++++ .../components/demo/WorkspacePanel.test.tsx | 105 ++++++++++++ .../src/components/demo/WorkspacePanel.tsx | 129 ++++++++++++++ frontend/src/components/demo/index.ts | 3 + frontend/src/hooks/index.ts | 1 + frontend/src/hooks/use-demo-pipeline.test.ts | 22 +++ frontend/src/hooks/use-demo-pipeline.ts | 3 + frontend/src/hooks/use-workspaces.ts | 25 +++ frontend/src/pages/showcase.tsx | 141 +++++++++++++++- frontend/src/types/api.ts | 36 ++++ 14 files changed, 734 insertions(+), 4 deletions(-) create mode 100644 
frontend/src/components/demo/WorkspaceArtifactsPanel.test.tsx create mode 100644 frontend/src/components/demo/WorkspaceArtifactsPanel.tsx create mode 100644 frontend/src/components/demo/WorkspacePanel.test.tsx create mode 100644 frontend/src/components/demo/WorkspacePanel.tsx create mode 100644 frontend/src/hooks/use-workspaces.ts diff --git a/frontend/src/components/demo/InspectArtifactsPanel.test.tsx b/frontend/src/components/demo/InspectArtifactsPanel.test.tsx index 4a692dfb..4c0e5d57 100644 --- a/frontend/src/components/demo/InspectArtifactsPanel.test.tsx +++ b/frontend/src/components/demo/InspectArtifactsPanel.test.tsx @@ -25,6 +25,7 @@ const baseSummary: DemoSummary = { alias: 'demo-production', wallClockS: 180, v2RunId: 'v2-456', + workspaceId: null, } describe('InspectArtifactsPanel', () => { diff --git a/frontend/src/components/demo/RunHistoryStrip.test.tsx b/frontend/src/components/demo/RunHistoryStrip.test.tsx index 5f74e422..92593ef8 100644 --- a/frontend/src/components/demo/RunHistoryStrip.test.tsx +++ b/frontend/src/components/demo/RunHistoryStrip.test.tsx @@ -23,6 +23,7 @@ const summary: DemoSummary = { alias: 'demo-production', wallClockS: 174.5, v2RunId: 'v2-456', + workspaceId: null, } describe('RunHistoryStrip', () => { @@ -108,6 +109,24 @@ describe('RunHistoryStrip', () => { ) }) + it('E4 (#393) — does NOT append a kept run (workspaceId set)', () => { + const keptSummary: DemoSummary = { ...summary, workspaceId: 'ws-e4-abc' } + const { container } = render( + <RunHistoryStrip onReplay={() => {}} summary={keptSummary} scenario="showcase_rich" />, + ) + // Server-backed WorkspacePanel owns kept runs; localStorage stays empty + // (the persist effect writes the initial '[]') and the strip renders nothing. + expect(JSON.parse(window.localStorage.getItem(STORAGE_KEY) ?? 
'[]')).toHaveLength(0) + expect(container.firstChild).toBeNull() + }) + + it('E4 (#393) — still appends an ephemeral run (workspaceId null)', () => { + render(<RunHistoryStrip onReplay={() => {}} summary={summary} scenario="demo_minimal" />) + const stored = window.localStorage.getItem(STORAGE_KEY) + expect(stored).not.toBeNull() + expect(JSON.parse(stored!)).toHaveLength(1) + }) + it('Clear button empties history + localStorage', () => { const { container } = render( <RunHistoryStrip onReplay={() => {}} summary={summary} scenario="demo_minimal" />, diff --git a/frontend/src/components/demo/RunHistoryStrip.tsx b/frontend/src/components/demo/RunHistoryStrip.tsx index 39addd31..98629380 100644 --- a/frontend/src/components/demo/RunHistoryStrip.tsx +++ b/frontend/src/components/demo/RunHistoryStrip.tsx @@ -68,7 +68,9 @@ export function RunHistoryStrip({ onReplay, summary, scenario }: RunHistoryStrip // (the React "storing information from previous renders" pattern) rather than // in an effect — calling setState synchronously inside an effect body causes // cascading renders and is flagged by react-hooks/set-state-in-effect. - if (summary && summary !== lastSummary) { + // E4 (#393) — kept runs (workspaceId != null) are owned by the server-backed + // WorkspacePanel; localStorage records ephemeral runs only. 
+ if (summary && summary !== lastSummary && summary.workspaceId === null) { setLastSummary(summary) setItems((prev) => [ diff --git a/frontend/src/components/demo/WorkspaceArtifactsPanel.test.tsx b/frontend/src/components/demo/WorkspaceArtifactsPanel.test.tsx new file mode 100644 index 00000000..8d1e60ce --- /dev/null +++ b/frontend/src/components/demo/WorkspaceArtifactsPanel.test.tsx @@ -0,0 +1,92 @@ +import { cleanup, render } from '@testing-library/react' +import { afterEach, describe, expect, it } from 'vitest' +import { MemoryRouter } from 'react-router-dom' +import { WorkspaceArtifactsPanel } from './WorkspaceArtifactsPanel' +import type { WorkspaceDetail } from '@/types/api' + +afterEach(() => cleanup()) + +const fullWorkspace: WorkspaceDetail = { + workspace_id: 'a'.repeat(32), + name: 'e4-artifacts', + status: 'completed', + seed: 42, + scenario: 'showcase_rich', + reset: false, + skip_seed: true, + result_summary: { winner_model_type: 'prophet_like' }, + created_at: '2026-06-01T12:00:00Z', + store_id: 3, + product_id: 7, + date_start: '2026-01-01', + date_end: '2026-03-31', + created_objects: { + winning_run_id: 'run-win', + v2_run_id: 'run-v2', + batch_id: 'batch-1', + alias: 'demo-production', + agent_session_id: 'sess-1', + scenario_plan_ids: ['sp-1', 'sp-2'], + }, +} + +function renderPanel(workspace: WorkspaceDetail) { + return render( + <MemoryRouter> + <WorkspaceArtifactsPanel workspace={workspace} /> + </MemoryRouter>, + ) +} + +describe('WorkspaceArtifactsPanel', () => { + it('renders deep links for every recorded object', () => { + const { container } = renderPanel(fullWorkspace) + const hrefs = Array.from(container.querySelectorAll('a')).map((a) => + a.getAttribute('href'), + ) + expect(hrefs).toContain('/explorer/runs/run-win') + expect(hrefs).toContain('/explorer/runs/run-v2') + expect(hrefs).toContain('/visualize/planner?scenario_id=sp-1') + expect(hrefs).toContain('/visualize/planner?scenario_id=sp-2') + 
expect(hrefs).toContain('/visualize/batch/batch-1') + expect(hrefs).toContain('/ops') + expect(hrefs).toContain('/visualize/forecast?store_id=3&product_id=7') + expect(hrefs).toContain('/visualize/backtest?store_id=3&product_id=7') + expect(hrefs).toContain('/chat') + expect(container.textContent).toContain('e4-artifacts') + }) + + it('renders disabled cards (no links) when objects are missing', () => { + const empty: WorkspaceDetail = { + ...fullWorkspace, + name: null, + store_id: null, + product_id: null, + created_objects: {}, + } + const { container } = renderPanel(empty) + // Nothing recorded -> no active links at all. + expect(container.querySelectorAll('a').length).toBe(0) + // Disabled cards still render their labels (with the id-slice header). + expect(container.textContent).toContain('Winning run') + expect(container.textContent).toContain('Scenario plans') + expect(container.textContent).toContain('aaaaaaaa') + }) + + it('tolerates malformed created_objects values', () => { + const malformed: WorkspaceDetail = { + ...fullWorkspace, + created_objects: { + winning_run_id: 123, // wrong type -> treated as missing + scenario_plan_ids: 'not-a-list', + }, + } + const { container } = renderPanel(malformed) + const hrefs = Array.from(container.querySelectorAll('a')).map((a) => + a.getAttribute('href'), + ) + expect(hrefs).not.toContain('/explorer/runs/123') + // Grain links still resolve from the columns. + expect(hrefs).toContain('/visualize/forecast?store_id=3&product_id=7') + }) +}) diff --git a/frontend/src/components/demo/WorkspaceArtifactsPanel.tsx b/frontend/src/components/demo/WorkspaceArtifactsPanel.tsx new file mode 100644 index 00000000..255d62fa --- /dev/null +++ b/frontend/src/components/demo/WorkspaceArtifactsPanel.tsx @@ -0,0 +1,157 @@ +/** + * E4 (#393) — re-attach deep-link card grid for a LOADED workspace. 
+ * + * Mirrors InspectArtifactsPanel's card shape but reads the persisted + * `created_objects` soft references + grain columns from the workspace row + * instead of live step.data — the run is long gone; the row is the memory. + */ + +import { Link } from 'react-router-dom' +import { ArrowUpRight } from 'lucide-react' +import { Card, CardContent } from '@/components/ui/card' +import { ROUTES } from '@/lib/constants' +import type { WorkspaceDetail } from '@/types/api' + +interface ArtifactCard { + label: string + blurb: string + href: string | null + disabledReason?: string +} + +interface WorkspaceArtifactsPanelProps { + workspace: WorkspaceDetail +} + +function asString(value: unknown): string | null { + return typeof value === 'string' && value.length > 0 ? value : null +} + +function buildCards(ws: WorkspaceDetail): ArtifactCard[] { + const objects = ws.created_objects + const winningRunId = asString(objects.winning_run_id) + const v2RunId = asString(objects.v2_run_id) + const batchId = asString(objects.batch_id) + const alias = asString(objects.alias) + const sessionId = asString(objects.agent_session_id) + const planIds = Array.isArray(objects.scenario_plan_ids) + ? objects.scenario_plan_ids.filter((id): id is string => typeof id === 'string') + : [] + const hasGrain = ws.store_id !== null && ws.product_id !== null + + const cards: ArtifactCard[] = [] + + cards.push({ + label: 'Winning run', + blurb: 'Registry detail for the run this workspace promoted.', + href: winningRunId ? `${ROUTES.EXPLORER.RUNS}/${winningRunId}` : null, + disabledReason: 'The run never registered a winner.', + }) + cards.push({ + label: 'V2 feature-frame run', + blurb: 'The prophet_like V2 run with feature groups + safety classes.', + href: v2RunId ? 
`${ROUTES.EXPLORER.RUNS}/${v2RunId}` : null, + disabledReason: 'No V2 run recorded (demo_minimal flow or v2_train skipped).', + }) + planIds.forEach((planId, index) => { + cards.push({ + label: `Scenario plan ${index + 1}`, + blurb: 'Saved what-if plan from the planning phase.', + href: `${ROUTES.VISUALIZE.PLANNER}?scenario_id=${planId}`, + }) + }) + if (planIds.length === 0) { + cards.push({ + label: 'Scenario plans', + blurb: 'Saved what-if plans from the planning phase.', + href: null, + disabledReason: 'No plans recorded (planning phase skipped or failed).', + }) + } + cards.push({ + label: 'Portfolio batch', + blurb: 'Run-by-run results for the batch preset sweep.', + href: batchId ? `${ROUTES.VISUALIZE.BATCH}/${batchId}` : null, + disabledReason: 'No batch recorded (demo_minimal flow or batch skipped).', + }) + cards.push({ + label: 'Deployment alias', + blurb: alias ? `Ops view of the ${alias} alias.` : 'Ops view of aliases.', + href: alias ? ROUTES.OPS : null, + disabledReason: 'No alias recorded.', + }) + cards.push({ + label: 'Forecast on grain', + blurb: 'Visualize the trained model on the recorded showcase grain.', + href: hasGrain + ? `${ROUTES.VISUALIZE.FORECAST}?store_id=${ws.store_id}&product_id=${ws.product_id}` + : null, + disabledReason: 'The run failed before a grain was discovered.', + }) + cards.push({ + label: 'Backtest on grain', + blurb: 'Horizon-bucket metrics on the recorded showcase grain.', + href: hasGrain + ? `${ROUTES.VISUALIZE.BACKTEST}?store_id=${ws.store_id}&product_id=${ws.product_id}` + : null, + disabledReason: 'The run failed before a grain was discovered.', + }) + cards.push({ + label: 'Agent session', + blurb: 'The chat surface — the recorded session has likely expired.', + href: sessionId ? 
ROUTES.CHAT : null, + disabledReason: 'No agent session recorded (no LLM key or step skipped).', + }) + + return cards +} + +export function WorkspaceArtifactsPanel({ workspace }: WorkspaceArtifactsPanelProps) { + const cards = buildCards(workspace) + return ( + <Card> + <CardContent className="space-y-3 p-4"> + <h2 className="text-lg font-semibold"> + Workspace artifacts + <span className="ml-2 font-mono text-sm text-muted-foreground"> + {workspace.name ?? workspace.workspace_id.slice(0, 8)} + </span> + </h2> + <p className="text-sm text-muted-foreground"> + Everything this kept run created, re-attached from its workspace row. + Cards greyed out when the run did not record the matching object. + </p> + <div className="grid grid-cols-2 gap-3 lg:grid-cols-4"> + {cards.map((card) => { + const isActive = typeof card.href === 'string' && card.href.length > 0 + return ( + <div + key={card.label} + className={isActive ? '' : 'opacity-50'} + title={isActive ? undefined : card.disabledReason} + > + {isActive ? 
( + <Link + to={card.href!} + className="block h-full rounded-md border p-3 transition-colors hover:bg-muted" + > + <div className="flex items-center justify-between gap-1"> + <span className="text-sm font-semibold">{card.label}</span> + <ArrowUpRight className="h-3 w-3 shrink-0" /> + </div> + <p className="mt-1 text-xs text-muted-foreground">{card.blurb}</p> + </Link> + ) : ( + <div className="block h-full cursor-not-allowed rounded-md border p-3"> + <div className="text-sm font-semibold">{card.label}</div> + <p className="mt-1 text-xs text-muted-foreground">{card.blurb}</p> + </div> + )} + </div> + ) + })} + </div> + </CardContent> + </Card> + ) +} diff --git a/frontend/src/components/demo/WorkspacePanel.test.tsx b/frontend/src/components/demo/WorkspacePanel.test.tsx new file mode 100644 index 00000000..2d08aa40 --- /dev/null +++ b/frontend/src/components/demo/WorkspacePanel.test.tsx @@ -0,0 +1,105 @@ +import { QueryClient, QueryClientProvider } from '@tanstack/react-query' +import { cleanup, fireEvent, render } from '@testing-library/react' +import { afterEach, describe, expect, it, vi } from 'vitest' +import { WorkspacePanel } from './WorkspacePanel' +import type { WorkspaceListItem, WorkspaceListResponse } from '@/types/api' + +afterEach(() => { + cleanup() + vi.clearAllMocks() +}) + +const baseItem: WorkspaceListItem = { + workspace_id: 'a'.repeat(32), + name: 'e4-panel', + status: 'completed', + seed: 7, + scenario: 'demo_minimal', + reset: false, + skip_seed: true, + result_summary: { winner_model_type: 'seasonal_naive' }, + created_at: '2026-06-01T12:00:00Z', +} + +let mockResponse: { data: WorkspaceListResponse | undefined; isLoading: boolean } = { + data: undefined, + isLoading: false, +} + +vi.mock('@/hooks/use-workspaces', () => ({ + useWorkspaces: () => mockResponse, +})) + +function renderPanel(props: Partial<Parameters<typeof WorkspacePanel>[0]> = {}) { + const queryClient = new QueryClient({ defaultOptions: { queries: { retry: false } } }) + return 
render( + <QueryClientProvider client={queryClient}> + <WorkspacePanel + onLoad={() => {}} + onReplay={() => {}} + isRunning={false} + lastWorkspaceId={null} + {...props} + /> + </QueryClientProvider>, + ) +} + +describe('WorkspacePanel', () => { + it('renders the discoverable empty state (panel never hidden)', () => { + mockResponse = { data: { workspaces: [], total: 0 }, isLoading: false } + const { container } = renderPanel() + expect(container.textContent).toContain('Saved workspaces') + expect(container.textContent).toContain('No saved workspaces yet') + }) + + it('renders a workspace row with name, scenario, seed, status, and winner', () => { + mockResponse = { data: { workspaces: [baseItem], total: 1 }, isLoading: false } + const { container } = renderPanel() + expect(container.textContent).toContain('e4-panel') + expect(container.textContent).toContain('demo_minimal') + expect(container.textContent).toContain('seed 7') + expect(container.textContent).toContain('COMPLETED') + expect(container.textContent).toContain('winner seasonal_naive') + // No destructive badge on a reset=false row. 
+ expect(container.textContent).not.toContain('DESTRUCTIVE') + }) + + it('shows the destructive badge on reset=true rows', () => { + mockResponse = { + data: { workspaces: [{ ...baseItem, reset: true }], total: 1 }, + isLoading: false, + } + const { container } = renderPanel() + expect(container.textContent).toContain('DESTRUCTIVE') + }) + + it('falls back to the workspace_id slice when the row is unnamed', () => { + mockResponse = { + data: { workspaces: [{ ...baseItem, name: null }], total: 1 }, + isLoading: false, + } + const { container } = renderPanel() + expect(container.textContent).toContain('aaaaaaaa') + }) + + it('invokes onLoad / onReplay with the list item', () => { + mockResponse = { data: { workspaces: [baseItem], total: 1 }, isLoading: false } + const onLoad = vi.fn() + const onReplay = vi.fn() + const { container } = renderPanel({ onLoad, onReplay }) + const buttons = Array.from(container.querySelectorAll('button')) + fireEvent.click(buttons.find((b) => (b.textContent ?? '').includes('Load'))!) + expect(onLoad).toHaveBeenCalledWith(baseItem) + fireEvent.click(buttons.find((b) => (b.textContent ?? '').includes('Replay'))!) + expect(onReplay).toHaveBeenCalledWith(baseItem) + }) + + it('disables both actions while a run is in flight', () => { + mockResponse = { data: { workspaces: [baseItem], total: 1 }, isLoading: false } + const { container } = renderPanel({ isRunning: true }) + const buttons = Array.from(container.querySelectorAll('button')) + expect(buttons.length).toBeGreaterThanOrEqual(2) + expect(buttons.every((b) => b.disabled)).toBe(true) + }) +}) diff --git a/frontend/src/components/demo/WorkspacePanel.tsx b/frontend/src/components/demo/WorkspacePanel.tsx new file mode 100644 index 00000000..6638b597 --- /dev/null +++ b/frontend/src/components/demo/WorkspacePanel.tsx @@ -0,0 +1,129 @@ +/** + * E4 (#393) — server-backed saved-workspaces panel for the Showcase page. 
+ * + * Lists `showcase_workspace` rows (newest first) with two actions per row: + * - Load — re-attach: the page repopulates the run controls + renders the + * artifact deep-link cards. Read-only; no run starts. + * - Replay — re-run: the page re-submits the recorded config verbatim through + * the existing WS run path with preservation="keep". + * + * The panel stays dumb: it hands the LIST item to the page callbacks; detail + * fetching (created_objects) lives in the page via useWorkspace. + */ + +import { useEffect } from 'react' +import { useQueryClient } from '@tanstack/react-query' +import { FolderOpen, Play } from 'lucide-react' +import { Button } from '@/components/ui/button' +import { Card, CardContent } from '@/components/ui/card' +import { useWorkspaces } from '@/hooks/use-workspaces' +import type { WorkspaceListItem } from '@/types/api' + +interface WorkspacePanelProps { + /** Called when the operator clicks Load — restore config + artifacts, no run. */ + onLoad: (ws: WorkspaceListItem) => void + /** Called when the operator clicks Replay — re-run the recorded config. */ + onReplay: (ws: WorkspaceListItem) => void + /** Disables both actions while a pipeline run is in flight. */ + isRunning: boolean + /** summary.workspaceId of the latest kept run — triggers a list refetch. */ + lastWorkspaceId: string | null +} + +function statusClass(status: WorkspaceListItem['status']): string { + switch (status) { + case 'completed': + return 'text-success font-semibold' + case 'failed': + return 'text-destructive font-semibold' + default: + return 'text-muted-foreground font-semibold' + } +} + +function winnerOf(ws: WorkspaceListItem): string | null { + const winner = ws.result_summary?.winner_model_type + return typeof winner === 'string' ? 
winner : null +} + +export function WorkspacePanel({ onLoad, onReplay, isRunning, lastWorkspaceId }: WorkspacePanelProps) { + const { data, isLoading } = useWorkspaces() + const queryClient = useQueryClient() + + // Refetch the list once the latest kept run settles — syncing React state to + // an external system (the server-backed list) is the sanctioned effect use. + useEffect(() => { + if (lastWorkspaceId) { + void queryClient.invalidateQueries({ queryKey: ['workspaces'] }) + } + }, [lastWorkspaceId, queryClient]) + + const items = data?.workspaces ?? [] + + return ( + <Card> + <CardContent className="space-y-3 p-4"> + <div className="flex items-center justify-between"> + <h2 className="text-sm font-semibold">Saved workspaces</h2> + {data && data.total > items.length && ( + <span className="text-xs text-muted-foreground"> + showing {items.length} of {data.total} + </span> + )} + </div> + {items.length === 0 ? ( + <p className="text-sm text-muted-foreground"> + {isLoading + ? 'Loading workspaces…' + : 'No saved workspaces yet — tick "Save as workspace" before a run to keep it.'} + </p> + ) : ( + <ul className="space-y-2"> + {items.map((ws) => ( + <li + key={ws.workspace_id} + className="flex flex-wrap items-center justify-between gap-2 rounded-md border px-3 py-2 text-xs" + > + <div className="flex flex-wrap items-center gap-3 font-mono"> + <span className="font-semibold">{ws.name ?? 
ws.workspace_id.slice(0, 8)}</span> + <span className="rounded bg-muted px-2 py-0.5">{ws.scenario}</span> + <span>seed {ws.seed}</span> + <span className={statusClass(ws.status)}>{ws.status.toUpperCase()}</span> + {winnerOf(ws) && <span>winner {winnerOf(ws)}</span>} + {ws.reset && ( + <span className="text-destructive"> + DESTRUCTIVE (replay wipes all data) + </span> + )} + <span className="text-muted-foreground"> + {new Date(ws.created_at).toLocaleString()} + </span> + </div> + <div className="flex items-center gap-2"> + <Button + size="sm" + variant="outline" + disabled={isRunning} + onClick={() => onLoad(ws)} + > + <FolderOpen className="mr-1 h-3 w-3" /> + Load + </Button> + <Button + size="sm" + variant="outline" + disabled={isRunning} + onClick={() => onReplay(ws)} + > + <Play className="mr-1 h-3 w-3" /> + Replay + </Button> + </div> + </li> + ))} + </ul> + )} + </CardContent> + </Card> + ) +} diff --git a/frontend/src/components/demo/index.ts b/frontend/src/components/demo/index.ts index 48f34346..ccfe7b71 100644 --- a/frontend/src/components/demo/index.ts +++ b/frontend/src/components/demo/index.ts @@ -1 +1,4 @@ export * from './demo-step-card' +// E4 (#393) — showcase workspace restore/replay panels. 
+export * from './WorkspacePanel' +export * from './WorkspaceArtifactsPanel' diff --git a/frontend/src/hooks/index.ts b/frontend/src/hooks/index.ts index eebde40d..fb3e6aa7 100644 --- a/frontend/src/hooks/index.ts +++ b/frontend/src/hooks/index.ts @@ -14,3 +14,4 @@ export * from './use-rag-sources' export * from './use-websocket' export * from './use-seeder' export * from './use-demo-pipeline' +export * from './use-workspaces' diff --git a/frontend/src/hooks/use-demo-pipeline.test.ts b/frontend/src/hooks/use-demo-pipeline.test.ts index 135e463e..f06dd5d5 100644 --- a/frontend/src/hooks/use-demo-pipeline.test.ts +++ b/frontend/src/hooks/use-demo-pipeline.test.ts @@ -107,9 +107,31 @@ describe('applyEvent', () => { alias: 'demo-production', wallClockS: 42, v2RunId: null, + workspaceId: null, }) }) + it('E4 (#393) — captures workspace_id from pipeline_complete data', () => { + const next = applyEvent( + initialState(), + makeEvent({ + event_type: 'pipeline_complete', + step_name: 'summary', + status: 'pass', + data: { workspace_id: 'ws-e4-abc' }, + }) + ) + expect(next.summary?.workspaceId).toBe('ws-e4-abc') + }) + + it('E4 (#393) — legacy pipeline_complete without workspace_id yields null', () => { + const next = applyEvent( + initialState(), + makeEvent({ event_type: 'pipeline_complete', step_name: 'summary', status: 'pass', data: {} }) + ) + expect(next.summary?.workspaceId).toBeNull() + }) + it('reports a failed pipeline_complete as fail', () => { const next = applyEvent( initialState(), diff --git a/frontend/src/hooks/use-demo-pipeline.ts b/frontend/src/hooks/use-demo-pipeline.ts index 328bcd53..f29db921 100644 --- a/frontend/src/hooks/use-demo-pipeline.ts +++ b/frontend/src/hooks/use-demo-pipeline.ts @@ -30,6 +30,8 @@ export interface DemoSummary { wallClockS: number /** PRP-41 — populated when the SHOWCASE_RICH v2_train step registered a run. */ v2RunId: string | null + /** E4 (#393) — populated when the run was kept as a showcase workspace. 
*/ + workspaceId: string | null } export interface DemoPipelineState { @@ -123,6 +125,7 @@ export function applyEvent(state: DemoPipelineState, event: StepEvent): DemoPipe alias: toStringOrNull(event.data.alias), wallClockS: toNumber(event.data.wall_clock_s) ?? 0, v2RunId: toStringOrNull(event.data.v2_run_id), + workspaceId: toStringOrNull(event.data.workspace_id), } return { ...state, phase: 'done', summary } } diff --git a/frontend/src/hooks/use-workspaces.ts b/frontend/src/hooks/use-workspaces.ts new file mode 100644 index 00000000..8fc02054 --- /dev/null +++ b/frontend/src/hooks/use-workspaces.ts @@ -0,0 +1,25 @@ +import { useQuery } from '@tanstack/react-query' +import { api } from '@/lib/api' +import type { WorkspaceDetail, WorkspaceListResponse } from '@/types/api' + +/** + * E4 (#393) — list saved showcase workspaces, newest first. Server-backed + * source of truth for `preservation="keep"` runs (the localStorage + * RunHistoryStrip stays ephemeral-only). + */ +export function useWorkspaces(limit = 20, enabled = true) { + return useQuery({ + queryKey: ['workspaces', { limit }], + queryFn: () => api<WorkspaceListResponse>('/demo/workspaces', { params: { limit } }), + enabled, + }) +} + +/** E4 (#393) — fetch one workspace, including its created-object soft references. 
*/ +export function useWorkspace(workspaceId: string, enabled = true) { + return useQuery({ + queryKey: ['workspaces', workspaceId], + queryFn: () => api<WorkspaceDetail>(`/demo/workspaces/${workspaceId}`), + enabled: enabled && !!workspaceId, + }) +} diff --git a/frontend/src/pages/showcase.tsx b/frontend/src/pages/showcase.tsx index 6f6e38ed..61d5b947 100644 --- a/frontend/src/pages/showcase.tsx +++ b/frontend/src/pages/showcase.tsx @@ -3,19 +3,28 @@ import { Play, Loader2, Trophy, AlertTriangle, ArrowRight, Square } from 'lucide import { useState } from 'react' import { useDemoPipeline } from '@/hooks/use-demo-pipeline' import type { DemoStep } from '@/hooks/use-demo-pipeline' +import { useWorkspace } from '@/hooks/use-workspaces' import { DemoPhasePanel } from '@/components/demo/DemoPhasePanel' import { ScenarioPicker } from '@/components/demo/ScenarioPicker' import { ShowcaseKpiStrip } from '@/components/demo/ShowcaseKpiStrip' import { InspectArtifactsPanel } from '@/components/demo/InspectArtifactsPanel' import { RunHistoryStrip } from '@/components/demo/RunHistoryStrip' +import { WorkspacePanel } from '@/components/demo/WorkspacePanel' +import { WorkspaceArtifactsPanel } from '@/components/demo/WorkspaceArtifactsPanel' import { Button } from '@/components/ui/button' import { Card, CardContent, CardDescription, CardHeader, CardTitle } from '@/components/ui/card' import { Checkbox } from '@/components/ui/checkbox' +import { Input } from '@/components/ui/input' import { ROUTES } from '@/lib/constants' import { cn } from '@/lib/utils' +import type { WorkspaceListItem } from '@/types/api' const TERMINAL_STATUSES = new Set(['pass', 'fail', 'skip', 'warn']) +// E4 (#393) — mirrors the backend DemoRunRequest.workspace_name pattern +// (schemas.py): lowercase letters/digits, then -/_ allowed; ≤100 chars. +const WORKSPACE_NAME_PATTERN = /^[a-z0-9][a-z0-9\-_]*$/ + /** * PRP-38 / PRP-39 / PRP-40 — resolve the per-step Inspect deep link. 
* @@ -108,11 +117,72 @@ export default function ShowcasePage() { } = useDemoPipeline() const [reseed, setReseed] = useState(false) const [resetDb, setResetDb] = useState(false) + // E4 (#393) — workspace controls + restore state. + const [seed, setSeed] = useState(42) + const [keepWorkspace, setKeepWorkspace] = useState(false) + const [workspaceName, setWorkspaceName] = useState('') + const [selectedWorkspaceId, setSelectedWorkspaceId] = useState<string | null>(null) + + // The page (not the panel) resolves the loaded workspace's detail — the + // artifacts panel needs detail-only created_objects. + const { data: loadedWorkspace } = useWorkspace( + selectedWorkspaceId ?? '', + !!selectedWorkspaceId + ) const completed = steps.filter((s) => TERMINAL_STATUSES.has(s.status)).length + const trimmedName = workspaceName.trim() + const nameInvalid = + keepWorkspace && trimmedName !== '' && !WORKSPACE_NAME_PATTERN.test(trimmedName) + const handleRun = () => { - start({ seed: 42, skip_seed: !reseed, reset: resetDb, scenario }) + // Starting a run detaches any loaded workspace — live cards take over. + setSelectedWorkspaceId(null) + start({ + seed, + skip_seed: !reseed, + reset: resetDb, + scenario, + // Omit the preservation fields entirely on ephemeral runs (legacy + // byte-compat); omit workspace_name when the input is empty. + ...(keepWorkspace + ? { + preservation: 'keep' as const, + ...(trimmedName ? { workspace_name: trimmedName } : {}), + } + : {}), + }) + } + + // E4 (#393) — Load: recorded config repopulates the controls; the detail + // query then renders the artifacts panel. Read-only — no run starts. + const handleLoadWorkspace = (ws: WorkspaceListItem) => { + setScenario(ws.scenario) + setSeed(ws.seed) + setReseed(!ws.skip_seed) + setResetDb(ws.reset) + setKeepWorkspace(true) + setWorkspaceName(ws.name ?? 
'') + setSelectedWorkspaceId(ws.workspace_id) + } + + // E4 (#393) — Replay: Load, then re-submit the recorded config VERBATIM + // through the existing WS run path with preservation='keep' (a replay is + // itself a workspace run). setScenario runs first (picker-desync gotcha: + // start() does not sync the picker state). + const handleReplayWorkspace = (ws: WorkspaceListItem) => { + handleLoadWorkspace(ws) + // The re-run's live cards take over; the original row stays untouched. + setSelectedWorkspaceId(null) + start({ + seed: ws.seed, + scenario: ws.scenario, + reset: ws.reset, + skip_seed: ws.skip_seed, + preservation: 'keep', + ...(ws.name ? { workspace_name: ws.name } : {}), + }) } // For the Inspect link to surface store_id/product_id on the train/backtest @@ -164,13 +234,21 @@ export default function ShowcasePage() { {/* PRP-41 — KPI strip at the top, hidden until at least one step completes. */} <ShowcaseKpiStrip steps={steps} /> - {/* PRP-41 — Replayable run history (localStorage FIFO 5). */} + {/* PRP-41 — Replayable run history (localStorage FIFO 5; ephemeral runs only). */} <RunHistoryStrip onReplay={(req) => start(req)} summary={phase === 'done' ? summary : null} scenario={scenario} /> + {/* E4 (#393) — server-backed saved workspaces (Load + Replay). */} + <WorkspacePanel + onLoad={handleLoadWorkspace} + onReplay={handleReplayWorkspace} + isRunning={isRunning} + lastWorkspaceId={summary?.workspaceId ?? null} + /> + {/* Controls */} <Card> <CardHeader> @@ -186,7 +264,7 @@ export default function ShowcasePage() { <CardContent className="space-y-4"> <div className="flex flex-wrap items-end gap-6"> <ScenarioPicker value={scenario} onChange={setScenario} disabled={isRunning} /> - <Button onClick={handleRun} disabled={isRunning} size="lg"> + <Button onClick={handleRun} disabled={isRunning || nameInvalid} size="lg"> {isRunning ? 
( <Loader2 className="mr-2 h-4 w-4 animate-spin" /> ) : ( @@ -226,6 +304,57 @@ export default function ShowcasePage() { <span className="ml-1 text-destructive">(destructive — wipes all data)</span> </span> </label> + + {/* E4 (#393) — controllable seed (restore is meaningless without it). */} + <label className="flex items-center gap-2 text-sm"> + <span>Seed</span> + <Input + type="number" + min={0} + className="h-9 w-24" + value={seed} + onChange={(e) => { + const next = Number.parseInt(e.target.value, 10) + setSeed(Number.isNaN(next) || next < 0 ? 0 : next) + }} + disabled={isRunning} + /> + </label> + + {/* E4 (#393) — preservation controls. */} + <label className="flex items-center gap-2 text-sm"> + <Checkbox + checked={keepWorkspace} + onCheckedChange={(v) => setKeepWorkspace(v === true)} + disabled={isRunning} + /> + <span> + Save as workspace + <span className="ml-1 text-muted-foreground">(keeps this run restorable)</span> + </span> + </label> + + {keepWorkspace && ( + <div className="flex flex-col gap-1 text-sm"> + <label className="flex items-center gap-2"> + <span>Name</span> + <Input + className="h-9 w-48" + placeholder="optional, e.g. black-friday" + value={workspaceName} + onChange={(e) => setWorkspaceName(e.target.value)} + disabled={isRunning} + maxLength={100} + aria-invalid={nameInvalid} + /> + </label> + {nameInvalid && ( + <p className="text-xs text-destructive"> + Lowercase letters/digits only, then “-” or “_” (must not start with either). + </p> + )} + </div> + )} </div> {phase === 'running' && ( @@ -308,6 +437,12 @@ export default function ShowcasePage() { {phase === 'done' && summary && ( <InspectArtifactsPanel steps={steps} summary={summary} /> )} + + {/* E4 (#393) — re-attached artifacts of a LOADED workspace. Any started + run detaches it (selectedWorkspaceId cleared) so live cards take over. 
*/} + {phase !== 'running' && loadedWorkspace && ( + <WorkspaceArtifactsPanel workspace={loadedWorkspace} /> + )} </div> ) } diff --git a/frontend/src/types/api.ts b/frontend/src/types/api.ts index 54ff956a..93de98cc 100644 --- a/frontend/src/types/api.ts +++ b/frontend/src/types/api.ts @@ -781,6 +781,10 @@ export interface DemoRunRequest { skip_seed?: boolean // PRP-38 — optional scenario picker; default is 'demo_minimal' (back-compat). scenario?: ScenarioPreset + // E4 (#393) — preservation policy (E1 backend fields, first UI exposure). + // Omit both to keep the legacy ephemeral behavior byte-identical. + preservation?: 'ephemeral' | 'keep' + workspace_name?: string } // Aggregate result returned by the synchronous POST /demo/run. @@ -792,6 +796,38 @@ export interface DemoRunResult { winning_run_id: string | null alias: string | null wall_clock_s: number + // E4 (#393) — non-null on preservation='keep' runs. + workspace_id: string | null +} + +// === Showcase Workspaces (E4, #393) === + +// A compact row from GET /demo/workspaces. +export interface WorkspaceListItem { + workspace_id: string + name: string | null + status: 'running' | 'completed' | 'failed' + seed: number + scenario: ScenarioPreset + reset: boolean + skip_seed: boolean + result_summary: Record<string, unknown> | null + created_at: string +} + +// Full row from GET /demo/workspaces/{workspace_id}. +export interface WorkspaceDetail extends WorkspaceListItem { + store_id: number | null + product_id: number | null + date_start: string | null + date_end: string | null + created_objects: Record<string, unknown> +} + +// Page shape of GET /demo/workspaces. 
+export interface WorkspaceListResponse {
+  workspaces: WorkspaceListItem[]
+  total: number
 }
 
 // === AI Model Configuration (/config) ===

From 41a3cd1a3891d7b1b1ed8e5c027291ce71e55f5c Mon Sep 17 00:00:00 2001
From: Gabor Szabo <shellsnake@icloud.com>
Date: Fri, 12 Jun 2026 16:59:40 +0200
Subject: [PATCH 42/44] test(api): add demo replay same-config regression test (#393)

---
 tests/test_e2e_demo.py | 83 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 83 insertions(+)

diff --git a/tests/test_e2e_demo.py b/tests/test_e2e_demo.py
index ac3a5278..5ef406ff 100644
--- a/tests/test_e2e_demo.py
+++ b/tests/test_e2e_demo.py
@@ -531,6 +531,89 @@ def test_run_demo_showcase_rich_full_epic(
     )
 
 
+# E4 (#393) — wall-clock budget per demo_minimal replay run (11 steps, the
+# fastest preset; reset+seed dominates). Two sequential runs share one test.
+REPLAY_RUN_TIMEOUT_S: float = 240.0
+
+
+def _post_demo_run(body_dict: dict[str, object], timeout_s: float) -> dict[str, object]:
+    """POST /demo/run with a JSON body; return the parsed DemoRunResult."""
+    import json
+
+    body = json.dumps(body_dict).encode("utf-8")
+    req = urllib.request.Request(  # noqa: S310 — http://127.0.0.1 internal URL
+        f"{DEMO_API_URL}/demo/run",
+        data=body,
+        headers={"Content-Type": "application/json"},
+        method="POST",
+    )
+    try:
+        with urllib.request.urlopen(req, timeout=timeout_s) as resp:  # noqa: S310
+            payload = resp.read()
+            assert resp.status == 200, f"POST /demo/run -> {resp.status}"
+    except urllib.error.HTTPError as exc:
+        raise AssertionError(f"POST /demo/run failed: HTTP {exc.code} body={exc.read()!r}") from exc
+    result: dict[str, object] = json.loads(payload)
+    return result
+
+
+@pytest.mark.integration
+def test_demo_replay_same_config_twice(
+    uvicorn_subprocess: subprocess.Popen[bytes],
+) -> None:
+    """E4 (#393) — replaying the IDENTICAL config stays green (no 409/500).
+
+    The umbrella #389 replay-regression guard: the same ``preservation="keep"``
+    body runs twice sequentially — the harshest path (re-seed + re-register
+    over the first run's accumulated model_run rows) must survive the #146
+    (`_find_duplicate` multi-match 500) and #324 (safer-promote alias
+    corruption) fixes. Asserts both runs pass with DISTINCT workspace ids and
+    that ``GET /demo/workspaces`` lists both rows as completed.
+    """
+    import json
+
+    body_dict: dict[str, object] = {
+        "seed": 42,
+        "reset": True,
+        "skip_seed": False,
+        "scenario": "demo_minimal",
+        "preservation": "keep",
+        "workspace_name": "replay-regression",
+    }
+
+    first = _post_demo_run(body_dict, REPLAY_RUN_TIMEOUT_S)
+    assert first["overall_status"] == "pass", (
+        f"first run did not pass: "
+        f"steps={[(s['step_name'], s['status'], s['detail']) for s in first['steps']]}"  # type: ignore[index]
+    )
+    assert first["workspace_id"], "first keep-run surfaced no workspace_id"
+
+    # Replay: the IDENTICAL body (verbatim semantics — incl. reset=true).
+    second = _post_demo_run(body_dict, REPLAY_RUN_TIMEOUT_S)
+    assert second["overall_status"] == "pass", (
+        f"replay did not pass (replay regression — #146/#324 guard): "
+        f"steps={[(s['step_name'], s['status'], s['detail']) for s in second['steps']]}"  # type: ignore[index]
+    )
+    assert second["workspace_id"], "replay keep-run surfaced no workspace_id"
+    assert first["workspace_id"] != second["workspace_id"], (
+        "replay must create a NEW workspace row, not reuse the original"
+    )
+
+    # Both rows are listed (newest first) and settled to completed.
+    with urllib.request.urlopen(  # noqa: S310 — http://127.0.0.1 internal URL
+        f"{DEMO_API_URL}/demo/workspaces?limit=100", timeout=10.0
+    ) as resp:
+        assert resp.status == 200
+        page = json.loads(resp.read())
+    replay_rows = [w for w in page["workspaces"] if w["name"] == "replay-regression"]
+    assert len(replay_rows) >= 2, f"expected >=2 replay-regression rows, got {len(replay_rows)}"
+    listed_ids = {w["workspace_id"] for w in replay_rows}
+    assert {first["workspace_id"], second["workspace_id"]} <= listed_ids
+    for row in replay_rows:
+        if row["workspace_id"] in {first["workspace_id"], second["workspace_id"]}:
+            assert row["status"] == "completed"
+
+
 @pytest.mark.integration
 def test_run_demo_precondition_failure_exits_2() -> None:
     """A bogus API URL surfaces as a precondition failure with exit 2.

From ee844f120ff8f060d944453a0febb1d837ed2c27 Mon Sep 17 00:00:00 2001
From: Gabor Szabo <shellsnake@icloud.com>
Date: Fri, 12 Jun 2026 16:59:40 +0200
Subject: [PATCH 43/44] docs(api): document workspace restore endpoints (#393)

---
 docs/_base/API_CONTRACTS.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/docs/_base/API_CONTRACTS.md b/docs/_base/API_CONTRACTS.md
index 2ce5dc8e..7947f13a 100644
--- a/docs/_base/API_CONTRACTS.md
+++ b/docs/_base/API_CONTRACTS.md
@@ -60,6 +60,8 @@ All endpoints serve JSON; error responses use `application/problem+json` (RFC 78
 | seeder | POST | `/seeder/phase2-enrichment` | PRP-38 — run Phase 2 generators (lifecycle, replenishment, exogenous, returns) against the existing seeded data. `422 application/problem+json` on an empty database. |
 | demo | POST | `/demo/run` | Run the end-to-end demo pipeline in-process; returns a `DemoRunResult`. `409 application/problem+json` if a run is already active. **PRP-38** — body accepts an Optional `scenario: 'demo_minimal' \| 'showcase_rich' \| 'sparse'` field; default `'demo_minimal'` (back-compat). **E1 (#390)** — body accepts additive Optional `preservation: 'ephemeral' \| 'keep'` (default `'ephemeral'`, today's no-row behavior) and `workspace_name: str \| null` (pattern `^[a-z0-9][a-z0-9\-_]*$`, ≤100 chars); `workspace_name` without `preservation='keep'` → `422 application/problem+json`. `preservation='keep'` records the run as a `showcase_workspace` row; `DemoRunResult` gains an additive Optional `workspace_id: str \| null`. **E2 (#391)** — `scenario` accepts all 8 `ScenarioPreset` values (`retail_standard` / `holiday_rush` / `high_variance` / `stockout_heavy` / `new_launches` / `sparse` / `demo_minimal` / `showcase_rich`); only `showcase_rich` changes the step table (24 rows), every other preset runs the legacy 11-row flow. |
 | demo | WS | `/demo/stream` | Stream one `StepEvent` per pipeline step for the live Showcase page |
+| demo | GET | `/demo/workspaces` | **E4 (#393)** — list saved showcase workspaces, newest first (`limit` 1-100 default 20 / `offset`); `200` + empty list on an empty table |
+| demo | GET | `/demo/workspaces/{workspace_id}` | **E4 (#393)** — full workspace row incl. `created_objects` soft references + grain/window columns; `404 application/problem+json` when missing |
 | config | GET | `/config/ai` | Effective AI-model config (agent LLM + RAG embeddings); API keys masked, never raw |
 | config | PATCH | `/config/ai` | Persist + apply AI-model changes live (no restart). `409` if an embedding-dimension change would orphan indexed RAG chunks (resend with `force=true`) |
 | config | GET | `/config/providers/health` | Per-provider connectivity — Ollama probed live, cloud providers by API-key presence |
@@ -94,6 +96,7 @@ Drives the end-to-end demo pipeline for the dashboard Showcase page. Verified ag
 - PRP-38 — `scenario="showcase_rich"` extends the data phase with `phase2_enrichment` + `historical_backfill` steps and the modeling phase with `v2_train` (one V2 `prophet_like` run). Phase ids are `data` / `modeling` / `decision` / `verify` / `agent` / `cleanup` (6 phases).
 - PRP-40 — `scenario="showcase_rich"` ALSO adds two phases inserted BEFORE `verify`: `planning` (2 steps — `scenario_simulate_and_save`, `multi_plan_compare`) and `knowledge` (3 steps — `embedding_provider_probe`, `rag_index_subset`, `rag_retrieve_probe`). Total step count: 19 for `showcase_rich`, 11 for `demo_minimal` and `sparse`. Phase ids on `showcase_rich` are `data` / `modeling` / `decision` / `planning` / `knowledge` / `verify` / `agent` / `cleanup` (8 phases). The knowledge steps SKIP gracefully when the embedding provider is unreachable; the pipeline still goes green.
 - E3 (#392) — the planning-phase steps tag the plans they save: pipeline-saved plans now carry `source:showcase` (alongside the legacy `showcase` + `price`/`holiday` tags), and on `preservation="keep"` runs additionally `workspace:<workspace_name|workspace_id>` — retrievable via `GET /scenarios?tags=workspace:<label>` (JSONB containment, all listed tags must match). The `scenario_simulate_and_save` step's `data` additively echoes the `tags` list it sent.
+- E4 (#393) — the start frame's E1 preservation fields are now exercised by the Showcase UI ("Save as workspace" checkbox + name + seed inputs). **Replay** re-submits a recorded workspace's config verbatim (`seed`/`scenario`/`reset`/`skip_seed`) with `preservation="keep"` (+ the recorded `workspace_name`), creating a NEW `showcase_workspace` row each time — the original row is never mutated; names are non-unique by design. Saved rows are read back over `GET /demo/workspaces` (+ `/{workspace_id}`).
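
The replay semantics in the E4 note above can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not the project's code: `RecordedWorkspace`, `ReplayRequest`, and `buildReplayRequest` are hypothetical names, and only the field names (`seed`, `scenario`, `reset`, `skip_seed`, `preservation`, `workspace_name`) are taken from the note.

```typescript
// Hypothetical, trimmed-down shapes; only the field names come from the E4 note.
type Preservation = 'ephemeral' | 'keep'

interface RecordedWorkspace {
  workspace_id: string
  name: string | null
  seed: number
  scenario: string
  reset: boolean
  skip_seed: boolean
}

interface ReplayRequest {
  seed: number
  scenario: string
  reset: boolean
  skip_seed: boolean
  preservation: Preservation
  workspace_name?: string
}

// Build the body a replay would send: the recorded config verbatim, always
// preservation 'keep', and workspace_name only when the source row was named.
function buildReplayRequest(ws: RecordedWorkspace): ReplayRequest {
  return {
    seed: ws.seed,
    scenario: ws.scenario,
    reset: ws.reset,
    skip_seed: ws.skip_seed,
    preservation: 'keep',
    ...(ws.name ? { workspace_name: ws.name } : {}),
  }
}
```

The sketch never copies `workspace_id` into the request, which matches the note's statement that a replay creates a new row and leaves the original untouched.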
 
 ## Async Events / Queues
 

From b1c859319f6f7fe099929f3eab27a445a9932133 Mon Sep 17 00:00:00 2001
From: Gabor Szabo <shellsnake@icloud.com>
Date: Fri, 12 Jun 2026 16:59:40 +0200
Subject: [PATCH 44/44] docs(repo): track showcase workspace e4 prp (#393)

---
 ...RP-showcase-workspace-E4-restore-replay.md | 778 ++++++++++++++++++
 1 file changed, 778 insertions(+)
 create mode 100644 PRPs/PRP-showcase-workspace-E4-restore-replay.md

diff --git a/PRPs/PRP-showcase-workspace-E4-restore-replay.md b/PRPs/PRP-showcase-workspace-E4-restore-replay.md
new file mode 100644
index 00000000..164428d3
--- /dev/null
+++ b/PRPs/PRP-showcase-workspace-E4-restore-replay.md
@@ -0,0 +1,778 @@
+name: "PRP — Showcase Workspace E4: Restore/Replay (issue #393)"
+description: |
+
+## Purpose
+
+Implement the restore/replay epic of the showcase-workspace initiative (umbrella
+#389): the E1 (#390) `showcase_workspace` rows become listable and loadable over
+two new GET endpoints, the Showcase UI gains workspace controls (keep + name),
+a server-backed workspace panel with **Load** (re-attach) and **Replay** (re-run)
+actions, deep links to every object a kept run created, and a backend regression
+test proving a same-config replay completes without 409/500 (blockers #146/#324
+stay fixed). Parallel epic after Foundation E1; independent of E2 (#391, merged)
+and E3 (#392, not started).
+
+## Core Principles
+
+1. **Context is King**: every reference below was verified against the live code on 2026-06-12 (branch `dev` @ 3194fe8, post-E1/E2 merge).
+2. **Validation Loops**: each level is executable as written.
+3. **Information Dense**: patterns cite exact file:line.
+4. **Progressive Success**: backend read endpoints → frontend types/hooks → controls UI → workspace panel → replay test.
+5. **Global rules**: follow CLAUDE.md / AGENTS.md; all five CI gates must pass; UI work follows `.claude/rules/ui-design.md` + `shadcn-ui.md`.
+ +--- + +## Goal + +An operator on `/showcase` can (a) mark a run **"Save as workspace"** with an +optional name, (b) see every saved workspace in a server-backed panel (newest +first, replacing the localStorage FIFO-5 for *workspace* runs — localStorage +stays for ephemeral runs only), (c) **Load** a workspace: its recorded config +(seed, scenario, reset, skip_seed, keep+name) populates the run controls and its +created objects (runs, plans, batch, alias, artifacts, grain) render as deep-link +cards — read-only, no run starts, and (d) **Replay** a workspace: the recorded +configuration is re-submitted verbatim through the existing WS run path with +`preservation="keep"`, producing a NEW workspace row, without any 409/500 from +the registry/scenario layers. Legacy clients and ephemeral runs behave +byte-identically to today. + +**Deliverable** (all additive — no migration, no schema change, no new tables): + +- `app/features/demo/schemas.py` — `WorkspaceListItem`, `WorkspaceDetailResponse`, `WorkspaceListResponse` (plain `BaseModel` + `from_attributes`, NOT strict). +- `app/features/demo/workspace.py` — `count_workspaces()` helper (list/get already exist, unrouted since E1). +- `app/features/demo/routes.py` — `GET /demo/workspaces` (paginated list) + `GET /demo/workspaces/{workspace_id}` (detail, 404 on miss) — the demo slice's first DB-dependent routes. +- `frontend/src/types/api.ts` — `DemoRunRequest` +`preservation`/`workspace_name`; `DemoRunResult` +`workspace_id`; new workspace types. +- `frontend/src/hooks/use-demo-pipeline.ts` — `DemoSummary.workspaceId` (from `pipeline_complete.data.workspace_id`). +- `frontend/src/hooks/use-workspaces.ts` — NEW TanStack Query hooks (list + detail), pattern: `use-scenarios.ts`. +- `frontend/src/components/demo/WorkspacePanel.tsx` — NEW list panel with Load/Replay actions. 
+- `frontend/src/components/demo/WorkspaceArtifactsPanel.tsx` — NEW re-attach deep-link card grid driven by `created_objects` (pattern: `InspectArtifactsPanel.tsx`). +- `frontend/src/components/demo/RunHistoryStrip.tsx` — skip-append when the summary carries a `workspaceId` (server is source of truth for workspace runs). +- `frontend/src/pages/showcase.tsx` — keep-checkbox + name input + seed input wired into the start frame; panels mounted. +- Tests: backend route unit + integration tests, the **replay regression** integration test (same config twice → both green, two distinct workspace rows), frontend vitest for the reducer/strip/panels/hooks. +- `docs/_base/API_CONTRACTS.md` — two new endpoint rows + E4 notes. + +**Success definition**: all Success Criteria below check off, the five CI gates +are green, the frontend gates are green, and a manual browser dogfood on a +seeded stack shows save → list → load (links resolve) → replay (green pipeline, +new workspace row). + +## Why + +- E1 records workspaces but nothing reads them: `get_workspace`/`list_workspaces` are explicitly "unrouted in E1 -- consumed ... by the E4 restore/replay routes later" (`app/features/demo/workspace.py:15-17`). +- The only run memory in the UI is the localStorage FIFO-5 (`frontend/src/components/demo/RunHistoryStrip.tsx:21-22`), whose Replay button hardcodes `seed: 42` and drops the preservation fields (`RunHistoryStrip.tsx:133-138`) — it cannot restore a real configuration. +- Umbrella #389 success criterion: "A prior workspace can be restored: its config reloads into the UI and a replay with the same seed/preset completes without 409/500" + risk row: "add a replay regression test that runs the same config twice". 
+- Replay blockers are fixed but untested-for-regression: #146 (`registry/service.py` `_find_duplicate` now `.limit(1)` + `.scalars().first()`, ~lines 659-710) and #324 (champion via `ctx.winning_run_id`, parseable `safer_promote_flow` artifact_uri, `_restore_demo_alias_after_failure` at `pipeline.py:2708-2716`). No test currently runs the pipeline twice back-to-back (verified `tests/test_e2e_demo.py`). + +## What + +### Designed semantics — Restore vs Replay (required by issue #393) + +| Action | Meaning | Effect | +|--------|---------|--------| +| **Load** (= restore, re-attach) | "Show me that run again" | Read-only. Recorded config populates the Showcase controls (scenario picker, Re-seed/Reset checkboxes, seed, keep-checkbox ON, name). Created objects render as deep-link cards from `created_objects` + grain columns. **No run starts; no rows are written.** | +| **Replay** (= re-run) | "Run that configuration again" | Load, then immediately `start()` through the existing WS path with the recorded `seed`/`scenario`/`reset`/`skip_seed` **verbatim** and `preservation="keep"` + the recorded `workspace_name` (names are non-unique by design, `models.py:61`). A NEW workspace row is created; the original row is never mutated. | + +Decisions locked here (so implementation doesn't re-litigate): +1. **Replay is always `preservation="keep"`** — a replay is itself a workspace run; its row is the audit trail of the re-run. +2. **Config replays verbatim** — incl. `reset`/`skip_seed`. A `reset=true` workspace replays destructively; the panel renders a destructive badge on such rows (same styling language as the Reset checkbox, `showcase.tsx:218-228`). +3. **No provenance column** (`replayed_from`) in E4 — that needs a migration + request-field threading; deferred (note it in the PR description as a possible follow-up). +4. **No DELETE endpoint** — issue scope is "list/load + replay". Deletion is a future epic. +5. 
**localStorage split**: `RunHistoryStrip` keeps recording **ephemeral** runs only; a summary with `workspaceId != null` is NOT appended (umbrella risk table: "server is source of truth in workspace mode; localStorage stays for ephemeral runs only"). + +### User-visible behavior + +- `GET /demo/workspaces?limit=&offset=` → `{"workspaces": [...], "total": N}`, newest first, `200` + empty list on an empty table (mirror `GET /scenarios`). +- `GET /demo/workspaces/{workspace_id}` → full row incl. `created_objects` + `result_summary`; `404 application/problem+json` when missing. +- `/showcase` controls gain: **Save as workspace** checkbox + **Workspace name** input (visible only when checked; client-side pattern validation) + a small **Seed** number input (today `handleRun` hardcodes `seed: 42`, `showcase.tsx:115` — restore is meaningless without a controllable seed). +- A **Workspaces** panel lists saved workspaces (name, scenario, seed, status, winner, created_at, destructive-reset badge) with **Load** and **Replay** buttons (both disabled while `isRunning`). +- Loading a workspace also renders its **artifacts panel**: deep-link cards for winning run, V2 run, scenario plans, batch, alias, grain-scoped forecast/backtest, agent session. +- After a kept run completes, the panel refreshes (query invalidation keyed on the new `summary.workspaceId`). + +### Technical requirements + +- The two GET routes use `Depends(get_db)` (`app/core/database.py:43`) — first DB dependency in `demo/routes.py`; they delegate to `workspace.get_workspace` / `workspace.list_workspaces` / new `count_workspaces` and build response models via `model_validate` (from_attributes). +- Response schemas are plain `BaseModel` — NOT `ConfigDict(strict=True)`; strict mode is for request bodies only (precedent + rationale: `demo/schemas.py:88-95` StepEvent docstring). +- 404 via `NotFoundError` from `app.core.exceptions` (pattern: `scenarios/routes.py:220-223`). 
+- The demo slice still imports no other feature slice (`app.core.*` / `app.shared.*` only). +- Frontend start frame sends `preservation`/`workspace_name` only when relevant (omitting them keeps legacy byte-compat; backend defaults apply). +- The replay regression test runs the in-process pipeline twice with the same config and asserts both runs green + two distinct `workspace_id`s. + +### Success Criteria + +- [ ] `GET /demo/workspaces` returns `200` + `{"workspaces": [], "total": 0}` on an empty table; after a `preservation="keep"` run it lists the row newest-first with config + `result_summary`. +- [ ] `GET /demo/workspaces/{id}` returns the full row (incl. `created_objects`); unknown id → `404` `application/problem+json`. +- [ ] Showcase: a run with "Save as workspace" checked sends `preservation="keep"` (+ name when filled); the panel shows the new row after `pipeline_complete`. +- [ ] Load populates scenario/seed/reset/skip_seed/keep/name controls and renders working deep links (at minimum: winning run → `/explorer/runs/{id}`). +- [ ] Replay re-runs the recorded config verbatim through `WS /demo/stream` and ends green with a NEW `workspace_id`; the original row is unchanged. +- [ ] Ephemeral runs: zero workspace queries on the write path (unchanged E1 behavior); `RunHistoryStrip` still records them; kept runs are NOT appended to localStorage. +- [ ] Backend replay regression test: same config twice → both `overall_status="pass"`, no 409/500, distinct workspace ids. +- [ ] `uv run ruff check . && uv run ruff format --check . && uv run mypy app/ && uv run pyright app/ && uv run pytest -v -m "not integration"` green; integration suite green; frontend `pnpm lint && pnpm test --run` green. 
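
The start-frame rules above (send `preservation`/`workspace_name` only when relevant, never send an empty name, validate the name before the run starts) can be sketched in Python as follows. This is a hedged illustration, not the Showcase implementation, which this PRP places in TypeScript in `showcase.tsx`. `build_start_frame` and `is_valid_workspace_name` are hypothetical names; the regex is the `workspace_name` pattern this PRP quotes from `schemas.py:76`.

```python
import re
from typing import Any

# workspace_name pattern as quoted in this PRP (schemas.py:76).
WORKSPACE_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9\-_]*$")


def is_valid_workspace_name(name: str) -> bool:
    """Return True when the name satisfies the documented pattern."""
    return WORKSPACE_NAME_RE.fullmatch(name) is not None


def build_start_frame(
    *,
    seed: int,
    scenario: str,
    reset: bool,
    skip_seed: bool,
    keep: bool,
    workspace_name: str = "",
) -> dict[str, Any]:
    """Build the run request, omitting the preservation fields for ephemeral runs."""
    frame: dict[str, Any] = {
        "seed": seed,
        "scenario": scenario,
        "reset": reset,
        "skip_seed": skip_seed,
    }
    if keep:
        frame["preservation"] = "keep"
        # An empty name is omitted rather than sent as "".
        if workspace_name:
            if not is_valid_workspace_name(workspace_name):
                raise ValueError(f"invalid workspace name: {workspace_name!r}")
            frame["workspace_name"] = workspace_name
    return frame


ephemeral = build_start_frame(
    seed=42, scenario="demo_minimal", reset=False, skip_seed=True, keep=False
)
kept = build_start_frame(
    seed=42,
    scenario="demo_minimal",
    reset=True,
    skip_seed=False,
    keep=True,
    workspace_name="replay-regression",
)
```

Omitting the keys entirely, instead of sending defaults, is what keeps an ephemeral request identical to the legacy request shape; the backend defaults then apply.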
+ +## All Needed Context + +### Documentation & References + +```yaml +# MUST READ — backend (verified 2026-06-12, dev @ 3194fe8) + +- file: app/features/demo/workspace.py + why: | + get_workspace (line 157) and list_workspaces (line 173) ALREADY EXIST, + take a caller-owned AsyncSession, and were written for these E4 routes + (docstring lines 15-17). list_workspaces orders created_at.desc, id.desc + (line 191) — the routes reuse them as-is. ADD count_workspaces here + (pattern: scenarios/service.py:455 func.count().select_from()). + +- file: app/features/demo/models.py + why: | + ShowcaseWorkspace columns (lines 59-81): workspace_id String(32) unique, + name (nullable), status (running/completed/failed CHECK), seed, scenario, + reset, skip_seed, store_id/product_id/date_start/date_end (nullable), + created_objects JSONB, result_summary JSONB nullable, + TimestampMixin + created_at/updated_at. The response schemas mirror EXACTLY these fields. + +- file: app/features/demo/schemas.py + why: | + DemoRunRequest (29) already carries preservation (68) + workspace_name + (72) + the requires-keep validator (80) — NO request changes in E4. + StepEvent docstring (88-95) is the precedent for "response/event models + are plain BaseModel, NOT strict". DemoRunResult.workspace_id (163) + already exists. Append the three new workspace response models here. + +- file: app/features/demo/routes.py + why: | + Current surface: POST /run (28-54), WS /stream (57-85), _error_event (88). + Router prefix="/demo" tags=["demo"] (25). NO GET endpoints and NO DB + dependency today — add both. ConflictError import precedent at line 18; + add NotFoundError + Depends/Query/AsyncSession/get_db imports. + +- file: app/features/scenarios/routes.py + why: | + THE list/get precedent to mirror: GET "" list with Query(ge/le) params + (168-195), GET "/{id}" raising NotFoundError(message=...) when the + service returns None (198-223). Copy the docstring/summary/description + style verbatim. 
+ +- file: app/features/scenarios/schemas.py + why: | + Response-model precedent: ScenarioPlanResponse (323), ScenarioListItem + (~362), ScenarioListResponse (390) — all ConfigDict(from_attributes=True), + Field(...) with descriptions, list+total page shape. Mirror this shape. + +- file: app/features/scenarios/service.py + why: | + list_plans (436-472): count_stmt = select(func.count()).select_from(...), + rows query, total = int(await db.scalar(count_stmt) or 0). Use the same + count idiom in workspace.count_workspaces. + +- file: app/core/database.py + why: get_db dependency (line 43) — yields request-scoped AsyncSession. + +- file: app/core/exceptions.py + why: NotFoundError -> RFC 7807 404 problem+json via registered handlers. + +- file: app/features/demo/pipeline.py + why: | + DO NOT MODIFY in E4 (E1 hooks complete): create hook 2631-2635, finalize + 2719-2724, pipeline_complete.data.workspace_id 2747. #324 alias-restore + safeguard 2708-2716. Cite only; replay flows through it unchanged. + +- file: app/features/registry/service.py + why: | + _find_duplicate (~659-710) — the #146 fix (.limit(1) + .scalars().first() + under the "detect" policy) is what makes back-to-back replays survive + accumulated duplicate model_run rows. The replay regression test is the + guard that keeps this true. READ-ONLY for this PRP. + +- file: app/features/demo/tests/conftest.py + why: | + client fixture: ASGITransport over app.main.app (route unit tests + monkeypatch the service/workspace fns — no DB needed). db_session + fixture: real-Postgres session that WIPES showcase_workspace on teardown + — reuse for the new integration route tests. + +- file: app/features/demo/tests/test_routes.py + why: | + Route-test conventions: async tests over the client fixture, + monkeypatch.setattr(service, "run_pipeline_sync", fake) (lines 40-52), + 409 path (55+). New GET unit tests monkeypatch + app.features.demo.routes.workspace functions the same way. 
+ +- file: tests/test_e2e_demo.py + why: | + Integration e2e conventions: @pytest.mark.integration, uvicorn_subprocess + fixture, urllib.request POST to /demo/run with json body + budget asserts + (210-295, test_run_demo_showcase_rich_e2e). The replay regression test + follows this exact shape with scenario=demo_minimal for speed. + +# MUST READ — frontend (verified 2026-06-12) + +- file: frontend/src/pages/showcase.tsx + why: | + 313 lines. useDemoPipeline destructure (94-108), reseed/resetDb state + (109-110), handleRun hardcoding seed 42 (114-116), RunHistoryStrip mount + (168-172), controls Card (175-237: ScenarioPicker + Run + Stop + two + Checkbox labels — add the keep-checkbox/name/seed inputs here), + resolveInspectHref deep-link map (38-92 — the href vocabulary + WorkspaceArtifactsPanel reuses). + +- file: frontend/src/hooks/use-demo-pipeline.ts + why: | + DemoSummary (24-33) — ADD workspaceId: string | null. applyEvent + pipeline_complete branch (117-127) — read event.data.workspace_id via the + existing toStringOrNull helper. start() (238-246) accepts a full + DemoRunRequest and uses req.scenario for the idle layout; NOTE it does + NOT update the `scenario` picker state — Replay must call setScenario + first, then start (same latent desync exists in today's strip replay). + Exported pure helpers (applyEvent, initialState) are unit-tested in + use-demo-pipeline.test.ts — extend there. + +- file: frontend/src/components/demo/RunHistoryStrip.tsx + why: | + localStorage key forecastlab.showcase.runs.v1 (21), cap 5 (22), + append-once-during-render pattern (71-86 — the sanctioned React pattern; + copy it for the workspace-panel query invalidation trigger), onReplay + hardcoded req (129-142). E4 change: skip append when + summary.workspaceId != null. Its test file shows the localStorage-mock + vitest conventions. 
+ +- file: frontend/src/components/demo/InspectArtifactsPanel.tsx + why: | + THE re-attach precedent: InspectCard {label, blurb, href|null, + disabledReason} (15-20), buildCards (35+), disabled-card rendering, ROUTES + deep links. WorkspaceArtifactsPanel mirrors this but reads + created_objects JSONB + grain columns instead of live step.data. + +- file: frontend/src/hooks/use-scenarios.ts + why: | + TanStack Query conventions to copy for use-workspaces.ts: api<T> calls, + queryKey ['scenarios', {...}] shapes (28-47), invalidateQueries on + mutation success (50-59). + +- file: frontend/src/lib/api.ts + why: api<T>(endpoint, {params}) wrapper; ApiError carries RFC 7807 detail. + +- file: frontend/src/types/api.ts + why: | + Demo block (740-795): ScenarioPreset union (748), StepEvent (760), + DemoRunRequest (778 — MISSING preservation/workspace_name), DemoRunResult + (787 — MISSING workspace_id). Add the workspace types next to this block. + +- file: frontend/src/lib/constants.ts + why: | + ROUTES map (2+): EXPLORER.RUNS, VISUALIZE.{PLANNER,BATCH,FORECAST,BACKTEST}, + OPS, CHAT, KNOWLEDGE, ADMIN. DEMO_WS_URL (76-78). + +- file: frontend/src/components/demo/index.ts + why: Barrel — export the two new components here. + +- file: frontend/src/lib/uuid-utils.ts + why: safeRandomUUID — NEVER call crypto.randomUUID directly (LAN/non-secure + context crash, issue #332/PR #384; an ESLint guard enforces this). + +# Issue / initiative context +- url: https://github.com/w7-mgfcode/ForecastLabAI/issues/393 + why: The epic this PRP implements (restore-vs-replay semantics designed above). +- url: https://github.com/w7-mgfcode/ForecastLabAI/issues/389 + why: Umbrella — success criteria, out-of-scope list (no export bundle, no + per-phase config), replay-regression risk row. +- file: PRPs/PRP-showcase-workspace-E1-persistence-backbone.md + why: The Foundation PRP — table design rationale, no-FK decision, E1 test map. 
+``` + +### Current Codebase tree (relevant subset) + +```bash +app/features/demo/ +├── models.py # ShowcaseWorkspace @37 (E1; complete — untouched in E4) +├── pipeline.py # E1 hooks complete — UNTOUCHED in E4 +├── routes.py # POST /run @28; WS /stream @57 — NO GETs, NO DB dep +├── schemas.py # 166 lines; DemoRunRequest @29; DemoRunResult @130 +├── service.py # lock @19; PipelineBusyError @22 — untouched +├── workspace.py # get @157 / list @173 exist; NO count helper +└── tests/ # conftest (client + db_session), test_{schemas,models,pipeline,routes,workspace}.py +frontend/src/ +├── pages/showcase.tsx # 313 lines +├── hooks/use-demo-pipeline.ts # DemoSummary @24; applyEvent @86 +├── hooks/use-scenarios.ts # TanStack precedent +├── components/demo/{RunHistoryStrip,InspectArtifactsPanel,ScenarioPicker,...}.tsx +└── types/api.ts # demo block @740-795 +tests/test_e2e_demo.py # integration e2e; NO back-to-back run test +``` + +### Desired Codebase tree (files added/modified) + +```bash +app/features/demo/ +├── schemas.py # MOD — +WorkspaceListItem +WorkspaceDetailResponse +WorkspaceListResponse +├── workspace.py # MOD — +count_workspaces(db) -> int +├── routes.py # MOD — +GET /demo/workspaces +GET /demo/workspaces/{workspace_id} +└── tests/ + ├── test_routes.py # MOD — GET unit tests (workspace fns monkeypatched) + integration GET tests (db_session) + └── test_schemas.py # MOD — response-model from_attributes round-trip +tests/test_e2e_demo.py # MOD — +test_demo_replay_same_config_twice (integration) +frontend/src/ +├── types/api.ts # MOD — DemoRunRequest/+2, DemoRunResult/+1, +3 workspace types +├── hooks/use-demo-pipeline.ts # MOD — DemoSummary.workspaceId +├── hooks/use-demo-pipeline.test.ts # MOD — pipeline_complete carries workspace_id +├── hooks/use-workspaces.ts # NEW — useWorkspaces / useWorkspace +├── hooks/index.ts # MOD — re-export (match existing barrel style) +├── components/demo/WorkspacePanel.tsx # NEW — list + Load/Replay +├── 
components/demo/WorkspacePanel.test.tsx # NEW +├── components/demo/WorkspaceArtifactsPanel.tsx # NEW — created_objects deep links +├── components/demo/WorkspaceArtifactsPanel.test.tsx # NEW +├── components/demo/RunHistoryStrip.tsx # MOD — ephemeral-only append +├── components/demo/RunHistoryStrip.test.tsx # MOD — kept-run not appended +├── components/demo/index.ts # MOD — barrel exports +└── pages/showcase.tsx # MOD — controls + panels wiring +docs/_base/API_CONTRACTS.md # MOD — 2 endpoint rows + E4 notes +``` + +### Known Gotchas & Library Quirks + +```python +# CRITICAL — NO pipeline.py / models.py / migration changes in E4. The E1 hooks +# and table are complete; replay flows through the existing run path. If you +# think you need a schema change, you've scope-crept (provenance is deferred). + +# CRITICAL — response models are NOT strict. Strict mode is a REQUEST-body +# policy (security-patterns.md); StepEvent's docstring (schemas.py:88-95) is +# the in-file precedent. Use ConfigDict(from_attributes=True) like +# scenarios/schemas.py:323+. date_start/date_end are plain `date` fields — +# fine on response models (test_strict_mode_policy only walks strict=True +# request models). + +# CRITICAL — route order: register the two GET routes on the SAME router +# (prefix="/demo"). Paths /demo/workspaces and /demo/workspaces/{id} cannot +# collide with /demo/run or the WS /demo/stream. Static-vs-param ordering is +# irrelevant here (no /demo/workspaces/{x} value equals a static sibling). + +# CRITICAL — replay regression test budget: use scenario=demo_minimal (11 steps, +# fastest). First run: reset=true, skip_seed=false (clean deterministic data). +# Second run: SAME body but skip_seed=true is WRONG for "verbatim" semantics — +# replay the IDENTICAL body (reset=true, skip_seed=false) to prove the +# harshest path (re-seed + re-register over accumulated rows) stays green. +# Both with preservation="keep". Assert distinct workspace_ids. 
+ +# GOTCHA — useDemoPipeline.start(req) does NOT sync the scenario picker state +# (use-demo-pipeline.ts:238-246 reads req.scenario only for the idle layout). +# Replay/Load must call setScenario(ws.scenario) explicitly, then start(). + +# GOTCHA — workspace_name pattern is ^[a-z0-9][a-z0-9\-_]*$ (schemas.py:76). +# Validate client-side (lowercase letters/digits/-/_ , must not start with +# - or _) and disable Run with an inline hint on violation — otherwise the +# operator gets a raw 422 problem+json from the WS error event. + +# GOTCHA — sending workspace_name requires preservation="keep" (model_validator +# schemas.py:80-85). The UI must OMIT workspace_name (not send "") when the +# keep-checkbox is off or the input is empty. + +# GOTCHA — workspace.scenario is stored as a plain string (models.py:67). +# Type it ScenarioPreset on the frontend; values come from the enum, but the +# replay path passes it through verbatim — backend re-validates on start. + +# GOTCHA — replaying a reset=true workspace WIPES the database (destructive). +# Render the destructive badge; do not add a confirm dialog (the run controls +# already expose reset with the same severity styling — consistency wins). +# sparse-preset workspaces may legitimately replay RED (RUNBOOKS incident 28: +# expected-fail preset) — the panel shows recorded status, not a promise. + +# GOTCHA — holiday_rush replays with reset=false ADD rows (union date window, +# RUNBOOKS incident 28). Verbatim replay inherits this documented behavior. + +# GOTCHA — never call crypto.randomUUID directly in frontend code — use +# safeRandomUUID from @/lib/uuid-utils (issue #332; ESLint guard fires). + +# GOTCHA — `pnpm tsc --noEmit` is VACUOUS on this repo (solution-style +# tsconfig checks zero files) and `tsc -b` currently fails on dev with +# PRE-EXISTING errors. Do not chase those. The real frontend gates that must +# be green: `pnpm lint && pnpm test --run`. 
Verify your new files compile by +# their vitest imports + eslint. + +# GOTCHA — shadcn: every primitive needed (card, button, checkbox, input, +# badge, skeleton) is ALREADY installed under frontend/src/components/ui/. +# Do NOT run shadcn add. If you believe a new primitive is required, stop — +# recheck the design (rule: .claude/rules/shadcn-ui.md; CLI pin gotchas in +# memory shadcn-cli-version-pin). + +# GOTCHA — repo has mixed CRLF/LF line endings; check `git diff --stat` before +# committing (Write/Edit emit LF — fine for NEW files; for showcase.tsx / +# routes.py edits verify the diff is surgical). + +# GOTCHA — mypy --strict AND pyright --strict gate merge: annotate route params +# (AsyncSession = Depends(get_db)), `-> None` on tests, full fixture types. + +# COORDINATION — E3 (#392, workspace-tagged scenario plans) is an open parallel +# epic touching pipeline.py + scenario tags. No shared files with E4 except +# docs/_base/API_CONTRACTS.md — keep that edit additive and self-contained. + +# RUNTIME-VERIFICATION LOG (per prp-create step 3): +# - workspace.get_workspace/list_workspaces signatures — read workspace.py:157,173 (2026-06-12) +# - ShowcaseWorkspace full column set — read models.py:59-89 (2026-06-12) +# - DemoRunRequest already has preservation/workspace_name — read schemas.py:64-85 +# - pipeline_complete.data.workspace_id emitted — read pipeline.py:2747 +# - #146 fix present (.limit(1)+.scalars().first()) — Explore-agent verified registry/service.py:659-710 +# - #324 fix present (winning_run_id champion + alias-restore) — read pipeline.py:2708-2716 +# - frontend ui/input.tsx exists — `ls frontend/src/components/ui/` (2026-06-12) +# - No third-party API claims beyond in-repo working patterns (func.count, +# from_attributes, TanStack useQuery) — no import probe required. 
+``` + +## Implementation Blueprint + +### Data models and structure + +```python +# app/features/demo/schemas.py — APPEND (mirror scenarios/schemas.py:362-397 shape) + +class WorkspaceListItem(BaseModel): + """A compact row in the saved-workspaces list (E4, issue #393).""" + + model_config = ConfigDict(from_attributes=True) + + workspace_id: str = Field(..., description="Unique external identifier (UUID hex).") + name: str | None = Field(default=None, description="Optional human label.") + status: str = Field(..., description="running / completed / failed.") + seed: int = Field(..., description="Seeder seed the run was started with.") + scenario: str = Field(..., description="Seeder scenario preset value.") + reset: bool = Field(..., description="Whether the run wiped the database first.") + skip_seed: bool = Field(..., description="Whether the run skipped the seed step.") + result_summary: dict[str, Any] | None = Field( + default=None, description="Winner / WAPE / wall-clock display payload." + ) + created_at: datetime = Field(..., description="When the run was recorded (UTC).") + + +class WorkspaceDetailResponse(WorkspaceListItem): + """Full workspace row incl. created objects (E4, issue #393).""" + + store_id: int | None = Field(default=None, description="Showcase grain store id.") + product_id: int | None = Field(default=None, description="Showcase grain product id.") + date_start: date | None = Field(default=None, description="Seeded window start.") + date_end: date | None = Field(default=None, description="Seeded window end.") + created_objects: dict[str, Any] = Field( + default_factory=dict, + description="Soft-reference ids of everything the run created.", + ) + + +class WorkspaceListResponse(BaseModel): + """A page of saved workspaces, newest first (E4, issue #393).""" + + model_config = ConfigDict(from_attributes=True) + + workspaces: list[WorkspaceListItem] = Field( + ..., description="Saved workspaces for the current page; empty when none." 
+ ) + total: int = Field(..., ge=0, description="Total saved workspaces.") +``` + +```typescript +// frontend/src/types/api.ts — extend the demo block (after line 795) +export interface DemoRunRequest { + seed?: number + reset?: boolean + skip_seed?: boolean + scenario?: ScenarioPreset + // E4 (#393) — preservation policy (E1 backend fields, first UI exposure). + preservation?: 'ephemeral' | 'keep' + workspace_name?: string +} +// DemoRunResult: + workspace_id: string | null + +export interface WorkspaceListItem { + workspace_id: string + name: string | null + status: 'running' | 'completed' | 'failed' + seed: number + scenario: ScenarioPreset + reset: boolean + skip_seed: boolean + result_summary: Record<string, unknown> | null + created_at: string +} +export interface WorkspaceDetail extends WorkspaceListItem { + store_id: number | null + product_id: number | null + date_start: string | null + date_end: string | null + created_objects: Record<string, unknown> +} +export interface WorkspaceListResponse { + workspaces: WorkspaceListItem[] + total: number +} +``` + +### List of tasks (dependency order) + +```yaml +Task 1 — branch & issue hygiene: + RUN: git switch dev && git pull && git switch -c feat/showcase-workspace-restore-replay + VERIFY: gh issue view 393 --json state # open + +Task 2 — MODIFY app/features/demo/schemas.py: + - ADD import: date (datetime is already imported at schemas.py:11) + - APPEND WorkspaceListItem / WorkspaceDetailResponse / WorkspaceListResponse (blueprint above) + - Docstring note: response models, from_attributes, NOT strict (StepEvent precedent) + +Task 3 — MODIFY app/features/demo/workspace.py: + - ADD: async def count_workspaces(db: AsyncSession) -> int + # select(func.count()).select_from(ShowcaseWorkspace); int(await db.scalar(...) 
or 0) + # pattern: scenarios/service.py:455,467 + - UPDATE module docstring line 15-17 ("unrouted in E1") -> note E4 routes them + +Task 4 — MODIFY app/features/demo/routes.py: + - ADD imports: Depends, Query (fastapi); AsyncSession; get_db; NotFoundError; + workspace module; the three new schemas + - ADD GET "/workspaces" -> WorkspaceListResponse + # limit: Query(default=20, ge=1, le=100), offset: Query(default=0, ge=0) + # rows = await workspace.list_workspaces(db, limit=limit, offset=offset) + # total = await workspace.count_workspaces(db) + # WorkspaceListResponse(workspaces=[WorkspaceListItem.model_validate(r) for r in rows], total=total) + - ADD GET "/workspaces/{workspace_id}" -> WorkspaceDetailResponse + # row = await workspace.get_workspace(db, workspace_id) + # if row is None: raise NotFoundError(message=f"Workspace not found: {workspace_id}") + # mirror scenarios/routes.py:198-223 docstring style + - Place both ABOVE the WS handler for file readability (no routing semantics) + +Task 5 — backend tests: + - MODIFY app/features/demo/tests/test_schemas.py (unit): + # WorkspaceListItem.model_validate(orm-like SimpleNamespace/ShowcaseWorkspace(), from_attributes) + # detail model carries created_objects verbatim + - MODIFY app/features/demo/tests/test_routes.py: + UNIT (monkeypatch app.features.demo.routes.workspace): + - test_list_workspaces_empty -> 200 {"workspaces": [], "total": 0} + - test_list_workspaces_passes_pagination -> limit/offset forwarded + - test_get_workspace_404 -> problem+json content-type, status 404 + - test_get_workspace_success -> detail fields round-trip + # NOTE: monkeypatched fns never touch db -> get_db yields an unused + # session; no DB needed (session connects lazily on first query) + INTEGRATION (@pytest.mark.integration, db_session fixture seeds rows): + - insert 3 rows via db_session -> GET list newest-first, total=3 + - GET detail by workspace_id -> created_objects round-trips JSONB + - MODIFY tests/test_e2e_demo.py 
(@pytest.mark.integration): + test_demo_replay_same_config_twice(uvicorn_subprocess): + # body = {"seed": 42, "reset": true, "skip_seed": false, + # "scenario": "demo_minimal", "preservation": "keep", + # "workspace_name": "replay-regression"} + # POST /demo/run twice sequentially (urllib pattern, lines 210-250). + # Assert: both 200 + overall_status == "pass" (replay blockers + # #146/#324 regression guard); workspace_id non-null and DIFFERENT + # across the two runs; GET /demo/workspaces lists >= 2 rows named + # replay-regression with status completed. + +Task 6 — MODIFY frontend/src/types/api.ts: + - DemoRunRequest += preservation? / workspace_name? (comment: E4 #393) + - DemoRunResult += workspace_id: string | null + - ADD WorkspaceListItem / WorkspaceDetail / WorkspaceListResponse + +Task 7 — MODIFY frontend/src/hooks/use-demo-pipeline.ts (+ test): + - DemoSummary += workspaceId: string | null + - applyEvent pipeline_complete: workspaceId: toStringOrNull(event.data.workspace_id) + - test: pipeline_complete event with workspace_id -> summary.workspaceId set; + absent key -> null (legacy back-compat) + +Task 8 — CREATE frontend/src/hooks/use-workspaces.ts (pattern: use-scenarios.ts): + - useWorkspaces(limit = 20): + useQuery({ queryKey: ['workspaces', { limit }], + queryFn: () => api<WorkspaceListResponse>('/demo/workspaces', { params: { limit } }) }) + - useWorkspace(workspaceId: string, enabled = true): + queryKey ['workspaces', workspaceId]; enabled: enabled && !!workspaceId + - Re-export from hooks/index.ts (match existing barrel entries) + +Task 9 — CREATE frontend/src/components/demo/WorkspacePanel.tsx (+ test): + - Props: { onLoad: (ws: WorkspaceListItem) => void, + onReplay: (ws: WorkspaceListItem) => void, + isRunning: boolean, + lastWorkspaceId: string | null } # summary.workspaceId — triggers refetch + # RESOLVED design: props receive the LIST item; the PAGE resolves the full + # detail via useWorkspace(selectedId) and only then setLoadedWorkspace — 
+      # WorkspaceArtifactsPanel needs detail-only created_objects. Replay needs
+      # only list-item fields (seed/scenario/reset/skip_seed/name) — no detail
+      # fetch required on the replay path.
+  - useWorkspaces() list; invalidate ['workspaces'] when lastWorkspaceId changes
+    (append-once-during-render pattern, RunHistoryStrip.tsx:71-86, or a
+    useEffect keyed on lastWorkspaceId — syncing to an external system).
+  - Row: name (or workspace_id slice), scenario badge, seed, status color
+    (pass green / fail red — RunHistoryStrip styling 118-126), winner from
+    result_summary, created_at toLocaleString, DESTRUCTIVE badge when reset.
+  - Buttons: Load + Replay (variant outline, size sm), disabled when isRunning.
+    Per the RESOLVED design above: the panel stays dumb (list items only);
+    detail fetching lives in the page (Task 12).
+  - Render null-state: "No saved workspaces yet" muted text (do NOT hide the
+    panel entirely — discoverability of the new feature).
+
+Task 10 — CREATE frontend/src/components/demo/WorkspaceArtifactsPanel.tsx (+ test):
+  - Props: { workspace: WorkspaceDetail }
+  - Mirror InspectArtifactsPanel InspectCard shape (label/blurb/href/disabledReason).
+  - Cards from created_objects (keys per workspace.py:_collect_created_objects:88-101):
+      winning_run_id      -> `${ROUTES.EXPLORER.RUNS}/${id}`
+      v2_run_id           -> `${ROUTES.EXPLORER.RUNS}/${id}`
+      scenario_plan_ids[] -> `${ROUTES.VISUALIZE.PLANNER}?scenario_id=${id}` (one card per plan, label Plan 1/2)
+      batch_id            -> `${ROUTES.VISUALIZE.BATCH}/${id}`
+      alias               -> ROUTES.OPS
+      agent_session_id    -> ROUTES.CHAT (blurb: session likely expired — link is the chat surface)
+      grain (store_id+product_id cols) -> FORECAST + BACKTEST query links
+        (`?store_id=&product_id=` — resolveInspectHref vocabulary, showcase.tsx:46-57)
+  - Missing key -> disabled card with disabledReason (InspectArtifactsPanel pattern).
+
+Task 11 — MODIFY frontend/src/components/demo/RunHistoryStrip.tsx (+ test):
+  - In the append-once block (71-86): skip append when summary.workspaceId is
+    non-null (comment: E4 #393 — server-backed workspaces own kept runs).
+  - Test: summary with workspaceId -> items unchanged; without -> appended.
+
+Task 12 — MODIFY frontend/src/pages/showcase.tsx:
+  - State: keepWorkspace (bool), workspaceName (string), seed (number, default 42),
+    selectedWorkspaceId (string | null).
+  - Detail resolution (RESOLVED design from Task 9): const { data: loadedWorkspace } =
+    useWorkspace(selectedWorkspaceId ?? '', !!selectedWorkspaceId) — the page,
+    not the panel, owns the detail fetch; WorkspaceArtifactsPanel renders only
+    when the detail query has data.
+  - Controls card: ADD Seed <Input type="number"> (small, labeled), ADD
+    "Save as workspace" Checkbox (same label pattern as Re-seed, 206-216), ADD
+    name <Input> shown when checked, with inline pattern-violation hint
+    (^[a-z0-9][a-z0-9\-_]*$) that disables Run.
+  - handleRun: build req { seed, skip_seed: !reseed, reset: resetDb, scenario,
+      ...(keepWorkspace ? { preservation: 'keep' as const,
+        ...(workspaceName ? { workspace_name: workspaceName } : {}) } : {}) }
+  - onLoad(ws: WorkspaceListItem): setScenario(ws.scenario); setSeed(ws.seed);
+      setReseed(!ws.skip_seed); setResetDb(ws.reset); setKeepWorkspace(true);
+      setWorkspaceName(ws.name ?? ''); setSelectedWorkspaceId(ws.workspace_id)
+      # useWorkspace then resolves the detail -> WorkspaceArtifactsPanel renders
+  - onReplay(ws: WorkspaceListItem): onLoad(ws) THEN start({ seed: ws.seed, scenario: ws.scenario,
+      reset: ws.reset, skip_seed: ws.skip_seed, preservation: 'keep',
+      ...(ws.name ? { workspace_name: ws.name } : {}) })
+      # setScenario before start — picker-desync gotcha
+  - Mount <WorkspacePanel ... lastWorkspaceId={summary?.workspaceId ??
null} />
+    near RunHistoryStrip; mount <WorkspaceArtifactsPanel workspace={loadedWorkspace}/>
+    when loadedWorkspace && phase === 'idle' (a started run replaces it with live cards).
+
+Task 13 — MODIFY docs/_base/API_CONTRACTS.md:
+  - Endpoint table, after the demo WS row:
+      | demo | GET | `/demo/workspaces` | E4 (#393) — list saved showcase workspaces, newest first (`limit`/`offset`); `200` + empty list on an empty table |
+      | demo | GET | `/demo/workspaces/{workspace_id}` | E4 (#393) — full workspace row incl. `created_objects` soft references; `404` when missing |
+  - WS /demo/stream section: append an E4 note — the start frame's E1
+    preservation fields are now exercised by the Showcase UI; replay re-submits
+    a recorded config verbatim with preservation="keep".
+
+Task 14 — gates, dogfood, commit, PR:
+  - Backend gates + integration suite (Validation Loop below)
+  - Frontend: pnpm lint && pnpm test --run
+  - Browser dogfood via the webapp-testing skill (CLAUDE.md workflow step 4):
+      seeded stack -> save -> list -> load -> links -> replay green
+  - git diff --stat (CRLF noise check)
+  - COMMITS (reference #393, no AI trailer), e.g.:
+      feat(api): expose showcase workspace list and detail endpoints (#393)
+      feat(ui): add workspace restore and replay to showcase page (#393)
+      test(api): add demo replay same-config regression test (#393)
+      docs(api): document workspace restore endpoints (#393)
+  - PR into dev; title `feat(api,ui): showcase workspace restore/replay (#393)`
+```
+
+### Integration Points
+
+```yaml
+DATABASE: none — E4 reads the E1 table; no migration.
+
+CONFIG: none — no new settings or env vars.
+
+ROUTES: two GETs on the existing demo router (app/main.py:156 wiring unchanged).
+
+FRONTEND: showcase page only; no new React Router routes (deep links target
+  existing pages). New components exported via components/demo/index.ts barrel.
+
+DOCS: docs/_base/API_CONTRACTS.md only (Task 13).
RUNBOOKS/DOMAIN_MODEL sweeps
+  belong to the E5 release gate — do not scope-creep them here.
+```
+
+## Validation Loop
+
+### Level 1: Syntax & Style
+
+```bash
+uv run ruff check . && uv run ruff format --check .
+uv run mypy app/ && uv run pyright app/
+cd frontend && pnpm lint
+# Expected: clean. Both Python type checkers are --strict and gate merge.
+# (pnpm tsc --noEmit is vacuous on this repo — lint + vitest are the JS gates.)
+```
+
+### Level 2: Unit Tests (no DB)
+
+```bash
+uv run pytest app/features/demo -v -m "not integration"
+uv run pytest app/core/tests/test_strict_mode_policy.py -v  # AST walker still green
+cd frontend && pnpm test --run
+# New/changed: test_routes GET unit tests (workspace fns monkeypatched),
+# test_schemas from_attributes round-trip, use-demo-pipeline workspaceId,
+# WorkspacePanel/WorkspaceArtifactsPanel/RunHistoryStrip vitest.
+```
+
+### Level 3: Integration (real Postgres)
+
+```bash
+docker compose up -d && uv run alembic upgrade head
+uv run pytest app/features/demo -v -m integration
+# Workspace GET routes against seeded rows (db_session fixture wipes on teardown).
+
+# Replay regression (slow — runs the demo pipeline twice; needs the e2e harness):
+uv run pytest tests/test_e2e_demo.py::test_demo_replay_same_config_twice -v -m integration
+```
+
+### Level 4: Manual smoke + browser dogfood (seeded local stack, uvicorn :8123)
+
+```bash
+# 1.
Keep-run + list + detail round-trip
+curl -s -X POST http://localhost:8123/demo/run -H 'Content-Type: application/json' \
+  -d '{"skip_seed": true, "preservation": "keep", "workspace_name": "e4-smoke"}' \
+  | python3 -c "import sys,json; print(json.load(sys.stdin)['workspace_id'])"
+curl -s "http://localhost:8123/demo/workspaces?limit=5" | python3 -m json.tool | head -30
+curl -s "http://localhost:8123/demo/workspaces/<id-from-above>" | python3 -m json.tool | head -40
+curl -s -o /dev/null -w "%{http_code} %{content_type}\n" \
+  http://localhost:8123/demo/workspaces/deadbeefdeadbeefdeadbeefdeadbeef  # 404 problem+json
+
+# 2. Browser dogfood (webapp-testing skill / agent-browser):
+#    /showcase -> tick "Save as workspace", name "e4-dogfood", Run -> green ->
+#    panel shows the row -> Load -> controls repopulate + artifact links resolve
+#    (winning run opens /explorer/runs/{id}) -> Replay -> second green run ->
+#    panel shows TWO rows named e4-dogfood -> ephemeral run -> localStorage
+#    strip appends it, workspace panel does NOT.
+```
+
+## Final validation Checklist
+
+- [ ] All five gates green: `uv run ruff check . && uv run ruff format --check . && uv run mypy app/ && uv run pyright app/ && uv run pytest -v -m "not integration"`
+- [ ] Integration suite green: `uv run pytest -v -m integration` (fresh docker-compose DB)
+- [ ] Replay regression test green: same config twice → both pass, distinct workspace ids
+- [ ] Frontend gates green: `pnpm lint && pnpm test --run`
+- [ ] Legacy behavior byte-identical: ephemeral runs write no rows; start frames without new keys validate; `GET /demo/workspaces` on empty table → `200 {"workspaces": [], "total": 0}`
+- [ ] Browser dogfood passes (Level 4 step 2) — UI verified in a real browser per `.claude/rules/ui-design.md`
+- [ ] `git diff --stat` shows surgical diffs (no CRLF whole-file noise)
+- [ ] docs/_base/API_CONTRACTS.md updated additively
+- [ ] Commits formatted `feat(api)/feat(ui)/test(api)/docs(api): ...
(#393)`, no AI trailer; PR into dev
+
+---
+
+## Anti-Patterns to Avoid
+
+- ❌ Don't touch `pipeline.py`, `models.py`, or migrations — E4 is read-endpoints + UI + tests only.
+- ❌ Don't add `replayed_from` provenance, DELETE endpoints, or export bundles — out of scope (umbrella #389).
+- ❌ Don't make the workspace response models strict — strict mode is request-body policy.
+- ❌ Don't mutate the original workspace row on replay — replay creates a new row, period.
+- ❌ Don't remove the localStorage RunHistoryStrip — it stays for ephemeral runs only.
+- ❌ Don't call `crypto.randomUUID` directly — `safeRandomUUID` (ESLint-enforced).
+- ❌ Don't run `shadcn add` — every needed primitive is already installed.
+- ❌ Don't chase pre-existing `tsc -b` errors — lint + vitest are the JS gates.
+- ❌ Don't import another feature slice from `app/features/demo/` — core/shared only.
+
+## Confidence Score
+
+**8.5/10** for one-pass implementation success. The backend half is a near-copy
+of the scenarios list/get precedent over helpers E1 deliberately pre-built for
+this epic, and the restore-vs-replay semantics the issue required designing are
+fully specified above. The deductions: (a) the showcase.tsx wiring touches many
+small pieces (controls, two new panels, picker-desync ordering) where a missed
+interaction costs an iteration, and (b) the replay regression test runs the real
+pipeline twice and may surface environment-dependent flakiness (wall-clock,
+accumulated rows) that needs tuning rather than code fixes.
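
Since deduction (b) concerns flakiness in the replay regression test, one option is to keep the assertion logic in a pure helper so it can be unit-tested apart from the slow pipeline runs. The following is a minimal sketch under stated assumptions, not the repo's implementation: `check_replay_invariants` is a hypothetical name, and the row keys `name` and `status` are assumed from the plan's description of the list response.

```python
from typing import Any, Mapping, Sequence


def check_replay_invariants(
    first: Mapping[str, Any],
    second: Mapping[str, Any],
    listed_rows: Sequence[Mapping[str, Any]],
    workspace_name: str = "replay-regression",
) -> list[str]:
    """Return the violated replay invariants; an empty list means the guard holds.

    `first` and `second` are the parsed JSON bodies of the two POST /demo/run
    calls; `listed_rows` is the `workspaces` array from GET /demo/workspaces.
    Hypothetical helper: the field names follow the plan, not verified code.
    """
    problems: list[str] = []

    # Both runs must pass and must each have produced a workspace row.
    for label, result in (("first", first), ("second", second)):
        if result.get("overall_status") != "pass":
            problems.append(f"{label} run did not pass")
        if not result.get("workspace_id"):
            problems.append(f"{label} run has no workspace_id")

    # Replay creates a new row, so the two ids must differ.
    first_id = first.get("workspace_id")
    if first_id and first_id == second.get("workspace_id"):
        problems.append("replay reused the original workspace_id")

    # The list endpoint should show at least two completed rows with this name.
    completed = [
        row
        for row in listed_rows
        if row.get("name") == workspace_name and row.get("status") == "completed"
    ]
    if len(completed) < 2:
        problems.append(f"expected >= 2 completed rows, found {len(completed)}")

    return problems
```

In the integration test, the helper's return value could be asserted equal to `[]`, so a failure message lists every violated invariant at once instead of stopping at the first.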