From 12ad37ed523377cd4f5890460e4123cf68acb3bf Mon Sep 17 00:00:00 2001
From: SoundMindsAI
Date: Wed, 3 Jun 2026 16:25:34 -0400
Subject: [PATCH] docs(state): finalize feat_list_count_columns (PR #436) + correct freshness-gate framing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Prepend feat_list_count_columns (PR #436, 606d43d9) to Last 5 merges.
- Correct the infra_generated_artifact_freshness_gate entry: it merged as PR #433 (c5c36c65) + docs PR #435 — the prior text was committed on the feature branch before the PR number existed and still said "PR forthcoming".
- Drop feat_studies_convergence_visibility to the older-entries pointer (keeps the list at 5).
- Reset "Current branch / execution context" + "In flight" to main / nothing-in-flight (both 2026-06-03 features merged).

Co-Authored-By: Claude Opus 4.8 (1M context)
Signed-off-by: SoundMindsAI
---
 state.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/state.md b/state.md
index 04fd94fb..6cda3051 100644
--- a/state.md
+++ b/state.md
@@ -16,8 +16,8 @@ MVP1 (v0.1) **shipped** — all six differentiators live (Bayesian/TPE optimizer

 ## Current branch / execution context

-- **Branch:** `infra_generated_artifact_freshness_gate` (8 commits ahead of main, PR forthcoming). `pr.yml` not yet observed against the new branch — the new `generated-artifacts-fresh` job + `copy-docs-freshness.yml` workflow will fire on first push.
-- **Active feature:** `infra_generated_artifact_freshness_gate` shipping both phases together (the standalone `infra_openapi_types_freshness_gate/` Phase-2-only record is retired at finalization). The latest pre-feature merge was `chore_scorecard_pin_deps_postcss` (PR #430, 2026-06-03). Next after this PR merges: pull from the MVP2 Idea/Plan backlog (run `/pipeline status`).
+- **Branch:** `main` (PR #436 `feat_list_count_columns` just merged `606d43d9`; PR #433 `infra_generated_artifact_freshness_gate` merged earlier the same day). All `pr.yml` checks green (smoke skipped — opt-in/off). +- **Active feature:** _None in flight._ `feat_list_count_columns` shipped 2026-06-03 (PR #436). Next: pull from the MVP2 Idea/Plan backlog (run `/pipeline status`). - **Alembic head:** `0022_solr_engine_auth_check` (added by `infra_adapter_solr` Story A6 — extends `clusters.engine_type` + `clusters.auth_kind` CHECK constraints for Solr). - **Python:** 3.13. **Frontend stack:** Next 16 (App Router + Turbopack), React 19, Tailwind 4 (CSS-first), Vitest 4, ESLint 9 (flat), TypeScript 6, Playwright (chromium, single worker) for E2E. - **Coverage gates:** backend 80% (`fail_under` in pyproject), UI vitest + tsc + ESLint + Next build, plus a full-stack smoke E2E job. Live pass counts: see the latest `pr.yml` run (the historical per-feature counts moved to `state_history.md`). @@ -26,16 +26,16 @@ MVP1 (v0.1) **shipped** — all six differentiators live (Bayesian/TPE optimizer Detail + reasoning for each is in [`state_history.md`](state_history.md). -- **2026-06-03** — `infra_generated_artifact_freshness_gate` (PR forthcoming). Both phases shipped together: Phase 1 (`copy-docs` freshness gate) + Phase 2 (offline OpenAPI exporter + `openapi.json` snapshot + `types.ts` gate + chained fix). The standalone `infra_openapi_types_freshness_gate/` folder (the discoverable-record-if-Phase-2-ships-alone) is retired at finalization. **Phase 1:** `copy-docs.mjs` now prunes `ui/public/docs/` to `{README.md} ∪ {DOCS[].dest}` (FR-9, so a renamed entry never leaves a stale public copy); new `.github/workflows/copy-docs-freshness.yml` runs on every PR with no `paths-ignore` filter (FR-3 escape from pr.yml's `docs/**` filter so docs-only PRs still get the check). 
**Phase 2:** `backend/app/openapi_export.py` emits the canonical OpenAPI schema offline (no live DB/Redis/ES/OpenSearch/Solr/OpenAI — verified by `test_openapi_export.py` running against deliberately-unreachable REDIS_URL); `ui/openapi.json` (149KB, 52 paths) committed as the canonical snapshot; `gen-types.mjs` refactored to use the lockfile-pinned `node_modules/.bin/openapi-typescript` (no `npx` fallback) with a source-invariant banner extracted to the pure module `ui/scripts/gen-types-banner.mjs`; new `pr.yml` job `generated-artifacts-fresh` runs the snapshot + types guards + an AC-7 clean-tree determinism step that proves the regenerator is itself deterministic across runs. Single chained fix command: `bash scripts/regen-generated-artifacts.sh`. New `ui/.prettierignore` lists `src/lib/types.ts` + `public/docs/*.md` — the generator is the source of truth, prettier on them would make the gates flap. Tangential inline fix (per CLAUDE.md tangential-discoveries rule): `studies-table-ceiling-badge.test.tsx` fixture was missing `trial_count: 0`; the regen of types.ts surfaced the schema/test drift. 48 new test cases total (10 backend unit + 11 + 6 vitest + 7×3 shell-guard self-tests). No migration (head stays `0022`). Cross-model: Epic 1 GPT-5.5 phase-gate 3 findings (1 accepted-and-fixed, 2 rejected with cited counter-evidence); Epic 2 GPT-5.5 phase-gate 5 findings (all 5 rejected — 2 false positives from the slim-diff input, 2 plan-authorized override patterns, 1 inline-fix-per-CLAUDE.md guidance). +- **2026-06-03** — `feat_list_count_columns` (PR #436, squash-merged `606d43d9`). Adds an at-a-glance count column to two list tables: `/query-sets` gains **Queries** (`query_count`) and `/templates` gains **Parameters** (`param_count` = the template's tuning surface). 
Two different impls by data shape: `query_count` counts child `queries` rows via a new **batched `GROUP BY` aggregate** `repo.count_queries_for_sets` (one query/page, no N+1 — mirrors `count_trials_for_studies` from `feat_studies_convergence_visibility`; `QuerySetSummary` had previously *omitted* the count "to avoid N+1 at list time", an objection the batch removes); `param_count` is `len(declared_params)` — free, since `declared_params` is a JSONB column already on the template row (not a child relationship). Bug caught by mypy mid-impl: the aggregate column is labeled `query_count` NOT `count` — SQLAlchemy `Row` is tuple-like + exposes a built-in `.count()` method, so `row.count` would resolve to the bound method. Regenerated `ui/openapi.json` + `types.ts` (the freshness gate validated them green). Regenerated in-app guides 03 + 04 against a populated stack (`make up` + `make seed-demo` mid-session) so the list screenshots show the new columns with real data; promoted the walkthrough videos to match; dropped a briefly-filed `chore_guide_regen_*` idea once the populated regen made it obsolete. No migration (head stays `0022`). 14 new tests (5 integration + 2 contract + 7 vitest). Cross-model: Gemini 1 (rejected — `len(... or {})` guards an unreachable NULL; `declared_params` NOT NULL in model + migration 0003); GPT-5.5 final 2 (both rejected — slim-diff false positives claiming types.ts wasn't updated; it was, + tsc green). All 17 `pr.yml` checks green. +- **2026-06-03** — `infra_generated_artifact_freshness_gate` (PR #433, squash-merged `c5c36c65`; finalized via docs PR #435 `0dab5ec3`). Both phases shipped together: Phase 1 (`copy-docs` freshness gate) + Phase 2 (offline OpenAPI exporter + `openapi.json` snapshot + `types.ts` gate + chained fix). The standalone `infra_openapi_types_freshness_gate/` folder was retired at finalization. 
**Phase 1:** `copy-docs.mjs` now prunes `ui/public/docs/` to `{README.md} ∪ {DOCS[].dest}` (FR-9, so a renamed entry never leaves a stale public copy); new `.github/workflows/copy-docs-freshness.yml` runs on every PR with no `paths-ignore` filter (FR-3 escape from pr.yml's `docs/**` filter so docs-only PRs still get the check). **Phase 2:** `backend/app/openapi_export.py` emits the canonical OpenAPI schema offline (no live DB/Redis/ES/OpenSearch/Solr/OpenAI — verified by `test_openapi_export.py` running against deliberately-unreachable REDIS_URL); `ui/openapi.json` (149KB, 52 paths) committed as the canonical snapshot; `gen-types.mjs` refactored to use the lockfile-pinned `node_modules/.bin/openapi-typescript` (no `npx` fallback) with a source-invariant banner extracted to the pure module `ui/scripts/gen-types-banner.mjs`; new `pr.yml` job `generated-artifacts-fresh` runs the snapshot + types guards + an AC-7 clean-tree determinism step that proves the regenerator is itself deterministic across runs. Single chained fix command: `bash scripts/regen-generated-artifacts.sh`. New `ui/.prettierignore` lists `src/lib/types.ts` + `public/docs/*.md` — the generator is the source of truth, prettier on them would make the gates flap. Tangential inline fix: `studies-table-ceiling-badge.test.tsx` fixture was missing `trial_count: 0`. 48 new test cases (10 backend unit + 11 + 6 vitest + 7×3 shell-guard self-tests). No migration (head stays `0022`). Cross-model: Epic 1 GPT-5.5 3 findings (1 fixed, 2 rejected); Epic 2 GPT-5.5 5 findings (all rejected); Gemini 3 (all accepted — atexit cleanup, atomic-write try/finally, Windows shell flag); final GPT-5.5 clean. All 17 `pr.yml` checks green. - **2026-06-03** — `chore_scorecard_pin_deps_postcss` (PR #430). Resolved the actionable OSSF Scorecard findings on the public code-scanning surface — the one real vulnerability + the ~60 `PinnedDependencies` alerts. 
**Vulnerability #72:** `postcss < 8.5.10` (moderate XSS via unescaped `</style>` in CSS stringify) was transitive — `next@16.2.6` hard-pins `postcss@8.4.31`; added a pnpm `overrides` (`postcss@<8.5.10` → `^8.5.15`) so the whole tree (incl. Next's bundled copy) resolves to 8.5.15, regenerated `ui/pnpm-lock.yaml`, verified `pnpm build` + 1008 vitest green. **PinnedDependencies:** pinned all 56 GitHub Action `uses:` refs to 40-char commit SHAs (`# vX` comments) across all 5 workflows; pinned the 4 `pr.yml` service-container images (postgres/redis/elasticsearch/opensearch) by manifest digest; pinned the Dockerfile base images by digest via single `BASE_IMAGE` ARGs (`python:3.14-slim` in `Dockerfile` — collapsed from the original split `PYTHON_VERSION`/`PYTHON_DIGEST` after Gemini flagged the digest-wins-over-tag override footgun; `node:26-bookworm-slim` declared once + reused by the 3 `ui/Dockerfile` stages). Dependabot already runs github-actions + docker weekly so the pins stay fresh. **Left intentionally:** npmCommand (`npm install -g pnpm@9`) + pipCommand (docs-site `pip install`) — impractical to hash-pin, not "images"; workflow `services.*.image` digests need manual refresh (Dependabot's github-actions ecosystem updates `uses:` only); Tier-3 intrinsic findings (relaxed branch protection, solo-dev review ratio, project age, fuzzing, OpenSSF badge, SAST). No `backend/app/` source, no migration (head stays `0022`). Cross-model: Gemini 2 findings (both accepted + fixed — the `BASE_IMAGE` consolidations above), each re-validated with `docker buildx build --check`. Both `docker buildx` CI jobs green on the final commit. - **2026-06-02** — `bug_llm_capability_cache_no_refresh` (PR #426, squash-merged `432dcf59`). 
The OpenAI capability check ran exactly once at api startup (`main.py:94`, fire-and-forget lifespan task) + cached in Redis with a 24h TTL (`capability_check.py:48`); nothing repopulated it, so any stack up >24h silently lost all LLM-dependent capability — `POST /judgments/generate` returned `503 LLM_PROVIDER_INCAPABLE "cache miss"` until an api restart. Confirmed live at 34h uptime (zero `openai:capabilities:*` keys; `docker compose restart api` fixed it). **Fix (Option A, locked at preflight D-1):** new `read_or_recompute_capability_result()` helper reads the cache, recomputes inline via `check_capabilities()` on miss (writes back), returns `None` on empty key (preserves the `/healthz` "no key" semantic). `agent_judgments_dispatch._check_llm_preflight` opts in; `/healthz` (200ms SLO, Rule #11) + chat orchestrator stay read-only (D-5). A per-worker `asyncio.Lock` single-flight + in-lock double-checked read collapses concurrent in-worker recompute bursts to 1 probe (D-4, refined after GPT-5.5 caught the original "WEB_CONCURRENCY × probes" bound undercounting concurrent requests); defensive try/except returns `None` on unexpected failure (→ caller's existing 503 envelope, not a bare 500). Options B (background refresh) + C (stale-but-usable) rejected (D-2/D-3). Shipped via `/bug-fix --ship` → `/impl-execute --ad-hoc`. No `backend/app/` source beyond the helper + call-site swap, no migration (head stays `0022`). 7 unit tests (`TestReadOrRecomputeCapabilityResult`) + 1 integration test (`test_generate_recovers_after_capability_cache_expiry`); test-fixture monkeypatch sites updated to the new symbol. 2194 unit pass, 330 contract pass. Cross-model: Gemini 4 (1 accepted — `api_key: str | None`; 3 rejected as hunk-isolated false positives on `AsyncMock.assert_not_awaited`, stdlib since 3.8), GPT-5.5 final 2 (both accepted — the asyncio.Lock single-flight + the exception wrapper, each with a new regression test). 
Ride-along: `/idea-preflight` SKILL.md routing fix (no longer hard-codes `/pipeline --auto` — routes to `/bug-fix`/`/impl-execute --ad-hoc` by prefix+scope). All 12 `pr.yml` checks green. - **2026-06-02** — `infra_smoke_reseed_runtime_budget` (PR #424, squash-merged `035d7941`). Clears the last of the three-PR Solr-CI debt chain (`infra_solr_ci_readiness` backend half → `infra_solr_smoke_stability` Solr boot → this, the reseed-runtime half). The smoke job's `demo-ubi.spec.ts` `beforeAll` reseed exceeded the 25-min `timeout-minutes` cap once Solr actually booted (AC-8 of `feat_demo_ubi_study_comparison` bounds the in-flight reseed at 1140s/~19 min hard ceiling, ~28 min worst case per §14 — Playwright + setup overhead pushed total past 25 min; PR #383 run 26790636716 hit it at 25:18). **Fix (Option A, locked at idea-preflight):** extend `ui/playwright.config.ts`'s `testIgnore` CI-gated branch by one entry (`'**/demo-ubi.spec.ts'`, the 7th alongside the 6 pre-existing demo-data-dependent specs) — the `process.env.CI ? [...] : []` ternary gates it to GHA runs, so local `make up` smoke (`CI=` unset) keeps full demo-ubi coverage. Option B (timeout bump → 35 min) rejected (D-3: <7 min margin against §14 worst case); Option C (env-var reseed scenario filter, ~2-3h multi-file) deferred per operator (D-2). New vitest regression guard `ui/src/__tests__/playwright-config-test-ignore.test.ts` (3 assertions: demo-ubi in CI branch, all 7 entries present, demo-ubi not outside the ternary). Runbook `docs/03_runbooks/smoke-solr-stability.md` §5 documents the exclusion + the reseed-runtime-vs-Solr-stability split; pr.yml + state.md stale "exceeds the cap" framing refreshed to "runtime block cleared, flip `SMOKE_TEST=true` after the §16 `playwright test --list` verification". 5 stories / 1 epic. No `backend/app/` source, no migration (head stays `0022`). 
§16 manual verification confirmed AC-1 (`CI=true` → 86 tests/30 files, 0 demo-ubi) + AC-2 (`CI=` unset → 110 tests/37 files, demo-ubi discovered). Cross-model: spec GPT-5.5 3 cycles (13 findings, all applied), plan GPT-5.5 3 cycles (11 findings, all applied), Gemini 2 (both accepted — `import.meta.url` path resolution + CRLF normalization), GPT-5.5 final 3 (2 accepted: §4→§5 pointer + runbook markdown links; 1 rejected: AC-7 file-shape re-raise, counter-evidence cited). All 12 `pr.yml` checks green. -- **2026-06-02** — `feat_studies_convergence_visibility` (Epic 1 via PR #421 `e5c3b8b9`; Epic 2 via PR #422 `49a0e1b0`). **Epic 1** — studies-list convergence visibility: `GET /api/v1/studies` items gain `trial_count` (non-baseline total) + `convergence_verdict` (reuses the shipped `classify_convergence` via a count-gated path; bounded to 1–2 queries/page via `count_trials_for_studies` + `resolve_list_convergence_verdicts`); `/studies` UI gains Trials + Convergence columns reusing `CONVERGENCE_VERDICT_VALUES` + the `convergence_verdict` glossary key. (Epic 1 landed bundled inside the PR #421 squash-merge alongside `complementary-architecture.md` — surfaced during the Epic 2 CI watch; the Epic 2 branch was rebased onto `e5c3b8b9` to drop the duplicate Epic 1 commits.) 
**Epic 2** — demo data that shows real optimization: rewrote the 5 small `SCENARIOS` with the decoy-by-title pattern (best-answer terms in description/body/bullets, decoy terms in title) so the equal-midpoint baseline under-ranks and a differentiated boost lifts ≥ 0.10 (per-scenario headroom: baseline 0.561–0.690, lift +0.230 to +0.295, all `best < 0.99`); bumped small-scenario `max_trials` 12 → 50 via the new shared `DEMO_SMALL_STUDY_MAX_TRIALS` constant single-sourced from `scripts/seed_meaningful_demos.py` (imported by `demo_seeding.py` so CLI + home-button reseed can't drift) so demo studies clear `STUDIES_TPE_WARMUP_FLOOR` and convergence reads `converged`/`still_improving` instead of a uniform `too_few_trials`. New tests: engine-backed headroom (6 — 5 scenarios + resolver-parity guard; ES/OS hard-gated in CI via `_require_es_or_fail`, Solr skip-gated per D-18); shape invariants (21 — full {0,1,2,3} rubric per query); max_trials single-source guards (4); heavy-lane AC-7/AC-8 block reading persisted `Study.baseline_metric`/`best_metric` via the live list path. Tangential inline fix: `/healthz` contract test now accepts the `solr` subsystem the live response carries (live since 2026-05-31). No migration (head stays `0022`). Cross-model: Epic 2 phase-gate GPT-5.5 cycle 1 (6 findings — 4 accepted+fixed, 1 accepted-as-comment, 1 deferred to docs), cycle 2 clean; final GPT-5.5 review (2 findings — both rejected: Solr-CLI scope is `infra_adapter_solr` Story A13 territory, header-tooltip UX matches the sibling-column convention); Gemini (2 pre-rebase findings on Epic 1 code — moot after rebase). All 12 `pr.yml` checks green. 
-_(older entries — full narrative in [`state_history.md`](state_history.md): `bug/cli-seed-ubi-missing-engine-type` PR #419, `chore_template_library_expansion` PR #416, `infra_smoke_reseed_runtime_budget` PR #424, `infra_solr_smoke_stability` PR #383, `infra_solr_ci_readiness` Phase 1 PR #367, MVP2 backlog batch PR #364, `feat_study_convergence_indicator` PR #352, `feat_overnight_autopilot` PR #343, `infra_adapter_solr` PR #336, …)_ +_(older entries — full narrative in [`state_history.md`](state_history.md): `feat_studies_convergence_visibility` PR #421/#422, `bug/cli-seed-ubi-missing-engine-type` PR #419, `chore_template_library_expansion` PR #416, `infra_smoke_reseed_runtime_budget` PR #424, `infra_solr_smoke_stability` PR #383, `infra_solr_ci_readiness` Phase 1 PR #367, MVP2 backlog batch PR #364, `feat_study_convergence_indicator` PR #352, `feat_overnight_autopilot` PR #343, `infra_adapter_solr` PR #336, …)_ ## In flight -- **`infra_generated_artifact_freshness_gate`** — branch ready, PR forthcoming. 8 commits (Stories 1.1, 1.2, 1.2-fix, 2.1, 2.2 a, 2.2 b, 2.3, 2.4). 48 new test cases. AC-7 clean-tree determinism verified locally. +- _None._ Both 2026-06-03 features (`infra_generated_artifact_freshness_gate` PR #433, `feat_list_count_columns` PR #436) merged. - **Plan-stage, `/impl-execute`-ready (no gates):** the 4 remaining PR #413 (2026-06-02) spec/plan pairs in `02_mvp2/` (`chore_template_library_expansion` shipped via PR #416): `chore_studies_post_arq_spy_fixture`, `bug_judgment_header_omits_click_bucket`, `bug_baseline_phase_test_isolation`, `chore_ubi_reader_search_after_pagination`. 
Plus the 5 pairs from PR #364 still pending after this PR ships — of which two are **design-ahead** (`feat_apply_path_normalizer_declaration` + `feat_query_normalizer_typed_pipeline`, both gated on `feat_query_normalization_tuning` Phase 1 merging — do not `/impl-execute` until then); the other three (`feat_overnight_studies_summary_card`, `chore_arq_pool_aclose_deprecation`, `chore_cluster_detail_rung_badge`) are ungated. ## Queued (priority-ordered by dashboard / dep graph)
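
Reviewer note on the `bug_llm_capability_cache_no_refresh` entry above: the "per-worker `asyncio.Lock` single-flight + in-lock double-checked read" pattern can be sketched roughly as follows. This is a minimal illustration, not the project's code. `CapabilityCache`, `_probe`, and the in-memory dict are hypothetical stand-ins for the Redis-backed cache and the `check_capabilities()` probe named in the entry, and the sketch omits the `None`-on-failure handling the entry describes.

```python
import asyncio


class CapabilityCache:
    """Hypothetical in-memory stand-in for the Redis-backed capability cache."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}
        self._lock = asyncio.Lock()  # one lock per worker process
        self.probe_calls = 0

    async def _probe(self) -> str:
        # Stand-in for the expensive provider probe (assumed, not the real one).
        self.probe_calls += 1
        await asyncio.sleep(0.01)
        return "capable"

    async def read_or_recompute(self, key: str) -> str:
        cached = self._store.get(key)
        if cached is not None:
            return cached  # fast path: cache hit, no lock taken
        async with self._lock:
            # Double-checked read: another coroutine may have filled the
            # cache while this one was waiting for the lock.
            cached = self._store.get(key)
            if cached is not None:
                return cached
            result = await self._probe()
            self._store[key] = result  # write back so later callers hit the cache
            return result


async def demo() -> tuple[list[str], int]:
    # The cache (and its lock) is created inside the running event loop.
    cache = CapabilityCache()
    results = await asyncio.gather(
        *(cache.read_or_recompute("openai") for _ in range(10))
    )
    return list(results), cache.probe_calls


if __name__ == "__main__":
    results, probes = asyncio.run(demo())
    print(f"{len(results)} callers, {probes} probe(s)")
```

All ten concurrent callers miss the cache on the first read, but only the first to take the lock runs the probe; the rest find the value on the second read. The lock is per process, so with several workers the bound is one probe per worker rather than one overall.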