From bca81a4fa7356bb0f51c407668004f61a654bc50 Mon Sep 17 00:00:00 2001
From: SoundMindsAI
Date: Wed, 3 Jun 2026 18:18:09 -0400
Subject: [PATCH] docs(state): finalize feat_studies_list_trial_convergence_columns (PR #438)

Prepend the #438 entry to Last 5 merges (noting it restored the lost PR #421 Story 1.2 columns + corrected the doc-drift); drop the now-6th entry to the older-entries pointer; update branch context + in-flight to reflect all three 2026-06-03 features merged.

Co-Authored-By: Claude Opus 4.8 (1M context)
Signed-off-by: SoundMindsAI
---
 state.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/state.md b/state.md
index 6cda3051..62e651e7 100644
--- a/state.md
+++ b/state.md
@@ -16,8 +16,8 @@ MVP1 (v0.1) **shipped** — all six differentiators live (Bayesian/TPE optimizer
 
 ## Current branch / execution context
 
-- **Branch:** `main` (PR #436 `feat_list_count_columns` just merged `606d43d9`; PR #433 `infra_generated_artifact_freshness_gate` merged earlier the same day). All `pr.yml` checks green (smoke skipped — opt-in/off).
-- **Active feature:** _None in flight._ `feat_list_count_columns` shipped 2026-06-03 (PR #436). Next: pull from the MVP2 Idea/Plan backlog (run `/pipeline status`).
+- **Branch:** `main` (PR #438 `feat_studies_list_trial_convergence_columns` just merged `03976c5e`; PRs #436 + #433 merged earlier the same day). All `pr.yml` checks green (smoke skipped — opt-in/off).
+- **Active feature:** _None in flight._ `feat_studies_list_trial_convergence_columns` shipped 2026-06-03 (PR #438). Next: pull from the MVP2 Idea/Plan backlog (run `/pipeline status`).
 - **Alembic head:** `0022_solr_engine_auth_check` (added by `infra_adapter_solr` Story A6 — extends `clusters.engine_type` + `clusters.auth_kind` CHECK constraints for Solr).
 - **Python:** 3.13. 
**Frontend stack:** Next 16 (App Router + Turbopack), React 19, Tailwind 4 (CSS-first), Vitest 4, ESLint 9 (flat), TypeScript 6, Playwright (chromium, single worker) for E2E.
 - **Coverage gates:** backend 80% (`fail_under` in pyproject), UI vitest + tsc + ESLint + Next build, plus a full-stack smoke E2E job. Live pass counts: see the latest `pr.yml` run (the historical per-feature counts moved to `state_history.md`).
@@ -26,16 +26,16 @@ MVP1 (v0.1) **shipped** — all six differentiators live (Bayesian/TPE optimizer
 Detail + reasoning for each is in [`state_history.md`](state_history.md).
 
+- **2026-06-03** — `feat_studies_list_trial_convergence_columns` (PR #438, squash-merged `03976c5e`). **Restored lost work, not a new feature.** The `/studies` list was supposed to show **Trials** (`trial_count`) + **Convergence** (`convergence_verdict`) columns — built + reviewed under `feat_studies_convergence_visibility` (PR #421, Story 1.2 / commit `ed5ca276`) — but `ed5ca276` was dropped in the PR #421/#422 rebase that de-duplicated Epic 1 commits. Only the Story 1.1 *backend* fields (`b90d5477`) landed; the frontend columns never reached main, so the two fields sat returned-but-unsurfaced for ~30h. Discovered while the operator asked to "add a trials column" — the backend already returned it. Restored by cherry-picking `ed5ca276` (the reviewed original > a fresh reimpl: header `tooltipKey`, `hideable`, `satisfies Record` exhaustiveness, the `study.trial_count` glossary key, a dedicated unit test + an E2E spec). The restoration caught a `TooltipProvider` gap in `page.test.tsx` the lost commit never exercised (the header `tooltipKey` renders an `` whose Radix `Tooltip` needs a provider; `layout.tsx` supplies one in prod but the isolated page render didn't) — confirming `ed5ca276` never ran the full suite on main. Corrected the false "shipped" claims in `state_history.md` + the `feat_studies_convergence_visibility` plan tracker (CORRECTION annotations). 
Regenerated guide 06's studies-list screenshot against the populated stack (shows TRIALS 50/200/200/15 + CONVERGED / TOO FEW TRIALS badges). No migration (head stays `0022`). Cross-model: Gemini 3 (all accepted — forward-compat `if (!badge)` guard for an unmapped verdict + a regression test, since `convergence_verdict` is backend-COMPUTED not a fixed DB-enum so a rolling deploy could emit an unmapped value; + 2 `ReactNode` import cleanups); GPT-5.5 final 1 (rejected — prototype-method-named verdict is unreachable for a computed classification + cosmetic + matches the `StatusBadge` plain-index precedent). All 17 `pr.yml` checks green.
 - **2026-06-03** — `feat_list_count_columns` (PR #436, squash-merged `606d43d9`). Adds an at-a-glance count column to two list tables: `/query-sets` gains **Queries** (`query_count`) and `/templates` gains **Parameters** (`param_count` = the template's tuning surface). Two different impls by data shape: `query_count` counts child `queries` rows via a new **batched `GROUP BY` aggregate** `repo.count_queries_for_sets` (one query/page, no N+1 — mirrors `count_trials_for_studies` from `feat_studies_convergence_visibility`; `QuerySetSummary` had previously *omitted* the count "to avoid N+1 at list time", an objection the batch removes); `param_count` is `len(declared_params)` — free, since `declared_params` is a JSONB column already on the template row (not a child relationship). Bug caught by mypy mid-impl: the aggregate column is labeled `query_count` NOT `count` — SQLAlchemy `Row` is tuple-like + exposes a built-in `.count()` method, so `row.count` would resolve to the bound method. Regenerated `ui/openapi.json` + `types.ts` (the freshness gate validated them green). 
Regenerated in-app guides 03 + 04 against a populated stack (`make up` + `make seed-demo` mid-session) so the list screenshots show the new columns with real data; promoted the walkthrough videos to match; dropped a briefly-filed `chore_guide_regen_*` idea once the populated regen made it obsolete. No migration (head stays `0022`). 14 new tests (5 integration + 2 contract + 7 vitest). Cross-model: Gemini 1 (rejected — `len(... or {})` guards an unreachable NULL; `declared_params` NOT NULL in model + migration 0003); GPT-5.5 final 2 (both rejected — slim-diff false positives claiming types.ts wasn't updated; it was, + tsc green). All 17 `pr.yml` checks green.
 - **2026-06-03** — `infra_generated_artifact_freshness_gate` (PR #433, squash-merged `c5c36c65`; finalized via docs PR #435 `0dab5ec3`). Both phases shipped together: Phase 1 (`copy-docs` freshness gate) + Phase 2 (offline OpenAPI exporter + `openapi.json` snapshot + `types.ts` gate + chained fix). The standalone `infra_openapi_types_freshness_gate/` folder was retired at finalization. **Phase 1:** `copy-docs.mjs` now prunes `ui/public/docs/` to `{README.md} ∪ {DOCS[].dest}` (FR-9, so a renamed entry never leaves a stale public copy); new `.github/workflows/copy-docs-freshness.yml` runs on every PR with no `paths-ignore` filter (FR-3 escape from pr.yml's `docs/**` filter so docs-only PRs still get the check). 
**Phase 2:** `backend/app/openapi_export.py` emits the canonical OpenAPI schema offline (no live DB/Redis/ES/OpenSearch/Solr/OpenAI — verified by `test_openapi_export.py` running against deliberately-unreachable REDIS_URL); `ui/openapi.json` (149KB, 52 paths) committed as the canonical snapshot; `gen-types.mjs` refactored to use the lockfile-pinned `node_modules/.bin/openapi-typescript` (no `npx` fallback) with a source-invariant banner extracted to the pure module `ui/scripts/gen-types-banner.mjs`; new `pr.yml` job `generated-artifacts-fresh` runs the snapshot + types guards + an AC-7 clean-tree determinism step that proves the regenerator is itself deterministic across runs. Single chained fix command: `bash scripts/regen-generated-artifacts.sh`. New `ui/.prettierignore` lists `src/lib/types.ts` + `public/docs/*.md` — the generator is the source of truth, prettier on them would make the gates flap. Tangential inline fix: `studies-table-ceiling-badge.test.tsx` fixture was missing `trial_count: 0`. 48 new test cases (10 backend unit + 11 + 6 vitest + 7×3 shell-guard self-tests). No migration (head stays `0022`). Cross-model: Epic 1 GPT-5.5 3 findings (1 fixed, 2 rejected); Epic 2 GPT-5.5 5 findings (all rejected); Gemini 3 (all accepted — atexit cleanup, atomic-write try/finally, Windows shell flag); final GPT-5.5 clean. All 17 `pr.yml` checks green.
 - **2026-06-03** — `chore_scorecard_pin_deps_postcss` (PR #430). Resolved the actionable OSSF Scorecard findings on the public code-scanning surface — the one real vulnerability + the ~60 `PinnedDependencies` alerts. **Vulnerability #72:** `postcss < 8.5.10` (moderate XSS via unescaped `` in CSS stringify) was transitive — `next@16.2.6` hard-pins `postcss@8.4.31`; added a pnpm `overrides` (`postcss@<8.5.10` → `^8.5.15`) so the whole tree (incl. Next's bundled copy) resolves to 8.5.15, regenerated `ui/pnpm-lock.yaml`, verified `pnpm build` + 1008 vitest green. 
**PinnedDependencies:** pinned all 56 GitHub Action `uses:` refs to 40-char commit SHAs (`# vX` comments) across all 5 workflows; pinned the 4 `pr.yml` service-container images (postgres/redis/elasticsearch/opensearch) by manifest digest; pinned the Dockerfile base images by digest via single `BASE_IMAGE` ARGs (`python:3.14-slim` in `Dockerfile` — collapsed from the original split `PYTHON_VERSION`/`PYTHON_DIGEST` after Gemini flagged the digest-wins-over-tag override footgun; `node:26-bookworm-slim` declared once + reused by the 3 `ui/Dockerfile` stages). Dependabot already runs github-actions + docker weekly so the pins stay fresh. **Left intentionally:** npmCommand (`npm install -g pnpm@9`) + pipCommand (docs-site `pip install`) — impractical to hash-pin, not "images"; workflow `services.*.image` digests need manual refresh (Dependabot's github-actions ecosystem updates `uses:` only); Tier-3 intrinsic findings (relaxed branch protection, solo-dev review ratio, project age, fuzzing, OpenSSF badge, SAST). No `backend/app/` source, no migration (head stays `0022`). Cross-model: Gemini 2 findings (both accepted + fixed — the `BASE_IMAGE` consolidations above), each re-validated with `docker buildx build --check`. Both `docker buildx` CI jobs green on the final commit.
 - **2026-06-02** — `bug_llm_capability_cache_no_refresh` (PR #426, squash-merged `432dcf59`). The OpenAI capability check ran exactly once at api startup (`main.py:94`, fire-and-forget lifespan task) + cached in Redis with a 24h TTL (`capability_check.py:48`); nothing repopulated it, so any stack up >24h silently lost all LLM-dependent capability — `POST /judgments/generate` returned `503 LLM_PROVIDER_INCAPABLE "cache miss"` until an api restart. Confirmed live at 34h uptime (zero `openai:capabilities:*` keys; `docker compose restart api` fixed it). 
**Fix (Option A, locked at preflight D-1):** new `read_or_recompute_capability_result()` helper reads the cache, recomputes inline via `check_capabilities()` on miss (writes back), returns `None` on empty key (preserves the `/healthz` "no key" semantic). `agent_judgments_dispatch._check_llm_preflight` opts in; `/healthz` (200ms SLO, Rule #11) + chat orchestrator stay read-only (D-5). A per-worker `asyncio.Lock` single-flight + in-lock double-checked read collapses concurrent in-worker recompute bursts to 1 probe (D-4, refined after GPT-5.5 caught the original "WEB_CONCURRENCY × probes" bound undercounting concurrent requests); defensive try/except returns `None` on unexpected failure (→ caller's existing 503 envelope, not a bare 500). Options B (background refresh) + C (stale-but-usable) rejected (D-2/D-3). Shipped via `/bug-fix --ship` → `/impl-execute --ad-hoc`. No `backend/app/` source beyond the helper + call-site swap, no migration (head stays `0022`). 7 unit tests (`TestReadOrRecomputeCapabilityResult`) + 1 integration test (`test_generate_recovers_after_capability_cache_expiry`); test-fixture monkeypatch sites updated to the new symbol. 2194 unit pass, 330 contract pass. Cross-model: Gemini 4 (1 accepted — `api_key: str | None`; 3 rejected as hunk-isolated false positives on `AsyncMock.assert_not_awaited`, stdlib since 3.8), GPT-5.5 final 2 (both accepted — the asyncio.Lock single-flight + the exception wrapper, each with a new regression test). Ride-along: `/idea-preflight` SKILL.md routing fix (no longer hard-codes `/pipeline --auto` — routes to `/bug-fix`/`/impl-execute --ad-hoc` by prefix+scope). All 12 `pr.yml` checks green.
-- **2026-06-02** — `infra_smoke_reseed_runtime_budget` (PR #424, squash-merged `035d7941`). Clears the last of the three-PR Solr-CI debt chain (`infra_solr_ci_readiness` backend half → `infra_solr_smoke_stability` Solr boot → this, the reseed-runtime half). 
The smoke job's `demo-ubi.spec.ts` `beforeAll` reseed exceeded the 25-min `timeout-minutes` cap once Solr actually booted (AC-8 of `feat_demo_ubi_study_comparison` bounds the in-flight reseed at 1140s/~19 min hard ceiling, ~28 min worst case per §14 — Playwright + setup overhead pushed total past 25 min; PR #383 run 26790636716 hit it at 25:18). **Fix (Option A, locked at idea-preflight):** extend `ui/playwright.config.ts`'s `testIgnore` CI-gated branch by one entry (`'**/demo-ubi.spec.ts'`, the 7th alongside the 6 pre-existing demo-data-dependent specs) — the `process.env.CI ? [...] : []` ternary gates it to GHA runs, so local `make up` smoke (`CI=` unset) keeps full demo-ubi coverage. Option B (timeout bump → 35 min) rejected (D-3: <7 min margin against §14 worst case); Option C (env-var reseed scenario filter, ~2-3h multi-file) deferred per operator (D-2). New vitest regression guard `ui/src/__tests__/playwright-config-test-ignore.test.ts` (3 assertions: demo-ubi in CI branch, all 7 entries present, demo-ubi not outside the ternary). Runbook `docs/03_runbooks/smoke-solr-stability.md` §5 documents the exclusion + the reseed-runtime-vs-Solr-stability split; pr.yml + state.md stale "exceeds the cap" framing refreshed to "runtime block cleared, flip `SMOKE_TEST=true` after the §16 `playwright test --list` verification". 5 stories / 1 epic. No `backend/app/` source, no migration (head stays `0022`). §16 manual verification confirmed AC-1 (`CI=true` → 86 tests/30 files, 0 demo-ubi) + AC-2 (`CI=` unset → 110 tests/37 files, demo-ubi discovered). Cross-model: spec GPT-5.5 3 cycles (13 findings, all applied), plan GPT-5.5 3 cycles (11 findings, all applied), Gemini 2 (both accepted — `import.meta.url` path resolution + CRLF normalization), GPT-5.5 final 3 (2 accepted: §4→§5 pointer + runbook markdown links; 1 rejected: AC-7 file-shape re-raise, counter-evidence cited). All 12 `pr.yml` checks green. 
 
-_(older entries — full narrative in [`state_history.md`](state_history.md): `feat_studies_convergence_visibility` PR #421/#422, `bug/cli-seed-ubi-missing-engine-type` PR #419, `chore_template_library_expansion` PR #416, `infra_smoke_reseed_runtime_budget` PR #424, `infra_solr_smoke_stability` PR #383, `infra_solr_ci_readiness` Phase 1 PR #367, MVP2 backlog batch PR #364, `feat_study_convergence_indicator` PR #352, `feat_overnight_autopilot` PR #343, `infra_adapter_solr` PR #336, …)_
+_(older entries — full narrative in [`state_history.md`](state_history.md): `bug_llm_capability_cache_no_refresh` PR #426, `infra_smoke_reseed_runtime_budget` PR #424, `feat_studies_convergence_visibility` PR #421/#422, `bug/cli-seed-ubi-missing-engine-type` PR #419, `chore_template_library_expansion` PR #416, `infra_solr_smoke_stability` PR #383, `infra_solr_ci_readiness` Phase 1 PR #367, MVP2 backlog batch PR #364, `feat_study_convergence_indicator` PR #352, `feat_overnight_autopilot` PR #343, `infra_adapter_solr` PR #336, …)_
 
 ## In flight
 
-- _None._ Both 2026-06-03 features (`infra_generated_artifact_freshness_gate` PR #433, `feat_list_count_columns` PR #436) merged.
+- _None._ The three 2026-06-03 features (`infra_generated_artifact_freshness_gate` PR #433, `feat_list_count_columns` PR #436, `feat_studies_list_trial_convergence_columns` PR #438) all merged.
 - **Plan-stage, `/impl-execute`-ready (no gates):** the 4 remaining PR #413 (2026-06-02) spec/plan pairs in `02_mvp2/` (`chore_template_library_expansion` shipped via PR #416): `chore_studies_post_arq_spy_fixture`, `bug_judgment_header_omits_click_bucket`, `bug_baseline_phase_test_isolation`, `chore_ubi_reader_search_after_pagination`. 
Plus the 5 pairs from PR #364 still pending after this PR ships — of which two are **design-ahead** (`feat_apply_path_normalizer_declaration` + `feat_query_normalizer_typed_pipeline`, both gated on `feat_query_normalization_tuning` Phase 1 merging — do not `/impl-execute` until then); the other three (`feat_overnight_studies_summary_card`, `chore_arq_pool_aclose_deprecation`, `chore_cluster_detail_rung_badge`) are ungated.
 
 ## Queued (priority-ordered by dashboard / dep graph)