Figma pipeline think-once redesign + workbench accuracy harness + heroui-20260606 trial#5
Open
raveracker wants to merge 119 commits into
Benchmarks every pipeline agent for token/time/accuracy via Claude Code OTEL (per-agent attribution through agent.name + llm_request spans), scores generated components against a hosted-site oracle, and runs a scenario matrix (icon fan-in, complexity tiers, cold/warm cache, build/update). Outputs report.md + results.json + dashboard.html. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
14 TDD tasks building collector (OTLP receiver) + runner env/matrix + analyzer (per-agent tokens incl. thinking-estimate, time, fan-in blocking, dominance) + report (md + inline-SVG html). Fixture-driven, no new runtime deps (node:test). Accuracy oracle + live trial deferred to Plan 2. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
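As a rough illustration of the per-agent rollup, here is a minimal sketch that groups `llm_request` spans by their `agent.name` attribute and computes a dominance share. The span shape, the `input_tokens`/`output_tokens` attribute keys, the `rollupByAgent` name and the definition of "dominance" (the top agent's share of total tokens) are all assumptions for illustration, not the analyzer's actual schema.

```javascript
// Hypothetical span shape:
// { name, attributes: { "agent.name", input_tokens, output_tokens }, startMs, endMs }
function rollupByAgent(spans) {
  const byAgent = new Map();
  for (const span of spans) {
    // Only LLM request spans carry token counts in this sketch.
    if (span.name !== "llm_request") continue;
    const agent = span.attributes["agent.name"] ?? "unattributed";
    const row = byAgent.get(agent) ?? { agent, tokens: 0, ms: 0, requests: 0 };
    row.tokens += (span.attributes.input_tokens ?? 0) + (span.attributes.output_tokens ?? 0);
    row.ms += span.endMs - span.startMs;
    row.requests += 1;
    byAgent.set(agent, row);
  }
  // Heaviest agent first.
  const rows = [...byAgent.values()].sort((a, b) => b.tokens - a.tokens);
  const total = rows.reduce((sum, r) => sum + r.tokens, 0);
  // Assumed definition: dominance = the top agent's share of all tokens.
  const dominance =
    rows.length > 0 && total > 0 ? { agent: rows[0].agent, share: rows[0].tokens / total } : null;
  return { rows, total, dominance };
}
```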
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…bodies Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…llup Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…code & nullish fixes, fan-in/ns docs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements renderMarkdown (Task 9) — results.json → Markdown string with per-agent rollup table, dominance, icon fan-in blocking, OTEL cross-check, and accuracy placeholder. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Hybrid oracle (HeroUI Storybook for component rungs, Figma for template/page), ~9-run trial across a 7-rung complexity ladder, build-from-scratch (atomic) so accuracy is a real fidelity gradient. Adds 4 scorers + composite, trialset aggregator, report accuracy section; fills Plan 1's reserved accuracy:null. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…trial
15 TDD tasks: 4 pure scorers (visual/style/structural/gates) + composite, zlib PNG decoder, 7-rung ladder + score glue, trialset aggregator, report accuracy/ladder extensions, e2e fixtures, capture stubs, and the operator runbook for the 9-run HeroUI trial. Playwright (capture only) is the sole new dep. Fills Plan 1's reserved accuracy:null. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
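A minimal sketch of how a composite over the four scorers could be combined, assuming each scorer returns a number in [0, 1] and `gates` is a list of pass/fail booleans. The weights, the rule that a failed gate zeroes the composite, and the `compositeScore` name are hypothetical; these commits do not state the real weighting.

```javascript
// Hypothetical weights: the real composite's weighting is not given here.
const WEIGHTS = { visual: 0.4, style: 0.3, structural: 0.3 };

// scores: { visual, style, structural } in [0, 1], plus `gates`, an array of booleans.
function compositeScore(scores) {
  // Assumed semantics: any failed gate zeroes the composite.
  if (Array.isArray(scores.gates) && !scores.gates.every(Boolean)) return 0;
  let weighted = 0;
  let weightSum = 0;
  for (const [key, weight] of Object.entries(WEIGHTS)) {
    const s = scores[key];
    if (typeof s !== "number") continue; // scorer missing: renormalise over the rest
    weighted += weight * s;
    weightSum += weight;
  }
  // No scorer produced a number: report null, like an unscored run.
  return weightSum === 0 ? null : weighted / weightSum;
}
```

Renormalising over the scorers that did run keeps a single missing capture from dragging the composite toward zero; whether the real harness does this is not stated.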
…stall deferred to live phase)
…ff precision (review) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Removes RTK detection/docs/config across README, wizard (Claude + Cursor), init command, cli protocol, PIPELINE, CHANGELOG, config schema + example, and the fcc doctor. Redux Toolkit references are unaffected. Also drops root .DS_Store. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…uard + JSON fallback Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…allback) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ust stdin read
safeEncode now compares the wire length to the JSON length and emits the Brevit wire form only when it is strictly smaller, making brevit opportunistic and never harmful. readPayloadInput replaces readFileSync(0) with readStdinSync(), which drains fd 0 while tolerating EAGAIN on pipes. Adds 3 new tests (118/118 green). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
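The strictly-smaller rule can be sketched as follows. The Brevit client's real API is not assumed: `encode` and `decode` are injected stand-ins, and this three-argument `safeEncode(value, encode, decode)` signature is hypothetical. Following the PR description, the sketch also requires the wire form to round-trip before comparing lengths.

```javascript
// `encode`/`decode` are stand-ins for the Brevit client (hypothetical signature).
function safeEncode(value, encode, decode) {
  const json = JSON.stringify(value);
  let wire;
  try {
    wire = encode(value);
    // Round-trip check: the wire form must decode back to exactly the same JSON.
    if (JSON.stringify(decode(wire)) !== json) return json;
  } catch {
    return json; // encoder missing or failed: fall back to plain JSON
  }
  // Opportunistic: emit the wire form only when it is strictly shorter than JSON.
  return wire.length < json.length ? wire : json;
}
```

With a toy `key:value;…` encoder, `{ a: 1, b: 2 }` becomes the shorter wire string, while a value the toy encoder cannot round-trip (or an encoder that throws) falls back to `JSON.stringify` output.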
…fest/cli cross-refs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… available
Hardens the MCP failure path in the coordinator and fetcher (+ Cursor mirrors):
- Forbids claude/claude --agent CLI re-entry from Bash (proven wasted-run bug)
- Adds HARD ABORT code 3 on "No such tool available" to the error table
- Adds one transient retry in the fetcher before declaring a reachability failure
- Applies equivalent deltas to the .cursor/prompts/ mirrors
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
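The fetcher is a prompt-driven agent, so this policy lives in its instructions rather than in code. Purely to illustrate the control flow, a hypothetical `callWithPolicy` wrapper might look like this: a "No such tool available" error aborts immediately with code 3, and any other failure gets exactly one retry before it propagates.

```javascript
// Hypothetical error type carrying the HARD ABORT code.
class HardAbort extends Error {
  constructor(message, code) {
    super(message);
    this.name = "HardAbort";
    this.code = code;
  }
}

// callTool: an async function performing one MCP call (assumed shape).
async function callWithPolicy(callTool) {
  for (let attempt = 1; attempt <= 2; attempt++) {
    try {
      return await callTool();
    } catch (err) {
      // Missing MCP tool: never retry, never re-enter via `claude --agent`; abort with code 3.
      if (String(err.message).includes("No such tool available")) {
        throw new HardAbort(err.message, 3);
      }
      // Any other failure gets exactly one transient retry.
      if (attempt === 2) throw err;
    }
  }
}
```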
…trials, never scored Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Documents the coordinator-produced buildPlan artifact (Step 8.5) in both the canonical manifest protocol and its Cursor mirror, establishing the field-authority contract that lets builders execute mechanically. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…t directive slices Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…pi/render-mode
Each builder agent now has an "Execute the directive — do not re-reason" section that instructs it to consume the coordinator's think-once buildPlan fields directly (Brevit wire form or JSON), rather than re-deriving them. Fields per agent: component-builder (resolvedLayer, apiShape, renderMode, requiredA11y, unboundDecision, dropPolicy), icon-generator (fillModel, a11y), story-author (resolvedLayer, apiShape), test-author (resolvedLayer, apiShape, renderMode). Cursor mirrors updated with matching delta notes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
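The per-agent field lists above read naturally as a slicing table. In this sketch only the field names come from the commit; the `sliceDirective` helper and the example values are hypothetical. It shows each builder receiving just its slice of a `buildPlan` entry, so there is nothing left for it to re-derive.

```javascript
// Field lists per agent, as listed in this commit.
const AGENT_FIELDS = {
  "component-builder": ["resolvedLayer", "apiShape", "renderMode", "requiredA11y", "unboundDecision", "dropPolicy"],
  "icon-generator": ["fillModel", "a11y"],
  "story-author": ["resolvedLayer", "apiShape"],
  "test-author": ["resolvedLayer", "apiShape", "renderMode"],
};

// Hypothetical helper: returns only the directive fields the given agent consumes.
function sliceDirective(buildPlanEntry, agent) {
  const fields = AGENT_FIELDS[agent];
  if (!fields) throw new Error(`unknown agent: ${agent}`);
  const slice = {};
  for (const field of fields) {
    // Fields the coordinator did not emit are omitted rather than invented.
    if (field in buildPlanEntry) slice[field] = buildPlanEntry[field];
  }
  return slice;
}
```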
…builds Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…n-type coverage
Fixes workbench report 05 defects: collapsed @theme (hollow semantic.css), single-mode collapse (light+dark → only light emitted), and silently dropped blur/effect tokens. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
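A minimal sketch of the emission these fixes aim for: per-mode raw variables (`:root` for the default mode, `[data-theme="<mode>"]` for the others) plus an `@theme inline` block that aliases every token through `var()`, so no mode or token type is dropped. The token shape, the `emitSemanticCss` name and the type-to-namespace table are assumptions; `@theme inline` and the `--color-*`/`--blur-*`/`--shadow-*`/`--radius-*` namespaces are Tailwind v4 conventions.

```javascript
// Hypothetical token shape: { name, type, values: { light: "...", dark: "..." } }.
const NAMESPACE = { color: "color", blur: "blur", shadow: "shadow", radius: "radius" };

function emitSemanticCss(tokens, defaultMode = "light") {
  // Collect every mode any token defines, so dark is never collapsed into light.
  const modes = new Set(tokens.flatMap((t) => Object.keys(t.values)));
  const blocks = [];
  for (const mode of modes) {
    const selector = mode === defaultMode ? ":root" : `[data-theme="${mode}"]`;
    const decls = tokens
      .filter((t) => mode in t.values)
      .map((t) => `  --${t.name}: ${t.values[mode]};`);
    blocks.push(`${selector} {\n${decls.join("\n")}\n}`);
  }
  // Every token gets a theme alias; unknown types keep their own prefix instead of being dropped.
  const aliases = tokens.map((t) => `  --${NAMESPACE[t.type] ?? t.type}-${t.name}: var(--${t.name});`);
  blocks.push(`@theme inline {\n${aliases.join("\n")}\n}`);
  return blocks.join("\n\n");
}
```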
…(off-by-one fix)
Replace the structural flat-tree heuristic with INTENT-signal classification (form-control children, button-rows, multi-region, full-canvas) so Input/Card/Form are graded at the correct tier. Emit layerConfidence (high|medium|low); low is a FLAG for the coordinator's think-once pass (Step 8.5), not a silent downgrade. Updated protocol, fetcher agent, manifest contract rules, and Cursor mirrors. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
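To make the confidence semantics concrete, here is a hypothetical classifier. The signal names mirror the four INTENT signals listed above, but the signal-to-layer table and the voting rule are illustrative assumptions, not the fetcher's actual rules. What it does show is that `low` keeps the structural guess and raises a flag rather than silently changing the layer.

```javascript
// Hypothetical signal → layer table, ordered from highest to lowest layer.
const SIGNAL_LAYER = [
  ["fullCanvas", "template"],
  ["multiRegion", "organism"],
  ["buttonRow", "molecule"],
  ["formControlChildren", "molecule"],
];

function classifyLayer(signals, structuralGuess) {
  const votes = SIGNAL_LAYER.filter(([signal]) => signals[signal]).map(([, layer]) => layer);
  const distinct = [...new Set(votes)]; // keeps table order, highest layer first
  if (distinct.length === 1) return { layer: distinct[0], layerConfidence: "high" };
  if (distinct.length > 1) {
    // Conflicting intent signals: take the highest layer, at medium confidence.
    return { layer: distinct[0], layerConfidence: "medium" };
  }
  // No intent signal fired: keep the structural guess and flag it for the
  // coordinator's think-once pass (Step 8.5) instead of silently downgrading.
  return { layer: structuralGuess, layerConfidence: "low", flag: true };
}
```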
…a11y, no placeholder copy Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…hard checklist Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…le utilities
- Resolve the aria-hidden/aria-label contradiction: decorative = aria-hidden, labelled = role=img, never both (report-04)
- Enforce named re-exports uniformly in the icon barrel; drop the mixed default/named form that broke the render build (report-08)
- Add a preference for named scale utilities over arbitrary brackets to the tailwind-v4 adapter Gotchas
- Mirror both icon rules into .cursor/prompts/icon-generator.md
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
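The decorative-vs-labelled rule reduces to a small helper. `iconA11yProps` is a hypothetical name; it returns the attributes for an icon's `<svg>` and never emits `aria-hidden` together with `aria-label`.

```javascript
// Hypothetical helper: a11y attributes for an icon <svg>.
// Decorative (no label) → aria-hidden only; labelled → role="img" + aria-label only.
function iconA11yProps(label) {
  if (label == null || label.trim() === "") {
    return { "aria-hidden": "true" };
  }
  return { role: "img", "aria-label": label };
}
```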
Adds the `brevit` optional-integration block (sibling to `graphify`) to config.schema.json and mirrors it into config.example.json. Also adds `figma.dsUrl` to the figma sub-schema for the token-system design-system URL captured at wizard Step 7.9. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…igma
Adds Step 7.55 (Brevit npm install, project-dependency exception to verify-don't-build posture) and Step 7.9 (greenfield opt-in DS token build via figma-coordinator tokens-only) to the wizard recipe and its Cursor mirror + command summary. Updates the "why no auto-install" section to document Brevit as the explicit project-dep exception. Step 8 report block gains Brevit and Design system summary lines. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…verage Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… @-keys on decode; harden tests
- Replace static top-level `import from 'brevit'` with a lazy cached loader so ERR_MODULE_NOT_FOUND at module-link time can never crash unrelated commands (fcc --version etc.); encode() throws when client is null, roundTrips() catches it, and safeEncode() falls back to JSON (chain confirmed).
- Remove blanket `@`-line skip in decodeText so JSON-LD-style keys like `@id` are preserved on decode (abbreviations are hard-OFF so no stray headers exist).
- Add 1ms Atomics.wait backoff in readStdinSync EAGAIN branch to avoid hot spin on a non-blocking fd 0.
- Harden safeEncode fallback test: pin exact JSON.stringify output instead of weak truthiness check.
- Add pure-function tests (flatten/unflatten/decodeText require no brevit), @-key preservation test, fcc --version lazy-import confirmation, and CLI-level @id round-trip test.
123 → 127 tests, all green.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
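Two of these changes, sketched in standalone CommonJS: a lazy, cached `import()` that turns `ERR_MODULE_NOT_FOUND` into `null` instead of crashing, and a synchronous fd reader that backs off 1 ms via `Atomics.wait` on `EAGAIN`. The helper names `loadOptional` and `readFdSync` are hypothetical, and the real implementation's structure may differ.

```javascript
const fs = require("node:fs");

// Lazy, cached loader for an optional dependency. A cached null means "tried, not installed".
const moduleCache = new Map();

async function loadOptional(name) {
  if (moduleCache.has(name)) return moduleCache.get(name);
  let mod = null;
  try {
    mod = await import(name);
  } catch (err) {
    // Only "not installed" degrades to null; any other load error still surfaces.
    if (err.code !== "ERR_MODULE_NOT_FOUND") throw err;
  }
  moduleCache.set(name, mod);
  return mod;
}

// Synchronously drains a file descriptor (fd 0 = stdin by default).
function readFdSync(fd = 0) {
  const chunks = [];
  const buf = Buffer.alloc(64 * 1024);
  const sleepCell = new Int32Array(new SharedArrayBuffer(4));
  for (;;) {
    let n;
    try {
      n = fs.readSync(fd, buf, 0, buf.length, null);
    } catch (err) {
      if (err.code !== "EAGAIN") throw err;
      // Non-blocking pipe with no data yet: sleep 1 ms rather than hot-spin.
      Atomics.wait(sleepCell, 0, 0, 1);
      continue;
    }
    if (n === 0) break; // EOF
    chunks.push(Buffer.from(buf.subarray(0, n))); // copy, since buf is reused
  }
  return Buffer.concat(chunks).toString("utf8");
}
```

Because the module is only imported on first use, commands that never encode (such as `fcc --version`) never touch the optional package.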
…rs onto each run
Adds a readReachability(dir) helper to build-results.mjs that reads figma-manifest.json for reachabilityStatus, scans for degraded marker files (contract.json, mcp-probe.sh, mcp-call.sh) in the trial dir and scratch/, and checks for a zero-byte fetcher-output.txt. The three fields are spread onto every run object so the scoring guard can read them. Malformed figma-manifest.json is caught and falls back to null. Two new self-contained TDD tests (tmp dirs, no shared fixture mutation) bring the suite from 127 to 129 passing. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
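A sketch of the helper as described, assuming the manifest and `fetcher-output.txt` sit directly in the run directory. Only `reachabilityStatus` is named in the commit; `degradedMarkers` and `zeroByteFetcherOutput` are hypothetical names for the other two fields.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Marker files whose presence signals a degraded (fallback) run, per this commit.
const DEGRADED_MARKERS = ["contract.json", "mcp-probe.sh", "mcp-call.sh"];

function readReachability(dir) {
  let reachabilityStatus = null;
  try {
    const manifest = JSON.parse(fs.readFileSync(path.join(dir, "figma-manifest.json"), "utf8"));
    reachabilityStatus = manifest.reachabilityStatus ?? null;
  } catch {
    reachabilityStatus = null; // missing or malformed manifest
  }
  // Scan the trial dir and its scratch/ subdir for marker files.
  const degradedMarkers = [];
  for (const base of [dir, path.join(dir, "scratch")]) {
    for (const name of DEGRADED_MARKERS) {
      const p = path.join(base, name);
      if (fs.existsSync(p)) degradedMarkers.push(path.relative(dir, p));
    }
  }
  const out = path.join(dir, "fetcher-output.txt");
  const zeroByteFetcherOutput = fs.existsSync(out) && fs.statSync(out).size === 0;
  return { reachabilityStatus, degradedMarkers, zeroByteFetcherOutput };
}
```

Each run would then be enriched with something like `runs.map((run) => ({ ...run, ...readReachability(run.dir) }))`, where `run.dir` is likewise an assumption.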
…ing TRIAL dir
Wire isScorableTrial into run-accuracy.mjs so degraded/fail/zero-byte runs are skipped (accuracy.composite=null, unscorable reason recorded) while legacy captures (no reachability data) score with a warning. Add an existsSync guard inside runAccuracy() and openShots() so a missing TRIAL dir throws an explicit error instead of silently reading a stale/deleted default path. Add scorability-gate.test.mjs (5 unit tests) covering all gate branches. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
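A sketch of the gate's branches, assuming run objects carry `reachabilityStatus` plus two hypothetical fields, `degradedMarkers` (an array of marker paths) and `zeroByteFetcherOutput` (a boolean). Status strings other than `degraded`/`fail` are not enumerated in the commit, so anything else is treated as reachable here.

```javascript
function isScorableTrial(run) {
  const markers = run.degradedMarkers ?? [];
  const hasData = run.reachabilityStatus != null || markers.length > 0 || run.zeroByteFetcherOutput === true;
  if (!hasData) {
    // Legacy capture with no reachability data: score it, but warn.
    return { scorable: true, warning: "no reachability data (legacy capture)" };
  }
  if (run.reachabilityStatus === "degraded" || run.reachabilityStatus === "fail") {
    return { scorable: false, reason: `reachabilityStatus=${run.reachabilityStatus}` };
  }
  if (markers.length > 0) {
    return { scorable: false, reason: `degraded markers: ${markers.join(", ")}` };
  }
  if (run.zeroByteFetcherOutput) {
    return { scorable: false, reason: "zero-byte fetcher-output.txt" };
  }
  return { scorable: true };
}
```

An unscorable result would map to `accuracy.composite = null` with `reason` recorded; the legacy branch still scores but carries `warning`.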
…ection review
Re-point the scored ladder to the new complexity selection (atom/chip/molecule/switch/organism/all-icons/tabs/template/extreme + tokens), each mapped to its hero-ui v3 oracle source. Adds the component-selection review doc. Trial data (ladder-nodes.json, STEPS.md, resume-trial.sh, target/, ref-heroui/) is local per the workbench gitignore, as with the prior trial. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e), keep heavy data ignored
The heroui-20260606 STEPS.md was invisible because workbench/trials/ was fully gitignored. Narrow the ignore to data only so the run steps, ladder definition, and env script are version-controlled and visible in the repo/PR; target/, ref-heroui/, bodies/, node_modules/ stay ignored. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… heroui-20260603 baseline Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…roui-20260603
- Runs renamed from atomic-design rungs (atom/molecule/organism/template) to complexity tiers (trivial-/moderate-/complex-/extreme-<component>).
- config designMethodology atomic -> flat; component placement is flat (target/src/components/<Name>/); rung-map targetTsx flattened.
- Delete the heroui-20260603 reports dir + stale KG handovers; scrub the trial id from harness TRIAL defaults, run-one comment, and the run-manifest test fixture (historical design/plan docs keep it as a dated record).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
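The rename itself is mechanical. This sketch assumes old run ids have the form `<rung>-<component>` and infers the tier mapping from the order in which both lists are given (atom→trivial, molecule→moderate, organism→complex, template→extreme); `renameRun` and `componentDir` are hypothetical helpers.

```javascript
// Mapping inferred from the listed order of rungs and tiers.
const TIER_FOR_RUNG = { atom: "trivial", molecule: "moderate", organism: "complex", template: "extreme" };

function renameRun(runId) {
  const m = /^(atom|molecule|organism|template)-(.+)$/.exec(runId);
  if (!m) return runId; // already a complexity-tier id, or not a ladder run
  return `${TIER_FOR_RUNG[m[1]]}-${m[2]}`;
}

// Flat placement: every component lands in target/src/components/<Name>/, whatever its tier.
function componentDir(name) {
  return `target/src/components/${name}/`;
}
```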
…holder in historical specs/plans Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Reverse-engineers the figma-to-code agents from the `heroui-20260603` workbench trial, aiming for 98–100% source/structural accuracy, ~80% fewer tokens, zero MCP-fallback failures and think-first execution, and re-tools the workbench to validate it.

Agent redesign
- `buildPlan` (per-component directives: `resolvedLayer`/`apiShape`/`renderMode`/`requiredA11y`/`tokenBindings`/`unboundDecision`); builders execute the directive instead of re-deriving it. This is the core token-savings lever.
- `claude --agent` Bash-subprocess re-entry that wasted a whole trial run (proven RCA); `No such tool available` → clean hard-abort; one-shot transient retry in the fetcher; workbench guard so degraded manifests are never scored.
- `var()` semantic aliases → `@theme inline`) + per-mode `[data-theme]` + all token types (fixes the dropped `blur` and the hollow `semantic.css`).
- `layerConfidence`; component-builder `"use client"`/zero-TODO/compound/a11y/no-placeholder self-check; React adapter never narrows native unions; icon a11y + barrel consistency; Tailwind named scale utilities.
- `fcc brevit:encode` emits the flattened form only when it round-trips AND is smaller than JSON, else raw JSON; lazy import so a missing brevit never crashes the CLI).

Workbench
- `isScorableTrial`/`reachabilityStatus` producer wired into scoring; silently reading a stale default `TRIAL` path now fails loudly; refreshed a stale report message.
- `heroui-20260606` trial: 11-rung complexity ladder (adds tokens/switch/tabs/chip/dashboard-demo/calendar rungs, each targeting a specific fix), fresh oracle (`heroui` v3), selection review doc + STEPS. Old trial data cleaned (1.6 GB freed). All ladder Figma node IDs verified live.

Validation
- `docs/superpowers/`.
- `STEPS.md`); `rung-map.targetTsx` paths are predicted and reconciled to build output before scoring (STEPS §6).

🤖 Generated with Claude Code