Figma pipeline think-once redesign + workbench accuracy harness + heroui-20260606 trial by raveracker · Pull Request #5 · raveracker/figma-code-composer

raveracker · 2026-06-06T06:10:22Z

Redesigns the figma-to-code agents based on the findings of the heroui-20260603 workbench trial. Targets: 98–100% source/structural accuracy, ~80% fewer tokens, zero MCP-fallback failures, and think-first execution. Also re-tools the workbench to validate those targets.

Agent redesign

Think-once layer: coordinator Step 8.5 emits one buildPlan (per-component directives: resolvedLayer/apiShape/renderMode/requiredA11y/tokenBindings/unboundDecision); builders execute the directive and stop re-deriving it. This is the core token-savings lever.
MCP hardening: bans the `claude --agent` Bash-subprocess re-entry that wasted a whole trial run (root cause confirmed); "No such tool available" now triggers a clean hard abort; the fetcher retries once on a transient failure; a workbench guard ensures degraded manifests are never scored.
Full-DS tokens: fetcher gains a full-variable-collection mode; token-builder emits three real layers (primitives → var() semantic aliases → @theme inline), per-mode [data-theme] blocks, and all token types (fixes the dropped blur tokens and the hollow semantic.css).
Accuracy fixes: intent-based layer classification + layerConfidence; component-builder self-check for "use client"/zero-TODO/compound/a11y/no-placeholder; react adapter never narrows native unions; icon a11y + barrel consistency; tailwind named scale utilities.
Brevit wire format: opportunistic and size-guarded (fcc brevit:encode emits the flattened form only when it round-trips AND is smaller than JSON, else raw JSON; the import is lazy, so a missing brevit never crashes the CLI).
Wizard: installs brevit and offers an opt-in design-system token build from Figma.
RTK (Rust Token Killer) integration removed (Redux Toolkit untouched).

Workbench

Accuracy + render harness (visual/style/struct/gates), quality judge panels, dashboard.
isScorableTrial/reachabilityStatus producer wired into scoring; silent stale-TRIAL-default now fails loudly; de-staled report message.
New heroui-20260606 trial: 11-rung complexity ladder (adds tokens/switch/tabs/chip/dashboard-demo/calendar rungs that each target a specific fix), fresh oracle (heroui v3), selection review doc + STEPS. Old trial data cleaned (1.6 GB freed). All ladder Figma node IDs verified live.

Validation

135/135 harness tests green.
Spec + plan in docs/superpowers/.
The live trial run itself is the operator step (two terminals per STEPS.md); rung-map.targetTsx paths are predicted and reconciled to build output before scoring (STEPS §6).

🤖 Generated with Claude Code

Benchmarks every pipeline agent for token/time/accuracy via Claude Code OTEL (per-agent attribution through agent.name + llm_request spans), scores generated components against a hosted-site oracle, and runs a scenario matrix (icon fan-in, complexity tiers, cold/warm cache, build/update). Outputs report.md + results.json + dashboard.html.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

14 TDD tasks building collector (OTLP receiver) + runner env/matrix + analyzer (per-agent tokens incl. thinking-estimate, time, fan-in blocking, dominance) + report (md + inline-SVG html). Fixture-driven, no new runtime deps (node:test). Accuracy oracle + live trial deferred to Plan 2.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Hybrid oracle (HeroUI Storybook for component rungs, Figma for template/page), ~9-run trial across a 7-rung complexity ladder, build-from-scratch (atomic) so accuracy is a real fidelity gradient. Adds 4 scorers + composite, trialset aggregator, report accuracy section; fills Plan 1's reserved accuracy:null.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…trial

15 TDD tasks: 4 pure scorers (visual/style/structural/gates) + composite, zlib PNG decoder, 7-rung ladder + score glue, trialset aggregator, report accuracy/ladder extensions, e2e fixtures, capture stubs, and the operator runbook for the 9-run HeroUI trial. Playwright (capture only) is the sole new dep. Fills Plan 1's reserved accuracy:null.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
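
The four scorers plus composite described above can be illustrated with a small sketch. The weights, the cap value, and the names `compositeAccuracy` and `capReason` are assumptions made for illustration; the real weights live in the workbench weights config, which is not shown on this page.

```javascript
// Hypothetical weights; the real ones live in the workbench weights config.
const WEIGHTS = { visual: 0.4, style: 0.2, structural: 0.2, gates: 0.2 };

// Combine the scorer outputs (each expected in [0, 1]) into one composite.
// When the build failed, the composite is capped and the reason is recorded.
// The default cap of 0.5 is an assumed value, not taken from the PR.
function compositeAccuracy(scores, { buildFailed = false, cap = 0.5 } = {}) {
  let total = 0;
  let weightSum = 0;
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    const score = scores[name];
    if (typeof score !== 'number' || Number.isNaN(score)) continue; // scorer did not report
    total += weight * score;
    weightSum += weight;
  }
  if (weightSum === 0) return { composite: null, capReason: null };

  const raw = total / weightSum; // renormalise over the scorers that reported
  if (buildFailed && raw > cap) return { composite: cap, capReason: 'build-failed' };
  return { composite: raw, capReason: null };
}
```

Returning `composite: null` when no scorer reported keeps "not scored" distinct from a score of zero, in line with the reserved `accuracy:null` mentioned above.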

Removes RTK detection/docs/config across README, wizard (Claude + Cursor), init command, cli protocol, PIPELINE, CHANGELOG, config schema + example, and the fcc doctor. Redux Toolkit references are unaffected. Also drops root .DS_Store.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ust stdin read

safeEncode now compares the wire length to the JSON length and emits the Brevit wire form only when it is strictly smaller, making brevit opportunistic and never harmful. readPayloadInput replaces readFileSync(0) with readStdinSync(), which drains fd 0 and tolerates EAGAIN on pipes. Adds 3 new tests (118/118 green).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
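
A minimal sketch of the size guard described above, assuming a codec object with `encode`/`decode` methods. That interface, the `format` labels, and the toy codec are hypothetical stand-ins used to exercise the guard; they are not brevit's real API.

```javascript
// `codec` is a stand-in with encode(value) -> string and decode(text) -> value.
// It is NOT brevit's real API; pass null to simulate a missing optional dependency.
function safeEncode(value, codec) {
  const json = JSON.stringify(value);
  if (!codec) return { format: 'json', text: json };
  try {
    const wire = codec.encode(value);
    // Guard 1: the wire form must decode back to the same data.
    const roundTrips = JSON.stringify(codec.decode(wire)) === json;
    // Guard 2: emit the wire form only when it is strictly smaller than JSON.
    if (roundTrips && wire.length < json.length) return { format: 'wire', text: wire };
  } catch {
    // Any codec failure falls through to raw JSON.
  }
  return { format: 'json', text: json };
}

// Toy flat "key=value" codec, used only to exercise the guard.
const toyCodec = {
  encode: (obj) => Object.entries(obj).map(([k, v]) => `${k}=${JSON.stringify(v)}`).join('\n'),
  decode: (text) => Object.fromEntries(text.split('\n').map((line) => {
    const i = line.indexOf('=');
    return [line.slice(0, i), JSON.parse(line.slice(i + 1))];
  })),
};

const payload = { name: 'Button', layer: 'atom' };
const result = safeEncode(payload, toyCodec);
```

Because every failure path returns the plain `JSON.stringify` output, a caller never needs to know whether the optional encoder was present.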

… available

Hardens MCP failure path in coordinator and fetcher (+ Cursor mirrors):
- Forbids claude/claude --agent CLI re-entry from Bash (proven wasted-run bug)
- Adds HARD ABORT code 3 on "No such tool available" to error table
- Adds one transient retry in fetcher before declaring reachability fail
- Applies equivalent deltas to .cursor/prompts/ mirrors

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Documents the coordinator-produced buildPlan artifact (Step 8.5) in both the canonical manifest protocol and its Cursor mirror, establishing the field-authority contract that lets builders execute mechanically.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…pi/render-mode

Each builder agent now has an "Execute the directive — do not re-reason" section that instructs it to consume the coordinator's think-once buildPlan fields directly (Brevit wire form or JSON), rather than re-deriving them. Fields per agent:
- component-builder: resolvedLayer, apiShape, renderMode, requiredA11y, unboundDecision, dropPolicy
- icon-generator: fillModel, a11y
- story-author: resolvedLayer, apiShape
- test-author: resolvedLayer, apiShape, renderMode

Cursor mirrors updated with matching delta notes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
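
The per-agent field lists above suggest that each builder receives only its slice of a component's directive. A sketch of that slicing follows. The field names come from the commit message; `sliceDirective`, `DIRECTIVE_FIELDS`, and every value in the sample directive are hypothetical.

```javascript
// Field lists per builder agent, as named in the commit message above.
const DIRECTIVE_FIELDS = {
  'component-builder': ['resolvedLayer', 'apiShape', 'renderMode', 'requiredA11y', 'unboundDecision', 'dropPolicy'],
  'icon-generator': ['fillModel', 'a11y'],
  'story-author': ['resolvedLayer', 'apiShape'],
  'test-author': ['resolvedLayer', 'apiShape', 'renderMode'],
};

// Return only the fields the given agent is meant to consume.
function sliceDirective(directive, agent) {
  const fields = DIRECTIVE_FIELDS[agent];
  if (!fields) throw new Error(`unknown builder agent: ${agent}`);
  const slice = {};
  for (const field of fields) {
    if (field in directive) slice[field] = directive[field];
  }
  return slice;
}

// Hypothetical directive for one component; all values are made up.
const buttonDirective = {
  resolvedLayer: 'atom',
  apiShape: 'single',
  renderMode: 'client',
  requiredA11y: ['aria-disabled'],
  unboundDecision: 'keep-literal',
  dropPolicy: 'none',
};

const storySlice = sliceDirective(buttonDirective, 'story-author');
```

Handing each builder a narrow slice is one way to keep payloads small and to make it obvious which agent is allowed to rely on which field.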

@theme

…n-type coverage

Fixes workbench report 05 defects: collapsed @theme (hollow semantic.css), single-mode collapse (light+dark → only light emitted), and silently dropped blur/effect tokens.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
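
To make the three-layer idea concrete (primitives, then var() semantic aliases per mode, then `@theme inline`), here is a rough sketch that emits CSS text from a token object. The input shape, the function name `emitTokenCss`, and the colour-only `--color-*` mapping are assumptions; the real token-builder covers more token types and may lay its output out differently.

```javascript
// Emit three CSS layers from a hypothetical token object:
//   1. primitives as raw custom properties on :root,
//   2. semantic aliases that point at primitives through var(), one block per mode,
//   3. a Tailwind v4 `@theme inline` block exposing the semantic names as colours.
function emitTokenCss({ primitives, semantic }) {
  const block = (selector, lines) =>
    `${selector} {\n${lines.map((line) => `  ${line}`).join('\n')}\n}`;

  const out = [];
  out.push(block(':root', Object.entries(primitives).map(([name, value]) => `--${name}: ${value};`)));

  for (const [mode, aliases] of Object.entries(semantic)) {
    const lines = Object.entries(aliases).map(([name, primitive]) => `--${name}: var(--${primitive});`);
    out.push(block(`[data-theme="${mode}"]`, lines));
  }

  // Every semantic name seen in any mode gets exactly one @theme entry.
  const names = [...new Set(Object.values(semantic).flatMap((aliases) => Object.keys(aliases)))];
  out.push(block('@theme inline', names.map((name) => `--color-${name}: var(--${name});`)));

  return out.join('\n\n');
}

const css = emitTokenCss({
  primitives: { 'gray-50': '#fafafa', 'gray-900': '#18181b' },
  semantic: { light: { surface: 'gray-50' }, dark: { surface: 'gray-900' } },
});
```

Emitting one block per mode is what prevents the "light+dark → only light" collapse: the dark aliases have their own selector instead of being overwritten.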

…(off-by-one fix)

Replace structural flat-tree heuristic with INTENT-signal classification (form-control children, button-rows, multi-region, full-canvas) so Input/Card/Form are graded at the correct tier. Emit layerConfidence (high|medium|low); low is a FLAG for the coordinator's think-once pass (Step 8.5), not a silent down-grade. Updated protocol, fetcher agent, manifest contract rules, and Cursor mirrors.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…le utilities

- Resolve aria-hidden/aria-label contradiction: decorative = aria-hidden, labelled = role=img, never both (report-04)
- Enforce named re-exports uniformly in icon barrel; drop mixed default/named form that broke render build (report-08)
- Add named scale utility preference over arbitrary brackets to tailwind-v4 adapter Gotchas
- Mirror both icon rules into .cursor/prompts/icon-generator.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
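
The first icon rule above (decorative or labelled, never both) reduces to a small pure function. This is an illustrative sketch; `iconA11yProps` and its `label` option are hypothetical names, not the generator's actual code.

```javascript
// Decide the accessibility props for an icon.
// Labelled icons get role="img" plus aria-label; decorative icons get aria-hidden.
// The two sets are never combined.
function iconA11yProps({ label } = {}) {
  const text = typeof label === 'string' ? label.trim() : '';
  if (text) return { role: 'img', 'aria-label': text };
  return { 'aria-hidden': 'true' };
}

const decorative = iconA11yProps();
const labelled = iconA11yProps({ label: 'Close dialog' });
```

Treating a whitespace-only label as "no label" avoids emitting an empty accessible name, which assistive technology would otherwise announce as an unnamed image.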

Adds the `brevit` optional-integration block (sibling to `graphify`) to config.schema.json and mirrors it into config.example.json. Also adds `figma.dsUrl` to the figma sub-schema for the token-system design-system URL captured at wizard Step 7.9.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…igma

Adds Step 7.55 (Brevit npm install, project-dependency exception to verify-don't-build posture) and Step 7.9 (greenfield opt-in DS token build via figma-coordinator tokens-only) to the wizard recipe and its Cursor mirror + command summary. Updates the "why no auto-install" section to document Brevit as the explicit project-dep exception. Step 8 report block gains Brevit and Design system summary lines.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@id

… @-keys on decode; harden tests

- Replace static top-level `import from 'brevit'` with a lazy cached loader so ERR_MODULE_NOT_FOUND at module-link time can never crash unrelated commands (fcc --version etc.); encode() throws when client is null, roundTrips() catches it, and safeEncode() falls back to JSON (chain confirmed).
- Remove blanket `@`-line skip in decodeText so JSON-LD-style keys like `@id` are preserved on decode (abbreviations are hard-OFF so no stray headers exist).
- Add 1ms Atomics.wait backoff in readStdinSync EAGAIN branch to avoid hot spin on a non-blocking fd 0.
- Harden safeEncode fallback test: pin exact JSON.stringify output instead of weak truthiness check.
- Add pure-function tests (flatten/unflatten/decodeText require no brevit), @-key preservation test, fcc --version lazy-import confirmation, and CLI-level @id round-trip test.

123 → 127 tests, all green.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
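
The lazy cached loader in the first bullet can be sketched generically. Here `load` is an injected function standing in for whatever actually resolves the optional package; `makeLazyLoader` and the fake loaders are hypothetical and do not reproduce the CLI's code.

```javascript
// Build a lazy, cached loader around `load`, a function that may throw
// (for example because an optional package is not installed).
// Nothing is resolved until the first call, so a missing package cannot
// break commands that never need it.
function makeLazyLoader(load) {
  let cached; // undefined = not tried yet, null = tried and unavailable
  return function get() {
    if (cached === undefined) {
      try {
        cached = load() ?? null;
      } catch {
        cached = null; // remember the failure; do not retry on every call
      }
    }
    return cached;
  };
}

// Simulate a missing optional dependency and count how often we try to load it.
let attempts = 0;
const getMissing = makeLazyLoader(() => {
  attempts += 1;
  throw new Error('ERR_MODULE_NOT_FOUND (simulated)');
});

const getPresent = makeLazyLoader(() => ({ encode: (value) => JSON.stringify(value) }));
```

In an ES module, one plausible way to build a synchronous `load` is `createRequire` from `node:module`; whether the CLI does that or uses a dynamic `import()` is not stated here. Callers then treat a `null` client as "encoder unavailable" and fall back to JSON.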

…rs onto each run

Adds readReachability(dir) helper to build-results.mjs that reads figma-manifest.json for reachabilityStatus, scans for degraded marker files (contract.json, mcp-probe.sh, mcp-call.sh) in the trial dir and scratch/, and checks for a zero-byte fetcher-output.txt. The three fields are spread onto every run object so the scoring guard can read them. Malformed figma-manifest.json is caught and falls back to null. Two new self-contained TDD tests (tmp dirs, no shared fixture mutation) bring the suite from 127 to 129 passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ing TRIAL dir

Wire isScorableTrial into run-accuracy.mjs so degraded/fail/zero-byte runs are skipped (accuracy.composite=null, unscorable reason recorded) while legacy captures (no reachability data) score with a warning. Add existsSync guard inside runAccuracy() and openShots() so a missing TRIAL dir throws an explicit error instead of silently reading a stale/deleted default path. Add scorability-gate.test.mjs (5 unit tests) covering all gate branches.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
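
The gate branches described above can be sketched as a pure function over a run object. Only `reachabilityStatus` is a name taken from the PR; `degradedMarkers`, `fetcherOutputEmpty`, the status values `'ok'`/`'degraded'`/`'fail'`, the reason strings, and the function names are assumptions for illustration.

```javascript
// Decide whether a run may be scored.
// Field names other than `reachabilityStatus` are hypothetical.
function checkScorable(run) {
  const { reachabilityStatus, degradedMarkers, fetcherOutputEmpty } = run;
  const hasData = [reachabilityStatus, degradedMarkers, fetcherOutputEmpty].some((v) => v != null);

  // Legacy captures carry no reachability data: score them, but warn.
  if (!hasData) {
    return { scorable: true, reason: null, warning: 'legacy capture: no reachability data' };
  }
  if (reachabilityStatus === 'fail') {
    return { scorable: false, reason: 'reachability-fail', warning: null };
  }
  if (reachabilityStatus === 'degraded' || (degradedMarkers || []).length > 0) {
    return { scorable: false, reason: 'degraded-manifest', warning: null };
  }
  if (fetcherOutputEmpty) {
    return { scorable: false, reason: 'zero-byte-fetcher-output', warning: null };
  }
  return { scorable: true, reason: null, warning: null };
}

// Apply the gate before scoring: unscorable runs get composite=null plus a reason,
// and the (possibly expensive) scorer is never invoked for them.
function gateAccuracy(run, score) {
  const verdict = checkScorable(run);
  if (!verdict.scorable) {
    return { ...run, accuracy: { composite: null, unscorable: verdict.reason } };
  }
  return { ...run, accuracy: score(run), warning: verdict.warning };
}
```

Recording a reason instead of dropping the run keeps degraded trials visible in the report without letting them contribute a misleading score.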

…ection review

Re-point the scored ladder to the new complexity selection (atom/chip/molecule/switch/organism/all-icons/tabs/template/extreme + tokens), each mapped to its hero-ui v3 oracle source. Adds the component-selection review doc. Trial data (ladder-nodes.json, STEPS.md, resume-trial.sh, target/, ref-heroui/) is local per the workbench gitignore, as with the prior trial.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…e), keep heavy data ignored

The heroui-20260606 STEPS.md was invisible because workbench/trials/ was fully gitignored. Narrow the ignore to data only so the run steps, ladder definition, and env script are version-controlled + visible in the repo/PR; target/, ref-heroui/, bodies/, node_modules/ stay ignored.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…roui-20260603

- Runs renamed from atomic-design rungs (atom/molecule/organism/template) to complexity tiers (trivial-/moderate-/complex-/extreme-<component>).
- config designMethodology atomic -> flat; component placement is flat (target/src/components/<Name>/); rung-map targetTsx flattened.
- Delete the heroui-20260603 reports dir + stale KG handovers; scrub the trial id from harness TRIAL defaults, run-one comment, and the run-manifest test fixture (historical design/plan docs keep it as a dated record).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

raveracker and others added 30 commits June 2, 2026 18:59

chore(workbench): scaffold harness package + node:test runner

213e97f

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(workbench): OTLP json walkers for events, spans, token metrics

bc19976

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(workbench): proportional thinking-token estimator from response …

3549182

…bodies Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(workbench): load raw response bodies from body_ref events

e58de63

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(workbench): per-agent aggregation, fan-in blocking, dominance ro…

d47e96c

…llup Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

docs(workbench): fix Task 4 fixture ns scale in plan to match impl

e7bc34c

feat(workbench): assemble results.json from a trial directory

4659c32

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

refactor(workbench): address analyzer review — multi-run guard, dead-…

05cd896

…code & nullish fixes, fan-in/ns docs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(workbench): OTLP/HTTP receiver appending per-signal jsonl

4fdeef9

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(workbench): telemetry env block builder

e0708b7

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(workbench): scenario matrix + run-manifest row builder

d2165f8

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(workbench): markdown report renderer

96a3f0a

Implements renderMarkdown (Task 9) — results.json → Markdown string with per-agent rollup table, dominance, icon fan-in blocking, OTEL cross-check, and accuracy placeholder. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(workbench): self-contained inline-SVG html dashboard

d357942

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(workbench): report CLI writing report.md + dashboard.html

4a59fff

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

test(workbench): end-to-end fixtures→report smoke

d2db86f

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(workbench): manual live-capture runbook

4ce84c1

fix(workbench): harden svgBars against non-numeric values (review nit)

4b09b4d

chore(workbench): add weights config + trialset script (Playwright in…

90e6a49

…stall deferred to live phase)

feat(workbench): pure RGBA visual diff scorer

5eecadb

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(workbench): computed-style match scorer

8f8ef2e

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(workbench): structural DOM/prop-surface scorer

ba573ec

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(workbench): injectable-runner quality-gate scorer

a532583

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(workbench): weighted composite accuracy with build-fail cap

4b1f13b

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

refactor(workbench): record composite cap reason + preserve visual-di…

6833bce

…ff precision (review) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

feat(workbench): minimal zlib PNG decoder for visual scoring

be939b3

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

feat(workbench): trialset aggregator merging N runs with comparisons

3259f11

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

raveracker and others added 30 commits June 6, 2026 02:01

docs: spec + implementation plan for figma pipeline think-once redesign

1f4acda

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

build: add brevit dependency for token-efficient agent payloads

3fd27dd

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(fcc): brevit pre-flatten encode/decode wrapper with round-trip g…

4d719eb

…uard + JSON fallback Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(fcc): brevit:encode / brevit:decode subcommands (graceful JSON f…

a33975b

…allback) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

docs(protocols): brevit opportunistic size-guarded wire format + mani…

32eb5ae

…fest/cli cross-refs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(workbench): guard — degraded/non-reachable manifests are failed …

960ce51

…trials, never scored Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(coordinator): Step 8.5 think-once buildPlan + size-guarded Brevi…

9b2306d

…t directive slices Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

feat(fetcher): full variable-collection mode for design-system token …

707e2d7

…builds Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix(component-builder): client-directive, zero-TODO, compound/union, …

f19702e

…a11y, no placeholder copy Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix(react-adapter): never narrow native attribute unions; use-client …

dcc8739

…hard checklist Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(pipeline): document brevit protocol + think-once buildPlan in co…

2566403

…verage Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(workbench): de-stale accuracy-pending message (drop Plan 2 framing)

d9c1c9a

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(workbench): complexity-named runs + flat placement; drop deleted…

d7b53a4

… heroui-20260603 baseline Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: genericize deleted heroui-20260603 trial id to <trial-id> place…

dcc993c

…holder in historical specs/plans Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

raveracker wants to merge 119 commits into main from workbench-agent-benchmark
