- Frozen: 2026-05-22 08:20 (Europe/Berlin)
- Branch: develop
- Last commit: `fd4d867` · fix(eval): probing must inspect response for @Error
- Version: v0.7.0 staged (pivot), pre-commit
We deleted the UI-interaction layer entirely. After comparing v0.6.2 empirically against Marionette MCP, the conclusion was unambiguous. Marionette's architectural choice is to run inside the app via a tiny `MarionetteBinding` that extends `WidgetsFlutterBinding`. That gives it the real `GestureBinding.handlePointerEvent`, hit-test filtering, custom-widget config, multi-touch and screenshots. We can never have any of these through eval, because of `@visibleForTesting` and the hub-library re-export problem documented in v0.6.x.
Rather than try to clone Marionette, we doubled down on what's actually ours: non-invasive remote control + structured observability.
Removed (8 tools + 11 files):
- Tools: `rc_flutter_tap`, `rc_flutter_widget_geometry`, `rc_flutter_wait_for_widget`, `rc_flutter_enter_text` (gesture / text-input); `rc_flutter_widget_tree`, `rc_flutter_widget_find`, `rc_flutter_widget_properties` (inspector); `rc_flutter_screenshot` (Marionette has it natively)
- Source: `src/flutter/gesture_dart.ts`, `src/flutter/inspector.ts`, 8 tool handlers in `src/tools/flutter/*.ts`
- Tests: `test/gesture-dart.test.ts`, `test/inspector.test.ts`
- Demos: `scripts/flutter-tap-demo.mjs`, `scripts/flutter-login-demo.mjs`, `scripts/flutter-inspector-demo.mjs`, `scripts/eval-debug.mjs`
- Learnings: `inspector-tree-keys.md`, `screenshot-availability.md`, `framework-rebuild-pacing.md`
Kept (14 tools):
- PTY layer (8): `rc_start`, `rc_send_keys`, `rc_read_screen`, `rc_read_stream`, `rc_wait_for`, `rc_status`, `rc_stop`, `rc_resize`
- Flutter/Dart-VM observability (6): `rc_flutter_endpoints`, `rc_flutter_connect`, `rc_flutter_drain_errors`, `rc_flutter_drain_logs`, `rc_flutter_hot_reload`, `rc_flutter_eval` (read-only)
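For orientation, the kept surface and the loud `Unknown tool` failure for removed tools can be pictured as a registry guard. This is a minimal TypeScript sketch, not the server's actual dispatch code: `PTY_TOOLS`, `FLUTTER_TOOLS`, `KEPT_TOOLS` and `resolveTool` are hypothetical names, and the exact error message format is assumed.

```typescript
// Hypothetical registry of the 14 tools kept in v0.7.0, grouped as above.
const PTY_TOOLS = [
  "rc_start",
  "rc_send_keys",
  "rc_read_screen",
  "rc_read_stream",
  "rc_wait_for",
  "rc_status",
  "rc_stop",
  "rc_resize",
] as const;

const FLUTTER_TOOLS = [
  "rc_flutter_endpoints",
  "rc_flutter_connect",
  "rc_flutter_drain_errors",
  "rc_flutter_drain_logs",
  "rc_flutter_hot_reload",
  "rc_flutter_eval", // read-only inspection, not UI driving
] as const;

type ToolName = (typeof PTY_TOOLS)[number] | (typeof FLUTTER_TOOLS)[number];

const KEPT_TOOLS: ReadonlySet<string> = new Set<string>([...PTY_TOOLS, ...FLUTTER_TOOLS]);

// Narrow an incoming tool name. Anything else, including the eight removed
// rc_flutter_* tools, throws instead of being silently ignored.
// The message format "Unknown tool: <name>" is an assumption.
function resolveTool(name: string): ToolName {
  if (!KEPT_TOOLS.has(name)) {
    throw new Error(`Unknown tool: ${name}`);
  }
  return name as ToolName;
}
```

With a guard like this, a stale client calling `resolveTool("rc_flutter_tap")` gets an exception rather than a no-op, which is the by-design break described for external users of the removed tools.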
Docs rewritten:
- README: explicit "we are not an agentic UI testing framework, use Marionette for that, here is the boundary" framing up top
- SKILL.md (project-local + global): "When to use this skill vs Marionette MCP" section right after the intro; the canonical loop is now the observability loop (rebuild → drain errors → fix), not a UI interaction loop
- CLAUDE.md: pruned learnings trigger map; added "what this repo is NOT" section to prevent future drift back into UI testing
- `vm-service-eval-quirks.md` + `eval-diagnostic-discipline.md`: kept and re-pointed at `rc_flutter_eval` as the surviving eval-driven tool
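The canonical observability loop (rebuild → drain errors → fix) is small enough to sketch. The snippet assumes a hypothetical `ObservabilityClient` wrapping the `rc_flutter_hot_reload` and `rc_flutter_drain_errors` tool calls, plus a hypothetical `applyFix` callback standing in for the agent's edit step. It illustrates the shape of the loop, not code from this repo.

```typescript
// Hypothetical client: thin wrappers an agent might put around the
// rc_flutter_hot_reload and rc_flutter_drain_errors tool calls.
interface ObservabilityClient {
  hotReload(): Promise<void>;
  drainErrors(): Promise<string[]>;
}

// Hypothetical fix step: given the drained errors, edit source on disk.
type ApplyFix = (errors: string[]) => Promise<void>;

interface LoopResult {
  clean: boolean;
  iterations: number;
  lastErrors: string[];
}

// rebuild -> drain errors -> fix, until a rebuild drains no errors
// or the iteration budget runs out.
async function runObservabilityLoop(
  client: ObservabilityClient,
  applyFix: ApplyFix,
  maxIterations = 5,
): Promise<LoopResult> {
  let lastErrors: string[] = [];
  for (let i = 1; i <= maxIterations; i++) {
    await client.hotReload();
    lastErrors = await client.drainErrors();
    if (lastErrors.length === 0) {
      return { clean: true, iterations: i, lastErrors };
    }
    await applyFix(lastErrors);
  }
  return { clean: false, iterations: maxIterations, lastErrors };
}
```

Draining is assumed to consume the error buffer, so each iteration only sees errors raised since the previous drain. The iteration budget keeps a fix that never converges from looping forever; a `clean: false` result is the signal to stop and report.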
Gates: typecheck ✅, 25/25 unit tests ✅, smoke (14 tools, generic PTY happy path) ✅. SERVER_VERSION 0.6.1 → 0.7.0.
The pivot is done; the next moves are stabilisation + user-facing work:
- Push v0.7.0 + tag the cut. Anything external that uses the removed tools will break loudly with `Unknown tool`. That's by design.
- Re-run the user's Flutter Web test session with Marionette + this tool together (the hybrid pattern documented in the new SKILL.md). Validate that the boundary feels right in real use.
- Optionally: publish to npm under `@moinsen/agentic-rc-mcp` so `npx` install works without cloning the repo. ~15 min.
- Optionally: write a small CHANGELOG.md summarising the v0.5 → v0.6 → v0.7 evolution, including the pivot rationale. ~20 min.
- `rc_flutter_eval` is still subject to all 5 eval-scope constraints in `vm-service-eval-quirks.md`. The library probe handles the worst case (Flutter Web's `web_entrypoint.dart`) by falling back to `material`; for user-app symbols that only resolve in `main.dart` scope, eval may still fail with `@Error`. That's acceptable: `rc_flutter_eval` is documented as read-only inspection, not UI driving.
- Sessions are tied to the MCP-server process. Killing Claude Code kills all sessions (cleanly, no zombies).
- No CHANGELOG yet; git log + this file is the only summary.
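The `@Error` check from the last commit (`fd4d867`) and the `material` fallback of the library probe can be sketched as follows. The Dart VM service's `evaluate` RPC returns an `@Instance`, an `@Error` or a `Sentinel`, so a call that resolves successfully can still carry an error in its body. `Evaluate`, `orderCandidates` and `probeLibrary` are hypothetical names, the probe expression is left to the caller, and the real probe in `rc_flutter_eval` may order or filter candidates differently.

```typescript
// Minimal shape of a VM-service evaluate result, distinguished by `type`.
interface EvalResult {
  type: string; // e.g. "@Instance", "@Error", "Sentinel"
  valueAsString?: string;
  message?: string;
}

interface LibraryRef {
  id: string; // VM-service object id used as the evaluate target
  uri: string;
}

// Hypothetical: an evaluate call already bound to one isolate.
type Evaluate = (targetId: string, expression: string) => Promise<EvalResult>;

const MATERIAL_URI = "package:flutter/material.dart";

// Root library first, material as the fallback (assumed ordering).
function orderCandidates(root: LibraryRef, libraries: LibraryRef[]): LibraryRef[] {
  const material = libraries.find((lib) => lib.uri === MATERIAL_URI);
  return material && material.id !== root.id ? [root, material] : [root];
}

// Return the first library in which `expression` evaluates cleanly.
// A successful RPC can still return an @Error, so inspect the body.
async function probeLibrary(
  evaluate: Evaluate,
  root: LibraryRef,
  libraries: LibraryRef[],
  expression: string,
): Promise<LibraryRef | undefined> {
  for (const lib of orderCandidates(root, libraries)) {
    const result = await evaluate(lib.id, expression);
    if (result.type !== "@Error" && result.type !== "Sentinel") {
      return lib;
    }
  }
  return undefined; // every candidate failed; surface the @Error to the agent
}
```

Returning `undefined` rather than guessing keeps the read-only contract honest: when no candidate scope resolves the expression, the agent sees the failure instead of a misleading value.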
- Active spec areas: none — v0.7.0 is steady-state. Any new tool added here MUST be evaluated against "does Marionette already do this?" before starting. If yes, redirect users; don't add it.
- Empirical Dart-eval constraints are captured in `docs/learnings/vm-service-eval-quirks.md`. Apply them when extending `rc_flutter_eval` or when an agent's eval call fails.
- Forensic learnings retained: `vm-service-eval-quirks.md`, `flutter-hot-reload-pipeline.md`, `flutter-endpoint-sniffing.md`, `eval-diagnostic-discipline.md`. The three removed ones (`inspector-tree-keys`, `screenshot-availability`, `framework-rebuild-pacing`) were tool-specific and went with their tools.
- Demo discipline: the three remaining live demos (`flutter-drive.mjs`, `flutter-error-detect.mjs`, `flutter-vm-agentic-loop.mjs`) cover the kept tool surface. Run them if you touch anything Flutter-side; `npm run smoke` covers PTY.
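The demo discipline can be turned into a small check planner. This is a hypothetical helper (`planChecks` is not part of the repo). It assumes the remaining demos live under `scripts/` like the removed ones did, treats `src/flutter/` and `src/tools/flutter/` as the Flutter-side paths, and always includes `npm run smoke` as a conservative default.

```typescript
// Assumed locations: the remaining demos under scripts/, like the removed ones.
const FLUTTER_DEMOS = [
  "scripts/flutter-drive.mjs",
  "scripts/flutter-error-detect.mjs",
  "scripts/flutter-vm-agentic-loop.mjs",
];

// Assumed Flutter-side source prefixes (both appear in the removal list).
const FLUTTER_PREFIXES = ["src/flutter/", "src/tools/flutter/"];

// Map changed file paths to argv lists of checks to run.
// Smoke is always included; the live demos only when Flutter code changed.
function planChecks(changedFiles: string[]): string[][] {
  const checks: string[][] = [["npm", "run", "smoke"]];
  const touchesFlutter = changedFiles.some((file) =>
    FLUTTER_PREFIXES.some((prefix) => file.startsWith(prefix)),
  );
  if (touchesFlutter) {
    for (const demo of FLUTTER_DEMOS) {
      checks.push(["node", demo]);
    }
  }
  return checks;
}
```

The helper only returns argv lists; executing them (for example with `child_process.spawnSync`) and having a running Flutter app for the live demos are left to the caller.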
- Read this file.
- `git log -5 --oneline` and `git status -s`: drift check since 2026-05-22 08:20.
- If the user asks about UI interaction / tap / scroll / text input on Flutter: point them at Marionette MCP. Don't add the tools back here. Reference SKILL.md "When to use Marionette MCP instead".
- If the user asks for a new generic remote-control feature (new PTY capability, new VM-service-stream subscription, new structured observability surface): that's in scope. Build it.
- Recent calibration: the v0.7.0 pivot (code deletion + doc rewrite + verification) came in at ~45 min wall-time. The empirical agent velocity for a focused refactor with clear scope continues to be under 1 hour.
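The `git log` / `git status` drift check listed above can be scripted. A minimal sketch, assuming Node's `child_process.execFileSync` and a `git` binary on `PATH`: `assessDrift` interprets the output of the two commands against the frozen commit `fd4d867`, and `checkDrift` runs them. Both names are hypothetical.

```typescript
import { execFileSync } from "node:child_process";

const FROZEN_COMMIT = "fd4d867";

interface DriftReport {
  newCommits: string[]; // one-line log entries above the frozen commit
  dirtyFiles: string[]; // raw `git status -s` lines
  frozenCommitSeen: boolean; // false: >5 commits since freeze, or rewritten history
}

// Pure part: interpret `git log -5 --oneline` and `git status -s` output.
function assessDrift(logOutput: string, statusOutput: string, frozen = FROZEN_COMMIT): DriftReport {
  const logLines = logOutput.split("\n").filter((line) => line.trim() !== "");
  // Abbreviated hashes are at least 7 chars, so a prefix match is enough.
  const idx = logLines.findIndex((line) => line.startsWith(frozen));
  return {
    newCommits: idx === -1 ? logLines : logLines.slice(0, idx),
    dirtyFiles: statusOutput.split("\n").filter((line) => line.trim() !== ""),
    frozenCommitSeen: idx !== -1,
  };
}

// Side-effecting part: run the two commands in `cwd`.
function checkDrift(cwd = process.cwd()): DriftReport {
  const git = (...args: string[]) => execFileSync("git", args, { cwd, encoding: "utf8" });
  return assessDrift(git("log", "-5", "--oneline"), git("status", "-s"));
}
```

At freeze time v0.7.0 was staged but not yet committed, so a non-empty `dirtyFiles` list is not by itself a sign of drift; new entries in `newCommits` are the stronger signal.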