An MCP server that turns an AI agent into an autonomous operator of interactive local programs. Spawn `flutter run`, `npm run dev`, REPLs, TUIs, then drive, observe, introspect, and quit them through structured tool calls. No human in the loop pressing `r`, copy-pasting log excerpts, or reading the Dart VM Service URL off the terminal.
22 MCP tools · 64 unit tests · 6 live-driven demo scripts · Claude Code skill bundled · v0.6.0 real-world hardened.
When you tell Claude Code "run my app and verify the new feature works", today it gets stuck in the same place every time:
- It spawns the process in the background. ✅
- It tails the log a few times. ✅
- The log stops scrolling. It can't tell if the app is ready or deadlocked. Asks you.
- To trigger hot-reload it has to press `r`. It can't. Asks you to press it and paste what happened.
- Something crashes. The full exception is somewhere in 5000 lines of scroll. It has to grep, guess where the error block ends, hope it didn't miss anything.
- The bug is "the counter Text widget doesn't show the right value". The agent can't see the widget. It can only re-read the source code and guess. It has no introspection.
agentic-rc-mcp removes every one of those blockers.
| Layer | What it does | Why it matters |
|---|---|---|
| 1. PTY control (8 tools) | Spawn programs in a real pseudo-terminal. Send keys (`<Enter>`, `<Tab>`, `<C-c>`, …). Read the rendered screen, including TUIs like Flutter, vim, top. Wait for patterns with timeout. | The agent can press `r`, see what changed, know when "ready" appeared, exactly like a human at the terminal. |
| 2. Flutter / Dart-VM lifecycle (7 tools) | Auto-detect the VM-service WebSocket URL from `flutter run`'s output. Open a programmatic connection. Trigger hot-reload with a structured `{success, duration_ms}` result. Subscribe to Stdout / Stderr / Logging / Extension / Debug streams. Evaluate Dart in the live app. Capture screenshots. | No more grep-the-console for `Exception caught`: exceptions arrive as structured events with file:line, widget name, stack trace. No copy-pasting Debug URLs. |
| 3. Flutter inspector (3 tools) | Fetch the live widget tree as JSON with source locations. Search by Key, runtime type, description substring, or source file. Read any widget's properties: colour, alignment, text content, callback bindings. | The agent can see the UI structurally without a screenshot. "Find the FAB" → `valueId`. "What does the counter Text say?" → `data: "You have pushed the button this many times:"`. |
The three layers compose: at the bottom you can still rc_send_keys("r") for
anything; at the top you can rc_flutter_widget_find({by: "key", value: "submit"})
and get back an exact widget reference in milliseconds. Same MCP server, one
session ID flowing through all of it.
+------------------+ stdio +────────────────────── agentic-rc-mcp ──────────────────────+
| Claude Code | <-------> | |
| (MCP client) | JSON-RPC | ┌─ SessionManager ─────────────────────────────────────┐ |
+------------------+ | │ id → Session │ |
| └──────────┬──────────────────────────────────────────┘ |
| │ owns |
| ┌─ Session ▼─────────────────────────────────────────┐ |
| │ │ |
| │ ┌───── PTY layer ──────┐ │ |
| │ │ node-pty <══> │ ──→ child process │ |
| │ │ @xterm/headless │ (flutter / vite / …) │ |
| │ │ + raw ring buffer │ │ |
| │ └──────────┬───────────┘ │ |
| │ │ feeds │ |
| │ ┌──── Endpoint sniffer ─────────────────────────┐ │ |
| │ │ regex over PTY output → ws / http / devtools │ │ |
| │ └──────────┬───────────────────────────────────┘ │ |
| │ │ unblocks │ |
| │ ┌──── VmServiceClient ──── WS ─────► Dart VM │ |
| │ │ getVM, evaluate, │ |
| │ │ streamListen(Stderr, │ |
| │ │ Extension, Debug, …) │ |
| │ └──────────┬───────────────── │ |
| │ │ wraps │ |
| │ ┌──── FlutterService ────┐ ─── ext.flutter.* ───► │ |
| │ │ error buffer, logs, │ ext.flutter.inspector.* │ |
| │ │ hot-reload, eval, │ │ |
| │ │ screenshot, inspector │ │ |
| │ └─────────────────────────┘ │ |
| └───────────────────────────────────────────────────────┘ |
+───────────────────────────────────────────────────────────────+
- PTY: real pseudo-terminal via `node-pty`, so the child program thinks it's interactive (`isatty(0)==1`).
- Screen rendering: `@xterm/headless` runs xterm.js without a DOM, applying ANSI/curses sequences and exposing the rendered viewport programmatically, so TUIs like Flutter, vim, top render correctly.
- Endpoint sniffer: parses every chunk of Flutter output for the four forms Flutter prints (Chrome / macOS desktop / iOS / Android each emit different strings). When the WS URL isn't printed explicitly it's synthesised from the DevTools URL's `?uri=` query param or the HTTP URL.
- VM-service client: JSON-RPC 2.0 over WebSocket. Used for everything that isn't a keystroke or screen-read.
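The synthesis step in the endpoint sniffer can be sketched as follows. This is a minimal illustration, not the server's actual code: the `Endpoints` shape and the function names are hypothetical, and the only convention assumed is that a VM-service HTTP URL of the form `http://host:port/token/` maps to `ws://host:port/token/ws`.

```typescript
// Hypothetical result shape; the real server's field names may differ.
interface Endpoints {
  ws?: string;
  http?: string;
  devtools?: string;
}

// http://host:port/token/ -> ws://host:port/token/ws (https becomes wss).
function httpToWs(httpUrl: string): string {
  const swapped = httpUrl.replace(/^http/, "ws");
  const withSlash = swapped.endsWith("/") ? swapped : swapped + "/";
  return withSlash + "ws";
}

// Scan one chunk of output and merge any URLs into `found`.
function sniffEndpoints(output: string, found: Endpoints = {}): Endpoints {
  const urlPattern = /\b(?:wss?|https?):\/\/\S+/g;
  let match: RegExpExecArray | null;
  while ((match = urlPattern.exec(output)) !== null) {
    const url = match[0];
    if (url.startsWith("ws")) {
      found.ws = url; // explicit WS line always wins
    } else if (url.includes("?uri=")) {
      found.devtools = url;
    } else {
      found.http = url;
    }
  }
  // Synthesise the WS URL when it was never printed explicitly.
  if (!found.ws) {
    const fromDevtools = found.devtools
      ? new URL(found.devtools).searchParams.get("uri")
      : null;
    const base = fromDevtools ?? found.http;
    if (base) found.ws = base.startsWith("ws") ? base : httpToWs(base);
  }
  return found;
}
```

Passing the same `found` object for every chunk accumulates endpoints over the life of the session. A real sniffer would also need to handle a URL split across two PTY chunks, which this sketch ignores.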
| Tool | Does |
|---|---|
| `rc_start` | Spawn a command inside a real PTY. Returns `session_id`. |
| `rc_send_keys` | Write input. Supports `<Enter>`, `<Tab>`, `<Esc>`, `<C-c>`, `<C-d>`, arrows, F-keys, `<M-x>`. Plain text passes through. |
| `rc_read_screen` | Read the rendered viewport. Modes: `screen` / `scrollback` / `tail`. |
| `rc_read_stream` | Read raw bytes since a cursor (for log-style apps). |
| `rc_wait_for` | Block (with timeout) until a pattern appears. Literal substring or `/regex/flags`. |
| `rc_status` | Status of one or all sessions: pid, state, exit_code, bytes I/O, Flutter endpoints once detected. |
| `rc_stop` | Terminate a session. SIGTERM → 2 s grace → SIGKILL. |
| `rc_resize` | Change cols/rows of a running PTY. |
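To make the `rc_wait_for` pattern convention concrete, here is a rough sketch of how "literal substring or `/regex/flags`" plus a timeout could be implemented. The names `compilePattern` and `waitFor` are hypothetical, and the polling interval is an arbitrary choice; the real tool may well be event-driven instead.

```typescript
type Matcher = (text: string) => boolean;

// "/.../flags" is treated as a regex, anything else as a literal substring.
function compilePattern(pattern: string): Matcher {
  const slashForm = /^\/([\s\S]+)\/([gimsuy]*)$/.exec(pattern);
  if (slashForm) {
    // Drop g and y: they make RegExp.test stateful across calls.
    const flags = slashForm[2].replace(/[gy]/g, "");
    const re = new RegExp(slashForm[1], flags);
    return (text) => re.test(text);
  }
  return (text) => text.includes(pattern);
}

// Poll `read()` until the pattern matches or the timeout elapses.
async function waitFor(
  read: () => string,
  pattern: string,
  timeoutMs: number,
  pollMs = 50,
): Promise<boolean> {
  const matches = compilePattern(pattern);
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (matches(read())) return true;
    if (Date.now() >= deadline) return false;
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
}
```

One consequence of this convention is that a literal path such as `/usr/bin/` would be read as a regex, so a caller wanting a literal match on slash-delimited text has to phrase the pattern differently.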
| Tool | Does |
|---|---|
| `rc_flutter_endpoints` | Returns sniffed WS / HTTP / DevTools URLs (auto-synthesised on macOS desktop where Flutter omits the WS line). |
| `rc_flutter_connect` | Opens the VM-service WebSocket + subscribes to Stdout / Stderr / Logging / Extension / Debug. Idempotent. |
| `rc_flutter_drain_errors` | Returns + clears structured exception events. Use this instead of grepping the console. |
| `rc_flutter_drain_logs` | Returns + clears structured log events. |
| `rc_flutter_hot_reload` | Triggers `r`, parses Flutter's report into `{success, libraries_reloaded, duration_ms}` or `{success:false, reason, console_excerpt}`. |
| `rc_flutter_eval` | Run arbitrary Dart in the root library scope of the main isolate. |
| `rc_flutter_screenshot` | PNG via `ext.flutter.screenshot`. Graceful `extension_not_registered` fallback on macOS desktop; pair with Peekaboo for that platform. |
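As an illustration of what "parses Flutter's report" might involve for `rc_flutter_hot_reload`, the sketch below turns console text into the two result shapes named in the table. It assumes Flutter prints a line of the form `Reloaded N of M libraries in Xms`; that format, the `parseHotReload` name, and the `no_reload_report_found` reason string are assumptions, not taken from the server's source.

```typescript
type HotReloadResult =
  | { success: true; libraries_reloaded: number; duration_ms: number }
  | { success: false; reason: string; console_excerpt: string };

function parseHotReload(consoleText: string): HotReloadResult {
  // Assumed report line: "Reloaded 1 of 553 libraries in 1,204ms ..."
  const report = /Reloaded (\d+) of \d+ libraries in ([\d,]+)\s*ms/.exec(consoleText);
  if (report) {
    return {
      success: true,
      libraries_reloaded: Number(report[1]),
      // Strip thousands separators before converting.
      duration_ms: Number(report[2].replace(/,/g, "")),
    };
  }
  // No report found: hand back the tail of the console for diagnosis.
  const lines = consoleText.trimEnd().split("\n");
  return {
    success: false,
    reason: "no_reload_report_found", // hypothetical reason string
    console_excerpt: lines.slice(-10).join("\n"),
  };
}
```

Returning the last few console lines on failure mirrors the `console_excerpt` field: the agent gets the compile error text without re-reading the whole scrollback.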
| Tool | Does |
|---|---|
| `rc_flutter_widget_tree` | Fetch live widget hierarchy as JSON. Defaults to user-code-only: framework subtrees collapse to `{_elided:true, framework_node_count:N}` markers. Opts: `include_framework`, `source_prefix` (strict path filter), `flat:true` (returns list with ancestry paths instead of nested tree, saves ~70% tokens). |
| `rc_flutter_widget_find` | Search by `key` / `type` / `description` / `source_contains`. Returns matches with ancestry path and `valueId`. |
| `rc_flutter_widget_properties` | Diagnostic properties of any widget by `valueId`: text content, padding, colour, callbacks (incl. closure name!), …. |
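The `flat:true` idea (a list of user-code widgets with ancestry paths instead of a nested tree) can be sketched as below. The `WidgetNode` shape is a simplified, hypothetical stand-in for the inspector's JSON, with a boolean `local` flag marking widgets created by the project's own code; the path separator is likewise an arbitrary choice.

```typescript
// Simplified, hypothetical node shape (not the inspector's real schema).
interface WidgetNode {
  type: string;
  valueId: string;
  local: boolean; // true when the widget was created by user code
  children?: WidgetNode[];
}

interface FlatEntry {
  path: string; // ancestry of user-code widgets, outermost first
  type: string;
  valueId: string;
}

function flattenUserWidgets(root: WidgetNode): { entries: FlatEntry[]; frameworkNodeCount: number } {
  const entries: FlatEntry[] = [];
  let frameworkNodeCount = 0;
  const visit = (node: WidgetNode, ancestry: string[]): void => {
    let next = ancestry;
    if (node.local) {
      next = [...ancestry, node.type];
      entries.push({ path: next.join(" > "), type: node.type, valueId: node.valueId });
    } else {
      frameworkNodeCount += 1; // skipped, but counted so the caller knows what was left out
    }
    for (const child of node.children ?? []) visit(child, next);
  };
  visit(root, []);
  return { entries, frameworkNodeCount };
}
```

Framework nodes are skipped without cutting off their descendants, so a user widget nested under framework internals still appears, with a path made only of its user-code ancestors.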
The gesture tools are where agentic-rc-mcp replaces Peekaboo and chrome-devtools-mcp for
Flutter apps, both of which struggle with Flutter's custom-rendered canvas.
We don't dispatch OS-level pointer events (the framework's
handlePointerEvent is @visibleForTesting and the VM-service eval refuses
to compile references to it). Instead the tap tool walks to the nearest
interactive widget and invokes its onPressed / onTap closure directly
— same setState, same rebuild, same side-effects, no GUI access needed.
| Tool | Does |
|---|---|
| `rc_flutter_tap` | Tap a widget by key / type / text / value_id / coordinate. Default walker order: self → descendants → ancestors (so custom wrappers like `TPKButton` around `TextButton` work). Detects ambiguous descendants and asks you to disambiguate. `descend:false` opts into the pre-v0.6 self → ancestors-only behaviour. |
| `rc_flutter_widget_geometry` | Returns `{rect:{x,y,width,height}, widget_type}` for a matched widget; useful for layout verification. Supports `by:'text'`. |
| `rc_flutter_wait_for_widget` | Block (with timeout) until a widget matching `{by, value}` appears (or disappears, with `appear:false`). Supports `by:'text'`. Bubbles up eval errors instead of polling silently. |
| `rc_flutter_enter_text` | Fill a `TextField` / `TextFormField`. Walks to the underlying `EditableText`, mutates its `TextEditingController.text` (so `onChanged` fires, validators run, listeners notify). Modes: `replace` (default), `append`, `clear`. Must-have for any login / form / search-bar flow: without this the agent can't get past an auth gate. |
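The walker order described for `rc_flutter_tap` (self, then descendants, then ancestors, with ambiguity detection) can be modelled on a plain in-memory tree. This is a sketch of the search order only: the real tool runs inside the Dart VM via eval, and the `WidgetNode` shape, the `widget` helper, and the `ambiguous_descendants` reason string are hypothetical. Only `no_callback_found` is a reason named in this README.

```typescript
interface WidgetNode {
  type: string;
  onPressed?: () => void;
  parent?: WidgetNode;
  children: WidgetNode[];
}

type TapOutcome =
  | { ok: true; invoked_on: string }
  | { ok: false; reason: "no_callback_found" }
  | { ok: false; reason: "ambiguous_descendants"; candidates: string[] };

// Helper that wires up parent links while building a tree.
function widget(type: string, children: WidgetNode[] = [], onPressed?: () => void): WidgetNode {
  const node: WidgetNode = { type, onPressed, children };
  for (const child of children) child.parent = node;
  return node;
}

function invoke(node: WidgetNode): TapOutcome {
  node.onPressed?.();
  return { ok: true, invoked_on: node.type };
}

function tap(target: WidgetNode, descend = true): TapOutcome {
  // 1. Self.
  if (target.onPressed) return invoke(target);
  // 2. Descendants: collect the nearest interactive widget on each branch.
  if (descend) {
    const hits: WidgetNode[] = [];
    const queue = [...target.children];
    while (queue.length > 0) {
      const node = queue.shift() as WidgetNode;
      if (node.onPressed) hits.push(node);
      else queue.push(...node.children);
    }
    if (hits.length === 1) return invoke(hits[0]);
    if (hits.length > 1) {
      return { ok: false, reason: "ambiguous_descendants", candidates: hits.map((n) => n.type) };
    }
  }
  // 3. Ancestors, nearest first.
  for (let p = target.parent; p; p = p.parent) {
    if (p.onPressed) return invoke(p);
  }
  return { ok: false, reason: "no_callback_found" };
}
```

With this order, tapping a wrapper such as `TPKButton` reaches the `TextButton` inside it, while `descend = false` reproduces the older self-then-ancestors behaviour and would miss it.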
Diagnostic discipline (v0.6+): every gesture tool result now carries
eval_ok / eval_kind / eval_error / expression_preview so a failure
tells you why — eval_kind:"@Error" with a Dart compile error is
acted upon differently than eval_kind:"@Instance" with
reason:"no_callback_found". See
docs/learnings/eval-diagnostic-discipline.md.
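One way an agent might act on those diagnostic fields is sketched below. The field names come from the paragraph above, but their exact types, the `triage` function, and the four next-step labels are assumptions made for illustration; the linked learnings document is the authority on the real semantics.

```typescript
// Assumed result shape; field types are guesses based on the field names.
interface GestureResult {
  eval_ok: boolean;
  eval_kind?: string; // e.g. "@Error" or "@Instance"
  eval_error?: string;
  expression_preview?: string;
  reason?: string; // e.g. "no_callback_found"
}

// Hypothetical next-step labels for the calling agent.
type NextStep = "done" | "fix_expression" | "retarget_widget" | "inspect_manually";

function triage(result: GestureResult): NextStep {
  // A Dart compile/runtime error: the evaluated expression itself is wrong.
  if (result.eval_kind === "@Error") return "fix_expression";
  // The eval ran, but the matched widget had nothing to invoke.
  if (result.eval_kind === "@Instance" && result.reason === "no_callback_found") {
    return "retarget_widget";
  }
  if (result.eval_ok && !result.reason) return "done";
  return "inspect_manually";
}
```

The point of the split is that the two failures call for different repairs: one means changing the expression, the other means choosing a different widget.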
The composition that makes this powerful: rc_flutter_enter_text to fill,
rc_flutter_tap to submit, rc_flutter_widget_find +
rc_flutter_widget_properties to verify the state change. End-to-end
behavioural testing entirely through MCP. See
scripts/flutter-tap-demo.mjs — 7
synthetic taps on the counter app's FAB, each verified by re-reading the
Text widget's data property (0 → 7).
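The fill, submit, verify composition can be written down as an ordered plan of tool calls. The tool names and the `by` / `value` / `session_id` arguments appear elsewhere in this README; the `text` argument name, the widget keys (`email`, `password`, `submit`), and the `loginPlan` helper are hypothetical and would need checking against the server's actual tool schemas.

```typescript
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

// Build the sequence of MCP tool calls for a simple login form.
// Argument names other than session_id / by / value are assumptions.
function loginPlan(sessionId: string, email: string, password: string): ToolCall[] {
  const session = { session_id: sessionId };
  return [
    { name: "rc_flutter_enter_text", arguments: { ...session, by: "key", value: "email", text: email } },
    { name: "rc_flutter_enter_text", arguments: { ...session, by: "key", value: "password", text: password } },
    { name: "rc_flutter_tap", arguments: { ...session, by: "key", value: "submit" } },
    // Verify: locate the widget that should reflect the new state.
    { name: "rc_flutter_widget_find", arguments: { ...session, by: "key", value: "submit" } },
  ];
}
```

A follow-up `rc_flutter_widget_properties` call is left out of the static plan because it needs the `valueId` returned by the find step.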
Requires Node ≥ 20.

    git clone <this-repo>
    cd agentic_rc_cli
    npm install     # postinstall fixes node-pty's spawn-helper perms on macOS
    npm run build
    npm link        # makes `agentic-rc-mcp` available globally

Heads-up: npm 10 occasionally extracts `node-pty`'s `spawn-helper` prebuilt binary without the executable bit, which manifests at runtime as `posix_spawnp failed`. The included postinstall script (`scripts/fix-node-pty-permissions.js`) chmods it back. If you ever see that error after a clean install, re-run `npm install`.
Drop .mcp.json next to the project you want the agent to drive (or merge
into an existing one):

    {
      "mcpServers": {
        "agentic-rc": {
          "command": "agentic-rc-mcp"
        }
      }
    }

Restart Claude Code. The tools appear as `mcp__agentic-rc__rc_start`,
`mcp__agentic-rc__rc_flutter_widget_find`, etc. See
`.mcp.json.example` for variants (direct dist path, dev
mode via tsx).
This repo ships a Claude Code skill at
.claude/skills/agentic-rc/SKILL.md
that teaches Claude when to reach for each tool — the canonical Flutter
agentic loop, the inspector pattern, named-key cheat sheet, platform gotchas.
- Project-local: the skill is auto-loaded when you open Claude Code in this repo's directory.
- Global: copy it to your global skills dir so it's available in every project:

      npm run install:skill   # → ~/.claude/skills/agentic-rc/SKILL.md

  Idempotent: re-run after each `git pull`.
That sequence is exactly what
scripts/flutter-inspector-demo.mjs,
scripts/flutter-vm-agentic-loop.mjs,
and scripts/flutter-tap-demo.mjs run as
end-to-end smoke tests against the sample
flutter_example/ counter app. The tap demo executes
7 synthetic taps on the FAB and asserts the counter Text's data property
transitions 0 → 7 — pure VM-service, no GUI access.
| Token | Bytes sent |
|---|---|
| `<Enter>` / `<Return>` | `\r` |
| `<Tab>` | `\t` |
| `<Esc>` / `<Escape>` | `\x1b` |
| `<Space>` | (a single space) |
| `<Backspace>` / `<BS>` | `\x7f` |
| `<Delete>` | `\x1b[3~` |
| `<Up>` `<Down>` `<Left>` `<Right>` | `\x1b[A..D` |
| `<Home>` / `<End>` | `\x1b[H` / `\x1b[F` |
| `<PageUp>` / `<PageDown>` | `\x1b[5~` / `\x1b[6~` |
| `<F1>`..`<F12>` | xterm sequences |
| `<C-c>` / `<Ctrl-c>` (any letter) | `\x03` |
| `<M-x>` / `<Alt-x>` (any letter) | `\x1b` + `x` |
Plain characters pass through verbatim. Set "raw": true to skip the parser
and send literal < / >.
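A token parser along these lines could be sketched as follows. This is an illustrative reimplementation from the table, not the server's own parser: `encodeKeys` is a hypothetical name, F-keys are omitted, unknown tokens are assumed to pass through unchanged, and the arrow mapping uses the standard xterm assignment (Up `A`, Down `B`, Right `C`, Left `D`).

```typescript
const NAMED_KEYS = new Map<string, string>([
  ["enter", "\r"], ["return", "\r"],
  ["tab", "\t"],
  ["esc", "\x1b"], ["escape", "\x1b"],
  ["space", " "],
  ["backspace", "\x7f"], ["bs", "\x7f"],
  ["delete", "\x1b[3~"],
  ["up", "\x1b[A"], ["down", "\x1b[B"], ["right", "\x1b[C"], ["left", "\x1b[D"],
  ["home", "\x1b[H"], ["end", "\x1b[F"],
  ["pageup", "\x1b[5~"], ["pagedown", "\x1b[6~"],
]);

// Translate "<Token>" markers into the bytes a terminal would send.
// With raw = true the input is returned untouched.
function encodeKeys(input: string, raw = false): string {
  if (raw) return input;
  return input.replace(/<([^<>]+)>/g, (whole: string, name: string) => {
    const named = NAMED_KEYS.get(name.toLowerCase());
    if (named !== undefined) return named;
    // Ctrl+letter is the letter's code with the top bits cleared (c -> 0x03).
    const ctrl = /^(?:c|ctrl)-([a-z])$/i.exec(name);
    if (ctrl) return String.fromCharCode(ctrl[1].toLowerCase().charCodeAt(0) & 0x1f);
    // Alt/Meta+key is ESC followed by the key itself.
    const meta = /^(?:m|alt)-(.)$/i.exec(name);
    if (meta) return "\x1b" + meta[1];
    return whole; // unknown token: assumed to pass through verbatim
  });
}
```

For example, `encodeKeys("flutter run<Enter>")` yields the command followed by a carriage return, and `encodeKeys("<Enter>", true)` leaves the angle brackets in place.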
- `rc_read_screen` with `mode: "screen"`: for any TUI that redraws (Flutter, vim, top, `npm run dev` with spinners). You get what the user would see on the terminal right now.
- `rc_read_screen` with `mode: "scrollback"` or `"tail"`: for the history of what was rendered, post-curses processing. Best for log lines that scrolled off the viewport.
- `rc_read_stream`: for pure log-style apps (no cursor tricks) where you want every byte in order, with a cursor for incremental reads.
- `rc_flutter_drain_errors`: once a session's VM-service error stream is connected, this is always preferred over PTY grepping. You get structured events with stream origin, timestamp, message, and the raw VM-service payload.
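The cursor-based incremental read behind `rc_read_stream` can be illustrated with a bounded buffer that tracks absolute offsets. This sketch stores strings rather than raw bytes for brevity, and the `RingLog` class, its method names, and the `dropped` field are hypothetical.

```typescript
class RingLog {
  private buffer = "";
  private total = 0; // absolute count of characters ever appended
  private readonly capacity: number;

  constructor(capacity: number) {
    if (capacity <= 0) throw new RangeError("capacity must be positive");
    this.capacity = capacity;
  }

  append(chunk: string): void {
    this.total += chunk.length;
    // Keep only the newest `capacity` characters.
    this.buffer = (this.buffer + chunk).slice(-this.capacity);
  }

  // Return everything after `cursor`, plus the cursor to pass next time.
  // `dropped` reports how much requested data had already been evicted.
  readSince(cursor: number): { data: string; nextCursor: number; dropped: number } {
    const oldest = this.total - this.buffer.length; // absolute offset of buffer[0]
    const from = Math.max(cursor, oldest);
    return {
      data: this.buffer.slice(from - oldest),
      nextCursor: this.total,
      dropped: Math.max(0, oldest - cursor),
    };
  }
}
```

A caller starts with cursor `0` and feeds each `nextCursor` back into the following read, so every character is delivered once and in order unless the buffer overflowed in between.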

    npm test            # vitest: 33 tests (keys, sessions, endpoints, inspector)
    npm run typecheck   # strict tsc --noEmit
    npm run build       # emit dist/

    # Live end-to-end demo scripts (each drives a fresh MCP server over stdio):
    npm run smoke                              # 8 generic PTY tools
    node scripts/flutter-drive.mjs             # spawn flutter, hot-reload, quit
    node scripts/flutter-error-detect.mjs      # detect runtime exceptions via PTY
    node scripts/flutter-vm-agentic-loop.mjs   # full structured loop via VM service
    node scripts/flutter-inspector-demo.mjs    # widget-tree + find + properties
    node scripts/flutter-tap-demo.mjs          # 7 taps + assert counter 0 → 7
    node scripts/flutter-login-demo.mjs        # full login flow: enter email + pw, submit, verify

- Not network-remote. Stdio only: MCP client and controlled processes run on the same machine. (Architecture is ready for it; just no transport written.)
- Not multi-user. Single process, single session registry, no auth.
- No persistence. Killing the MCP server kills every child it started.
- No pixel taps inside non-Flutter windows. For Flutter apps we DO fire `onPressed`/`onTap` directly via `rc_flutter_tap`, so Peekaboo and chrome-devtools-mcp are no longer needed. For other GUI apps (Electron, native Cocoa, web) you still need an OS-level driver (Peekaboo or `chrome-devtools-mcp`), then drain errors via this MCP to see what your tap broke.
- No Windows yet. node-pty supports ConPTY; untested with this code.
MIT — see LICENSE.