Skip to content

Ai SDK updates, stability and quality fine tune#2916

Open
abose wants to merge 7 commits into
mainfrom
ai
Open

Ai SDK updates, stability and quality fine tune#2916
abose wants to merge 7 commits into
mainfrom
ai

Conversation

@abose
Copy link
Copy Markdown
Member

@abose abose commented May 15, 2026

No description provided.

abose added 7 commits May 15, 2026 18:52
The JS SDK was split out of @anthropic-ai/claude-code in v2 and
moved to @anthropic-ai/claude-agent-sdk. The CLI binary still ships
under @anthropic-ai/claude-code (used by Phoenix's terminal "claude"
command); the SDK now lives in its own package with the same query()
signature and SDKResultMessage shape, so the rest of claude-code-agent.js
is unchanged beyond the import swap.

- @anthropic-ai/claude-code: ^1.0.0 → ^2.1.118
- @anthropic-ai/claude-agent-sdk: new ^0.2.126
- zod: ^3.25.76 → ^4.0.0

Lock files regenerated by npm install.
Two complaints about the takeScreenshot tool:

1. Claude was sometimes shooting the whole editor window when the
   user obviously meant "how does the rendered preview look".
2. The user-visible label said "Screenshot of full page" which was
   misleading — the default capture is the entire Phoenix editor app
   window (toolbar + sidebar + code area + panels), not a rendered
   web page.

Tightening on both fronts:

- mcp-editor-tools.js: tool description now starts "ALMOST ALWAYS pass
  a selector — capturing the full editor returns a busy image full of
  editor chrome that's hard to reason about" and explicitly lists the
  trigger phrases ("how does it look", "is the page rendering",
  "check the preview") that should fire #panel-live-preview-frame.
  Selector schema unchanged — the IDs were already documented, the
  recommendation just had to be louder.
- claude-code-agent.js system prompt: same trigger-phrase nudge plus
  a last-resort line: when the user asks about something in the
  editor you can't identify from getEditorState, take a screenshot
  (no selector) to see what they're looking at.
- strings.js: rename AI_CHAT_TOOL_SCREENSHOT_FULL_PAGE to
  AI_CHAT_TOOL_SCREENSHOT_FULL_EDITOR with value "the full editor"
  so the chat bubble reads "Screenshot of the full editor" instead
  of the misleading "Screenshot of full page".
execJsInLivePreview was in the "no timeout" bucket on the rationale
that user-supplied JS can legitimately take a while. The downside:
when the live preview is wedged or just slow to settle, the agent
hangs indefinitely on "Inspecting preview" with no way to recover.

Add a timeoutMs parameter to the tool. The model picks a value that
fits the snippet it's running; the call is raced against the
timeout and either resolves or surfaces a deterministic timeout
error so the agent can move on.

- _execPeerWithTimeout now accepts an optional overrideMs that wins
  over the static EXEC_PEER_TIMEOUT_MS map.
- _resolveCallerTimeout floors at 5000ms (no point picking a tighter
  timeout — the preview frame may still be settling). No upper limit;
  a user can legitimately request a long-running inspection.
- Default 10000ms when timeoutMs is omitted.
…rome-devtools

Two adjacent failure modes the previous prompt didn't catch:

1. When the user asked editor-context questions ("can you see the
   open file", "in this page", "on the screen"), Claude sometimes
   answered from generic Claude knowledge — "No, I don't have
   visibility into your editor" — instead of calling getEditorState.
2. When Claude DID decide it needed to see the page, it sometimes
   reached for the user's chrome-devtools MCP (which opens a fresh
   separate browser session) instead of phoenix-editor.takeScreenshot
   (which captures the live preview inside Phoenix, reflecting the
   user's actual unsaved edits).

Add an upfront block to the system prompt that:
- Explicitly anti-patterns the "I can't see..." refusal — call
  getEditorState / takeScreenshot / execJsInLivePreview instead.
- Explicitly steers ALL preview interactions (screenshots, JS eval,
  DOM inspection, viewport resize, reload) to the phoenix-editor MCP
  rather than chrome-devtools or other browser MCPs. Only fall back
  to a non-Phoenix browser context when the user explicitly asks.
…itorState fallback hint

Bundled set of editor-tool refinements driven by support traces:

- getEditorState now reports inDesignMode (true when the code editor
  is hidden and the live preview is expanded full-bleed). The model
  needs this to interpret "I can't see your code edits" cases —
  design mode hides the code view, and that's a separate fact from
  the active file.
- getEditorState response now appends a fallback hint pointing the
  model at takeScreenshot (no selector) for any UI question the JSON
  doesn't answer (e.g. "what's in the Problems panel" / "what does
  this sidebar say" — UI panels not represented in the state object).
- takeScreenshot description rewritten to a clean binary rule —
  "rendered live preview" → selector='#panel-live-preview-frame',
  "anything else (Problems panel, file tree, toolbar, any Phoenix UI)"
  → no selector. Earlier "ALMOST ALWAYS pass a selector" framing was
  over-discouraging the legit no-selector case where the user is
  asking about Phoenix UI.
- controlEditor gains a toggleDesignMode operation with an `enabled`
  boolean. The op description spells out what design mode is (full
  live preview, code editor hidden, content-focused browser-like
  view) so the model picks it for the right intents.
- System prompt: explains what the live preview actually is (the
  rendered view of the active HTML/CSS/JS/SVG/Markdown file), so
  Claude has a mental model before reading the per-tool sections.
@sonarqubecloud
Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant