Update Studio Code agent's supported Sonnet model to Sonnet 5 #4024

Open
youknowriad wants to merge 1 commit into trunk from claude/gracious-feistel-1d8623

Conversation

youknowriad commented Jul 1, 2026

Contributor

Related issues

  • Related to #

How AI was used in this PR

Claude Code located the central model registry, applied the model swap, updated the affected unit tests, and drafted this PR. The author reviewed the diff.

Proposed Changes

The Studio Code agent's selectable Anthropic Sonnet option is bumped from Sonnet 4.6 to Sonnet 5, and Sonnet 5 becomes the default model. Users picking Sonnet in the model picker (and new sessions that don't pin a model) now get the newer model. Existing sessions that recorded the old claude-sonnet-4-6 id gracefully fall back to the default via resolveSessionModel, so nothing breaks for in-flight chats.

All model-driven UI (the /model slash command, both composer model pickers, the AiModelId type) derives from the single AI_MODELS array, so this is a one-line registry change plus test updates.
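For reviewers who want to picture the mechanism, here is a minimal sketch of how a single registry can drive both the `AiModelId` type and the session fallback. Only the names `AI_MODELS`, `AiModelId`, `resolveSessionModel`, and the ids `claude-sonnet-5` / `claude-sonnet-4-6` come from this PR; the entry shape, the `DEFAULT_MODEL_ID` constant, the `isAiModelId` helper, the placeholder second entry, and the function signature are assumptions made for illustration and may not match the actual code.

```typescript
// Hypothetical registry shape. The real AI_MODELS entries may carry more fields.
const AI_MODELS = [
  { id: "claude-sonnet-5", label: "Sonnet 5" },
  // Placeholder entry, only here so the union type has more than one member.
  { id: "example-other-model", label: "Other model (placeholder)" },
] as const;

// The id type is derived from the array, so adding or swapping an entry
// updates every consumer that is typed against AiModelId.
type AiModelId = (typeof AI_MODELS)[number]["id"];

// Assumed name for the default; the PR only states that Sonnet 5 becomes the default.
const DEFAULT_MODEL_ID: AiModelId = "claude-sonnet-5";

// Type guard: true only for ids that are currently in the registry.
function isAiModelId(value: unknown): value is AiModelId {
  return typeof value === "string" && AI_MODELS.some((model) => model.id === value);
}

// Assumed signature: an id recorded by an older session that is no longer
// registered (for example "claude-sonnet-4-6") resolves to the default.
function resolveSessionModel(storedId: string | undefined): AiModelId {
  return isAiModelId(storedId) ? storedId : DEFAULT_MODEL_ID;
}

console.log(resolveSessionModel("claude-sonnet-4-6"));
console.log(resolveSessionModel("example-other-model"));
```

Under these assumptions, a session that stored the removed `claude-sonnet-4-6` id resolves to the default, while an id still present in the registry is returned unchanged.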

Testing Instructions

  • Open the Studio Code agent, open the model picker, and confirm Sonnet 5 appears and is selectable.
  • Run a turn against Sonnet 5 and confirm the request succeeds against the configured provider.
  • npm test -- apps/cli/ai/tests/slash-commands.test.ts apps/cli/ai/tests/pi-runtime.test.ts

Note: unit tests, lint, and typecheck could not be run green in the authoring environment due to a pre-existing local toolchain issue (Node 20 vs required ≥22, unbundled @studio/common/Playground deps, a @wordpress/ui version mismatch) unrelated to this change. Please rely on CI.

Pre-merge Checklist

  • Have you checked for TypeScript, React or other console errors?
  • Confirm the claude-sonnet-5 model id matches what the provider expects.

Bump the selectable Anthropic Sonnet model from Sonnet 4.6 to Sonnet 5 so
users get the latest model, and make it the default.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@wpmobilebot

Collaborator

📊 Performance Test Results

Comparing b73ed7d vs trunk

app-size

| Metric | trunk | b73ed7d | Diff | Change |
|---|---|---|---|---|
| App Size (Mac) | 1316.77 MB | 1316.77 MB | 0.00 MB | ⚪ 0.0% |

site-editor

| Metric | trunk | b73ed7d | Diff | Change |
|---|---|---|---|---|
| load | 1112 ms | 1088 ms | 24 ms | ⚪ 0.0% |

site-startup

| Metric | trunk | b73ed7d | Diff | Change |
|---|---|---|---|---|
| siteCreation | 6517 ms | 6494 ms | 23 ms | ⚪ 0.0% |
| siteStartup | 1863 ms | 1859 ms | 4 ms | ⚪ 0.0% |

Results are median values from multiple test runs.

Legend: 🟢 Improvement (faster) | 🔴 Regression (slower) | ⚪ No change (<50ms diff)

youknowriad requested review from sejas and wojtekn July 1, 2026 10:39

@youknowriad

Contributor Author

I built a couple of sites locally; quality seems good.
I ran the evals too.

@youknowriad

Contributor Author

I've been running the evals for hours now. My tests suggest that Sonnet 4.6 is actually better than Sonnet 5.0. It passes our evals more consistently. Can anyone confirm? cc @chubes4 @sejas

I'm starting to doubt whether we should make this update.

@youknowriad

Contributor Author

More specifically, the difference shows up in the slideshow and shop evals (the plugin-recommendation-related evals).

@chubes4

chubes4 commented Jul 1, 2026

Contributor

@youknowriad I will run an eval soon, but anecdotally I did see some posts from users on X about how Sonnet 5 ends up more expensive than Opus on long-running coding tasks and is better suited for knowledge work than coding.

@youknowriad

Contributor Author

The funny thing is that I'm not even comparing it with Opus here, but with Sonnet 4.6.

@youknowriad

Contributor Author

[Screenshot 2026-07-01 at 8 20 52 PM]

I ran more evals; Sonnet 5 is definitely flakier than Sonnet 4.6, especially for the "slideshow" eval.

@chubes4

chubes4 commented Jul 1, 2026

Contributor

I ran a couple of evals via studio-eval with Sonnet 5 vs Sonnet 4.6.

Sonnet 5 timed out, but 4.6 completed successfully on both runs.
