fix(core): press CUA keypress combinations as a single chord #2266

Open: yawbtng wants to merge 1 commit into browserbase:main from yawbtng:fix-cua-keypress-chord

yawbtng (Contributor) commented Jun 22, 2026

why

CUA keypress actions describe a single key chord (modifiers held down while the main key is pressed), but V3CuaAgentHandler.executeAction pressed each key in the array separately. page.keyPress(modifier) presses and releases the modifier, so by the time the main key was pressed the modifier was already up.

The concrete failure: a ["Control", "A"] keypress sends Control on its own (a no-op) and then A through the plain typing path — so instead of select-all, the agent types a literal a into the focused field. Any select-all / copy / paste / cut / shortcut pattern silently fails and corrupts input. Because the agent-replay cache recorded the broken per-key sequence, replays reproduced the bug too.

This is provider-dependent, based on the shape each client emits:

| Provider | Emits for a combo | Old behavior | Status |
| --- | --- | --- | --- |
| OpenAI | keys: ["CTRL", "A"] | Ctrl then literal a | ❌ broken |
| Google (key_combination) | .split("+") → ["Control", "A"] | Ctrl then literal a | ❌ broken |
| Microsoft (fara-7b) | keys: string[] (per-key) | Ctrl then literal a | ❌ broken |
| Anthropic | keys: ["ctrl+s"] (single +-joined string) | chorded correctly | ✅ unaffected |

Anthropic only worked by accident — it pre-joins with +, which page.keyPress already chords internally.
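The failure mode described above can be reproduced with a small simulation. This is a minimal sketch, not the handler's code: `FakeKeyboard`, `pressSeparately`, and `pressAsChord` are hypothetical names, and the fake only models "modifiers are held while the last key of a +-joined combination is pressed".

```typescript
// Hypothetical stand-in for a page keyboard; it only tracks held modifiers.
type KeyEvent = { key: string; heldModifiers: string[] };

class FakeKeyboard {
  private held = new Set<string>();
  readonly log: KeyEvent[] = [];

  // Press a "+"-delimited combination: every key but the last is held
  // down while the last key is pressed, then the modifiers are released.
  press(combo: string): void {
    const modifiers = combo.split("+");
    const main = modifiers.pop() ?? combo;
    for (const m of modifiers) this.held.add(m);
    this.log.push({ key: main, heldModifiers: [...this.held] });
    for (const m of modifiers) this.held.delete(m);
  }
}

// Old behavior: one press per array element, so Control is up before A goes down.
function pressSeparately(kb: FakeKeyboard, keys: string[]): void {
  for (const k of keys) kb.press(k);
}

// New behavior: one press for the whole chord.
function pressAsChord(kb: FakeKeyboard, keys: string[]): void {
  if (keys.length > 0) kb.press(keys.join("+"));
}

const before = new FakeKeyboard();
pressSeparately(before, ["Control", "A"]);

const after = new FakeKeyboard();
pressAsChord(after, ["Control", "A"]);
```

In this model, `before.log` holds two events with no modifier held for either, while `after.log` holds a single `A` event with `Control` held, which is the difference between typing a letter and select-all.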

what changed

packages/core/lib/v3/handlers/v3CuaAgentHandler.ts: in the keypress case, map each key and join into one +-delimited combination, then call page.keyPress once. page.keyPress already holds modifiers down for the final key and already special-cases the literal + key, so single keys, already-combined strings, and Ctrl++-style inputs all stay correct. mapKeyToPlaywright is idempotent (CTRL/Control → Control), so Google's pre-mapped arrays and Anthropic's combined string are unchanged. The recorded replay step is now a single press Control+A instead of the broken press Control, press A.
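The map-then-join step might look roughly like the sketch below. It is an illustration under assumptions, not the code in the diff: `ALIASES`, `mapKey`, `toChord`, and `replayStep` are hypothetical names, and the alias table is a tiny stand-in for whatever mapKeyToPlaywright actually covers.

```typescript
// Hypothetical alias table; the real mapKeyToPlaywright is assumed to cover more keys.
const ALIASES: Record<string, string> = {
  CTRL: "Control",
  CMD: "Meta",
  ESC: "Escape",
};

// Idempotent by construction: mapped names are not aliases themselves,
// so mapKey(mapKey(k)) === mapKey(k).
function mapKey(key: string): string {
  return ALIASES[key.toUpperCase()] ?? key;
}

// Build one "+"-delimited combination, or null when there is nothing to press.
function toChord(keys: string | string[]): string | null {
  const keyList = Array.isArray(keys) ? keys : [keys];
  if (keyList.length === 0) return null;
  return keyList.map(mapKey).join("+");
}

// Describe the chord as a single replay step instead of one step per key.
function replayStep(keys: string | string[]): string | null {
  const chord = toChord(keys);
  return chord === null ? null : `press ${chord}`;
}

const openAiShape = toChord(["CTRL", "A"]);
const anthropicShape = toChord(["ctrl+s"]);
const literalPlus = toChord(["Control", "+"]);
const recorded = replayStep(["Control", "A"]);
```

The handler would then pass the resulting string to page.keyPress once. Note that the sketch produces `Control++` for a literal + key and relies on page.keyPress to interpret it, as the description says it does.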

test plan

New packages/core/tests/unit/cua-keypress-chord.test.ts (5 cases, all passing):

  • ["Control", "A"] → single keyPress("Control+A")
  • alias normalization: ["CTRL", "A"] → keyPress("Control+A")
  • single key ["Enter"] → keyPress("Enter") (unchanged)
  • already-combined ["ctrl+s"] → keyPress("ctrl+s") (Anthropic shape, unchanged)
  • empty [] → no keyPress call
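The five cases above can be expressed as a table-driven check against a spy. This is a framework-free sketch, not the contents of cua-keypress-chord.test.ts: `SpyPage`, `normalize`, and `handleKeypress` are hypothetical stand-ins for the page, mapKeyToPlaywright, and the handler's keypress case.

```typescript
// Spy that records every combination passed to keyPress.
class SpyPage {
  readonly calls: string[] = [];
  keyPress(combo: string): void {
    this.calls.push(combo);
  }
}

// Hypothetical, deliberately tiny stand-in for mapKeyToPlaywright.
function normalize(key: string): string {
  return key === "CTRL" ? "Control" : key;
}

// Hypothetical stand-in for the keypress case: one call per chord, none for [].
function handleKeypress(page: SpyPage, keys: string[]): void {
  if (keys.length === 0) return;
  page.keyPress(keys.map(normalize).join("+"));
}

const cases: Array<{ keys: string[]; expected: string[] }> = [
  { keys: ["Control", "A"], expected: ["Control+A"] },
  { keys: ["CTRL", "A"], expected: ["Control+A"] },
  { keys: ["Enter"], expected: ["Enter"] },
  { keys: ["ctrl+s"], expected: ["ctrl+s"] },
  { keys: [], expected: [] },
];

// Run every case against a fresh spy and collect a message per mismatch.
function runCases(): string[] {
  const failures: string[] = [];
  for (const { keys, expected } of cases) {
    const page = new SpyPage();
    handleKeypress(page, keys);
    if (JSON.stringify(page.calls) !== JSON.stringify(expected)) {
      failures.push(`${JSON.stringify(keys)} -> ${JSON.stringify(page.calls)}`);
    }
  }
  return failures;
}

const failures = runCases();
```

Asserting on the full list of calls, rather than only the last one, is what distinguishes a single chord from the old per-key sequence.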

Existing CUA suites (anthropic-cua-triple-click, openai-cua-client, microsoft-cua-client, anthropic-cua-adaptive-thinking) — 25 tests still green.


Related: this is exactly the class of provider-specific CUA regression that #2188 proposes catching with a deterministic bench task.


Summary by cubic

Fixes CUA keypress combos by pressing them as one chord, not as separate keys. Shortcuts like Ctrl+A now work across OpenAI, Google key_combination, and Microsoft clients instead of typing letters.

  • Bug Fixes
    • Map keys and join with "+" before one page.keyPress call; supports arrays, already-joined strings, and the literal "+" key.
    • Added unit tests for combos, alias normalization, single key, already-combined, and empty input.

Written for commit 4f921ee. Summary will update on new commits.


CUA keypress actions describe a single key chord, but the executor
pressed each key in the array separately, releasing modifiers before
the main key. Combinations like ["Control", "A"] sent Ctrl alone and
then typed a literal "a" instead of select-all. This broke the OpenAI,
Google (key_combination), and Microsoft computer-use clients, which emit
multi-element key arrays; Anthropic (single +-joined string) was fine.

Join the mapped keys into one +-delimited combination so page.keyPress
holds modifiers down for the main key, and record the chord as a single
replay step. mapKeyToPlaywright is idempotent, so already-combined and
single-key inputs are unchanged.

Adds unit coverage for chord, alias normalization, single-key,
already-combined, and empty-array cases.

changeset-bot Bot commented Jun 22, 2026


🦋 Changeset detected

Latest commit: 4f921ee

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
| Name | Type |
| --- | --- |
| @browserbasehq/stagehand | Patch |
| @browserbasehq/stagehand-evals | Patch |
| @browserbasehq/stagehand-server-v3 | Patch |


@github-actions


This PR is from an external contributor and must be approved by a stagehand team member with write access before CI can run.
Approving the latest commit mirrors it into an internal PR owned by the approver.
If new commits are pushed later, the internal PR stays open but is marked stale until someone approves the latest external commit and refreshes it.

github-actions Bot added two labels on Jun 22, 2026: external-contributor (Tracks PRs mirrored from external contributor forks.) and external-contributor:awaiting-approval (Waiting for a stagehand team member to approve the latest external commit.)

@cubic-dev-ai cubic-dev-ai Bot left a comment


No issues found across 3 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.
Architecture diagram

```mermaid
sequenceDiagram
    participant Provider as CUA Provider
    participant Handler as V3CuaAgentHandler
    participant ActionMapper as mapKeyToPlaywright
    participant Browser as Playwright Page

    Note over Provider,Browser: CUA Keypress Action Flow

    Provider->>Handler: executeAction({ type: "keypress", keys })

    alt keys is array (OpenAI, Google, Microsoft shape)
        Handler->>Handler: keyList = keys (already array)
    else keys is string (Anthropic shape)
        Handler->>Handler: keyList = [keys]
    end

    alt keyList.length > 0
        Handler->>ActionMapper: map each element via mapKeyToPlaywright()
        ActionMapper-->>Handler: normalized key strings

        Handler->>Handler: Join mapped keys with "+"
        Note over Handler: e.g. ["Control", "A"] → "Control+A"

        Handler->>Browser: page.keyPress("Control+A")
        Note over Browser: page.keyPress holds modifiers<br/>down for final key

        Browser-->>Handler: keypress complete

        alt recording enabled
            Handler->>Handler: recordCuaActStep("press Control+A")
        end
    else keyList.length === 0
        Handler->>Handler: Skip (no keyPress call)
    end

    Handler-->>Provider: { success: true }
```
