Skip to content

test(pty): stabilize flaky PTY interactions#361

Merged
benvinegar merged 3 commits into
mainfrom
test/stabilize-pager-pty-wheel
May 24, 2026
Merged

test(pty): stabilize flaky PTY interactions#361
benvinegar merged 3 commits into
mainfrom
test/stabilize-pager-pty-wheel

Conversation

@benvinegar
Copy link
Copy Markdown
Member

@benvinegar benvinegar commented May 24, 2026

Summary

  • Stabilize flaky PTY mouse-wheel assertions by retrying pager wheel ticks one at a time
  • Give pager wheel retry tests enough timeout headroom for slow CI runs
  • Stabilize the note-focus PTY test by using the explicit cancel control after verifying shortcuts are blocked while the draft composer is active

Context

Main CI was intermittently failing in PTY integration tests. The original failure was pager wheel scrolling; after that was addressed, CI exposed another unrelated PTY timing flake in the note-focus cancellation test.

Verification

  • bun test ./test/pty/pager.test.ts (5x)
  • bun test ./test/pty/notes.test.ts -t "draft note focus blocks app shortcuts until cancelled"
  • bun run test:integration
  • bun run typecheck
  • bun run format:check
  • bun run lint

This PR description was generated by Pi using GPT-5

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 24, 2026

Greptile Summary

This PR stabilizes flaky PTY wheel-scroll assertions by replacing single 10-tick scroll bursts (followed by a 5 s waitForSnapshot) with a new scrollWheelUntil helper that fires one tick at a time, checks the predicate after each tick with a 700 ms window, and retries up to 12 times before failing.

  • New helper scrollWheelUntil: closes over the module-level harness, accepts a direction and a predicate, and surfaces the last waitForSnapshot error on exhaustion.
  • Four scroll tests updated: stdin patch mode, stdin patch auto theme, general pager mode, and explicit pager mode --pager all now use the helper instead of the old burst + long poll pattern.

Confidence Score: 4/5

Safe to merge — changes are test-only and address a real CI flakiness root cause.

The retry-per-tick approach is sound and well-scoped. The two observations (outer timeout budget arithmetic and the dead non-Error fallback) are edge cases that won't affect typical CI runs, but are worth keeping in mind if the 20 s default ever needs adjusting for especially slow machines.

test/pty/pager.test.ts — specifically the interaction between the 700 ms per-attempt window, up to 12 retries per direction, and the 20 s global test timeout.

Important Files Changed

Filename Overview
test/pty/pager.test.ts Introduces scrollWheelUntil helper that retries one PTY wheel tick at a time (up to 12 × 700 ms) instead of a single 10-tick burst, and applies it to four scroll tests; two minor concerns around the global timeout budget and the dead non-Error fallback branch.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Test calls scrollWheelUntil] --> B{direction?}
    B -- down --> C[session.scrollDown 1]
    B -- up --> D[session.scrollUp 1]
    C --> E[harness.waitForSnapshot 700ms]
    D --> E
    E -- predicate matches --> F[return snapshot]
    E -- timeout / no match --> G[save lastError\nattempt += 1]
    G --> H{attempt < 12?}
    H -- yes --> B
    H -- no --> I{lastError instanceof Error?}
    I -- yes --> J[throw lastError]
    I -- no --> K[throw generic timeout Error]
Loading
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
test/pty/pager.test.ts:22-34
**Total timeout budget may still be tight under extreme CI load**

Tests that call `scrollWheelUntil` twice (down + up) can accumulate up to 12 × 700 ms per direction = ~8.4 s each, for a combined ~16.8 s on top of the 15 s `waitForText` startup timeout and the 200 ms `waitIdle`. That's a ceiling of ~32 s against the file-wide `setDefaultTimeout(20_000)`. In practice the predicate matches on the first or second attempt, so the tests pass; but if the PTY is slow enough that either direction needs 4–5 retries (≥ 2.8 s) alongside a slow startup, the test can still time out via the outer limit rather than the inner 700 ms window.

### Issue 2 of 2
test/pty/pager.test.ts:36-40
**Fallback error message swallows the actual timeout text**

If `waitForSnapshot` throws a non-`Error` value (or if `lastError` is still `undefined` because the loop never ran, though the loop guard prevents that), execution falls through to the generic `"Timed out waiting for pager wheel scroll ${direction}."` message instead of the last real snapshot captured inside `waitForSnapshot`. Since `waitForSnapshot` always throws a proper `Error` with the last PTY snapshot in its message, the `instanceof Error` branch will always win in practice — but the dead fallback hides a future footgun if the predicate path ever changes to throw a non-`Error`.

Reviews (1): Last reviewed commit: "test(pty): stabilize pager wheel scrolli..." | Re-trigger Greptile

Comment thread test/pty/pager.test.ts
Comment on lines +22 to +34
for (let attempt = 0; attempt < 12; attempt += 1) {
if (direction === "down") {
await session.scrollDown(1);
} else {
await session.scrollUp(1);
}

try {
return await harness.waitForSnapshot(session, predicate, 700);
} catch (error) {
lastError = error;
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Total timeout budget may still be tight under extreme CI load

Tests that call scrollWheelUntil twice (down + up) can accumulate up to 12 × 700 ms per direction = ~8.4 s each, for a combined ~16.8 s on top of the 15 s waitForText startup timeout and the 200 ms waitIdle. That's a ceiling of ~32 s against the file-wide setDefaultTimeout(20_000). In practice the predicate matches on the first or second attempt, so the tests pass; but if the PTY is slow enough that either direction needs 4–5 retries (≥ 2.8 s) alongside a slow startup, the test can still time out via the outer limit rather than the inner 700 ms window.

Prompt To Fix With AI
This is a comment left during a code review.
Path: test/pty/pager.test.ts
Line: 22-34

Comment:
**Total timeout budget may still be tight under extreme CI load**

Tests that call `scrollWheelUntil` twice (down + up) can accumulate up to 12 × 700 ms per direction = ~8.4 s each, for a combined ~16.8 s on top of the 15 s `waitForText` startup timeout and the 200 ms `waitIdle`. That's a ceiling of ~32 s against the file-wide `setDefaultTimeout(20_000)`. In practice the predicate matches on the first or second attempt, so the tests pass; but if the PTY is slow enough that either direction needs 4–5 retries (≥ 2.8 s) alongside a slow startup, the test can still time out via the outer limit rather than the inner 700 ms window.

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I raised the pager PTY file timeout to 45s so the worst-case wheel retry budget has headroom instead of racing Bun's outer test timeout.

This comment was generated by Pi using GPT-5

Comment thread test/pty/pager.test.ts Outdated
Comment on lines +36 to +40
if (lastError instanceof Error) {
throw lastError;
}

throw new Error(`Timed out waiting for pager wheel scroll ${direction}.`);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Fallback error message swallows the actual timeout text

If waitForSnapshot throws a non-Error value (or if lastError is still undefined because the loop never ran, though the loop guard prevents that), execution falls through to the generic "Timed out waiting for pager wheel scroll ${direction}." message instead of the last real snapshot captured inside waitForSnapshot. Since waitForSnapshot always throws a proper Error with the last PTY snapshot in its message, the instanceof Error branch will always win in practice — but the dead fallback hides a future footgun if the predicate path ever changes to throw a non-Error.

Prompt To Fix With AI
This is a comment left during a code review.
Path: test/pty/pager.test.ts
Line: 36-40

Comment:
**Fallback error message swallows the actual timeout text**

If `waitForSnapshot` throws a non-`Error` value (or if `lastError` is still `undefined` because the loop never ran, though the loop guard prevents that), execution falls through to the generic `"Timed out waiting for pager wheel scroll ${direction}."` message instead of the last real snapshot captured inside `waitForSnapshot`. Since `waitForSnapshot` always throws a proper `Error` with the last PTY snapshot in its message, the `instanceof Error` branch will always win in practice — but the dead fallback hides a future footgun if the predicate path ever changes to throw a non-`Error`.

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. The helper now tracks the last failure message and always throws it after retries, preserving the waitForSnapshot diagnostic text even if a future path throws a non-Error.

This comment was generated by Pi using GPT-5

@benvinegar benvinegar changed the title test(pty): stabilize pager wheel scrolling test(pty): stabilize flaky PTY interactions May 24, 2026
@benvinegar benvinegar merged commit 01ee8a1 into main May 24, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant