test(cli): drain PTY stdout before resolving E2E exit (fixes flaky help test)#537
test(cli): drain PTY stdout before resolving E2E exit (fixes flaky help test)#537tobyhede wants to merge 3 commits into
exit (fixes flaky help test)#537Conversation
The E2E helper's `exit` promise resolved directly on `pty.onExit`, which is not ordered after the final `pty.onData`. On a loaded runner the OS pipe can deliver the exit event while trailing stdout is still in flight, so tests that do `await r.exit` then read `r.output` intermittently observe a truncated buffer (the flaky runner-aware-help failure, which hopped across Node 22/24 matrix nodes on identical code). Resolve `exit` only after the stdout stream has drained: capture the exit event, then wait until no further data has arrived for a short settle window (50ms, reset by each new chunk) bounded by a 2s hard cap so a chatty process can never hang the suite. A bare microtask is insufficient because pipe data may still be in flight after the exit event. Fixes the latent race for all E2E suites (smoke, runner-aware-help, doctor, database-url, auth-login-cancel) without per-test changes. Fixes #536
|
📝 WalkthroughWalkthroughAdds a pipe-based ChangesCLI test runner helper and dotenv banner fix
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 1 | ❌ 4❌ Failed checks (4 warnings)
✅ Passed checks (1 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/cli/tests/helpers/pty.ts`:
- Around line 139-153: The settle logic in the pty exit handler is anchored only
to lastDataAt, so the Promise in the onExit callback can resolve immediately if
output was already quiet before the exit event. Update the exit-handling flow in
the pty helper to record the onExit timestamp (or compute from max(lastDataAt,
exitAt)) and base the quietFor/grace period on that value so late in-flight
output still gets the full settle window. Keep the fix localized to the
onExit/check logic and the DRAIN_CAP_MS/SETTLE_MS timing path.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 448e418d-5bfa-4e84-84ab-f60676871946
📒 Files selected for processing (1)
packages/cli/tests/helpers/pty.ts
The runner-aware `--help` E2E test flaked in CI, always dropping the tail of the help (the Examples section, where `bunx stash init` / `bunx stash db install` live) while the earlier Usage line survived. Root cause is a Linux-only node-pty behaviour, not a race: `--help` emits ~5KB in a single `console.log` and the process exits immediately. When the pty slave closes with data still unread, Linux discards the pty buffer, so under CI load node-pty never delivers the tail — `onData` fires for the head only. macOS ptys retain that buffer, which is why it never reproduced locally. A post-exit "drain" wait (the previous approach on this branch) cannot help: the bytes are gone before `onData` is ever called. Fix: add a pipe-based `run()` helper (`child_process.spawn`, piped stdio) for non-interactive command assertions. Pipes buffer in-process and deliver every byte before `close`, so large bursts are never truncated on any platform. Migrate `runner-aware-help.e2e.test.ts` to it. The PTY `render()` helper stays for interactive tests, and its speculative drain change is reverted. Also pass `quiet: true` to dotenv's `config()` calls: v17 prints an `injected env (N) from …` banner to stdout on every call, so the CLI was emitting four noisy, non-deterministic banner lines ahead of its own output on every invocation. A smoke-test guard keeps the banner from returning.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
packages/cli/tests/helpers/run.ts (1)
39-80: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick winConsider a hard timeout to prevent hangs, mirroring the
pty.tsfix.If the spawned command hangs, this promise never resolves and the child process is never killed — it leaks until vitest's own test timeout. The PR's own rationale for
pty.ts'sexitfix explicitly calls for "a hard timeout to prevent hanging"; the same defensive cap (withchild.kill()on timeout) would giverun()parity for chatty/stuck non-interactive commands.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/cli/tests/helpers/run.ts` around lines 39 - 80, The run() helper can hang forever if the spawned CLI never exits, since the Promise only resolves on child close and never force-kills the process. Add a hard timeout inside run() around the spawn()ed child, and on expiry call child.kill() before rejecting or resolving with a timeout result so stuck non-interactive commands cannot leak until the test runner times out. Keep the change scoped to run() and preserve the existing stdout/stderr capture and close/error handling.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/cli/tests/helpers/run.ts`:
- Around line 66-78: The Run helper in run.ts is treating signal-terminated
children as successful by defaulting `exitCode` to 0 in the `child.on('close')`
handler. Update the `RunResult` construction so `exitCode` reflects only an
actual numeric exit code and does not mask a non-null `signal`; use the `signal`
field from the close event as the indicator of termination. Keep the fix
localized to the `child.on('close', (code, signal) => ...)` logic and the
`RunResult` shape so tests using `r.exitCode` can distinguish success from crash
or kill.
---
Nitpick comments:
In `@packages/cli/tests/helpers/run.ts`:
- Around line 39-80: The run() helper can hang forever if the spawned CLI never
exits, since the Promise only resolves on child close and never force-kills the
process. Add a hard timeout inside run() around the spawn()ed child, and on
expiry call child.kill() before rejecting or resolving with a timeout result so
stuck non-interactive commands cannot leak until the test runner times out. Keep
the change scoped to run() and preserve the existing stdout/stderr capture and
close/error handling.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e339d247-3bd5-44df-a7f7-c84255214828
📒 Files selected for processing (4)
packages/cli/src/bin/main.tspackages/cli/tests/e2e/runner-aware-help.e2e.test.tspackages/cli/tests/e2e/smoke.e2e.test.tspackages/cli/tests/helpers/run.ts
✅ Files skipped from review due to trivial changes (1)
- packages/cli/tests/e2e/smoke.e2e.test.ts
`child.on('close')` gives `code === null` when the child was killed by a
signal. Defaulting `exitCode` to `code ?? 0` reported a crash/kill as a clean
exit, so `expect(r.exitCode).toBe(0)` could pass over a SIGSEGV/SIGKILL. Keep
`exitCode` as the real numeric code (or null on signal) and widen the type to
`number | null` so tests can tell success from a crash.
The runner-aware `--help` E2E test flaked in CI, always dropping the tail of the help (the Examples section with `bunx stash init` / `bunx stash db install`) while the earlier Usage line survived. Root cause is a Linux-only node-pty behaviour, not a race: `--help` emits ~5KB in a single `console.log` and the process exits immediately. When the pty slave closes with data still unread, Linux discards the pty buffer, so under CI load node-pty never delivers the tail — `onData` fires for the head only. macOS ptys retain that buffer, which is why it never reproduced locally. Fix: add a pipe-based `run()` helper (`child_process.spawn`, piped stdio) for non-interactive command assertions. Pipes buffer in-process and deliver every byte before `close`, so large bursts are never truncated on any platform. `run()` keeps `exitCode` as the real numeric code (null on signal) so a crash can't masquerade as a clean exit. Migrate `runner-aware-help.e2e.test.ts` to it; the PTY `render()` helper stays for interactive tests. Also pass `quiet: true` to dotenv's `config()` calls: v17 prints an `injected env (N) from …` banner to stdout on every call, so the CLI was emitting four noisy, non-deterministic banner lines ahead of its own output on every invocation. A smoke-test guard keeps the banner from returning. Mirrors the fix on fix/cli-e2e-pty-drain (#537) so this branch's CI is green independently.
The runner-aware `--help` E2E test flaked in CI, always dropping the tail of the help (the Examples section with `bunx stash init` / `bunx stash db install`) while the earlier Usage line survived. Root cause is a Linux-only node-pty behaviour, not a race: `--help` emits ~5KB in a single `console.log` and the process exits immediately. When the pty slave closes with data still unread, Linux discards the pty buffer, so under CI load node-pty never delivers the tail — `onData` fires for the head only. macOS ptys retain that buffer, which is why it never reproduced locally. Fix: add a pipe-based `run()` helper (`child_process.spawn`, piped stdio) for non-interactive command assertions. Pipes buffer in-process and deliver every byte before `close`, so large bursts are never truncated on any platform. `run()` keeps `exitCode` as the real numeric code (null on signal) so a crash can't masquerade as a clean exit. Migrate `runner-aware-help.e2e.test.ts` to it; the PTY `render()` helper stays for interactive tests. Also pass `quiet: true` to dotenv's `config()` calls: v17 prints an `injected env (N) from …` banner to stdout on every call, so the CLI was emitting four noisy, non-deterministic banner lines ahead of its own output on every invocation. A smoke-test guard keeps the banner from returning. Mirrors the fix on fix/cli-e2e-pty-drain (#537) so this branch's CI is green independently.
auxesis
left a comment
There was a problem hiding this comment.
Review
Verdict: COMMENT — no blocking issues. This is a tight, well-scoped PR: it adds quiet: true to the four dotenv config() calls to silence dotenv v17's injected env banner, and introduces a pipe-based run() E2E helper to replace flaky PTY capture for non-interactive commands. Both changes are sensible and the production change (main.ts) is directly asserted by the new smoke check. The findings below are all test-quality gaps — none point at a runtime bug.
After de-duplication, 2 distinct findings survived verification (one per source; they describe different issues, so there is no cross-model corroboration). A few additional verified-but-secondary test gaps are listed below rather than posted inline.
Review stats
| Source | Raw | Survived |
|---|---|---|
| codex (gpt-5.5) [test-gap] | 1 | 1 |
| claude (claude-opus-4-8) [test-gap] | 1 | 1 |
- Cross-model overlap: 0 kept findings were corroborated by 2+ models — the two findings are distinct (dotenv-banner assertion vacuity vs.
run()non-zero exit coverage). - Dropped: 0.
Additional findings not posted inline
These are verified but lower-severity; folded here to keep inline noise down (all single-source: claude, substantiated against the code):
run()'s signal /exitCode: nullpath is untested. The helper's headline behaviour (commitaedfd9d: keepexitCodenullon a signal-killed child instead of masking it as0) has no test. It's hard to reach through the current API —run()always spawnsSTASH_BIN, exposes no handle to signal the child, and (unlike its siblingrunPiped) has notimeoutMs/SIGKILLescape hatch to force anullcode. Consider extracting thecode → exitCodemapping for a direct unit test, or driving a self-terminating child.RunResult.stderris never asserted independently. Every consumer reads only the combinedoutput; if the stderr pipe wiring regressed (e.g. bothon('data')handlers appended to the same buffer) no test would catch it. A failure-path test asserting error text lands inr.stderrwould lock the channel split down.run()substantially duplicatesrunPiped()intests/helpers/spawn-piped.ts— both pipe-spawnSTASH_BINwithstdio: ['ignore','pipe','pipe'], accumulate stdout/stderr separately, and returnexitCode: number | null.run()adds ANSI stripping + combinedoutput/raw+CI: 'true', but drops the timeout/SIGKILLguardrunPipedcarries (spawn-piped.ts:61-64), so arun()-invoked CLI that ever hangs would stall until vitest's own timeout rather than failing fast. Not a coverage issue — flagging so the author can decide whether to converge the two helpers or port the guard over.
(No crypto/security-relevant changes in this diff.)
| // `config()` call unless `quiet: true` is passed. Guard against that | ||
| // noise regressing back into the CLI's stdout (it also destabilises the | ||
| // PTY output capture these e2e tests rely on). | ||
| expect(r.output).not.toContain('injected env') |
There was a problem hiding this comment.
[single-source: codex] This regression assertion passes vacuously — it never exercises the dotenv-injection branch it means to guard. dotenv only prints the injected env (N) from … banner when it actually loads a file. There is no .env / .env.* anywhere in the repo (confirmed), and this test spawns the CLI in the default cwd, so dotenv injects nothing and the banner never appears — with or without quiet: true. not.toContain('injected env') therefore cannot fail even if the quiet: true flags in main.ts were reverted, so it doesn't actually protect against the regression.
To exercise the branch, run the CLI in a temp cwd that contains a .env:
const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'stash-dotenv-quiet-'))
try {
fs.writeFileSync(path.join(tmpDir, '.env'), 'STASH_DOTENV_QUIET_TEST=1\n')
const r = await run(['--version'], { cwd: tmpDir })
expect(r.exitCode).toBe(0)
expect(r.output).toContain(pkg.version)
expect(r.output).not.toContain('injected env')
} finally {
fs.rmSync(tmpDir, { recursive: true, force: true })
}That fails if any config() call regresses to dotenv's default noisy behaviour. (Using the new run() helper here also sidesteps the PTY drain flakiness this PR is otherwise moving away from.)
| res({ | ||
| // `code` is null when the child was terminated by `signal`; keep it | ||
| // null rather than masking a kill/crash as a clean 0 exit. | ||
| exitCode: code, |
There was a problem hiding this comment.
[single-source: claude] run()'s non-zero exit-code propagation has no test — only exit-0 paths are exercised. Both run() consumers in runner-aware-help.e2e.test.ts assert exitCode === 0 (lines 33, 53), and every exit-1 case in the smoke suite still uses render. So this exitCode: code propagation — the whole point of the helper (commit aedfd9d: don't mask a signal-terminated child as a clean 0) — is uncovered. Classic lopsided coverage: the success exit is tested, the failure exit is not.
A drop-in e2e mirroring the existing unknown top-level command smoke case, but through run():
it('surfaces a non-zero exit code + error output for an unknown command', async () => {
const r = await run(['definitely-not-a-command'])
expect(r.exitCode).toBe(1)
expect(r.output).toContain(messages.cli.unknownCommand)
expect(r.output).toContain('definitely-not-a-command')
})Passes today; fails if run() ever regresses to coercing a non-zero/null code to 0 or drops the stream carrying the error text.
ReviewWhat it does: Fixes a flaky CLI E2E test where PTY output capture truncated the The code itself is solid — Findings
The PR's root-cause analysis is that a PTY drops the tail of a large Nothing else surfaced — the 🤖 Generated with Claude Code |
What
Fixes an intermittent failure in the CLI node-pty E2E suite where the captured PTY output buffer is only partially populated, so assertions on
r.outputsee a truncated buffer (e.g.runner-aware-help.e2e.test.tsassertingpnpm dlx stash initbut seeing only the env banner).Fixes #536
Root cause
In
packages/cli/tests/helpers/pty.tstheexitpromise resolved directly onpty.onExit.onExitis not ordered after the finalonData— node-pty (and the underlying OS pipe) can deliver the exit event while trailing stdout is still in flight. Tests doawait r.exitthen immediately readr.output(a getter over therawbuffer accumulated inonData), so on a loaded runner the trailing help text has not landed inrawwhen the assertion reads it.This is a nondeterministic timing flake, not a regression: the same commit re-run produced Node 24 fail→pass and Node 22 pass→fail — the failure hops matrix nodes on identical code. The latent helper race dates to
a4532a56(original node-pty suite); the observed failure was introduced by0560049(runner-aware help E2E coverage).Fix
Resolve
exitonly after the stdout stream has drained: ononExit, capture the exit event, then wait until no further data has arrived for a short settle window (SETTLE_MS = 50, reset by each new chunk) bounded by a hard cap (DRAIN_CAP_MS = 2000) so a chatty process can never hang the suite. A baresetImmediate/microtask is insufficient because pipe data may still be in flight after the exit event. The{ exitCode, signal? }return shape is unchanged, so no per-test changes are needed.Blast radius resolved
Single-file helper fix; every suite that does
await r.exitthen reads output benefits:smoke.e2e.test.tsrunner-aware-help.e2e.test.tsdoctor.e2e.test.tsdatabase-url.e2e.test.tsauth-login-cancel.e2e.test.tsVerification
Ran locally (macOS, darwin-arm64, Node 24; node-pty prebuild available):
pnpm --filter "stash..." build— success (JS + DTS).tests/e2e/runner-aware-help.e2e.test.ts— 5 consecutive runs, 8/8 passing each.tests/e2e/smoke.e2e.test.ts— 3 consecutive runs, 7/7 passing each.tests/e2e/suite — 6 files / 23 tests passing.Changeset
None — test-infra only, does not affect published
stashoutput.Summary by CodeRabbit
--helpoutput so help text and runner-labeled examples remain consistent across package-manager runner scenarios.run(...)helper that captures complete output reliably in CI.