[Bugfix #905] Replay EXIT frame to clients that connect after PTY exit #953
mohidmakhdoomi wants to merge 11 commits
… after PTY exit

Fast-exiting commands (e.g. `exit 1`) could finish before SessionManager connected its client. The shellper broadcast the EXIT frame only to clients connected at exit time, so the late-connecting client never learned the session ended and hung until the 15s test timeout. WSL/slow disk widened the race, which is why the session-manager integration tests timed out locally while passing (when not skipped) elsewhere.

Retain exit info after the PTY exits and replay an EXIT frame to any client that connects post-exit (during the HELLO/WELCOME handshake). Adds a regression test that exits the PTY before connecting and asserts the client still receives an EXIT frame.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
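The retain-and-replay idea in this commit message can be illustrated with a minimal, self-contained sketch. It is not the real `ShellperProcess`: the class `ShellperSketch`, the `Frame` shape, and the helper `recordingClient` are hypothetical names invented for illustration, and the real frame encoding and handshake are assumed rather than copied.

```typescript
// Hypothetical, simplified model of the exit handling described above.
// The names exitInfo, handleHello, WELCOME, REPLAY and EXIT come from the PR
// text; everything else (types, class, helper) is an assumption.

type Frame =
  | { type: "WELCOME" }
  | { type: "REPLAY"; data: string }
  | { type: "EXIT"; code: number | null; signal: string | null };

interface Client {
  send(frame: Frame): void;
}

interface ExitInfo {
  code: number | null;
  signal: string | null;
}

class ShellperSketch {
  private clients = new Set<Client>();
  private exitInfo: ExitInfo | null = null;
  private replayBuffer = "";

  /** On every (re)spawn the new PTY has not exited yet, so forget old exit info. */
  spawn(): void {
    this.exitInfo = null;
  }

  /** Accumulate PTY output so it can be replayed to connecting clients. */
  onData(data: string): void {
    this.replayBuffer += data;
  }

  /** PTY exit: retain the exit info, then broadcast to clients connected right now. */
  onPtyExit(code: number | null, signal: string | null): void {
    this.exitInfo = { code, signal };
    for (const client of this.clients) {
      client.send({ type: "EXIT", code, signal });
    }
  }

  /** Handshake: WELCOME + REPLAY, plus a replayed EXIT if the PTY is already gone. */
  handleHello(client: Client): void {
    this.clients.add(client);
    client.send({ type: "WELCOME" });
    client.send({ type: "REPLAY", data: this.replayBuffer });
    if (this.exitInfo !== null) {
      client.send({ type: "EXIT", ...this.exitInfo });
    }
  }
}

// Test double that records every frame it is sent.
function recordingClient(): Client & { frames: Frame[] } {
  const frames: Frame[] = [];
  return { frames, send: (frame) => { frames.push(frame); } };
}

// Late client: the PTY exits before the client says HELLO.
const lateShellper = new ShellperSketch();
lateShellper.spawn();
lateShellper.onPtyExit(1, null);
const lateClient = recordingClient();
lateShellper.handleHello(lateClient);
const lateFrameTypes = lateClient.frames.map((f) => f.type);

// Early client: connected before the exit, so it only sees the broadcast.
const earlyShellper = new ShellperSketch();
earlyShellper.spawn();
const earlyClient = recordingClient();
earlyShellper.handleHello(earlyClient);
earlyShellper.onPtyExit(0, null);
const earlyExitCount = earlyClient.frames.filter((f) => f.type === "EXIT").length;
```

In this model the late client ends up with WELCOME, REPLAY, EXIT in that order, while the early client receives exactly one EXIT, which matches the "no double-send" reasoning: the broadcast and the replay target disjoint sets of clients.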
CMAP Review (3-way)
Gemini: "The fix elegantly resolves the race condition by retaining and replaying the PTY exit frame for late-connecting clients, supported by a solid and focused regression test."
Codex: "Focused fix for the late-connecting EXIT race, with an appropriate regression test and solid BUGFIX PR hygiene." (Codex could not independently re-run Vitest: its sandbox is read-only/…)
No REQUEST_CHANGES. No code changes required.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Architect Integration Review: CI follow-up (do NOT merge yet)
The code fix is correct and approved on its merits, but CI's Tower Integration Tests job is failing, and this is a regression introduced by this PR (main's last 3 runs of that job are green).
Root cause (regression from the EXIT-replay fix)
Fix (per Codex, validated against the code)
In …:
const session = manager.getSession(terminalId);
if (!session || typeof (session as { once?: unknown }).once !== 'function') {
  return Promise.resolve();
}
// The 'exit' event already fired and won't fire again, so don't wait out the
// 5s safety timeout (Bugfix #905 made exits propagate before teardown attaches).
if ((session as { status?: string }).status === 'exited') {
  return Promise.resolve();
}
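For context, the fragment above could sit inside a function shaped roughly like the following sketch. The signature, the `SessionLike` and `ManagerLike` interfaces, and the `timeoutMs` parameter are assumptions made for illustration; only the two early returns, the `exit` event, and the 5s safety timeout are taken from this thread.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical stand-ins for the real session and manager types.
interface SessionLike extends EventEmitter {
  status: "running" | "exited";
}
interface ManagerLike {
  getSession(id: string): SessionLike | undefined;
}

const SAFETY_TIMEOUT_MS = 5_000; // the 5s safety timeout mentioned above

function waitForTerminalExit(
  manager: ManagerLike,
  terminalId: string,
  timeoutMs: number = SAFETY_TIMEOUT_MS,
): Promise<void> {
  const session = manager.getSession(terminalId);
  // Nothing to wait for: unknown terminal, or an object that cannot emit events.
  if (!session || typeof session.once !== "function") {
    return Promise.resolve();
  }
  // The 'exit' event already fired and will not fire again, so attaching a
  // listener now would only wait out the safety timeout.
  if (session.status === "exited") {
    return Promise.resolve();
  }
  return new Promise<void>((resolve) => {
    const onExit = (): void => {
      clearTimeout(timer);
      resolve();
    };
    // Safety net: give up waiting after timeoutMs and detach the listener.
    const timer = setTimeout(() => {
      session.off("exit", onExit);
      resolve();
    }, timeoutMs);
    session.once("exit", onExit);
  });
}

// Already-exited session: resolves without ever attaching a listener.
const exitedSession = Object.assign(new EventEmitter(), { status: "exited" as const });
const exitedManager: ManagerLike = { getSession: () => exitedSession };
void waitForTerminalExit(exitedManager, "t-exited");
const listenersOnExited = exitedSession.listenerCount("exit");

// Running session: one listener is attached until 'exit' is emitted.
const runningSession = Object.assign(new EventEmitter(), { status: "running" as const });
const runningManager: ManagerLike = { getSession: () => runningSession };
void waitForTerminalExit(runningManager, "t-running");
const listenersWhileRunning = runningSession.listenerCount("exit");
runningSession.emit("exit");
const listenersAfterExit = runningSession.listenerCount("exit");
```

Checking `status` before calling `once('exit')` matters because a one-shot event that has already fired is never redelivered; without the check, each already-exited terminal would cost the full timeout during sequential teardown.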
Asks
Architect integration review
…eady-exited sessions
The EXIT-replay fix made shellper exits propagate before teardown attaches
its once('exit') listener. waitForTerminalExit attached the listener after
the event had already fired, so an already-exited session waited out the full
5s safety timeout — A-then-B sequential teardown (3 terminals) overran the
send-integration e2e afterAll 10s budget in CI.
Short-circuit on session.status === 'exited' before attaching the listener.
Exports waitForTerminalExit and adds focused unit tests (already-exited
resolves immediately without attaching the listener; missing session resolves;
running session still resolves on the exit event).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Architect Integration Review — APPROVED ✅
Sign-off:
Note: the first Gemini pass returned REQUEST_CHANGES, but it reviewed a stale view and requested the exact fix already present in this PR; the re-run on HEAD confirmed APPROVE.
Architect integration review
…protection blocker)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Fixes #905
Root cause
The 3 session-manager integration tests guarded by `it.skipIf(!!process.env.CI)` were timing out locally (15s each), not on CI (where they're skipped). Investigation found a genuine race condition in the shellper, not an environmental/test-only flake:

- `ShellperProcess` broadcasts the `EXIT` frame only to clients connected at the moment the PTY exits (`pty.onExit` → `broadcast`).
- `handleHello` sent `WELCOME` + `REPLAY` to a connecting client but never an `EXIT` frame, even when the PTY had already exited.
- `SessionManager.createSession` connects its client only after spawn → read info → waitForSocket → connect. A fast-exiting command (`/bin/sh -c 'exit 1'`, `exit 0`) finishes before that connect completes, so the `EXIT` broadcast reaches nobody. The client then hangs forever waiting for an `exit` event → `session-exit`/`session-error` never fires → 15s test timeout.

The original `skipIf(CI)` (Spec 0104) was added for an unrelated reason: node-pty native module resolution fails in child node processes on GitHub Actions. That rationale is independent of this race and still valid, so the CI skip is left in place.

Fix
`packages/codev/src/terminal/shellper-process.ts`:

- Retain the PTY's exit info (`exitInfo`) when it exits; reset it on every (re)spawn.
- In `handleHello`, after `WELCOME` + `REPLAY`, if the PTY has already exited, replay the `EXIT` frame to the late-connecting client so it doesn't hang. No double-send: the broadcast only reaches already-connected clients, the replay only the late one.

Regression test
`shellper-process.test.ts`: exit the PTY before any client connects, then connect and assert the client still receives an `EXIT` frame.

Results
`pnpm build` + full unit suite: 152 files / 3211 tests passed, 13 skipped, 0 failed.

Test plan
dist/built
CMAP Review
🤖 Generated with Claude Code