Skip to content

fix(stream): reduce keepalive interval and handle sink closed gracefully#68

Open
85339098-afk wants to merge 1 commit into
7as0nch:mainfrom
85339098-afk:fix-stream-disconnect
Open

fix(stream): reduce keepalive interval and handle sink closed gracefully#68
85339098-afk wants to merge 1 commit into
7as0nch:mainfrom
85339098-afk:fix-stream-disconnect

Conversation

@85339098-afk

Copy link
Copy Markdown
Contributor

Fix two root causes for stream disconnect. See full description in commit.

Two root causes identified for 'stream disconnected before completion'
errors when using MiMo with CodeX (issues 7as0nch#60, 7as0nch#65):

1. 15-second keepalive interval is too long for MiMo thinking mode,
   which can have 30s+ first token delay. Reduced to 5 seconds and
   made configurable via MIMO2CODEX_KEEPALIVE_MS env var.

2. When sink closes (CodeX times out), the for-await loop returns
   immediately without yielding to the event loop, leaving the
   upstream ChatStream generator cancelled without proper cleanup.
   Added setImmediate-based yield for graceful cancellation.
@jason837

jason837 commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Thanks for digging into #60/#65 🙏 Splitting the two changes:

Change 1 (keepalive 15s→5s + env override): 👍 happy to take this. Lowering the keepalive cadence is very plausibly the actual root cause for the disconnects under MiMo thinking mode. One nit: Number(x) || 5000 doesn't guard negatives — MIMO2CODEX_KEEPALIVE_MS=-1 becomes setInterval(fn, -1), which Node clamps to 1ms (keepalive storm). Could you wrap it, e.g. Math.max(1000, Number(...) || 5000)?

Change 2 (setImmediate yield): I don't think this does what the comment says, could you double-check it actually helps in isolation? Walking the call chain:

Upstream cancellation is already handled — req.on("close") → ac.abort() (server.ts:737-738) aborts the upstream fetch(..., { signal }), so the connection is torn down regardless of this yield.
The for await return already triggers the generator's .return() → finally { reader.releaseLock() } by spec; a prior event-loop tick doesn't change that.
This branch is a normal return (incomplete), not a throw, so it never reaches the catch in handleResponses, and sink.closed() is already true so no SSE error event can be emitted anyway.
So as far as I can tell the await new Promise(r => setImmediate(r)) is a no-op for cleanup. If you saw it change behavior, can you share the repro? Otherwise I'd suggest dropping it (or at least correcting the comment) so we don't leave a misleading note behind.

Last thing — per the repo's change-log rule, this needs entries in doc/tag-log.md + doc/tag-log.zh.md and a ReleaseHighlight in web/src/release-notes.tsx. Could you add those for the keepalive change? Then this is good to merge. Thanks again!

@2016011ak47

Copy link
Copy Markdown

Root cause:

eq.on('close') -> ac.abort() is too aggressive for streaming

I've been debugging the same stream disconnect issue on v0.5.28 and found another root cause beyond the keepalive fix.

In server.js, three places register
eq.on('close', () => ac.abort()). For the two streaming endpoints (/v1/chat/completions ~L708 and /v1/responses ~L945), the �c.signal is passed to callOpenAICompat() / callResponsesPassthrough(). In Node.js fetch, the AbortSignal aborts both the initial request AND the response body ReadableStream. So even after the upstream response is received and we're mid-way through pipeChatStreamToResponses, if the client (Codex) momentarily closes its TCP connection,
eq.on('close') fires -> �c.abort() kills the upstream response body -> stream breaks.

This explains why the keepalive fix alone doesn't fully resolve it — keepalive prevents idle socket timeout, but
eq.on('close') fires on any TCP close.

Fix (tested on v0.5.28)

After receiving the upstream response, remove the close listener so mid-stream client disconnects don't kill the upstream:

js const _onClientClose = () => ac.abort(); req.on('close', _onClientClose); // ... after upstreamRes = await callOpenAICompat(...): req.off('close', _onClientClose);

Applied to both streaming endpoints. This plus the keepalive fix resolves the disconnect completely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants