fix(ipc): bound native dispose() with a timeout so job children always exit #1906
Open
rinarakaki wants to merge 1 commit into
…s exit

The process job child awaits dispose() (native FFI teardown) without a timeout before process.exit(0). If native disposal hangs on a handle that never drains, the child never exits and lingers indefinitely holding the job's full RSS while still answering supervisor pings, so neither the orphan reaper nor the no-op SIGTERM handler reclaims it.

Race dispose() against DISPOSE_TIMEOUT (10s) and fall through to process.exit(0) on expiry. dispose() is still attempted first, preserving the libc++abi guard (livekit/node-sdks#564) on the normal path; the timeout only turns the pathological hang-forever case into exit-anyway.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
🦋 Changeset detected
Latest commit: 19773f6
The changes in this PR will be included in the next version bump. This PR includes changesets to release 35 packages.
Closes #1905.
Problem
The process job child awaits dispose() (native FFI teardown) without a timeout before process.exit(0) (agents/src/ipc/job_proc_lazy_main.ts). If native disposal hangs on a handle that never drains, the child never exits: it lingers indefinitely, holding the job's full RSS while still answering supervisor pings, so neither the orphan reaper nor the no-op SIGTERM handler reclaims it. On a long-running fleet these accumulate into a multi-day memory climb and eventual OOM. Full production trace in #1905.
Fix
Race dispose() against a DISPOSE_TIMEOUT (10s) and fall through to process.exit(0) on expiry. dispose() is still attempted first, so the normal path keeps the libc++abi-crash guard (livekit/node-sdks#564) intact; the timeout only changes the pathological case from "hang forever" to "exit anyway". A crash on exit is strictly preferable to an immortal zombie holding the job's full RSS.
Notes
- … @livekit/agents patch).
- … dispose() resolves well under 10s); the timeout only fires when disposal is genuinely wedged.

🤖 Generated with Claude Code
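
For reviewers who want the shape of the fix without opening the diff, here is a minimal sketch of racing a disposal promise against a timer. It is not the code in this PR: the names disposeWithTimeout, DISPOSE_TIMEOUT_MS, and DisposeOutcome are hypothetical, and the dispose function is passed in as a parameter rather than imported from the native bindings.

```typescript
// Hypothetical constant mirroring the 10s DISPOSE_TIMEOUT described in the PR.
const DISPOSE_TIMEOUT_MS = 10_000;

// Hypothetical result type so the caller can log which branch won the race.
type DisposeOutcome = 'disposed' | 'timed-out';

// Race an async dispose() against a timer. Resolves with 'disposed' if
// teardown finishes first, or 'timed-out' if the timer fires first.
// A rejection from dispose() propagates to the caller.
async function disposeWithTimeout(
  dispose: () => Promise<void>,
  timeoutMs: number = DISPOSE_TIMEOUT_MS,
): Promise<DisposeOutcome> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  const timeout = new Promise<DisposeOutcome>((resolve) => {
    timer = setTimeout(() => resolve('timed-out'), timeoutMs);
  });

  try {
    return await Promise.race([
      dispose().then((): DisposeOutcome => 'disposed'),
      timeout,
    ]);
  } finally {
    // Clear the timer so it cannot keep the event loop alive on the fast path.
    if (timer !== undefined) clearTimeout(timer);
  }
}

// Assumed usage in the child's shutdown path (kept as a comment so this
// sketch does not terminate the process when run):
//
//   await disposeWithTimeout(dispose).catch(() => undefined);
//   process.exit(0);
```

Either outcome falls through to process.exit(0), which is the point of the change: dispose() is still attempted first, and the timer only matters when disposal is wedged. Whether a rejection from dispose() should be swallowed or logged before exiting is a choice this sketch leaves to the caller.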