fix(execution): fix isolated-vm memory leak and add worker recycling #4108
waleedlatif1 merged 5 commits into staging
Conversation
PR Summary (Cursor Bugbot, Medium Risk): Adds worker recycling in the pool. Reviewed by Cursor Bugbot for commit e05a36c.
Greptile Summary: This PR fixes three root causes of SIGSEGV crashes (exit code 139) in ECS app tasks.

Confidence Score: 5/5 — Safe to merge. All three root causes of the SIGSEGV crash are correctly addressed and the previous P1 pool-sizing concern is resolved. No P0 or P1 findings remain. The handle-release ordering (children before `isolate.dispose()`), the pool-size accounting fix, and the `--no-node-snapshot` flag are all implemented correctly. The `lifetimeExecutions` counter is correctly incremented on the normal-result and timeout paths with no double-counting risk. The retiring-worker drain-then-cleanup flow is sound. No files require special attention.
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A([Execution request]) --> B[acquireWorker]
    B --> C{Non-retiring worker\nwith capacity?}
    C -- Yes --> D[dispatchToWorker]
    C -- No, pool not full\nexcluding retiring --> E[spawnWorker]
    E --> D
    C -- Pool at capacity --> F[enqueueExecution]
    D --> G[[Worker isolate executes]]
    G --> H{Complete / Timeout / SendFail}
    H -- Normal result --> I[lifetimeExecutions++]
    H -- Timeout +1s --> J[lifetimeExecutions++]
    H -- SendFail --> K[resolve error]
    I --> L{>= MAX_EXECUTIONS_PER_WORKER?}
    J --> L
    L -- Yes --> M[retiring = true]
    L -- No --> N{retiring &&\nactiveExecutions == 0?}
    M --> N
    N -- Yes --> O[cleanupWorker]
    N -- No --> P[resetWorkerIdleTimeout]
    K --> P
    O --> Q[resolve result / error]
    P --> Q
    Q --> R[drainQueue]
    R --> S{Queue items?\npool below POOL_SIZE?}
    S -- Yes --> E
    S -- No --> T([Done])
```
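The pool flow in the chart above can be sketched in TypeScript. This is a minimal illustration of the flowchart's logic only, not the PR's actual code — `PoolWorker`, `MAX_CONCURRENT_PER_WORKER`, and the exact function bodies are assumptions:

```typescript
// Illustrative stand-ins for the pool types and constants (not the PR's code).
interface PoolWorker {
  activeExecutions: number;
  lifetimeExecutions: number;
  retiring: boolean;
}

const POOL_SIZE = 4;                   // assumed pool size
const MAX_CONCURRENT_PER_WORKER = 8;   // assumed per-worker capacity
const MAX_EXECUTIONS_PER_WORKER = 500; // default; tunable via IVM_MAX_EXECUTIONS_PER_WORKER

const pool: PoolWorker[] = [];

function spawnWorker(): PoolWorker {
  const w: PoolWorker = { activeExecutions: 0, lifetimeExecutions: 0, retiring: false };
  pool.push(w);
  return w;
}

// Retiring workers get no new dispatches and are excluded from the pool-size
// count, so a replacement can spawn while they drain.
function acquireWorker(): PoolWorker | null {
  const candidate = pool.find(
    (w) => !w.retiring && w.activeExecutions < MAX_CONCURRENT_PER_WORKER,
  );
  if (candidate) return candidate;
  const live = pool.filter((w) => !w.retiring).length;
  return live < POOL_SIZE ? spawnWorker() : null; // null => enqueueExecution
}

// Runs on the normal-result and timeout paths (send failures skip the counter).
function onExecutionSettled(w: PoolWorker): void {
  w.activeExecutions -= 1;
  w.lifetimeExecutions += 1;
  if (w.lifetimeExecutions >= MAX_EXECUTIONS_PER_WORKER) w.retiring = true;
  if (w.retiring && w.activeExecutions === 0) {
    pool.splice(pool.indexOf(w), 1); // cleanupWorker; drainQueue spawns a replacement
  }
}
```

Because the counter is bumped on both the normal and timeout paths but never on send failure, a worker can never be double-counted for a single execution.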
✅ Bugbot reviewed your changes and found no new issues!
Summary
Root cause: ECS app tasks were crashing with exit code 139 (SIGSEGV) after ~87 minutes of steady memory growth (~200 MB/min). Three bugs:

- `isolate.dispose()` was called in the `finally` block. Per isolated-vm issue #198, disposing an isolate while child handles still exist causes stuck-GC states and native memory accumulation outside the V8 heap. Per issue #42, unreleased References hold their parent context alive indefinitely.
- `--no-node-snapshot` was missing from worker spawns. The isolated-vm v6 README explicitly requires this flag for Node.js 20+ due to a shared V8 snapshot heap incompatibility (issue #377).

Fix 1 (`isolated-vm-worker.cjs`): Hoist all ivm handle declarations before the `try` block so `finally` can reach them. Release children (scripts, references, external copies, context) before calling `isolate.dispose()` — the correct order per the library's issue tracker. All releases are wrapped in `try/catch` so a bad isolate state can't crash the worker.

Fix 2 (`isolated-vm.ts`): Add graceful worker recycling after N lifetime executions (default 500, tunable via `IVM_MAX_EXECUTIONS_PER_WORKER`). Workers are marked `retiring` — no new dispatches, in-flight executions drain normally, then the worker is cleaned up and a replacement spawns automatically. Zero user-visible impact.

Fix 3 (`isolated-vm.ts`): Add the `--no-node-snapshot` flag to worker spawns — required by isolated-vm v6 for Node.js 20+.
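The hoist-then-release ordering of Fix 1 can be shown in miniature. Since isolated-vm is a native module, plain objects stand in for the ivm handles here; `makeHandle` and `order` are illustrative, not part of the library:

```javascript
// Mock handles record the order in which they are torn down.
const order = [];
const makeHandle = (name) => ({ release: () => order.push(name) });

// Hoist handle declarations before `try` so `finally` can reach them
// even if the try block throws partway through setup.
let script, reference, context, isolate;
try {
  isolate = { dispose: () => order.push('isolate') };
  context = makeHandle('context');
  script = makeHandle('script');
  reference = makeHandle('reference');
  // ... compile and run user code here ...
} finally {
  // Release children BEFORE disposing the isolate; each release is wrapped
  // in try/catch so a bad isolate state cannot crash the worker.
  for (const h of [script, reference, context]) {
    try { h?.release(); } catch {}
  }
  try { isolate?.dispose(); } catch {}
}
console.log(order.join(' -> ')); // script -> reference -> context -> isolate
```

The key invariant is simply that `isolate` appears last: no child handle outlives its isolate's disposal.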
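For Fix 3, Node-specific flags for a forked worker go in `execArgv`. A sketch under assumptions — the worker path is hypothetical, and the `spawnSync` call at the end merely demonstrates that the current Node binary accepts the flag:

```javascript
const { fork, spawnSync } = require('node:child_process');

// Hypothetical worker spawn: execArgv carries Node flags, not script argv.
function spawnWorker(workerPath) {
  return fork(workerPath, [], {
    // Required by isolated-vm v6 on Node.js 20+ (shared V8 snapshot heap
    // incompatibility, isolated-vm issue #377).
    execArgv: ['--no-node-snapshot'],
  });
}

// Sanity check: evaluate a trivial expression with the flag present.
const res = spawnSync(process.execPath, ['--no-node-snapshot', '-p', '1 + 1'], {
  encoding: 'utf8',
});
console.log(res.status, res.stdout.trim()); // 0 2
```

Without the flag, worker processes inherit the default Node snapshot behavior, which isolated-vm v6 documents as incompatible on Node.js 20+.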
Type of Change
Testing
Tested manually — build passes clean
Checklist