Fix/orphaned worker processes by thegodtune · Pull Request #3481 · triggerdotdev/trigger.dev

thegodtune · 2026-04-30T17:13:24Z

✅ Checklist

I have followed every step in the contributing guide
The PR title follows the convention
I ran and tested the code works

Testing

Ran existing taskRunProcess.test.ts suite, passes clean
Added taskRunProcessPool.test.ts covering getAllPids() on a fresh pool
Integration test against local self-hosted instance:
1. Started CLI with locally built binary against http://localhost:3030
2. Triggered a 60-second sleep task to force worker processes to spawn
3. Confirmed active-runs.json contained real PIDs in workerPids
4. Sent kill -9 <CLI_PID>, bypassing all signal handlers, exactly what pnpm does
5. Waited 5 seconds for watchdog poll cycle
6. ps -p <pid1>,<pid2>,<pid3> -> All dead ✔️

Changelog

Fix orphaned trigger-dev-run-worker processes that accumulate and consume significant CPU when the CLI is killed ungracefully via SIGKILL. The watchdog now reads worker PIDs from active-runs.json and kills them when the parent CLI process dies.

Screenshots

Fix: Kill orphaned worker processes when the CLI is killed ungracefully

Problem

When running trigger.dev dev through pnpm and the session is stopped, trigger-dev-run-worker child processes are left alive on the machine, consuming significant CPU, up to 450%+ combined after several restarts.

The root cause is how pnpm handles process termination. When you press Ctrl+C, pnpm sends SIGKILL directly to the CLI process, not SIGTERM. SIGKILL cannot be caught or handled, so Node.js signal handlers (process.on("SIGINT", ...), process.on("SIGTERM", ...)) never run. The graceful shutdown() path, which calls taskRunProcessPool.shutdown() and kills all tracked worker processes, is bypassed entirely.

On Linux and macOS, child processes are not automatically killed when their parent dies. So the trigger-dev-run-worker processes spawned by TaskRunProcess.initialize() via fork() continue running indefinitely, orphaned, with no parent to report to and no work to do.

What already existed

PR #3191 introduced a detached watchdog process (devWatchdog.ts) that survives SIGKILL and handles server-side cleanup. It polls for parent death, then calls /engine/v1/dev/disconnect to cancel in-flight runs on the server. This is correct and important.

However, the watchdog only addresses the server's view of those runs. It does not kill the actual OS-level worker processes on the user's machine. Those processes keep running regardless of what the API call does.
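
The description does not show how the watchdog polls for parent death. A minimal sketch of one common way to probe a PID in Node.js follows, assuming signal 0 is used as the liveness check; `isProcessAlive` is a hypothetical name and is not taken from devWatchdog.ts.

```typescript
// Hypothetical helper for illustration; the real devWatchdog.ts may differ.
// Signal 0 runs the existence and permission checks without delivering a signal.
function isProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but we are not allowed to signal it.
    // ESRCH (or anything else): treat the process as gone.
    return (err as { code?: string }).code === "EPERM";
  }
}

// The current process is always alive from its own point of view.
console.log(isProcessAlive(process.pid));
```

The same probe applies to worker PIDs, which is why the server-side disconnect call alone cannot tell whether local workers are still running.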

How I found it

Tracing the codebase from the issue report:

DevSupervisor.init() registers SIGINT/SIGTERM handlers and spawns the watchdog, but those handlers are unreachable under SIGKILL.
TaskRunProcessPool manages two maps: availableProcessesByVersion (idle, reusable processes) and busyProcessesByVersion (actively executing). Both are populated with TaskRunProcess instances, each wrapping a forked child process with a known PID.
DevSupervisor.#updateActiveRunsFile() writes active-runs.json to .trigger/ in the user's project directory, the file the watchdog reads on parent death. It contained parentPid and runFriendlyIds but not the worker PIDs.
devWatchdog.ts reads that file in onParentDied(), calls disconnect, and exits. No process killing.

The gap: the watchdog had everything it needed to cancel runs on the server, but no information about which OS processes to kill locally.

What I changed

Three files, one new test file.

1. `packages/cli-v3/src/dev/taskRunProcessPool.ts`

Added getAllPids(), which collects PIDs from both the available and busy process maps:

getAllPids(): number[] {
  const pids: number[] = [];
  // Idle processes waiting in the pool for reuse.
  for (const processes of this.availableProcessesByVersion.values()) {
    for (const proc of processes) {
      if (proc.pid !== undefined) pids.push(proc.pid);
    }
  }
  // Processes currently executing a run.
  for (const processSet of this.busyProcessesByVersion.values()) {
    for (const proc of processSet) {
      if (proc.pid !== undefined) pids.push(proc.pid);
    }
  }
  return pids;
}

This includes both idle pooled processes and actively executing ones; both are orphaned under SIGKILL.
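
To try the collection logic outside the CLI, here is a self-contained sketch. It assumes the available map holds arrays and the busy map holds sets, as the variable names above suggest. `PooledProcess` and `collectPids` are stand-ins invented for this example, not types from the codebase.

```typescript
// Stand-in for TaskRunProcess: only the pid matters for this example.
interface PooledProcess {
  pid: number | undefined;
}

// Hypothetical free-function version of getAllPids(), for illustration only.
function collectPids(
  available: Map<string, PooledProcess[]>,
  busy: Map<string, Set<PooledProcess>>
): number[] {
  const pids: number[] = [];
  for (const processes of available.values()) {
    for (const proc of processes) {
      if (proc.pid !== undefined) pids.push(proc.pid);
    }
  }
  for (const processSet of busy.values()) {
    for (const proc of processSet) {
      if (proc.pid !== undefined) pids.push(proc.pid);
    }
  }
  return pids;
}

const available = new Map<string, PooledProcess[]>([
  ["v1", [{ pid: 101 }, { pid: undefined }]], // second entry has not spawned yet
]);
const busy = new Map<string, Set<PooledProcess>>([
  ["v1", new Set<PooledProcess>([{ pid: 202 }])],
]);

console.log(collectPids(available, busy)); // contains 101 and 202, skips the undefined pid
```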

2. `packages/cli-v3/src/dev/devSupervisor.ts`

Two changes here:

#updateActiveRunsFile() now includes workerPids alongside the existing fields:

const data = {
  parentPid: process.pid,
  runFriendlyIds: Array.from(this.runControllers.keys()),
  workerPids: this.taskRunProcessPool?.getAllPids() ?? [],
};

Periodic refresh interval: during testing I found a timing issue. When a run is dequeued, #updateActiveRunsFile() is first called before the worker process is spawned, so the file was written before the PID existed and workerPids was left empty. A 2-second refresh interval keeps the file current as processes enter and leave the pool:

// In init():
this.activeRunsUpdateInterval = setInterval(() => {
  this.#updateActiveRunsFile();
}, 2_000);

// In shutdown():
if (this.activeRunsUpdateInterval) {
  clearInterval(this.activeRunsUpdateInterval);
}

The interval is cleared on clean shutdown, so it doesn't interfere with the normal Ctrl+C exit path.

3. `packages/cli-v3/src/dev/devWatchdog.ts`

Updated readActiveRuns() to return the new workerPids field, and added killWorkerProcesses() called at the start of onParentDied():

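async function killWorkerProcesses(pids: number[]): Promise<void> {
  if (pids.length === 0) return;

  // Ask every worker to exit; process.kill throws if the PID is already gone.
  for (const pid of pids) {
    try { process.kill(pid, "SIGTERM"); } catch { /* Already dead */ }
  }

  // Grace period for workers to shut down on their own.
  await new Promise((resolve) => setTimeout(resolve, 3_000));

  // Signal 0 only probes for existence; it throws for dead PIDs, which skips the SIGKILL.
  for (const pid of pids) {
    try {
      process.kill(pid, 0);
      process.kill(pid, "SIGKILL");
    } catch { /* Already dead, good */ }
  }
}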

Worker processes are killed before the disconnect API call; there's no dependency between the two, but it makes sense to handle the local machine first.
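
One possible variation, sketched here under assumptions rather than taken from the PR, is to poll during the grace period and return as soon as every worker has exited instead of always waiting the full 3 seconds. `terminateWithGrace`, its parameters, and the injectable `kill` argument are hypothetical. The injection exists only so the logic can be exercised without real processes.

```typescript
type Kill = (pid: number, signal: string | number) => void;

// Hypothetical variant of killWorkerProcesses(); not the code in this PR.
// Returns the PIDs that were still alive at the deadline and received SIGKILL.
async function terminateWithGrace(
  pids: number[],
  graceMs = 3_000,
  pollMs = 100,
  kill: Kill = (pid, signal) => { process.kill(pid, signal); }
): Promise<number[]> {
  // Any error from the probe (including EPERM) is treated as "not ours to kill".
  const isAlive = (pid: number): boolean => {
    try { kill(pid, 0); return true; } catch { return false; }
  };
  const trySignal = (pid: number, signal: string): void => {
    try { kill(pid, signal); } catch { /* already gone */ }
  };

  pids.forEach((pid) => trySignal(pid, "SIGTERM"));

  // Poll until everyone has exited or the grace period runs out.
  const deadline = Date.now() + graceMs;
  let survivors = pids.filter(isAlive);
  while (survivors.length > 0 && Date.now() < deadline) {
    await new Promise((resolve) => setTimeout(resolve, pollMs));
    survivors = survivors.filter(isAlive);
  }

  survivors.forEach((pid) => trySignal(pid, "SIGKILL"));
  return survivors;
}
```

Like the original, this relies on PIDs not being reused between the last write of active-runs.json and the kill. Neither version guards against that.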

How I tested it

Unit tests: ran the existing taskRunProcess.test.ts suite (passes clean) and added taskRunProcessPool.test.ts covering getAllPids() returning an empty array on a fresh pool and returning only defined numeric values.

Integration test against a local self-hosted instance: ran the full Docker stack and webapp locally, then:

Started the CLI with the locally built binary pointed at http://localhost:3030
Triggered a long-running task (60-second sleep) to force worker processes to spawn
Confirmed active-runs.json contained real PIDs in workerPids
Sent SIGKILL to the CLI process (kill -9 <pid>) — bypassing all signal handlers, exactly what pnpm does
Waited 5 seconds for the watchdog's poll cycle to detect parent death
Checked all worker PIDs: ps -p <pid1>,<pid2>,<pid3> → All dead ✓

Before this fix, the same sequence left all worker processes running indefinitely. After, they're gone within the watchdog's poll interval.

Backward compatibility

The change to active-runs.json is additive. The updated watchdog defaults workerPids to [] if the field is missing, so reading an old-format file degrades gracefully to the previous behavior (disconnect only). The periodic interval only runs during an active dev session and is always cleared on clean shutdown, leaving the normal Ctrl+C path completely unaffected.
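
A minimal sketch of the defaulting behavior described above. `parseActiveRuns` and the `ActiveRuns` shape are assumptions based on the three fields named in this description; the actual readActiveRuns() in devWatchdog.ts may be structured differently. The filter for positive integers is an extra precaution added in this sketch, since a PID of 0 or a negative PID passed to kill targets a process group rather than a single process.

```typescript
interface ActiveRuns {
  parentPid: number;
  runFriendlyIds: string[];
  workerPids: number[];
}

// Hypothetical parser for illustration; not the PR's readActiveRuns().
function parseActiveRuns(json: string): ActiveRuns {
  const raw = JSON.parse(json) as Partial<ActiveRuns>;
  return {
    // NaN if the field is missing; callers should validate before use.
    parentPid: Number(raw.parentPid),
    runFriendlyIds: Array.isArray(raw.runFriendlyIds) ? raw.runFriendlyIds : [],
    // Old-format files have no workerPids field, so fall back to an empty list.
    workerPids: Array.isArray(raw.workerPids)
      ? raw.workerPids.filter((pid) => Number.isInteger(pid) && pid > 0)
      : [],
  };
}

// Old-format file: no workerPids, so nothing is killed locally.
console.log(parseActiveRuns('{"parentPid": 10, "runFriendlyIds": ["run_a"]}').workerPids.length); // 0
```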

Fixes #2909

When pnpm sends SIGKILL to the CLI process tree, SIGINT/SIGTERM handlers never run, leaving trigger-dev-run-worker child processes alive as orphans consuming significant CPU. The existing watchdog (triggerdotdev#3191) handles server-side run cancellation but does not kill the OS-level worker processes. This fix:

- Adds getAllPids() to TaskRunProcessPool to collect PIDs from both available and busy process maps
- Periodically refreshes active-runs.json (every 2s) so workerPids stays current as processes are spawned and returned to the pool
- Extends the watchdog's onParentDied() to SIGTERM all tracked worker PIDs, wait 3s, then SIGKILL any survivors before calling disconnect

Fixes triggerdotdev#2909

changeset-bot · 2026-04-30T17:13:29Z

🦋 Changeset detected

Latest commit: 85952d8

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 29 packages

Name	Type
trigger.dev	Major
d3-chat	Patch
references-d3-openai-agents	Patch
references-nextjs-realtime	Patch
references-realtime-hooks-test	Patch
references-realtime-streams	Patch
references-telemetry	Patch
@trigger.dev/build	Major
@trigger.dev/core	Major
@trigger.dev/python	Major
@trigger.dev/react-hooks	Major
@trigger.dev/redis-worker	Major
@trigger.dev/rsc	Major
@trigger.dev/schema-to-json	Major
@trigger.dev/sdk	Major
@trigger.dev/database	Major
@trigger.dev/otlp-importer	Major
@internal/cache	Patch
@internal/clickhouse	Patch
@internal/llm-model-catalog	Patch
@internal/redis	Patch
@internal/replication	Patch
@internal/run-engine	Patch
@internal/schedule-engine	Patch
@internal/testcontainers	Patch
@internal/tracing	Patch
@internal/tsql	Patch
@internal/zod-worker	Patch
@internal/sdk-compat-tests	Patch


coderabbitai · 2026-04-30T17:13:40Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 43efd465-a958-4eb4-818d-e856f54b4c41

📥 Commits

Reviewing files that changed from the base of the PR and between 19c1675 and 85952d8.

📒 Files selected for processing (5)

.changeset/silly-planes-march.md
packages/cli-v3/src/dev/devSupervisor.ts
packages/cli-v3/src/dev/devWatchdog.ts
packages/cli-v3/src/dev/taskRunProcessPool.test.ts
packages/cli-v3/src/dev/taskRunProcessPool.ts

Walkthrough

This PR implements cleanup logic for orphaned trigger-dev-run-worker processes when the CLI is forcefully terminated. It introduces a getAllPids() method to TaskRunProcessPool to enumerate process identifiers, updates devSupervisor to periodically record worker process IDs in active-runs.json, and modifies devWatchdog to terminate tracked worker processes using SIGTERM with a 3-second grace period followed by SIGKILL for any remaining processes. A new changeset documents this as a major release behavior, and a test suite validates the PID enumeration functionality.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Key observations

The changes introduce process lifecycle management across multiple modules with interdependent file I/O and signal handling
SIGTERM/SIGKILL termination logic in devWatchdog requires careful review for timing and edge cases (e.g., processes that die between SIGTERM and SIGKILL checks)
The periodic update interval in devSupervisor (2-second cadence) and its interaction with watchdog lifecycle should be validated
New getAllPids() method is straightforward but relies on correct iteration of both available and busy process maps
Test coverage for the new method is present but limited to empty-pool and numeric-values assertions


github-actions · 2026-04-30T17:13:40Z

Hi @thegodtune, thanks for your interest in contributing!

This project requires that pull request authors are vouched, and you are not in the list of vouched users.

This PR will be closed automatically. See https://github.com/triggerdotdev/trigger.dev/blob/main/CONTRIBUTING.md for more details.

thegodtune · 2026-04-30T17:44:15Z

Note on approach vs. previous attempts (#2993, #3041):
Previous PRs addressing this issue added SIGINT/SIGTERM signal handlers in dev.ts to ensure clean worker shutdown on polite termination. That approach is correct for the graceful exit path.
This PR specifically targets the SIGKILL path, which is what pnpm uses when you press Ctrl+C during pnpm dlx trigger.dev dev. SIGKILL cannot be caught or handled by any signal handler, so the cleanup code in dev.ts never runs regardless of what handlers are registered. The existing watchdog introduced in #3191 already survives SIGKILL (since it's detached and unref'd), but it only handled server-side run cancellation, not the OS-level worker processes still running on the user's machine.
This fix extends the watchdog to also kill those processes directly, which is the only path available after SIGKILL. The two approaches are complementary: signal handlers for graceful stops, watchdog PID cleanup for hard kills.

thegodtune added 2 commits April 30, 2026 16:53

chore: add changeset

85952d8

github-actions Bot closed this Apr 30, 2026

thegodtune wants to merge 2 commits into triggerdotdev:main from thegodtune:fix/orphaned-worker-processes