Commit e134da7

fix(run-engine): debounce hot-key lock contention and 5xx feedback loop (#3453)
## Changes

Three changes in `internal-packages/run-engine/src/engine/systems/debounceSystem.ts`, in order of impact:

1. **Fast-path skip before the lock.** In `handleExistingRun`, do an unlocked read of `delayUntil` (and `createdAt` for the max-duration check) from the run row before entering `runLock.lock("handleDebounce", ...)`. If `newDelayUntil <= currentDelayUntil` and the run is still within its max-duration window, return the existing run immediately without taking the lock. This is safe because debounce is monotonic-forward only: a stale read either matches reality or undershoots, and both cases decay correctly (the value is re-checked properly inside the lock by whichever caller is actually pushing forward). Trailing-mode triggers carrying `updateData` still take the lock so the data update is applied.
2. **Quantize `newDelayUntil`.** Round the computed `newDelayUntil` to 1-second buckets (configurable via `quantizeNewDelayUntilMs`, set to 0 to disable). Without quantization, every call has a slightly larger `newDelayUntil` than the last and they all pass the fast-path check. With it, concurrent callers on the same key share a target time and ~95% short-circuit. User-visible effect: a debounced run might fire up to 1s earlier than the strict spec, which is a non-issue for typical debounce use cases (chat summarization, batched notifications, etc.).
3. **Graceful lock-contention fallback.** Wrap the `runLock.lock(...)` call so `LockAcquisitionTimeoutError` and Redlock `ExecutionError` / `ResourceLockedError` return the existing run id with success instead of propagating a 5xx. Debounce is best-effort: if we can't take the lock, the herd is already updating it for us; fall in line. This kills the 5xx → SDK-retry feedback loop. With (1)+(2) this rarely fires; without them it's the difference between 5xx and 200.

Defaults preserve current behaviour aside from quantization (1s) and fast-path (on). Both are configurable via `RunEngineOptions.debounce`.
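A minimal sketch of how changes (1) and (2) fit together, written as pure functions. The names `DebouncedRunRow`, `quantizeDelayUntil`, and `canSkipLock` are hypothetical and are not taken from `debounceSystem.ts`. Rounding down is an assumption inferred from the "up to 1s earlier" note, and the max-duration check shown here (time elapsed since `createdAt`) is one plausible reading of "within its max-duration window".

```typescript
// Hypothetical helpers for illustration; not the actual debounceSystem.ts code.

/** Minimal view of the run row that the unlocked read needs. */
interface DebouncedRunRow {
  delayUntil: Date | null;
  createdAt: Date;
}

/** Round down to the start of the bucket (assumed direction). bucketMs <= 0 disables quantization. */
function quantizeDelayUntil(delayUntil: Date, bucketMs: number): Date {
  if (bucketMs <= 0) return delayUntil;
  return new Date(Math.floor(delayUntil.getTime() / bucketMs) * bucketMs);
}

/** True when the caller may return the existing run without taking the lock. */
function canSkipLock(params: {
  run: DebouncedRunRow;
  newDelayUntil: Date;
  now: Date;
  maxDebounceDurationMs: number;
  hasUpdateData: boolean;
}): boolean {
  const { run, newDelayUntil, now, maxDebounceDurationMs, hasUpdateData } = params;

  // Trailing-mode triggers carrying data must take the lock so the update is applied.
  if (hasUpdateData) return false;

  // Without a current delayUntil there is nothing to compare against.
  if (run.delayUntil === null) return false;

  // Assumed form of the max-duration check: elapsed time since the run was created.
  const withinWindow = now.getTime() - run.createdAt.getTime() < maxDebounceDurationMs;

  // Skip only when this call would not push the delay forward.
  return withinWindow && newDelayUntil.getTime() <= run.delayUntil.getTime();
}
```

With a 1000 ms bucket, two triggers that compute targets 300 ms apart inside the same second quantize to the same `newDelayUntil`, so the second one satisfies `newDelayUntil <= currentDelayUntil` and skips the lock.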
## ✅ Checklist

- [x] I have followed every step in the [contributing guide](https://github.com/triggerdotdev/trigger.dev/blob/main/CONTRIBUTING.md)
- [x] The PR title follows the convention.
- [x] I ran and tested the code works

---

## Changelog

Reduce 5xx feedback loops on hot debounce keys by quantizing `delayUntil`, adding an unlocked fast-path skip before the redlock, and gracefully handling redlock contention in `handleDebounce` so the SDK no longer retries into a herd.

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
1 parent 4b28080 commit e134da7

7 files changed

Lines changed: 1051 additions & 6 deletions


Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+---
+area: webapp
+type: fix
+---
+
+Reduce 5xx feedback loops on hot debounce keys by quantizing `delayUntil`,
+adding an unlocked fast-path skip, and gracefully handling redlock
+contention in `handleDebounce` so the SDK no longer retries into a herd.
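
The contention handling named in this changeset can be sketched as a small wrapper. This is a sketch under stated assumptions: the three error classes below are local stand-ins for the real `LockAcquisitionTimeoutError` and the Redlock `ExecutionError` / `ResourceLockedError`, matching on `error.name` is an illustrative choice, and `withContentionFallback` and `DebounceOutcome` are hypothetical names.

```typescript
// Local stand-ins so the sketch is self-contained; the real classes come from
// the run-engine lock module and from Redlock.
class LockAcquisitionTimeoutError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "LockAcquisitionTimeoutError";
  }
}
class ExecutionError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ExecutionError";
  }
}
class ResourceLockedError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ResourceLockedError";
  }
}

// Hypothetical result shape: the caller gets the existing run id back.
type DebounceOutcome = { status: "existing"; runId: string };

const CONTENTION_ERROR_NAMES = [
  "LockAcquisitionTimeoutError",
  "ExecutionError",
  "ResourceLockedError",
];

/** True for errors that mean "someone else holds the lock", not "something is broken". */
function isLockContentionError(error: unknown): boolean {
  return error instanceof Error && CONTENTION_ERROR_NAMES.indexOf(error.name) !== -1;
}

/** Run the locked work; on lock contention, succeed with the existing run id. */
async function withContentionFallback(
  existingRunId: string,
  lockedWork: () => Promise<DebounceOutcome>
): Promise<DebounceOutcome> {
  try {
    return await lockedWork();
  } catch (error) {
    if (isLockContentionError(error)) {
      // Best-effort: another caller is already updating the run.
      return { status: "existing", runId: existingRunId };
    }
    throw error; // Anything else still surfaces as a failure.
  }
}
```

Only contention errors are swallowed; a database failure inside `lockedWork` is rethrown, so genuine faults are not masked as success.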

apps/webapp/app/env.server.ts

Lines changed: 16 additions & 0 deletions
@@ -666,6 +666,21 @@ const EnvironmentSchema = z
       .int()
       .default(60_000 * 60), // 1 hour
 
+    /**
+     * Bucket size in milliseconds used to quantize the newly computed `delayUntil`
+     * in the debounce system. Quantization collapses concurrent triggers on the
+     * same hot debounce key onto the same target time so the unlocked fast-path
+     * skip is effective. Set to 0 to disable. Default: 1000ms (1s).
+     */
+    RUN_ENGINE_DEBOUNCE_QUANTIZE_NEW_DELAY_UNTIL_MS: z.coerce.number().int().min(0).default(1000),
+
+    /**
+     * Whether the unlocked fast-path skip is enabled in the debounce system.
+     * Acts as a kill switch in case the fast-path needs to be disabled in
+     * production without a redeploy. Default: "1" (enabled).
+     */
+    RUN_ENGINE_DEBOUNCE_FAST_PATH_SKIP_ENABLED: z.string().default("1"),
+
     RUN_ENGINE_WORKER_REDIS_HOST: z
       .string()
       .optional()
@@ -837,6 +852,7 @@ const EnvironmentSchema = z
       .default("info"),
     RUN_ENGINE_TREAT_PRODUCTION_EXECUTION_STALLS_AS_OOM: z.string().default("0"),
     RUN_ENGINE_READ_REPLICA_SNAPSHOTS_SINCE_ENABLED: z.string().default("0"),
+    RUN_ENGINE_DEBOUNCE_USE_REPLICA_FOR_FAST_PATH_READ: z.string().default("0"),
 
     /** How long should the presence ttl last */
     DEV_PRESENCE_SSE_TIMEOUT: z.coerce.number().int().default(30_000),

apps/webapp/app/v3/runEngine.server.ts

Lines changed: 3 additions & 0 deletions
@@ -214,6 +214,9 @@ function createRunEngine() {
     // Debounce configuration
     debounce: {
       maxDebounceDurationMs: env.RUN_ENGINE_MAXIMUM_DEBOUNCE_DURATION_MS,
+      quantizeNewDelayUntilMs: env.RUN_ENGINE_DEBOUNCE_QUANTIZE_NEW_DELAY_UNTIL_MS,
+      fastPathSkipEnabled: env.RUN_ENGINE_DEBOUNCE_FAST_PATH_SKIP_ENABLED === "1",
+      useReplicaForFastPathRead: env.RUN_ENGINE_DEBOUNCE_USE_REPLICA_FOR_FAST_PATH_READ === "1",
     },
   });
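
The string flags in this hunk are compared with `=== "1"`, so any other value (including `"true"`) leaves the feature off. A small standalone sketch of that mapping, where `DebounceEnv` and `toDebounceOptions` are hypothetical names and the env object stands in for the parsed schema:

```typescript
// Hypothetical subset of the parsed environment, mirroring the schema fields in this commit.
interface DebounceEnv {
  RUN_ENGINE_MAXIMUM_DEBOUNCE_DURATION_MS: number;
  RUN_ENGINE_DEBOUNCE_QUANTIZE_NEW_DELAY_UNTIL_MS: number;
  RUN_ENGINE_DEBOUNCE_FAST_PATH_SKIP_ENABLED: string;
  RUN_ENGINE_DEBOUNCE_USE_REPLICA_FOR_FAST_PATH_READ: string;
}

/** Map env values onto the debounce options, using the same strict "1" comparison as the diff. */
function toDebounceOptions(env: DebounceEnv) {
  return {
    maxDebounceDurationMs: env.RUN_ENGINE_MAXIMUM_DEBOUNCE_DURATION_MS,
    quantizeNewDelayUntilMs: env.RUN_ENGINE_DEBOUNCE_QUANTIZE_NEW_DELAY_UNTIL_MS,
    fastPathSkipEnabled: env.RUN_ENGINE_DEBOUNCE_FAST_PATH_SKIP_ENABLED === "1",
    useReplicaForFastPathRead: env.RUN_ENGINE_DEBOUNCE_USE_REPLICA_FOR_FAST_PATH_READ === "1",
  };
}

// Example: the kill switch is flipped by setting the flag to anything other than "1".
const killSwitched = toDebounceOptions({
  RUN_ENGINE_MAXIMUM_DEBOUNCE_DURATION_MS: 60 * 60 * 1000,
  RUN_ENGINE_DEBOUNCE_QUANTIZE_NEW_DELAY_UNTIL_MS: 1000,
  RUN_ENGINE_DEBOUNCE_FAST_PATH_SKIP_ENABLED: "0",
  RUN_ENGINE_DEBOUNCE_USE_REPLICA_FOR_FAST_PATH_READ: "0",
});
```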

internal-packages/run-engine/src/engine/index.ts

Lines changed: 3 additions & 0 deletions
@@ -324,6 +324,9 @@ export class RunEngine {
       executionSnapshotSystem: this.executionSnapshotSystem,
       delayedRunSystem: this.delayedRunSystem,
       maxDebounceDurationMs: options.debounce?.maxDebounceDurationMs ?? 60 * 60 * 1000, // Default 1 hour
+      quantizeNewDelayUntilMs: options.debounce?.quantizeNewDelayUntilMs ?? 1000,
+      fastPathSkipEnabled: options.debounce?.fastPathSkipEnabled ?? true,
+      useReplicaForFastPathRead: options.debounce?.useReplicaForFastPathRead ?? false,
     });
 
     this.pendingVersionSystem = new PendingVersionSystem({
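
The defaults in this hunk use `??` rather than `||`, which matters for `quantizeNewDelayUntilMs`: an explicit `0` (quantization disabled) and an explicit `false` are kept instead of being replaced by the default. A standalone sketch of that behaviour, where `DebounceOptions` and `resolveDebounceDefaults` are hypothetical names:

```typescript
// Hypothetical option shape mirroring the fields under RunEngineOptions.debounce in this commit.
interface DebounceOptions {
  maxDebounceDurationMs?: number;
  quantizeNewDelayUntilMs?: number;
  fastPathSkipEnabled?: boolean;
  useReplicaForFastPathRead?: boolean;
}

/** Apply the same defaults as the diff; `??` only replaces null/undefined, so 0 and false survive. */
function resolveDebounceDefaults(debounce?: DebounceOptions) {
  return {
    maxDebounceDurationMs: debounce?.maxDebounceDurationMs ?? 60 * 60 * 1000,
    quantizeNewDelayUntilMs: debounce?.quantizeNewDelayUntilMs ?? 1000,
    fastPathSkipEnabled: debounce?.fastPathSkipEnabled ?? true,
    useReplicaForFastPathRead: debounce?.useReplicaForFastPathRead ?? false,
  };
}

// No options at all: every field falls back to its default.
const defaults = resolveDebounceDefaults();

// Explicit 0 and false are preserved rather than overwritten.
const disabled = resolveDebounceDefaults({ quantizeNewDelayUntilMs: 0, fastPathSkipEnabled: false });
```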
