feat(run-ops): webapp db topology, flags, and split-mode resolver wiring #4117
Walkthrough: This PR introduces a gated dual-database "run-ops split" architecture separating control-plane data from run-scoped task/batch data. It adds a cache-first …
Pre-merge checks: ✅ 5 passed
…ngine wiring Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ape only

Migration/drain is deferred, so residency is decided purely by id-shape (ownerEngine): 25-char cuid -> LEGACY, 27-char ksuid -> NEW, unclassifiable -> LEGACY. This is behavior-preserving in production, which never injected a custom isKnownMigrated and, with no migration, always saw the default false.

- delete knownMigratedFilter.server.ts + its test
- readThrough: drop the isKnownMigrated dep + migrated short-circuit; KEEP the unclassifiable->LEGACY new-then-legacy fallback
- resolveInheritedMintKind: collapse to pure ownerEngine id-shape (no deps)
- mintBatchFriendlyId: drop isKnownMigrated/isSplitEnabled from ResolveDeps
- runEngineHandlersShared: drop isKnownMigrated from EventReadDeps/readRunForEvent (batch-write residency probe via newReplica.batchTaskRun.findFirst is untouched)
- tests: delete injected-marker cases, keep pure id-shape assertions
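The id-shape rule above can be sketched as a classifier plus a read-through that only falls back for unclassifiable ids. This is a minimal illustration, not the PR's code: the names `classifyIdShape`, `ownerEngine`, `readThrough`, and `RunLookup` here are hypothetical stand-ins, the regexes assume the raw internal id (no `run_` friendly prefix), and the two stores are modeled as plain async lookups.

```typescript
// Illustrative sketch: names, regexes, and the lookup shape are assumptions.
type Residency = "LEGACY" | "NEW";
type IdShape = Residency | "UNCLASSIFIABLE";

// Assumed shapes: 25-char cuid (leading "c") and 27-char base62 ksuid.
const CUID_RE = /^c[a-z0-9]{24}$/;
const KSUID_RE = /^[0-9A-Za-z]{27}$/;

function classifyIdShape(id: string): IdShape {
  if (CUID_RE.test(id)) return "LEGACY";
  if (KSUID_RE.test(id)) return "NEW";
  return "UNCLASSIFIABLE";
}

// Residency is decided by id shape alone; unclassifiable ids default to LEGACY.
function ownerEngine(id: string): Residency {
  return classifyIdShape(id) === "NEW" ? "NEW" : "LEGACY";
}

type RunLookup<T> = (id: string) => Promise<T | null>;

// Classified ids hit exactly one store; unclassifiable ids try NEW first, then LEGACY.
async function readThrough<T>(
  id: string,
  stores: { legacy: RunLookup<T>; next: RunLookup<T> }
): Promise<T | null> {
  const shape = classifyIdShape(id);
  if (shape === "LEGACY") return stores.legacy(id);
  if (shape === "NEW") return stores.next(id);
  return (await stores.next(id)) ?? (await stores.legacy(id));
}
```

Keeping the fallback only for unclassifiable ids means the common path costs a single query, while odd ids still resolve wherever they actually live.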
…eration labels

Add a pure unit test for ControlPlaneCache covering per-slot round-trips, null-vs-miss distinction, epoch-based invalidation, per-slot key isolation, bounded eviction, and TTL expiry. Add a testcontainer test for probeDistinctDatabases covering distinct clusters, same physical database (with reason), same-cluster-different-database, and fail-closed probe failure. Strip developer-enumeration labels from three existing test files (readThrough step numbers, runEngineHandlers Test-X comments) and rename the run-detail loader read-through test to drop the non-domain "shape 1" name.
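The cache properties this test covers (null-vs-miss, epoch invalidation, per-slot key isolation, bounded eviction, TTL) can be illustrated with a small generic cache. This is a sketch under assumptions, not the real ControlPlaneCache: the class name `SlotCache`, the `slot:id` key scheme, the Map-based LRU eviction, and the injectable clock are all hypothetical.

```typescript
// Hypothetical sketch of the properties under test; not the real ControlPlaneCache.
type Lookup<V> = { hit: true; value: V | null } | { hit: false };

interface Entry<V> {
  value: V | null; // null is a cached "known absent", distinct from a miss
  expiresAt: number;
  epoch: number;
}

class SlotCache<V> {
  private readonly entries = new Map<string, Entry<V>>();
  private epoch = 0;
  private readonly maxEntries: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(maxEntries: number, ttlMs: number, now: () => number = () => Date.now()) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // The slot name is part of the key, so "env" and "org" entries never collide.
  private key(slot: string, id: string): string {
    return `${slot}:${id}`;
  }

  get(slot: string, id: string): Lookup<V> {
    const k = this.key(slot, id);
    const entry = this.entries.get(k);
    if (!entry) return { hit: false };
    if (entry.epoch !== this.epoch || entry.expiresAt <= this.now()) {
      this.entries.delete(k);
      return { hit: false };
    }
    // Re-insert to mark as most recently used (Map iterates in insertion order).
    this.entries.delete(k);
    this.entries.set(k, entry);
    return { hit: true, value: entry.value };
  }

  set(slot: string, id: string, value: V | null): void {
    const k = this.key(slot, id);
    this.entries.delete(k);
    this.entries.set(k, { value, expiresAt: this.now() + this.ttlMs, epoch: this.epoch });
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest === undefined) break;
      this.entries.delete(oldest); // evict the least recently used entry
    }
  }

  // Bumping the epoch invalidates every existing entry in O(1).
  invalidateAll(): void {
    this.epoch += 1;
  }
}
```

Returning `{ hit: true, value: null }` lets callers cache "this env does not exist" without confusing it with "not looked up yet".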
… deps

apps/webapp/package.json declares @internal/run-ops-database (workspace) and @testcontainers/postgresql but the lockfile importer entry was never regenerated, so pnpm install --frozen-lockfile fails for the webapp. Regenerate the importer.
Enabling RUN_OPS_SPLIT_ENABLED without REALTIME_BACKEND_NATIVE_ENABLED silently breaks realtime: Electric replicates only from the control-plane DB, so NEW-resident (ksuid) runs on the dedicated run-ops DB are invisible and every realtime subscription hangs. Add a boot-time interlock that refuses split mode in that misconfiguration, mirroring the existing distinct-DB data-loss sentinel.

The check is a pure predicate (assertSplitRealtimeInterlock) run synchronously inside assertRunOpsSplitSentinel on the same eager-boot path, failing fast before the async DB probe and before any run-ops routing is wired.
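The ordering described here (a synchronous config predicate first, then the async distinct-DB probe) can be sketched as follows. The env var names come from the commit; `checkSplitRealtimeInterlock`, `runSplitSentinel`, the `SplitFlags` shape, and the injected `probeDistinct` callback are hypothetical stand-ins for assertSplitRealtimeInterlock, assertRunOpsSplitSentinel, and the real probe.

```typescript
// Hypothetical flag shape; the real env schema may parse these differently.
interface SplitFlags {
  RUN_OPS_SPLIT_ENABLED: boolean;
  REALTIME_BACKEND_NATIVE_ENABLED: boolean;
}

// Pure, synchronous predicate: no I/O, so it can run before any async probe.
function checkSplitRealtimeInterlock(flags: SplitFlags): void {
  if (flags.RUN_OPS_SPLIT_ENABLED && !flags.REALTIME_BACKEND_NATIVE_ENABLED) {
    throw new Error(
      "RUN_OPS_SPLIT_ENABLED requires REALTIME_BACKEND_NATIVE_ENABLED: " +
        "Electric only replicates the control-plane DB, so realtime for NEW-resident runs would hang"
    );
  }
}

// Boot-path ordering: the cheap config check fails fast, then the async distinct-DB probe runs.
async function runSplitSentinel(
  flags: SplitFlags,
  probeDistinct: () => Promise<boolean> // stand-in for the real distinct-DB probe
): Promise<void> {
  if (!flags.RUN_OPS_SPLIT_ENABLED) return;
  checkSplitRealtimeInterlock(flags);
  if (!(await probeDistinct())) {
    throw new Error("run-ops split refused: both URLs resolve to the same database");
  }
}
```

Because the predicate is pure, it can be unit-tested without a database, and a misconfigured deploy never reaches the network probe at all.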
…n diagnostics

- gate runOpsTopology splitEnabled on RUN_OPS_SPLIT_ENABLED so provisioning both DSNs before flipping the flag cannot open a second pool or route writes ahead of the distinct-DB sentinel
- rethrow the original UnclassifiableRunId in the cross-seam guard so its value/valueLength keep reflecting the real waitpoint id
- log run-found-but-environment-unresolved distinctly from missing-run
- correct the RUN_OPS_DATABASE_URL doc comment (Prisma datasource, not the webapp runtime pool)
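The first bullet (the flag, not the presence of a second DSN, decides whether a second pool exists) can be sketched like this. `buildTopology`, `TopologyEnv`, and the injected `createPool` factory are hypothetical; `DATABASE_URL` is assumed to be the control-plane DSN, and treating "flag on, DSN missing" as a boot error is an assumption of this sketch rather than something the commit states.

```typescript
// Hypothetical config shape; only the RUN_OPS_* names come from the commit.
interface TopologyEnv {
  DATABASE_URL: string;
  RUN_OPS_DATABASE_URL?: string;
  RUN_OPS_SPLIT_ENABLED: boolean;
}

interface Topology<P> {
  splitEnabled: boolean;
  controlPlane: P;
  runOps: P; // the same pool object as controlPlane when the split is off
}

function buildTopology<P>(env: TopologyEnv, createPool: (url: string) => P): Topology<P> {
  const controlPlane = createPool(env.DATABASE_URL);
  if (!env.RUN_OPS_SPLIT_ENABLED) {
    // Flag off: ignore any pre-provisioned RUN_OPS_DATABASE_URL and reuse the single pool.
    return { splitEnabled: false, controlPlane, runOps: controlPlane };
  }
  if (!env.RUN_OPS_DATABASE_URL) {
    // Assumption: "flag on, DSN missing" is treated as a boot error, not a silent single-DB fallback.
    throw new Error("RUN_OPS_SPLIT_ENABLED is set but RUN_OPS_DATABASE_URL is missing");
  }
  return { splitEnabled: true, controlPlane, runOps: createPool(env.RUN_OPS_DATABASE_URL) };
}
```

With this shape, operators can provision the run-ops DSN ahead of rollout without any routing change until the flag flips.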
…by worker-mock fix)
…uth-env through the cache-first resolver

The ControlPlaneCache served env/org data with no invalidation, so admin/control-plane writes were only reflected after the TTL. Add two invalidation scopes to the cache (invalidateEnvironment for one env's slots; invalidateOrganization via a per-org epoch that env/authEnv values are stamped with, so all of an org's cached rows drop with no reverse index), expose them on the resolver, and call them at every write site that mutates cache-served data: pause/resume, archive, env/org concurrency + burst-factor, API-key regeneration, feature flags, API/batch rate limits, runs enable/disable, org + project delete, and stream-basin provisioning.

Also extend the resolver's authenticated-env slot to carry `git` and make the run-engine adapter's resolveAuthenticatedEnv delegate to the cache-first, split-aware resolver instead of issuing its own $replica.findFirst, so it honors splitEnabled() and the cache like its siblings while still returning `git` and the deleted-project guard.
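The per-org epoch trick (invalidate every cached row of an org without a reverse index) can be sketched as follows. `EnvCache` and its method signatures are hypothetical; only the two scope names mirror the commit. Each value is stamped with its org's epoch at write time, and a read discards it if the org's epoch has moved on.

```typescript
// Hypothetical sketch of org-epoch stamping; not the real ControlPlaneCache API.
interface Stamped<V> {
  value: V;
  orgId: string;
  orgEpoch: number; // the org's epoch at the time the value was cached
}

class EnvCache<V> {
  private readonly slots = new Map<string, Stamped<V>>();
  private readonly orgEpochs = new Map<string, number>();

  private epochOf(orgId: string): number {
    return this.orgEpochs.get(orgId) ?? 0;
  }

  set(slot: string, envId: string, orgId: string, value: V): void {
    this.slots.set(`${slot}:${envId}`, { value, orgId, orgEpoch: this.epochOf(orgId) });
  }

  get(slot: string, envId: string): V | undefined {
    const key = `${slot}:${envId}`;
    const entry = this.slots.get(key);
    if (!entry) return undefined;
    // A bumped org epoch invalidates every value stamped with an older one; no reverse index needed.
    if (entry.orgEpoch !== this.epochOf(entry.orgId)) {
      this.slots.delete(key);
      return undefined;
    }
    return entry.value;
  }

  // Narrow scope: drop the named slots for one environment.
  invalidateEnvironment(envId: string, slotNames: readonly string[]): void {
    for (const slot of slotNames) this.slots.delete(`${slot}:${envId}`);
  }

  // Broad scope: O(1) epoch bump; stale rows are discarded lazily on read.
  invalidateOrganization(orgId: string): void {
    this.orgEpochs.set(orgId, this.epochOf(orgId) + 1);
  }
}
```

The trade-off is that stale rows linger in memory until they are read or evicted, in exchange for org-wide invalidation that never has to enumerate the org's environments.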
… OFF

With the split OFF there is a single DB, so a run and its environment are co-located and there is no cross-seam FK/check to replace (matches main). Skip the always-on hot-path read in that branch; the split-ON branch is unchanged (cache-first, throws on a genuinely missing env).
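A minimal sketch of the branch this commit describes, assuming a hypothetical `ensureRunEnvironment` helper and an `EnvLookup` interface standing in for the cache-first resolver: with the split off the database's own FK already guarantees the environment exists, so the hot path returns without a read.

```typescript
// Hypothetical resolver interface; stands in for the cache-first, split-aware resolver.
interface EnvLookup {
  findEnvironment(id: string): Promise<{ id: string } | null>;
}

async function ensureRunEnvironment(envId: string, splitEnabled: boolean, resolver: EnvLookup): Promise<void> {
  if (!splitEnabled) {
    // Single DB: run and environment are co-located and the FK already enforces existence.
    return;
  }
  // Split ON: the environment lives across the seam, so check it explicitly (cache-first).
  const env = await resolver.findEnvironment(envId);
  if (!env) throw new Error(`Environment ${envId} not found`);
}
```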
… routing, run lifecycle (#4118)

## What

Routes the webapp write path through the run-ops split seam: trigger/batch minting, idempotency-key resolution, and the run-lifecycle services now determine residency and dispatch writes to the correct store.

- **Trigger & batch** (`runEngine/services/triggerTask.server.ts`, `batchTrigger.server.ts`, `createBatch.server.ts`, `streamBatchItems.server.ts`, `v3/services/batchTriggerV3.server.ts`): mint ids with the run-ops-aware minting and route creation/streaming through the store; batch children inherit the parent's residency.
- **Idempotency** (`runEngine/concerns/idempotencyKeys.server.ts` + new `idempotencyResidency.server.ts`): idempotency-key lookup/dedup is residency-aware so a keyed retrigger resolves against the store that owns the original run.
- **Run lifecycle services** (`createCheckpoint`, `createTaskRunAttempt`, `enqueueDelayedRun`, `expireEnqueuedRun`, `finalizeTaskRun`, `resumeBatchRun`, `cancelDevSessionRuns`, `executeTasksWaitingForDeploy`, `triggerFailedTask`): resolve their target run through the store rather than a fixed client.
- **Reads that fan out from writes** (`runsRepository` + `clickhouseRunsRepository`, `BulkActionV2` + batch read-through, realtime `sessions`/`runReader`, alerts `deliverAlert`/`performTaskRunAlerts`): route through the read-through resolver.
- `9535ae63d`: resolves the parent run through an injectable run store in `TriggerFailedTaskService`.
- `bf8f7c881`: drops the "known-migrated" concept from write-path and read repos; residency is id-shape only.
- `515b897ea`: self-defaults `resolveWaitpointThroughReadThrough` to the safe run-ops clients.

## Why

PR6 of the run-ops split stack. This is the write-path counterpart to the read foundation in the previous PRs: with it in place, both reads and writes route through the seam. Additive when the split is disabled (id-shape resolution collapses to the control-plane client); behavior-changing on the minting, idempotency, and lifecycle paths when enabled.

## Tests

Large new/expanded vitest suite under `apps/webapp/test/` and colocated service tests: trigger-task and batch-trigger store routing, residency inheritance, idempotency dedup residency + legacy-authority, bulk-action read routing, cancel-dev-session routing, alerts store routing, runs-repository read-through, realtime session/run-reader read-through and stream-registration routing, and the waitpoint read-through default. Testcontainers-backed; no mocks.

## Notes

Draft, **stacked on #4117** (`runops/pr05-webapp-foundation`). Review that first; this diff is against it. Server-change / changeset note to be added at stack-assembly time.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
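"Batch children inherit the parent's residency" can be sketched as a pure function of the parent id's shape, consistent with residency being id-shape only. `inheritedMintKind`, `mintKindsForBatch`, and `MintKind` are hypothetical and do not reproduce the real `resolveInheritedMintKind`/`mintBatchFriendlyId` signatures; the ksuid regex is an assumed stand-in, and how a parentless root picks its kind (for example, from split mode) is left to the caller.

```typescript
// Hypothetical sketch; the real helpers' signatures are not shown in this PR text.
type MintKind = "cuid" | "ksuid";

// Assumed NEW shape: 27-char base62 ksuid. Everything else (cuid or unclassifiable) is LEGACY.
const KSUID_RE = /^[0-9A-Za-z]{27}$/;

function inheritedMintKind(parentId: string | undefined, rootKind: MintKind): MintKind {
  // No parent: the caller decides the kind for a fresh root (not modeled here).
  if (parentId === undefined) return rootKind;
  // With a parent, the id shape alone decides, so a child lands in the same store as its parent.
  return KSUID_RE.test(parentId) ? "ksuid" : "cuid";
}

// Every item of a batch gets the kind inherited from the batch's parent.
function mintKindsForBatch(parentId: string | undefined, itemCount: number, rootKind: MintKind): MintKind[] {
  const kind = inheritedMintKind(parentId, rootKind);
  return Array.from({ length: itemCount }, () => kind);
}
```

Deriving the child's kind from the parent id, rather than from the current flag value, keeps a parent and its children in one store even if the split flag flips mid-batch.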
What
Wires the run-ops split into the webapp: database topology, environment flags, split-mode gating, and the control-plane resolver/cache layer that the run-store and run-engine seams from the previous PR plug into.
- `apps/webapp/app/db.server.ts`, `env.server.ts`, `entry.server.tsx`: adds the run-ops database clients/topology and the environment variables that configure and gate the split.
- `apps/webapp/app/v3/runOpsMigration/` holds the webapp-side machinery: `splitMode.server.ts`, `controlPlaneResolver.server.ts` + `controlPlaneCache.server.ts`, `readThrough.server.ts`, `crossSeamGuard.server.ts`, `distinctDbSentinel.server.ts`, id-minting helpers (`mintBatchFriendlyId`, `runOpsMintKind`, `resolveInheritedMintKind`), `runOpsCascadeCleanup.server.ts`, the split read gate, and route/unblock catalogs.
- `app/v3/runStore.server.ts`, `runEngine.server.ts`, `runEngineHandlers.server.ts` + new `runEngineHandlersShared.server.ts`: point the webapp's store/engine construction at the resolver, and factor shared handler logic out so both seams use one path.
- `runtimeEnvironment.server.ts`, `eventRepository/index.server.ts`, `taskRunHeartbeatFailed.server.ts`, `engineVersion.server.ts` route their run/environment lookups through the resolver (read-through).
- `413a94511`: adds a boot-time interlock that refuses split mode (`RUN_OPS_SPLIT_ENABLED`) unless the native realtime backend (`REALTIME_BACKEND_NATIVE_ENABLED`) is also enabled (see `.server-changes/run-ops-split-realtime-interlock.md`).
- `dc74c57fd`: drops the earlier "known-migrated" read layer; residency is determined by id-shape only.

Why
PR5 of the run-ops split stack. This is the webapp foundation layer: it stands up the DB topology, flags, and resolver/cache the rest of the stack depends on, and repoints webapp read paths through the resolver. Additive when the split is not enabled (existing single-DB behavior preserved behind flags); behavior-changing on the read-through paths and the realtime interlock.
Tests
New vitest coverage across `apps/webapp/test/` and colocated `*.server.test.ts` files: db topology, split mode, split read gate, cross-seam guard, mint cutover / flip latency, control-plane cache, control-plane resolver, distinct-db sentinel, read-through loaders (route loaders, run-detail loaders, `findEnvironmentFromRun`), and the run-engine handlers. Testcontainers-backed; no mocks. `pnpm-lock.yaml` synced for the two new webapp deps.

Notes
Draft, stacked on #4116 (`runops/pr04-store-engine`). Review that first; this diff is against it. Server-change / changeset note to be added at stack-assembly time.
🤖 Generated with Claude Code