feat(run-ops): webapp db topology, flags, and split-mode resolver wiring #4117

Merged
d-cs merged 14 commits into main from runops/pr05-webapp-foundation on Jul 3, 2026
Conversation

@d-cs d-cs commented Jul 2, 2026

Collaborator

What

Wires the run-ops split into the webapp: database topology, environment flags, split-mode gating, and the control-plane resolver/cache layer that the run-store and run-engine seams from the previous PR plug into.

  • DB topology & env (apps/webapp/app/db.server.ts, env.server.ts, entry.server.tsx): adds the run-ops database clients/topology and the environment variables that configure and gate the split.
  • runOpsMigration module (new apps/webapp/app/v3/runOpsMigration/): the webapp-side machinery — splitMode.server.ts, controlPlaneResolver.server.ts + controlPlaneCache.server.ts, readThrough.server.ts, crossSeamGuard.server.ts, distinctDbSentinel.server.ts, id-minting helpers (mintBatchFriendlyId, runOpsMintKind, resolveInheritedMintKind), runOpsCascadeCleanup.server.ts, the split read gate, and route/unblock catalogs.
  • Store/engine wiring (app/v3/runStore.server.ts, runEngine.server.ts, runEngineHandlers.server.ts + new runEngineHandlersShared.server.ts): points the webapp's store/engine construction at the resolver, and factors shared handler logic out so both seams use one path.
  • Read-path touch-ups: runtimeEnvironment.server.ts, eventRepository/index.server.ts, taskRunHeartbeatFailed.server.ts, and engineVersion.server.ts now route their run/environment lookups through the read-through resolver.
  • 413a94511 — interlocks split mode against the native realtime backend so the two aren't enabled in an incompatible combination (see .server-changes/run-ops-split-realtime-interlock.md).
  • dc74c57fd — drops the earlier "known-migrated" read layer; residency is determined by id-shape only.

Why

PR5 of the run-ops split stack. This is the webapp foundation layer: it stands up the DB topology, flags, and resolver/cache the rest of the stack depends on, and repoints webapp read paths through the resolver. Additive when the split is not enabled (existing single-DB behavior preserved behind flags); behavior-changing on the read-through paths and the realtime interlock.

Tests

New vitest coverage across apps/webapp/test/ and colocated *.server.test.ts files: db topology, split mode, split read gate, cross-seam guard, mint cutover / flip latency, control-plane cache, control-plane resolver, distinct-db sentinel, read-through loaders (route loaders, run-detail loaders, findEnvironmentFromRun), and the run-engine handlers. Testcontainers-backed; no mocks. pnpm-lock.yaml synced for the two new webapp deps.

Notes

Draft, stacked on #4116 (runops/pr04-store-engine). Review that first; this diff is against it.

Server-change / changeset note to be added at stack-assembly time.

🤖 Generated with Claude Code

changeset-bot Bot commented Jul 2, 2026

⚠️ No Changeset found

Latest commit: 071cdc1

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

coderabbitai Bot commented Jul 2, 2026

Contributor

Review Change Stack

Walkthrough

This PR introduces a gated dual-database "run-ops split" architecture separating control-plane data from run-scoped task/batch data. It adds a cache-first ControlPlaneResolver, a split-aware RunStore with residency-based routing, a readThroughRun cross-DB read layer, a crossSeamGuard for completion routing, boot-time distinct-database and realtime interlocks, and a cuid/ksuid run-id mint-kind cutover mechanism with inheritance rules for child runs and batches. Run-engine event handlers and batch completion logic are refactored to use these shared helpers. New environment variables gate all behavior (OFF by default), and extensive unit and Testcontainers-based integration tests validate cross-database resolution, routing, caching, and cutover semantics. A cascade cleanup service and cache invalidation hooks are also added for consistency on writes.
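The read-through routing described in the walkthrough can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `planReads`, `readThrough`, `shapeOfRunId`, and the `Store` labels are hypothetical names, and the new-then-legacy order for unclassifiable ids is taken from the commit note on this PR. With the split disabled, every read collapses to the single existing database.

```typescript
// Hypothetical sketch: decide which store(s) to read from, in order.
type RunIdShape = "cuid" | "ksuid" | "unclassifiable";
type Store = "legacy" | "new";

// Length-based classification (25-char cuid, 27-char ksuid), per the commit notes.
function shapeOfRunId(id: string): RunIdShape {
  if (id.length === 25) return "cuid";
  if (id.length === 27) return "ksuid";
  return "unclassifiable";
}

// Pure routing plan: easy to unit-test without a database.
function planReads(id: string, splitEnabled: boolean): Store[] {
  if (!splitEnabled) return ["legacy"]; // single-DB behavior preserved
  const shape = shapeOfRunId(id);
  if (shape === "ksuid") return ["new"];
  if (shape === "cuid") return ["legacy"];
  return ["new", "legacy"]; // unclassifiable: try new, then fall back to legacy
}

// Executes the plan against caller-supplied readers (assumed interface).
async function readThrough<T>(
  id: string,
  splitEnabled: boolean,
  readers: Record<Store, (id: string) => Promise<T | null>>
): Promise<T | null> {
  for (const store of planReads(id, splitEnabled)) {
    const row = await readers[store](id);
    if (row !== null) return row;
  }
  return null;
}
```

Keeping the plan separate from the I/O is a design choice of this sketch; it lets the routing rules be asserted directly while the readers stay injectable.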

Changes

Related PRs: None identified.

Suggested labels: review_needed_junior_swe, review_depth_standard

Suggested reviewers: None specified.

Poem

A rabbit hops between two dens,
One old, one new, with cache-lined pens,
It sniffs each run-id, cuid or ksuid,
Routes it home where it belongs, unswayed.
Boot-time checks stand guard at the door—
Split enabled? Probe once more.
Hop, hop, hooray, the burrow's split with care! 🐇

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The title is concise and accurately summarizes the main change: webapp DB topology, flags, and split-mode resolver wiring. |
| Description check | ✅ Passed | The description is thorough and on-topic, but it does not follow the template headings for Closes, checklist, changelog, or screenshots. |
devin-ai-integration[bot]

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

@d-cs d-cs force-pushed the runops/pr04-store-engine branch from 88d1290 to d5610a9 Compare July 2, 2026 18:02
@d-cs d-cs force-pushed the runops/pr05-webapp-foundation branch from 413a945 to 99643f8 Compare July 2, 2026 18:02
pkg-pr-new Bot commented Jul 2, 2026

@trigger.dev/build

npm i https://pkg.pr.new/@trigger.dev/build@071cdc1

trigger.dev

npm i https://pkg.pr.new/trigger.dev@071cdc1

@trigger.dev/core

npm i https://pkg.pr.new/@trigger.dev/core@071cdc1

@trigger.dev/python

npm i https://pkg.pr.new/@trigger.dev/python@071cdc1

@trigger.dev/react-hooks

npm i https://pkg.pr.new/@trigger.dev/react-hooks@071cdc1

@trigger.dev/redis-worker

npm i https://pkg.pr.new/@trigger.dev/redis-worker@071cdc1

@trigger.dev/rsc

npm i https://pkg.pr.new/@trigger.dev/rsc@071cdc1

@trigger.dev/schema-to-json

npm i https://pkg.pr.new/@trigger.dev/schema-to-json@071cdc1

@trigger.dev/sdk

npm i https://pkg.pr.new/@trigger.dev/sdk@071cdc1

commit: 071cdc1

@d-cs d-cs force-pushed the runops/pr04-store-engine branch from 460477f to 1af2bab Compare July 2, 2026 19:25
@d-cs d-cs force-pushed the runops/pr05-webapp-foundation branch 2 times, most recently from cdc4eb9 to e0b35d5 Compare July 2, 2026 20:21
@d-cs d-cs force-pushed the runops/pr04-store-engine branch from f3248e0 to fc39e05 Compare July 3, 2026 08:51
@d-cs d-cs force-pushed the runops/pr05-webapp-foundation branch from 8024e36 to f9b9b0b Compare July 3, 2026 08:51
@d-cs d-cs force-pushed the runops/pr04-store-engine branch from fc39e05 to 49667ee Compare July 3, 2026 10:02
@d-cs d-cs force-pushed the runops/pr05-webapp-foundation branch from f9b9b0b to 0937b15 Compare July 3, 2026 10:02
@d-cs d-cs force-pushed the runops/pr04-store-engine branch from 49667ee to f2c2b2c Compare July 3, 2026 10:36
@d-cs d-cs force-pushed the runops/pr05-webapp-foundation branch from 0937b15 to 729daf1 Compare July 3, 2026 10:36
@d-cs d-cs force-pushed the runops/pr04-store-engine branch from f2c2b2c to 4ba7198 Compare July 3, 2026 10:43
@d-cs d-cs force-pushed the runops/pr05-webapp-foundation branch from 729daf1 to bd6fc79 Compare July 3, 2026 10:44
@d-cs d-cs force-pushed the runops/pr04-store-engine branch from 4ba7198 to 0d2f934 Compare July 3, 2026 11:08
@d-cs d-cs force-pushed the runops/pr05-webapp-foundation branch from bd6fc79 to a7e0846 Compare July 3, 2026 11:08
@d-cs d-cs force-pushed the runops/pr04-store-engine branch from 0d2f934 to 8b126c6 Compare July 3, 2026 12:08
@d-cs d-cs force-pushed the runops/pr05-webapp-foundation branch from a7e0846 to 4119616 Compare July 3, 2026 12:08
@d-cs d-cs force-pushed the runops/pr04-store-engine branch from 9effd74 to d46aab2 Compare July 3, 2026 16:33
@d-cs d-cs force-pushed the runops/pr05-webapp-foundation branch from d087c25 to b554794 Compare July 3, 2026 16:33
Base automatically changed from runops/pr04-store-engine to main July 3, 2026 16:42
d-cs and others added 3 commits July 3, 2026 17:43
…ngine wiring

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ape only

Migration/drain is deferred, so residency is decided purely by id-shape
(ownerEngine): 25-char cuid -> LEGACY, 27-char ksuid -> NEW, unclassifiable
-> LEGACY. This is behavior-preserving in production, which never injected a
custom isKnownMigrated and, with no migration, always saw the default false.

- delete knownMigratedFilter.server.ts + its test
- readThrough: drop the isKnownMigrated dep + migrated short-circuit; KEEP the
  unclassifiable->LEGACY new-then-legacy fallback
- resolveInheritedMintKind: collapse to pure ownerEngine id-shape (no deps)
- mintBatchFriendlyId: drop isKnownMigrated/isSplitEnabled from ResolveDeps
- runEngineHandlersShared: drop isKnownMigrated from EventReadDeps/readRunForEvent
  (batch-write residency probe via newReplica.batchTaskRun.findFirst is untouched)
- tests: delete injected-marker cases, keep pure id-shape assertions

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
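
The id-shape rule from the commit message above (25-char cuid -> LEGACY, 27-char ksuid -> NEW, unclassifiable -> LEGACY) might look like this as a pure function. The function and type names are illustrative assumptions; only the length rule comes from the commit message, and the sketch assumes the id is passed without any friendly-id prefix.

```typescript
// Hypothetical names; only the length-to-residency mapping is from the commit message.
type IdShape = "cuid" | "ksuid" | "unclassifiable";
type Residency = "LEGACY" | "NEW";

function classifyIdShape(id: string): IdShape {
  if (id.length === 25) return "cuid";
  if (id.length === 27) return "ksuid";
  return "unclassifiable";
}

// Residency depends on id shape alone: no migration marker, no database lookup.
function residencyFromIdShape(id: string): Residency {
  // Both cuids and unclassifiable ids resolve to LEGACY.
  return classifyIdShape(id) === "ksuid" ? "NEW" : "LEGACY";
}
```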
…eration labels

Add a pure unit test for ControlPlaneCache covering per-slot round-trips,
null-vs-miss distinction, epoch-based invalidation, per-slot key isolation,
bounded eviction, and TTL expiry. Add a testcontainer test for
probeDistinctDatabases covering distinct clusters, same physical database
(with reason), same-cluster-different-database, and fail-closed probe failure.

Strip developer-enumeration labels from three existing test files (readThrough
step numbers, runEngineHandlers Test-X comments) and rename the run-detail
loader read-through test to drop the non-domain "shape 1" name.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
d-cs and others added 11 commits July 3, 2026 17:43
… deps

apps/webapp/package.json declares @internal/run-ops-database (workspace) and
@testcontainers/postgresql but the lockfile importer entry was never regenerated,
so pnpm install --frozen-lockfile fails for the webapp. Regenerate the importer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Enabling RUN_OPS_SPLIT_ENABLED without REALTIME_BACKEND_NATIVE_ENABLED
silently breaks realtime: Electric replicates only from the control-plane
DB, so NEW-resident (ksuid) runs on the dedicated run-ops DB are invisible
and every realtime subscription hangs.

Add a boot-time interlock that refuses split mode in that misconfiguration,
mirroring the existing distinct-DB data-loss sentinel. The check is a pure
predicate (assertSplitRealtimeInterlock) run synchronously inside
assertRunOpsSplitSentinel on the same eager-boot path, failing fast before
the async DB probe and before any run-ops routing is wired.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
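
A rough sketch of the interlock described above, assuming the two flags have already been parsed into booleans. `assertSplitRealtimeInterlock` is named in the commit message, but the signature, the `SplitFlags` type, the error text, and the `bootSentinelSketch` wrapper with its `probeDistinct` callback are assumptions made for illustration.

```typescript
// Assumed shape: flags already parsed from RUN_OPS_SPLIT_ENABLED and
// REALTIME_BACKEND_NATIVE_ENABLED.
type SplitFlags = {
  runOpsSplitEnabled: boolean;
  realtimeBackendNativeEnabled: boolean;
};

// Pure predicate: throws only for split ON without the native realtime backend.
function assertSplitRealtimeInterlock(flags: SplitFlags): void {
  if (flags.runOpsSplitEnabled && !flags.realtimeBackendNativeEnabled) {
    throw new Error(
      "RUN_OPS_SPLIT_ENABLED requires REALTIME_BACKEND_NATIVE_ENABLED: " +
        "runs on the run-ops DB would be invisible to realtime subscriptions"
    );
  }
}

// Hypothetical boot wrapper: the cheap flag check runs before the DB probe is awaited.
async function bootSentinelSketch(
  flags: SplitFlags,
  probeDistinct: () => Promise<boolean>
): Promise<void> {
  assertSplitRealtimeInterlock(flags);
  if (!flags.runOpsSplitEnabled) return; // nothing to probe when the split is off
  if (!(await probeDistinct())) {
    throw new Error("run-ops split requires two distinct databases");
  }
}
```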
…n diagnostics

- gate runOpsTopology splitEnabled on RUN_OPS_SPLIT_ENABLED so provisioning
  both DSNs before flipping the flag cannot open a second pool or route writes
  ahead of the distinct-DB sentinel
- rethrow the original UnclassifiableRunId in the cross-seam guard so its
  value/valueLength keep reflecting the real waitpoint id
- log run-found-but-environment-unresolved distinctly from missing-run
- correct the RUN_OPS_DATABASE_URL doc comment (Prisma datasource, not the
  webapp runtime pool)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…uth-env through the cache-first resolver

The ControlPlaneCache served env/org data with no invalidation, so admin/control-plane
writes were only reflected after the TTL. Add two invalidation scopes to the cache
(invalidateEnvironment for one env's slots; invalidateOrganization via a per-org epoch that
env/authEnv values are stamped with, so all of an org's cached rows drop with no reverse
index), expose them on the resolver, and call them at every write site that mutates
cache-served data: pause/resume, archive, env/org concurrency + burst-factor, API-key
regeneration, feature flags, API/batch rate limits, runs enable/disable, org + project
delete, and stream-basin provisioning.

Also extend the resolver's authenticated-env slot to carry `git` and make the run-engine
adapter's resolveAuthenticatedEnv delegate to the cache-first, split-aware resolver instead
of issuing its own $replica.findFirst, so it honors splitEnabled() and the cache like its
siblings while still returning `git` and the deleted-project guard.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
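
The two invalidation scopes described above can be illustrated with a small in-memory cache. This is a sketch under assumptions, not the PR's `ControlPlaneCache`: the class name, key format, slot list, and eviction policy (oldest inserted entry) are all hypothetical. It shows how stamping each entry with a per-org epoch lets one counter bump drop every cached row for an org without a reverse index, and how a cached `null` stays distinct from a miss.

```typescript
const SLOTS = ["env", "authEnv"] as const;
type Slot = (typeof SLOTS)[number];

type CacheEntry<V> = { value: V | null; orgId: string; orgEpoch: number; expiresAt: number };
type Lookup<V> = { hit: true; value: V | null } | { hit: false };

class ControlPlaneCacheSketch<V> {
  private readonly entries = new Map<string, CacheEntry<V>>();
  private readonly orgEpochs = new Map<string, number>();
  private readonly ttlMs: number;
  private readonly maxEntries: number;
  private readonly now: () => number;

  constructor(ttlMs: number, maxEntries: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.maxEntries = maxEntries;
    this.now = now;
  }

  private epochOf(orgId: string): number {
    return this.orgEpochs.get(orgId) ?? 0;
  }

  set(slot: Slot, envId: string, orgId: string, value: V | null): void {
    const key = `${slot}:${envId}`;
    if (!this.entries.has(key) && this.entries.size >= this.maxEntries) {
      // Bounded size: evict the oldest inserted key (Map keeps insertion order).
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, {
      value,
      orgId,
      orgEpoch: this.epochOf(orgId), // stamp with the org's current epoch
      expiresAt: this.now() + this.ttlMs,
    });
  }

  get(slot: Slot, envId: string): Lookup<V> {
    const key = `${slot}:${envId}`;
    const entry = this.entries.get(key);
    if (!entry) return { hit: false };
    const stale = entry.expiresAt <= this.now() || entry.orgEpoch !== this.epochOf(entry.orgId);
    if (stale) {
      this.entries.delete(key);
      return { hit: false };
    }
    return { hit: true, value: entry.value }; // value may be a cached null
  }

  // Drops every slot for one environment.
  invalidateEnvironment(envId: string): void {
    for (const slot of SLOTS) this.entries.delete(`${slot}:${envId}`);
  }

  // Bumps the org epoch; older-stamped entries become misses on next read.
  invalidateOrganization(orgId: string): void {
    this.orgEpochs.set(orgId, this.epochOf(orgId) + 1);
  }
}
```

A write site such as pause/resume would call `invalidateEnvironment(envId)` after its database write, while org-wide changes such as an org delete would call `invalidateOrganization(orgId)`.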
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… OFF

With the split OFF there is a single DB, so a run and its environment are
co-located and there is no cross-seam FK/check to replace (matches main).
Skip the always-on hot-path read in that branch; the split-ON branch is
unchanged (cache-first, throws on a genuinely missing env).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@d-cs d-cs force-pushed the runops/pr05-webapp-foundation branch from b554794 to 071cdc1 Compare July 3, 2026 16:44
@d-cs d-cs marked this pull request as ready for review July 3, 2026 16:46
@d-cs d-cs enabled auto-merge (squash) July 3, 2026 17:02
@d-cs d-cs merged commit 8465ac5 into main Jul 3, 2026
57 checks passed
@d-cs d-cs deleted the runops/pr05-webapp-foundation branch July 3, 2026 17:02
d-cs added a commit that referenced this pull request Jul 3, 2026
… routing, run lifecycle (#4118)

## What

Routes the webapp write path through the run-ops split seam:
trigger/batch minting, idempotency-key resolution, and the run-lifecycle
services now determine residency and dispatch writes to the correct
store.

- **Trigger & batch** (`runEngine/services/triggerTask.server.ts`,
`batchTrigger.server.ts`, `createBatch.server.ts`,
`streamBatchItems.server.ts`, `v3/services/batchTriggerV3.server.ts`):
mint ids with the run-ops-aware minting and route creation/streaming
through the store; batch children inherit the parent's residency.
- **Idempotency** (`runEngine/concerns/idempotencyKeys.server.ts` + new
`idempotencyResidency.server.ts`): idempotency-key lookup/dedup is
residency-aware so a keyed retrigger resolves against the store that
owns the original run.
- **Run lifecycle services** (`createCheckpoint`,
`createTaskRunAttempt`, `enqueueDelayedRun`, `expireEnqueuedRun`,
`finalizeTaskRun`, `resumeBatchRun`, `cancelDevSessionRuns`,
`executeTasksWaitingForDeploy`, `triggerFailedTask`): resolve their
target run through the store rather than a fixed client.
- **Reads that fan out from writes** (`runsRepository` +
`clickhouseRunsRepository`, `BulkActionV2` + batch read-through,
realtime `sessions`/`runReader`, alerts
`deliverAlert`/`performTaskRunAlerts`): route through the read-through
resolver.
- `9535ae63d` — resolves the parent run through an injectable run store
in `TriggerFailedTaskService`.
- `bf8f7c881` — drops the "known-migrated" concept from write-path and
read repos; residency is id-shape only.
- `515b897ea` — self-defaults `resolveWaitpointThroughReadThrough` to
the safe run-ops clients.
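
As a rough illustration of the "batch children inherit the parent's residency" rule in the Trigger & batch item above, the sketch below picks the id kind to mint for a child. Everything here is assumed for illustration: the function name, the `defaultKind` parameter standing in for whatever the cutover flag selects for root runs, and the length-based shape check (25-char cuid, 27-char ksuid).

```typescript
type MintKind = "cuid" | "ksuid";

// Hypothetical helper: a child mints the same kind of id as its parent, so it
// lands in the same store; a root run falls back to the cutover default.
function inheritedMintKind(parentId: string | undefined, defaultKind: MintKind): MintKind {
  if (parentId === undefined) return defaultKind;
  // 27-char ksuid parents are NEW-resident; cuid or unclassifiable parents are LEGACY.
  return parentId.length === 27 ? "ksuid" : "cuid";
}
```

Under these assumptions, flipping the default only affects newly created root runs; children of an existing legacy parent keep minting cuids and stay co-located with it.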

## Why

PR6 of the run-ops split stack. This is the write-path counterpart to
the read foundation in the previous PRs: with it in place, both reads
and writes route through the seam. Additive when the split is disabled
(id-shape resolution collapses to the control-plane client);
behavior-changing on the minting, idempotency, and lifecycle paths when
enabled.

## Tests

Large new/expanded vitest suite under `apps/webapp/test/` and colocated
service tests: trigger-task and batch-trigger store routing, residency
inheritance, idempotency dedup residency + legacy-authority, bulk-action
read routing, cancel-dev-session routing, alerts store routing,
runs-repository read-through, realtime session/run-reader read-through
and stream-registration routing, and the waitpoint read-through default.
Testcontainers-backed; no mocks.

## Notes

Draft, **stacked on #4117** (`runops/pr05-webapp-foundation`). Review
that first; this diff is against it.

Server-change / changeset note to be added at stack-assembly time.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Labels

None yet

Projects

None yet

2 participants