feat(run-ops): ClickHouse multi-source replication fan-in + admin ops#4119
feat(run-ops): ClickHouse multi-source replication fan-in + admin ops#4119d-cs wants to merge 7 commits into
Conversation
|
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📜 Recent review details⏰ Context from checks skipped due to timeout. (12)
|
| Area | Changes |
|---|---|
| Replication service core | Multi-source runtime, per-source transaction and acknowledgment handling, source validation, and per-source metrics/versioning |
| Global registry | New configured-sources storage and accessors |
| Instance initialization | Split-aware boot flow, dual-source rebuild, and fatal misconfiguration handling |
| Admin endpoints | New status loader plus backfill/worker selection of the active replication service |
| Docs | New changelog entry for fan-in replication |
| Tests | Unit and integration coverage for source building, split checks, leader locks, dedup, backfill versioning, and metrics |
Sequence Diagram(s)
sequenceDiagram
participant Init as initializeRunsReplicationInstance
participant Global as runsReplicationGlobal
participant Split as isSplitEnabled
participant Service as RunsReplicationService
Init->>Service: create legacy-only service
Init->>Global: setRunsReplicationGlobal(legacy)
Init->>Split: resolve split gate
Split-->>Init: splitEnabled
Init->>Init: buildReplicationSources()
Init->>Init: assertReplicationCoversSplit()
Init->>Service: recreate with dual sources (if enabled)
Init->>Global: setRunsReplicationGlobal(dual service)
Init->>Service: start()
sequenceDiagram
participant Loader as admin.api.v1.runs-replication.status loader
participant Global as getRunsReplicationConfiguredSources
participant Redis
Loader->>Global: fetch configured sources
Loader->>Redis: check leader-lock key per source
Redis-->>Loader: exists boolean
Loader-->>Loader: build JSON response with leader flags
Related PRs: None identified.
Suggested labels: area: webapp, type: feature
Suggested reviewers: None identified.
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly matches the main change: multi-source ClickHouse replication fan-in with admin operations. |
| Description check | ✅ Passed | The description covers the key What/Why/Tests points, but it omits the template's issue reference, checklist items, and screenshots section. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
✨ Finishing Touches
📝 Generate docstrings
- Create stacked PR
- Commit on current branch
🧪 Generate unit tests (beta)
- Create PR with unit tests
- Commit unit tests in branch
runops/pr07-replication
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Comment @coderabbitai help to get the list of available commands.
515b897 to
cb97148
Compare
5de29cb to
5c2d010
Compare
@trigger.dev/build
trigger.dev
@trigger.dev/core
@trigger.dev/python
@trigger.dev/react-hooks
@trigger.dev/redis-worker
@trigger.dev/rsc
@trigger.dev/schema-to-json
@trigger.dev/sdk
commit: |
c59d9c5 to
d5d7fa1
Compare
8fb6e8a to
4e08dc7
Compare
a1ff262 to
a8068e9
Compare
4e08dc7 to
11dc0b7
Compare
a8068e9 to
0ef3a6b
Compare
11dc0b7 to
a9bc9e6
Compare
0db90f0 to
d5415e8
Compare
277ecea to
2bba3b8
Compare
aa55b6b to
3153bc4
Compare
2bba3b8 to
0f1da3f
Compare
3153bc4 to
d561590
Compare
0f1da3f to
5a17a98
Compare
d561590 to
9e7c367
Compare
5a17a98 to
6b7fb6d
Compare
9e7c367 to
e23432d
Compare
6b7fb6d to
ac94ac3
Compare
e23432d to
8dff8b2
Compare
ac94ac3 to
62ce160
Compare
8dff8b2 to
891d81a
Compare
62ce160 to
2be7e51
Compare
891d81a to
5140cbc
Compare
2be7e51 to
b4bee3f
Compare
5140cbc to
f8f3096
Compare
b4bee3f to
3538e4e
Compare
f8f3096 to
4bda37a
Compare
3538e4e to
e497155
Compare
4bda37a to
ea22f52
Compare
e497155 to
ef571f9
Compare
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…comments/test names Remove the internal plan-enumeration labels from runs-replication comments and test names, keeping the behavioral descriptions intact. Comment/label hygiene only; no product logic or test behavior changed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…, fix test status type - Register the implicit single source with id "legacy" so its leader-lock key matches the id the admin status route probes; otherwise leadership always reads false in the non-split config. - Guard the shutdown-path client.stop() fan-out against re-firing per incoming transaction and add a catch so rejections don't surface as unhandled. - Use the TaskRunStatus type alias (not the const value) for status annotations in the dual-source dedup tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tion fan-in Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ef571f9 to
766138c
Compare
…fore replacing it The legacy-only instance constructed at boot opens a replication client (Redis + Redlock) eagerly; when the split gate resolves to a multi-source service it was replaced without cleanup, leaking one Redis connection per split-enabled boot. shutdown() the bootstrap instance first (idempotent, safe on the never-started instance). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
| runsReplicationService = new RunsReplicationService({ | ||
| clickhouseFactory: new TestReplicationClickhouseFactory(clickhouse), | ||
| serviceName: "runs-replication", | ||
| redisOptions, | ||
| sources: [ | ||
| { | ||
| id: "legacy", | ||
| pgConnectionUrl: legacyUrl, | ||
| slotName: "tr_legacy_v1", | ||
| publicationName: "tr_legacy_v1_pub", | ||
| originGeneration: 0, | ||
| }, | ||
| { | ||
| id: "new", | ||
| pgConnectionUrl: newUrl, | ||
| slotName: "tr_new_v1", | ||
| publicationName: "tr_new_v1_pub", | ||
| originGeneration: 1, | ||
| }, | ||
| ], | ||
| maxFlushConcurrency: 1, | ||
| flushIntervalMs: 100, | ||
| flushBatchSize: 1, | ||
| leaderLockTimeoutMs: 5000, | ||
| leaderLockExtendIntervalMs: 1000, | ||
| ackIntervalSeconds: 5, | ||
| tracer, | ||
| logLevel: "warn", | ||
| }); |
There was a problem hiding this comment.
🚩 Test files construct RunsReplicationService without required pgConnectionUrl/slotName/publicationName fields
The integration tests in runsReplicationService.part8.test.ts (lines 51-79) and runsReplicationService.part9.test.ts construct RunsReplicationService with sources but without the required pgConnectionUrl, slotName, and publicationName fields. At runtime this works because those fields are only used as fallback when sources is empty/undefined (runsReplicationService.server.ts:261-272). However, this would fail TypeScript strict type checking. Consider making these fields optional when sources is provided (e.g., via a discriminated union type), or pass dummy values in tests to satisfy the type contract.
Was this helpful? React with 👍 or 👎 to provide feedback.
What
Extends the ClickHouse runs-replication service to fan in from multiple Postgres sources (the control-plane DB and the run-ops DB) instead of a single source, plus the admin operations to run and observe it.
services/runsReplicationService.server.ts, newrunsReplicationInstance.server.ts,runsReplicationGlobal.server.ts): factors the replication service into per-source instances and a coordinator so a single ClickHouse target is fed from more than one Postgres source.routes/admin.api.v1.runs-replication.status.ts,admin.api.v1.runs-replication.backfill.ts,v3/services/adminWorker.server.ts): adds a status endpoint reporting per-source replication state and updates the backfill entrypoint for the multi-source shape.Why
PR7 of the run-ops split stack, and the final piece: once run state can live in a separate run-ops DB (earlier PRs), the analytics replication into ClickHouse has to consume both sources so runs remain queryable regardless of residency. Behavior-changing for the replication service internals; the ClickHouse-facing output is unchanged (still one runs stream), and single-source operation is preserved when the split is not enabled.
Tests
New vitest coverage:
runsReplicationInstance.test.ts(per-source instance behavior) andrunsReplicationService.part8/part9suites exercising the multi-source coordinator. Testcontainers-backed (ClickHouse + Postgres); no mocks.Notes
Draft, stacked on #4118 (
runops/pr06-write-path). Review that first; this diff is against it.Server-change / changeset note to be added at stack-assembly time.
🤖 Generated with Claude Code