feat(clickhouse): infer mixed-type JSON arrays as Array(Dynamic) on insert #4095
Set input_format_json_infer_array_of_dynamic_from_array_of_different_types=1
on all native-JSON inserts (task_runs_v2, task_events_v1/v2, metrics_v1,
sessions_v1). Mixed-type arrays (e.g. [{...}, "str"]) are now inferred as
Array(Dynamic) rather than deeply nested Tuple types, which avoids hitting
the binary type-complexity limit during background merges.
Making the setting explicit at insert time keeps the behavior deterministic
and version-controlled, independent of the server profile / compatibility
version.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AaChyhestFMBYBWh6bgcCF
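A minimal sketch of what "explicit at insert time" can look like, assuming settings are passed per call as a plain key/value object. The helper name `withArrayOfDynamicInference` and the `ClickHouseSettings` type are hypothetical and are not taken from this PR; only the setting name and value come from the commit message above.

```typescript
// Hypothetical type: a plain bag of ClickHouse settings for one insert call.
type ClickHouseSettings = Record<string, string | number | boolean>;

// Setting name and value as stated in the commit message.
const ARRAY_OF_DYNAMIC_INFERENCE: ClickHouseSettings = {
  input_format_json_infer_array_of_dynamic_from_array_of_different_types: 1,
};

// Hypothetical helper: merges caller-supplied settings with the inference
// setting. The inference setting is spread last, so a caller cannot switch
// it off by accident.
function withArrayOfDynamicInference(
  settings: ClickHouseSettings = {}
): ClickHouseSettings {
  return { ...settings, ...ARRAY_OF_DYNAMIC_INFERENCE };
}

// Example: settings for an async insert path.
const asyncInsertSettings = withArrayOfDynamicInference({
  async_insert: 1,
  wait_for_async_insert: 0,
});
```

With the official Node.js client, an object like this would typically be passed as `clickhouse_settings` on the insert call; how the repository's own insert wrappers forward it is not shown on this page.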
🧹 Nitpick comments (1)
internal-packages/clickhouse/src/taskRuns.ts (1)
206-218: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
Consider a regression test for mixed-type array inference.
The existing integration test in taskRuns.test.ts inserts output/error/array fields with homogeneous element types, so it wouldn't catch a regression if this setting were dropped or misapplied. Given the PR's motivation is specifically avoiding type-complexity merge failures on mixed-type JSON arrays, a test inserting a mixed-type array (e.g., [1, "hello", [1,2,3]]) and asserting successful insert/read would directly validate the fix.
Also applies to: 339-358
📒 Files selected for processing (5)
- .server-changes/clickhouse-array-of-dynamic-inference.md
- internal-packages/clickhouse/src/metrics.ts
- internal-packages/clickhouse/src/sessions.ts
- internal-packages/clickhouse/src/taskEvents.ts
- internal-packages/clickhouse/src/taskRuns.ts
🔇 Additional comments (5)
internal-packages/clickhouse/src/metrics.ts (1)
18-30: LGTM!
internal-packages/clickhouse/src/sessions.ts (1)
113-125: LGTM! Also applies to: 127-139
internal-packages/clickhouse/src/taskEvents.ts (1)
27-40: LGTM! Also applies to: 203-216
internal-packages/clickhouse/src/taskRuns.ts (1)
206-218: LGTM! Also applies to: 221-233, 339-358, 361-377
.server-changes/clickhouse-array-of-dynamic-inference.md (1)
1-7: LGTM!
Inserts a task run whose output.data contains an array of mixed element
types ([1, "hello", {...}, [1,2,3]]) and asserts it inserts and round-trips
through the materialized output_text column. Guards the
input_format_json_infer_array_of_dynamic_from_array_of_different_types
setting against regression.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AaChyhestFMBYBWh6bgcCF
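A rough sketch of the kind of fixture that commit message describes, not the actual test from the PR. The object element `{ key: "value" }` is a placeholder for the elided `{...}`, and `jsonKind` / `isMixedTypeArray` are hypothetical helpers used only to show what "mixed element types" means. The real test inserts into ClickHouse and reads back the materialized output_text column, which this sketch does not attempt.

```typescript
type JsonValue =
  | null
  | boolean
  | number
  | string
  | JsonValue[]
  | { [key: string]: JsonValue };

// Classify a JSON value by its JSON kind (arrays and null are split out
// because typeof reports both as "object").
function jsonKind(value: JsonValue): string {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  return typeof value;
}

// An array is "mixed-type" when its elements span more than one JSON kind.
function isMixedTypeArray(values: JsonValue[]): boolean {
  return new Set(values.map(jsonKind)).size > 1;
}

// Placeholder payload shaped like the one in the commit message:
// a number, a string, an object, and a nested array.
const mixedOutput: JsonValue = {
  data: [1, "hello", { key: "value" }, [1, 2, 3]],
};

// Local stand-in for a round trip: serialize and parse the payload again.
const roundTripped = JSON.parse(JSON.stringify(mixedOutput)) as {
  data: JsonValue[];
};
```

A homogeneous array such as `[1, 2, 3]` would not exercise the setting, which is the gap the review nitpick pointed out in the earlier integration test.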
✅ Checklist
Testing
pnpm run typecheck --filter @internal/clickhouse passes. This only adds a ClickHouse input-format setting to existing insert calls; the setting affects type inference for newly-inserted/merged data and is non-destructive to existing rows.
Changelog
Sets input_format_json_infer_array_of_dynamic_from_array_of_different_types = 1 on every native-JSON insert path:
- task_runs_v2 (output, error): insertTaskRuns, insertTaskRunsCompactArrays, and the async-insert variants
- task_events_v1 / task_events_v2 (attributes)
- metrics_v1
- sessions_v1
Why
Our JSON columns contain arrays with mixed element types (e.g. [{"key":"value"}, "string", "string"]). With this setting off, which is the effective default under 24.12 compatibility, ClickHouse infers those as deeply nested unnamed Tuple(JSON, Nullable(String), …) types. ClickHouse 26.2 introduced input_format_binary_max_type_complexity (default 1000), and those tuple type trees exceeded the limit, causing background merges to fail with Code 117.
With the setting on (the default since 25.8), mixed-type arrays are inferred as a single Array(Dynamic), a simpler, flatter type representation that never approaches the complexity limit, even once the upstream default limit is restored.
Setting this explicitly at insert time keeps behavior deterministic and version-controlled, so it does not depend on the server profile or a future compatibility bump. This is a forward-only change: it only affects newly inserted/merged data and does not rewrite existing parts. Our read path re-serializes these columns to strings (toJSONString via the materialized *_text columns), so the internal Tuple → Array(Dynamic) representation change is transparent to the application.
Companion server-side setting
To also apply this on the ClickHouse side (covers merges and any writes not going through these code paths), set it on the default user:
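The snippet that followed this sentence is not preserved on this page. As a hedged sketch only, one way to express it is an `ALTER USER ... SETTINGS` statement built in TypeScript; `buildAlterUserSetting` is a hypothetical helper, and this assumes the default user is managed through SQL access control. If the user is defined in the server's configuration files instead, the setting would have to go into the corresponding settings profile.

```typescript
// Hypothetical helper: builds an ALTER USER statement that pins one setting.
// Not taken from the PR; the original snippet is not reproduced here.
function buildAlterUserSetting(
  user: string,
  setting: string,
  value: number
): string {
  // Accept only plain identifiers so the statement cannot be altered by input.
  const identifier = /^[A-Za-z_][A-Za-z0-9_]*$/;
  if (!identifier.test(user) || !identifier.test(setting)) {
    throw new Error("user and setting must be plain identifiers");
  }
  return `ALTER USER ${user} SETTINGS ${setting} = ${value}`;
}

const statement = buildAlterUserSetting(
  "default",
  "input_format_json_infer_array_of_dynamic_from_array_of_different_types",
  1
);
```

The resulting string would be run once by an administrator, for example through clickhouse-client.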
Screenshots
N/A
💯
🤖 Generated with Claude Code
https://claude.ai/code/session_01AaChyhestFMBYBWh6bgcCF