Skip to content

feat(l2): @clickhouse/client byte-compat embedded façade (Layer 2)#49

Open
ShawnChen-Sirius wants to merge 3 commits into
mainfrom
feat/layer2-clickhouse-js-compat
Open

feat(l2): @clickhouse/client byte-compat embedded façade (Layer 2)#49
ShawnChen-Sirius wants to merge 3 commits into
mainfrom
feat/layer2-clickhouse-js-compat

Conversation

@ShawnChen-Sirius

Copy link
Copy Markdown
Contributor

What

Adds Layer 2: an embedded-only façade that mirrors the @clickhouse/client surface (createClient, query/command/exec/insert/ping/stream, ResultSet, the typed error hierarchy) on top of the v3 Layer 1 connection. The goal is byte-compatible behavior so code written against clickhouse-js runs unchanged against an in-process chDB.

Shape

  • src/layer2/*.ts (compiled to dist/layer2) — a thin forwarder to Layer 1.
  • chdb://memory clients share one process-wide reference-counted Session; chdb:///path clients share one per absolute path. Opening a different on-disk path while one is live surfaces ChdbConnectionError (single active connection per process).
  • query() is eager-buffered via Layer 1 queryAsync so errors surface at await query() (byte-compat); stream() replays the buffer through a newline transform.
  • Int64/UInt64 → string in JSON-family output (injects output_format_json_quote_64bit_integers=1) to match server/clickhouse-js and avoid silent JSON.parse precision loss.
  • Typed error hierarchy re-exported from the package root so instanceof holds against the same module instance the runtime throws from.

Tests

  • New suites under test/v3/layer2/* (formats / url / sql_guard / result_set / client / config / errors / adversarial / compat-types).
  • Backend-swappable upstream/conformance.test.ts (embedded by default, or CHDB_UPSTREAM_BACKEND=server) asserting true ClickHouse semantics.
  • Opt-in parity.test.ts (CHDB_PARITY_URL) + a non-gating clickhouse-js literal-suite import-rewrite harness, with a layer2.yml CI workflow.

Full v3 suite: 257 passing, 16 skipped (opt-in server parity), 0 failing. Typecheck clean.

Rebase note

Rebased onto current main (now carries the global session-cleanup safety net). That safety net exposed two harness issues, fixed in the last commit:

  1. vitest evaluated the CJS entrypoint twice (transformed import vs the require('../../index.js') the compiled Layer 2 uses to reach Layer 1), so the cleanup hook drained a different instance than Layer 2's sessions — a timed-out heavy query then leaked its connection and cascaded "only one active data directory" failures. Fixed by externalizing the entrypoint so every importer shares one Node-cached instance.
  2. The conformance test held a beforeAll-shared client that the now-effective cleanup tears down after the first case; switched to a per-test client (behavior-preserving — each case is self-contained).

Honest gaps (follow-ups, not this PR)

  • True zero-copy streaming (currently buffers the whole result).
  • The clickhouse-js literal-jest harness is non-gating / server-coupled; the verified gate is upstream/conformance.test.ts.
  • Bun/Deno runtime matrix runs in CI only.
  • Parity server pinned to CH 24.8 — bump to track chdb-core's underlying CH version.

🤖 Generated with Claude Code

ShawnChen-Sirius and others added 2 commits June 18, 2026 14:13
Layer 2 mirrors the @clickhouse/client surface (createClient / query / command /
exec / insert / ping / stream, ResultSet, the typed error hierarchy) over an
in-process chDB connection, so code written against clickhouse-js runs unchanged
in embedded mode. chdb://memory clients share one reference-counted Session;
query() is eager-buffered so errors surface at await; JSON-family output injects
output_format_json_quote_64bit_integers=1 to keep 64-bit ints lossless, matching
clickhouse-js's own default.

Inserts and query parameters are encoded exactly as clickhouse-js does, decoded
by the engine rather than re-implemented:

- insert serializes row arrays to a FORMAT-tailed dataset (JSONEachRow for object
  rows, JSONCompactEachRow for positional arrays, inferred when omitted) instead
  of a SQL VALUES literal, so arrays / maps / tuples / Nested / JSON round-trip;
  an explicit null is ClickHouse NULL, only an undefined cell is rejected.
- a clickhouse-js-faithful query-parameter formatter (src/layer2/params.ts):
  top-level strings unquoted and null as the TSV token, but inside Array/Tuple/Map
  strings are quoted, null is the NULL keyword and booleans are TRUE/FALSE; Date is
  a Unix timestamp; tuples use a TupleParam wrapper (exported here, and a
  clickhouse-js TupleParam is accepted structurally for migration). The formatted
  map is bound verbatim via a preformatted option on Layer 1's queryBindAsync,
  leaving Layer 1's own serializer untouched.

Parameter handling is type-aware via the engine, not guarded client-side: an
out-of-range Int64 is a typed error while the same value binds fine to a Float64.

Includes the Layer 2 unit/behavior tests and a vitest config fix that externalizes
the CJS entrypoint so the suite's session-cleanup safety net and the tests share a
single module instance.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s's own suite

Two harnesses turn the byte-compat claim into something checked, not asserted.

Parity (test/v3/layer2/upstream/conformance.test.ts + parity.test.ts) runs the
same queries through embedded chDB and a real clickhouse-server and compares the
decoded output. The server image tracks chDB's kernel (ClickHouse 26.5); both
clients set output_format_json_quote_64bit_integers=1 (clickhouse-js's own suite
default) so 64-bit ints are lossless on both sides. Clients are created per test
so the global session-cleanup afterEach cannot tear down a shared connection.

Upstream-literal (scripts/upstream-suite) runs clickhouse-js's OWN integration
suite against embedded chDB: it clones clickhouse-js at the installed client's
version, redirects the suite's client factory to chdb://memory, and runs the
specs on vitest serially. The previous harness ran jest, but clickhouse-js
migrated to vitest — so it had been parsing zero specs and silently passing.
skip-list.json drops whole files for capabilities embedded lacks (HTTP transport,
sockets, compression, RBAC, server runtime, cluster, TLS); expectations.patch
marks the per-case embedded-vs-server divergences it.fails (documented in the
suite README) so they must keep failing without hiding a regression.

Both jobs gate CI (continue-on-error removed). Against clickhouse-js 1.20.0 on
embedded chDB: 175 passing, 38 expected-fail, 12 skipped, 0 unexpected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ShawnChen-Sirius ShawnChen-Sirius force-pushed the feat/layer2-clickhouse-js-compat branch from 664de56 to d0487c1 Compare June 18, 2026 02:16
Two independent failures on the PR layer2 workflow:

1. parity (22/23 red) — the clickhouse/clickhouse-server:26.5 image's
   entrypoint writes a users.d snippet requiring a password for `default`
   when CLICKHOUSE_PASSWORD is unset, so every parity query returned
   code 194 REQUIRED_PASSWORD while embedded chDB returned the correct
   semantic code (e.g. 60 UNKNOWN_TABLE). Locally the worklog's
   `clickhousectl local install 26.5` doesn't add that snippet, hence the
   local 23/23 green. Set CLICKHOUSE_SKIP_USER_SETUP=1 on the service so
   the image keeps the upstream password-less default — the service is
   sealed inside the GH Actions network, no real auth needed.

2. upstream-literal (6 "Expect test to fail") — 4 DateTime/Date cases
   and 2 others (floating point types, JS Date objects) were marked
   `it.fails` in expectations.patch but actually pass in CI: GitHub
   runners run UTC, which eliminates the DateTime session-timezone
   divergence the worklog deferred as Stage B. The gate is working as
   designed (an unexpected pass is a hard failure), and these cases have
   simply graduated from "known divergence" to "byte-compat under UTC".
   Drop the 6 hunks from expectations.patch and pin TZ=UTC on the
   upstream step so the contract reproduces off CI. Remaining 32 markers
   stay; categories 4 and 6 in the README narrowed accordingly.

The macos-15-intel/20.x check is unrelated (a native chDB abort-recovery
crash on that one matrix cell after the abort-storm stress test); a
re-run should clear it and the underlying robustness issue belongs to
chdb-core, not this PR.

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
@ShawnChen-Sirius ShawnChen-Sirius force-pushed the feat/layer2-clickhouse-js-compat branch from f857f98 to fe175e7 Compare June 18, 2026 02:35
@ShawnChen-Sirius

Copy link
Copy Markdown
Contributor Author

@chibugai, review this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant