Commit b0cf092

[SEA-NodeJS] Kernel backend: mTLS, custom HTTP headers & User-Agent
Wire the SEA/kernel path's remaining TLS-adjacent connection options through to the napi binding:

- mTLS client identity: `clientCertPem` / `clientKeyPem` (PEM string or Buffer) on the internal SEA options, normalised to Buffers and routed to the kernel `TlsConfig::client_cert_pem` / `client_key_pem`. Enforces both-or-neither up front with an actionable error.
- Custom HTTP headers + User-Agent: mirrors the Python connector's `use_kernel` path (`session.py` + `backend/kernel/client.py`). Headers cross the FFI as an ordered list (`Array<{name,value}>`, the napi `HeaderEntry` shape matching the kernel core `Vec<(String,String)>` and Python's `List[Tuple]`): the caller's `customHeaders` first, then the connector's composed `User-Agent` appended last. The connector UA is always emitted (via the same `buildUserAgentString` the Thrift path uses, with `userAgentEntry` folded in) and, being last, is authoritative: the kernel folds the last `User-Agent` into its base `DatabricksJDBCDriverOSS/...` UA, preserving the result-disposition gating token. The kernel-managed reserved names `Authorization` / `x-databricks-org-id` are dropped before the FFI hop, matching Python's `_KERNEL_MANAGED_HEADERS` double-wall.

Adds `buildSeaHttpOptions`, extends `buildSeaTlsOptions`/`SeaTlsOptions`, and factors PEM normalisation into a shared helper. Bumps KERNEL_REV and regenerates `native/sea/index.d.ts` for the new napi fields.

Unit tests cover mTLS pairing/validation, ordered header pass-through, reserved-name dropping, and User-Agent composition/ordering.

Depends on the kernel napi change exposing clientCertPem / clientKeyPem / customHeaders; KERNEL_REV must be repointed to that commit once merged.

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
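Reviewer note: the header ordering described in the commit message (caller headers first, reserved names dropped, connector `User-Agent` last) can be illustrated with a small standalone sketch. This is not the committed implementation. `toOrderedHeaders`, `composeUserAgent`, and the `ExampleConnector/0.0.0` string are hypothetical names introduced here for illustration; the real User-Agent comes from `buildUserAgentString`, whose output format is not shown in this commit.

```typescript
// Standalone sketch of the ordering rules; not the connector's code.
type HeaderEntry = { name: string; value: string };

// Names the kernel manages itself, compared case-insensitively.
const RESERVED = new Set(['authorization', 'x-databricks-org-id']);

// Hypothetical stand-in for `buildUserAgentString`; the format is invented.
function composeUserAgent(entry?: string): string {
  const base = 'ExampleConnector/0.0.0';
  return entry ? `${base} (${entry})` : base;
}

function toOrderedHeaders(custom: Record<string, string> | undefined, userAgentEntry?: string): HeaderEntry[] {
  const out: HeaderEntry[] = [];
  for (const [name, value] of Object.entries(custom ?? {})) {
    // Reserved names never reach the FFI boundary.
    if (RESERVED.has(name.toLowerCase())) continue;
    out.push({ name, value });
  }
  // The connector User-Agent is always appended last, so a consumer that
  // keeps the last `User-Agent` entry picks the connector's value.
  out.push({ name: 'User-Agent', value: composeUserAgent(userAgentEntry) });
  return out;
}

const headers = toOrderedHeaders(
  { 'X-Trace': 'abc', Authorization: 'Bearer ignored', 'User-Agent': 'caller/1.0' },
  'my-app',
);
```

In this sketch a caller-supplied `User-Agent` is forwarded rather than deduplicated, which matches the behaviour the commit message attributes to the Python connector.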
1 parent 4b9e16e commit b0cf092

5 files changed

Lines changed: 387 additions & 35 deletions

KERNEL_REV

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-8bedaabf69f5bce5a957a8775f29dbb8dbdd2e71
+6eda90b6922ee82923f0f590b2e1bdb686007129

lib/contracts/InternalConnectionOptions.ts

Lines changed: 18 additions & 0 deletions
@@ -41,4 +41,22 @@ export interface InternalConnectionOptions {
    * @internal SEA path only.
    */
   customCaCert?: Buffer | string;
+
+  /**
+   * SEA-only: PEM-encoded client certificate (string or `Buffer`) for
+   * mutual TLS (mTLS). Must be supplied together with `clientKeyPem`; a
+   * leaf cert optionally followed by its intermediate chain is accepted.
+   * Mirrors the Python connector's `_tls_client_cert_file`.
+   * @internal SEA path only.
+   */
+  clientCertPem?: Buffer | string;
+
+  /**
+   * SEA-only: PEM-encoded private key (string or `Buffer`) for the mTLS
+   * client certificate. Must be supplied together with `clientCertPem`.
+   * For portability supply a PKCS#8 key (`BEGIN PRIVATE KEY`). Mirrors the
+   * Python connector's `_tls_client_cert_key_file`.
+   * @internal SEA path only.
+   */
+  clientKeyPem?: Buffer | string;
 }
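Reviewer note: the `clientKeyPem` doc comment recommends a PKCS#8 key for portability. A caller who wants to check which encoding a key file uses before connecting could inspect the PEM label. The sketch below assumes only the three labels this commit names (PKCS#8 `PRIVATE KEY`, PKCS#1 `RSA PRIVATE KEY`, SEC1 `EC PRIVATE KEY`); `pemKeyFormat` is a hypothetical helper and is not part of the connector.

```typescript
// Hypothetical pre-flight helper: classify a PEM private key by its label.
type KeyFormat = 'PKCS#8' | 'PKCS#1' | 'SEC1' | 'unknown';

function pemKeyFormat(pem: string): KeyFormat {
  // Capture whatever sits between "BEGIN " and "PRIVATE KEY".
  const match = /-----BEGIN ([A-Z0-9 ]*)PRIVATE KEY-----/.exec(pem);
  if (!match) return 'unknown';
  switch ((match[1] ?? '').trim()) {
    case '':
      return 'PKCS#8'; // -----BEGIN PRIVATE KEY-----
    case 'RSA':
      return 'PKCS#1'; // -----BEGIN RSA PRIVATE KEY-----
    case 'EC':
      return 'SEC1'; // -----BEGIN EC PRIVATE KEY-----
    default:
      return 'unknown'; // any other label is outside this sketch
  }
}

const pkcs8Sample = '-----BEGIN PRIVATE KEY-----\nAAAA\n-----END PRIVATE KEY-----\n';
const rsaSample = '-----BEGIN RSA PRIVATE KEY-----\nAAAA\n-----END RSA PRIVATE KEY-----\n';
const ecSample = '-----BEGIN EC PRIVATE KEY-----\nAAAA\n-----END EC PRIVATE KEY-----\n';
```

The helper only reads the label; it does not validate the key bytes, which this commit leaves to the kernel.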

lib/sea/SeaAuth.ts

Lines changed: 168 additions & 33 deletions
@@ -16,6 +16,7 @@ import { ConnectionOptions } from '../contracts/IDBSQLClient';
 import { InternalConnectionOptions } from '../contracts/InternalConnectionOptions';
 import AuthenticationError from '../errors/AuthenticationError';
 import HiveDriverError from '../errors/HiveDriverError';
+import { buildUserAgentString } from '../utils';
 
 /**
  * Default local listener port for the U2M authorization-code callback.
@@ -115,10 +116,44 @@ export interface SeaTlsOptions {
   checkServerCertificate?: boolean;
   /** PEM-encoded CA bytes to add to the trust store. */
   customCaCert?: Buffer;
+  /**
+   * PEM-encoded client certificate for mutual TLS (kernel
+   * `TlsConfig::client_cert_pem`). Paired with {@link clientKeyPem} —
+   * `buildSeaTlsOptions` rejects supplying only one before the FFI hop.
+   * The napi shape takes a `Buffer`; the public surface also accepts a
+   * PEM string, normalised here.
+   */
+  clientCertPem?: Buffer;
+  /**
+   * PEM-encoded private key for the mTLS client certificate (kernel
+   * `TlsConfig::client_key_pem`). Paired with {@link clientCertPem}.
+   */
+  clientKeyPem?: Buffer;
+}
+
+/**
+ * HTTP options shared across all auth-mode variants. Mirrors the napi
+ * binding's `ConnectionOptions.customHeaders` (kernel
+ * `HttpConfig::custom_headers`).
+ *
+ * Carries the extra request headers the SEA path sends on every request:
+ * the caller's `customHeaders` plus the composed `User-Agent` (the kernel
+ * appends a `User-Agent` entry to its base UA rather than replacing it).
+ *
+ * An **ordered list** of `{ name, value }` pairs — the napi shape
+ * (`Array<HeaderEntry>`), which mirrors the kernel core's
+ * `Vec<(String, String)>` and the Python connector's `http_headers`
+ * `List[Tuple[str, str]]`. Order is preserved and duplicate names are
+ * allowed (e.g. a caller `User-Agent` followed by the connector's, which
+ * the kernel folds last-wins).
+ */
+export interface SeaHttpOptions {
+  customHeaders?: Array<{ name: string; value: string }>;
 }
 
 export type SeaNativeConnectionOptions = SeaSessionDefaults &
   SeaTlsOptions &
+  SeaHttpOptions &
   (
     | {
         hostName: string;
@@ -168,24 +203,71 @@ export function isBlankOrReserved(s: string): boolean {
 const MAX_U32 = 0xffffffff;
 
 /**
- * Normalise the public TLS options (`checkServerCertificate` /
- * `customCaCert`) into the napi shape.
+ * Normalise a PEM input (`string` or `Buffer`) accepted on the public
+ * surface into the `Buffer` the napi shape requires. Does a light,
+ * ordered BEGIN…END sanity check so a truncated/headerless blob (or a
+ * stray page that merely contains the literals out of order, e.g. a
+ * proxy-intercept page) is rejected here rather than surfacing as an
+ * opaque kernel TLS error. The bytes are NOT fully parsed in JS — that
+ * is deferred to the kernel, which returns a meaningful error on a
+ * malformed PEM/key.
+ *
+ * `kind` selects the expected block: `'certificate'` matches a
+ * `CERTIFICATE` block; `'private key'` matches any `… PRIVATE KEY` block
+ * (PKCS#8 `PRIVATE KEY`, PKCS#1 `RSA PRIVATE KEY`, SEC1 `EC PRIVATE KEY`).
+ *
+ * Throws `HiveDriverError` when the value is empty or (for strings)
+ * lacks the expected PEM header.
+ */
+function normalizePemBytes(value: Buffer | string, optionName: string, kind: 'certificate' | 'private key'): Buffer {
+  if (typeof value === 'string') {
+    const re =
+      kind === 'certificate'
+        ? /-----BEGIN CERTIFICATE-----[\s\S]+?-----END CERTIFICATE-----/
+        : /-----BEGIN [A-Z0-9 ]*PRIVATE KEY-----[\s\S]+?-----END [A-Z0-9 ]*PRIVATE KEY-----/;
+    if (!re.test(value)) {
+      const expected =
+        kind === 'certificate'
+          ? "a '-----BEGIN CERTIFICATE-----' … '-----END CERTIFICATE-----' block"
+          : "a 'BEGIN … PRIVATE KEY' / 'END … PRIVATE KEY' PEM block (PKCS#8, PKCS#1, or SEC1)";
+      throw new HiveDriverError(
+        `SEA backend: \`${optionName}\` string does not look like a PEM ${kind} (expected ${expected}). ` +
+          'Pass PEM text or a Buffer of PEM bytes.',
+      );
+    }
+    return Buffer.from(value, 'utf8');
+  }
+  if (Buffer.isBuffer(value)) {
+    if (value.length === 0) {
+      throw new HiveDriverError(`SEA backend: \`${optionName}\` Buffer is empty.`);
+    }
+    return value;
+  }
+  throw new HiveDriverError(`SEA backend: \`${optionName}\` must be a PEM string or a Buffer.`);
+}
+
+/**
+ * Normalise the public TLS options into the napi shape.
  *
  * - `checkServerCertificate` passes through verbatim (only when set; an
  *   absent value leaves the kernel default, which is secure — verify on).
- * - `customCaCert` accepts a PEM string or `Buffer` on the public
- *   surface; we convert a string to a `Buffer` here and do a light PEM
- *   sanity check. The bytes are NOT parsed in JS — the kernel returns a
- *   meaningful error if the PEM is malformed.
+ * - `customCaCert` accepts a PEM string or `Buffer`; normalised to a
+ *   `Buffer` via {@link normalizePemBytes}.
+ * - `clientCertPem` / `clientKeyPem` carry the mutual-TLS client identity.
+ *   They must be supplied **together** — supplying only one is rejected
+ *   here with an actionable error (rather than waiting for the kernel's
+ *   `InvalidArgument` at `openSession`). Each accepts a PEM string or
+ *   `Buffer`, normalised the same way.
  *
- * Throws `HiveDriverError` when `customCaCert` is supplied but empty or
- * (for strings) lacks a PEM certificate header.
+ * Throws `HiveDriverError` when a cert/key is empty, mis-typed, lacks the
+ * expected PEM header, or when only one half of the mTLS pair is set.
  */
 export function buildSeaTlsOptions(options: ConnectionOptions): SeaTlsOptions {
   // Read the SEA-only fields through the purpose-built internal options type
   // rather than an ad-hoc inline cast, so the shape can't silently drift from
   // its declaration and a typo'd key fails to compile.
-  const { checkServerCertificate, customCaCert } = options as ConnectionOptions & InternalConnectionOptions;
+  const { checkServerCertificate, customCaCert, clientCertPem, clientKeyPem } = options as ConnectionOptions &
+    InternalConnectionOptions;
 
   const tls: SeaTlsOptions = {};
 
@@ -194,31 +276,80 @@ export function buildSeaTlsOptions(options: ConnectionOptions): SeaTlsOptions {
   }
 
   if (customCaCert !== undefined) {
-    if (typeof customCaCert === 'string') {
-      // Light PEM sanity check — require a well-ordered BEGIN…END block so a
-      // truncated/headerless cert (or a stray page that merely contains both
-      // literals out of order, e.g. a proxy-intercept page) is rejected here
-      // rather than surfacing as an opaque kernel TLS error. Ordered match, not
-      // two independent substring checks. Full parsing is deferred to the kernel.
-      if (!/-----BEGIN CERTIFICATE-----[\s\S]+?-----END CERTIFICATE-----/.test(customCaCert)) {
-        throw new HiveDriverError(
-          'SEA backend: `customCaCert` string does not look like a PEM certificate ' +
-            "(expected a '-----BEGIN CERTIFICATE-----' … '-----END CERTIFICATE-----' block). " +
-            'Pass PEM text or a Buffer of PEM bytes.',
-        );
-      }
-      tls.customCaCert = Buffer.from(customCaCert, 'utf8');
-    } else if (Buffer.isBuffer(customCaCert)) {
-      if (customCaCert.length === 0) {
-        throw new HiveDriverError('SEA backend: `customCaCert` Buffer is empty.');
+    tls.customCaCert = normalizePemBytes(customCaCert, 'customCaCert', 'certificate');
+  }
+
+  // mTLS client identity. Enforce both-or-neither up front so a caller who
+  // sets only one gets a clear message naming the missing half, instead of
+  // the kernel's generic `InvalidArgument` after the FFI hop.
+  const hasCert = clientCertPem !== undefined;
+  const hasKey = clientKeyPem !== undefined;
+  if (hasCert !== hasKey) {
+    throw new HiveDriverError(
+      'SEA backend: mutual TLS requires both `clientCertPem` and `clientKeyPem`; only ' +
+        `\`${hasCert ? 'clientCertPem' : 'clientKeyPem'}\` was supplied. ` +
+        `Provide the matching ${hasCert ? 'private key (`clientKeyPem`)' : 'certificate (`clientCertPem`)'}, ` +
+        'or omit both.',
+    );
+  }
+  if (hasCert && hasKey) {
+    tls.clientCertPem = normalizePemBytes(clientCertPem as Buffer | string, 'clientCertPem', 'certificate');
+    tls.clientKeyPem = normalizePemBytes(clientKeyPem as Buffer | string, 'clientKeyPem', 'private key');
+  }
+
+  return tls;
+}
+
+/**
+ * Build the napi HTTP options (`customHeaders`) from the public
+ * `customHeaders` map and `userAgentEntry`.
+ *
+ * Mirrors the Python connector's `use_kernel` path (`session.py` +
+ * `backend/kernel/client.py`), which:
+ *   1. composes a single connector `User-Agent` and **unconditionally**
+ *      appends it last —
+ *      `all_headers = (http_headers or []) + [("User-Agent", useragent_header)]`;
+ *   2. before forwarding to the kernel, **drops** the kernel-managed
+ *      reserved names `Authorization` / `x-databricks-org-id`
+ *      (case-insensitive) — the kernel applies the auth token itself and
+ *      re-derives the org id from the `?o=` in the http path, and would
+ *      otherwise skip-and-warn on every request.
+ *
+ * The result is an ordered list (the napi `Array<HeaderEntry>` shape,
+ * matching the kernel core `Vec<(String, String)>`): the caller's
+ * `customHeaders` first (minus reserved names), then the connector's
+ * `User-Agent` last. The connector UA is always present and, being last,
+ * is authoritative (the kernel folds the last `User-Agent` into its base
+ * UA — `DatabricksJDBCDriverOSS/...` — preserving the result-disposition
+ * gating token). The value is composed via the same `buildUserAgentString`
+ * the Thrift path uses, so the SEA UA carries the identical
+ * `NodejsDatabricksSqlConnector/...` identity (with `userAgentEntry`
+ * folded in). A caller `User-Agent` in `customHeaders` is forwarded too
+ * (mirroring Python, which doesn't dedupe it); the kernel's last-wins fold
+ * means the connector UA still wins.
+ */
+const KERNEL_MANAGED_HEADERS = new Set(['authorization', 'x-databricks-org-id']);
+
+export function buildSeaHttpOptions(options: ConnectionOptions): SeaHttpOptions {
+  const { customHeaders, userAgentEntry } = options;
+
+  const headers: Array<{ name: string; value: string }> = [];
+  if (customHeaders) {
+    for (const [name, value] of Object.entries(customHeaders)) {
+      // Drop kernel-managed reserved names before the FFI hop — same
+      // double-wall as the Python connector's `_KERNEL_MANAGED_HEADERS`.
+      if (KERNEL_MANAGED_HEADERS.has(name.toLowerCase())) {
+        continue;
       }
-      tls.customCaCert = customCaCert;
-    } else {
-      throw new HiveDriverError('SEA backend: `customCaCert` must be a PEM string or a Buffer.');
+      headers.push({ name, value });
     }
   }
 
-  return tls;
+  // Always append the connector's composed User-Agent last — exactly the
+  // Python connector's unconditional `base_headers` append.
+  headers.push({ name: 'User-Agent', value: buildUserAgentString(userAgentEntry) });
+
+  return { customHeaders: headers };
 }
 
 /**
@@ -282,7 +413,8 @@ export function buildSeaConnectionOptions(options: ConnectionOptions): SeaNative
     httpPath: string;
     intervalsAsString: boolean;
     maxConnections?: number;
-  } & SeaTlsOptions = {
+  } & SeaTlsOptions &
+    SeaHttpOptions = {
     hostName: options.host,
     httpPath: prependSlash(options.path),
     // Match the NodeJS Thrift driver, which surfaces INTERVAL columns as
@@ -292,9 +424,12 @@ export function buildSeaConnectionOptions(options: ConnectionOptions): SeaNative
     // (native Arrow) — they already decode identically to Thrift via the
     // shared Arrow converter, so `complexTypesAsJson` is not forced on.
     intervalsAsString: true,
-    // TLS knobs (server-cert verification toggle + custom CA). Validated and
-    // normalised (string PEM → Buffer) here so the napi shape only sees a Buffer.
+    // TLS knobs (server-cert verification toggle + custom CA + mTLS client
+    // identity). Validated and normalised (string PEM → Buffer) here so the
+    // napi shape only sees a Buffer.
     ...buildSeaTlsOptions(options),
+    // HTTP headers (caller `customHeaders` + composed `User-Agent`).
+    ...buildSeaHttpOptions(options),
   };
 
   // SEA-only pool sizing; read via cast to match how this function reads the
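
Reviewer note: the PEM sanity check in this file relies on an ordered BEGIN…END match rather than two independent substring checks. The standalone sketch below contrasts the two approaches using the certificate pattern from the diff. `looseCheck` and the sample strings are made up for illustration and do not appear in the connector.

```typescript
// Same ordered pattern the diff uses for certificates.
const ORDERED = /-----BEGIN CERTIFICATE-----[\s\S]+?-----END CERTIFICATE-----/;

// The weaker alternative: both literals present, in any order.
function looseCheck(text: string): boolean {
  return text.indexOf('-----BEGIN CERTIFICATE-----') !== -1 && text.indexOf('-----END CERTIFICATE-----') !== -1;
}

// Placeholder inputs; the body is not a real certificate.
const wellFormed = '-----BEGIN CERTIFICATE-----\nMIIB\n-----END CERTIFICATE-----\n';
const outOfOrder = 'footer -----END CERTIFICATE----- header -----BEGIN CERTIFICATE-----';
const truncated = '-----BEGIN CERTIFICATE-----\nMIIB';

const results = {
  wellFormed: { ordered: ORDERED.test(wellFormed), loose: looseCheck(wellFormed) },
  outOfOrder: { ordered: ORDERED.test(outOfOrder), loose: looseCheck(outOfOrder) },
  truncated: { ordered: ORDERED.test(truncated), loose: looseCheck(truncated) },
};
```

Only the out-of-order input separates the two checks: the substring version accepts it, while the ordered pattern rejects it because no END marker follows the BEGIN marker. Neither check parses the certificate body.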

native/sea/index.d.ts

Lines changed: 59 additions & 0 deletions
Generated file; its diff is not rendered by default.
