feat(sea): wire rowLimit + statementConf + TIMESTAMP_NTZ/LTZ params#408
feat(sea): wire rowLimit + statementConf + TIMESTAMP_NTZ/LTZ params#408msrathore-db wants to merge 2 commits into
Conversation
f857d15 to
1a72cad
Compare
e09869b to
7726d7f
Compare
|
Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase ( |
1 similar comment
|
Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase ( |
Three statement-option / param-type additions where the kernel + napi were already ready but the node SEA layer didn't expose them: - rowLimit: new `ExecuteStatementOptions.rowLimit` → napi `rowLimit` (SEA `row_limit`). SEA-only server-side cap; Thrift has no execute-time cap. - statementConf: new `ExecuteStatementOptions.statementConf` → napi `statementConf` (SEA `statement_conf`), the Thrift `confOverlay` equivalent. Generalises the existing query_tags serialisation so a caller-supplied statementConf and queryTags merge into one conf map (queryTags already forwarded upstream). - TIMESTAMP_NTZ / TIMESTAMP_LTZ: added to `DBSQLParameterType` so callers can bind timezone-explicit timestamp params. `toSparkParameter` already honours an explicit type and `SeaPositionalParams` passes the SQL type verbatim to the kernel codec (which has the NTZ/LTZ arms). Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
The cascade commit that surfaced `statementId` on `SeaNativeStatement` and `sessionId` on `SeaNativeConnection` (matching the kernel napi binding's new getters) didn't update the blocking test fakes, breaking compilation. Add the readonly fields to FakeNativeStatement / FakeNativeConnection (execution.test.ts) and FakeNativeStatement / FakeMetadataConnection (metadata.test.ts). The async fakes already carried statementId. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
68a65ce to
edd07c8
Compare
Consolidates the last net-new bit of the superseded #408: two SEA-path DBSQLParameterType variants for binding timezone-explicit timestamps. The type name flows through the existing param codec (toSparkParameter → sqlType), which the kernel accepts — validated live (SELECT ? with TIMESTAMP_NTZ/LTZ returns the bound values). On the Thrift backend they degrade to a plain TIMESTAMP bind. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
|
Superseded by the consolidated SEA stack (#412 → #413 → #414) and closing to minimise open PRs. Coverage verified before closing: rowLimit + statementConf + TIMESTAMP_NTZ/LTZ params → consolidated into #413 (buildExecuteOptions rowLimit/statementConf + DBSQLParameterType.TIMESTAMP_NTZ/LTZ, both warehouse-validated). No unique code is lost. |
Consolidates the last net-new bit of the superseded #408: two SEA-path DBSQLParameterType variants for binding timezone-explicit timestamps. The type name flows through the existing param codec (toSparkParameter → sqlType), which the kernel accepts — validated live (SELECT ? with TIMESTAMP_NTZ/LTZ returns the bound values). On the Thrift backend they degrade to a plain TIMESTAMP bind. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
) * [SEA-NodeJS] SEA connection & statement options Wire the SEA connection-level and per-statement option surfaces onto the merged-kernel napi binding (thin forwarding — the kernel owns the behaviour): Connection options (SeaAuth.buildSeaConnectionOptions): - `maxConnections` → kernel pool size, validated as a positive integer within the napi u32 range. - TLS: `checkServerCertificate` (secure-by-default — omit to keep the kernel's verify-on default; `false` opts into insecure) and `customCaCert` (PEM string or Buffer; strings are PEM-sanity-checked and normalised to a Buffer before the FFI boundary), via the new `buildSeaTlsOptions`. - `intervalsAsString: true` is always set so SEA interval/duration columns render as strings — a byte-compatible drop-in for the Thrift backend. `complexTypesAsJson` is intentionally left at the kernel default (native Arrow), which already decodes identically to Thrift via the shared converter. Statement options (SeaSessionBackend.executeStatement, via buildExecuteOptions): - `queryTimeout` → `queryTimeoutSecs`; `rowLimit` → `rowLimit` (SEA-only cap). - `queryTags` serialised JS-side (reusing Thrift's `serializeQueryTags`) into the reserved `query_tags` conf key, merged with any explicit `statementConf` — the napi `queryTags` field can't carry null-valued tags, and the kernel rejects setting both. `queryTags` / `queryTimeout` are no longer rejected. - Still rejected (genuinely unsupported on SEA): `useCloudFetch`, `useLZ4Compression`, `stagingAllowedLocalPath`. `rowLimit` / `statementConf` added to the public `ExecuteStatementOptions`; SEA-only knobs (`maxConnections` / `checkServerCertificate` / `customCaCert`) added to the internal `InternalConnectionOptions`. Validated against a live warehouse: secure-by-default connect, maxConnections, checkServerCertificate, rowLimit (caps rows), queryTimeout, queryTags, statementConf, and non-PEM customCaCert rejection. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * [SEA-NodeJS] SEA async execute (submit / poll / awaitResult) Switch the SEA query path from the blocking `executeStatement` to the kernel's async `submitStatement`, matching the Thrift backend's always-async (`runAsync: true`) model. `submitStatement` returns immediately with a pending `AsyncStatement` (kernel `wait_timeout=0s`) while the query runs server-side. SeaOperationBackend becomes dual-mode (exactly one of): - `asyncStatement` (query path): `waitUntilReady()` polls `status()` to a terminal state on a 100ms cadence (matching Thrift), firing the progress callback each tick. Polling `status()` rather than blocking on `awaitResult()` keeps `cancel()` responsive — a blocking awaitResult would hold the kernel statement mutex for the whole query and queue cancel behind it. On Succeeded it materialises the result handle (first fetch is free); on Failed it drives `awaitResult()` to surface the kernel's typed SQL-error envelope; on a server-side Cancelled/Closed/Unknown it throws a clear error. `status()` reports the real Pending/Running/Succeeded state. - `statement` (metadata path): the kernel `list*`/`get*` statement is already terminal, so `waitUntilReady()` stays the one-shot completion tick. The fetch pipeline is shared: `awaitResult()`'s `AsyncResultHandle` and the metadata `Statement` expose the same `fetchNextBatch()` / `schema()` surface, so `SeaResultsProvider` → `ArrowResultConverter` → `ResultSlicer` consume either interchangeably via a single memoised fetch handle. cancel()/close() route through a `lifecycleHandle` abstraction over whichever handle backs the op. Re-exports the kernel `AsyncStatement` / `AsyncResultHandle` types from `SeaNativeLoader`. Validated against a live warehouse: async fetchAll correctness, multi-row drain (5000 rows), long-running aggregate (count over 20M), kernel SQL-error surfacing, and cancellation mid-execution. PR1's params/metadata/getInfo all still pass through the new async path. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * [SEA-NodeJS] prettier-format SEA connection/options + async files (CI code-style) The CI "Check code style" step runs `prettier . --check` (whole repo); these files were committed without prettier formatting. Formatting-only. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * [SEA-NodeJS] Address code-review findings on async/TLS/options (#413) Code-review #413 (81/100). Validated each against the code + a live warehouse: - F1 (HIGH): the async poll loop threw plain HiveDriverError for server-driven Cancelled/Closed/Unknown. The DBSQLOperation facade only mirrors its cancelled/closed flags when `err instanceof OperationStateError` (and OperationStateError extends HiveDriverError, not the reverse), so a server-side cancel/close/admin-kill left the facade desynced. Now throws OperationStateError(Canceled/Closed/Unknown) — matching the Thrift backend. The Failed branch still surfaces the kernel SQL-error envelope via awaitResult. - F2 (MED): the server-Cancelled test asserted only instanceOf(HiveDriverError), which passes for both the correct and incorrect type — it couldn't catch F1. Now asserts instanceOf(OperationStateError) + errorCode, plus a new Closed test. - F3 (MED): queryTimeout was forwarded to submitStatement but the kernel ignores queryTimeoutSecs on submit (always wait_timeout=0s), so the documented public option was a silent no-op, and the poll loop had no client-side deadline (a stalled Running statement polled forever). Now enforced client-side: the poll loop tracks a deadline, best-effort cancels the statement on expiry, and throws OperationStateError(Timeout) — matching Thrift's server TIMEDOUT outcome. Stopped forwarding the ignored queryTimeoutSecs to the napi options. Validated live: a 2s timeout interrupts a slow cross-join with TIMEOUT. - F4 (LOW): customCaCert PEM string check now requires the END marker too (a truncated/headerless cert no longer passes), consistent with the Buffer path. - F5 (LOW): SeaAuth reads the SEA-only fields (checkServerCertificate / customCaCert / maxConnections) through `InternalConnectionOptions` instead of ad-hoc inline casts, so a typo'd key fails to compile. - F6 (LOW): corrected the poll-loop comment — the prior text justified polling by an incorrect "blocking awaitResult holds the mutex and queues cancel" claim; cancel() is documented lock-free. The real rationale (real-time status to the progress callback + cancel observed between ticks) is now stated. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * [SEA-NodeJS] add TIMESTAMP_NTZ / TIMESTAMP_LTZ bound-param types Consolidates the last net-new bit of the superseded #408: two SEA-path DBSQLParameterType variants for binding timezone-explicit timestamps. The type name flows through the existing param codec (toSparkParameter → sqlType), which the kernel accepts — validated live (SELECT ? with TIMESTAMP_NTZ/LTZ returns the bound values). On the Thrift backend they degrade to a plain TIMESTAMP bind. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * [SEA-NodeJS] Address #413 review: TIMESTAMP_LTZ wire type + Int64 queryTimeout coercion Code-review #413 (gopalldb). Two P1s: - TIMESTAMP_LTZ was sent verbatim on the wire, but Spark has no distinct TIMESTAMP_LTZ type (TIMESTAMP already carries LTZ semantics) — so a Thrift caller got an opaque server bind error, and the enum comment falsely claimed NTZ/LTZ "degrade to a plain TIMESTAMP bind" (there was no such logic). `toSparkParameter` now maps TIMESTAMP_LTZ → `TIMESTAMP` (valid on both Thrift and kernel); TIMESTAMP_NTZ stays native (a real Spark type). Comment corrected. Added DBSQLParameter tests for both wire types (the Thrift behaviour the review flagged as untested) and updated the kernel positional-params test. - queryTimeout (`number | bigint | Int64`) was coerced with `Number(...)`, which yields NaN for an Int64 (node-int64 has no valueOf) → the client-side deadline was silently disabled for Int64 inputs. Now uses `numberToInt64(...).toNumber()`, matching the Thrift backend. Added a regression test that an `Int64(1)` queryTimeout actually fires the deadline (OperationStateError(Timeout)) rather than polling forever. (P1 "queryTimeout silently dropped on submit" and the unbounded-poll note were already resolved earlier by the client-side deadline fix; doc comment updated to match. P2 polarity/Date-NTZ items noted for the public-surface follow-up.) Validated live: NTZ binds natively and LTZ binds as TIMESTAMP on the kernel path. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> --------- Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
What
Three statement-option / param-type additions where the kernel + napi binding were already complete but the node SEA layer never exposed them. (Audit "Group A"; the fourth Group-A item,
queryTags, was a dropped pre-existing option and is fixed upstream in #403.)ExecuteStatementOptions.rowLimit→ napirowLimit(SEArow_limit)ExecuteStatementOptions.statementConf→ napistatementConf(ThriftconfOverlayequivalent)Datecoerced toTIMESTAMPDBSQLParameterType.TIMESTAMP_NTZ/TIMESTAMP_LTZselectableWhy these were "free"
The kernel
StatementSpec(row_limit,statement_conf, NTZ/LTZTypedValuearms) and napiExecuteOptionsalready exposed everything; the only missing piece was the nodeExecuteStatementOptionssurface + threading. No kernel/napi change.Notes
query_tagsserialisation (wired upstream in feat(sea): Thrift-parity — params, intervals, getInfo, SQL-error class, input validation + queryTags #403): a caller-suppliedstatementConfandqueryTagsmerge into one conf map.maxRowsremains the client-side per-fetch chunk size on both backends.toSparkParameteralready honours an explicittypeandSeaPositionalParamspasses the SQL type verbatim to the kernel codec — so only the enum values were needed.Testing
240 SEA unit tests pass (rowLimit/statementConf forwarding, statementConf+queryTags merge, NTZ/LTZ param mapping). Verified live against pecotesting earlier (rowLimit:7 caps a
range(0,100)result to 7 rows; NTZ param round-trips).Stacking
Stacked on #407 → #406 → #403 → #404.
This pull request and its description were written by Isaac.
Downstream fixes / reviewer note
flatbuffers, one-pass IPC duration rewrite, cancel/close local-state rollback on native RPC failure, closed fetch →OperationStateError(Closed), and Arrow duration rewriter cleanup.useCloudFetchrather than silently ignoring it, and uses nativesessionId/statementIdfor session/operation ids where available.