[SEA-NodeJS] (6/8) CloudFetch + Inline Arrow result fetching#381
[SEA-NodeJS] (6/8) CloudFetch + Inline Arrow result fetching#381msrathore-db wants to merge 1 commit into
Conversation
|
Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase ( |
fa6da7f to
5d64f20
Compare
|
Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase ( |
d529954 to
da92c45
Compare
5d64f20 to
01e4b91
Compare
|
Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase ( |
Implements IOperationBackend over the napi binding's Statement. fetchChunk decodes Arrow IPC bytes → apache-arrow RecordBatch → ArrowResultConverter (Phase 1+2 reused unchanged) → JS rows. All M0 datatypes round-trip via the same converter the thrift path uses (BOOL, INT8/16/32/64, FLOAT, DOUBLE, DECIMAL, STRING, BINARY, DATE, TIMESTAMP, INTERVAL, ARRAY, MAP, STRUCT). Unit tests construct synthetic IPC batches; e2e test against pecotesting confirms byte-identical parity vs thrift. No new dependencies. ArrowResultConverter / ResultSlicer / OperationIterator all reused unchanged (DRY). Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
da92c45 to
6be8799
Compare
01e4b91 to
cd3b1e1
Compare
… IOperationBackend (#378) * sea-abstraction: introduce IBackend / ISessionBackend / IOperationBackend Refactors DBSQLClient/Session/Operation to dispatch through three backend interfaces. ThriftBackend (lib/thrift-backend/) contains the relocated existing thrift logic. SeaBackend (lib/sea/) is a stub for M0; the sea-napi-binding feature wires the real impl. Public surface (lib/index.ts) unchanged. No new dependencies. All existing tests pass. Files: - lib/contracts/IBackend.ts (new) - lib/contracts/ISessionBackend.ts (new) - lib/contracts/IOperationBackend.ts (new) - lib/contracts/IDBSQLClient.ts (adds useSEA?: boolean to ConnectionOptions) - lib/thrift-backend/ThriftBackend.ts (new) - lib/thrift-backend/ThriftSessionBackend.ts (new) - lib/thrift-backend/ThriftOperationBackend.ts (new) - lib/sea/SeaBackend.ts (new, M0 stub) - lib/DBSQLClient.ts (dispatch through IBackend; useSEA picks SeaBackend) - lib/DBSQLSession.ts (facade over ISessionBackend; staging stays here) - lib/DBSQLOperation.ts (facade over IOperationBackend; iterators/fetchAll stay here) - tests/unit/DBSQLClient.test.ts (retarget internal state lookup through backend; pre-seed client.backend in tests that bypass connect()) - tests/unit/DBSQLOperation.test.ts (retarget internal state lookup through backend) * sea-abstraction: cleanup — restore JSDoc, dedupe test pre-seed, fix inline type Addresses code-bloat-watchdog findings from commit 0085928: - Restores public-API JSDoc on DBSQLSession + DBSQLOperation methods (was deleted as scope creep; contracts unchanged so docs still apply) - Adds makeStubbedClient() helper to tests/unit/DBSQLClient.test.ts; replaces 14× duplicated ThriftBackend pre-seed - Imports WaitUntilReadyOptions instead of inline option types in IOperationBackend + DBSQLOperation.waitUntilReady * sea-abstraction: address full-review findings (F1-F17 except F5) Round-N fixes from the 9-reviewer pre-review. Public IOperation/DBSQLOperation surface preserved byte-identical; backend interfaces (IBackend / ISessionBackend / IOperationBackend) made fully neutral so both Thrift and SEA can implement the same contract. F1 — neutral DTOs at IOperationBackend with Thrift-shape preservation on the public facade (adapter pattern): - lib/contracts/OperationStatus.ts (new) — neutral OperationStatus + OperationState enum mirroring databricks-sql-python's CommandState and kernel pyo3's StatementStatus taxonomy. - lib/contracts/ResultMetadata.ts (new) — neutral ResultMetadata + ResultFormat enum mirroring the three TSparkRowSetType cases. - IOperationBackend.status()/getResultMetadata() return the neutral DTOs. - ThriftOperationBackend.status() adapts at the boundary via adaptOperationStatus / adaptResultMetadata; module-level helpers thriftStateToOperationState and thriftRowSetTypeToResultFormat do the enum maps. - ThriftOperationBackend exposes thriftStatusResponse() and thriftResultMetadataResponse() as public Thrift-only accessors used by the facade's zero-loss fast path (kept for internal state-machine + result-handler dispatch as well). - lib/utils/thriftWireSynthesis.ts (new) — synthesizeThriftStatus and synthesizeThriftResultSetMetadata: convert neutral DTOs back to Thrift wire shape for the non-Thrift backend path. Lossy on Thrift-only fields (taskStatus, numModifiedRows, cacheLookupResult, etc.). - DBSQLOperation.status() and getMetadata() preserved Thrift return shape: Thrift backend path returns the real wire response (zero loss); non-Thrift backend path synthesizes via the new helpers. - DBSQLOperation.getResultMetadata() — new additive neutral accessor on IOperation; DBSQLSession.handleStagingOperation uses it instead of the deprecated Thrift-shaped getMetadata(). F2 — IBackend.connect() is now zero-arg. Backend reads everything it needs from IClientContext / constructor; matches Python connector's pattern of passing session_configuration via constructor not method-arg. F3 — Restore the 'Server protocol version' debug log dropped by the original PR-378 refactor. Re-added to ThriftSessionBackend.constructor with the LogLevel.debug + IClientContext.getLogger() pattern; matches the pre-refactor log site at main:lib/DBSQLSession.ts:175. F4 + F11 + F14 — SeaBackend stub safety: - close() is a no-op so DBSQLClient.close()'s state-clearing block can finish even after a useSEA: true connect() failure. - connect() and openSession() throw HiveDriverError instead of generic Error, matching the rest of the codebase. - connect(options: ConnectionOptions) and openSession(request: OpenSessionRequest) declare their parameters (with @typescript-eslint/no-unused-vars disable) so IDE autocomplete prompts the M1 SEA implementer. F6 + F7 + F9 + F10 — JSDoc on backend interfaces: - IBackend: connect/openSession/close docstrings; close() doc explicitly states transport-layer cleanup is owned by DBSQLClient. - ISessionBackend: copy IDBSQLSession's per-method one-liner JSDoc. - IOperationBackend: doc hasResultSet (readonly external; mutates internally), waitUntilReady (MUST throw OperationStateError on terminal non-success). F8 — tests/unit/sea/SeaBackend.test.ts (new) locks in the stub contract: connect() rejects HiveDriverError, openSession() rejects HiveDriverError, close() resolves no-op. ~30 LOC. F12 — Drop legacy { handle, ... } ctor branch from DBSQLOperation and DBSQLSession: - Facades accept only { backend, context }. - DBSQLSession no longer imports ThriftSessionBackend at all. - DBSQLOperation imports ThriftOperationBackend solely for the F1 typed downcast (zero-loss Thrift fast path); this is a deliberate, scoped coupling tied to the back-compat decision. - tests/unit/.stubs/createSessionForTest.ts and createOperationForTest.ts (new) wrap the legacy shape; all 48 + 54 test sites mechanically migrated. F15 — ThriftOperationBackend.waitUntilReady uses imported WaitUntilReadyOptions type instead of an anonymous inline shape. F16 — useSEA flag moved out of public ConnectionOptions: - Removed useSEA?: boolean from the exported lib/contracts/IDBSQLClient.ts ConnectionOptions; no longer ships in the public .d.ts. - lib/contracts/InternalConnectionOptions.ts (new) declares the flag as a non-exported internal extension; DBSQLClient.connect() reads via a typed cast. Mirrors Python's kwargs.get('use_sea', False) pattern at databricks-sql-python/src/databricks/sql/session.py:111. F17 — Missing return; after case 'timeout' in forwardConnectionEvent so a future fifth case doesn't silently fall through. The trailing return; in the last case triggers no-useless-return — quieted with a localized eslint-disable-next-line + intent comment. F5 — deferred per owner instruction (test-only as any cast tightening). Verification: - yarn lint clean (3 pre-existing warnings in tests/e2e/protocol_versions.test.ts). - yarn build clean. - tsc --noEmit -p tsconfig.json clean (apart from pre-existing examples/tokenFederation/* import errors that exist on main). - Runtime smoke test of SeaBackend stub + Thrift-wire synthesis round-trip passes 5/5 assertions. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * sea-abstraction: address PR #378 review-comment fixes (H1 / M1-M4 / L1-L10) Addresses 15 review findings from the code-review-squad pass on PR #378. L11 (backend kind field on the three interfaces) is deliberately deferred to avoid a cross-stack cascade ripple while the downstream PRs are still in flight. H1 — fetchChunk lost mid-flight failIfClosed regression. Add optional `isClosed?: () => boolean` to IOperationBackend.fetchChunk's options bag. ThriftOperationBackend.fetchChunk probes it after the setTimeout(0) macrotask yield and returns [] when set; the facade's post-fetch failIfClosed then raises the user-visible OperationStateError. Restores the guard that the refactor split across the facade/backend boundary so a cancel/close arriving during the yield window no longer runs the data RPC to completion needlessly. M1 — neutralize WaitUntilReadyOptions callback shape. Introduce IOperationBackendWaitOptions { callback?: (status: OperationStatus) => unknown } on the backend interface. Facade keeps the public Thrift-typed OperationStatusCallback and adapts at the boundary by wrapping the user's callback with synthesizeThriftStatus. ThriftOperationBackend.waitUntilReady consumes the neutral options and passes adaptOperationStatus(response) to the callback. M2 — synthesizeOkStatus maps OperationState to TStatusCode. Add synthesizeStatusFromOperation that returns ERROR_STATUS for Failed/Cancelled/Closed (carrying errorMessage + sqlState) and SUCCESS_STATUS otherwise. Wire it into synthesizeThriftStatus so legacy Status.assert(resp.status) sees the right code on non-Thrift backends. M3 — TelemetryEvent + DriverConfiguration carry a backend tag. Add optional backend?: 'thrift' | 'sea' | 'kernel' on both interfaces so dashboards can slice latency/error rate by backend without a metrics-schema migration once non-Thrift emission goes live. M4 — test coverage for the synthesize helpers + useSEA failure path. New tests/unit/thrift-backend/wireSynthesis.test.ts covering all OperationState/ResultFormat mappings, ERROR_STATUS carries errorMessage/sqlState, hasResultSet round-trip, schema/arrowSchema/ lz4Compressed/isStagingOperation preservation, and the L3 throw on unknown ResultFormat. New test in DBSQLClient.test.ts asserts that a useSEA:true connect failure leaves this.backend === undefined and the next openSession() surfaces "not connected" rather than the SeaBackend's "not implemented" error. L1 — forwardConnectionEvent normalizes payload to Error. Replace `payload as Error` with `payload instanceof Error ? payload : new Error(String(payload))` so a backend that emits a non-Error through the cross-backend onConnectionEvent doesn't crash the logger.log call. L2 — DBSQLClient.connect publishes this.backend only on success. Construct the backend locally, await connect() in a try/catch, run a best-effort backend.close() (per IBackend.close()'s safe-on-partial-init contract) and rethrow on failure. Only assign this.backend after a clean connect so a failed connect surfaces "DBSQLClient: not connected" on the next openSession. L3 — resultFormatToThrift throws on unknown ResultFormat. Replace the silent default fallback to COLUMN_BASED_SET with a HiveDriverError. Prevents a future ResultFormat enum extension from silently routing results through JsonResultHandler and surfacing garbled rows. L4 — DBSQLOperation.getMetadata carries @deprecated. Adds the canonical TypeScript JSDoc tag so IDEs (strikethrough), tsc, ESLint plugins, and agentic codegen pick up the soft deprecation in favour of getResultMetadata. L5 — numberToInt64 re-export carries @deprecated. Re-export through a named const with a JSDoc block (rather than a bare `export { ... } from`) so the @deprecated tag attaches to the symbol consumers see in their IDE / .d.ts. L6 — DBSQLSession.runBackend helper. Collapse 11 duplicated `failIfClosed → backend.X → failIfClosed` brackets into a single private runBackend<T>(fn) so the open-flag-before-and-after contract has a name and can't be forgotten in a new delegation method. L7 — restore three why-comments deleted from DBSQLSession. Staging-detection invariant in executeStatement, AWS-vs-Azure 404 difference on staging-remove, and the Content-Length-required note on staging-upload. Verbatim from main; these document non-obvious intentional behaviour the refactor inadvertently dropped. L8 — hasResultSet becomes a method on IOperationBackend. The value is state-dependent (the Thrift impl mutates the underlying operation handle inside processOperationStatusResponse), so the property+readonly+disclaimer-JSDoc pattern was misleading. Method form makes the live-read semantics obvious to a fresh implementer. 3 facade call sites updated. L9 — wireSynthesis moves under thrift-backend. The file imports Thrift IDL types and produces Thrift-typed values; it belongs next to ThriftOperationBackend, not in the neutral lib/utils/ tree where it would creep into the dependency cone of future backend-neutral helpers. Same reasoning that placed numberToInt64 and getDirectResultsOptions under thrift-backend/. L10 — interface-level downcast policy. Add a JSDoc paragraph on IOperationBackend grandfathering the two existing `instanceof ThriftOperationBackend` downcasts in DBSQLOperation.status/getMetadata and prohibiting new ones. Future zero-loss back-compat needs should extend the interface (or add an optional method) rather than spawn a per-backend branch matrix. Gates: yarn build (exit 0), yarn lint (0 errors, 3 pre-existing warnings in tests/e2e/protocol_versions.test.ts), yarn test on touched files (163 passing, +12 net new tests from M4 work; 2 failures pre- existing on PR head unchanged: getSchema-directResults and the LZ4-cloud-fetch flag — both flagged in the team-lead playbook as known prior regressions). Cascade implications for downstream PRs (#380 #377 #379 #382 #381 #384 #383): L8 converts hasResultSet from a property to a method, M1 swaps WaitUntilReadyOptions for IOperationBackendWaitOptions on the backend interface. Both are mechanical renames at downstream backend impls when they rebase. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * Fix SEA abstraction merge fallout Restore Thrift compatibility paths needed by existing schema and result-handler tests after merging main telemetry changes. Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * Restore Thrift result-handler compatibility hooks Keep existing e2e-only inspection hooks available through the facade while the new backend abstraction owns result handling. Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> --------- Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
Stack
Linear stack of 8 PRs landing the M0 + M1 Phase 1 SEA NodeJS work. Merge in order from base ↑ to tip. The tip branch (
msrathore/sea-auth-u2m, PR #383) is the single snapshot containing everything in flight — point your test or benchmark harness at it for an end-to-end check.sea-abstractionsea-napi-binding.nodeartifactsea-errors-loggingErrorCode→ JS error-class mapping (M0 minimum)sea-authuseSEA: truesea-executionexecuteStatement+openSession(sessionConfig, initialCatalog/Schema)sea-resultssea-operationsea-auth-u2m← TIPCompanion kernel stack (
databricks/databricks-sql-kernel): 8 PRs — root #26 (async-public-api) → #27 → #25 → #29 → #28 → #30 → #24 → #23 (tip).Policy: new PRs always stack on the current tip. No sibling/parallel topology. No force-pushes on existing PRs unless absolutely necessary; if a PR's content is wrong, add a fix-up commit on top of the stack tip rather than rewriting history.
This PR is position 6/8.
Summary
SeaOperationBackend+SeaResultsProvider+SeaArrowIpcwire the kernel's CloudFetch and Inline Arrow paths to JS row-shape conversion.Size note (964 LOC, marginally over cap)
CloudFetch and Inline Arrow share too much (ArrowResultConverter integration) to split cleanly. Reviewable in one pass.
Test plan
useCloudFetchoverride forwarded viasession_confDraft until you give the go for review.
Tracking
Co-authored-by: Isaac