Skip to content

Commit 39dc0a7

Browse files
feat(kernel): wire OAuth (M2M/U2M), TLS/mTLS, and Geometry on use_kernel path (#819)
* feat(kernel): wire OAuth (M2M/U2M), TLS/mTLS, and Geometry on use_kernel path Consume the OAuth + TLS surface added to the Rust kernel's pyo3 binding so the use_kernel backend reaches feature parity with the Thrift/SEA backends for these flows. Bumps KERNEL_REV to the kernel commit that ships the new Session kwargs. OAuth (kernel-owned lifecycle): - auth_bridge.kernel_auth_kwargs now takes the raw connect() auth options (the OAuth secret is consumed during auth_provider construction and isn't recoverable from the built provider, so we read it from the original kwargs): - OAuth M2M: oauth_client_id + oauth_client_secret are forwarded to the kernel's auth_type='oauth-m2m'; the kernel acquires and refreshes tokens itself via workspace OIDC client-credentials. - OAuth U2M: auth_type 'databricks-oauth' / 'azure-oauth' route to the kernel's auth_type='oauth-u2m' (optional oauth_client_id / oauth_redirect_port forwarded); the kernel runs the browser flow. - A custom credentials_provider is rejected with a clear NotSupportedError — it's an opaque token source with no extractable raw creds, so the kernel can't own the lifecycle. Callers are pointed at oauth_client_id / oauth_client_secret. - oauth_client_secret is a new connect() kwarg, honored only on the use_kernel path; Thrift/SEA are unaffected. - session.py forwards the raw auth kwargs into KernelDatabricksClient. TLS / mTLS: - client._kernel_tls_kwargs translates the connector's SSLOptions into the kernel's tls_* Session kwargs: tls_trusted_ca_file -> tls_ca_cert (PEM bytes), client cert/key files -> tls_client_cert / tls_client_key (mTLS), and inverts tls_verify -> tls_skip_verify / tls_verify_hostname -> tls_skip_hostname_verify. A password-protected client key is rejected (the kernel has no surface for it yet). ssl_options is no longer accept-and-ignored on this path. Geometry: - type_mapping recovers GEOGRAPHY / GEOMETRY result columns from the databricks.type_name metadata (they arrive as Arrow Utf8, like VARIANT) so PEP-249 description reports the precise type instead of collapsing to string. Tests: 140 kernel unit tests pass (new M2M/U2M routing, custom-provider rejection, SSLOptions->tls_* translation incl. mTLS + encrypted-key rejection, and geospatial type recovery). black + mypy clean. Note: KERNEL_REV points at the kernel's feat/pyo3-oauth-tls branch HEAD until that PR merges; re-pin to the squash-merge SHA before this lands. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * chore(kernel): re-pin KERNEL_REV to updated feat/pyo3-oauth-tls HEAD The companion kernel PR (#107) was amended with review fixes (doc backend-inversion fix, U2M degenerate-value validation), changing its branch HEAD SHA. Re-pin so this PR's kernel-e2e builds against the current kernel. Still the branch HEAD — re-pin to the squash-merge SHA once #107 lands. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * fix(kernel): address review — re-pin KERNEL_REV, secret hygiene, auth ambiguity, TLS superset Addresses @gopalldb's review on #819. P0: - Re-pin KERNEL_REV to the squash-merge SHA of the now-merged kernel PR (ec2288742cbac0cd9fab50da353e8405972eefe9 on kernel main), replacing the orphaned branch-HEAD SHA. P1: - Scrub oauth_client_secret from the long-lived self._auth_options in open_session's finally (even on the failure path). It outlives the method, so a retained secret was exposed to vars()/pickle/debugger far longer than needed; the kernel now owns it. - tls_verify=False now also emits tls_skip_hostname_verify=True — no-verify subsumes hostname verification, matching SSLOptions' create_ssl_context (check_hostname=False when tls_verify is False). - Reject ambiguous auth combos before resolving, instead of silently picking M2M: (a) credentials_provider + oauth_client_secret, and (b) a U2M auth_type (databricks-oauth/azure-oauth) + oauth_client_secret. Both raise NotSupportedError naming the conflict, so the failure is at session-open rather than a confusing first-call 401 against the wrong principal. - Secret-leak tests: oauth_client_secret is forwarded to the kernel but scrubbed from _auth_options, and absent from repr(conn) / vars(conn); scrub runs even when open_session raises. P2: - _read_pem_bytes rejects an empty/whitespace CA/cert file with a clear ProgrammingError instead of passing empty PEM to the kernel. - _normalize_scopes raises ProgrammingError on a non-str/list/tuple oauth_scopes rather than silently dropping to default scopes. - Added U2M oauth_scopes forwarding test. 150 kernel unit tests pass; black + mypy clean. Note: KERNEL_REV now points at the merged kernel main SHA; no further re-pin needed before merge. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * fix(kernel): don't build connector OAuth provider on use_kernel path + add TLS e2e Live mitmproxy TLS e2e (this commit) surfaced a real bug: on use_kernel=True the connector still built its own auth provider in Session.__init__ via get_python_sql_connector_auth_provider, BEFORE use_kernel was consulted. For OAuth that constructor eagerly runs the flow (DatabricksOAuthProvider calls _initial_get_token in __init__), so passing oauth_client_id for kernel-managed M2M fired the U2M *browser* flow at connect() time — opening a browser and using the M2M service principal id as a U2M OAuth app client_id (which the account rejects). Fix: on use_kernel=True, do NOT build the connector's provider. Hand the kernel auth bridge a minimal AccessTokenAuthProvider when an access_token is present, else None; OAuth M2M/U2M resolve purely from the raw connect() kwargs the bridge already reads, and the kernel owns the entire token lifecycle. Thrift/SEA backends are unchanged. - session.py: branch the provider build on use_kernel; import AccessTokenAuthProvider; update the kernel-branch comment. - auth_bridge.py: kernel_auth_kwargs / _is_pat / _extract_bearer_token accept Optional[AuthProvider] (None when no token on the kernel path); clearer "no credentials" error; refreshed module docstring. - tests/unit/test_session.py: regression guards — use_kernel must NOT call get_python_sql_connector_auth_provider, forwards raw M2M creds via auth_options, and uses a minimal AccessTokenAuthProvider for PAT. TLS e2e (the connector-side counterpart to the kernel's v0_tls_e2e.rs): - tests/e2e/test_kernel_tls.py: 3 mitmproxy-backed tests driving sql.connect(use_kernel=True, _tls_*=...) — strict default rejects the re-signed cert, _tls_trusted_ca_file trusts it, _tls_no_verify bypasses. Skips cleanly without the kernel wheel / creds / MITMPROXY_CA_CERT. Verified live against the dogfood workspace on the OAuth M2M path (all 3 green) — this is what caught the bug above. - kernel-e2e.yml: run test_kernel_tls.py in the existing (label/merge- queue-gated) run-kernel-e2e job behind a mitmproxy service. The kernel wheel is built before the proxy is in scope so cargo's JFrog fetch isn't routed through it; HTTPS_PROXY is scoped to the TLS test step. Path filter now also triggers on session.py and test_kernel_tls.py. 184 unit tests pass; black + mypy clean. Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * test(kernel): make use_kernel auth-provider guards wheel-independent The two TestKernelAuthProviderBypass tests went through databricks.sql.connect(use_kernel=True), which triggers the lazy `import databricks_sql_kernel` in the kernel client module. That import fails in the unit-test job (no Rust wheel installed), so the tests errored with ImportError before their patches applied — green locally (wheel present) but red in CI. Rewrite them to exercise Session.__init__'s provider-selection logic directly with _create_backend stubbed, so the kernel client (and its import) is never touched. Assert on session.auth_provider and that get_python_sql_connector_auth_provider is not called. Verified passing both with the wheel present and with `import databricks_sql_kernel` forcibly blocked (reproducing the CI environment). Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> --------- Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
1 parent 6723350 commit 39dc0a7

11 files changed

Lines changed: 977 additions & 68 deletions

File tree

.github/workflows/kernel-e2e.yml

Lines changed: 54 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -178,7 +178,7 @@ jobs:
178178
echo "$CHANGED"
179179
# Run when the connector kernel backend, kernel e2e tests,
180180
# this workflow, the kernel revision pin, or core deps move.
181-
if echo "$CHANGED" | grep -qE "^(src/databricks/sql/backend/kernel/|tests/e2e/test_kernel_backend\.py|tests/unit/test_kernel_|\.github/workflows/kernel-e2e\.yml|KERNEL_REV|pyproject\.toml|poetry\.lock)"; then
181+
if echo "$CHANGED" | grep -qE "^(src/databricks/sql/backend/kernel/|src/databricks/sql/session\.py|tests/e2e/test_kernel_backend\.py|tests/e2e/test_kernel_tls\.py|tests/unit/test_kernel_|\.github/workflows/kernel-e2e\.yml|KERNEL_REV|pyproject\.toml|poetry\.lock)"; then
182182
echo "run_tests=true" >> "$GITHUB_OUTPUT"
183183
else
184184
echo "run_tests=false" >> "$GITHUB_OUTPUT"
@@ -319,6 +319,58 @@ jobs:
319319
- name: Run kernel e2e tests
320320
run: poetry run pytest tests/e2e/test_kernel_backend.py -v
321321

322+
# ── TLS e2e through an intercepting proxy ───────────────────────
323+
# mitmproxy re-signs every TLS connection with its own CA. The
324+
# connector tests (tests/e2e/test_kernel_tls.py) prove the whole
325+
# stack — sql.connect(use_kernel=True, _tls_*=...) → SSLOptions →
326+
# kernel handshake — trusts the CA / accepts no-verify / rejects
327+
# by default. The kernel wheel is already built above, BEFORE any
328+
# proxy is in scope, so cargo's JFrog registry fetch was never
329+
# routed through (and broken by) mitmproxy.
330+
- name: Start mitmproxy
331+
run: |
332+
docker run -d --name mitmproxy \
333+
-p 8080:8080 \
334+
mitmproxy/mitmproxy:12.2.1@sha256:743b6cdc817211d64bc269f5defacca8d14e76e647fc474e5c7244dbcb645141 \
335+
mitmdump --set stream_large_bodies=0
336+
for i in $(seq 1 15); do
337+
if docker exec mitmproxy test -f /home/mitmproxy/.mitmproxy/mitmproxy-ca-cert.pem 2>/dev/null; then
338+
echo "mitmproxy CA cert is ready"
339+
break
340+
fi
341+
echo "Waiting for mitmproxy CA cert... ($i/15)"
342+
sleep 1
343+
done
344+
345+
- name: Extract mitmproxy CA certificate
346+
run: |
347+
docker cp mitmproxy:/home/mitmproxy/.mitmproxy/mitmproxy-ca-cert.pem /tmp/mitmproxy-ca-cert.pem
348+
echo "MITMPROXY_CA_CERT=/tmp/mitmproxy-ca-cert.pem" >> "$GITHUB_ENV"
349+
echo "Extracted mitmproxy CA cert:"
350+
openssl x509 -in /tmp/mitmproxy-ca-cert.pem -noout -subject -issuer
351+
352+
- name: Run kernel TLS e2e tests
353+
# HTTPS_PROXY is scoped to THIS step so only the test's own HTTP
354+
# traffic flows through mitmproxy — not cargo / pip / anything
355+
# the earlier build steps needed from the JFrog registry.
356+
env:
357+
HTTPS_PROXY: http://localhost:8080
358+
run: poetry run pytest tests/e2e/test_kernel_tls.py -v
359+
360+
- name: Verify traffic flowed through mitmproxy
361+
if: always()
362+
run: |
363+
echo "=== mitmproxy logs (tail) ==="
364+
docker logs mitmproxy 2>&1 | tail -50
365+
echo "=== checking for proxied-connection entries ==="
366+
docker logs mitmproxy 2>&1 \
367+
| grep -i "CONNECT\|clientconnect\|<<" \
368+
|| (echo "FAIL: no traffic in mitmproxy logs — proxy was not used" && exit 1)
369+
370+
- name: Cleanup mitmproxy
371+
if: always()
372+
run: docker rm -f mitmproxy 2>/dev/null || true
373+
322374
# Post a Kernel E2E check on both the labeled-PR and merge-queue
323375
# paths so the named check on the PR reflects the latest real
324376
# run (overwriting the synthetic-success check that
@@ -341,7 +393,7 @@ jobs:
341393
completed_at: new Date().toISOString(),
342394
output: {
343395
title: 'Kernel E2E passed',
344-
summary: 'tests/e2e/test_kernel_backend.py ran green against the pinned kernel SHA.'
396+
summary: 'test_kernel_backend.py and the mitmproxy-backed test_kernel_tls.py ran green against the pinned kernel SHA.'
345397
}
346398
});
347399

KERNEL_REV

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1 +1 @@
1-
3aa25b219ac4ec2c1e95c6f836b67d5475ae9a7d
1+
ec2288742cbac0cd9fab50da353e8405972eefe9

src/databricks/sql/backend/kernel/auth_bridge.py

Lines changed: 180 additions & 30 deletions
Original file line numberDiff line numberDiff line change
@@ -1,19 +1,41 @@
1-
"""Translate the connector's ``AuthProvider`` into ``databricks_sql_kernel``
2-
``Session`` auth kwargs.
3-
4-
This phase ships PAT only. The kernel-side PyO3 binding accepts
5-
``auth_type='pat'``; OAuth / federation / custom credentials
6-
providers are reserved but not yet wired in either layer. Non-PAT
7-
auth raises ``NotSupportedError`` from this bridge so the failure
8-
surfaces at session-open time with a clear message rather than
9-
deep inside the kernel.
10-
11-
Token extraction goes through ``AuthProvider.add_headers({})``
12-
rather than touching auth-provider-specific attributes, so the
13-
bridge works uniformly for every PAT shape — including
14-
``AccessTokenAuthProvider`` wrapped in ``TokenFederationProvider``
15-
(which ``get_python_sql_connector_auth_provider`` does for every
16-
provider it builds).
1+
"""Translate the connector's auth configuration into
2+
``databricks_sql_kernel`` ``Session`` auth kwargs.
3+
4+
Three auth shapes are supported on the kernel path:
5+
6+
- **PAT** — extracted from the built ``AuthProvider`` (works for
7+
``AccessTokenAuthProvider``, including the ``TokenFederationProvider``
8+
wrapper that ``get_python_sql_connector_auth_provider`` always
9+
applies). Maps to the kernel's ``auth_type='pat'``.
10+
- **OAuth M2M** — when the caller passes ``oauth_client_id`` +
11+
``oauth_client_secret``, the *raw* credentials are forwarded to the
12+
kernel's ``auth_type='oauth-m2m'`` and the kernel owns the full
13+
token lifecycle (acquire + refresh via workspace OIDC
14+
client-credentials). We forward the raw pair rather than reusing the
15+
connector's own OAuth provider because the kernel re-mints tokens
16+
itself and the client secret is not recoverable from a built
17+
provider.
18+
- **OAuth U2M** — for ``auth_type`` ``databricks-oauth`` /
19+
``azure-oauth`` (the browser authorization-code flow), the optional
20+
``oauth_client_id`` / ``oauth_redirect_port`` are forwarded to the
21+
kernel's ``auth_type='oauth-u2m'`` and the kernel runs the browser
22+
flow itself.
23+
24+
A user-supplied custom ``credentials_provider`` is **rejected** on the
25+
kernel path with ``NotSupportedError``: it's an opaque token source
26+
with no extractable raw credentials, so the kernel can't own the
27+
lifecycle. Such callers should pass ``oauth_client_id`` /
28+
``oauth_client_secret`` (M2M) instead. Anything else non-PAT also
29+
raises ``NotSupportedError`` so the failure surfaces at session-open
30+
with a clear message rather than deep inside the kernel.
31+
32+
The M2M / U2M decisions are driven by the *raw* connect() kwargs
33+
(``auth_options``), not a built ``AuthProvider``. On the kernel path
34+
the connector deliberately does **not** build its own OAuth provider
35+
(that would eagerly run the U2M browser flow / M2M token exchange at
36+
connect() time, before the kernel is consulted), so ``auth_provider``
37+
is either a minimal PAT provider or ``None`` and the OAuth credentials
38+
are available only from the raw kwargs.
1739
"""
1840

1941
from __future__ import annotations
@@ -46,7 +68,7 @@
4668
_TOKEN_REJECT_RE = re.compile(r"[\x00-\x20\x7f]")
4769

4870

49-
def _is_pat(auth_provider: AuthProvider) -> bool:
71+
def _is_pat(auth_provider: Optional[AuthProvider]) -> bool:
5072
"""Return True iff this provider ultimately wraps an
5173
``AccessTokenAuthProvider``.
5274
@@ -65,18 +87,20 @@ def _is_pat(auth_provider: AuthProvider) -> bool:
6587
return False
6688

6789

68-
def _extract_bearer_token(auth_provider: AuthProvider) -> Optional[str]:
90+
def _extract_bearer_token(auth_provider: Optional[AuthProvider]) -> Optional[str]:
6991
"""Pull the current bearer token out of an ``AuthProvider``.
7092
7193
The connector's ``AuthProvider.add_headers`` mutates a header
7294
dict and writes the ``Authorization: Bearer <token>`` value.
7395
Going through that public surface keeps us insulated from
7496
provider-specific internals.
7597
76-
Returns ``None`` if the provider did not write an Authorization
77-
header or wrote a non-Bearer scheme — neither is representable
78-
in the kernel's PAT auth surface.
98+
Returns ``None`` if there is no provider, the provider did not
99+
write an Authorization header, or it wrote a non-Bearer scheme —
100+
none of which is representable in the kernel's PAT auth surface.
79101
"""
102+
if auth_provider is None:
103+
return None
80104
headers: Dict[str, str] = {}
81105
auth_provider.add_headers(headers)
82106
auth = headers.get("Authorization")
@@ -93,15 +117,83 @@ def _extract_bearer_token(auth_provider: AuthProvider) -> Optional[str]:
93117
return token
94118

95119

96-
def kernel_auth_kwargs(auth_provider: AuthProvider) -> Dict[str, Any]:
120+
def kernel_auth_kwargs(
121+
auth_provider: Optional[AuthProvider],
122+
auth_options: Optional[Dict[str, Any]] = None,
123+
) -> Dict[str, Any]:
97124
"""Build the kwargs passed to ``databricks_sql_kernel.Session(...)``.
98125
99-
PAT (including ``TokenFederationProvider``-wrapped PAT) routes
100-
through the kernel's PAT path. Anything else raises
101-
``NotSupportedError`` — the kernel binding doesn't accept OAuth
102-
today, and routing OAuth through PAT would silently break
103-
token refresh during long-running sessions.
126+
``auth_options`` carries the raw connect() kwargs relevant to auth
127+
(``auth_type``, ``oauth_client_id``, ``oauth_client_secret``,
128+
``oauth_redirect_port``, ``credentials_provider``). They drive the
129+
OAuth decisions because the OAuth secret is consumed during
130+
``AuthProvider`` construction and can't be read back off the built
131+
provider.
132+
133+
Resolution order:
134+
135+
0. **Ambiguity guards** — reject conflicting auth signals *before*
136+
resolving, so an ambiguous request fails loudly at session-open
137+
rather than silently picking one flow (and failing later as a
138+
confusing 401 against the wrong principal):
139+
- a custom ``credentials_provider`` *and* M2M kwargs together;
140+
- a U2M ``auth_type`` (``databricks-oauth`` / ``azure-oauth``)
141+
*and* ``oauth_client_secret`` together.
142+
1. **OAuth M2M** — ``oauth_client_id`` + ``oauth_client_secret``
143+
both present → forward raw creds to the kernel's ``oauth-m2m``.
144+
2. **PAT** — the built provider is (or wraps) an
145+
``AccessTokenAuthProvider`` → extract the bearer token.
146+
3. **OAuth U2M** — ``auth_type`` is ``databricks-oauth`` /
147+
``azure-oauth`` → forward optional ``oauth_client_id`` /
148+
``oauth_redirect_port`` to the kernel's ``oauth-u2m``.
149+
4. **Custom credentials_provider** → ``NotSupportedError`` (opaque
150+
token source; no raw creds for the kernel to own).
151+
5. Anything else → ``NotSupportedError``.
152+
153+
M2M is checked before PAT so that a workload passing both an
154+
access token *and* M2M creds resolves to the (refreshing) M2M path
155+
rather than a static token. (Token + M2M is not treated as
156+
ambiguous: a PAT is often present as ambient config the caller
157+
didn't intend as the primary credential, whereas an explicit
158+
``oauth_client_secret`` is unambiguous M2M intent.)
104159
"""
160+
opts = auth_options or {}
161+
162+
client_id = opts.get("oauth_client_id")
163+
client_secret = opts.get("oauth_client_secret")
164+
auth_type = opts.get("auth_type")
165+
has_m2m = bool(client_id and client_secret)
166+
167+
# 0. Ambiguity guards — fail before any flow is chosen.
168+
if client_secret and opts.get("credentials_provider") is not None:
169+
raise NotSupportedError(
170+
"Ambiguous auth on use_kernel=True: both a custom "
171+
"credentials_provider and oauth_client_secret were provided. "
172+
"Pass exactly one — oauth_client_id + oauth_client_secret for "
173+
"kernel-managed M2M, or use the Thrift backend (default) for "
174+
"credentials_provider."
175+
)
176+
if client_secret and auth_type in ("databricks-oauth", "azure-oauth"):
177+
raise NotSupportedError(
178+
f"Ambiguous auth on use_kernel=True: auth_type={auth_type!r} selects "
179+
"the U2M browser flow, but oauth_client_secret was also provided "
180+
"(machine-to-machine). Drop oauth_client_secret for U2M, or drop "
181+
"auth_type for M2M."
182+
)
183+
184+
# 1. OAuth M2M — raw client-credentials pair forwarded to the kernel.
185+
if has_m2m:
186+
kwargs: Dict[str, Any] = {
187+
"auth_type": "oauth-m2m",
188+
"client_id": client_id,
189+
"client_secret": client_secret,
190+
}
191+
scopes = _normalize_scopes(opts.get("oauth_scopes"))
192+
if scopes is not None:
193+
kwargs["oauth_scopes"] = scopes
194+
return kwargs
195+
196+
# 2. PAT (including TokenFederationProvider-wrapped PAT).
105197
if _is_pat(auth_provider):
106198
token = _extract_bearer_token(auth_provider)
107199
if not token:
@@ -111,8 +203,66 @@ def kernel_auth_kwargs(auth_provider: AuthProvider) -> Dict[str, Any]:
111203
)
112204
return {"auth_type": "pat", "access_token": token}
113205

206+
# 3. OAuth U2M — browser authorization-code flow; the kernel runs it.
207+
if auth_type in ("databricks-oauth", "azure-oauth"):
208+
kwargs = {"auth_type": "oauth-u2m"}
209+
if client_id:
210+
kwargs["client_id"] = client_id
211+
redirect_port = opts.get("oauth_redirect_port")
212+
if redirect_port is not None:
213+
kwargs["redirect_port"] = int(redirect_port)
214+
scopes = _normalize_scopes(opts.get("oauth_scopes"))
215+
if scopes is not None:
216+
kwargs["oauth_scopes"] = scopes
217+
return kwargs
218+
219+
# 4. Custom credentials_provider — the connector's primary M2M path
220+
# on Thrift/SEA, but unusable on the kernel: it's an opaque token
221+
# source with no extractable client_id/secret, so the kernel
222+
# can't own the token lifecycle. Point the caller at the raw
223+
# M2M kwargs instead.
224+
if opts.get("credentials_provider") is not None:
225+
raise NotSupportedError(
226+
"use_kernel=True does not support a custom credentials_provider. "
227+
"For OAuth machine-to-machine auth, pass oauth_client_id and "
228+
"oauth_client_secret so the kernel can manage the token lifecycle "
229+
"directly; or use the Thrift backend (default) with "
230+
"credentials_provider."
231+
)
232+
233+
# 5. Everything else (including no usable credentials at all —
234+
# ``auth_provider`` is None on the kernel path when no access
235+
# token was supplied and no OAuth kwargs resolved above).
236+
provider_desc = (
237+
type(auth_provider).__name__ if auth_provider is not None else "no credentials"
238+
)
114239
raise NotSupportedError(
115-
f"The kernel backend (use_kernel=True) currently only supports PAT auth, "
116-
f"but got {type(auth_provider).__name__}. Use the Thrift backend "
117-
"(default) for OAuth / federation / custom credential providers."
240+
f"use_kernel=True requires PAT (access_token), OAuth M2M "
241+
f"(oauth_client_id + oauth_client_secret), or OAuth U2M "
242+
f"(auth_type='databricks-oauth' / 'azure-oauth'), but got "
243+
f"{provider_desc} with auth_type={auth_type!r}. Use the Thrift "
244+
"backend (default) for other auth flows."
245+
)
246+
247+
248+
def _normalize_scopes(scopes: Any) -> Optional[list]:
249+
"""Normalise an ``oauth_scopes`` value to a list of strings, or
250+
``None`` to let the kernel apply its defaults.
251+
252+
Accepts a list/tuple of strings or a single space-delimited string
253+
(the shape ``DatabricksOAuthProvider`` stores internally)."""
254+
if scopes is None:
255+
return None
256+
if isinstance(scopes, str):
257+
parts = scopes.split()
258+
return parts or None
259+
if isinstance(scopes, (list, tuple)):
260+
parts = [str(s) for s in scopes if s]
261+
return parts or None
262+
# Anything else (int, dict, bool, …) is a caller error. Fail loudly
263+
# rather than silently dropping the scopes to None and surprising
264+
# the user with default scopes.
265+
raise ProgrammingError(
266+
f"oauth_scopes must be a list/tuple of strings or a space-delimited "
267+
f"string, got {type(scopes).__name__}."
118268
)

0 commit comments

Comments
 (0)