fix(bigquery): GCP service account impersonation across BigQuery ADC and SQLAlchemy paths #29683
harshsoni2024 wants to merge 5 commits into
… credential types
BigQuery service account impersonation was only applied to the BigQuery
Python client and only when using JSON-key credentials (GcpCredentialsValues).
Two gaps caused impersonation to be silently ignored:
- The SQLAlchemy engine (used by Test Connection and all information_schema
reads) never received impersonated credentials, so those queries always ran
under the base identity.
- The Python-client path gated impersonation behind an isinstance check on
GcpCredentialsValues, excluding ADC.
gcpImpersonateServiceAccount lives on the parent credentials object and is
valid regardless of the selected gcpConfig type, so impersonation is now
resolved independently of the type across both paths:
- Add get_impersonate_client_kwargs() in helper.py and use it in
get_inspector_details so ADC and JSON-key both impersonate the Python client.
- Add a BigQuery-specific get_connection_args() that injects a pre-built
impersonated bigquery.Client into the SQLAlchemy engine via the dialect's
documented connect_args={'client': ...} hook, only when a target service
account is configured.
When no target service account is set, behavior is unchanged: connect args
equal the common args with no client injected.
Fixes #28204
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pull request overview
Fixes BigQuery connector service-account impersonation so it is applied consistently across both BigQuery access paths: the SQLAlchemy bigquery:// engine (test connection + information_schema queries) and the google.cloud.bigquery.Client usage (datasets/policy tags). This addresses #28204 where ADC (and parts of JSON-key flows) were silently running under the base identity instead of the configured impersonated service account.
Changes:
- Adds a type-independent `get_impersonate_client_kwargs()` helper and uses it when building the BigQuery Python client.
- Adds BigQuery-specific `get_connection_args()` to inject a pre-built impersonated `bigquery.Client` into the `sqlalchemy-bigquery` dialect via `connect_args={'client': ...}` when impersonation is configured.
- Introduces unit tests covering ADC + JSON-key impersonation behavior across both paths and ensuring no behavior change when impersonation is not configured.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| ingestion/src/metadata/ingestion/source/database/bigquery/helper.py | Centralizes impersonation kwargs generation and applies it to the BigQuery Python client path without gating on credential type. |
| ingestion/src/metadata/ingestion/source/database/bigquery/connection.py | Injects an impersonated BigQuery client into SQLAlchemy connect args when impersonation is configured, ensuring engine-driven queries run under the target SA. |
| ingestion/tests/unit/topology/database/test_bigquery_impersonation.py | Adds focused unit coverage for impersonation on both SQLAlchemy and Python client paths, for ADC and JSON-key, including regression guards. |
…eview
- Treat whitespace-only impersonate email as unset (strip before guard)
- Warn when impersonated client project id can't be resolved from config
- Tighten get_impersonate_client_kwargs docstring; add path-scoping test
🟡 Playwright Results: all passed (25 flaky)
✅ 4478 passed · ❌ 0 failed · 🟡 25 flaky · ⏭️ 38 skipped
🟡 25 flaky test(s) (passed on retry)
How to debug locally:
    # Download playwright-test-results-<shard> artifact and unzip
    npx playwright show-trace path/to/trace.zip    # view trace
    project_id = None
    gcp_config = connection.credentials.gcpConfig
    config_project_id = getattr(gcp_config, "projectId", None)
    if config_project_id is not None:
        if isinstance(config_project_id, SingleProjectId):
            project_id = config_project_id.root
        elif isinstance(config_project_id, MultipleProjectId) and config_project_id.root:
            project_id = config_project_id.root[0]
    return project_id
@harshsoni2024 can you check if this is relevant?
Code Review ✅ Approved (1 resolved / 1 findings)
Enables service account impersonation for BigQuery across both ADC and JSON-key paths, resolving connection errors during ingestion. The implementation also addresses potential scoping issues when the project ID is missing.
✅ 1 resolved
✅ Edge Case: Impersonated engine client may be scoped to None project
Fixes #28204
Issue
When the BigQuery connector is configured with a Target Service Account Email
(impersonation), operations run under the base identity instead of the
impersonated one, producing `403 ... does not have bigquery.jobs.create`. This
affects both ADC and JSON-key setups (issue #28204).

`gcpImpersonateServiceAccount` sits on the parent `GCPCredentials` object, so it's
valid with any `gcpConfig` type. But the connector only partially honored it, and
BigQuery has two independent access paths:

- SQLAlchemy `bigquery://` engine (Test Connection and all `information_schema` metadata queries): `set_google_credentials()` only sets the base credentials env var / logs for ADC; it never applies impersonation. The engine was built with the common connect args, so it could never impersonate, for any credential type.
- Python `bigquery.Client` (dataset listing, policy tags): `get_inspector_details` did forward impersonation, but only inside an `isinstance(..., GcpCredentialsValues)` guard, so ADC was excluded.

Net result (reporter's table): ADC ignores impersonation everywhere; JSON-key
impersonates the Python client but still runs its actual SQL as the base SA.
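
The gating problem on the Python-client path can be reproduced with a minimal sketch. This is not the connector's code: the classes are hypothetical stand-ins for the generated credential models. Only the names `GcpCredentialsValues`, `gcpConfig`, and `gcpImpersonateServiceAccount` come from this PR; `GcpADC`, `Impersonation`, the fields `impersonateServiceAccount` and `lifetime`, and the function `gated_kwargs` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class GcpCredentialsValues:  # stand-in for the JSON-key config type
    projectId: str


@dataclass
class GcpADC:  # hypothetical stand-in for the ADC config type
    pass


@dataclass
class Impersonation:  # assumed shape of gcpImpersonateServiceAccount
    impersonateServiceAccount: str
    lifetime: int = 3600


@dataclass
class GCPCredentials:  # parent object that carries the impersonation target
    gcpConfig: Any
    gcpImpersonateServiceAccount: Optional[Impersonation] = None


def gated_kwargs(credentials: GCPCredentials) -> Dict[str, Any]:
    """Pre-fix shape: impersonation is only read for JSON-key configs."""
    kwargs: Dict[str, Any] = {}
    if isinstance(credentials.gcpConfig, GcpCredentialsValues):
        target = credentials.gcpImpersonateServiceAccount
        if target:
            kwargs["impersonate_service_account"] = target.impersonateServiceAccount
            kwargs["lifetime"] = target.lifetime
    return kwargs


target = Impersonation("reader@example-project.iam.gserviceaccount.com")
json_key = GCPCredentials(GcpCredentialsValues("example-project"), target)
adc = GCPCredentials(GcpADC(), target)

print(gated_kwargs(json_key))  # impersonation kwargs are present
print(gated_kwargs(adc))       # empty dict: the target SA is silently dropped
```

Both credential objects carry the same target, yet only the JSON-key one produces impersonation kwargs, which matches the ADC row of the reporter's table for the Python client.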
Fix
Resolve impersonation independently of the `gcpConfig` type, on both paths,
mirroring the existing Google Drive / Airflow connectors:

- `helper.py`: extract `get_impersonate_client_kwargs()` (type-independent; reads `gcpImpersonateServiceAccount` off the credentials) and use it in `get_inspector_details`, so ADC and JSON-key both impersonate the Python client.
- `connection.py`: add a BigQuery-specific `get_connection_args()` that, when a target SA is configured, builds an impersonated `bigquery.Client` via the existing `get_bigquery_client(...)` and injects it through the `sqlalchemy-bigquery` dialect's documented `connect_args={'client': ...}` hook. `_get_client` now uses this instead of `get_connection_args_common`.

The impersonated client is scoped to `billingProjectId` when set, otherwise to the
credentials project id.
No impact on current behavior
Impersonation logic only activates when `gcpImpersonateServiceAccount` is set with
a non-empty email. When it isn't:

- `get_impersonate_client_kwargs()` returns `{}` → Python client built exactly as before.
- `get_connection_args()` returns the common args with no `client` injected → engine built exactly as before.
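
The no-op guard can be sketched as follows. This is an illustration rather than the helper shipped in `helper.py`: the function name `impersonation_kwargs_sketch`, the field names `impersonateServiceAccount` and `lifetime`, and the returned keyword names are assumptions, and `SimpleNamespace` objects stand in for the real credential models.

```python
from types import SimpleNamespace
from typing import Any, Dict


def impersonation_kwargs_sketch(credentials: Any) -> Dict[str, Any]:
    """Return client kwargs for impersonation, or {} when none is configured."""
    target = getattr(credentials, "gcpImpersonateServiceAccount", None)
    # Strip before the guard so a whitespace-only email counts as unset.
    email = (getattr(target, "impersonateServiceAccount", None) or "").strip()
    if not email:
        return {}
    kwargs: Dict[str, Any] = {"impersonate_service_account": email}
    lifetime = getattr(target, "lifetime", None)
    if lifetime is not None:
        kwargs["lifetime"] = lifetime
    return kwargs


unset = SimpleNamespace(gcpImpersonateServiceAccount=None)
blank = SimpleNamespace(
    gcpImpersonateServiceAccount=SimpleNamespace(
        impersonateServiceAccount="   ", lifetime=3600
    )
)
configured = SimpleNamespace(
    gcpImpersonateServiceAccount=SimpleNamespace(
        impersonateServiceAccount=" reader@example-project.iam.gserviceaccount.com ",
        lifetime=3600,
    )
)

print(impersonation_kwargs_sketch(unset))  # {}
print(impersonation_kwargs_sketch(blank))  # {}
print(impersonation_kwargs_sketch(configured))
```

Because the function never inspects `gcpConfig`, the same result is produced for ADC and JSON-key credentials, and callers can treat an empty dictionary as "build the client exactly as before".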
Testing
- `test_bigquery_impersonation.py` (12 tests) covering both paths for ADC and JSON-key: impersonation kwargs are type-independent, an impersonated client is injected into the engine and scoped to the right project (billing project takes precedence), the client is built with `impersonated_credentials`, empty/blank target email is ignored, and, as the regression guard, connect args are unchanged and no client is injected when impersonation is absent.
- Existing suites (`test_bigquery.py`, `test_bigquery_test_connection.py`, `test_bigquery_incremental_table_processor.py`) plus `test_credentials.py`.
- `make py_format_check` clean; `basedpyright` reports 0 errors on the changed files.

Manual reproduction (for reviewers with a GCP setup)

With a base SA granted `roles/iam.serviceAccountTokenCreator` on a target reader SA,
add to the connector config:
yaml
Type of change:
High-level design:
N/A — small change.
Tests:
Use cases covered
Unit tests
Backend integration tests
Ingestion integration tests
Playwright (UI) tests
Manual testing performed
UI screen recording / screenshots:
Not applicable.
Checklist:
Greptile Summary
This PR fixes a bug where BigQuery service account impersonation was ignored by both the SQLAlchemy engine path and, for ADC configurations, the Python `bigquery.Client` path. The root cause was that impersonation handling was incorrectly gated on `isinstance(..., GcpCredentialsValues)`, excluding ADC, and was never applied to the SQLAlchemy `bigquery://` engine at all.

- `helper.py`: Extracts a new `get_impersonate_client_kwargs()` helper that reads `gcpImpersonateServiceAccount` off the credentials object regardless of `gcpConfig` type, with proper stripping of whitespace-only emails; replaces the old type-guarded block in `get_inspector_details`.
- `connection.py`: Adds a BigQuery-specific `get_connection_args()` that, when impersonation is configured, builds an impersonated `bigquery.Client` and injects it via the `sqlalchemy-bigquery` dialect's `connect_args={'client': ...}` hook, ensuring every query through the engine runs under the target identity; `_get_client()` is updated to use this instead of `get_connection_args_common`.
- `test_bigquery_impersonation.py`: Adds 12 targeted unit tests covering both ADC and JSON-key paths, billing project precedence, blank email edge cases, and a regression guard ensuring no client is injected when impersonation is absent.

Confidence Score: 5/5
Safe to merge — the change is narrowly scoped to impersonation handling and is a no-op when the impersonation field is absent, preserving all existing behavior.
Both changed code paths (SQLAlchemy engine and Python client) are guarded by a truthiness check on the impersonation kwargs, so there is no impact on connectors that do not use `gcpImpersonateServiceAccount`. The new `get_impersonate_client_kwargs` helper correctly strips whitespace, and the project-ID resolution path covers all credential types (ADC, JSON key, credential path) with a fallback warning when none can be determined. The 12 new unit tests cover the regression guard and all meaningful edge cases.
No files require special attention.
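
For reference, the project-ID resolution with its fallback warning might look roughly like the sketch below. It extends the snippet quoted in the review thread under assumptions: `SingleProjectId` and `MultipleProjectId` are re-declared here as simple stand-ins with a `root` attribute, `resolve_client_project` is a hypothetical name, and the warning text is invented for the example.

```python
import logging
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, List, Optional

logger = logging.getLogger(__name__)


@dataclass
class SingleProjectId:  # stand-in for the generated model of the same name
    root: str


@dataclass
class MultipleProjectId:  # stand-in for the generated model of the same name
    root: List[str]


def resolve_client_project(
    billing_project_id: Optional[str], gcp_config: Any
) -> Optional[str]:
    """Pick the project for the impersonated client; warn if none is found."""
    if billing_project_id:
        return billing_project_id  # billing project takes precedence
    config_project_id = getattr(gcp_config, "projectId", None)
    project_id = None
    if isinstance(config_project_id, SingleProjectId):
        project_id = config_project_id.root
    elif isinstance(config_project_id, MultipleProjectId) and config_project_id.root:
        project_id = config_project_id.root[0]  # first configured project
    if project_id is None:
        logger.warning(
            "Could not resolve a project id for the impersonated client; "
            "the client library will fall back to its own default."
        )
    return project_id


single = SimpleNamespace(projectId=SingleProjectId("creds-project"))
multiple = SimpleNamespace(projectId=MultipleProjectId(["first", "second"]))
adc_like = SimpleNamespace()  # no projectId attribute at all
```

Returning `None` with a warning, rather than raising, keeps the behavior permissive for configurations where the project can only be discovered at runtime.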
Important Files Changed
Reviews (4): Last reviewed commit: "Merge branch 'main' into fix/bigquery-sa..."