Add connector details to http headers #28129
Pull request overview
This PR adds a configurable, workflow-specific HTTP User-Agent header to ingestion clients so OpenMetadata server logs can identify which connector/workflow/service generated each API/SSE request.
Changes:
- Add `user_agent` support to the REST and SSE clients and thread it through `create_ometa_client`.
- Generate best-effort workflow identifiers (connector + workflow type + optional service/version) from `IngestionWorkflow`, with a generic fallback in `BaseWorkflow`.
- Add unit tests covering user-agent construction and client header behavior.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| ingestion/tests/unit/test_user_agent.py | Adds unit tests validating User-Agent generation and REST client header application. |
| ingestion/src/metadata/workflow/ingestion.py | Builds connector-specific User-Agent strings (connector/workflow/service/version) for ingestion workflows. |
| ingestion/src/metadata/workflow/base.py | Passes workflow User-Agent into OM client creation and provides a generic default User-Agent. |
| ingestion/src/metadata/ingestion/ometa/sse_client.py | Applies configured User-Agent to SSE request headers. |
| ingestion/src/metadata/ingestion/ometa/client.py | Adds user_agent to client config and applies it to the requests session. |
| ingestion/src/metadata/ingestion/ometa/client_utils.py | Threads optional user_agent into OpenMetadata client construction via additional client config args. |
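For readers without the diff open, here is a minimal sketch of the idea in the table above: build one identifier string from connector, workflow type, and optional service/version, then set it as `User-Agent` only when a value is configured. The helper names, the `openmetadata-ingestion` product token, and the exact string layout are assumptions for illustration; the PR's actual format is not shown on this page.

```python
from typing import Dict, Optional


def build_user_agent(
    connector: str,
    workflow_type: str,
    service_name: Optional[str] = None,
    version: Optional[str] = None,
) -> str:
    """Hypothetical helper: join the available identifiers into one string."""
    product = "openmetadata-ingestion"  # assumed product token, not taken from the PR
    if version:
        product = f"{product}/{version}"
    details = [f"connector={connector}", f"workflow={workflow_type}"]
    if service_name:
        # Service name is optional, so it is appended only when present.
        details.append(f"service={service_name}")
    return f"{product} ({'; '.join(details)})"


def apply_user_agent(headers: Dict[str, str], user_agent: Optional[str]) -> Dict[str, str]:
    """Return a copy of headers with User-Agent set only when one is configured."""
    merged = dict(headers)
    if user_agent:
        merged["User-Agent"] = user_agent
    return merged
```

Leaving the header untouched when `user_agent` is `None` means the HTTP library's default agent is sent, which matches the "generic fallback" behavior described in the overview.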
edg956 left a comment
lgtm, but I'd address the copilot suggestions
🟡 Playwright Results: all passed (13 flaky)
✅ 4067 passed · ❌ 0 failed · 🟡 13 flaky · ⏭️ 92 skipped
🟡 13 flaky test(s) (passed on retry)
How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip # view trace
A user-controlled serviceName (or any other value) carrying CR/LF or other control characters would either crash the request with requests/httpx InvalidHeader or, worse, smuggle in a second header line. Strip non-printable ASCII, trim, and cap at 256 chars at every sink (REST session, SSE stream) and at the source where serviceName is interpolated. When sanitization leaves nothing usable, drop the header so the default agent is sent instead of a malformed one.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
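The sanitization steps in this commit message can be sketched roughly as follows. This is not the PR's implementation, only an illustration of the described behavior (strip non-printable ASCII, trim, cap at 256 characters, return nothing when no usable text remains). The constant name and the regular expression are assumptions.

```python
import re
from typing import Optional

MAX_USER_AGENT_LENGTH = 256  # cap stated in the commit message
# Matches anything outside printable ASCII (space through tilde), including CR and LF.
_NON_PRINTABLE = re.compile(r"[^\x20-\x7E]")


def sanitize_user_agent(value: Optional[str]) -> Optional[str]:
    """Return a header-safe string, or None when nothing usable is left."""
    if not value:
        return None
    cleaned = _NON_PRINTABLE.sub("", value).strip()
    # Truncate, then trim again in case the cut landed right after a space.
    cleaned = cleaned[:MAX_USER_AGENT_LENGTH].strip()
    return cleaned or None
```

A caller would set the header only when the result is not `None`, so an unsalvageable value falls back to the client's default agent rather than producing a malformed header.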
Mock(spec=ClientConfig) doesn't auto-discover Pydantic v2 model fields, so once sse_client.stream() started reading config.user_agent every test in the suite raised AttributeError. Add an explicit None default to the shared fixture.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
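The failure mode described here can be reproduced without Pydantic. `Mock(spec=...)` only allows attributes that `dir()` reports on the spec class, and a field declared purely as an annotation is not one of them. The sketch below uses a stand-in class (`FakeClientConfig` and `read_user_agent` are hypothetical names, not the real `ClientConfig` or SSE client) on the assumption that it mimics how the model's fields are invisible to the mock.

```python
from typing import Optional
from unittest.mock import Mock


class FakeClientConfig:
    """Stand-in for the real config: the field is an annotation only,
    so it does not exist as a class attribute that dir() can see."""

    user_agent: Optional[str]


def read_user_agent(config) -> Optional[str]:
    # Mirrors client code that starts reading config.user_agent.
    return config.user_agent


# Without an explicit value, the spec'd mock rejects the attribute.
bare = Mock(spec=FakeClientConfig)
try:
    read_user_agent(bare)
    raised = False
except AttributeError:
    raised = True

# The fix from the commit: give the shared fixture an explicit None default.
fixed = Mock(spec=FakeClientConfig)
fixed.user_agent = None
```

Setting the attribute works because `spec` restricts reads only; `spec_set` would also reject the assignment.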
Code Review ✅ Approved
Integrates connector details into HTTP headers and improves ingestion security by sanitizing User-Agent strings and configuring client timeouts. No issues found.
Failed to cherry-pick changes to the 1.12.9 branch.
Failed to cherry-pick changes to the 1.13 branch.
* Add connector details to http headers
* fix(ingestion): sanitize User-Agent before sending to OM server

  A user-controlled serviceName (or any other value) carrying CR/LF or other control characters would either crash the request with requests/httpx InvalidHeader or, worse, smuggle in a second header line. Strip non-printable ASCII, trim, and cap at 256 chars at every sink (REST session, SSE stream) and at the source where serviceName is interpolated. When sanitization leaves nothing usable, drop the header so the default agent is sent instead of a malformed one.
cherry-picked into 1.13 or 1.12.9 branches manually



Describe your changes:
Fixes #
I worked on ... because ...
Type of change:
High-level design:
N/A — small change.
Tests:
Use cases covered
Unit tests
Backend integration tests
Ingestion integration tests
Playwright (UI) tests
Manual testing performed
UI screen recording / screenshots:
Not applicable.
Checklist:
- `Fixes <issue-number>: <short explanation>`
- `Fixes #<issue-number>` above.

Summary by Gitar
- `sanitize_user_agent` utility to prevent HTTP header injection by stripping control characters and enforcing a 256-character limit.
- `User-Agent` headers in `REST` and `SSE` clients, and `serviceName` interpolation in `MetadataWorkflow`.
- `test_user_agent.py` to verify header-safe output and default fallback behavior for unsalvageable inputs.
- `timeout` configuration for `ClientConfig` with default connection and read timeouts of `(10, 300)` seconds.