Skip to content

Add connector details to http headers#28129

Merged
harshach merged 4 commits into
mainfrom
t3code/21217985
May 16, 2026
Merged

Add connector details to http headers#28129
harshach merged 4 commits into
mainfrom
t3code/21217985

Conversation

@harshach
Copy link
Copy Markdown
Collaborator

@harshach harshach commented May 14, 2026

Describe your changes:

Fixes #

I worked on ... because ...

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

High-level design:

N/A — small change.

Tests:

Use cases covered

Unit tests

Backend integration tests

Ingestion integration tests

Playwright (UI) tests

Manual testing performed

UI screen recording / screenshots:

Not applicable.

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • My PR is linked to a GitHub issue via Fixes #<issue-number> above.
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.
  • For UI changes: I attached a screen recording and/or screenshots above.
  • I have added tests (unit / integration / Playwright as applicable) and listed them above.

Summary by Gitar

  • Ingestion client security:
    • Introduced sanitize_user_agent utility to prevent HTTP header injection by stripping control characters and enforcing a 256-character limit.
    • Applied sanitization to User-Agent headers in REST and SSE clients, and serviceName interpolation in MetadataWorkflow.
    • Added robust unit testing in test_user_agent.py to verify header-safe output and default fallback behavior for unsalvageable inputs.
  • Ingestion client update:
    • Added timeout configuration for ClientConfig with default connection and read timeouts of (10, 300) seconds.

This will update automatically on new commits.

Copilot AI review requested due to automatic review settings May 14, 2026 21:11
@harshach harshach requested a review from a team as a code owner May 14, 2026 21:11
@github-actions github-actions Bot added backend safe to test Add this label to run secure Github workflows on PRs labels May 14, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a configurable, workflow-specific HTTP User-Agent header to ingestion clients so OpenMetadata server logs can identify which connector/workflow/service generated each API/SSE request.

Changes:

  • Add user_agent support to the REST and SSE clients and thread it through create_ometa_client.
  • Generate best-effort workflow identifiers (connector + workflow type + optional service/version) from IngestionWorkflow, with a generic fallback in BaseWorkflow.
  • Add unit tests covering user-agent construction and client header behavior.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
ingestion/tests/unit/test_user_agent.py Adds unit tests validating User-Agent generation and REST client header application.
ingestion/src/metadata/workflow/ingestion.py Builds connector-specific User-Agent strings (connector/workflow/service/version) for ingestion workflows.
ingestion/src/metadata/workflow/base.py Passes workflow User-Agent into OM client creation and provides a generic default User-Agent.
ingestion/src/metadata/ingestion/ometa/sse_client.py Applies configured User-Agent to SSE request headers.
ingestion/src/metadata/ingestion/ometa/client.py Adds user_agent to client config and applies it to the requests session.
ingestion/src/metadata/ingestion/ometa/client_utils.py Threads optional user_agent into OpenMetadata client construction via additional client config args.

Comment thread ingestion/src/metadata/ingestion/ometa/client.py
Comment thread ingestion/src/metadata/ingestion/ometa/sse_client.py
Comment thread ingestion/src/metadata/workflow/ingestion.py
edg956
edg956 previously approved these changes May 14, 2026
Copy link
Copy Markdown
Contributor

@edg956 edg956 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, but I'd address the copilot suggestions

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 14, 2026

🟡 Playwright Results — all passed (13 flaky)

✅ 4067 passed · ❌ 0 failed · 🟡 13 flaky · ⏭️ 92 skipped

Shard Passed Failed Flaky Skipped
🟡 Shard 1 297 0 2 4
🟡 Shard 2 754 0 8 14
🟡 Shard 3 780 0 1 7
✅ Shard 4 790 0 0 18
🟡 Shard 5 708 0 1 41
🟡 Shard 6 738 0 1 8
🟡 13 flaky test(s) (passed on retry)
  • Features/CustomizeDetailPage.spec.ts › Glossary Term - customization should work (shard 1, 1 retry)
  • Pages/AuditLogs.spec.ts › should apply both User and EntityType filters simultaneously (shard 1, 1 retry)
  • Features/BulkEditEntity.spec.ts › Glossary (shard 2, 1 retry)
  • Features/DataQuality/ColumnLevelTests.spec.ts › Column Values To Be In Set (shard 2, 1 retry)
  • Features/Glossary/GlossaryHierarchy.spec.ts › should cancel move operation (shard 2, 1 retry)
  • Features/Glossary/GlossaryWorkflow.spec.ts › should display correct status badge color and icon (shard 2, 1 retry)
  • Features/KnowledgeCenterList.spec.ts › Knowledge Center List - Test infinite scroll/pagination (shard 2, 1 retry)
  • Features/KnowledgeCenterTextEditor.spec.ts › Rich Text Editor - Text Formatting (shard 2, 1 retry)
  • Features/KnowledgeCenterTextEditor.spec.ts › Rich Text Editor - Text Formatting (shard 2, 1 retry)
  • Features/KnowledgeCenterTextEditor.spec.ts › Rich Text Editor - Text Formatting (shard 2, 1 retry)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
  • Pages/EntityDataSteward.spec.ts › Tier Add, Update and Remove (shard 5, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

A user-controlled serviceName (or any other value) carrying CR/LF or
other control characters would either crash the request with
requests/httpx InvalidHeader or, worse, smuggle in a second header line.
Strip non-printable ASCII, trim, and cap at 256 chars at every sink
(REST session, SSE stream) and at the source where serviceName is
interpolated. When sanitization leaves nothing usable, drop the header
so the default agent is sent instead of a malformed one.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 16, 2026 04:06
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Mock(spec=ClientConfig) doesn't auto-discover Pydantic v2 model fields,
so once sse_client.stream() started reading config.user_agent every test
in the suite raised AttributeError. Add an explicit None default to the
shared fixture.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented May 16, 2026

Code Review ✅ Approved

Integrates connector details into HTTP headers and improves ingestion security by sanitizing User-Agent strings and configuring client timeouts. No issues found.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@harshach harshach added the To release Will cherry-pick this PR into the release branch label May 16, 2026
@harshach harshach merged commit 3771265 into main May 16, 2026
77 of 78 checks passed
@harshach harshach deleted the t3code/21217985 branch May 16, 2026 18:08
@github-actions
Copy link
Copy Markdown
Contributor

Failed to cherry-pick changes to the 1.12.9 branch.
Please cherry-pick the changes manually.
You can find more details here.

@github-actions
Copy link
Copy Markdown
Contributor

Failed to cherry-pick changes to the 1.13 branch.
Please cherry-pick the changes manually.
You can find more details here.

harshach added a commit that referenced this pull request May 16, 2026
* Add connector details to http headers

* fix(ingestion): sanitize User-Agent before sending to OM server

A user-controlled serviceName (or any other value) carrying CR/LF or
other control characters would either crash the request with
requests/httpx InvalidHeader or, worse, smuggle in a second header line.
Strip non-printable ASCII, trim, and cap at 256 chars at every sink
(REST session, SSE stream) and at the source where serviceName is
interpolated. When sanitization leaves nothing usable, drop the header
so the default agent is sent instead of a malformed one.
harshach added a commit that referenced this pull request May 16, 2026
* Add connector details to http headers

* fix(ingestion): sanitize User-Agent before sending to OM server

A user-controlled serviceName (or any other value) carrying CR/LF or
other control characters would either crash the request with
requests/httpx InvalidHeader or, worse, smuggle in a second header line.
Strip non-printable ASCII, trim, and cap at 256 chars at every sink
(REST session, SSE stream) and at the source where serviceName is
interpolated. When sanitization leaves nothing usable, drop the header
so the default agent is sent instead of a malformed one.
@harshach
Copy link
Copy Markdown
Collaborator Author

cherry-picked into 1.13 or 1.12.9 branches manually

@sonarqubecloud
Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend safe to test Add this label to run secure Github workflows on PRs To release Will cherry-pick this PR into the release branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants