
feat: add OpenLineage request logger extension #19107

Merged
jtuglu1 merged 23 commits into apache:master from mshahid6:add-open-lineage
Jun 16, 2026

Conversation


@mshahid6 mshahid6 commented Mar 7, 2026

Contributor

Description

Added extensions-contrib/openlineage-emitter as a contrib extension that uses the RequestLogger to transform and send lineage information to any OpenLineage-compatible API.
For SQL queries, the SQL text is parsed with the Calcite parser to extract input datasources (FROM clauses, JOINs, CTEs) and output datasources (INSERT INTO). For native queries, table names are read from DataSource.getTableNames(). Native sub-queries spawned by a SQL execution are deduplicated against the SQL-level event.
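The deduplication step can be sketched roughly as follows. This is a minimal illustration rather than the extension's code: it assumes a native query spawned by a SQL execution carries the parent's id in its query context under the key `sqlQueryId`, and the class and method names are hypothetical.

```java
import java.util.Map;

// Hypothetical helper, not taken from the PR.
final class NativeQueryDedup {

    // Assumption: SQL-spawned native queries carry this context key.
    static final String SQL_QUERY_ID_KEY = "sqlQueryId";

    // Returns true when a native query should get its own lineage event,
    // false when the SQL-level event is expected to cover it already.
    static boolean shouldEmitNativeEvent(Map<String, Object> queryContext) {
        if (queryContext == null) {
            return true;
        }
        Object sqlQueryId = queryContext.get(SQL_QUERY_ID_KEY);
        return sqlQueryId == null || sqlQueryId.toString().isEmpty();
    }

    public static void main(String[] args) {
        // Native query spawned by SQL: skipped.
        System.out.println(shouldEmitNativeEvent(Map.<String, Object>of("sqlQueryId", "abc-123")));
        // Standalone native query: emitted.
        System.out.println(shouldEmitNativeEvent(Map.<String, Object>of("timeout", 30000)));
    }
}
```

A stateless check like this avoids tracking previously seen query ids; whether the extension does it this way is not stated in the description.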
Each event includes standard OpenLineage facets (processing_engine, jobType, sql, errorMessage) and custom Druid facets (druid_query_context with user identity and query metadata, druid_query_statistics with duration and bytes).
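For orientation, here is a rough sketch of how such an event could be laid out as nested maps. The facet names come from the description above. Everything else is an assumption for illustration: the namespace, the job name, the field names inside the two Druid facets (`identity`, `durationMs`, `bytes`), and the choice to attach the Druid facets to the run rather than the job. The `_producer` and `_schemaURL` fields that OpenLineage facets normally carry are left out for brevity.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative only: builds an OpenLineage-style run event as nested maps.
final class LineageEventSketch {

    // Hypothetical namespace; the real value depends on the extension's configuration.
    static final String NAMESPACE = "druid://example-cluster";

    static Map<String, Object> dataset(String name) {
        Map<String, Object> d = new LinkedHashMap<>();
        d.put("namespace", NAMESPACE);
        d.put("name", name);
        return d;
    }

    // identity must be non-null; errorMessage may be null for successful queries.
    static Map<String, Object> buildEvent(
            String runId, String eventTime, String sql,
            List<String> inputs, List<String> outputs,
            String identity, long durationMs, long bytes,
            String errorMessage) {
        Map<String, Object> runFacets = new LinkedHashMap<>();
        runFacets.put("processing_engine", Map.of("name", "druid", "version", "unknown"));
        // Field names inside the Druid facets are assumptions.
        runFacets.put("druid_query_context", Map.of("identity", identity));
        runFacets.put("druid_query_statistics", Map.of("durationMs", durationMs, "bytes", bytes));
        if (errorMessage != null) {
            runFacets.put("errorMessage",
                    Map.of("message", errorMessage, "programmingLanguage", "JAVA"));
        }

        Map<String, Object> jobFacets = new LinkedHashMap<>();
        jobFacets.put("jobType",
                Map.of("processingType", "BATCH", "integration", "DRUID", "jobType", "QUERY"));
        jobFacets.put("sql", Map.of("query", sql));

        Map<String, Object> run = new LinkedHashMap<>();
        run.put("runId", runId);
        run.put("facets", runFacets);

        Map<String, Object> job = new LinkedHashMap<>();
        job.put("namespace", NAMESPACE);
        job.put("name", "druid-query"); // hypothetical job name
        job.put("facets", jobFacets);

        Map<String, Object> event = new LinkedHashMap<>();
        event.put("eventType", errorMessage == null ? "COMPLETE" : "FAIL");
        event.put("eventTime", eventTime);
        event.put("run", run);
        event.put("job", job);
        event.put("inputs",
                inputs.stream().map(LineageEventSketch::dataset).collect(Collectors.toList()));
        event.put("outputs",
                outputs.stream().map(LineageEventSketch::dataset).collect(Collectors.toList()));
        return event;
    }

    public static void main(String[] args) {
        System.out.println(buildEvent(
                "0190a1b2-0000-7000-8000-000000000001", "2026-01-01T00:00:00Z",
                "INSERT INTO wiki_daily SELECT * FROM wikipedia",
                List.of("wikipedia"), List.of("wiki_daily"),
                "alice", 120L, 4096L, null));
    }
}
```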

Transport is configurable: CONSOLE (default) logs the event JSON to the Druid log; HTTP sends it as a POST request to an OpenLineage endpoint such as Marquez. The logger can be combined with other request loggers via the composing provider.
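The two transports could look roughly like this. It is a sketch using the JDK's `java.net.http` client, not the extension's actual classes: the type names are hypothetical, the console variant writes to a `PrintStream` instead of the Druid logger, and the Marquez URL in `main` (`http://localhost:5000/api/v1/lineage`) assumes a default local Marquez setup.

```java
import java.io.IOException;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical names throughout; illustrates the CONSOLE vs HTTP choice only.
final class LineageTransportSketch {

    enum TransportType { CONSOLE, HTTP }

    interface Transport {
        void emit(String eventJson);
    }

    // CONSOLE: write the event JSON as one line to the given stream.
    static Transport console(PrintStream out) {
        return out::println;
    }

    // Builds the POST request that the HTTP transport sends.
    static HttpRequest buildRequest(URI endpoint, String eventJson) {
        return HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(eventJson))
                .build();
    }

    // HTTP: POST the event JSON and fail on any status of 300 or above.
    static Transport http(URI endpoint) {
        HttpClient client = HttpClient.newHttpClient();
        return eventJson -> {
            try {
                HttpResponse<String> response = client.send(
                        buildRequest(endpoint, eventJson),
                        HttpResponse.BodyHandlers.ofString());
                if (response.statusCode() >= 300) {
                    throw new IllegalStateException(
                            "Lineage endpoint returned HTTP " + response.statusCode());
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while sending lineage event", e);
            }
        };
    }

    static Transport create(TransportType type, URI endpoint) {
        return type == TransportType.HTTP ? http(endpoint) : console(System.out);
    }

    public static void main(String[] args) {
        // CONSOLE needs no endpoint; swap in TransportType.HTTP with
        // URI.create("http://localhost:5000/api/v1/lineage") to post to a local Marquez.
        create(TransportType.CONSOLE, null).emit("{\"eventType\":\"COMPLETE\"}");
    }
}
```

A production request logger would most likely send asynchronously so that a slow lineage endpoint cannot stall query handling; the blocking `send` here just keeps the sketch short.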

Screenshot 2026-05-13 at 3 21 30 PM

This PR has:

  • been self-reviewed.
  • using the concurrency checklist (Remove this item if the PR doesn't have any relation to concurrency.)
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@jtuglu1 jtuglu1 self-requested a review March 13, 2026 05:22
Comment thread docs/development/extensions-contrib/openlineage-emitter.md Outdated
Comment thread docs/development/extensions-contrib/openlineage-emitter.md Outdated
Comment thread docs/development/extensions-contrib/openlineage-emitter.md
@mshahid6 mshahid6 changed the title feat: add OpenLineage request logger extension feat: add OpenLineage request logger extension Apr 23, 2026

@FrankChen021 FrankChen021 left a comment

Member

Severity Findings
P0 0
P1 0
P2 1
P3 0
Total 1

This is an automated review by Codex GPT-5

@FrankChen021 FrankChen021 left a comment

Member

Severity Findings
P0 0
P1 0
P2 1
P3 0
Total 1

This is an automated review by Codex GPT-5

@FrankChen021 FrankChen021 left a comment

Member

Severity Findings
P0 0
P1 1
P2 0
P3 0
Total 1

Reviewed 11 of 11 changed files.


This is an automated review by Codex GPT-5.5

@FrankChen021 FrankChen021 left a comment

Member

Severity Findings
P0 0
P1 0
P2 1
P3 0
Total 1

Reviewed 11 of 11 changed files.


This is an automated review by Codex GPT-5.5

@FrankChen021 FrankChen021 left a comment

Member

Follow-up handled: the MSQ output extraction now uses DruidSqlParser, normalizes schema/catalog-prefixed datasource targets, skips EXTERN exports, and includes regression coverage for the cases raised.
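As a rough illustration of the normalization described in this follow-up (not the PR's code): assuming the parser yields the INSERT/REPLACE target as a list of identifier parts plus a flag for EXTERN exports, the output datasource could be derived as below. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper; the real extraction goes through DruidSqlParser.
final class OutputTargetSketch {

    // Returns the bare datasource name for a write target, or empty when
    // the statement exports to EXTERN and therefore has no Druid output.
    static Optional<String> outputDatasource(List<String> targetNameParts, boolean externExport) {
        if (externExport || targetNameParts == null || targetNameParts.isEmpty()) {
            return Optional.empty();
        }
        // "druid.wikipedia" and "catalog.druid.wikipedia" both reduce to "wikipedia".
        return Optional.of(targetNameParts.get(targetNameParts.size() - 1));
    }

    public static void main(String[] args) {
        System.out.println(outputDatasource(List.of("druid", "wikipedia"), false));
        System.out.println(outputDatasource(List.of(), true));
    }
}
```

Working on identifier parts rather than splitting a string on dots keeps quoted names that contain dots intact.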

Reviewed 11 of 11 changed files.


This is an automated review by Codex GPT-5.5

@FrankChen021 FrankChen021 left a comment

Member

I have reviewed the code for correctness, edge cases, concurrency, and integration risks; no issues found.

Reviewed 13 of 13 changed files.


This is an automated review by Codex GPT-5.5

@jtuglu1 jtuglu1 added this to the 38.0.0 milestone Jun 16, 2026

@jtuglu1 jtuglu1 left a comment

Contributor

LGTM once CI is green

@jtuglu1 jtuglu1 closed this Jun 16, 2026
@jtuglu1 jtuglu1 reopened this Jun 16, 2026
@jtuglu1 jtuglu1 merged commit 0e991fd into apache:master Jun 16, 2026
38 checks passed
@mshahid6 mshahid6 deleted the add-open-lineage branch June 16, 2026 23:12

5 participants