Skip to content

feat(span-processor): mark app root spans#1651

Open
hassiebp wants to merge 3 commits into
mainfrom
add-app-root-span-detection
Open

feat(span-processor): mark app root spans#1651
hassiebp wants to merge 3 commits into
mainfrom
add-app-root-span-detection

Conversation

@hassiebp
Copy link
Copy Markdown
Contributor

@hassiebp hassiebp commented May 12, 2026

What

Adds SDK-side app-root marking for exported spans so Langfuse V4 can default observation tables to the top-most exported observations without relying on the OpenTelemetry root span being exported.

Why

Default and custom span filtering can remove infrastructure or instrumentation spans before export. When that happens, the backend may never receive the original OTEL root span, so filtering the UI by parent_observation_id IS NULL can miss the useful root row. The SDK now marks the best-effort app root on the span before export.

Changes

  • Add langfuse.internal.is_app_root as the internal app-root marker attribute.
  • Track per-trace, processor-local parent export state in LangfuseSpanProcessor.
  • Mark an exported span as an app root when its immediate parent is not expected to export.
  • Use langfuse_trace_id baggage for best-effort distributed suppression after active Langfuse spans start.
  • Avoid self-suppression by reading baggage from the span's parent context and writing baggage only into the child context used for downstream work.
  • Keep internal langfuse_trace_id baggage out of propagated user metadata.
  • Add focused unit coverage for filtered parents, sibling roots, baggage suppression, local parent precedence, and start-time false positive behavior.

Limitations

  • The decision is best-effort and made at span start using currently available filter information.
  • If should_export_span changes its decision by span end, the SDK does not repair already-started descendants.
  • Distributed suppression depends on normal OTEL context and baggage propagation.
  • Processor-local state is cleaned when spans end; never-ended spans can retain state for that trace.

Verification

  • uv run --frozen ruff check langfuse/_client/span_processor.py langfuse/_client/client.py langfuse/_client/propagation.py langfuse/_client/attributes.py tests/unit/test_app_root_detection.py tests/unit/test_propagate_attributes.py
  • uv run --frozen ruff format --check langfuse/_client/span_processor.py langfuse/_client/client.py langfuse/_client/propagation.py langfuse/_client/attributes.py tests/unit/test_app_root_detection.py tests/unit/test_propagate_attributes.py
  • uv run --frozen mypy langfuse --no-error-summary
  • uv run --frozen pytest tests/unit/test_app_root_detection.py tests/unit/test_propagate_attributes.py tests/unit/test_otel.py -q

Greptile Summary

This PR adds best-effort "app root" span detection to the Langfuse SDK so that Langfuse V4 can identify the topmost exported observation per trace even when OTEL infrastructure spans are filtered out before export.

  • LangfuseSpanProcessor now tracks per-trace, per-span export-expectation state in a thread-safe dict, marking a span as langfuse.internal.is_app_root at on_start time when its immediate parent is not expected to export. Cleanup runs in on_end's finally block to ensure the state is always released.
  • client.py injects a langfuse_trace_id baggage entry into the child context after each start_as_current_observation span starts, enabling distributed suppression for downstream services while avoiding self-suppression.
  • The internal baggage key is explicitly excluded from _get_propagated_attributes_from_context to prevent it from surfacing as user metadata.

Confidence Score: 3/5

The feature is logically sound and the threading model is correct, but _get_readable_span relies on a private OTel SDK method that, if ever removed or renamed, will silently disable app-root marking across the entire SDK with only a DEBUG-level log.

The core detection and cleanup logic is well-designed and the new test coverage is thorough. The risk is concentrated in _get_readable_span, which bridges from the mutable Span seen at on_start to the ReadableSpan interface expected by _should_export_span. Using span._readable_span() achieves this, but if the OTel SDK drops the private method the feature degrades invisibly with no warning and no fallback.

langfuse/_client/span_processor.py, specifically the _get_readable_span static method around line 311.

Sequence Diagram

sequenceDiagram
    participant C as Client (start_as_current_observation)
    participant SP as LangfuseSpanProcessor
    participant BS as BatchSpanProcessor

    C->>SP: on_start(span, parent_context)
    SP->>SP: _mark_app_root_candidate(span, parent_context)
    note over SP: Reads langfuse_trace_id from parent_context baggage
    SP->>BS: super().on_start(span, parent_context)
    SP-->>C: returns

    C->>C: attach baggage token (langfuse_trace_id)
    C->>C: yield Langfuse span wrapper

    C->>C: finally: detach baggage token

    C->>SP: on_end(span)
    SP->>SP: project/scope/filter checks
    alt should export
        SP->>BS: super().on_end(span)
    end
    SP->>SP: finally: _cleanup_app_root_state(span)
Loading
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
langfuse/_client/span_processor.py:311-317
**Private OTel SDK API creates silent feature degradation path**

`_readable_span` is a private method on `opentelemetry.sdk.trace.Span`. If the OTel SDK ever renames or removes it, `getattr` returns `None`, the callable check raises `TypeError`, which is caught by the `try/except` in `on_start` at DEBUG level — and the entire `IS_APP_ROOT` feature silently stops working with no user-visible error. Since `opentelemetry.sdk.trace.Span` already implements `ReadableSpan` (it inherits from it), you can replace the private accessor with a direct cast: `cast(ReadableSpan, span)`. This removes the fragile dependency without changing runtime behavior.

### Issue 2 of 2
tests/unit/test_propagate_attributes.py:1638-1641
Import added inside a test method — the team's style rule requires imports to live at the top of the module, not nested inside functions or methods. The pre-existing `from opentelemetry import baggage` and `from opentelemetry import context as otel_context` in this same method also violate the rule, but this PR adds a third one. All three should be hoisted to the module level.

```suggestion
        from opentelemetry import baggage
        from opentelemetry import context as otel_context
```

Reviews (1): Last reviewed commit: "Merge branch 'main' into add-app-root-sp..." | Re-trigger Greptile

Greptile also left 1 inline comment on this PR.

Context used:

  • Rule used - Move imports to the top of the module instead of p... (source)

Learned From
langfuse/langfuse-python#1387

@hassiebp hassiebp marked this pull request as ready for review May 12, 2026 12:54
@github-actions
Copy link
Copy Markdown

@claude review

@hassiebp hassiebp changed the title feat: mark app root spans in SDK feat(span-processor): mark app root spans in SDK May 12, 2026
@hassiebp hassiebp changed the title feat(span-processor): mark app root spans in SDK feat(span-processor): mark app root spans May 12, 2026
Comment thread langfuse/_client/span_processor.py Outdated
Comment on lines +311 to +317
def _get_readable_span(span: Span) -> ReadableSpan:
readable_span = getattr(span, "_readable_span", None)

if not callable(readable_span):
raise TypeError("Span does not expose a readable span snapshot.")

super().on_end(span)
return cast(ReadableSpan, readable_span())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Private OTel SDK API creates silent feature degradation path

_readable_span is a private method on opentelemetry.sdk.trace.Span. If the OTel SDK ever renames or removes it, getattr returns None, the callable check raises TypeError, which is caught by the try/except in on_start at DEBUG level — and the entire IS_APP_ROOT feature silently stops working with no user-visible error. Since opentelemetry.sdk.trace.Span already implements ReadableSpan (it inherits from it), you can replace the private accessor with a direct cast: cast(ReadableSpan, span). This removes the fragile dependency without changing runtime behavior.

Prompt To Fix With AI
This is a comment left during a code review.
Path: langfuse/_client/span_processor.py
Line: 311-317

Comment:
**Private OTel SDK API creates silent feature degradation path**

`_readable_span` is a private method on `opentelemetry.sdk.trace.Span`. If the OTel SDK ever renames or removes it, `getattr` returns `None`, the callable check raises `TypeError`, which is caught by the `try/except` in `on_start` at DEBUG level — and the entire `IS_APP_ROOT` feature silently stops working with no user-visible error. Since `opentelemetry.sdk.trace.Span` already implements `ReadableSpan` (it inherits from it), you can replace the private accessor with a direct cast: `cast(ReadableSpan, span)`. This removes the fragile dependency without changing runtime behavior.

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fffa395845

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +1186 to +1191
if otel_span.is_recording():
context_with_app_root_claim = _set_langfuse_trace_id_in_baggage(
trace_id=self._get_otel_trace_id(otel_span),
context=otel_context_api.get_current(),
)
baggage_token = otel_context_api.attach(context_with_app_root_claim)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Gate trace-id baggage claim on export eligibility

_start_as_current_otel_span_with_processed_media now adds langfuse_trace_id baggage for every recording Langfuse span before any export filtering runs, so downstream services can suppress IS_APP_ROOT even when the upstream parent is later dropped by should_export_span/scope filters. In that setup (custom filtering + distributed propagation), no exported span may end up marked as app root in downstream traces, which defeats the new root-selection behavior. The claim should only be propagated when the originating span is expected to export (or include a way to distinguish non-exporting parents).

Useful? React with 👍 / 👎.

Comment thread langfuse/_client/span_processor.py
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ba553a13db

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +243 to 245
suppressed_by_parent_claim = (
propagated_trace_id == trace_id and parent_state is None
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Honor upstream root claim when local parent is filtered

The suppression check only applies when parent_state is None, so any non-exported local parent span (for example, an auto-instrumented server span filtered by should_export_span) disables suppression even when the incoming context already carries a matching langfuse_trace_id claim. In that case the downstream Langfuse span is marked langfuse.internal.is_app_root again, creating duplicate app roots across services and undermining the distributed de-duplication this change introduces.

Useful? React with 👍 / 👎.

Comment on lines +1184 to +1191
baggage_token = None

if otel_span.is_recording():
context_with_app_root_claim = _set_langfuse_trace_id_in_baggage(
trace_id=self._get_otel_trace_id(otel_span),
context=otel_context_api.get_current(),
)
baggage_token = otel_context_api.attach(context_with_app_root_claim)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 The baggage claim (langfuse_trace_id) is attached in _start_as_current_otel_span_with_processed_media based solely on otel_span.is_recording(), without consulting should_export_span. In a distributed trace where a custom filter drops the parent observation in Service A, Service B still receives the baggage and computes suppressed_by_parent_claim = True in _mark_app_root_candidate — so the downstream root is denied IS_APP_ROOT while the parent never reaches the backend, leaving no span in the trace marked as app root. Gate the baggage attachment on _is_expected_exported_at_start (or equivalent) so non-exporting parents do not propagate a suppression claim.

Extended reasoning...

What the bug is

client.py:1184-1191 attaches the langfuse_trace_id baggage to the current context whenever otel_span.is_recording() returns True. But is_recording() only reflects the sampling decision — it says nothing about whether the user's should_export_span filter will keep the span. The local app-root logic in the span processor uses _is_expected_exported_at_start (which calls should_export_span) to make the export decision, but the cross-process baggage path is set without that gate. This asymmetry breaks the new V4 default-table behavior in a documented, supported scenario: custom should_export_span plus distributed (cross-service) trace propagation.

How it manifests in a distributed trace

Configure Service A with a custom should_export_span that drops a particular span (e.g. parent_obs). Then:

  1. Service Alangfuse.start_as_current_observation(name='parent_obs') creates an SDK Span. is_recording() returns True (sampling decision), even though should_export_span will later return False.
  2. Service A — At client.py:1186, the SDK calls _set_langfuse_trace_id_in_baggage and attaches baggage langfuse_trace_id=T1 to the active context, purely on the strength of is_recording().
  3. Service A — User code makes an outbound HTTP request inside the observation. OTel propagators inject the W3C traceparent (trace_id=T1) AND the baggage header.
  4. Service B — Receives the request. langfuse_tracer.start_span('downstream-root', context=...) starts a span as a remote child of the propagated context. Its trace_id is T1; parent.span_id is the remote parent's id but parent_state is None in Service B's processor (the parent lives in Service A).
  5. Service B — In span_processor._mark_app_root_candidate:
    • format_trace_id(span.context.trace_id) == 'T1'
    • propagated_trace_id == 'T1' (from baggage)
    • parent_state is None (no local record of the remote parent)
    • suppressed_by_parent_claim = (T1 == T1) and (None is None) = True
    • mark_app_root = expected_exported AND not parent_expected_exported AND not suppressed_by_parent_claim = True AND True AND False = False
  6. Service A — At span end, should_export_span(parent_obs) returns False. parent_obs is dropped and never exported.

Net effect: parent_obs is filtered out on Service A, downstream-root is suppressed on Service B by a claim from a parent that will never reach the backend, and no span in the entire distributed trace carries langfuse.internal.is_app_root. The V4 default observation table — which the PR explicitly motivates as the reason for this feature — has no row to surface for this trace.

Why existing code doesn't prevent it

The local case is handled correctly: when Service A starts parent_obs it is recorded in trace_state.spans with expected_exported_at_start=False (because _is_expected_exported_at_start evaluates the filter). So if Service A starts a local child, the processor sees a non-None parent_state whose expected_exported_at_start is False, and correctly marks the child as the app root. The test test_local_baggage_claim_does_not_suppress_child_of_filtered_parent exercises exactly this.

The cross-process baggage path has no equivalent. _set_langfuse_trace_id_in_baggage is called unconditionally for any recording span, so the baggage on the wire effectively says 'a Langfuse parent for trace T1 exists upstream — suppress yourself' even when the upstream parent is filtered out and will never appear in the backend.

Impact

  • Triggered by a documented, supported configuration: any user with a custom should_export_span that drops some observations AND any cross-service HTTP call inside such an observation will hit this.
  • Defeats the feature this PR introduces for exactly the scenario described in the PR motivation ('Default and custom span filtering can remove infrastructure or instrumentation spans before export').
  • Independently flagged by chatgpt-codex-connector at P2 in the PR timeline with the same conclusion and the same suggested fix, which corroborates the analysis.

Fix

Gate the baggage attachment at client.py:1186 on the same predicate the span processor already uses locally: _is_expected_exported_at_start(otel_span). If the span is not expected to export, do not attach the langfuse_trace_id claim — there is no app-root candidate upstream to suppress with. Alternatively, attach the baggage lazily inside the processor once export eligibility has been established (e.g. only when IS_APP_ROOT is actually set on the span). The processor already exposes _is_expected_exported_at_start and uses it for the symmetric local decision, so the fix is mechanical and self-consistent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant