feat(span-processor): mark app root spans#1651
Conversation
|
@claude review |
| def _get_readable_span(span: Span) -> ReadableSpan: | ||
| readable_span = getattr(span, "_readable_span", None) | ||
|
|
||
| if not callable(readable_span): | ||
| raise TypeError("Span does not expose a readable span snapshot.") | ||
|
|
||
| super().on_end(span) | ||
| return cast(ReadableSpan, readable_span()) |
There was a problem hiding this comment.
Private OTel SDK API creates silent feature degradation path
_readable_span is a private method on opentelemetry.sdk.trace.Span. If the OTel SDK ever renames or removes it, getattr returns None, the callable check raises TypeError, which is caught by the try/except in on_start at DEBUG level — and the entire IS_APP_ROOT feature silently stops working with no user-visible error. Since opentelemetry.sdk.trace.Span already implements ReadableSpan (it inherits from it), you can replace the private accessor with a direct cast: cast(ReadableSpan, span). This removes the fragile dependency without changing runtime behavior.
Prompt To Fix With AI
This is a comment left during a code review.
Path: langfuse/_client/span_processor.py
Line: 311-317
Comment:
**Private OTel SDK API creates silent feature degradation path**
`_readable_span` is a private method on `opentelemetry.sdk.trace.Span`. If the OTel SDK ever renames or removes it, `getattr` returns `None`, the callable check raises `TypeError`, which is caught by the `try/except` in `on_start` at DEBUG level — and the entire `IS_APP_ROOT` feature silently stops working with no user-visible error. Since `opentelemetry.sdk.trace.Span` already implements `ReadableSpan` (it inherits from it), you can replace the private accessor with a direct cast: `cast(ReadableSpan, span)`. This removes the fragile dependency without changing runtime behavior.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fffa395845
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if otel_span.is_recording(): | ||
| context_with_app_root_claim = _set_langfuse_trace_id_in_baggage( | ||
| trace_id=self._get_otel_trace_id(otel_span), | ||
| context=otel_context_api.get_current(), | ||
| ) | ||
| baggage_token = otel_context_api.attach(context_with_app_root_claim) |
There was a problem hiding this comment.
Gate trace-id baggage claim on export eligibility
_start_as_current_otel_span_with_processed_media now adds langfuse_trace_id baggage for every recording Langfuse span before any export filtering runs, so downstream services can suppress IS_APP_ROOT even when the upstream parent is later dropped by should_export_span/scope filters. In that setup (custom filtering + distributed propagation), no exported span may end up marked as app root in downstream traces, which defeats the new root-selection behavior. The claim should only be propagated when the originating span is expected to export (or include a way to distinguish non-exporting parents).
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ba553a13db
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| suppressed_by_parent_claim = ( | ||
| propagated_trace_id == trace_id and parent_state is None | ||
| ) |
There was a problem hiding this comment.
Honor upstream root claim when local parent is filtered
The suppression check only applies when parent_state is None, so any non-exported local parent span (for example, an auto-instrumented server span filtered by should_export_span) disables suppression even when the incoming context already carries a matching langfuse_trace_id claim. In that case the downstream Langfuse span is marked langfuse.internal.is_app_root again, creating duplicate app roots across services and undermining the distributed de-duplication this change introduces.
Useful? React with 👍 / 👎.
| baggage_token = None | ||
|
|
||
| if otel_span.is_recording(): | ||
| context_with_app_root_claim = _set_langfuse_trace_id_in_baggage( | ||
| trace_id=self._get_otel_trace_id(otel_span), | ||
| context=otel_context_api.get_current(), | ||
| ) | ||
| baggage_token = otel_context_api.attach(context_with_app_root_claim) |
There was a problem hiding this comment.
🔴 The baggage claim (langfuse_trace_id) is attached in _start_as_current_otel_span_with_processed_media based solely on otel_span.is_recording(), without consulting should_export_span. In a distributed trace where a custom filter drops the parent observation in Service A, Service B still receives the baggage and computes suppressed_by_parent_claim = True in _mark_app_root_candidate — so the downstream root is denied IS_APP_ROOT while the parent never reaches the backend, leaving no span in the trace marked as app root. Gate the baggage attachment on _is_expected_exported_at_start (or equivalent) so non-exporting parents do not propagate a suppression claim.
Extended reasoning...
What the bug is
client.py:1184-1191 attaches the langfuse_trace_id baggage to the current context whenever otel_span.is_recording() returns True. But is_recording() only reflects the sampling decision — it says nothing about whether the user's should_export_span filter will keep the span. The local app-root logic in the span processor uses _is_expected_exported_at_start (which calls should_export_span) to make the export decision, but the cross-process baggage path is set without that gate. This asymmetry breaks the new V4 default-table behavior in a documented, supported scenario: custom should_export_span plus distributed (cross-service) trace propagation.
How it manifests in a distributed trace
Configure Service A with a custom should_export_span that drops a particular span (e.g. parent_obs). Then:
- Service A —
langfuse.start_as_current_observation(name='parent_obs')creates an SDKSpan.is_recording()returnsTrue(sampling decision), even thoughshould_export_spanwill later returnFalse. - Service A — At
client.py:1186, the SDK calls_set_langfuse_trace_id_in_baggageand attaches baggagelangfuse_trace_id=T1to the active context, purely on the strength ofis_recording(). - Service A — User code makes an outbound HTTP request inside the observation. OTel propagators inject the W3C
traceparent(trace_id=T1) AND the baggage header. - Service B — Receives the request.
langfuse_tracer.start_span('downstream-root', context=...)starts a span as a remote child of the propagated context. Itstrace_idis T1;parent.span_idis the remote parent's id butparent_stateisNonein Service B's processor (the parent lives in Service A). - Service B — In
span_processor._mark_app_root_candidate:format_trace_id(span.context.trace_id) == 'T1'propagated_trace_id == 'T1'(from baggage)parent_state is None(no local record of the remote parent)suppressed_by_parent_claim = (T1 == T1) and (None is None) = Truemark_app_root = expected_exported AND not parent_expected_exported AND not suppressed_by_parent_claim = True AND True AND False = False
- Service A — At span end,
should_export_span(parent_obs)returnsFalse.parent_obsis dropped and never exported.
Net effect: parent_obs is filtered out on Service A, downstream-root is suppressed on Service B by a claim from a parent that will never reach the backend, and no span in the entire distributed trace carries langfuse.internal.is_app_root. The V4 default observation table — which the PR explicitly motivates as the reason for this feature — has no row to surface for this trace.
Why existing code doesn't prevent it
The local case is handled correctly: when Service A starts parent_obs it is recorded in trace_state.spans with expected_exported_at_start=False (because _is_expected_exported_at_start evaluates the filter). So if Service A starts a local child, the processor sees a non-None parent_state whose expected_exported_at_start is False, and correctly marks the child as the app root. The test test_local_baggage_claim_does_not_suppress_child_of_filtered_parent exercises exactly this.
The cross-process baggage path has no equivalent. _set_langfuse_trace_id_in_baggage is called unconditionally for any recording span, so the baggage on the wire effectively says 'a Langfuse parent for trace T1 exists upstream — suppress yourself' even when the upstream parent is filtered out and will never appear in the backend.
Impact
- Triggered by a documented, supported configuration: any user with a custom
should_export_spanthat drops some observations AND any cross-service HTTP call inside such an observation will hit this. - Defeats the feature this PR introduces for exactly the scenario described in the PR motivation ('Default and custom span filtering can remove infrastructure or instrumentation spans before export').
- Independently flagged by
chatgpt-codex-connectorat P2 in the PR timeline with the same conclusion and the same suggested fix, which corroborates the analysis.
Fix
Gate the baggage attachment at client.py:1186 on the same predicate the span processor already uses locally: _is_expected_exported_at_start(otel_span). If the span is not expected to export, do not attach the langfuse_trace_id claim — there is no app-root candidate upstream to suppress with. Alternatively, attach the baggage lazily inside the processor once export eligibility has been established (e.g. only when IS_APP_ROOT is actually set on the span). The processor already exposes _is_expected_exported_at_start and uses it for the symmetric local decision, so the fix is mechanical and self-consistent.
What
Adds SDK-side app-root marking for exported spans so Langfuse V4 can default observation tables to the top-most exported observations without relying on the OpenTelemetry root span being exported.
Why
Default and custom span filtering can remove infrastructure or instrumentation spans before export. When that happens, the backend may never receive the original OTEL root span, so filtering the UI by
parent_observation_id IS NULLcan miss the useful root row. The SDK now marks the best-effort app root on the span before export.Changes
langfuse.internal.is_app_rootas the internal app-root marker attribute.LangfuseSpanProcessor.langfuse_trace_idbaggage for best-effort distributed suppression after active Langfuse spans start.langfuse_trace_idbaggage out of propagated user metadata.Limitations
should_export_spanchanges its decision by span end, the SDK does not repair already-started descendants.Verification
uv run --frozen ruff check langfuse/_client/span_processor.py langfuse/_client/client.py langfuse/_client/propagation.py langfuse/_client/attributes.py tests/unit/test_app_root_detection.py tests/unit/test_propagate_attributes.pyuv run --frozen ruff format --check langfuse/_client/span_processor.py langfuse/_client/client.py langfuse/_client/propagation.py langfuse/_client/attributes.py tests/unit/test_app_root_detection.py tests/unit/test_propagate_attributes.pyuv run --frozen mypy langfuse --no-error-summaryuv run --frozen pytest tests/unit/test_app_root_detection.py tests/unit/test_propagate_attributes.py tests/unit/test_otel.py -qGreptile Summary
This PR adds best-effort "app root" span detection to the Langfuse SDK so that Langfuse V4 can identify the topmost exported observation per trace even when OTEL infrastructure spans are filtered out before export.
LangfuseSpanProcessornow tracks per-trace, per-span export-expectation state in a thread-safe dict, marking a span aslangfuse.internal.is_app_rootaton_starttime when its immediate parent is not expected to export. Cleanup runs inon_end'sfinallyblock to ensure the state is always released.client.pyinjects alangfuse_trace_idbaggage entry into the child context after eachstart_as_current_observationspan starts, enabling distributed suppression for downstream services while avoiding self-suppression._get_propagated_attributes_from_contextto prevent it from surfacing as user metadata.Confidence Score: 3/5
The feature is logically sound and the threading model is correct, but _get_readable_span relies on a private OTel SDK method that, if ever removed or renamed, will silently disable app-root marking across the entire SDK with only a DEBUG-level log.
The core detection and cleanup logic is well-designed and the new test coverage is thorough. The risk is concentrated in _get_readable_span, which bridges from the mutable Span seen at on_start to the ReadableSpan interface expected by _should_export_span. Using span._readable_span() achieves this, but if the OTel SDK drops the private method the feature degrades invisibly with no warning and no fallback.
langfuse/_client/span_processor.py, specifically the _get_readable_span static method around line 311.
Sequence Diagram
sequenceDiagram participant C as Client (start_as_current_observation) participant SP as LangfuseSpanProcessor participant BS as BatchSpanProcessor C->>SP: on_start(span, parent_context) SP->>SP: _mark_app_root_candidate(span, parent_context) note over SP: Reads langfuse_trace_id from parent_context baggage SP->>BS: super().on_start(span, parent_context) SP-->>C: returns C->>C: attach baggage token (langfuse_trace_id) C->>C: yield Langfuse span wrapper C->>C: finally: detach baggage token C->>SP: on_end(span) SP->>SP: project/scope/filter checks alt should export SP->>BS: super().on_end(span) end SP->>SP: finally: _cleanup_app_root_state(span)Prompt To Fix All With AI
Reviews (1): Last reviewed commit: "Merge branch 'main' into add-app-root-sp..." | Re-trigger Greptile
Context used:
Learned From
langfuse/langfuse-python#1387