fix(telemetry): prevent spurious OTel context detach errors when async generators are cancelled #4919
mmphego wants to merge 8 commits into google:main
Conversation
…llation

When an async generator using start_as_current_span() is closed from a different asyncio.Task (e.g. asyncio's asyncgen finalizer scheduled via call_soon in the event-loop base context), ContextVar.reset(token) raises ValueError because the token is bound to the task context that created it, not the base context where cleanup runs. otel_context.detach() catches this ValueError internally and logs it at ERROR level, producing noisy "Failed to detach context" errors in any service that cancels an in-flight agent run (e.g. a guardrail agent that blocks a request mid-generation).

Fix: replace start_as_current_span() context managers in async generator bodies with manual span + context management via tracer.start_span() and a new _safe_detach() helper that calls _RUNTIME_CONTEXT.detach() directly, catching ValueError silently rather than letting OTel log a spurious ERROR.

Span data (attributes, timing, relationships) is fully preserved. Only the contextvars state restoration is skipped for generators that are being discarded, which is an acceptable trade-off.

Affected locations:
- tracing.py: _use_native_generate_content_span_stable_semconv
- tracing.py: _use_native_generate_content_span (experimental semconv path)
- base_agent.py: run_async and run_live invoke_agent spans

Fixes google#4894
…r cancellation

Covers the fix introduced for google#4894:
- TestSafeDetach: unit tests for _safe_detach() covering valid token detach, cross-context token absorbed silently (no ERROR log), and DEBUG message emitted
- TestGenerateContentSpanCrossContextClose: verifies both stable and experimental semconv span functions survive close from a different asyncio.Task with no ERROR logs and span data preserved in exports
- TestBaseAgentSpanCrossContextClose: verifies run_async invoke_agent span survives cross-context generator close and is correctly exported
Hi @mmphego, thank you for your contribution! We appreciate you taking the time to submit this pull request. Can you please fix the formatting errors?
- Update test_spans.py mocks from start_as_current_span to start_span
- Patch start_span in test_functional.py span_exporter fixture so
generate_content and invoke_agent spans export correctly
- Fix import ordering in base_agent.py (opentelemetry before relative imports)
- Reformat test_otel_context_detach.py with pyink/isort
@rohityan please run the workflows
Bump!!
Hi @mmphego, thank you for your contribution. We're aware of this issue, and unfortunately this is a core limitation of how ContextVars & Python Async Generators interplay with each other. The solution is to wrap all async generators in I fear replacing the OTel error message with a custom
Summary
Fixes #4894
Eliminates the noisy "ERROR [opentelemetry.context] Failed to detach context" log that appears whenever an in-flight agent run is interrupted mid-generation (e.g. a guardrail that blocks a request after the LLM has already started streaming).

Root Cause
tracer.start_as_current_span() returns a context manager that stores a contextvars.Token on enter and calls ContextVar.reset(token) on exit. A Token can only be reset in the same contextvars.Context object that produced it.

When an async generator using start_as_current_span() is closed from a different asyncio.Task, which happens when asyncio's asyncgen finalizer hook schedules aclose() via call_soon in the event-loop's base context, reset(token) raises ValueError. otel_context.detach() catches this ValueError internally and logs it at ERROR level before returning, so the error is invisible to callers but still appears in logs.

Affected code paths
- tracing.py: _use_native_generate_content_span_stable_semconv (span: generate_content {model})
- tracing.py: _use_native_generate_content_span, experimental semconv (span: generate_content {model})
- base_agent.py: run_async (span: invoke_agent {name})
- base_agent.py: run_live (span: invoke_agent {name})

Fix
Replace "with tracer.start_as_current_span(...)" in async generator bodies with manual span management. _safe_detach() calls _RUNTIME_CONTEXT.detach() directly (bypassing OTel's logging wrapper) and silently absorbs ValueError when cleanup runs in a different context.

Span data is fully preserved (attributes, timing, parent relationships). Only the contextvars state restoration is skipped for generators that are being discarded, which is an acceptable trade-off.

Reproducer
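The underlying failure can be reproduced without OpenTelemetry. The sketch below is an approximation under stated assumptions, not the reproducer from the linked issue: a plain ContextVar replaces the OTel context, and raw_span_generator and reproduce are hypothetical names. It sets the variable inside an async generator, advances the generator in one task, then closes it from a second task so that reset(token) runs in a different Context.

```python
import asyncio
import contextvars

# Assumption: a plain ContextVar stands in for OTel's current-span context.
_current_span = contextvars.ContextVar("current_span", default=None)


async def raw_span_generator():
    """Mimics the set-on-enter / reset-on-exit behaviour of a `with` span."""
    token = _current_span.set("generate_content demo")
    try:
        yield 1
        yield 2
    finally:
        # Raises ValueError when executed in a Context other than the one
        # that produced the token.
        _current_span.reset(token)


async def reproduce():
    agen = raw_span_generator()

    async def first():
        return await agen.__anext__()

    async def close():
        await agen.aclose()

    # Task 1 creates the token inside its own context copy.
    await asyncio.create_task(first())
    try:
        # Task 2 closes the generator from a different context copy.
        await asyncio.create_task(close())
    except ValueError as exc:
        return exc
    return None
```

Here the ValueError propagates out of aclose() because nothing catches it. Per the root-cause description above, otel_context.detach() catches the same exception and logs it at ERROR level instead.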
Testing
Reproducer and fix verification tests added in the linked issue. Key test cases:
- test_should_reproduce_error_when_generator_closed_from_different_task: confirms the bug exists with the raw start_as_current_span pattern
- test_should_not_error_with_adk_safe_span_pattern: confirms the _safe_detach pattern eliminates the error even under cross-context close

Checklist
- run_async and run_live paths fixed