test: Add token span attribute integration tests#312
Open
Pvnn wants to merge 3 commits into
Open
Conversation
|
|
||
| # -- OpenAI streaming -- | ||
|
|
||
| def test_openai_streaming_accumulation(self, patched_span): |
Collaborator
There was a problem hiding this comment.
Either split into two focused tests (one for chunk accumulation, one for finalize → span attributes), or extract a helper that sets up the StreamingWrapper with patches.
Author
There was a problem hiding this comment.
Extracted _make_streaming_wrapper helper and split into focused accumulation and finalization tests.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Overview
Adds integration-style tests for token-to-span attribute assignment across all instrumented LLM providers, covering the full pipeline:
What's tested
39 tests, all passing.
Findings during implementation
1. Zero token values silently dropped (all providers) Extraction uses
usage.get("prompt_tokens") or usage.get("input_tokens")—0is falsy so it resolves toNonebefore anyis not Noneguard runs. A valid zero completion (e.g. safety-blocked response) leaves no token attributes on the span.2. Pydantic AI casts token values to strings
_safe_set_attributecallsstr(value)unconditionally before writing. Token counts land as"100"instead of100. Downstream tools doing arithmetic on token metrics may need to consider this?Test approach
A
patched_spanfixture creates aMagicMockspan, wiresset_attributeto store intospan.attributes, then callsSpanIOProcessor.on_start(span)to apply the aliasing closure — replicating the exact runtime pipeline with no tracer or OTLP overhead.