fix(qwen-agent): improve token and nested agent tracing by sipercai · Pull Request #220 · alibaba/loongsuite-python

sipercai · 2026-06-17T06:58:30Z

Summary

Improve Qwen-Agent instrumentation so DashScope-backed chat/model calls and agent runs keep token usage and cleaner agent telemetry.

This extends the original DeepSeek/Qwen-Agent token usage fix with the related Qwen-Agent optimizations from the downstream branch: robust usage extraction across Qwen-Agent metadata shapes, streaming usage preservation, nested agent span support, final-answer-only agent output, and agent-level LLM token rollup.

Changes

Extract token usage from multiple Qwen-Agent/DashScope response shapes, including usage, extra.usage, extra.model_service_info, and top-level model_service_info.
Support dict-like, SDK-object, JSON-string, and namespace-style metadata, including input_tokens/prompt_tokens, output_tokens/completion_tokens, cache-read tokens, and cache-creation tokens.
Apply usage while streaming chunks are consumed and keep the most complete cumulative usage if later chunks omit usage metadata.
Preserve nested invoke_agent spans by replacing the global agent-run boolean guard with a same-instance reentrancy stack.
Roll child LLM token usage onto active agent spans. Parent agent spans intentionally represent the total nested LLM cost of that agent run; global cost aggregation should use LLM spans or trace-level de-duplication rather than summing agent spans.
Record only the final assistant answer as the invoke_agent output, instead of storing intermediate tool calls and tool results as the final agent response.
Add focused coverage for token metadata variants, streaming usage preservation, final agent output filtering, nested agent spans, and agent-level token rollup.
Add a changelog entry under Unreleased > Fixed.

Validation Evidence

Check	Status	Evidence
Approved spec / waiver	pass	User requested direct implementation and PR update for Qwen-Agent DeepSeek token usage plus downstream Qwen-Agent optimizations.
Changed surface	pass	Limited to `loongsuite-instrumentation-qwen-agent` source, tests, and changelog.
Rebase	pass	`git fetch origin --prune && git rebase origin/main` -> current branch is up to date.
Static readiness	pass	`check_loongsuite_pr_readiness.py --repo .`
Syntax / whitespace	pass	`python3 -m py_compile .../utils.py .../patch.py .../test_spans.py`; `git diff --check`
Focused tests	pass	`pytest -q instrumentation-loongsuite/loongsuite-instrumentation-qwen-agent/tests/test_spans.py` (`24 passed`)
Latest package matrix	pass	`tox -c tox-loongsuite.ini -e py312-test-loongsuite-instrumentation-qwen-agent-latest` (`44 passed`)
Oldest package matrix	pass	`tox -c tox-loongsuite.ini -e py312-test-loongsuite-instrumentation-qwen-agent-oldest` (`44 passed`)
Lint	pass	`tox -c tox-loongsuite.ini -e lint-loongsuite-instrumentation-qwen-agent`
Precommit	pass	`tox -e precommit`
Privacy scan	pass	Diff regex scan found no credentials, bearer tokens, local user paths, or API-key literals.
Claude review loop	pass	`/tmp/codex-claude-review/loongsuite-python-agent-fcd27d9b4e/run-20260617-153346`; r1 `findings=0`, r2 process finding resolved by amending the full implementation into HEAD, r3 `findings=0`.
GitHub CI	pending	Updated head will trigger CI after force-push. Existing PR checks were still queued before this update.

Real E2E Matrix

Scenario	Status	Evidence
Non-streaming model call	blocked	Live `deepseek-v3` non-streaming smoke in the temp qwen-agent environment hit framework assertion `use_raw_api only support full stream`; non-streaming token extraction is covered by focused tests and latest/oldest tox.
Streaming model call	pass	Live `deepseek-v3` streaming smoke produced chat spans with `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, and `gen_ai.usage.total_tokens`.
Concurrency	pass	Two concurrent live `deepseek-v3` streaming calls produced isolated chat spans; every chat span had token usage.
Agent / tool / ReAct	pass	Qwen-Agent real/VCR-backed tests cover basic agent run, stream LLM TTFT, non-stream chat, agent non-stream run, multi-turn, ReAct, and tool-call flows in latest and oldest tox matrices.
Tool-heavy	pass	Existing tool-call, ReAct, and span hierarchy tests passed; this change does not alter tool dispatch or schema generation.
Error path	pass	Existing chat, agent, and tool error-path span tests passed; provider error responses do not yield successful usage metadata.

Telemetry Contract

Contract	Status	Evidence
Span kind and usage attributes	pass	Focused tests and live streaming smoke verified LLM chat spans contain usage attributes before span finalization.
Parent-child tree	pass	Span hierarchy tests verify agent -> chat/tool nesting; new nested-agent test verifies child `invoke_agent` span parentage.
Agent output content	pass	`SPAN_ONLY` test verifies `invoke_agent` output records only the final assistant answer, not intermediate tool calls/results.
Content capture modes	pass	`SPAN_ONLY` final-output coverage and live `NO_CONTENT` smoke both kept token attributes independent from content capture.
Concurrency isolation	pass	Live concurrent streaming smoke produced separate chat traces and token usage on every chat span.
Weaver live-check	pass	Weaver live-check on live exported Qwen-Agent spans reported no blocking violations; only non-blocking stability / enum informational findings.

Notes

gen_ai.usage.total_tokens remains computed by the existing GenAI span finalization layer from input and output tokens. This PR sets the source input/output/cache fields and does not duplicate total-token computation.

ralf0131

Review by github-manager-bot

Summary

Three improvements to loongsuite-instrumentation-qwen-agent: (1) record token usage from DashScope response metadata on streaming and non-streaming chat spans, (2) roll up child LLM token usage to parent invoke_agent spans and preserve nested agent spans, (3) record only the final agent answer as output.

Findings

[Info] patch.py — The reentrancy guard is correctly upgraded from a boolean _in_agent_run to an instance-ID-based stack (_agent_run_instance_stack). This fixes the nested-agent suppression bug while still preventing Proxy/Wrapper super-call duplication. The _active_agent_invocations ContextVar tuple-based stack for token rollup is well-designed.
[Info] patch.py — Token usage rollup via _accumulate_llm_usage_on_active_agents is correctly transitive (parent agent accumulates nested LLM costs), and the current_score guard in _apply_usage_to_llm_invocation handles cumulative streaming chunks correctly.
[Warning] utils.py:218-222 — In _convert_qwen_agent_final_output_messages, the fallback return _convert_qwen_messages_to_output_messages(messages) at the end of the function will include all messages (including tool-call messages) if no assistant message with text content is found. This could include intermediate tool-call/function messages in the output. Consider falling back to the last message instead of all messages, or logging a debug warning when no final answer is found.

Tests

Excellent test coverage: streaming/non-streaming token usage, cumulative usage retention across chunks, nested agent span creation with parent-child relationship verification, final-output-only message assertion, and agent token rollup verification. The tests are well-structured with realistic mock objects.

Overall, a solid and well-tested improvement. The one minor warning about the fallback path is non-blocking.

Automated review by github-manager-bot

ralf0131

Review by github-manager-bot

Summary

Re-verification after new commits. Substantially improves Qwen-Agent/DashScope token accounting and agent telemetry: usage extraction across many metadata shapes (with streaming "keep most complete" semantics), nested agent spans via a same-instance reentrancy stack, final-answer-only agent output, and agent-level LLM token rollup. Clean APPROVE.

Findings

[Info] patch.py (reentrancy) — Replacing the global _in_agent_run boolean with an _agent_run_instance_stack of id(instance) is the correct fix: nested runs on different instances are now preserved while same-instance super() calls are still deduplicated. The paired ContextVar tuple with proper .reset() in finally avoids leaks.
[Info] utils.py (_extract_usage_values / _apply_usage_to_llm_invocation) — Robust recursive extraction across dict/SDK-object/JSON-string shapes. The _usage_score guard correctly keeps the most complete cumulative usage when later streaming chunks omit usage metadata, so partial updates can't regress already-captured totals.
[Info] _convert_qwen_agent_final_output_messages — Walking reversed(messages) and returning the first assistant text (skipping tool/function messages) cleanly isolates the final answer; empty-case returns []. Good.

Cross-repo Note

util/opentelemetry-util-genai is untouched; changes are isolated to loongsuite-instrumentation-qwen-agent, so downstream instrumentation plugins are not affected.

Excellent test coverage (token variants, streaming preservation, final-output filtering, nested spans, agent rollup) plus thorough validation evidence. LGTM.

Automated review by github-manager-bot

sipercai force-pushed the liuyu/fix-qwen-agent-token-usage branch from 7fdc966 to a304e11 Compare June 17, 2026 07:48

sipercai changed the title ~~fix(qwen-agent): record chat token usage~~ fix(qwen-agent): improve token and nested agent tracing Jun 17, 2026

github-actions Bot assigned 123liuziming, Cirilla-zmh and ralf0131 Jun 17, 2026

github-actions Bot requested review from 123liuziming, Cirilla-zmh and ralf0131 June 17, 2026 08:10

ralf0131 approved these changes Jun 17, 2026

View reviewed changes

fix(qwen-agent): improve token and nested agent tracing

5088b41

sipercai force-pushed the liuyu/fix-qwen-agent-token-usage branch from a304e11 to 5088b41 Compare June 18, 2026 02:34

ralf0131 approved these changes Jun 18, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(qwen-agent): improve token and nested agent tracing#220

fix(qwen-agent): improve token and nested agent tracing#220
sipercai wants to merge 1 commit into
alibaba:mainfrom
sipercai:liuyu/fix-qwen-agent-token-usage

sipercai commented Jun 17, 2026 •

edited

Loading

Uh oh!

ralf0131 left a comment

Uh oh!

ralf0131 left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

sipercai commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Validation Evidence

Real E2E Matrix

Telemetry Contract

Notes

Uh oh!

ralf0131 left a comment

Choose a reason for hiding this comment

Review by github-manager-bot

Summary

Findings

Tests

Uh oh!

ralf0131 left a comment

Choose a reason for hiding this comment

Review by github-manager-bot

Summary

Findings

Cross-repo Note

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

sipercai commented Jun 17, 2026 •

edited

Loading