
fix: capture claude agent sdk session ids #222

Draft
sipercai wants to merge 1 commit into main from fix/claude-agent-sdk-session-id


@sipercai

Collaborator

Description

This PR captures Claude Agent SDK session IDs on agent, LLM, and tool spans so traces from resumed or client-managed SDK sessions can be correlated by gen_ai.session.id.

The change propagates session IDs from SDK init/system messages, stream events, ClaudeSDKClient.query(..., session_id=...), standalone query(..., options.resume=...), and result-message fallbacks. When an upstream Entry span has already propagated gen_ai.session.id through OpenTelemetry Baggage, that Entry session takes priority over the Claude SDK's internal session, so downstream spans keep the request-level LoongSuite identity.

It also preserves the caller's active OpenTelemetry context instead of replacing it with an empty context, matching the Robin fix for broken parent-child trace linkage.

Finally, per-stream tool state is kept local, so concurrent streams that reuse tool IDs do not cross-contaminate each other's session or trace state.
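A minimal sketch of the session-ID precedence described above, assuming this order: Entry baggage first, then an explicit session_id or resume value, then the init/system message, then stream events, and finally the result-message fallback. The function resolve_session_id and its parameter names are hypothetical and are not the plugin's actual API. In real instrumentation the entry_baggage mapping would come from OpenTelemetry Baggage (for example opentelemetry.baggage.get_all()).

```python
"""Hypothetical sketch of session-ID precedence; not the plugin's real code."""

from typing import Mapping, Optional

# Attribute/baggage key named in this PR.
SESSION_KEY = "gen_ai.session.id"


def resolve_session_id(
    entry_baggage: Mapping[str, str],
    explicit_session_id: Optional[str] = None,  # ClaudeSDKClient.query(..., session_id=...)
    resume_session_id: Optional[str] = None,    # query(..., options.resume=...)
    init_session_id: Optional[str] = None,      # SDK init/system message
    stream_session_id: Optional[str] = None,    # stream event
    result_session_id: Optional[str] = None,    # result-message fallback
) -> Optional[str]:
    """Return the first non-empty session ID in the assumed priority order."""
    candidates = (
        entry_baggage.get(SESSION_KEY),  # upstream Entry span wins over SDK sessions
        explicit_session_id,
        resume_session_id,
        init_session_id,
        stream_session_id,
        result_session_id,  # used only when nothing earlier is known
    )
    for candidate in candidates:
        if candidate:
            return candidate
    return None


if __name__ == "__main__":
    # Entry baggage overrides the SDK's internal session.
    print(resolve_session_id({SESSION_KEY: "entry-123"}, init_session_id="sdk-abc"))  # entry-123
    # Without baggage, the init message beats the result-message fallback.
    print(resolve_session_id({}, init_session_id="sdk-abc", result_session_id="sdk-xyz"))  # sdk-abc
```

Keeping the order in a single tuple makes the precedence easy to audit and to change in one place if the intended ordering among the SDK sources differs from this assumption.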

Fixes # (N/A)

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

  • python $PIPELINE_SKILL_DIR/scripts/check_loongsuite_pr_readiness.py --repo .
  • tox -e precommit
  • OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=SPAN_ONLY python -m pytest instrumentation-loongsuite/loongsuite-instrumentation-claude-agent-sdk/tests/test_session_capture.py -q
  • OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=SPAN_ONLY python -m pytest instrumentation-loongsuite/loongsuite-instrumentation-claude-agent-sdk/tests -q -m "not requires_cli"
  • ANTHROPIC_BASE_URL=https://dashscope.aliyuncs.com/apps/anthropic ANTHROPIC_API_KEY=<configured> OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=SPAN_ONLY python -m pytest instrumentation-loongsuite/loongsuite-instrumentation-claude-agent-sdk/tests/test_attributes.py::test_span_attributes_semantic_conventions -q -s
  • ANTHROPIC_BASE_URL=https://dashscope.aliyuncs.com/apps/anthropic ANTHROPIC_API_KEY=<configured> OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental python -m pytest instrumentation-loongsuite/loongsuite-instrumentation-claude-agent-sdk/tests/test_attributes.py::test_span_attributes_no_sensitive_data -q
  • Bounded live Claude Agent SDK smoke with a real provider-compatible endpoint and one Read tool call.
  • Bounded live Claude Agent SDK concurrency smoke with two simultaneous queries.
  • Weaver JSON sample live-check with weaver registry live-check -r <loongsuite-semantic-conventions>/model --advice-profile loongsuite-genai --input-format json.

Validation Evidence

Spec and Scope

  • Linked issue/spec: No GitHub issue; customer-reported Claude Agent SDK session-capture gap.
  • Approved spec/comment: Direct bug-fix request and PR submission approval in the implementation thread.
  • Changed surface: loongsuite-instrumentation-claude-agent-sdk runtime patch, focused session tests, and plugin changelog.

Local Checks

| Check | Command | Result | Notes |
| --- | --- | --- | --- |
| Static readiness | `python $PIPELINE_SKILL_DIR/scripts/check_loongsuite_pr_readiness.py --repo .` | pass | LoongSuite pipeline static readiness checker passed. |
| Review matrix planning | `python $PIPELINE_SKILL_DIR/scripts/plan_review_matrix.py --repo . --format markdown` | pass | Matrix identified GenAI agent/session telemetry coverage requirements. |
| Precommit | `tox -e precommit` | pass | Repository formatting and lint hooks passed. |
| Focused session tests | `OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=SPAN_ONLY python -m pytest instrumentation-loongsuite/loongsuite-instrumentation-claude-agent-sdk/tests/test_session_capture.py -q` | pass | 13 session propagation tests passed. |
| Parent context preservation | `OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=SPAN_ONLY python -m pytest instrumentation-loongsuite/loongsuite-instrumentation-claude-agent-sdk/tests/test_session_capture.py::test_wrap_query_preserves_active_parent_context -q` | pass | Regression coverage for the Robin empty-context broken-link fix. |
| Plugin test suite | `OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=SPAN_ONLY python -m pytest instrumentation-loongsuite/loongsuite-instrumentation-claude-agent-sdk/tests -q -m "not requires_cli"` | pass | 69 passed, 9 deselected. |
| Live CLI smoke | `ANTHROPIC_BASE_URL=https://dashscope.aliyuncs.com/apps/anthropic ANTHROPIC_API_KEY=<configured> OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=SPAN_ONLY python -m pytest instrumentation-loongsuite/loongsuite-instrumentation-claude-agent-sdk/tests/test_attributes.py::test_span_attributes_semantic_conventions -q -s` | pass | 1 live provider-compatible CLI test passed. |
| No-content live CLI smoke | `ANTHROPIC_BASE_URL=https://dashscope.aliyuncs.com/apps/anthropic ANTHROPIC_API_KEY=<configured> OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental python -m pytest instrumentation-loongsuite/loongsuite-instrumentation-claude-agent-sdk/tests/test_attributes.py::test_span_attributes_no_sensitive_data -q` | pass | 1 live provider-compatible NO_CONTENT test passed. |
| Focused tox env | `tox -c tox-loongsuite.ini -e py312-test-loongsuite-instrumentation-claude-agent-sdk-latest -- -q -m "not requires_cli"` | blocked | Attempted twice; both runs stalled during upstream OpenTelemetry git dependency installation, before any tests executed. The focused pytest runs and live CLI smoke passed. |
| Claude review | `codex-claude-review-loop` | blocked | Two earlier review rounds completed: the first round's P2 test-coverage finding was fixed, and the second round had zero remaining findings. After the final Entry-baggage priority change, a third review attempt produced only session heartbeat output and no review content, so this PR is opened as a draft with local test evidence. |
| Privacy scan | Secret/local-path scan over changed Claude Agent SDK tests | pass | No credentials or private paths introduced; the only remaining mock env placeholder already existed outside this PR's diff. |
| Diff whitespace | `git diff --check` | pass | No whitespace errors. |

Real E2E Matrix

| Scenario | Status | Command or Demo | Evidence |
| --- | --- | --- | --- |
| non-streaming | pass | Live test_span_attributes_semantic_conventions consumed a bounded SDK query to completion. | AGENT/LLM spans were produced with semantic attributes. |
| streaming | pass | Focused StreamEvent/session tests plus local telemetry smoke. | Stream-event session IDs populate agent and LLM spans before result finalization. |
| concurrency | pass | Bounded live concurrency smoke with two simultaneous SDK queries. | 2 agent roots, 2 trace IDs, and 2 session IDs; no cross-trace contamination. |
| agent/tool/ReAct | pass | Bounded live SDK smoke using one Read tool call. | Produced AGENT, LLM, and TOOL spans with one session. |
| tool-heavy | pass | Focused mock stream and local telemetry smoke with repeated/multiple tool calls. | Tool spans inherit the session ID, and per-stream tool state remains isolated. |
| error path | pass | `python -m pytest instrumentation-loongsuite/loongsuite-instrumentation-claude-agent-sdk/tests -q -m "not requires_cli"` | Existing edge-case/error tests passed in the plugin suite. |
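The concurrency and tool-heavy rows rely on tool state being owned by each stream. A minimal asyncio model of why that matters, under stated assumptions: StreamState, run_stream, run_stream_shared, and the "toolu_1" ID are all hypothetical illustrations, not the plugin's code. With a single module-level dict keyed only by tool ID, the second stream overwrites the first stream's entry while the first is suspended, so the first stream closes its tool with the other stream's session and trace.

```python
"""Hypothetical model of per-stream vs shared tool state; names are illustrative."""

import asyncio
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class StreamState:
    """State owned by a single stream and never shared across streams."""

    session_id: str
    trace_id: str
    # tool_use_id -> (session_id, trace_id) captured when the tool span starts
    open_tools: Dict[str, Tuple[str, str]] = field(default_factory=dict)
    finished: List[Tuple[str, str, str]] = field(default_factory=list)

    def start_tool(self, tool_id: str) -> None:
        self.open_tools[tool_id] = (self.session_id, self.trace_id)

    def end_tool(self, tool_id: str) -> None:
        session_id, trace_id = self.open_tools.pop(tool_id)
        self.finished.append((tool_id, session_id, trace_id))


async def run_stream(state: StreamState, tool_ids: List[str]) -> StreamState:
    for tool_id in tool_ids:
        state.start_tool(tool_id)
        await asyncio.sleep(0)  # yield so the other stream interleaves here
        state.end_tool(tool_id)
    return state


# Buggy contrast: one global dict keyed only by tool ID.
_SHARED_OPEN_TOOLS: Dict[str, Tuple[str, str]] = {}


async def run_stream_shared(
    session_id: str, trace_id: str, tool_id: str
) -> Optional[Tuple[str, str]]:
    _SHARED_OPEN_TOOLS[tool_id] = (session_id, trace_id)
    await asyncio.sleep(0)  # the other stream overwrites the entry here
    return _SHARED_OPEN_TOOLS.pop(tool_id, None)


async def main():
    a = StreamState("session-a", "trace-a")
    b = StreamState("session-b", "trace-b")
    # Both streams reuse the same tool ID.
    isolated = await asyncio.gather(run_stream(a, ["toolu_1"]), run_stream(b, ["toolu_1"]))
    shared = await asyncio.gather(
        run_stream_shared("session-a", "trace-a", "toolu_1"),
        run_stream_shared("session-b", "trace-b", "toolu_1"),
    )
    return isolated, shared


if __name__ == "__main__":
    isolated, shared = asyncio.run(main())
    print(isolated[0].finished)  # [('toolu_1', 'session-a', 'trace-a')]
    print(isolated[1].finished)  # [('toolu_1', 'session-b', 'trace-b')]
    print(shared)  # [('session-b', 'trace-b'), None]: stream A got B's identity
```

Scoping the dict to the stream object means a reused tool ID can never resolve to another stream's session or trace, regardless of how the event loop interleaves the streams.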

Telemetry and Weaver

| Check | Status | Command or Artifact | Notes |
| --- | --- | --- | --- |
| Span tree / span kinds | pass | Local telemetry smoke plus trace validator | Verified AGENT, LLM, and TOOL span kinds, each carrying the required gen_ai.session.id. |
| Entry baggage identity priority | pass | Focused session propagation test | When Entry baggage has gen_ai.session.id, AGENT, LLM, and TOOL spans use it instead of Claude's internal session ID. |
| Parent-child trace linkage | pass | Focused parent context preservation regression test | Agent spans use the active caller span as their parent when one exists, instead of starting from an empty context. |
| Content capture modes | pass | SPAN_ONLY focused tests and live NO_CONTENT test | SPAN_ONLY captures the expected span content; the NO_CONTENT live test did not leak sensitive prompt text. |
| Concurrency isolation | pass | Focused parallel stream tests and bounded live concurrency smoke | A tool ID reused across streams stayed isolated by trace and session. |
| Weaver live-check | pass | `weaver registry live-check -r <loongsuite-semantic-conventions>/model --input-source <generated JSON sample> --input-format json --advice-profile loongsuite-genai --skip-policies true` | The generated sample contained AGENT, LLM, and TOOL spans; Weaver returned zero violations. |

CI

  • GitHub checks: Not run yet; this branch had not been pushed before this PR was opened.
  • Known unrelated failures: None known.
  • Follow-up needed: Watch CI after PR creation. The local focused tox env was blocked by upstream dependency installation and should be retried if CI reproduces a dependency-resolution issue. Re-run Claude review when the review CLI produces normal output again.

Does This PR Require a Core Repo Change?

  • Yes. - Link to PR:
  • No.

Checklist:

See contributing.md for styleguide, changelog guidelines, and more.

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated
