[bot] Cohere streaming drops thinking content blocks from Command A Reasoning models #1845

@braintrust-bot

Summary

The Cohere v2 Chat API returns thinking content blocks when using reasoning-capable models (e.g. command-a-reasoning-08-2025), but the Cohere plugin's streaming aggregation silently drops all thinking content. The content-delta handler only extracts the text field; thinking deltas fall through and are lost. No reasoning token metrics are captured either. This is a direct parity gap with Anthropic, Google GenAI, and OpenAI reasoning instrumentation in this repo.

What instrumentation is missing

1. Streaming: thinking content silently dropped

In js/src/instrumentation/plugins/cohere-plugin.ts, the extractV8DeltaText function (lines 537–552) only extracts the text field from content-delta events:

function extractV8DeltaText(chunk: CohereChatStreamEvent): string | undefined {
  // ...
  if (isObject(content) && typeof content.text === "string") {
    return content.text;  // only text extracted
  }
  return undefined;  // thinking field → silently dropped
}

When a content-delta event carries { thinking: "Let me analyze..." } instead of { text: "..." }, the function returns undefined and the thinking content is lost. The aggregated output (lines 725–735) joins only text deltas and tool calls, so no thinking content reaches the span output.
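One possible direction, sketched here under assumptions rather than taken from the plugin: an extractor that reports which field a content-delta carried, so the caller can keep thinking and text in separate buffers. The ContentDeltaEvent shape, the extractContentDelta name, and the sample events are hypothetical and only model the fields described above.

```typescript
// Assumed shape of a Cohere v2 `content-delta` event, reduced to the fields
// discussed in this issue. This is an illustration, not the SDK's type.
interface ContentDeltaEvent {
  type: "content-delta";
  index?: number;
  delta?: { message?: { content?: { text?: unknown; thinking?: unknown } } };
}

type ExtractedDelta =
  | { kind: "text"; value: string }
  | { kind: "thinking"; value: string };

// Hypothetical extractor: reports which field the delta carried instead of
// returning undefined for everything that is not `text`.
function extractContentDelta(event: ContentDeltaEvent): ExtractedDelta | undefined {
  const content = event.delta?.message?.content;
  if (!content) return undefined;
  const { text, thinking } = content;
  if (typeof text === "string") return { kind: "text", value: text };
  if (typeof thinking === "string") return { kind: "thinking", value: thinking };
  return undefined;
}

// Aggregate a small hand-written stream into separate buffers.
const sampleDeltas: ContentDeltaEvent[] = [
  { type: "content-delta", index: 0, delta: { message: { content: { thinking: "Let me " } } } },
  { type: "content-delta", index: 0, delta: { message: { content: { thinking: "analyze..." } } } },
  { type: "content-delta", index: 1, delta: { message: { content: { text: "The answer is 4." } } } },
];

const thinkingParts: string[] = [];
const textParts: string[] = [];
for (const event of sampleDeltas) {
  const extracted = extractContentDelta(event);
  if (extracted?.kind === "thinking") thinkingParts.push(extracted.value);
  else if (extracted?.kind === "text") textParts.push(extracted.value);
}

const aggregatedThinking = thinkingParts.join("");
const aggregatedText = textParts.join("");
```

Returning a tagged value rather than a bare string keeps the text path unchanged for existing callers while making the thinking path explicit. How the aggregated thinking should appear in the span output is left open here.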

2. Streaming: content-start/content-end and tool-plan-delta events not handled

The Cohere v2 streaming API emits content-start events with type: "thinking" or type: "text" (closed by matching content-end events) to mark content block boundaries, and tool-plan-delta events for reasoning about tool use. The plugin handles none of these. It processes only message-start, content-delta, tool-call-start, tool-call-delta, message-end, text-generation, tool-calls-generation, and stream-end.
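A rough sketch of boundary-aware aggregation follows, assuming that content-start carries the block type under delta.message.content.type and that tool-plan-delta carries a tool_plan string. Both field paths are assumptions about the event layout, and the StreamEvent union and aggregateBlocks helper are hypothetical names that do not exist in the plugin.

```typescript
type BlockType = "text" | "thinking";

// Simplified, assumed event shapes; the real SDK types carry more fields.
type StreamEvent =
  | { type: "content-start"; index: number; delta: { message: { content: { type: BlockType } } } }
  | { type: "content-delta"; index: number; delta: { message: { content: { text?: string; thinking?: string } } } }
  | { type: "content-end"; index: number }
  | { type: "tool-plan-delta"; delta: { message: { tool_plan: string } } };

interface AggregatedBlock {
  type: BlockType;
  value: string;
  closed: boolean;
}

// Hypothetical aggregator: groups deltas by content block index and keeps
// the tool plan in its own buffer.
function aggregateBlocks(events: StreamEvent[]): { blocks: AggregatedBlock[]; toolPlan: string } {
  const byIndex = new Map<number, AggregatedBlock>();
  let toolPlan = "";
  for (const event of events) {
    switch (event.type) {
      case "content-start":
        byIndex.set(event.index, { type: event.delta.message.content.type, value: "", closed: false });
        break;
      case "content-delta": {
        const content = event.delta.message.content;
        // If no content-start was seen, infer the block type from the field present.
        const block: AggregatedBlock = byIndex.get(event.index) ?? {
          type: content.thinking !== undefined ? "thinking" : "text",
          value: "",
          closed: false,
        };
        block.value += content.thinking ?? content.text ?? "";
        byIndex.set(event.index, block);
        break;
      }
      case "content-end": {
        const block = byIndex.get(event.index);
        if (block) block.closed = true;
        break;
      }
      case "tool-plan-delta":
        toolPlan += event.delta.message.tool_plan;
        break;
    }
  }
  // Emit blocks in index order so thinking precedes the text it led to.
  const blocks = Array.from(byIndex.entries())
    .sort((a, b) => a[0] - b[0])
    .map(([, block]) => block);
  return { blocks, toolPlan };
}

const sampleStream: StreamEvent[] = [
  { type: "content-start", index: 0, delta: { message: { content: { type: "thinking" } } } },
  { type: "content-delta", index: 0, delta: { message: { content: { thinking: "Let me " } } } },
  { type: "content-delta", index: 0, delta: { message: { content: { thinking: "analyze..." } } } },
  { type: "content-end", index: 0 },
  { type: "tool-plan-delta", delta: { message: { tool_plan: "I will call the search tool." } } },
  { type: "content-start", index: 1, delta: { message: { content: { type: "text" } } } },
  { type: "content-delta", index: 1, delta: { message: { content: { text: "Done." } } } },
  { type: "content-end", index: 1 },
];

const aggregatedStream = aggregateBlocks(sampleStream);
```

Keying blocks by index means interleaved deltas still land in the right block, and the fallback in the content-delta branch tolerates streams where content-start was missed.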

3. Request metadata: thinking parameter not captured

The CHAT_REQUEST_METADATA_ALLOWLIST (lines 95–129) does not include thinking or thinkingTokenBudget / thinking_token_budget. When users enable reasoning via thinking: { type: "enabled", token_budget: 1000 }, this configuration is excluded from span metadata.
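To illustrate the effect of the allowlist, here is a minimal sketch of allowlist-based metadata filtering with the thinking keys added. The set below is a hypothetical stand-in, not the plugin's actual CHAT_REQUEST_METADATA_ALLOWLIST: the model and temperature entries are illustrative guesses, and pickRequestMetadata is an invented helper name.

```typescript
// Hypothetical allowlist for illustration only. The three thinking-related
// keys are the ones this issue proposes adding.
const EXAMPLE_REQUEST_METADATA_ALLOWLIST = new Set<string>([
  "model",
  "temperature",
  "thinking",
  "thinkingTokenBudget",
  "thinking_token_budget",
]);

// Copy only allowlisted, defined keys from the request into span metadata.
function pickRequestMetadata(request: Record<string, unknown>): Record<string, unknown> {
  const metadata: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(request)) {
    if (EXAMPLE_REQUEST_METADATA_ALLOWLIST.has(key) && value !== undefined) {
      metadata[key] = value;
    }
  }
  return metadata;
}

const exampleRequest: Record<string, unknown> = {
  model: "command-a-reasoning-08-2025",
  messages: [{ role: "user", content: "What is 2 + 2?" }],
  thinking: { type: "enabled", token_budget: 1000 },
};

// With `thinking` allowlisted, the reasoning configuration survives filtering,
// while `messages` is still excluded from metadata.
const capturedMetadata = pickRequestMetadata(exampleRequest);
```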

4. Metrics: no reasoning token metrics

mergeUsageMetrics (lines 384–462) captures prompt_tokens, completion_tokens, tokens, prompt_cached_tokens, and billed unit metrics, but does not extract any reasoning/thinking token metrics from the usage response.

Comparison with other providers in this repo

| Provider | Thinking content captured | Reasoning token metrics | Request config captured |
| --- | --- | --- | --- |
| Anthropic | thinking_delta aggregated in streaming | ✅ N/A (included in output tokens) | ✅ N/A |
| Google GenAI | thoughtsTokenCount handled | completion_reasoning_tokens metric | ✅ N/A |
| OpenAI | ✅ Reasoning tokens tracked | completion_reasoning_tokens metric | reasoning_effort in metadata |
| Cohere | ❌ Silently dropped | ❌ Not captured | thinking not in allowlist |

Braintrust docs status

not_found — There is no Cohere-specific integration page on braintrust.dev. The wrap-providers docs page does not mention Cohere.

Upstream references

Local files inspected

  • js/src/instrumentation/plugins/cohere-plugin.ts (lines 537–552: extractV8DeltaText only extracts text; lines 644–650: content-delta handler; lines 384–462: mergeUsageMetrics; lines 95–129: CHAT_REQUEST_METADATA_ALLOWLIST)
  • js/src/instrumentation/plugins/cohere-channels.ts
  • js/src/auto-instrumentations/configs/cohere.ts
  • js/src/vendor-sdk-types/cohere.ts
  • e2e/scenarios/cohere-instrumentation/ (no reasoning test scenarios)
