[bot] HuggingFace streaming chat completion drops tool calls during aggregation #1846

@braintrust-bot

Summary

The HuggingFace plugin's streaming chat completion aggregation (aggregateChatCompletionChunks) collects only delta.content text and delta.role. Streamed delta.tool_calls chunks are silently dropped, so output spans carry no tool call information. Non-streaming chat completions are unaffected, since extractOutput returns the raw choices array, which includes tool_calls.

The HuggingFace Inference API explicitly supports tool calling with streaming in OpenAI-compatible format, and this is documented as a key capability for building AI agents.

What instrumentation is missing

1. Streaming tool call deltas not aggregated

In js/src/instrumentation/plugins/huggingface-plugin.ts, the aggregateChatCompletionChunks function (lines 385–440) processes streamed chunks but only handles text content:

for (const chunk of chunks) {
  for (const choice of chunk.choices ?? []) {
    // ...
    if (typeof delta?.content === "string") {
      existing.content += delta.content;   // text ✓
    }
    if (typeof delta?.role === "string") {
      existing.role = delta.role;           // role ✓
    }
    // delta.tool_calls → not handled, silently dropped ✗
  }
}

When the model returns a tool call during streaming, the delta.tool_calls array in each chunk is ignored. The aggregated output contains only { message: { content, role }, finish_reason } — no tool calls.

By contrast, every other provider plugin in this repo that supports streaming tool calls properly aggregates them:

  • OpenAI: aggregateChatCompletionChunks tracks toolCallsByIndex and merges incremental tool call deltas
  • Anthropic: handles tool_use content blocks with input_json_delta aggregation
  • Cohere: has tool-call-start and tool-call-delta event handlers
  • Mistral: uses OpenAI-compatible format with tool call aggregation via shared utilities
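
As a rough illustration of the index-keyed merging described above, the sketch below folds the delta.tool_calls arrays of a single choice into complete tool calls. The type names and the mergeToolCallDeltas helper are hypothetical and are not taken from the plugin or from the OpenAI aggregation code. It assumes the OpenAI-compatible shape, where the function name arrives whole and the arguments arrive as string fragments.

```typescript
// Hypothetical shapes for illustration; the plugin's real types may differ.
interface ToolCallDelta {
  index?: number;
  id?: string;
  type?: string;
  function?: { name?: string; arguments?: string };
}

interface AggregatedToolCall {
  id?: string;
  type: string;
  function: { name: string; arguments: string };
}

// Merge the per-chunk tool call deltas of ONE choice into complete tool calls.
// `deltasPerChunk` holds the delta.tool_calls array of each chunk, in order.
function mergeToolCallDeltas(
  deltasPerChunk: ToolCallDelta[][],
): AggregatedToolCall[] {
  const toolCallsByIndex = new Map<number, AggregatedToolCall>();

  for (const chunkDeltas of deltasPerChunk) {
    chunkDeltas.forEach((delta, position) => {
      // Assumption: fall back to the array position when `index` is omitted.
      const index = typeof delta.index === "number" ? delta.index : position;

      let call = toolCallsByIndex.get(index);
      if (!call) {
        call = { type: "function", function: { name: "", arguments: "" } };
        toolCallsByIndex.set(index, call);
      }

      const name = delta.function?.name;
      const args = delta.function?.arguments;

      if (typeof delta.id === "string") call.id = delta.id;
      if (typeof delta.type === "string") call.type = delta.type;
      // Assumption: the name arrives whole, so it is assigned, not appended.
      if (typeof name === "string" && name.length > 0) call.function.name = name;
      // Arguments arrive as JSON string fragments and are concatenated.
      if (typeof args === "string") call.function.arguments += args;
    });
  }

  // Return the tool calls ordered by their stream index.
  return Array.from(toolCallsByIndex.entries())
    .sort(([a], [b]) => a - b)
    .map(([, call]) => call);
}
```

In a fix along these lines, the merged array would presumably be attached to the aggregated message next to content and role, so the streaming output matches what non-streaming calls already report.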

2. tools and tool_choice not in request metadata allowlist

The REQUEST_METADATA_ALLOWLIST (lines 21–33) does not include tools or tool_choice. When users pass tool definitions to the HuggingFace chat API, this configuration is excluded from span metadata. Other provider plugins (OpenAI, Cohere, Mistral) capture these parameters.
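
A minimal sketch of how an allowlist that includes tools and tool_choice would surface them in span metadata. EXAMPLE_METADATA_ALLOWLIST and pickRequestMetadata are hypothetical names. Apart from tools and tool_choice, the listed keys are placeholders rather than the plugin's actual entries, and the real allowlist may not be a Set.

```typescript
// Hypothetical allowlist. Only `tools` and `tool_choice` come from this issue;
// the other keys are placeholders, not the plugin's real entries.
const EXAMPLE_METADATA_ALLOWLIST = new Set<string>([
  "model",
  "temperature",
  "max_tokens",
  "tools",
  "tool_choice",
]);

// Copy only allowlisted, defined request parameters into span metadata.
function pickRequestMetadata(
  params: Record<string, unknown>,
): Record<string, unknown> {
  const metadata: Record<string, unknown> = {};
  for (const key of Object.keys(params)) {
    if (EXAMPLE_METADATA_ALLOWLIST.has(key) && params[key] !== undefined) {
      metadata[key] = params[key];
    }
  }
  return metadata;
}
```

With the two keys present, a request that passes tool definitions keeps them in metadata while non-allowlisted fields such as messages are still excluded.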

Braintrust docs status

not_found — There is no HuggingFace-specific integration page on braintrust.dev.

Upstream references

  • HuggingFace Function Calling guide: https://huggingface.co/docs/inference-providers/guides/function-calling
  • HuggingFace Chat Completion with streaming tool calls: documented with stream: True + chunk.choices[0].delta.tool_calls
  • The streaming response uses OpenAI-compatible format: delta.tool_calls[].function.name and delta.tool_calls[].function.arguments (incremental)
  • Supports tool_choice: "auto" | "required" | { type: "function", function: { name } } and strict mode
  • This is a stable, documented feature supported across multiple inference providers
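
For reference, the tool_choice values listed above can be written as a TypeScript union. This is only a sketch of the shapes named in this issue: the ToolChoice type and forcedToolName helper are hypothetical and are not taken from the HuggingFace SDK types.

```typescript
// Hypothetical type covering the tool_choice forms listed in this issue.
type ToolChoice =
  | "auto"
  | "required"
  | { type: "function"; function: { name: string } };

// Return the function name when a specific tool is forced, else undefined.
function forcedToolName(choice: ToolChoice | undefined): string | undefined {
  return typeof choice === "object" ? choice.function.name : undefined;
}
```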

Local files inspected

  • js/src/instrumentation/plugins/huggingface-plugin.ts (lines 385–440: aggregateChatCompletionChunks; lines 21–33: REQUEST_METADATA_ALLOWLIST)
  • js/src/instrumentation/plugins/huggingface-channels.ts
  • js/src/vendor-sdk-types/huggingface.ts
  • e2e/scenarios/huggingface-instrumentation/ (no tool call test scenarios)
