
Python: fix: buffer out-of-order tool results in _sanitize_tool_history#4946

Open
ranst91 wants to merge 2 commits into microsoft:main from ranst91:fix/buffer-out-of-order-tool-results

@ranst91 ranst91 commented Mar 27, 2026

Problem

When CopilotKit sends conversation history to the agent, tool results sometimes arrive before their corresponding assistant message because of how CopilotKit merges MESSAGES_SNAPSHOT events with locally tracked messages. For example:

[tool result for pieChart] ← arrives first, pending=None
[assistant: called pieChart] ← arrives after

_sanitize_tool_history was silently dropping any tool result that arrived when no pending call IDs were tracked (i.e. before its assistant message). This left pieChart (or any other frontend/declaration-only tool) unresolved in subsequent turns, causing OpenAI to reject the request with:

▎ No tool output found for function call call_xxx
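
To make the failure concrete, here is a rough sketch of the kind of call/result pairing check that a dropped tool result trips. The `Content` and `Message` dataclasses and the `unresolved_call_ids` helper are hypothetical stand-ins, not the framework's types, and the provider's real validation is assumed to be stricter than this.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Content:
    """Hypothetical stand-in for a message content item."""
    type: str  # "function_call", "function_result", or "text"
    call_id: Optional[str] = None


@dataclass
class Message:
    """Hypothetical stand-in for a chat message."""
    role: str
    contents: list[Content] = field(default_factory=list)


def unresolved_call_ids(messages: list[Message]) -> list[str]:
    """Return call IDs that have a function_call but no function_result."""
    called: list[str] = []
    answered: set[str] = set()
    for msg in messages:
        for content in msg.contents:
            if not content.call_id:
                continue
            if content.type == "function_call":
                called.append(content.call_id)
            elif content.type == "function_result":
                answered.add(content.call_id)
    return [call_id for call_id in called if call_id not in answered]


# History after the early tool result was dropped: the call is left dangling.
dropped = [Message("assistant", [Content("function_call", "call_1")])]
print(unresolved_call_ids(dropped))  # ['call_1']
```

Any call ID this returns corresponds to a function call that the next request would send without an output.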

Fix

Instead of dropping orphaned tool results, buffer them by call_id. When the matching assistant message is later seen, re-inject the buffered results immediately after it, which is the position where the Responses API can resolve them.
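
The buffering idea can be sketched in isolation as below. This is a minimal illustration under stated assumptions, not the code in this PR: `Content`, `Message`, and `reorder_tool_results` are hypothetical stand-ins, and the real `_sanitize_tool_history` handles more cases than this.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Content:
    """Hypothetical stand-in for a message content item."""
    type: str
    call_id: Optional[str] = None


@dataclass
class Message:
    """Hypothetical stand-in for a chat message."""
    role: str
    contents: list[Content] = field(default_factory=list)


def reorder_tool_results(messages: list[Message]) -> list[Message]:
    """Move tool results that precede their assistant call to just after it."""
    sanitized: list[Message] = []
    orphaned: dict[str, Content] = {}  # call_id -> buffered function_result
    pending: set[str] = set()  # call IDs still waiting for a result

    for msg in messages:
        if msg.role == "tool":
            results = [c for c in msg.contents if c.type == "function_result" and c.call_id]
            if not pending:
                # No assistant call seen yet: buffer instead of dropping.
                for c in results:
                    orphaned[c.call_id] = c
                continue
            if any(c.call_id in pending for c in results):
                sanitized.append(msg)
                pending -= {c.call_id for c in results}
            continue

        sanitized.append(msg)
        if msg.role == "assistant":
            call_ids = [c.call_id for c in msg.contents if c.type == "function_call" and c.call_id]
            pending = set(call_ids)
            # Re-inject results that arrived before this assistant message.
            early = [orphaned.pop(cid) for cid in call_ids if cid in orphaned]
            if early:
                sanitized.append(Message("tool", early))
                pending -= {c.call_id for c in early}
    return sanitized


history = [
    Message("tool", [Content("function_result", "call_1")]),
    Message("assistant", [Content("function_call", "call_1")]),
    Message("user"),
]
print([m.role for m in reorder_tool_results(history)])  # ['assistant', 'tool', 'user']
```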

Why this matters

This surfaces in any multi-turn conversation after using a generative UI component (pieChart, barChart, etc.) or any other frontend tool whose result is synthesized by the client rather than the agent. The second user message after such a turn would always fail.

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

Copilot AI review requested due to automatic review settings March 27, 2026 10:06
@github-actions github-actions bot changed the title fix: buffer out-of-order tool results in _sanitize_tool_history Python: fix: buffer out-of-order tool results in _sanitize_tool_history Mar 27, 2026

ranst91 commented Mar 27, 2026

@microsoft-github-policy-service agree [company="CopilotKit"]


Copilot AI left a comment


Pull request overview

This PR updates the AG-UI message normalization pipeline to handle out-of-order tool results (tool function_result messages arriving before the corresponding assistant function_call message), preventing unresolved tool-call histories that can cause provider validation failures.

Changes:

  • Buffer “orphaned” tool results (by call_id) when they arrive before an assistant tool-call message.
  • When the matching assistant message is later processed, re-inject the buffered tool results immediately after it.

Comment on lines +86 to +91
for call_id in list(tool_ids):
    if call_id in orphaned_tool_results:
        sanitized.append(orphaned_tool_results.pop(call_id))
        if pending_tool_call_ids:
            pending_tool_call_ids.discard(call_id)


Copilot AI Mar 27, 2026


orphaned_tool_results stores the entire tool Message per call_id, and the reinjection loop appends a buffered message once per matching call_id. If a single tool message contains multiple function_result contents (this happens in core where a tool message can carry a list of results), this can re-insert the same message multiple times and/or re-introduce unrelated function_result entries, which can violate provider validation. Consider buffering/splitting at the function_result content level (e.g., call_id -> Content) and re-emitting tool messages that include only the results matching the current assistant’s tool_ids, ensuring each buffered message/content is appended at most once.

Suggested change
for call_id in list(tool_ids):
    if call_id in orphaned_tool_results:
        sanitized.append(orphaned_tool_results.pop(call_id))
        if pending_tool_call_ids:
            pending_tool_call_ids.discard(call_id)

if tool_ids:
    # Group buffered tool messages by underlying Message object so that:
    # - Each original tool message is re-emitted at most once.
    # - We can filter contents to only the function_result entries matching
    #   the current assistant message's tool_ids.
    grouped_by_message: dict[int, dict[str, Any]] = {}
    for call_id in list(tool_ids):
        msg_for_call = orphaned_tool_results.get(call_id)
        if not msg_for_call:
            continue
        msg_key = id(msg_for_call)
        group = grouped_by_message.setdefault(
            msg_key, {"message": msg_for_call, "call_ids": set()}
        )
        group["call_ids"].add(call_id)
    for group in grouped_by_message.values():
        msg_for_group: Message = cast(Message, group["message"])
        call_ids_for_msg: set[str] = cast(set[str], group["call_ids"])
        # Only keep function_result contents whose call_id matches one of the
        # tool_ids for this assistant message. This avoids re-emitting unrelated
        # function_result entries that belong to other tool calls.
        filtered_contents = [
            c
            for c in (msg_for_group.contents or [])
            if getattr(c, "type", None) == "function_result"
            and getattr(c, "call_id", None) is not None
            and str(c.call_id) in call_ids_for_msg
        ]
        if filtered_contents:
            sanitized.append(
                Message(role=msg_for_group.role, contents=filtered_contents)
            )
        # Mark these call_ids as consumed from both orphaned_tool_results and
        # pending_tool_call_ids so they are not processed again.
        for consumed_call_id in call_ids_for_msg:
            orphaned_tool_results.pop(consumed_call_id, None)
            if pending_tool_call_ids:
                pending_tool_call_ids.discard(consumed_call_id)
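
The duplication concern raised in this comment can be reproduced with a small sketch. It assumes, as the comment describes, that a whole tool message is stored under each of its call IDs; `Content` and `Message` here are hypothetical stand-ins rather than the framework's types.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Content:
    """Hypothetical stand-in for a message content item."""
    type: str
    call_id: Optional[str] = None


@dataclass
class Message:
    """Hypothetical stand-in for a chat message."""
    role: str
    contents: list[Content] = field(default_factory=list)


# One tool message carrying results for two different calls.
batched = Message(
    "tool",
    [Content("function_result", "c1"), Content("function_result", "c2")],
)
tool_ids = ["c1", "c2"]

# Message-level buffering: both call IDs point at the same Message object,
# so popping once per call ID re-emits that message twice.
by_message = {c.call_id: batched for c in batched.contents}
naive = [by_message.pop(cid) for cid in tool_ids if cid in by_message]

# Content-level buffering: each call ID maps to exactly one result, and a
# single tool message is rebuilt from the matching results.
by_content = {c.call_id: c for c in batched.contents}
matched = [by_content.pop(cid) for cid in tool_ids if cid in by_content]
rebuilt = Message("tool", matched)

print(len(naive), len(rebuilt.contents))  # 2 2
```

In the first case the sanitized history would contain the same two-result message twice; in the second it contains one message with two results.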


ranst91 commented Mar 27, 2026

@microsoft-github-policy-service agree company="CopilotKit"

@moonbox3

Python Test Coverage

Python Test Coverage Report •
File                                    Stmts   Miss  Cover  Missing
packages/ag-ui/agent_framework_ag_ui
   _message_adapters.py                   609     55    90%  121–122, 131–134, 137–141, 143–148, 151, 160–166, 212, 343, 433–436, 438–440, 487–489, 543, 546, 548, 551, 554, 570, 587, 609, 709, 725–726, 797, 819, 889, 924–925, 993, 1036
TOTAL                                   27212   3184    88%

Python Unit Test Overview

Tests Skipped Failures Errors Time
5441 20 💤 0 ❌ 0 🔥 1m 29s ⏱️


Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

python/packages/ag-ui/agent_framework_ag_ui/_message_adapters.py:205

  • The current role == "tool" handling only buffers out-of-order results when no pending call_ids are tracked. If a tool message arrives while pending_tool_call_ids is non-empty but the tool message is batched (contains results for both pending and not-yet-seen calls), keep=True will append the entire message—including function_results whose call_id is not in pending_tool_call_ids. That can reintroduce the same OpenAI validation failure (tool output without a matching tool call in the immediately preceding assistant message). Suggestion: filter msg.contents to only the matching call_ids, buffer the non-matching function_results into orphaned_tool_results, and append a synthetic/filtered tool message (or drop if nothing matches). Adding a regression test for the mixed batched case (e.g., [assistant c1], [tool c1+c2], [assistant c2]) would lock this down.
        if role_value == "tool":
            if not pending_tool_call_ids:
                # Tool result arrived before its assistant message — buffer each Content
                # individually so re-injection can reconstruct a filtered Message containing
                # only the results that belong to a given assistant turn.
                for content in msg.contents or []:
                    if content.type == "function_result" and content.call_id:
                        orphaned_tool_results[str(content.call_id)] = content
                continue
            keep = False
            for content in msg.contents or []:
                if content.type == "function_result" and content.call_id:
                    call_id = str(content.call_id)
                    if call_id in pending_tool_call_ids:
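
The filtering suggested for the mixed batched case might look roughly like the helper below. This is a sketch under assumptions: `split_tool_message` is a hypothetical name, and `Content` and `Message` are simplified stand-ins for the framework's types.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Content:
    """Hypothetical stand-in for a message content item."""
    type: str
    call_id: Optional[str] = None


@dataclass
class Message:
    """Hypothetical stand-in for a chat message."""
    role: str
    contents: list[Content] = field(default_factory=list)


def split_tool_message(
    msg: Message, pending: set[str]
) -> tuple[Optional[Message], dict[str, Content]]:
    """Split a tool message into results for pending calls and early results.

    Returns a filtered message (or None if nothing matches) plus the results
    whose assistant call has not been seen yet, keyed by call_id.
    """
    matched: list[Content] = []
    unmatched: dict[str, Content] = {}
    for content in msg.contents:
        if content.type != "function_result" or not content.call_id:
            continue
        if content.call_id in pending:
            matched.append(content)
        else:
            unmatched[content.call_id] = content
    filtered = Message(role=msg.role, contents=matched) if matched else None
    return filtered, unmatched


# Middle step of [assistant c1], [tool c1+c2], [assistant c2]:
batched = Message(
    "tool",
    [Content("function_result", "c1"), Content("function_result", "c2")],
)
filtered, leftovers = split_tool_message(batched, pending={"c1"})
```

Here `filtered` carries only the c1 result and can be appended right away, while `leftovers` holds the c2 result to be buffered until the second assistant message appears.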

Comment on lines +33 to +37
# Buffer individual function_result Contents keyed by call_id for tool messages that
# arrive before their assistant message (out-of-order history). Buffering at the Content
# level (not the Message level) prevents a multi-result tool message from being
# re-injected multiple times or leaking unrelated results into the wrong assistant turn.
orphaned_tool_results: dict[str, Any] = {}

Copilot AI Apr 10, 2026


orphaned_tool_results is declared as dict[str, Any] but it only stores Content instances (function_result contents). Tightening this to dict[str, Content] (and matched_contents: list[Content]) will improve type safety and avoid leaking Any into Message(contents=...) in this typed package.


5 participants