Python: fix: buffer out-of-order tool results in _sanitize_tool_history by ranst91 · Pull Request #4946 · microsoft/agent-framework

ranst91 · 2026-03-27T10:06:16Z

Problem

When CopilotKit sends conversation history to the agent, tool results sometimes arrive before their corresponding assistant message because of how CopilotKit merges MESSAGES_SNAPSHOT events with locally tracked messages. For example:

[tool result for pieChart] ← arrives first, pending=None
[assistant: called pieChart] ← arrives after

_sanitize_tool_history was silently dropping any tool result that arrived when no pending call IDs were tracked (i.e. before its assistant message). This left pieChart (or any other frontend/declaration-only tool) unresolved in subsequent turns, causing OpenAI to reject the request with:

▎ No tool output found for function call call_xxx

Fix

Instead of dropping orphaned tool results, buffer them by call_id. When the matching assistant message is later seen, re-inject the buffered results immediately after it — in the correct position for the Responses API to
resolve them.
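
The buffering idea can be illustrated with a minimal sketch. It is not the code from this PR: `Content`, `Message`, and `reinject_orphans` are hypothetical stand-ins for the framework types and for `_sanitize_tool_history`, and the sketch assumes every tool message carries exactly one function_result.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Content:
    """Hypothetical stand-in for the framework's content type."""
    type: str  # "text", "function_call", or "function_result"
    call_id: Optional[str] = None


@dataclass
class Message:
    """Hypothetical stand-in for the framework's message type."""
    role: str
    contents: list[Content] = field(default_factory=list)


def reinject_orphans(history: list[Message]) -> list[Message]:
    """Move tool results that precede their assistant call to just after it."""
    announced: set[str] = set()        # call IDs already declared by an assistant message
    orphaned: dict[str, Message] = {}  # call_id -> tool message that arrived too early
    out: list[Message] = []
    for msg in history:
        if msg.role == "tool":
            # Assumption: exactly one function_result per tool message.
            call_id = msg.contents[0].call_id
            if call_id not in announced:
                orphaned[call_id] = msg  # buffer instead of dropping
                continue
            out.append(msg)
            continue
        out.append(msg)
        if msg.role == "assistant":
            for content in msg.contents:
                if content.type != "function_call" or content.call_id is None:
                    continue
                announced.add(content.call_id)
                if content.call_id in orphaned:
                    # Re-inject right after the assistant message that made the call.
                    out.append(orphaned.pop(content.call_id))
    return out
```

Given the history [user, tool result for call_1, assistant calling call_1, user], this sketch returns the messages in the order user, assistant, tool, user, which is the ordering the description above says the Responses API needs.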

Why this matters

This surfaces in any multi-turn conversation after using a generative UI component (pieChart, barChart, etc.) or any other frontend tool whose result is synthesized by the client rather than the agent. The second user message after such a turn would always fail.

Contribution Checklist

The code builds clean without any errors or warnings
The PR follows the Contribution Guidelines
All unit tests pass, and I have added new tests where possible
Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

ranst91 · 2026-03-27T10:08:56Z

@microsoft-github-policy-service agree [company="CopilotKit"]

Copilot

Pull request overview

This PR updates the AG-UI message normalization pipeline to handle out-of-order tool results (tool function_result messages arriving before the corresponding assistant function_call message), preventing unresolved tool-call histories that can cause provider validation failures.

Changes:

Buffer “orphaned” tool results (by call_id) when they arrive before an assistant tool-call message.
When the matching assistant message is later processed, re-inject the buffered tool results immediately after it.

Copilot · 2026-03-27T10:12:27Z

python/packages/ag-ui/agent_framework_ag_ui/_message_adapters.py

+            for call_id in list(tool_ids):
+                if call_id in orphaned_tool_results:
+                    sanitized.append(orphaned_tool_results.pop(call_id))
+                    if pending_tool_call_ids:
+                        pending_tool_call_ids.discard(call_id)
+


orphaned_tool_results stores the entire tool Message per call_id, and the reinjection loop appends a buffered message once per matching call_id. If a single tool message contains multiple function_result contents (this happens in core where a tool message can carry a list of results), this can re-insert the same message multiple times and/or re-introduce unrelated function_result entries, which can violate provider validation. Consider buffering/splitting at the function_result content level (e.g., call_id -> Content) and re-emitting tool messages that include only the results matching the current assistant’s tool_ids, ensuring each buffered message/content is appended at most once.
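
The duplication risk described in this comment can be reproduced in isolation. The following is a sketch only: `Content` and `Message` are hypothetical stand-ins, not the framework classes, and the dictionaries mimic the two buffering strategies rather than the actual adapter code.

```python
from dataclasses import dataclass, field


@dataclass
class Content:
    """Hypothetical stand-in for a function_result content item."""
    call_id: str


@dataclass
class Message:
    """Hypothetical stand-in for a tool message carrying several results."""
    role: str
    contents: list[Content] = field(default_factory=list)


batched = Message("tool", [Content("c1"), Content("c2")])

# Message-level buffering: both call IDs point at the same Message object.
by_message = {c.call_id: batched for c in batched.contents}
emitted = [by_message.pop(cid) for cid in ["c1", "c2"] if cid in by_message]
# The same two-result message is appended once per call ID, so every
# result shows up twice in the emitted history.
duplicated_ids = [c.call_id for m in emitted for c in m.contents]

# Content-level buffering: each call ID maps to exactly one result.
by_content = {c.call_id: c for c in batched.contents}
matched = [by_content.pop(cid) for cid in ["c1"] if cid in by_content]
rebuilt = Message("tool", matched)  # carries only c1; c2 stays buffered
```

With message-level keys, `duplicated_ids` contains c1 and c2 twice each. With content-level keys, `rebuilt` carries only the result for the assistant turn being processed and the other result stays in the buffer.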

Suggested change

-            for call_id in list(tool_ids):
-                if call_id in orphaned_tool_results:
-                    sanitized.append(orphaned_tool_results.pop(call_id))
-                    if pending_tool_call_ids:
-                        pending_tool_call_ids.discard(call_id)
+            if tool_ids:
+                # Group buffered tool messages by underlying Message object so that:
+                # - Each original tool message is re-emitted at most once.
+                # - We can filter contents to only the function_result entries matching
+                #   the current assistant message's tool_ids.
+                grouped_by_message: dict[int, dict[str, Any]] = {}
+                for call_id in list(tool_ids):
+                    msg_for_call = orphaned_tool_results.get(call_id)
+                    if not msg_for_call:
+                        continue
+                    msg_key = id(msg_for_call)
+                    group = grouped_by_message.setdefault(
+                        msg_key, {"message": msg_for_call, "call_ids": set()}
+                    )
+                    group["call_ids"].add(call_id)
+                for group in grouped_by_message.values():
+                    msg_for_group: Message = cast(Message, group["message"])
+                    call_ids_for_msg: set[str] = cast(set[str], group["call_ids"])
+                    # Only keep function_result contents whose call_id matches one of the
+                    # tool_ids for this assistant message. This avoids re-emitting unrelated
+                    # function_result entries that belong to other tool calls.
+                    filtered_contents = [
+                        c
+                        for c in (msg_for_group.contents or [])
+                        if getattr(c, "type", None) == "function_result"
+                        and getattr(c, "call_id", None) is not None
+                        and str(c.call_id) in call_ids_for_msg
+                    ]
+                    if filtered_contents:
+                        sanitized.append(
+                            Message(role=msg_for_group.role, contents=filtered_contents)
+                        )
+                    # Mark these call_ids as consumed from both orphaned_tool_results and
+                    # pending_tool_call_ids so they are not processed again.
+                    for consumed_call_id in call_ids_for_msg:
+                        orphaned_tool_results.pop(consumed_call_id, None)
+                        if pending_tool_call_ids:
+                            pending_tool_call_ids.discard(consumed_call_id)

ranst91 · 2026-03-27T10:15:48Z

@microsoft-github-policy-service agree company="CopilotKit"

moonbox3 · 2026-04-10T14:21:02Z

Python Test Coverage Report

File	Stmts	Miss	Cover	Missing
packages/ag-ui/agent_framework_ag_ui
_message_adapters.py	609	55	90%	121–122, 131–134, 137–141, 143–148, 151, 160–166, 212, 343, 433–436, 438–440, 487–489, 543, 546, 548, 551, 554, 570, 587, 609, 709, 725–726, 797, 819, 889, 924–925, 993, 1036
TOTAL	27212	3184	88%

Python Unit Test Overview

Tests	Skipped	Failures	Errors	Time
5441	20 💤	0 ❌	0 🔥	1m 29s ⏱️

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

python/packages/ag-ui/agent_framework_ag_ui/_message_adapters.py:205

The current role == "tool" handling only buffers out-of-order results when no pending call_ids are tracked. If a tool message arrives while pending_tool_call_ids is non-empty but the tool message is batched (contains results for both pending and not-yet-seen calls), keep=True will append the entire message—including function_results whose call_id is not in pending_tool_call_ids. That can reintroduce the same OpenAI validation failure (tool output without a matching tool call in the immediately preceding assistant message). Suggestion: filter msg.contents to only the matching call_ids, buffer the non-matching function_results into orphaned_tool_results, and append a synthetic/filtered tool message (or drop if nothing matches). Adding a regression test for the mixed batched case (e.g., [assistant c1], [tool c1+c2], [assistant c2]) would lock this down.

        if role_value == "tool":
            if not pending_tool_call_ids:
                # Tool result arrived before its assistant message — buffer each Content
                # individually so re-injection can reconstruct a filtered Message containing
                # only the results that belong to a given assistant turn.
                for content in msg.contents or []:
                    if content.type == "function_result" and content.call_id:
                        orphaned_tool_results[str(content.call_id)] = content
                continue
            keep = False
            for content in msg.contents or []:
                if content.type == "function_result" and content.call_id:
                    call_id = str(content.call_id)
                    if call_id in pending_tool_call_ids:
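
The filtering suggested in this suppressed comment could look roughly like the sketch below. It is an illustration under assumptions, not the merged implementation: `Content`, `Message`, and `sanitize` are hypothetical stand-ins, and the function covers only the assistant/tool ordering logic.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Content:
    """Hypothetical stand-in for the framework's content type."""
    type: str  # "function_call" or "function_result"
    call_id: Optional[str] = None


@dataclass
class Message:
    """Hypothetical stand-in for the framework's message type."""
    role: str
    contents: list[Content] = field(default_factory=list)


def sanitize(history: list[Message]) -> list[Message]:
    pending: set[str] = set()          # calls from the latest assistant message awaiting results
    orphaned: dict[str, Content] = {}  # results seen before their assistant call
    out: list[Message] = []
    for msg in history:
        if msg.role == "assistant":
            out.append(msg)
            call_ids = [c.call_id for c in msg.contents
                        if c.type == "function_call" and c.call_id]
            # Re-inject buffered results that belong to this assistant turn.
            early = [orphaned.pop(cid) for cid in call_ids if cid in orphaned]
            if early:
                out.append(Message("tool", early))
            pending = set(call_ids) - {c.call_id for c in early}
        elif msg.role == "tool":
            matched: list[Content] = []
            for c in msg.contents:
                if c.type != "function_result" or not c.call_id:
                    continue
                if c.call_id in pending:
                    matched.append(c)
                    pending.discard(c.call_id)
                else:
                    orphaned[c.call_id] = c  # buffer the not-yet-seen call
            if matched:
                # Emit a filtered tool message with only the matching results.
                out.append(Message("tool", matched))
        else:
            out.append(msg)
    return out
```

For the mixed batched case named in the comment, [assistant c1], [tool c1+c2], [assistant c2], this sketch emits assistant c1, tool c1, assistant c2, tool c2, so each tool result directly follows the assistant message that requested it.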

Copilot · 2026-04-10T14:25:15Z

python/packages/ag-ui/agent_framework_ag_ui/_message_adapters.py

+    # Buffer individual function_result Contents keyed by call_id for tool messages that
+    # arrive before their assistant message (out-of-order history). Buffering at the Content
+    # level (not the Message level) prevents a multi-result tool message from being
+    # re-injected multiple times or leaking unrelated results into the wrong assistant turn.
+    orphaned_tool_results: dict[str, Any] = {}


orphaned_tool_results is declared as dict[str, Any] but it only stores Content instances (function_result contents). Tightening this to dict[str, Content] (and matched_contents: list[Content]) will improve type safety and avoid leaking Any into Message(contents=...) in this typed package.
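
A small sketch of the tightened annotations follows. `Content` here is a hypothetical stand-in for the framework class, and the variable names are taken from the comment above; the surrounding function is omitted.

```python
from dataclasses import dataclass


@dataclass
class Content:
    """Hypothetical stand-in for the framework's Content class."""
    type: str
    call_id: str


# Declaring the value type lets a static checker reject non-Content values
# instead of letting Any flow into Message(contents=...).
orphaned_tool_results: dict[str, Content] = {}
matched_contents: list[Content] = []

orphaned_tool_results["c1"] = Content("function_result", "c1")
matched_contents.append(orphaned_tool_results.pop("c1"))
```

The runtime behavior is unchanged; the benefit is limited to static type checking.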

Copilot AI review requested due to automatic review settings March 27, 2026 10:06

markwallace-microsoft added the python label Mar 27, 2026

github-actions bot changed the title from "fix: buffer out-of-order tool results in _sanitize_tool_history" to "Python: fix: buffer out-of-order tool results in _sanitize_tool_history" Mar 27, 2026

Copilot started reviewing on behalf of ranst91 March 27, 2026 10:07

Copilot AI reviewed Mar 27, 2026

friskyraptorbruh approved these changes Apr 4, 2026

eavanvalkenburg requested review from Copilot and moonbox3 April 10, 2026 14:17

ranst91 added 2 commits April 10, 2026 16:18

fix: buffer out-of-order tool results in _sanitize_tool_history

8895108

fix: buffer at Content level to prevent duplicate/leaked results from batched tool messages

988515e

eavanvalkenburg force-pushed the fix/buffer-out-of-order-tool-results branch from 937338d to 988515e Compare April 10, 2026 14:18

Copilot started reviewing on behalf of eavanvalkenburg April 10, 2026 14:19

Copilot AI reviewed Apr 10, 2026

ranst91 wants to merge 2 commits into microsoft:main from ranst91:fix/buffer-out-of-order-tool-results
