fix(mcp): return tool images via langgraph Command (no more UUID side-channel) by Mgczacki · Pull Request #2189 · dimensionalOS/dimos

Mgczacki · 2026-05-20T06:35:47Z

Context

I hit this while developing #2188 (the memory2 agent + Space raster backend) and figured the fix would be worth porting to the MCP client, which had the same problem and was working around it in a less elegant way.

The underlying constraint

Many OpenAI chat-completion models don't accept image content inside a tool result — you can't just stuff an image into a ToolMessage. The canonical workaround is to deliver the image as a HumanMessage after the tool returns. That part isn't avoidable.

What we can avoid is delivering that HumanMessage in a separate, later agent turn — which is what the MCP client was doing.

What was happening before

When an MCP tool returned non-text content (an image), mcp_client.py:

Minted a UUID for the image.
Returned a placeholder ToolMessage to the agent: "Tool call started with UUID: X. You will be updated with the result soon".
Queued the actual image as a fresh HumanMessage via add_message, which got picked up on the next agent turn.

The agent therefore saw a fake "wait for it" tool result, kept reasoning with no real data, and only got the image one turn later. Two extra round-trips per image, and the tool result the agent reasons over isn't the actual content.

What we do now

We're already on langchain's create_agent, which is a thin wrapper over LangGraph. LangGraph lets a tool return a Command(update={...}) instead of a single message — i.e. the tool call can append multiple messages to state as the result of one call. The state's messages reducer (the tool-aware one in agent.py) then re-orders so each image-carrying HumanMessage lands next to its matching ToolMessage, paired by additional_kwargs["tool_call_id"].

Concretely, the MCP tool now returns:

Command(update={
    "messages": [
        ToolMessage(content="<text part or stub>", tool_call_id=...),
        HumanMessage(content=[image_block], additional_kwargs={"tool_call_id": ...}),
    ]
})

Both messages land in the same turn. The HumanMessage-with-image workaround for OpenAI's API constraint is still there — but it's now part of the tool's response, not a fake-out + late delivery. The reducer guarantees ordering even when multiple parallel tool calls return images simultaneously.

Bonus fix: `InjectedToolCallId` for JSON-Schema tools

LangChain auto-injects InjectedToolCallId into tool arguments when args_schema is a Pydantic model. MCP gives us its schema as a plain JSON-Schema dict (the authoritative LLM contract on the MCP side), which langchain doesn't auto-inject for. Added _McpStructuredTool, a tiny StructuredTool subclass that does the injection for the JSON-Schema path. Without this, the tool wouldn't know its own tool_call_id and the reducer pairing breaks.

Future steps

The same Command(update=...) pattern unlocks more than just "tools that return images". Two observations from driving the memory2 agent against this fix:

The agent skips steps. Skills that prescribe verification sub-procedures (e.g. "call frames_facing to refine each candidate position before merging") get partially-followed even when the prose is strict. The current harness has no way to enforce that a step ran before the agent can submit an answer — the LLM is the only authority. Closing this needs explicit task management in the harness: a way to gate the final-answer tool on whether the prerequisite tool calls actually fired.
Long-horizon tasks benefit from subagent delegation. Some questions (e.g. room segmentation across a 5-minute recording, exhaustive frontier exploration) are big enough that a single agent context fills up. The cleanest way to scale is to let the top-level agent dispatch self-contained sub-tasks to subagents — each with its own context window — and integrate their results. That keeps the parent agent's context focused on synthesis instead of bookkeeping.

These two together — harness-level task management + subagent delegation — are most naturally implemented as another module: a long-running, compute-heavier agent specialised for complex tasks, that the existing McpClient-style short-horizon agents can defer to in a true multi-agent setting. This PR is the prerequisite (image-returning tools work cleanly across agents); the multi-agent module is the follow-up.

Test plan

pytest dimos/agents/mcp/test_mcp_client_unit.py — covers the new Command-return path and the multi-image / parallel-tool-call ordering.
Manual: drive an MCP tool that returns an image; verify the agent sees the image in the same turn as the tool call (no "started with UUID" placeholder).

…de-channel The MCP client previously handled non-text tool content (images) by minting a UUID, returning a placeholder ToolMessage ("Tool call started with UUID: X. You will be updated with the result soon"), and queuing the image as a fresh HumanMessage via add_message. The image therefore arrived in a new agent turn rather than as the tool call's actual result. Switch to the langgraph pattern used by examples/memory2_agent/tools.py: the tool returns Command(update={"messages": [ToolMessage, HumanMessage]}) so the image is applied within the same turn. The HumanMessage carries additional_kwargs["tool_call_id"] so the state reducer can pair it with its ToolMessage when multiple parallel tool calls return images at once. Adds _McpStructuredTool, a small StructuredTool subclass that injects InjectedToolCallId for tools whose args_schema is a JSON-Schema dict (MCP's authoritative LLM contract) — langchain only handles this automatically for Pydantic args_schemas. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

codecov · 2026-05-20T06:45:02Z

❌ 1 Tests Failed:

Tests completed	Failed	Passed	Skipped
2047	1	2046	69

View the top 1 failed test(s) by shortest run time

dimos.agents.mcp.test_mcp_client::test_image

Stack Traces | 24.9s run time

agent_setup = <function agent_setup.<locals>.fn at 0x74bab5ab37e0>

    @pytest.mark.self_hosted
    def test_image(agent_setup):
        history = agent_setup(
            blueprints=[Visualizer.blueprint()],
            messages=[
                HumanMessage(
                    "What do you see? Take a picture using your camera and describe it. "
                    "Please mention one of the words which best match the image: "
                    "'stadium', 'cafe', 'battleship'."
                )
            ],
            system_prompt="You are a helpful assistant that can use a camera to take pictures.",
        )
    
        response = history[-1].content.lower()
>       assert "cafe" in response
E       assert 'cafe' in "i've taken a picture. let me analyze and describe it for you.\nthe image features an expansive outdoor stadium. from the camera's perspective, the word 'stadium' best matches the image. is there anything else you'd like to know or do?"

agent_setup = <function agent_setup.<locals>.fn at 0x74bab5ab37e0>
history    = [HumanMessage(content="What do you see? Take a picture using your camera and describe it. Please mention one of the wo...s={}, response_metadata={}, id='lc_run--019e46c7-caee-7453-9c1c-69e83028cb11-0', tool_calls=[], invalid_tool_calls=[])]
response   = "i've taken a picture. let me analyze and describe it for you.\nthe image features an expansive outdoor stadium. from the camera's perspective, the word 'stadium' best matches the image. is there anything else you'd like to know or do?"

.../agents/mcp/test_mcp_client.py:199: AssertionError

To view more test analytics, go to the Test Analytics Dashboard
_{📋 Got 3 mins? Take this short survey to help us improve Test Analytics.}

greptile-apps · 2026-05-20T06:52:02Z

Greptile Summary

This PR replaces the UUID side-channel workaround for image-returning MCP tools with LangGraph's Command(update={...}) pattern, delivering both the ToolMessage stub and the image-carrying HumanMessage in the same agent turn. It also introduces _McpStructuredTool (overriding public invoke/ainvoke) to inject tool_call_id for JSON-Schema-dict tools, and _fix_parallel_tool_batches to keep parallel ToolMessages contiguous as OpenAI requires.

call_tool now returns Command(update={\"messages\": [ToolMessage, HumanMessage]}) when the MCP response contains image content, eliminating the extra agent turn and the misleading "started with UUID" placeholder.
_McpStructuredTool.invoke/ainvoke injects tool_call_id via public Runnable API (not private _injected_args_keys), addressing the concern raised in a prior review thread.
_fix_parallel_tool_batches reorders interleaved [Tool₁, Human₁, Tool₂, Human₂, …] batches to [Tool₁, Tool₂, …, Human₁, Human₂, …], triggered only when all expected parallel tool responses are present, ensuring OpenAI never sees a non-contiguous parallel tool batch.

Confidence Score: 5/5

Safe to merge — the Command-based image delivery and parallel-batch reordering are both well-reasoned and thoroughly tested.

The rewrite replaces a two-turn UUID side-channel with a single-turn Command that delivers both the ToolMessage stub and the image-carrying HumanMessage atomically. The _fix_parallel_tool_batches reducer only rewrites a parallel batch when all expected ToolMessages are present, making it idempotent and safe when tool results arrive out of order. _McpStructuredTool correctly injects tool_call_id via the public invoke/ainvoke surface rather than private LangChain internals. All previously missing branches (text+image return, bare-dict error, parallel reorder variants) now have explicit test coverage. No incorrect filtering, missing required fields, or broken invariants were found.

No files require special attention.

Important Files Changed

Filename	Overview
dimos/agents/mcp/mcp_client.py	Core rewrite: adds Command-based image delivery, _McpStructuredTool for public-API tool_call_id injection, and _fix_parallel_tool_batches reducer to maintain OpenAI contiguity invariant. Logic is sound across all traced parallel-tool orderings.
dimos/agents/mcp/test_mcp_client_unit.py	Adds thorough test coverage for the new Command return path (image-only, text+image, invoke injection, bare-dict error), parallel batch reordering, and the reducer composition. All previously untested branches are now covered.

Sequence Diagram

sequenceDiagram
    participant User
    participant McpClient
    participant LangGraph
    participant LLM as OpenAI LLM
    participant ToolNode
    participant MCP as MCP Server

    User->>McpClient: human message
    McpClient->>LangGraph: "stream({messages: history})"
    LangGraph->>LLM: history (via _OrderedAgentState)
    LLM-->>LangGraph: "AIMessage(tool_calls=[{id:a},{id:b}])"

    par Parallel tool calls
        LangGraph->>ToolNode: invoke tool A (text-only)
        ToolNode->>MCP: tools/call A
        MCP-->>ToolNode: "{content:[{type:text}]}"
        ToolNode-->>LangGraph: ToolMessage(a, text result)
    and
        LangGraph->>ToolNode: invoke tool B (image)
        ToolNode->>MCP: tools/call B
        MCP-->>ToolNode: "{content:[{type:image_url}]}"
        ToolNode-->>LangGraph: "Command(update={messages:[ToolMessage(b), HumanMessage(image, tool_call_id=b)]})"
    end

    Note over LangGraph: _reorder_tool_responses reducer runs
    Note over LangGraph: [AI, Tool(a), Tool(b), Human(image)] ✓

    LangGraph->>LLM: ordered history with image in same turn
    LLM-->>LangGraph: final AIMessage
    LangGraph-->>McpClient: stream updates
    McpClient-->>User: response

_{Reviews (4): Last reviewed commit: "Adding proposed changes, fixing stash of..." | Re-trigger Greptile}

greptile-apps · 2026-05-20T06:52:05Z

+        self, tool_input: str | dict[str, Any], tool_call_id: str | None
+    ) -> tuple[tuple[Any, ...], dict[str, Any]]:
+        args, kwargs = super()._to_args_and_kwargs(tool_input, tool_call_id)
+        if "tool_call_id" in self._injected_args_keys:


Dependency on private _injected_args_keys API

self._injected_args_keys is a single-underscore property — a LangChain implementation detail not part of the public contract. The GitHub issue #36221 ("Typing issue in StructuredTool._injected_args_keys", March 2026) confirms it's actively being changed. If a LangChain version renames or redefines this property, _McpStructuredTool._to_args_and_kwargs silently skips the injection and every image-returning tool call fails with a TypeError (call_tool() missing required argument tool_call_id). Consider adding a guard-check at construction time (e.g., an assert hasattr(self, "_injected_args_keys")) or documenting the expected minimum langchain-core version.

We can update that if updating the langchain version changes the way the library does it. That's why pinned libraries exist.

It's better to write a future-proof version now than wait for the private API to break and implement the fix then.

_to_args_and_kwargs and _injected_args_keys should not be used.

If the goal is to capture tool_call_id you can do so by defining something like this:

class _McpStructuredTool(StructuredTool): def run( self, tool_input: str | dict[str, Any], *args: Any, **kwargs: Any ) -> Any: tool_input = _inject_tool_call_id(tool_input, kwargs.get("tool_call_id")) return super().run(tool_input, *args, **kwargs) async def arun( self, tool_input: str | dict[str, Any], *args: Any, **kwargs: Any ) -> Any: tool_input = _inject_tool_call_id(tool_input, kwargs.get("tool_call_id")) return await super().arun(tool_input, *args, **kwargs) def _inject_tool_call_id( tool_input: str | dict[str, Any], tool_call_id: str | None ) -> str | dict[str, Any]: if not isinstance(tool_input, dict): return tool_input if tool_call_id is None: raise ValueError( "MCP tool requires a tool_call_id; invoke via a ToolCall, not a bare dict." ) return {**tool_input, "tool_call_id": tool_call_id}

Adds `narrate_picture` to the mock MCP server and a test asserting that when a tool returns both text and an image, the ToolMessage carries the tool's real text (not the "{name} returned N artefact(s)" fallback sentinel) while the image still rides back on the follow-up HumanMessage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

paul-nechifor · 2026-05-20T17:34:52Z

+            # Vision content can't be embedded inside a ToolMessage for OpenAI
+            # (and others), so we use Command to append a follow-up HumanMessage
+            # carrying the image blocks within the same agent turn. Mirrors the
+            # pattern used by examples/memory2_agent/tools.py.


What is memory2_agent? I don't see this in the codebase.

Apologies for the lack of context, it's the agent in this PR: #2188 - will clean up this comment and others that reference it.

…com:mgczacki/dimos into worktree-mcp-client-image-langgraph-command

paul-nechifor · 2026-05-20T17:47:21Z

+            # carrying the image blocks within the same agent turn. Mirrors the
+            # pattern used by examples/memory2_agent/tools.py.
+            #
+            # The HumanMessage is tagged with `additional_kwargs["tool_call_id"]`


Where is additional_kwargs["tool_call_id"] used?

In langchain_core it seems like it's meant to be used for additional fields used by some providers.

It seems like I stashed my reducer that read it just before uploading the PR 😢. Will update shortly when I reapply it and double-check everything. Answering your question, it is read as a condition for the reducer to be able to re-order the messages once it gets them. Human Message with tool_call_id in additional_kwargs will be the indicator to reorder it just after the tool response that matches said tool_call_id.

greptile-apps · 2026-05-20T17:51:45Z

Want your agent to iterate on Greptile's feedback? Try greploops.

Add `dimos.telemetry`, a self-contained module that wraps the agent's per-turn execution in an OTEL span. The base install is unaffected: this package imports no opentelemetry packages at module load, and `dimos.telemetry.span` is a silent no-op until tracing is wired up. Wiring options: * Env-driven (recommended). Install the extra and set `OTEL_EXPORTER_OTLP_ENDPOINT`. `enable()` runs automatically on first import and configures the OTLP HTTP exporter plus, when available, `openinference-instrumentation-langchain` for LangChain auto-spans. * Caller-owned provider: `configure_tracing(my_provider)`. * Standard OTEL idiom: `DimosInstrumentor().instrument(...)`. The class is resolved lazily via module-level `__getattr__` so the heavy `opentelemetry.instrumentation` import only runs on attribute access. Vendor-agnostic via OTLP: Langfuse, Arize Phoenix, LangSmith, and Opik all accept the same pipeline; selection is by env var, not code. Each McpClient instance now generates a UUID at construction and stamps it on every `agent.turn` span via `session_attributes()`, which sets both the OpenInference `session.id` (Langfuse, Phoenix) and `langsmith.trace.session_id` (LangSmith). Backends group all per-turn traces from one instance into a single session in their UI. Opik has no OTEL→Threads mapping yet (comet-ml/opik#3441); use its native SDK there.

Two unrelated dev-only patches kept together so they can be reverted as a single unit before opening any feature PR: - dimos/models/segmentation/edge_tam.py Make the inference device configurable, default to CUDA when available, fall back to CPU with a warning instead of hard-failing. (originally from Jeff Hykin) - dimos/robot/unitree/mujoco_connection.py Use DYLD_FALLBACK_LIBRARY_PATH so we don't shadow Apple's Accelerate with conda's stale libblas, and redirect mjpython stdout/stderr to files instead of subprocess.PIPE to avoid deadlock when the parent doesn't drain stderr. Both unblock running agentic blueprints on Apple Silicon. Revert this commit before opening the agent_graph_understanding feature PR.

This reverts commit e20cfbd.

This reverts commit bc91121.

Mgczacki · 2026-05-20T18:59:48Z

Closing and reopening as a branch within the repo.

Mario Garrido and others added 2 commits May 19, 2026 12:41

[autofix.ci] apply automated fixes

e1941c1

Mgczacki marked this pull request as ready for review May 20, 2026 06:44

greptile-apps Bot reviewed May 20, 2026

View reviewed changes

Mario Garrido and others added 2 commits May 20, 2026 00:58

[autofix.ci] apply automated fixes

2f73db4

paul-nechifor reviewed May 20, 2026

View reviewed changes

Mario Garrido added 2 commits May 20, 2026 10:46

Cleanup of comment

ee4c3e6

Merge branch 'worktree-mcp-client-image-langgraph-command' of github.…

fb122b2

…com:mgczacki/dimos into worktree-mcp-client-image-langgraph-command

paul-nechifor reviewed May 20, 2026

View reviewed changes

Mario Garrido and others added 5 commits May 20, 2026 11:02

Revert "feat(telemetry): opt-in OpenTelemetry tracing for agents"

ea58194

This reverts commit e20cfbd.

Revert "chore(local-dev): macOS workarounds for edgetam + mujoco"

cd530de

This reverts commit bc91121.

Adding proposed changes, fixing stash of changes

68f25e5

Mgczacki mentioned this pull request May 20, 2026

Mcp client image langgraph command #2195

Open

2 tasks

Mgczacki closed this May 20, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(mcp): return tool images via langgraph Command (no more UUID side-channel)#2189

fix(mcp): return tool images via langgraph Command (no more UUID side-channel)#2189
Mgczacki wants to merge 11 commits into
dimensionalOS:mainfrom
Mgczacki:worktree-mcp-client-image-langgraph-command

Mgczacki commented May 20, 2026 •

edited

Loading

Uh oh!

codecov Bot commented May 20, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented May 20, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot May 20, 2026

Uh oh!

Mgczacki May 20, 2026

Uh oh!

paul-nechifor May 20, 2026

Uh oh!

Uh oh!

paul-nechifor May 20, 2026

Uh oh!

Mgczacki May 20, 2026

Uh oh!

paul-nechifor May 20, 2026

Uh oh!

Mgczacki May 20, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented May 20, 2026

Uh oh!

Mgczacki commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Mgczacki commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Context

The underlying constraint

What was happening before

What we do now

Bonus fix: InjectedToolCallId for JSON-Schema tools

Future steps

Test plan

Uh oh!

codecov Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

❌ 1 Tests Failed:

Uh oh!

greptile-apps Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

Mgczacki May 20, 2026

Choose a reason for hiding this comment

Uh oh!

paul-nechifor May 20, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

paul-nechifor May 20, 2026

Choose a reason for hiding this comment

Uh oh!

Mgczacki May 20, 2026

Choose a reason for hiding this comment

Uh oh!

paul-nechifor May 20, 2026

Choose a reason for hiding this comment

Uh oh!

Mgczacki May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot commented May 20, 2026

Uh oh!

Mgczacki commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Mgczacki commented May 20, 2026 •

edited

Loading

Bonus fix: `InjectedToolCallId` for JSON-Schema tools

codecov Bot commented May 20, 2026 •

edited

Loading

greptile-apps Bot commented May 20, 2026 •

edited

Loading

Mgczacki May 20, 2026 •

edited

Loading