Mcp client image langgraph command by Mgczacki · Pull Request #2195 · dimensionalOS/dimos

Mgczacki · 2026-05-20T18:59:29Z

Context

Continuing #2189 within the repo so CI/CD can run.

I hit this while developing #2188 (the memory2 agent + Space raster backend) and figured the fix would be worth porting to the MCP client, which had the same problem and was working around it in a less elegant way.

The underlying constraint

Many OpenAI chat-completion models don't accept image content inside a tool result — you can't just stuff an image into a ToolMessage. The canonical workaround is to deliver the image as a HumanMessage after the tool returns. That part isn't avoidable.

What we can avoid is delivering that HumanMessage in a separate, later agent turn — which is what the MCP client was doing.

What was happening before

When an MCP tool returned non-text content (an image), mcp_client.py:

Minted a UUID for the image.
Returned a placeholder ToolMessage to the agent: "Tool call started with UUID: X. You will be updated with the result soon".
Queued the actual image as a fresh HumanMessage via add_message, which got picked up on the next agent turn.

The agent therefore saw a fake "wait for it" tool result, kept reasoning with no real data, and only got the image one turn later. Two extra round-trips per image, and the tool result the agent reasons over isn't the actual content.

What we do now

We're already on langchain's create_agent, which is a thin wrapper over LangGraph. LangGraph lets a tool return a Command(update={...}) instead of a single message — i.e. the tool call can append multiple messages to state as the result of one call. The state's messages reducer (the tool-aware one in agent.py) then re-orders so each image-carrying HumanMessage lands next to its matching ToolMessage, paired by additional_kwargs["tool_call_id"].

Concretely, the MCP tool now returns:

Command(update={
    "messages": [
        ToolMessage(content="<text part or stub>", tool_call_id=...),
        HumanMessage(content=[image_block], additional_kwargs={"tool_call_id": ...}),
    ]
})

Both messages land in the same turn. The HumanMessage-with-image workaround for OpenAI's API constraint is still there — but it's now part of the tool's response, not a fake-out + late delivery. The reducer guarantees ordering even when multiple parallel tool calls return images simultaneously.

Bonus fix: `InjectedToolCallId` for JSON-Schema tools

LangChain auto-injects InjectedToolCallId into tool arguments when args_schema is a Pydantic model. MCP gives us its schema as a plain JSON-Schema dict (the authoritative LLM contract on the MCP side), which langchain doesn't auto-inject for. Added _McpStructuredTool, a tiny StructuredTool subclass that does the injection for the JSON-Schema path. Without this, the tool wouldn't know its own tool_call_id and the reducer pairing breaks.

Future steps

The same Command(update=...) pattern unlocks more than just "tools that return images". Two observations from driving the memory2 agent against this fix:

The agent skips steps. Skills that prescribe verification sub-procedures (e.g. "call frames_facing to refine each candidate position before merging") get partially-followed even when the prose is strict. The current harness has no way to enforce that a step ran before the agent can submit an answer — the LLM is the only authority. Closing this needs explicit task management in the harness: a way to gate the final-answer tool on whether the prerequisite tool calls actually fired.
Long-horizon tasks benefit from subagent delegation. Some questions (e.g. room segmentation across a 5-minute recording, exhaustive frontier exploration) are big enough that a single agent context fills up. The cleanest way to scale is to let the top-level agent dispatch self-contained sub-tasks to subagents — each with its own context window — and integrate their results. That keeps the parent agent's context focused on synthesis instead of bookkeeping.

These two together — harness-level task management + subagent delegation — are most naturally implemented as another module: a long-running, compute-heavier agent specialised for complex tasks, that the existing McpClient-style short-horizon agents can defer to in a true multi-agent setting. This PR is the prerequisite (image-returning tools work cleanly across agents); the multi-agent module is the follow-up.

Test plan

pytest dimos/agents/mcp/test_mcp_client_unit.py — covers the new Command-return path and the multi-image / parallel-tool-call ordering.
Manual: drive an MCP tool that returns an image; verify the agent sees the image in the same turn as the tool call (no "started with UUID" placeholder).

…de-channel The MCP client previously handled non-text tool content (images) by minting a UUID, returning a placeholder ToolMessage ("Tool call started with UUID: X. You will be updated with the result soon"), and queuing the image as a fresh HumanMessage via add_message. The image therefore arrived in a new agent turn rather than as the tool call's actual result. Switch to the langgraph pattern used by examples/memory2_agent/tools.py: the tool returns Command(update={"messages": [ToolMessage, HumanMessage]}) so the image is applied within the same turn. The HumanMessage carries additional_kwargs["tool_call_id"] so the state reducer can pair it with its ToolMessage when multiple parallel tool calls return images at once. Adds _McpStructuredTool, a small StructuredTool subclass that injects InjectedToolCallId for tools whose args_schema is a JSON-Schema dict (MCP's authoritative LLM contract) — langchain only handles this automatically for Pydantic args_schemas. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds `narrate_picture` to the mock MCP server and a test asserting that when a tool returns both text and an image, the ToolMessage carries the tool's real text (not the "{name} returned N artefact(s)" fallback sentinel) while the image still rides back on the follow-up HumanMessage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…com:mgczacki/dimos into worktree-mcp-client-image-langgraph-command

Add `dimos.telemetry`, a self-contained module that wraps the agent's per-turn execution in an OTEL span. The base install is unaffected: this package imports no opentelemetry packages at module load, and `dimos.telemetry.span` is a silent no-op until tracing is wired up. Wiring options: * Env-driven (recommended). Install the extra and set `OTEL_EXPORTER_OTLP_ENDPOINT`. `enable()` runs automatically on first import and configures the OTLP HTTP exporter plus, when available, `openinference-instrumentation-langchain` for LangChain auto-spans. * Caller-owned provider: `configure_tracing(my_provider)`. * Standard OTEL idiom: `DimosInstrumentor().instrument(...)`. The class is resolved lazily via module-level `__getattr__` so the heavy `opentelemetry.instrumentation` import only runs on attribute access. Vendor-agnostic via OTLP: Langfuse, Arize Phoenix, LangSmith, and Opik all accept the same pipeline; selection is by env var, not code. Each McpClient instance now generates a UUID at construction and stamps it on every `agent.turn` span via `session_attributes()`, which sets both the OpenInference `session.id` (Langfuse, Phoenix) and `langsmith.trace.session_id` (LangSmith). Backends group all per-turn traces from one instance into a single session in their UI. Opik has no OTEL→Threads mapping yet (comet-ml/opik#3441); use its native SDK there.

Two unrelated dev-only patches kept together so they can be reverted as a single unit before opening any feature PR: - dimos/models/segmentation/edge_tam.py Make the inference device configurable, default to CUDA when available, fall back to CPU with a warning instead of hard-failing. (originally from Jeff Hykin) - dimos/robot/unitree/mujoco_connection.py Use DYLD_FALLBACK_LIBRARY_PATH so we don't shadow Apple's Accelerate with conda's stale libblas, and redirect mjpython stdout/stderr to files instead of subprocess.PIPE to avoid deadlock when the parent doesn't drain stderr. Both unblock running agentic blueprints on Apple Silicon. Revert this commit before opening the agent_graph_understanding feature PR.

This reverts commit e20cfbd.

This reverts commit bc91121.

Mgczacki · 2026-05-20T19:00:32Z

@paul-nechifor I removed the private-API dep . Took your direction but used invoke/ainvoke (documented to accept ToolCall) and pulled the id from input["id"] rather than reading tool_call_id from run() kwargs.

greptile-apps · 2026-05-20T19:09:08Z

Greptile Summary

This PR replaces the old UUID-placeholder workaround for image-returning MCP tools with a clean Command(update={...})-based approach that delivers the ToolMessage and image-carrying HumanMessage in the same agent turn, eliminating the two-round-trip overhead. It also introduces _McpStructuredTool to bridge LangChain's InjectedToolCallId auto-injection (only available for Pydantic schemas) to MCP's JSON-Schema tools, and adds _fix_parallel_tool_batches + _reorder_tool_responses to keep parallel ToolMessages contiguous per OpenAI's API requirement.

Command-based image delivery: call_tool now returns a Command with a paired [ToolMessage, HumanMessage] when non-text content is present, co-landing both in one agent turn; self._history is explicitly re-sorted via _fix_parallel_tool_batches after streaming so the pre-reducer node outputs don't corrupt the history fed back into stream() on the next turn.
_McpStructuredTool: Overrides invoke/ainvoke to inject tool_call_id from the ToolCall envelope into args before delegating, bridging the Pydantic-only auto-injection gap for JSON-Schema tools.
Test coverage: New unit tests cover the Command return path, multi-image/parallel-tool ordering, the ainvoke async path, and the _process_message history normalization, directly addressing previously identified gaps.

Confidence Score: 5/5

Safe to merge. The rewrite correctly co-delivers images in one agent turn, history normalisation after streaming is sound, and the previously identified ordering problem in _history is now explicitly fixed.

The central logic — Command-based image delivery, _fix_parallel_tool_batches, and _McpStructuredTool — is correct and well-tested across sync, async, single-tool, and parallel-tool scenarios. The explicit _fix_parallel_tool_batches call at the end of _process_message directly closes the gap where stream_mode='updates' raw node output bypassed the channel reducer, so _history is correctly ordered before it is fed back into the next stream() call. No regressions are introduced, and the fixture update aligns with the removed two-turn placeholder flow.

No files require special attention.

Important Files Changed

Filename	Overview
dimos/agents/mcp/mcp_client.py	Core logic rewrite: replaces UUID-placeholder side-channel with Command-based same-turn image delivery; adds _fix_parallel_tool_batches, _reorder_tool_responses, _OrderedAgentState, and _McpStructuredTool; applies history normalisation after each stream call. Logic is sound and well-commented.
dimos/agents/mcp/test_mcp_client_unit.py	Comprehensive new tests cover Command return, text+image branching, ToolCall invoke/ainvoke injection, bare-dict rejection, and _process_message history normalisation post-streaming; mock fixtures extended accordingly.
dimos/agents/mcp/fixtures/test_image.json	Fixture updated: removes the now-unnecessary intermediate AI response that mirrored the old two-turn placeholder flow; only the final answer turn remains.

Sequence Diagram

sequenceDiagram
    participant LG as LangGraph ToolNode
    participant MT as _McpStructuredTool.invoke
    participant CT as call_tool closure
    participant MCP as MCP Server
    participant State as Graph State (_OrderedAgentState)

    LG->>MT: "invoke(ToolCall{id, args, type})"
    MT->>MT: "_inject_tool_call_id(input)<br/>args[tool_call_id] = id"
    MT->>CT: super().invoke(modified_input)
    CT->>MCP: tools/call(name, kwargs)
    MCP-->>CT: "{content: [{type, ...}]}"
    alt text only
        CT-->>LG: plain str
        LG->>State: "ToolMessage(content=str, tool_call_id=id)"
    else has image blocks
        CT-->>LG: "Command(update={messages:[ToolMessage, HumanMessage]})"
        LG->>State: "ToolMessage(content=summary, tool_call_id=id) + HumanMessage([text,image], additional_kwargs={tool_call_id})"
    end
    State->>State: _reorder_tool_responses reducer
    Note over State: Parallel ToolMessages contiguous, Image HumanMessages follow

_{Reviews (2): Last reviewed commit: "[autofix.ci] apply automated fixes" | Re-trigger Greptile}

codecov · 2026-05-20T19:29:22Z

Codecov Report

❌ Patch coverage is 97.71429% with 4 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
dimos/agents/mcp/mcp_client.py	96.49%	1 Missing and 1 partial ⚠️
dimos/agents/mcp/test_mcp_client_unit.py	98.30%	1 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

Mgczacki · 2026-05-20T21:46:23Z

This fixes #2054

Mario Garrido and others added 11 commits May 19, 2026 12:41

[autofix.ci] apply automated fixes

e1941c1

[autofix.ci] apply automated fixes

2f73db4

Cleanup of comment

ee4c3e6

Merge branch 'worktree-mcp-client-image-langgraph-command' of github.…

fb122b2

…com:mgczacki/dimos into worktree-mcp-client-image-langgraph-command

Revert "feat(telemetry): opt-in OpenTelemetry tracing for agents"

ea58194

This reverts commit e20cfbd.

Revert "chore(local-dev): macOS workarounds for edgetam + mujoco"

cd530de

This reverts commit bc91121.

Adding proposed changes, fixing stash of changes

68f25e5

Mgczacki requested a review from paul-nechifor May 20, 2026 18:59

Mgczacki added the agents label May 20, 2026

greptile-apps Bot reviewed May 20, 2026

View reviewed changes

Comment thread dimos/agents/mcp/mcp_client.py

Mario Garrido and others added 2 commits May 20, 2026 12:18

Improvements, fix failing test

0ef82f5

[autofix.ci] apply automated fixes

4987b1a

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Mcp client image langgraph command#2195

Mcp client image langgraph command#2195
Mgczacki wants to merge 13 commits into
mainfrom
mcp-client-image-langgraph-command

Mgczacki commented May 20, 2026

Uh oh!

Mgczacki commented May 20, 2026

Uh oh!

greptile-apps Bot commented May 20, 2026 •

edited

Loading

Uh oh!

Uh oh!

codecov Bot commented May 20, 2026

Uh oh!

Mgczacki commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Mgczacki commented May 20, 2026

Context

The underlying constraint

What was happening before

What we do now

Bonus fix: InjectedToolCallId for JSON-Schema tools

Future steps

Test plan

Uh oh!

Mgczacki commented May 20, 2026

Uh oh!

greptile-apps Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

codecov Bot commented May 20, 2026

Codecov Report

Uh oh!

Mgczacki commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Bonus fix: `InjectedToolCallId` for JSON-Schema tools

greptile-apps Bot commented May 20, 2026 •

edited

Loading