
fix: categorize tool transport errors as SYSTEM with actionable messages #967

Open
ionut-mihalache-uipath wants to merge 1 commit into main from fix/categorize-tool-transport-errors


@ionut-mihalache-uipath


Why

Two production failure modes currently surface as AGENT_RUNTIME.UNEXPECTED_ERROR / category Unknown, making them useless for alert triage and confusing for agent builders:

  1. Integration tool timeouts — an httpx.ReadTimeout from invoke_activity_async propagates raw to map_runtime, which only special-cases EnrichedException/HTTPStatusError. Worse, str(httpx.ReadTimeout) is "", so the tracked event shows an empty Error Details: section (observed on a SupplierDueDiligence run, AgentRunId dc567786-6243-4bce-b50f-27b4fee4a107).
  2. MCP protocol errors — an McpError("Session terminated") (synthesized by our streamable HTTP transport when the server 404s the session, after McpClient's session-reinit retry is exhausted) propagates raw the same way.
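The empty `Error Details:` section in the first case follows from how Python stringifies an exception that carries an empty message. A minimal sketch of that behavior, assuming a stand-in `FakeReadTimeout` class in place of `httpx.ReadTimeout` (so the snippet runs without httpx) and a hypothetical `describe` helper that is not part of this PR:

```python
class FakeReadTimeout(Exception):
    """Stand-in for httpx.ReadTimeout (assumption: keeps the sketch dependency-free)."""


def describe(exc: BaseException) -> str:
    """Hypothetical helper: fall back to the type name when the message is empty."""
    text = str(exc)
    return text if text else type(exc).__name__


err = FakeReadTimeout("")    # mirrors the empty message described above
raw_detail = str(err)        # what a mapper relying on str(exc) would record
safe_detail = describe(err)  # non-empty fallback based on the exception type
print(repr(raw_detail), repr(safe_detail))
```

Any mapper that records only `str(exc)` would therefore log nothing useful for this failure, which is why the PR raises an explicit message instead.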

What

  • integration_tool.py: catch httpx.TimeoutException in integration_tool_fn and raise AgentRuntimeError(HTTP_ERROR, SYSTEM) with a retry hint naming the tool. Only timeouts are caught; other transport errors propagate as before.
  • mcp_tool.py: catch McpError in tool_fn and raise AgentRuntimeError(HTTP_ERROR, SYSTEM) via a new _map_mcp_error helper. Session error codes (32600/-32000) get a "connection to MCP server '' was terminated and could not be re-established… retry the run later" message; other codes surface the server's own error message. MCP tool execution failures come back as CallToolResult.isError results, not exceptions, so a raised McpError is always protocol/session/transport-level — SYSTEM is the right category.
  • mcp_client.py: expose server_slug as a public property so the error message can name the server without reaching into _config.
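The MCP mapping described in the list above could look roughly like the following. This is a sketch under stated assumptions, not the PR's actual `_map_mcp_error`: `FakeErrorData`, `FakeMcpError`, and `FakeAgentRuntimeError` are stand-ins for the real classes, the constructor signature is assumed, and the message wording is illustrative only.

```python
from dataclasses import dataclass

# Codes the description treats as session errors.
SESSION_ERROR_CODES = frozenset({32600, -32000})


@dataclass
class FakeErrorData:
    """Stand-in for the payload carried by McpError (assumed shape)."""
    code: int
    message: str


class FakeMcpError(Exception):
    """Stand-in for McpError."""

    def __init__(self, error: FakeErrorData) -> None:
        super().__init__(error.message)
        self.error = error


class FakeAgentRuntimeError(Exception):
    """Stand-in for AgentRuntimeError (assumed constructor)."""

    def __init__(self, code: str, message: str, category: str, should_wrap: bool = True) -> None:
        super().__init__(message)
        self.code = code
        self.message = message
        self.category = category
        self.should_wrap = should_wrap


def map_mcp_error(exc: FakeMcpError, server_slug: str, tool_name: str) -> FakeAgentRuntimeError:
    """Build a SYSTEM-category error; session codes get a retry hint."""
    if exc.error.code in SESSION_ERROR_CODES:
        message = (
            f"The connection to MCP server '{server_slug}' was terminated and could "
            f"not be re-established while calling tool '{tool_name}'. Retry the run later."
        )
    else:
        # Non-session codes surface the server's own error message.
        message = f"MCP server '{server_slug}' failed on tool '{tool_name}': {exc.error.message}"
    return FakeAgentRuntimeError("HTTP_ERROR", message, "SYSTEM", should_wrap=False)


session = map_mcp_error(FakeMcpError(FakeErrorData(32600, "Session terminated")), "my-server", "search")
other = map_mcp_error(FakeMcpError(FakeErrorData(-32602, "Invalid params")), "my-server", "search")
```

Returning the error from the helper, rather than raising inside it, lets the caller write `raise ... from e` at the catch site so the original exception stays attached.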

Both raises use should_wrap=False so the actionable message isn't buried under the generic "An unexpected error occurred…" prefix, and chain the original exception (from e) so tracebacks keep the root cause.
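A minimal sketch of the catch-and-chain pattern on the integration tool side, assuming stand-in classes (`FakeTimeoutException` for `httpx.TimeoutException`, `FakeAgentRuntimeError` for `AgentRuntimeError`) and a hypothetical `invoke_activity` coroutine that always times out. The message text is illustrative, not the PR's wording.

```python
import asyncio


class FakeTimeoutException(Exception):
    """Stand-in for httpx.TimeoutException."""


class FakeAgentRuntimeError(Exception):
    """Stand-in for AgentRuntimeError (assumed constructor)."""

    def __init__(self, code: str, message: str, category: str, should_wrap: bool = True) -> None:
        super().__init__(message)
        self.code = code
        self.category = category
        self.should_wrap = should_wrap


async def invoke_activity(tool_name: str) -> str:
    """Hypothetical transport call that always times out with an empty message."""
    raise FakeTimeoutException("")


async def integration_tool_fn(tool_name: str) -> str:
    try:
        return await invoke_activity(tool_name)
    except FakeTimeoutException as e:
        # Only timeouts are caught; any other exception propagates unchanged.
        raise FakeAgentRuntimeError(
            code="HTTP_ERROR",
            message=f"Integration tool '{tool_name}' timed out. Retry the run later.",
            category="SYSTEM",
            should_wrap=False,  # keep the actionable message unprefixed
        ) from e  # chaining keeps the root cause in the traceback


def run_and_capture(tool_name: str) -> FakeAgentRuntimeError:
    """Run the tool and return the categorized error it raises."""
    try:
        asyncio.run(integration_tool_fn(tool_name))
    except FakeAgentRuntimeError as err:
        return err
    raise AssertionError("expected a timeout")


captured = run_and_capture("lookup_supplier")
```

Because of `from e`, the original timeout is available as `captured.__cause__`, which is what a regression test can assert on.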

Behavior note: for conversational agents, tool exceptions are converted to error ToolMessages by wrap_tools_with_error_handling, so this only changes run-failure classification on the non-conversational path (the LLM just sees a friendlier message on the conversational one).
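To illustrate why the conversational path only sees a friendlier message, here is a rough sketch of an error-handling wrapper. It is an assumption about the general shape of such a wrapper, not the actual `wrap_tools_with_error_handling`: the function name, the dict-based message, and its keys are all hypothetical.

```python
from typing import Callable, Dict


def wrap_with_error_handling(tool: Callable[[str], str]) -> Callable[[str], Dict[str, str]]:
    """Hypothetical wrapper: convert a raised exception into an error message."""

    def wrapped(arg: str) -> Dict[str, str]:
        try:
            return {"status": "success", "content": tool(arg)}
        except Exception as exc:
            # The exception never fails the run here; the model just reads its text.
            return {"status": "error", "content": str(exc) or type(exc).__name__}

    return wrapped


def failing_tool(query: str) -> str:
    """Hypothetical tool that raises an already-categorized, readable error."""
    raise RuntimeError("Integration tool 'lookup_supplier' timed out. Retry the run later.")


result = wrap_with_error_handling(failing_tool)("acme")
```

Under this shape the error category is never consulted on the conversational path, so only the message text changes what the LLM sees.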

Testing

  • New test_timeout_raises_agent_runtime_error_with_system_category in test_integration_tool.py
  • New TestMcpToolErrorHandling class in test_mcp_tool.py (session error, non-session error, non-McpError passthrough)
  • uv run pytest tests/agent/tools/test_integration_tool.py tests/agent/tools/test_mcp/ — all pass
  • ruff check, ruff format --check, mypy clean on touched files

🤖 Generated with Claude Code

https://claude.ai/code/session_01N4bvjRA16sKYhsW3mHHpiZ

Integration tool calls that hit an httpx timeout and MCP tool calls that
fail with a protocol-level McpError (e.g. "Session terminated") currently
propagate raw and get mapped to AGENT_RUNTIME.UNEXPECTED_ERROR with
category Unknown - and for timeouts, an empty detail, since
str(httpx.ReadTimeout) is "".

- integration_tool: catch httpx.TimeoutException and raise
  AgentRuntimeError(HTTP_ERROR, SYSTEM) with a retry hint naming the tool
- mcp_tool: catch McpError and raise AgentRuntimeError(HTTP_ERROR, SYSTEM);
  session errors (32600/-32000) get a "connection terminated, retry later"
  message, other codes surface the server's own error message
- mcp_client: expose the configured server slug as a public property

Both raises use should_wrap=False so the message is not buried under the
generic unexpected-error prefix, and chain the original exception.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01N4bvjRA16sKYhsW3mHHpiZ
Copilot AI review requested due to automatic review settings July 2, 2026 13:49

Copilot AI left a comment


Pull request overview

This PR improves agent run failure triage by mapping two previously “Unknown/Unexpected” tool transport failure modes (Integration Service timeouts and MCP protocol/session errors) into structured AgentRuntimeErrors categorized as SYSTEM, with actionable, unwrapped messages while preserving the original exception as the cause.

Changes:

  • Catch httpx.TimeoutException in Integration Service tool invocation and raise AgentRuntimeError(HTTP_ERROR, SYSTEM, should_wrap=False) with a retry hint.
  • Catch protocol-level McpError during MCP tool calls and map it via _map_mcp_error() into AgentRuntimeError(HTTP_ERROR, SYSTEM, should_wrap=False), with special messaging for session termination codes.
  • Expose McpClient.server_slug to allow MCP error messages to name the server, and add targeted tests for both behaviors.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/uipath_langchain/agent/tools/integration_tool.py | Maps Integration Service timeouts to categorized AgentRuntimeError with actionable retry guidance. |
| src/uipath_langchain/agent/tools/mcp/mcp_tool.py | Maps protocol/session-level McpError exceptions to categorized AgentRuntimeError with server/tool context. |
| src/uipath_langchain/agent/tools/mcp/mcp_client.py | Adds a public server_slug property for safer access in error reporting. |
| tests/agent/tools/test_integration_tool.py | Adds a regression test ensuring timeouts become SYSTEM AgentRuntimeError with a chained cause. |
| tests/agent/tools/test_mcp/test_mcp_tool.py | Adds tests verifying MCP protocol errors are categorized/messaged correctly and non-McpError exceptions still propagate. |

@sonarqubecloud

sonarqubecloud Bot commented Jul 2, 2026

