Skip to content

contrib: add tool_registry package with AgenticSession support#1435

Open
lex00 wants to merge 4 commits intotemporalio:mainfrom
lex00:feat/tool-registry
Open

contrib: add tool_registry package with AgenticSession support#1435
lex00 wants to merge 4 commits intotemporalio:mainfrom
lex00:feat/tool-registry

Conversation

@lex00
Copy link
Copy Markdown

@lex00 lex00 commented Apr 13, 2026

What was changed

New optional contrib package temporalio[tool-registry] for running LLM tool-calling loops inside Temporal activities.

  • ToolRegistry — maps tool names to JSON Schema definitions and handler functions
  • run_tool_loop — standalone tool loop, no Temporal worker required
  • agentic_session — crash-safe context manager that checkpoints conversation history via activity.heartbeat() on each turn and restores on retry; session survives both activity crashes and provider-side session expiry since state is stored locally
  • Built-in Anthropic and OpenAI providers
  • MockProvider and ResponseBuilder for unit testing without a live API key
  • MCP tool import via ToolRegistry.from_mcp_tools()

Bug fix included

Handler exceptions in the initial implementation were not caught, causing them to propagate out of the activity. All six SDK providers now catch handler errors and feed them back to the model. Anthropic providers additionally set is_error: true on the tool result block (per the Anthropic API spec); OpenAI has no equivalent field.

Why?

Temporal activities are a natural fit for LLM tool-calling loops, but every team reimplements the same boilerplate. This contrib package standardizes the pattern across all six Temporal SDKs.

This is a different layer from openai_agents, google_adk_agents, and langgraph. Those integrations run each model call as a separate Temporal activity using server-side session IDs. tool_registry runs the entire conversation in one activity using local heartbeat state — which survives server-side session expiry and works identically across all six SDKs. Use openai_agents/google_adk_agents/langgraph when you are already using those frameworks; use tool_registry for direct Anthropic/OpenAI calls or cross-SDK portability.

Proposal: temporalio/proposals#107

Checklist

  1. Related to: Add ToolRegistry + AgenticSession proposal proposals#107
  2. How was this tested: unit tests with MockProvider (no API key required); integration tests against live Anthropic and OpenAI APIs gated on RUN_INTEGRATION_TESTS=1 (skipped by default)
  3. Any docs updates needed? Yes — docs.temporal.io update to follow after merge

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@lex00 lex00 requested a review from a team as a code owner April 13, 2026 03:53
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Apr 13, 2026

CLA assistant check
All committers have signed the CLA.

lex00 and others added 3 commits April 12, 2026 23:23
…viders

- Wrap dispatch() calls in try/except in both AnthropicProvider and
  OpenAIProvider; handler exceptions no longer crash the activity
- Set is_error=True on Anthropic tool result blocks when a handler raises,
  matching the Anthropic API spec; OpenAI has no equivalent field
- Add tests covering handler error → is_error propagation and successful
  handler → no is_error
- Update README feature matrix to clarify positioning vs framework plugins

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add adispatch() to ToolRegistry for async handler support; sync dispatch()
  raises TypeError with helpful message for async handlers; providers use adispatch
- Make AnthropicProvider and OpenAIProvider run_turn/run_loop async
- Rename AgenticSession.issues → results in _session.py and testing.py
- Add ScheduleToCloseTimeout guidance to README
- Update tests: new adispatch tests, async handler end-to-end test, fix message
  indices in provider tests, rename issues→results in test_agentic_session.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Shows a code review agent that proposes auto-fixes requiring human sign-off
before application. The tool handler blocks on a Temporal workflow signal,
heartbeating manually while waiting. Key behaviors demonstrated:

- Rejection reason returned to LLM as tool result so it can revise approach
- Deterministic workflow IDs make crash-retry idempotent: re-attaches to the
  existing approval workflow rather than sending a duplicate notification
- Manual heartbeat loop inside the handler keeps the activity alive during
  long human review windows

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@0xbrainkid
Copy link
Copy Markdown

The agentic_session context manager is exactly the right abstraction — crash-safe conversation history via heartbeat checkpoints solves the most painful operational problem with long LLM tool loops in Temporal (activity retry losing context).

One addition worth considering for the AgenticSession interface: session-level identity and trust metadata. The session already holds conversation state. If it also holds the agent identity (which agent is running this session) and its behavioral trust score at session start, two things become possible:

1. Trust-gated tool access within the session. If the session knows the agent identity, ToolRegistry.run_tool_loop() can apply per-agent trust thresholds — the same tool is available to a high-trust agent but sandboxed or blocked for a low-trust agent, without needing to re-query the trust registry on every tool call:

async with agentic_session(
    provider=AnthropicProvider(),
    agent_id="fleet-agent-xyz",
    min_tool_trust=0.80  # tools requiring higher trust are blocked for this agent
) as session:
    result = await session.run_tool_loop(registry, messages)

2. Per-session behavioral audit trail. Each heartbeat checkpoint is a natural attestation point — the session state at each checkpoint can include the agent identity alongside the conversation history. If the session is later audited (e.g., for EU AI Act Article 12 compliance), the checkpoints provide a timestamped, crash-survivor record of which agent made which decisions.

The MockProvider and ResponseBuilder pattern is a good design — makes the trust gating testable without a live registry.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants