feat(sdk): agent runtime behind backend/harness ports by mmabrouk · Pull Request #4771 · Agenta-AI/agenta

mmabrouk · 2026-06-19T16:28:56Z

Agent-workflows: functional PR set

Sliced by functional area, final code only (no intermediate churn). Most PRs are independent off main; two pairs are stacked. This PR's base is main.

feat(sdk): agent runtime behind backend/harness ports #4771: SDK agent runtime: ports, adapters, tools, messages protocol <- you are here
- feat(agent): agent workflow service and tool-resolution API #4772: Agent service + tool-resolution API
feat(agent): runner wire contract and tool execution #4773: Runner wire contract + tool execution
- feat(agent): runner engines, server, and tracing #4774: Runner engines, server, tracing
feat(frontend): agent config playground controls #4775: Playground agent config UI
chore(hosting): wire the agent runner sidecar into compose #4776: Hosting compose wiring
docs(agent): agent-workflows design and ground truth #4777: Docs: design + ground truth

Context

The agent runtime turns a stored agent definition into a live coding-agent turn: it picks an engine, shapes the config that engine wants, runs one prompt, and streams the reply back to a browser. This PR puts that whole runtime in the SDK under sdks/python/agenta/sdk/agents/, structured as ports and adapters. It targets main and is independent. It is a functional slice that shows the final code, so the service PR that composes these adapters stacks on top of it.

What this changes

The runtime now reads as three layers behind interfaces. Backend is the engine: it declares which harnesses it can drive and owns sandbox plus session lifecycle. Environment sits above a backend and owns the sandbox-per-session policy. Harness sits above an environment and maps a neutral config into one engine's shape.

Before, the per-harness knowledge lived in the TypeScript runner, and a caller spoke directly to a transport. Now a caller builds a SessionConfig, hands it to a Harness, and the harness produces the engine-shaped config that a Backend plumbs to the runner without business logic. PiHarness, ClaudeHarness, and AgentaHarness each do different work because the harnesses differ: Pi takes built-in tool names plus native specs and never gates tool use; Claude has no built-ins, delivers tools over MCP, and gates tool use behind a permission policy; Agenta is Pi plus forced skills and a preamble.

Two backend adapters drive real engines. InProcessPiBackend runs Pi in-process through the runner and supports pi and agenta. RivetBackend drives a harness over ACP and supports pi and claude on local or Daytona. LocalBackend is a stub that raises NotImplementedError.

The browser edge is the Vercel /messages adapter. It folds inbound UIMessage input into neutral messages, emits Vercel UI Message Stream parts, stamps x-ag-messages-format and x-ag-messages-version headers, resolves the session id, and routes /load-session through a SessionStore port whose only adapter today is NoopSessionStore. The normalizer threads session_id as a request-envelope field, so it survives the round trip as a correlation value.

Key architectural decision to review

The first decision is the ownership split. The SDK owns the runtime ports and the adapters, and the service only composes them (sdks/python/agenta/sdk/agents/interfaces.py). The tradeoff: a standalone SDK user can drive Pi with no Agenta service, but the service must inject its server-side concerns (gateway tool resolution, the secret vault) through the injected adapter seams rather than reaching into the runtime. Check that Backend.supported_harnesses stays the single source of truth and that Harness.__init__ rejects an unsupported pairing before any run starts.

The second decision is that session_id is a correlation primitive, not state (adapters/vercel/routing.py, middlewares/running/normalizer.py). The cold runtime still receives the full message history on every turn. resolve_session_id mints, echoes, or rejects the id against a bounded charset, and the id is stamped onto the stream and the envelope, but nothing reads it back as conversation state yet. SessionStore is a port-only seam: NoopSessionStore returns empty history and discards writes, so /load-session answers with nothing until a real adapter lands. Confirm this is a deliberate seam and not a dropped write path.

How to review this PR

Read interfaces.py first and fix the three-layer vocabulary in your head: Backend, Environment, Harness, plus the Sandbox, Session, and SessionStore ports. Then read dtos.py for the shapes that cross those ports, especially SessionConfig (the run bundle), AgentConfig.harness_options (the per-harness escape hatch), and the PiAgentConfig / ClaudeAgentConfig / AgentaAgentConfig split where wire_tools differs per engine. Then read adapters/harnesses.py, adapters/in_process.py, and adapters/rivet.py to see the mapping and the two real backends. Read adapters/vercel/routing.py last for the browser edge.

You can skip the mcp/ subpackage and the parsing helpers at the bottom of dtos.py on a first pass; they are mechanical. The regression most likely to break is the golden wire contract: a tool-free run's /run payload must stay byte-identical, so watch any change to wire_tools, wire_mcp, or request_to_wire against golden/run_request.pi.json.

Tests / notes

The suite covers the DTO shapes, the harness adapters and their backend-support validation, the /messages and /load-session routing, the tool resolver, and a transport round trip. The wire-contract test pins the runner payload against golden JSON. The NoopSessionStore path is verified to return empty and discard, which documents the not-yet-persisted behavior rather than hiding it.

…es protocol

vercel · 2026-06-19T16:29:02Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agenta-documentation	Ready	Preview, Comment	Jun 22, 2026 12:21pm

coderabbitai · 2026-06-19T16:29:19Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: d9133488-6ee8-4cb5-87a1-46ea23fed178

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds a complete agenta.sdk.agents Python runtime: hexagonal-port ABCs (Backend, Sandbox, Session, Environment, Harness), Pydantic DTOs, MCP and tools subsystems, AgentRun streaming, InProcessPiBackend/SandboxAgentBackend/LocalBackend adapters, PiHarness/ClaudeHarness/AgentaHarness, a Vercel UI message/SSE/routing adapter, TypeScript runner HTTP/subprocess transport, and workflow schema/registry/routing wiring, backed by broad unit and integration tests.

Changes

Agents Runtime

Layer / File(s)	Summary
Core DTOs, interfaces, streaming, tools/MCP subsystems `sdks/python/agenta/sdk/agents/dtos.py`, `sdks/python/agenta/sdk/agents/interfaces.py`, `sdks/python/agenta/sdk/agents/streaming.py`, `sdks/python/agenta/sdk/agents/errors.py`, `sdks/python/agenta/sdk/agents/tools/...`, `sdks/python/agenta/sdk/agents/mcp/...`	Defines `HarnessType`, `ContentBlock`, `Message`, `AgentConfig`, `SessionConfig`, per-harness config classes, the `Sandbox`/`Session`/`Backend`/`Environment`/`Harness` ABCs with `NoopSessionStore`, the `AgentRun` async-iterable streaming primitive, the tools subsystem (models, parsing, compat coercion, resolver, wire serialization, error hierarchy), and the MCP subsystem (models, parsing, resolver, wire helpers, error types).
Backend adapters, harness adapters, Vercel adapter, transport `sdks/python/agenta/sdk/agents/adapters/...`, `sdks/python/agenta/sdk/agents/utils/ts_runner.py`, `sdks/python/agenta/sdk/agents/utils/wire.py`	Implements `InProcessPiBackend`, `SandboxAgentBackend`, `LocalBackend` (stub); runner command resolution (`_runner_config.py`); `PiHarness`, `ClaudeHarness`, `AgentaHarness` with Agenta forced defaults and `make_harness` factory; Vercel adapter (`messages.py`, `sse.py`, `stream.py`, `routing.py`); HTTP and subprocess NDJSON transport (`ts_runner.py`); and `/run` request/response wire serialization.
Workflow flags, schema catalog, normalizer, routing decorator `sdks/python/agenta/sdk/engines/running/interfaces.py`, `sdks/python/agenta/sdk/engines/running/utils.py`, `sdks/python/agenta/sdk/middlewares/running/normalizer.py`, `sdks/python/agenta/sdk/models/workflows.py`, `sdks/python/agenta/sdk/utils/types.py`, `sdks/python/agenta/sdk/decorators/routing.py`, `sdks/python/agenta/__init__.py`, `sdks/python/agenta/sdk/agents/__init__.py`	Registers `agenta:builtin:agent:v0` interface/catalog/configuration, updates `_AGENTA_ROLE_TABLE` to application-only, adds `is_agent` workflow flags, `session_id`/`messages`/`stream` request/response fields, `LoadSessionRequest`/`LoadSessionResponse` DTOs, `session_id` normalizer mapping, `AgentConfigSchema` catalog type, Vercel SSE response branch and `/messages`/`/load-session` route registration in the `route` decorator, and top-level `agenta` package re-exports.
Unit, integration, and endpoint tests `sdks/python/agenta/tests/agents/...`, `sdks/python/oss/tests/pytest/unit/agents/...`, `sdks/python/oss/tests/pytest/integration/agents/...`, `sdks/python/oss/tests/pytest/utils/test_messages_endpoint.py`, `sdks/python/oss/tests/pytest/utils/test_routing.py`, `sdks/python/oss/tests/pytest/unit/test_normalizer_passthrough.py`	Adds fake port implementations (`FakeSandbox`, `FakeSession`, `FakeBackend`) with golden wire fixtures; tests for `AgentRun` streaming event ordering/cleanup/error, subprocess transport cancellation, full transport round-trip integration; DTO coercion/wire serialization, tool model/parsing/resolver, MCP resolver, harness adapter translation (Pi/Agenta/Claude), environment lifecycle, Vercel UI message inbound/outbound/stream projection, `/messages` and `/load-session` HTTP endpoint behavior, SSE framing, reserved path rejection, OpenAPI agent route registration, runner adapter constructor validation, and `session_id` normalizer passthrough.

Sequence Diagram(s)

sequenceDiagram
  participant Client as HTTP Client
  participant FastAPI as FastAPI /messages
  participant Routing as register_agent_message_routes
  participant Harness as PiHarness / AgentaHarness
  participant Backend as InProcessPiBackend
  participant Runner as TS Runner (HTTP or subprocess)

  Client->>FastAPI: POST /messages {messages, session_id, stream}
  FastAPI->>Routing: make_messages_endpoint(workflow, ...)
  Routing->>Routing: resolve_session_id(request.session_id)
  Routing->>Routing: vercel_ui_messages_to_messages(request.messages)
  Routing->>Harness: workflow(config, messages, session_id=...)
  Harness->>Harness: _to_harness_config(session_config)
  Harness->>Backend: create_sandbox() / create_session(...)
  Backend->>Runner: deliver_http / deliver_subprocess (request_to_wire)
  Runner-->>Backend: NDJSON stream or JSON
  Backend-->>Harness: AgentRun / AgentResult
  alt streaming
    Harness-->>Routing: WorkflowStreamingResponse
    Routing->>Routing: inject_stream_session_id(response, session_id)
    Routing->>Client: text/event-stream via vercel_sse_stream
  else JSON
    Harness-->>Routing: WorkflowBatchResponse
    Routing->>Client: JSON + Vercel protocol headers
  end

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 30.58% which is insufficient. The required threshold is 60.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'feat(sdk): agent runtime behind backend/harness ports' clearly and concisely summarizes the main architectural addition of the pull request.
Description check	✅ Passed	The description comprehensively explains the context, architecture, design decisions, and provides detailed guidance on how to review this substantial PR adding the agent runtime system.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/agent-sdk-runtime

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

mmabrouk · 2026-06-19T16:34:13Z

Reviewer guide: interesting code

A few pointers to the load-bearing decisions, so review time goes to the parts that matter.

sdk/agents/interfaces.py:140 and interfaces.py:248 — the backend/harness validation matrix: each backend declares supported_harnesses, and the Harness constructor rejects an environment whose backend cannot drive it, so a bad pairing fails at construction rather than mid-run.
sdk/agents/adapters/harnesses.py:83 — ClaudeHarness drops Pi built-in tool names with a warning, because built-ins are a Pi concept Claude cannot honor; this is the clearest spot where the adapters do genuinely divergent work.
sdk/agents/interfaces.py:89 — SessionStore is a port-only seam with a NoopSessionStore default; the cold runtime still gets full history every turn, so nothing persists yet and the platform store attaches here later.
sdk/agents/adapters/vercel/routing.py:26 — session_id is validated against a bounded charset and minted when absent, then carried as an envelope field (not a header) and stamped onto the first Vercel start part's messageMetadata.
sdk/agents/tools/resolver.py — ToolResolver turns canonical ToolConfig into runner-ready ToolSpec through an injected secret provider and gateway resolver; the gateway resolver is None here and lands server-side in feat(agent): agent workflow service and tool-resolution API #4772, so only the offline executors resolve in the SDK.
sdk/agents/dtos.py:546 — SessionConfig exposes the resolved tools under two names (builtin_names/builtin_tools, tool_specs/custom_tools) via alias choices; the same coercion lives in ResolvedToolSet, and the back-compat names must keep working.
sdk/agents/adapters/local.py — LocalBackend raises on every method by design; it is the next backend's skeleton, present so the adapter layout and port shape are visible.

mmabrouk · 2026-06-19T16:34:25Z

+        # Claude has no Pi built-in tools; drop them rather than ship a name Claude cannot
+        # honor. Tools go over MCP, and Claude gates tool use, so the permission policy is
+        # carried through.
+        if config.builtin_names:


Claude has no Pi built-in tools, so they are dropped with a warning rather than shipped as a name Claude cannot honor. This is the cleanest example of an adapter sending only what its harness understands.

mmabrouk · 2026-06-19T16:34:26Z

+from .messages import message_to_vercel_ui_message, vercel_ui_messages_to_messages
+
+# An opaque, project-scoped session id (RFC §4.1): bounded length, restricted charset.
+_SESSION_ID_RE = re.compile(r"^[A-Za-z0-9._:-]{1,128}$")


session_id is a project-scoped opaque token validated against a bounded charset/length and minted when absent, carried as an envelope field rather than a header. Worth confirming the charset is wide enough for the platform's id format.

mmabrouk · 2026-06-19T16:34:27Z

                )
                consumed.add(name)

+            elif name == "session_id":


This maps the request envelope's session_id into a handler parameter of the same name, which is how the /messages session threads into the agent handler without living in request.data.inputs.

coderabbitai

Actionable comments posted: 14

🧹 Nitpick comments (2)

sdks/python/oss/tests/pytest/unit/agents/test_dtos_agent_config.py (1)

86-96: ⚡ Quick win

Add regression coverage for agent shape with missing tools

Please add a test where {"agent": {"instructions": "I"}} + defaults verifies tools still inherit from defaults. This would have caught the current fallback bug.

sdks/python/oss/tests/pytest/unit/agents/tools/test_parsing.py (1)

36-46: ⚡ Quick win

Add a regression test for string needs_approval values.

Given legacy payloads may carry "false"/"true" as strings, add a case asserting "false" does not become True after coercion.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 3746d41f-c884-49c4-8834-df9bd68dfb03

📥 Commits

Reviewing files that changed from the base of the PR and between a97e608 and b9e62f9.

📒 Files selected for processing (68)

sdks/python/agenta/__init__.py
sdks/python/agenta/sdk/agents/__init__.py
sdks/python/agenta/sdk/agents/adapters/__init__.py
sdks/python/agenta/sdk/agents/adapters/agenta_builtins.py
sdks/python/agenta/sdk/agents/adapters/harnesses.py
sdks/python/agenta/sdk/agents/adapters/in_process.py
sdks/python/agenta/sdk/agents/adapters/local.py
sdks/python/agenta/sdk/agents/adapters/rivet.py
sdks/python/agenta/sdk/agents/adapters/vercel/__init__.py
sdks/python/agenta/sdk/agents/adapters/vercel/messages.py
sdks/python/agenta/sdk/agents/adapters/vercel/routing.py
sdks/python/agenta/sdk/agents/adapters/vercel/sse.py
sdks/python/agenta/sdk/agents/adapters/vercel/stream.py
sdks/python/agenta/sdk/agents/dtos.py
sdks/python/agenta/sdk/agents/errors.py
sdks/python/agenta/sdk/agents/interfaces.py
sdks/python/agenta/sdk/agents/mcp/__init__.py
sdks/python/agenta/sdk/agents/mcp/errors.py
sdks/python/agenta/sdk/agents/mcp/interfaces.py
sdks/python/agenta/sdk/agents/mcp/models.py
sdks/python/agenta/sdk/agents/mcp/parsing.py
sdks/python/agenta/sdk/agents/mcp/resolver.py
sdks/python/agenta/sdk/agents/mcp/wire.py
sdks/python/agenta/sdk/agents/streaming.py
sdks/python/agenta/sdk/agents/tools/__init__.py
sdks/python/agenta/sdk/agents/tools/compat.py
sdks/python/agenta/sdk/agents/tools/errors.py
sdks/python/agenta/sdk/agents/tools/interfaces.py
sdks/python/agenta/sdk/agents/tools/models.py
sdks/python/agenta/sdk/agents/tools/parsing.py
sdks/python/agenta/sdk/agents/tools/resolver.py
sdks/python/agenta/sdk/agents/tools/wire.py
sdks/python/agenta/sdk/agents/ui_messages.py
sdks/python/agenta/sdk/agents/utils/__init__.py
sdks/python/agenta/sdk/agents/utils/ts_runner.py
sdks/python/agenta/sdk/agents/utils/wire.py
sdks/python/agenta/sdk/decorators/routing.py
sdks/python/agenta/sdk/engines/running/interfaces.py
sdks/python/agenta/sdk/engines/running/utils.py
sdks/python/agenta/sdk/middlewares/running/normalizer.py
sdks/python/agenta/sdk/models/workflows.py
sdks/python/agenta/sdk/utils/types.py
sdks/python/agenta/tests/agents/test_streaming.py
sdks/python/oss/tests/pytest/integration/agents/__init__.py
sdks/python/oss/tests/pytest/integration/agents/test_transport_roundtrip.py
sdks/python/oss/tests/pytest/unit/agents/__init__.py
sdks/python/oss/tests/pytest/unit/agents/conftest.py
sdks/python/oss/tests/pytest/unit/agents/golden/run_request.claude.json
sdks/python/oss/tests/pytest/unit/agents/golden/run_request.pi.json
sdks/python/oss/tests/pytest/unit/agents/golden/run_result.error.json
sdks/python/oss/tests/pytest/unit/agents/golden/run_result.ok.json
sdks/python/oss/tests/pytest/unit/agents/mcp/__init__.py
sdks/python/oss/tests/pytest/unit/agents/mcp/test_resolver.py
sdks/python/oss/tests/pytest/unit/agents/test_dtos_agent_config.py
sdks/python/oss/tests/pytest/unit/agents/test_dtos_capabilities_events.py
sdks/python/oss/tests/pytest/unit/agents/test_dtos_content_blocks.py
sdks/python/oss/tests/pytest/unit/agents/test_dtos_harness_configs.py
sdks/python/oss/tests/pytest/unit/agents/test_environment_lifecycle.py
sdks/python/oss/tests/pytest/unit/agents/test_harness_adapters.py
sdks/python/oss/tests/pytest/unit/agents/test_ui_messages.py
sdks/python/oss/tests/pytest/unit/agents/test_wire_contract.py
sdks/python/oss/tests/pytest/unit/agents/tools/__init__.py
sdks/python/oss/tests/pytest/unit/agents/tools/test_models.py
sdks/python/oss/tests/pytest/unit/agents/tools/test_parsing.py
sdks/python/oss/tests/pytest/unit/agents/tools/test_resolver.py
sdks/python/oss/tests/pytest/unit/test_normalizer_passthrough.py
sdks/python/oss/tests/pytest/utils/test_messages_endpoint.py
sdks/python/oss/tests/pytest/utils/test_routing.py

coderabbitai · 2026-06-19T16:45:30Z

+    def __init__(
+        self,
+        backend: "InProcessPiBackend",
+        config: HarnessAgentConfig,
+        *,
+        secrets: Optional[Mapping[str, str]],
+        trace: Optional[TraceContext],
+        session_id: Optional[str],
+    ) -> None:
+        self._backend = backend
+        self._config = config
+        self._secrets = dict(secrets or {})
+        self._trace = trace
+        self._session_id = session_id
+
+    @property
+    def id(self) -> Optional[str]:
+        return self._session_id
+
+    def _wire_payload(self, messages: Sequence[Message]) -> Dict[str, Any]:
+        """The ``/run`` request JSON for this turn (shared by ``prompt`` and ``stream``)."""
+        return request_to_wire(
+            engine=InProcessPiBackend._ENGINE,
+            harness=HarnessType.PI,
+            sandbox="local",


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve the requested harness type in the wire payload.

create_session accepts harness, but the session drops it and _wire_payload always sends HarnessType.PI (Line 76). For Agenta runs, this serializes the wrong harness across the backend boundary.

Suggested fix

class InProcessPiSession(Session): @@ def __init__( self, backend: "InProcessPiBackend", config: HarnessAgentConfig, *, + harness: HarnessType, secrets: Optional[Mapping[str, str]], trace: Optional[TraceContext], session_id: Optional[str], ) -> None: self._backend = backend self._config = config + self._harness = harness self._secrets = dict(secrets or {}) self._trace = trace self._session_id = session_id @@ return request_to_wire( engine=InProcessPiBackend._ENGINE, - harness=HarnessType.PI, + harness=self._harness, sandbox="local", config=self._config, messages=messages, secrets=self._secrets, trace=self._trace, session_id=self._session_id, ) @@ async def create_session( @@ ) -> InProcessPiSession: return InProcessPiSession( self, config, + harness=harness, secrets=secrets, trace=trace, session_id=session_id, )

Also applies to: 137-153

coderabbitai · 2026-06-19T16:45:30Z

+    supported_harnesses = frozenset({HarnessType.PI, HarnessType.CLAUDE})
+
+    async def create_sandbox(self) -> Sandbox:
+        raise NotImplementedError(
+            "LocalBackend is not implemented yet (Phase 3: Pi via bundled JS, "
+            "Phase 4: Claude via claude-agent-sdk)."
+        )
+
+    async def create_session(
+        self,
+        sandbox: Sandbox,
+        config: HarnessAgentConfig,
+        *,
+        harness: HarnessType,
+        secrets: Optional[Mapping[str, str]] = None,
+        trace: Optional[TraceContext] = None,
+        session_id: Optional[str] = None,
+    ) -> Session:
+        raise NotImplementedError(
+            "LocalBackend is not implemented yet (Phase 3: Pi via bundled JS, "
+            "Phase 4: Claude via claude-agent-sdk)."
+        )


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid advertising harness support before implementation exists.

LocalBackend declares PI/CLAUDE in supported_harnesses (Line 27), but both creation methods always raise NotImplementedError (Lines 30-48). This defers failure to runtime instead of failing fast on compatibility checks.

Suggested fail-fast adjustment

- supported_harnesses = frozenset({HarnessType.PI, HarnessType.CLAUDE}) + supported_harnesses = frozenset()

coderabbitai · 2026-06-19T16:45:30Z

+    async def load_session_endpoint(req: Request, request: LoadSessionRequest):
+        messages = await store.load(request.session_id)
+        response = LoadSessionResponse(
+            session_id=request.session_id,
+            messages=[
+                message_to_vercel_ui_message(message, message_id=f"msg-{idx}")
+                for idx, message in enumerate(messages, start=1)
+            ],
+        )
+        return set_vercel_message_protocol_headers(
+            JSONResponse(content=response.model_dump(mode="json"))
+        )


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate session_id in /load-session before hitting SessionStore.

Line 159 forwards raw request.session_id to store.load(...) without the same charset/length gate used by /messages (Lines 84-93). This creates an inconsistent trust boundary and can expose storage adapters to unsafe identifiers.

Suggested patch

async def load_session_endpoint(req: Request, request: LoadSessionRequest): - messages = await store.load(request.session_id) + session_id = resolve_session_id(request.session_id) + if session_id is None: + return set_vercel_message_protocol_headers( + JSONResponse( + status_code=400, + content={ + "detail": "session_id violates the allowed charset/length" + }, + ) + ) + messages = await store.load(session_id) response = LoadSessionResponse( - session_id=request.session_id, + session_id=session_id, messages=[ message_to_vercel_ui_message(message, message_id=f"msg-{idx}") for idx, message in enumerate(messages, start=1) ], )

coderabbitai · 2026-06-19T16:45:30Z

+
+# Permission policy for harness tool use in a headless run. ``auto`` approves (tools are
+# backend-resolved and trusted, no human to prompt); ``deny`` rejects.
+PermissionPolicy = str  # "auto" | "deny"


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate permission_policy instead of accepting arbitrary strings

PermissionPolicy is documented as "auto" | "deny" but currently typed as str, so invalid values flow through until downstream failure. Enforce this at DTO boundaries.

Proposed fix

-from typing import Any, Callable, ClassVar, Dict, List, Optional, Tuple, Union +from typing import Any, Callable, ClassVar, Dict, List, Literal, Optional, Tuple, Union @@ -PermissionPolicy = str # "auto" | "deny" +PermissionPolicy = Literal["auto", "deny"]

Also applies to: 363-379, 502-503, 559-559

@mmabrouk, great — here's the plan to enforce the constraint:

Change PermissionPolicy from str to Literal["auto", "deny"] in sdks/python/agenta/sdk/agents/dtos.py

Add Literal to the existing from typing import ... line

(◕‿◕✿)

📌 Commit the changes to this branch

coderabbitai · 2026-06-19T16:45:30Z

+    if "needs_approval" in source:
+        result["needs_approval"] = bool(source["needs_approval"])
+    if isinstance(source.get("render"), dict):


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

needs_approval coercion is semantically wrong for string inputs.

Line 54 uses bool(source["needs_approval"]), so values like "false" become True. That flips approval gating behavior for legacy payloads.

Proposed fix

def _copy_tool_metadata( source: dict[str, Any], target: dict[str, Any] ) -> dict[str, Any]: result = dict(target) if "needs_approval" in source: - result["needs_approval"] = bool(source["needs_approval"]) + result["needs_approval"] = source["needs_approval"] if isinstance(source.get("render"), dict): result["render"] = dict(source["render"]) return result

coderabbitai · 2026-06-19T16:45:31Z

+        if on_error == "raise":
+            raise error
+        diagnostics.append(ToolConfigDiagnostic(index=index, message=str(error)))


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate on_error at runtime to prevent silent fallback behavior.

If callers pass an invalid value (e.g., typo), current logic silently behaves like "collect". Fail fast to avoid hidden parse-policy changes.

Proposed fix

def coerce_tool_configs( values: Optional[Sequence[Any]], *, on_error: Literal["raise", "collect"] = "raise", ) -> ToolConfigParseResult: """Convert legacy values, either raising or returning structured diagnostics.""" + if on_error not in {"raise", "collect"}: + raise ValueError("on_error must be 'raise' or 'collect'") + tool_configs: list[ToolConfig] = [] diagnostics: list[ToolConfigDiagnostic] = []

coderabbitai · 2026-06-19T16:45:31Z

+    if response.status_code >= 500:
+        raise RuntimeError(
+            f"Agent runner HTTP {response.status_code}: {response.text[:1000]}"
+        )
+    return response.json()


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Handle all non-2xx HTTP statuses as transport failures.

Only 5xx is handled today; 4xx responses fall through and may surface as opaque JSON parse errors instead of clear runner failures.

Proposed fix

- if response.status_code >= 500: + if response.status_code >= 400: raise RuntimeError( f"Agent runner HTTP {response.status_code}: {response.text[:1000]}" ) @@ - if response.status_code >= 500: + if response.status_code >= 400: body = await response.aread() raise RuntimeError( f"Agent runner HTTP {response.status_code}: {body[:1000]!r}" )

Also applies to: 108-113

coderabbitai · 2026-06-19T16:45:31Z

+            async for line in response.aiter_lines():
+                line = line.strip()
+                if line:
+                    yield json.loads(line)
+


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforce a terminal stream result (or raise a transport error).

Both streaming transports can end cleanly when the runner disconnects/exits early, which leaves downstream AgentRun without a terminal result and can hide backend failures.

Proposed fix

async def deliver_http_stream( @@ - async with httpx.AsyncClient(timeout=timeout) as client: + saw_result = False + async with httpx.AsyncClient(timeout=timeout) as client: async with client.stream( "POST", url, json=payload, headers=headers ) as response: @@ async for line in response.aiter_lines(): line = line.strip() if line: - yield json.loads(line) + record = json.loads(line) + if record.get("kind") == "result": + saw_result = True + yield record + if not saw_result: + raise RuntimeError( + "Agent runner stream ended without a terminal result record" + ) @@ async def deliver_subprocess_stream( @@ - try: + saw_result = False + try: while True: @@ line = raw.decode("utf-8", "replace").strip() if line: - yield json.loads(line) + record = json.loads(line) + if record.get("kind") == "result": + saw_result = True + yield record await proc.wait() + err = (await proc.stderr.read()).decode("utf-8", "replace") + if proc.returncode not in (0, None): + raise RuntimeError( + f"Agent runner stream failed. exit={proc.returncode} stderr={err[-2000:]}" + ) + if not saw_result: + raise RuntimeError( + f"Agent runner stream ended without terminal result. stderr={err[-2000:]}" + ) finally: if proc.returncode is None: proc.kill() await proc.wait()

Also applies to: 147-160

coderabbitai · 2026-06-19T16:45:31Z

+    text = res.text
+    assert '"sessionId": "sess_abc"' in text  # stamped onto the start part
+    assert '"type": "text-delta"' in text
+    assert "data: [DONE]" in text
+


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Make the SSE session-id check structure-aware instead of whitespace-dependent.

Line 196 matches a literal JSON substring ('"sessionId": "sess_abc"'), which can fail on harmless serializer formatting changes.

Suggested test hardening

text = res.text - assert '"sessionId": "sess_abc"' in text # stamped onto the start part + payloads = [ + json.loads(line.removeprefix("data: ")) + for line in text.splitlines() + if line.startswith("data: ") and line != "data: [DONE]" + ] + start = next(p for p in payloads if p.get("type") == "start") + assert start["messageMetadata"]["sessionId"] == "sess_abc" assert '"type": "text-delta"' in text assert "data: [DONE]" in text

mmabrouk · 2026-06-19T17:51:27Z

Reviewer guide: interesting code

A few spots worth landing on first:

sdks/python/agenta/sdk/agents/interfaces.py:140 — Backend.supported_harnesses is the single source of truth for what an engine can drive; Harness.__init__ validates against it before any run.
sdks/python/agenta/sdk/agents/interfaces.py:111 — NoopSessionStore returns empty history and discards writes, which is the port-only seam behind /load-session until a real store lands.
sdks/python/agenta/sdk/agents/dtos.py:524 — AgentaAgentConfig extends PiAgentConfig and only adds forced skills, which is the cleanest read on "Agenta is Pi with an opinion".
sdks/python/agenta/sdk/agents/adapters/in_process.py:118 — InProcessPiBackend is the reference backend; note it is deliberately not a subclass of RivetBackend even though they share wire helpers.
sdks/python/agenta/sdk/agents/adapters/harnesses.py:85 — ClaudeHarness drops Pi built-in tool names with a warning, because built-ins are a Pi concept Claude cannot honor.
sdks/python/agenta/sdk/agents/adapters/vercel/routing.py:43 — resolve_session_id mints, echoes, or rejects the session id against a bounded charset; this is where session_id enters the run as a correlation value.

mmabrouk · 2026-06-19T17:51:44Z

+    """
+
+    #: The single source of truth for what this engine can run.
+    supported_harnesses: ClassVar[FrozenSet[HarnessType]] = frozenset()


This class var is the one place an engine declares its supported harnesses. The split below keeps backends as pure plumbing: they never branch on a harness name, they only check membership here.

mmabrouk · 2026-06-19T17:51:46Z

+        # carried through.
+        if config.builtin_names:
+            log.warning(
+                "ClaudeHarness ignores %d built-in tool(s); built-ins are a Pi concept",


Worth confirming a warning is the right level here. A config that names Pi built-ins but runs on Claude silently loses those tools; a stored agent could behave differently across harnesses without an obvious signal.

mmabrouk · 2026-06-19T17:51:47Z

+    return response
+
+
+def resolve_session_id(session_id: Optional[str]) -> Optional[str]:


This is the only gate on the session id. Returning None on an invalid id drives the 400 in the endpoint; a minted id uses sess_ + uuid4 hex, which stays inside the allowed charset.

mmabrouk

Codex subagent review for #4771

Findings:

Blocking: sdks/python/agenta/sdk/agents/adapters/rivet.py:36 and sdks/python/agenta/sdk/agents/adapters/in_process.py:36 make the default runner-backed path point at pnpm exec tsx src/cli.ts, but this PR does not add services/agent/src/cli.ts or the runner package. The public SDK example also uses RivetBackend() with no url, command, or cwd (sdks/python/agenta/sdk/agents/__init__.py:19), while the integration test only proves transport behavior by injecting a fake Python runner (sdks/python/oss/tests/pytest/integration/agents/test_transport_roundtrip.py:81). Merged alone, and for #4772 stacked on it, the advertised default SDK runtime fails before any harness starts unless later runner assets from #4773/#4778 are present and the process cwd happens to be right. Please either require an explicit url/command until the runner lands, stack/retarget this runtime on the runner PR, or include the runnable runner assets plus an end-to-end test that exercises the default path.
sdks/python/agenta/sdk/agents/dtos.py:680 drops default tools whenever a dedicated agent dict is present but omits tools. The from_params docstring says unset fields fall back to defaults, and the MCP/harness-option paths do that, but this branch returns None; the constructor then passes tools=_as_list(None) and silently clears defaults.tools. A partial override such as { "agent": { "model": "..." } } will run tool-free. Please fall back to defaults.tools when the key is absent and add a partial-agent test.

Stack note: #4771 does contain the Python utils/wire.py serializer and golden fixtures. #4773 still advertises independence from main, but its protocol docs point at those SDK files and one advertised test imports src/engines/pi.ts, which only lands in the later runner-engine PR. Please align the stack-nav/review map so reviewers know which PR supplies the wire fixtures and runner assets.

I did not run tests locally; this review used the GitHub patch/head files.

coderabbitai

Actionable comments posted: 1

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 5388d34f-2c5e-4260-b4e6-13176aece5f9

📥 Commits

Reviewing files that changed from the base of the PR and between b9e62f9 and 741fc73.

📒 Files selected for processing (9)

sdks/python/agenta/sdk/agents/__init__.py
sdks/python/agenta/sdk/agents/adapters/_runner_config.py
sdks/python/agenta/sdk/agents/adapters/in_process.py
sdks/python/agenta/sdk/agents/adapters/rivet.py
sdks/python/agenta/sdk/agents/dtos.py
sdks/python/agenta/sdk/agents/errors.py
sdks/python/oss/tests/pytest/unit/agents/test_dtos_agent_config.py
sdks/python/oss/tests/pytest/unit/agents/test_harness_adapters.py
sdks/python/oss/tests/pytest/unit/agents/test_runner_adapter_config.py

🚧 Files skipped from review as they are similar to previous changes (6)

sdks/python/oss/tests/pytest/unit/agents/test_dtos_agent_config.py
sdks/python/agenta/sdk/agents/init.py
sdks/python/oss/tests/pytest/unit/agents/test_harness_adapters.py
sdks/python/agenta/sdk/agents/adapters/in_process.py
sdks/python/agenta/sdk/agents/adapters/rivet.py
sdks/python/agenta/sdk/agents/dtos.py

coderabbitai · 2026-06-19T20:33:41Z

+    if url:
+        return list(command) if command is not None else list(DEFAULT_RUNNER_COMMAND)
+    if command is not None:
+        return list(command)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject empty command at config time.

At Line 22 and Line 24, command=[] is accepted and propagated as _command, which creates an unusable subprocess transport and fails later at runtime. Validate non-empty command in resolve_runner_command so misconfiguration fails fast with AgentRunnerConfigurationError.

Suggested fix

def resolve_runner_command( @@ ) -> List[str]: + def _validated_command(raw: Sequence[str]) -> List[str]: + cmd = list(raw) + if not cmd: + raise AgentRunnerConfigurationError( + f"{backend_name} received an empty command. " + "Pass a non-empty command, pass url for an HTTP runner, " + f"or set cwd to a runner wrapper containing {RUNNER_CLI_PATH.as_posix()}." + ) + return cmd + if url: - return list(command) if command is not None else list(DEFAULT_RUNNER_COMMAND) + return _validated_command(command) if command is not None else list(DEFAULT_RUNNER_COMMAND) if command is not None: - return list(command) + return _validated_command(command)

coderabbitai

🧹 Nitpick comments (1)

sdks/python/agenta/sdk/agents/adapters/sandbox_agent.py (1)
188-196: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Consider adding logging for suppressed event-sink exceptions.

The function silently suppresses all exceptions from the event sink. While the comment explains the rationale, adding a debug log would improve debuggability when an event sink misbehaves.
📊 Proposed enhancement to add logging
 def _emit_events(result: AgentResult, on_event: Optional[EventSink]) -> None:
     """Replay ``result.events`` to an optional sink, suppressing sink exceptions."""
     if on_event is None:
         return
     for evt in result.events or []:
         try:
             on_event(evt)
-        except Exception:
-            pass  # the sink is caller-provided; don't let it crash the result
+        except Exception as e:
+            import logging
+            logging.getLogger(__name__).debug(
+                "Event sink raised exception: %s", e, exc_info=True
+            )

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 13e12ec7-a558-4f95-a9b2-438fdc0384cb

📥 Commits

Reviewing files that changed from the base of the PR and between 741fc73 and 2a7c129.

📒 Files selected for processing (15)

sdks/python/agenta/__init__.py
sdks/python/agenta/sdk/agents/__init__.py
sdks/python/agenta/sdk/agents/adapters/__init__.py
sdks/python/agenta/sdk/agents/adapters/_runner_config.py
sdks/python/agenta/sdk/agents/adapters/in_process.py
sdks/python/agenta/sdk/agents/adapters/local.py
sdks/python/agenta/sdk/agents/adapters/sandbox_agent.py
sdks/python/agenta/sdk/agents/dtos.py
sdks/python/agenta/sdk/agents/interfaces.py
sdks/python/agenta/sdk/agents/utils/ts_runner.py
sdks/python/agenta/sdk/agents/utils/wire.py
sdks/python/oss/tests/pytest/unit/agents/golden/run_request.claude.json
sdks/python/oss/tests/pytest/unit/agents/test_harness_adapters.py
sdks/python/oss/tests/pytest/unit/agents/test_runner_adapter_config.py
sdks/python/oss/tests/pytest/unit/agents/test_wire_contract.py

✅ Files skipped from review due to trivial changes (1)

sdks/python/oss/tests/pytest/unit/agents/golden/run_request.claude.json

🚧 Files skipped from review as they are similar to previous changes (10)

sdks/python/agenta/sdk/agents/adapters/init.py
sdks/python/agenta/sdk/agents/adapters/local.py
sdks/python/agenta/sdk/agents/adapters/_runner_config.py
sdks/python/agenta/sdk/agents/utils/wire.py
sdks/python/agenta/sdk/agents/init.py
sdks/python/agenta/sdk/agents/utils/ts_runner.py
sdks/python/agenta/sdk/agents/adapters/in_process.py
sdks/python/oss/tests/pytest/unit/agents/test_wire_contract.py
sdks/python/agenta/sdk/agents/interfaces.py
sdks/python/agenta/sdk/agents/dtos.py

mmabrouk · 2026-06-22T11:40:48Z

+    """A chat message in the conversation. ``content`` is text or content blocks.
+
+    This is the runtime's own message type, distinct from the SDK's prompt ``Message``
+    (``agenta.Message``); the two serve different layers.


this is confusing. we need different namicn / some clarity here

…error handling)

feat(sdk): agent runtime ports, adapters, tool resolution, and messag…

b9e62f9

…es protocol

dosubot Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Jun 19, 2026

dosubot Bot added Backend feature python Pull requests that update Python code SDK labels Jun 19, 2026

vercel Bot deployed to Preview June 19, 2026 16:29 View deployment

mmabrouk mentioned this pull request Jun 19, 2026

feat(agent): agent workflow service and tool-resolution API #4772

Open