Skip to content

feat(sdk): agent runtime behind backend/harness ports#4771

Merged
mmabrouk merged 4 commits into
big-agentsfrom
feat/agent-sdk-runtime
Jun 22, 2026
Merged

feat(sdk): agent runtime behind backend/harness ports#4771
mmabrouk merged 4 commits into
big-agentsfrom
feat/agent-sdk-runtime

Conversation

@mmabrouk

@mmabrouk mmabrouk commented Jun 19, 2026

Copy link
Copy Markdown
Member

Agent-workflows: functional PR set

Sliced by functional area, final code only (no intermediate churn). Most PRs are independent off main; two pairs are stacked. This PR's base is main.

Context

The agent runtime turns a stored agent definition into a live coding-agent turn: it picks an engine, shapes the config that engine wants, runs one prompt, and streams the reply back to a browser. This PR puts that whole runtime in the SDK under sdks/python/agenta/sdk/agents/, structured as ports and adapters. It targets main and is independent. It is a functional slice that shows the final code, so the service PR that composes these adapters stacks on top of it.

What this changes

The runtime now reads as three layers behind interfaces. Backend is the engine: it declares which harnesses it can drive and owns sandbox plus session lifecycle. Environment sits above a backend and owns the sandbox-per-session policy. Harness sits above an environment and maps a neutral config into one engine's shape.

Before, the per-harness knowledge lived in the TypeScript runner, and a caller spoke directly to a transport. Now a caller builds a SessionConfig, hands it to a Harness, and the harness produces the engine-shaped config that a Backend plumbs to the runner without business logic. PiHarness, ClaudeHarness, and AgentaHarness each do different work because the harnesses differ: Pi takes built-in tool names plus native specs and never gates tool use; Claude has no built-ins, delivers tools over MCP, and gates tool use behind a permission policy; Agenta is Pi plus forced skills and a preamble.

Two backend adapters drive real engines. InProcessPiBackend runs Pi in-process through the runner and supports pi and agenta. RivetBackend drives a harness over ACP and supports pi and claude on local or Daytona. LocalBackend is a stub that raises NotImplementedError.

The browser edge is the Vercel /messages adapter. It folds inbound UIMessage input into neutral messages, emits Vercel UI Message Stream parts, stamps x-ag-messages-format and x-ag-messages-version headers, resolves the session id, and routes /load-session through a SessionStore port whose only adapter today is NoopSessionStore. The normalizer threads session_id as a request-envelope field, so it survives the round trip as a correlation value.

Key architectural decision to review

The first decision is the ownership split. The SDK owns the runtime ports and the adapters, and the service only composes them (sdks/python/agenta/sdk/agents/interfaces.py). The tradeoff: a standalone SDK user can drive Pi with no Agenta service, but the service must inject its server-side concerns (gateway tool resolution, the secret vault) through the injected adapter seams rather than reaching into the runtime. Check that Backend.supported_harnesses stays the single source of truth and that Harness.__init__ rejects an unsupported pairing before any run starts.

The second decision is that session_id is a correlation primitive, not state (adapters/vercel/routing.py, middlewares/running/normalizer.py). The cold runtime still receives the full message history on every turn. resolve_session_id mints, echoes, or rejects the id against a bounded charset, and the id is stamped onto the stream and the envelope, but nothing reads it back as conversation state yet. SessionStore is a port-only seam: NoopSessionStore returns empty history and discards writes, so /load-session answers with nothing until a real adapter lands. Confirm this is a deliberate seam and not a dropped write path.

How to review this PR

Read interfaces.py first and fix the three-layer vocabulary in your head: Backend, Environment, Harness, plus the Sandbox, Session, and SessionStore ports. Then read dtos.py for the shapes that cross those ports, especially SessionConfig (the run bundle), AgentConfig.harness_options (the per-harness escape hatch), and the PiAgentConfig / ClaudeAgentConfig / AgentaAgentConfig split where wire_tools differs per engine. Then read adapters/harnesses.py, adapters/in_process.py, and adapters/rivet.py to see the mapping and the two real backends. Read adapters/vercel/routing.py last for the browser edge.

You can skip the mcp/ subpackage and the parsing helpers at the bottom of dtos.py on a first pass; they are mechanical. The regression most likely to break is the golden wire contract: a tool-free run's /run payload must stay byte-identical, so watch any change to wire_tools, wire_mcp, or request_to_wire against golden/run_request.pi.json.

Tests / notes

The suite covers the DTO shapes, the harness adapters and their backend-support validation, the /messages and /load-session routing, the tool resolver, and a transport round trip. The wire-contract test pins the runner payload against golden JSON. The NoopSessionStore path is verified to return empty and discard, which documents the not-yet-persisted behavior rather than hiding it.

@dosubot dosubot Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Jun 19, 2026
@vercel

vercel Bot commented Jun 19, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agenta-documentation Ready Ready Preview, Comment Jun 22, 2026 12:21pm

Request Review

@dosubot dosubot Bot added Backend feature python Pull requests that update Python code SDK labels Jun 19, 2026
@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown

Review Change Stack

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: d9133488-6ee8-4cb5-87a1-46ea23fed178

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a complete agenta.sdk.agents Python runtime: hexagonal-port ABCs (Backend, Sandbox, Session, Environment, Harness), Pydantic DTOs, MCP and tools subsystems, AgentRun streaming, InProcessPiBackend/SandboxAgentBackend/LocalBackend adapters, PiHarness/ClaudeHarness/AgentaHarness, a Vercel UI message/SSE/routing adapter, TypeScript runner HTTP/subprocess transport, and workflow schema/registry/routing wiring, backed by broad unit and integration tests.

Changes

Agents Runtime

Layer / File(s) Summary
Core DTOs, interfaces, streaming, tools/MCP subsystems
sdks/python/agenta/sdk/agents/dtos.py, sdks/python/agenta/sdk/agents/interfaces.py, sdks/python/agenta/sdk/agents/streaming.py, sdks/python/agenta/sdk/agents/errors.py, sdks/python/agenta/sdk/agents/tools/..., sdks/python/agenta/sdk/agents/mcp/...
Defines HarnessType, ContentBlock, Message, AgentConfig, SessionConfig, per-harness config classes, the Sandbox/Session/Backend/Environment/Harness ABCs with NoopSessionStore, the AgentRun async-iterable streaming primitive, the tools subsystem (models, parsing, compat coercion, resolver, wire serialization, error hierarchy), and the MCP subsystem (models, parsing, resolver, wire helpers, error types).
Backend adapters, harness adapters, Vercel adapter, transport
sdks/python/agenta/sdk/agents/adapters/..., sdks/python/agenta/sdk/agents/utils/ts_runner.py, sdks/python/agenta/sdk/agents/utils/wire.py
Implements InProcessPiBackend, SandboxAgentBackend, LocalBackend (stub); runner command resolution (_runner_config.py); PiHarness, ClaudeHarness, AgentaHarness with Agenta forced defaults and make_harness factory; Vercel adapter (messages.py, sse.py, stream.py, routing.py); HTTP and subprocess NDJSON transport (ts_runner.py); and /run request/response wire serialization.
Workflow flags, schema catalog, normalizer, routing decorator
sdks/python/agenta/sdk/engines/running/interfaces.py, sdks/python/agenta/sdk/engines/running/utils.py, sdks/python/agenta/sdk/middlewares/running/normalizer.py, sdks/python/agenta/sdk/models/workflows.py, sdks/python/agenta/sdk/utils/types.py, sdks/python/agenta/sdk/decorators/routing.py, sdks/python/agenta/__init__.py, sdks/python/agenta/sdk/agents/__init__.py
Registers agenta:builtin:agent:v0 interface/catalog/configuration, updates _AGENTA_ROLE_TABLE to application-only, adds is_agent workflow flags, session_id/messages/stream request/response fields, LoadSessionRequest/LoadSessionResponse DTOs, session_id normalizer mapping, AgentConfigSchema catalog type, Vercel SSE response branch and /messages//load-session route registration in the route decorator, and top-level agenta package re-exports.
Unit, integration, and endpoint tests
sdks/python/agenta/tests/agents/..., sdks/python/oss/tests/pytest/unit/agents/..., sdks/python/oss/tests/pytest/integration/agents/..., sdks/python/oss/tests/pytest/utils/test_messages_endpoint.py, sdks/python/oss/tests/pytest/utils/test_routing.py, sdks/python/oss/tests/pytest/unit/test_normalizer_passthrough.py
Adds fake port implementations (FakeSandbox, FakeSession, FakeBackend) with golden wire fixtures; tests for AgentRun streaming event ordering/cleanup/error, subprocess transport cancellation, full transport round-trip integration; DTO coercion/wire serialization, tool model/parsing/resolver, MCP resolver, harness adapter translation (Pi/Agenta/Claude), environment lifecycle, Vercel UI message inbound/outbound/stream projection, /messages and /load-session HTTP endpoint behavior, SSE framing, reserved path rejection, OpenAPI agent route registration, runner adapter constructor validation, and session_id normalizer passthrough.

Sequence Diagram(s)

sequenceDiagram
  participant Client as HTTP Client
  participant FastAPI as FastAPI /messages
  participant Routing as register_agent_message_routes
  participant Harness as PiHarness / AgentaHarness
  participant Backend as InProcessPiBackend
  participant Runner as TS Runner (HTTP or subprocess)

  Client->>FastAPI: POST /messages {messages, session_id, stream}
  FastAPI->>Routing: make_messages_endpoint(workflow, ...)
  Routing->>Routing: resolve_session_id(request.session_id)
  Routing->>Routing: vercel_ui_messages_to_messages(request.messages)
  Routing->>Harness: workflow(config, messages, session_id=...)
  Harness->>Harness: _to_harness_config(session_config)
  Harness->>Backend: create_sandbox() / create_session(...)
  Backend->>Runner: deliver_http / deliver_subprocess (request_to_wire)
  Runner-->>Backend: NDJSON stream or JSON
  Backend-->>Harness: AgentRun / AgentResult
  alt streaming
    Harness-->>Routing: WorkflowStreamingResponse
    Routing->>Routing: inject_stream_session_id(response, session_id)
    Routing->>Client: text/event-stream via vercel_sse_stream
  else JSON
    Harness-->>Routing: WorkflowBatchResponse
    Routing->>Client: JSON + Vercel protocol headers
  end
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 30.58% which is insufficient. The required threshold is 60.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat(sdk): agent runtime behind backend/harness ports' clearly and concisely summarizes the main architectural addition of the pull request.
Description check ✅ Passed The description comprehensively explains the context, architecture, design decisions, and provides detailed guidance on how to review this substantial PR adding the agent runtime system.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/agent-sdk-runtime

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@mmabrouk

Copy link
Copy Markdown
Member Author

Reviewer guide: interesting code

A few pointers to the load-bearing decisions, so review time goes to the parts that matter.

  • sdk/agents/interfaces.py:140 and interfaces.py:248 — the backend/harness validation matrix: each backend declares supported_harnesses, and the Harness constructor rejects an environment whose backend cannot drive it, so a bad pairing fails at construction rather than mid-run.
  • sdk/agents/adapters/harnesses.py:83ClaudeHarness drops Pi built-in tool names with a warning, because built-ins are a Pi concept Claude cannot honor; this is the clearest spot where the adapters do genuinely divergent work.
  • sdk/agents/interfaces.py:89SessionStore is a port-only seam with a NoopSessionStore default; the cold runtime still gets full history every turn, so nothing persists yet and the platform store attaches here later.
  • sdk/agents/adapters/vercel/routing.py:26session_id is validated against a bounded charset and minted when absent, then carried as an envelope field (not a header) and stamped onto the first Vercel start part's messageMetadata.
  • sdk/agents/tools/resolver.pyToolResolver turns canonical ToolConfig into runner-ready ToolSpec through an injected secret provider and gateway resolver; the gateway resolver is None here and lands server-side in feat(agent): agent workflow service and tool-resolution API #4772, so only the offline executors resolve in the SDK.
  • sdk/agents/dtos.py:546SessionConfig exposes the resolved tools under two names (builtin_names/builtin_tools, tool_specs/custom_tools) via alias choices; the same coercion lives in ResolvedToolSet, and the back-compat names must keep working.
  • sdk/agents/adapters/local.pyLocalBackend raises on every method by design; it is the next backend's skeleton, present so the adapter layout and port shape are visible.

Comment thread sdks/python/agenta/sdk/agents/interfaces.py
# Claude has no Pi built-in tools; drop them rather than ship a name Claude cannot
# honor. Tools go over MCP, and Claude gates tool use, so the permission policy is
# carried through.
if config.builtin_names:

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude has no Pi built-in tools, so they are dropped with a warning rather than shipped as a name Claude cannot honor. This is the cleanest example of an adapter sending only what its harness understands.

from .messages import message_to_vercel_ui_message, vercel_ui_messages_to_messages

# An opaque, project-scoped session id (RFC §4.1): bounded length, restricted charset.
_SESSION_ID_RE = re.compile(r"^[A-Za-z0-9._:-]{1,128}$")

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

session_id is a project-scoped opaque token validated against a bounded charset/length and minted when absent, carried as an envelope field rather than a header. Worth confirming the charset is wide enough for the platform's id format.

)
consumed.add(name)

elif name == "session_id":

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This maps the request envelope's session_id into a handler parameter of the same name, which is how the /messages session threads into the agent handler without living in request.data.inputs.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 14

🧹 Nitpick comments (2)
sdks/python/oss/tests/pytest/unit/agents/test_dtos_agent_config.py (1)

86-96: ⚡ Quick win

Add regression coverage for agent shape with missing tools

Please add a test where {"agent": {"instructions": "I"}} + defaults verifies tools still inherit from defaults. This would have caught the current fallback bug.

sdks/python/oss/tests/pytest/unit/agents/tools/test_parsing.py (1)

36-46: ⚡ Quick win

Add a regression test for string needs_approval values.

Given legacy payloads may carry "false"/"true" as strings, add a case asserting "false" does not become True after coercion.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 3746d41f-c884-49c4-8834-df9bd68dfb03

📥 Commits

Reviewing files that changed from the base of the PR and between a97e608 and b9e62f9.

📒 Files selected for processing (68)
  • sdks/python/agenta/__init__.py
  • sdks/python/agenta/sdk/agents/__init__.py
  • sdks/python/agenta/sdk/agents/adapters/__init__.py
  • sdks/python/agenta/sdk/agents/adapters/agenta_builtins.py
  • sdks/python/agenta/sdk/agents/adapters/harnesses.py
  • sdks/python/agenta/sdk/agents/adapters/in_process.py
  • sdks/python/agenta/sdk/agents/adapters/local.py
  • sdks/python/agenta/sdk/agents/adapters/rivet.py
  • sdks/python/agenta/sdk/agents/adapters/vercel/__init__.py
  • sdks/python/agenta/sdk/agents/adapters/vercel/messages.py
  • sdks/python/agenta/sdk/agents/adapters/vercel/routing.py
  • sdks/python/agenta/sdk/agents/adapters/vercel/sse.py
  • sdks/python/agenta/sdk/agents/adapters/vercel/stream.py
  • sdks/python/agenta/sdk/agents/dtos.py
  • sdks/python/agenta/sdk/agents/errors.py
  • sdks/python/agenta/sdk/agents/interfaces.py
  • sdks/python/agenta/sdk/agents/mcp/__init__.py
  • sdks/python/agenta/sdk/agents/mcp/errors.py
  • sdks/python/agenta/sdk/agents/mcp/interfaces.py
  • sdks/python/agenta/sdk/agents/mcp/models.py
  • sdks/python/agenta/sdk/agents/mcp/parsing.py
  • sdks/python/agenta/sdk/agents/mcp/resolver.py
  • sdks/python/agenta/sdk/agents/mcp/wire.py
  • sdks/python/agenta/sdk/agents/streaming.py
  • sdks/python/agenta/sdk/agents/tools/__init__.py
  • sdks/python/agenta/sdk/agents/tools/compat.py
  • sdks/python/agenta/sdk/agents/tools/errors.py
  • sdks/python/agenta/sdk/agents/tools/interfaces.py
  • sdks/python/agenta/sdk/agents/tools/models.py
  • sdks/python/agenta/sdk/agents/tools/parsing.py
  • sdks/python/agenta/sdk/agents/tools/resolver.py
  • sdks/python/agenta/sdk/agents/tools/wire.py
  • sdks/python/agenta/sdk/agents/ui_messages.py
  • sdks/python/agenta/sdk/agents/utils/__init__.py
  • sdks/python/agenta/sdk/agents/utils/ts_runner.py
  • sdks/python/agenta/sdk/agents/utils/wire.py
  • sdks/python/agenta/sdk/decorators/routing.py
  • sdks/python/agenta/sdk/engines/running/interfaces.py
  • sdks/python/agenta/sdk/engines/running/utils.py
  • sdks/python/agenta/sdk/middlewares/running/normalizer.py
  • sdks/python/agenta/sdk/models/workflows.py
  • sdks/python/agenta/sdk/utils/types.py
  • sdks/python/agenta/tests/agents/test_streaming.py
  • sdks/python/oss/tests/pytest/integration/agents/__init__.py
  • sdks/python/oss/tests/pytest/integration/agents/test_transport_roundtrip.py
  • sdks/python/oss/tests/pytest/unit/agents/__init__.py
  • sdks/python/oss/tests/pytest/unit/agents/conftest.py
  • sdks/python/oss/tests/pytest/unit/agents/golden/run_request.claude.json
  • sdks/python/oss/tests/pytest/unit/agents/golden/run_request.pi.json
  • sdks/python/oss/tests/pytest/unit/agents/golden/run_result.error.json
  • sdks/python/oss/tests/pytest/unit/agents/golden/run_result.ok.json
  • sdks/python/oss/tests/pytest/unit/agents/mcp/__init__.py
  • sdks/python/oss/tests/pytest/unit/agents/mcp/test_resolver.py
  • sdks/python/oss/tests/pytest/unit/agents/test_dtos_agent_config.py
  • sdks/python/oss/tests/pytest/unit/agents/test_dtos_capabilities_events.py
  • sdks/python/oss/tests/pytest/unit/agents/test_dtos_content_blocks.py
  • sdks/python/oss/tests/pytest/unit/agents/test_dtos_harness_configs.py
  • sdks/python/oss/tests/pytest/unit/agents/test_environment_lifecycle.py
  • sdks/python/oss/tests/pytest/unit/agents/test_harness_adapters.py
  • sdks/python/oss/tests/pytest/unit/agents/test_ui_messages.py
  • sdks/python/oss/tests/pytest/unit/agents/test_wire_contract.py
  • sdks/python/oss/tests/pytest/unit/agents/tools/__init__.py
  • sdks/python/oss/tests/pytest/unit/agents/tools/test_models.py
  • sdks/python/oss/tests/pytest/unit/agents/tools/test_parsing.py
  • sdks/python/oss/tests/pytest/unit/agents/tools/test_resolver.py
  • sdks/python/oss/tests/pytest/unit/test_normalizer_passthrough.py
  • sdks/python/oss/tests/pytest/utils/test_messages_endpoint.py
  • sdks/python/oss/tests/pytest/utils/test_routing.py

Comment thread sdks/python/agenta/sdk/agents/__init__.py
Comment on lines +53 to +77
def __init__(
self,
backend: "InProcessPiBackend",
config: HarnessAgentConfig,
*,
secrets: Optional[Mapping[str, str]],
trace: Optional[TraceContext],
session_id: Optional[str],
) -> None:
self._backend = backend
self._config = config
self._secrets = dict(secrets or {})
self._trace = trace
self._session_id = session_id

@property
def id(self) -> Optional[str]:
return self._session_id

def _wire_payload(self, messages: Sequence[Message]) -> Dict[str, Any]:
"""The ``/run`` request JSON for this turn (shared by ``prompt`` and ``stream``)."""
return request_to_wire(
engine=InProcessPiBackend._ENGINE,
harness=HarnessType.PI,
sandbox="local",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve the requested harness type in the wire payload.

create_session accepts harness, but the session drops it and _wire_payload always sends HarnessType.PI (Line 76). For Agenta runs, this serializes the wrong harness across the backend boundary.

Suggested fix
 class InProcessPiSession(Session):
@@
     def __init__(
         self,
         backend: "InProcessPiBackend",
         config: HarnessAgentConfig,
         *,
+        harness: HarnessType,
         secrets: Optional[Mapping[str, str]],
         trace: Optional[TraceContext],
         session_id: Optional[str],
     ) -> None:
         self._backend = backend
         self._config = config
+        self._harness = harness
         self._secrets = dict(secrets or {})
         self._trace = trace
         self._session_id = session_id
@@
         return request_to_wire(
             engine=InProcessPiBackend._ENGINE,
-            harness=HarnessType.PI,
+            harness=self._harness,
             sandbox="local",
             config=self._config,
             messages=messages,
             secrets=self._secrets,
             trace=self._trace,
             session_id=self._session_id,
         )
@@
     async def create_session(
@@
     ) -> InProcessPiSession:
         return InProcessPiSession(
             self,
             config,
+            harness=harness,
             secrets=secrets,
             trace=trace,
             session_id=session_id,
         )

Also applies to: 137-153

Comment on lines +27 to +48
supported_harnesses = frozenset({HarnessType.PI, HarnessType.CLAUDE})

async def create_sandbox(self) -> Sandbox:
raise NotImplementedError(
"LocalBackend is not implemented yet (Phase 3: Pi via bundled JS, "
"Phase 4: Claude via claude-agent-sdk)."
)

async def create_session(
self,
sandbox: Sandbox,
config: HarnessAgentConfig,
*,
harness: HarnessType,
secrets: Optional[Mapping[str, str]] = None,
trace: Optional[TraceContext] = None,
session_id: Optional[str] = None,
) -> Session:
raise NotImplementedError(
"LocalBackend is not implemented yet (Phase 3: Pi via bundled JS, "
"Phase 4: Claude via claude-agent-sdk)."
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid advertising harness support before implementation exists.

LocalBackend declares PI/CLAUDE in supported_harnesses (Line 27), but both creation methods always raise NotImplementedError (Lines 30-48). This defers failure to runtime instead of failing fast on compatibility checks.

Suggested fail-fast adjustment
-    supported_harnesses = frozenset({HarnessType.PI, HarnessType.CLAUDE})
+    supported_harnesses = frozenset()

Comment on lines +158 to +169
async def load_session_endpoint(req: Request, request: LoadSessionRequest):
messages = await store.load(request.session_id)
response = LoadSessionResponse(
session_id=request.session_id,
messages=[
message_to_vercel_ui_message(message, message_id=f"msg-{idx}")
for idx, message in enumerate(messages, start=1)
],
)
return set_vercel_message_protocol_headers(
JSONResponse(content=response.model_dump(mode="json"))
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate session_id in /load-session before hitting SessionStore.

Line 159 forwards raw request.session_id to store.load(...) without the same charset/length gate used by /messages (Lines 84-93). This creates an inconsistent trust boundary and can expose storage adapters to unsafe identifiers.

Suggested patch
 async def load_session_endpoint(req: Request, request: LoadSessionRequest):
-        messages = await store.load(request.session_id)
+        session_id = resolve_session_id(request.session_id)
+        if session_id is None:
+            return set_vercel_message_protocol_headers(
+                JSONResponse(
+                    status_code=400,
+                    content={
+                        "detail": "session_id violates the allowed charset/length"
+                    },
+                )
+            )
+        messages = await store.load(session_id)
         response = LoadSessionResponse(
-            session_id=request.session_id,
+            session_id=session_id,
             messages=[
                 message_to_vercel_ui_message(message, message_id=f"msg-{idx}")
                 for idx, message in enumerate(messages, start=1)
             ],
         )

Comment thread sdks/python/agenta/sdk/agents/dtos.py Outdated

# Permission policy for harness tool use in a headless run. ``auto`` approves (tools are
# backend-resolved and trusted, no human to prompt); ``deny`` rejects.
PermissionPolicy = str # "auto" | "deny"

@coderabbitai coderabbitai Bot Jun 19, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate permission_policy instead of accepting arbitrary strings

PermissionPolicy is documented as "auto" | "deny" but currently typed as str, so invalid values flow through until downstream failure. Enforce this at DTO boundaries.

Proposed fix
-from typing import Any, Callable, ClassVar, Dict, List, Optional, Tuple, Union
+from typing import Any, Callable, ClassVar, Dict, List, Literal, Optional, Tuple, Union
@@
-PermissionPolicy = str  # "auto" | "deny"
+PermissionPolicy = Literal["auto", "deny"]

Also applies to: 363-379, 502-503, 559-559

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mmabrouk, great — here's the plan to enforce the constraint:

  • Change PermissionPolicy from str to Literal["auto", "deny"] in sdks/python/agenta/sdk/agents/dtos.py
  • Add Literal to the existing from typing import ... line

(◕‿◕✿)


  • 📌 Commit the changes to this branch

Comment on lines +53 to +55
if "needs_approval" in source:
result["needs_approval"] = bool(source["needs_approval"])
if isinstance(source.get("render"), dict):

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

needs_approval coercion is semantically wrong for string inputs.

Line 54 uses bool(source["needs_approval"]), so values like "false" become True. That flips approval gating behavior for legacy payloads.

Proposed fix
 def _copy_tool_metadata(
     source: dict[str, Any], target: dict[str, Any]
 ) -> dict[str, Any]:
     result = dict(target)
     if "needs_approval" in source:
-        result["needs_approval"] = bool(source["needs_approval"])
+        result["needs_approval"] = source["needs_approval"]
     if isinstance(source.get("render"), dict):
         result["render"] = dict(source["render"])
     return result

Comment on lines +125 to +127
if on_error == "raise":
raise error
diagnostics.append(ToolConfigDiagnostic(index=index, message=str(error)))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate on_error at runtime to prevent silent fallback behavior.

If callers pass an invalid value (e.g., typo), current logic silently behaves like "collect". Fail fast to avoid hidden parse-policy changes.

Proposed fix
 def coerce_tool_configs(
     values: Optional[Sequence[Any]],
     *,
     on_error: Literal["raise", "collect"] = "raise",
 ) -> ToolConfigParseResult:
     """Convert legacy values, either raising or returning structured diagnostics."""
+    if on_error not in {"raise", "collect"}:
+        raise ValueError("on_error must be 'raise' or 'collect'")
+
     tool_configs: list[ToolConfig] = []
     diagnostics: list[ToolConfigDiagnostic] = []

Comment on lines +29 to +33
if response.status_code >= 500:
raise RuntimeError(
f"Agent runner HTTP {response.status_code}: {response.text[:1000]}"
)
return response.json()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Handle all non-2xx HTTP statuses as transport failures.

Only 5xx is handled today; 4xx responses fall through and may surface as opaque JSON parse errors instead of clear runner failures.

Proposed fix
-    if response.status_code >= 500:
+    if response.status_code >= 400:
         raise RuntimeError(
             f"Agent runner HTTP {response.status_code}: {response.text[:1000]}"
         )
@@
-            if response.status_code >= 500:
+            if response.status_code >= 400:
                 body = await response.aread()
                 raise RuntimeError(
                     f"Agent runner HTTP {response.status_code}: {body[:1000]!r}"
                 )

Also applies to: 108-113

Comment on lines +113 to +117
async for line in response.aiter_lines():
line = line.strip()
if line:
yield json.loads(line)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforce a terminal stream result (or raise a transport error).

Both streaming transports can end cleanly when the runner disconnects/exits early, which leaves downstream AgentRun without a terminal result and can hide backend failures.

Proposed fix
 async def deliver_http_stream(
@@
-    async with httpx.AsyncClient(timeout=timeout) as client:
+    saw_result = False
+    async with httpx.AsyncClient(timeout=timeout) as client:
         async with client.stream(
             "POST", url, json=payload, headers=headers
         ) as response:
@@
             async for line in response.aiter_lines():
                 line = line.strip()
                 if line:
-                    yield json.loads(line)
+                    record = json.loads(line)
+                    if record.get("kind") == "result":
+                        saw_result = True
+                    yield record
+            if not saw_result:
+                raise RuntimeError(
+                    "Agent runner stream ended without a terminal result record"
+                )
@@
 async def deliver_subprocess_stream(
@@
-    try:
+    saw_result = False
+    try:
         while True:
@@
             line = raw.decode("utf-8", "replace").strip()
             if line:
-                yield json.loads(line)
+                record = json.loads(line)
+                if record.get("kind") == "result":
+                    saw_result = True
+                yield record
         await proc.wait()
+        err = (await proc.stderr.read()).decode("utf-8", "replace")
+        if proc.returncode not in (0, None):
+            raise RuntimeError(
+                f"Agent runner stream failed. exit={proc.returncode} stderr={err[-2000:]}"
+            )
+        if not saw_result:
+            raise RuntimeError(
+                f"Agent runner stream ended without terminal result. stderr={err[-2000:]}"
+            )
     finally:
         if proc.returncode is None:
             proc.kill()
             await proc.wait()

Also applies to: 147-160

Comment on lines +195 to +199
text = res.text
assert '"sessionId": "sess_abc"' in text # stamped onto the start part
assert '"type": "text-delta"' in text
assert "data: [DONE]" in text

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Make the SSE session-id check structure-aware instead of whitespace-dependent.

Line 196 matches a literal JSON substring ('"sessionId": "sess_abc"'), which can fail on harmless serializer formatting changes.

Suggested test hardening
     text = res.text
-    assert '"sessionId": "sess_abc"' in text  # stamped onto the start part
+    payloads = [
+        json.loads(line.removeprefix("data: "))
+        for line in text.splitlines()
+        if line.startswith("data: ") and line != "data: [DONE]"
+    ]
+    start = next(p for p in payloads if p.get("type") == "start")
+    assert start["messageMetadata"]["sessionId"] == "sess_abc"
     assert '"type": "text-delta"' in text
     assert "data: [DONE]" in text

@mmabrouk

Copy link
Copy Markdown
Member Author

Reviewer guide: interesting code

A few spots worth landing on first:

  • sdks/python/agenta/sdk/agents/interfaces.py:140Backend.supported_harnesses is the single source of truth for what an engine can drive; Harness.__init__ validates against it before any run.
  • sdks/python/agenta/sdk/agents/interfaces.py:111NoopSessionStore returns empty history and discards writes, which is the port-only seam behind /load-session until a real store lands.
  • sdks/python/agenta/sdk/agents/dtos.py:524AgentaAgentConfig extends PiAgentConfig and only adds forced skills, which is the cleanest read on "Agenta is Pi with an opinion".
  • sdks/python/agenta/sdk/agents/adapters/in_process.py:118InProcessPiBackend is the reference backend; note it is deliberately not a subclass of RivetBackend even though they share wire helpers.
  • sdks/python/agenta/sdk/agents/adapters/harnesses.py:85ClaudeHarness drops Pi built-in tool names with a warning, because built-ins are a Pi concept Claude cannot honor.
  • sdks/python/agenta/sdk/agents/adapters/vercel/routing.py:43resolve_session_id mints, echoes, or rejects the session id against a bounded charset; this is where session_id enters the run as a correlation value.

"""

#: The single source of truth for what this engine can run.
supported_harnesses: ClassVar[FrozenSet[HarnessType]] = frozenset()

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class var is the one place an engine declares its supported harnesses. The split below keeps backends as pure plumbing: they never branch on a harness name, they only check membership here.

Comment thread sdks/python/agenta/sdk/agents/interfaces.py
# carried through.
if config.builtin_names:
log.warning(
"ClaudeHarness ignores %d built-in tool(s); built-ins are a Pi concept",

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth confirming a warning is the right level here. A config that names Pi built-ins but runs on Claude silently loses those tools; a stored agent could behave differently across harnesses without an obvious signal.

return response


def resolve_session_id(session_id: Optional[str]) -> Optional[str]:

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only gate on the session id. Returning None on an invalid id drives the 400 in the endpoint; a minted id uses sess_ + uuid4 hex, which stays inside the allowed charset.

@mmabrouk mmabrouk left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codex subagent review for #4771

Findings:

  • Blocking: sdks/python/agenta/sdk/agents/adapters/rivet.py:36 and sdks/python/agenta/sdk/agents/adapters/in_process.py:36 make the default runner-backed path point at pnpm exec tsx src/cli.ts, but this PR does not add services/agent/src/cli.ts or the runner package. The public SDK example also uses RivetBackend() with no url, command, or cwd (sdks/python/agenta/sdk/agents/__init__.py:19), while the integration test only proves transport behavior by injecting a fake Python runner (sdks/python/oss/tests/pytest/integration/agents/test_transport_roundtrip.py:81). Merged alone, and for #4772 stacked on it, the advertised default SDK runtime fails before any harness starts unless later runner assets from #4773/#4778 are present and the process cwd happens to be right. Please either require an explicit url/command until the runner lands, stack/retarget this runtime on the runner PR, or include the runnable runner assets plus an end-to-end test that exercises the default path.

  • sdks/python/agenta/sdk/agents/dtos.py:680 drops default tools whenever a dedicated agent dict is present but omits tools. The from_params docstring says unset fields fall back to defaults, and the MCP/harness-option paths do that, but this branch returns None; the constructor then passes tools=_as_list(None) and silently clears defaults.tools. A partial override such as { "agent": { "model": "..." } } will run tool-free. Please fall back to defaults.tools when the key is absent and add a partial-agent test.

Stack note: #4771 does contain the Python utils/wire.py serializer and golden fixtures. #4773 still advertises independence from main, but its protocol docs point at those SDK files and one advertised test imports src/engines/pi.ts, which only lands in the later runner-engine PR. Please align the stack-nav/review map so reviewers know which PR supplies the wire fixtures and runner assets.

I did not run tests locally; this review used the GitHub patch/head files.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 5388d34f-2c5e-4260-b4e6-13176aece5f9

📥 Commits

Reviewing files that changed from the base of the PR and between b9e62f9 and 741fc73.

📒 Files selected for processing (9)
  • sdks/python/agenta/sdk/agents/__init__.py
  • sdks/python/agenta/sdk/agents/adapters/_runner_config.py
  • sdks/python/agenta/sdk/agents/adapters/in_process.py
  • sdks/python/agenta/sdk/agents/adapters/rivet.py
  • sdks/python/agenta/sdk/agents/dtos.py
  • sdks/python/agenta/sdk/agents/errors.py
  • sdks/python/oss/tests/pytest/unit/agents/test_dtos_agent_config.py
  • sdks/python/oss/tests/pytest/unit/agents/test_harness_adapters.py
  • sdks/python/oss/tests/pytest/unit/agents/test_runner_adapter_config.py
🚧 Files skipped from review as they are similar to previous changes (6)
  • sdks/python/oss/tests/pytest/unit/agents/test_dtos_agent_config.py
  • sdks/python/agenta/sdk/agents/init.py
  • sdks/python/oss/tests/pytest/unit/agents/test_harness_adapters.py
  • sdks/python/agenta/sdk/agents/adapters/in_process.py
  • sdks/python/agenta/sdk/agents/adapters/rivet.py
  • sdks/python/agenta/sdk/agents/dtos.py

Comment on lines +21 to +24
if url:
return list(command) if command is not None else list(DEFAULT_RUNNER_COMMAND)
if command is not None:
return list(command)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject empty command at config time.

At Line 22 and Line 24, command=[] is accepted and propagated as _command, which creates an unusable subprocess transport and fails later at runtime. Validate non-empty command in resolve_runner_command so misconfiguration fails fast with AgentRunnerConfigurationError.

Suggested fix
 def resolve_runner_command(
@@
 ) -> List[str]:
+    def _validated_command(raw: Sequence[str]) -> List[str]:
+        cmd = list(raw)
+        if not cmd:
+            raise AgentRunnerConfigurationError(
+                f"{backend_name} received an empty command. "
+                "Pass a non-empty command, pass url for an HTTP runner, "
+                f"or set cwd to a runner wrapper containing {RUNNER_CLI_PATH.as_posix()}."
+            )
+        return cmd
+
     if url:
-        return list(command) if command is not None else list(DEFAULT_RUNNER_COMMAND)
+        return _validated_command(command) if command is not None else list(DEFAULT_RUNNER_COMMAND)
     if command is not None:
-        return list(command)
+        return _validated_command(command)

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
sdks/python/agenta/sdk/agents/adapters/sandbox_agent.py (1)

188-196: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Consider adding logging for suppressed event-sink exceptions.

The function silently suppresses all exceptions from the event sink. While the comment explains the rationale, adding a debug log would improve debuggability when an event sink misbehaves.

📊 Proposed enhancement to add logging
 def _emit_events(result: AgentResult, on_event: Optional[EventSink]) -> None:
     """Replay ``result.events`` to an optional sink, suppressing sink exceptions."""
     if on_event is None:
         return
     for evt in result.events or []:
         try:
             on_event(evt)
-        except Exception:
-            pass  # the sink is caller-provided; don't let it crash the result
+        except Exception as e:
+            import logging
+            logging.getLogger(__name__).debug(
+                "Event sink raised exception: %s", e, exc_info=True
+            )

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 13e12ec7-a558-4f95-a9b2-438fdc0384cb

📥 Commits

Reviewing files that changed from the base of the PR and between 741fc73 and 2a7c129.

📒 Files selected for processing (15)
  • sdks/python/agenta/__init__.py
  • sdks/python/agenta/sdk/agents/__init__.py
  • sdks/python/agenta/sdk/agents/adapters/__init__.py
  • sdks/python/agenta/sdk/agents/adapters/_runner_config.py
  • sdks/python/agenta/sdk/agents/adapters/in_process.py
  • sdks/python/agenta/sdk/agents/adapters/local.py
  • sdks/python/agenta/sdk/agents/adapters/sandbox_agent.py
  • sdks/python/agenta/sdk/agents/dtos.py
  • sdks/python/agenta/sdk/agents/interfaces.py
  • sdks/python/agenta/sdk/agents/utils/ts_runner.py
  • sdks/python/agenta/sdk/agents/utils/wire.py
  • sdks/python/oss/tests/pytest/unit/agents/golden/run_request.claude.json
  • sdks/python/oss/tests/pytest/unit/agents/test_harness_adapters.py
  • sdks/python/oss/tests/pytest/unit/agents/test_runner_adapter_config.py
  • sdks/python/oss/tests/pytest/unit/agents/test_wire_contract.py
✅ Files skipped from review due to trivial changes (1)
  • sdks/python/oss/tests/pytest/unit/agents/golden/run_request.claude.json
🚧 Files skipped from review as they are similar to previous changes (10)
  • sdks/python/agenta/sdk/agents/adapters/init.py
  • sdks/python/agenta/sdk/agents/adapters/local.py
  • sdks/python/agenta/sdk/agents/adapters/_runner_config.py
  • sdks/python/agenta/sdk/agents/utils/wire.py
  • sdks/python/agenta/sdk/agents/init.py
  • sdks/python/agenta/sdk/agents/utils/ts_runner.py
  • sdks/python/agenta/sdk/agents/adapters/in_process.py
  • sdks/python/oss/tests/pytest/unit/agents/test_wire_contract.py
  • sdks/python/agenta/sdk/agents/interfaces.py
  • sdks/python/agenta/sdk/agents/dtos.py

Comment thread sdks/python/agenta/sdk/agents/dtos.py Outdated
"""A chat message in the conversation. ``content`` is text or content blocks.

This is the runtime's own message type, distinct from the SDK's prompt ``Message``
(``agenta.Message``); the two serve different layers.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is confusing. we need different namicn / some clarity here

@mmabrouk mmabrouk changed the base branch from main to big-agents June 22, 2026 11:49
@mmabrouk mmabrouk merged commit 2eed5d0 into big-agents Jun 22, 2026
11 of 13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Backend feature python Pull requests that update Python code SDK size:XXL This PR changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant