Merged
19 commits
a9cacf4
feat(harness): unified harness surface — foundation (span derivation,…
declan-scale Jun 22, 2026
c8de1d4
feat(streaming): stream tool call argument deltas in TemporalStreamin…
vkalmathscale Jun 22, 2026
694960f
fix(harness): assert cross-channel (yield vs auto-send) conformance e…
declan-scale Jun 22, 2026
5ec62c2
feat(pydantic-ai): migrate onto unified harness surface (PR4) (#415)
declan-scale Jun 22, 2026
d10e151
feat(openai-agents): migrate onto the unified harness surface (#416)
declan-scale Jun 22, 2026
904339c
feat(claude-code): stream-json parser tap for the unified harness sur…
declan-scale Jun 22, 2026
d344228
feat(langgraph): migrate LangGraph harness onto unified surface (#417)
declan-scale Jun 22, 2026
9b2b031
feat(codex): event-stream parser tap for the unified harness surface …
declan-scale Jun 22, 2026
fa60632
feat(harness): public adk facade + docs for the unified harness surfa…
declan-scale Jun 23, 2026
2d63eef
fix(tracing): fail open temporal span activities (#437)
danielmillerp Jun 23, 2026
10d22a2
feat(tracing): skip Agentex span-start write by default (end-only ing…
NiteshDhanpal Jun 23, 2026
7f6d70a
test(harness): shared test fakes + conformance determinism fix (#427)
declan-scale Jun 23, 2026
514075d
feat(sdk): add webhook helper for forward-route handlers (#419)
danielmillerp Jun 23, 2026
ebaf617
refactor(tutorials): migrate to the unified harness surface + renumbe…
declan-scale Jun 23, 2026
ee41408
refactor(cli): migrate existing langgraph/pydantic-ai templates to un…
declan-scale Jun 23, 2026
37c7d9d
feat(api): add webhook endpoint
stainless-app[bot] Jun 23, 2026
5954a9f
fix(tests): use relative import for assert_matches_type in webhooks t…
declan-scale Jun 23, 2026
1d86e8a
fix(harness): correct codex & openai reasoning stream envelopes (#441)
declan-scale Jun 23, 2026
055f0c1
chore: release main
stainless-app[bot] Jun 23, 2026
28 changes: 28 additions & 0 deletions .github/workflows/agentex-tutorials-test.yml
@@ -49,6 +49,29 @@ jobs:
curl -LsSf https://astral.sh/uv/install.sh | sh
echo "$HOME/.local/bin" >> $GITHUB_PATH

# Subprocess-CLI harnesses: install the relevant CLI only for the
# claude-code / codex tutorials (no-op for every other tutorial). npm is
# preinstalled on ubuntu runners. Versions mirror the golden agent's
# sandbox image (teams/sgp/agents/golden_agent/sandbox/Dockerfile): claude-code
# is pinned to the same CLAUDE_CODE_VERSION; codex is left unpinned there,
# so it is left unpinned here too. Bump CLAUDE_CODE_VERSION in lockstep
# with the sandbox Dockerfile.
- name: Install harness CLI (claude-code / codex only)
if: ${{ contains(matrix.tutorial, 'claude_code') || contains(matrix.tutorial, 'codex') }}
env:
CLAUDE_CODE_VERSION: "2.1.142"
run: |
if [[ "${{ matrix.tutorial }}" == *claude_code* ]]; then
echo "📦 Installing Claude Code CLI (v${CLAUDE_CODE_VERSION})..."
npm install -g "@anthropic-ai/claude-code@${CLAUDE_CODE_VERSION}"
claude --version || true
fi
if [[ "${{ matrix.tutorial }}" == *codex* ]]; then
echo "📦 Installing Codex CLI..."
npm install -g @openai/codex
codex --version || true
fi

- name: Pull latest AgentEx image
run: |
echo "🐳 Pulling latest Scale AgentEx Docker image..."
@@ -136,6 +159,11 @@ jobs:
working-directory: ./examples/tutorials
env:
OPENAI_API_KEY: ${{ secrets.TUTORIAL_OPENAI_API_KEY }}
ANTHROPIC_API_KEY: ${{ secrets.TUTORIAL_ANTHROPIC_API_KEY }}
# Enable the gated live tests only for the matching subprocess-CLI
# harness tutorial (the CLI is installed for it in the step above).
CLAUDE_LIVE_TESTS: ${{ contains(matrix.tutorial, 'claude_code') && '1' || '' }}
CODEX_LIVE_TESTS: ${{ contains(matrix.tutorial, 'codex') && '1' || '' }}
HEALTH_CHECK_PORT: 8080 # Use non-privileged port for temporal worker health checks
run: |
echo "Testing tutorial: ${{ matrix.tutorial }}"
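The `&& '1' || ''` expressions use the common GitHub Actions idiom for a conditional value. The variable is `'1'` for the matching tutorial and an empty string otherwise, so test code that reads these variables should treat only a non-empty value as enabled. The helper below is a hypothetical sketch of such a gate. It is not the check the tutorial tests actually use.

```python
import os
from typing import Mapping, Optional


def live_tests_enabled(var: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical gate: true only when `var` is set to a non-empty value."""
    env = os.environ if env is None else env
    return bool(env.get(var, "").strip())


# The empty string (the expression's "else" branch) keeps the gate closed.
print(live_tests_enabled("CLAUDE_LIVE_TESTS", {"CLAUDE_LIVE_TESTS": ""}))   # False
print(live_tests_enabled("CLAUDE_LIVE_TESTS", {"CLAUDE_LIVE_TESTS": "1"}))  # True
print(live_tests_enabled("CODEX_LIVE_TESTS", {}))                          # False
```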
61 changes: 61 additions & 0 deletions .github/workflows/harness-integration.yml
@@ -0,0 +1,61 @@
name: Harness Integration

on:
  push:
    branches: [main]
  pull_request:
    paths:
      - "src/agentex/lib/core/harness/**"
      - "src/agentex/lib/adk/_modules/**"
      - "tests/lib/core/harness/test_harness_pydantic_ai_*.py"
      - "tests/lib/core/harness/test_harness_langgraph_*.py"
      - ".github/workflows/harness-integration.yml"

jobs:
  conformance:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2

      - name: Install uv
        uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 # v5.4.2
        with:
          version: '0.10.2'

      - name: Bootstrap
        run: ./scripts/bootstrap

      # Defer to scripts/test so the harness suite runs under the exact same
      # invocation as the main CI test job: DEFER_PYDANTIC_BUILD=false and
      # `uv run --isolated --all-packages --all-extras pytest`, across the
      # min/max supported Python versions. Running `uv run pytest` directly
      # would risk an all-extras-only dep passing locally but failing in CI.
      - name: Conformance suite
        run: ./scripts/test tests/lib/core/harness/ -v

  # Offline harness integration tests (sync / async / temporal channels) for each
  # migrated harness. These use fake streams / TestModel + fake streaming/tracing
  # and require no live infrastructure. Future harness migration PRs (6-8) add
  # their harness to the matrix below and their test paths to the triggers above.
  live-matrix:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        harness: [pydantic_ai, langgraph]
        channel: [sync, async, temporal]
      fail-fast: false
    name: ${{ matrix.harness }}-${{ matrix.channel }}
    steps:
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2

      - name: Install uv
        uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 # v5.4.2
        with:
          version: '0.10.2'

      - name: Bootstrap
        run: ./scripts/bootstrap

      - name: ${{ matrix.harness }} ${{ matrix.channel }} integration tests (offline)
        run: |
          ./scripts/test tests/lib/core/harness/test_harness_${{ matrix.harness }}_${{ matrix.channel }}.py -v
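The `live-matrix` strategy expands to the cross product of `harness` and `channel`, so this workflow runs six offline jobs. The sketch below reproduces the job names and test paths that the `name:` and `run:` templates generate. This makes it easy to check which `test_harness_<harness>_<channel>.py` files the matrix expects.

```python
from itertools import product

harnesses = ["pydantic_ai", "langgraph"]
channels = ["sync", "async", "temporal"]

# Mirrors `name: ${{ matrix.harness }}-${{ matrix.channel }}` and the run step's test path.
jobs = [
    (f"{h}-{c}", f"tests/lib/core/harness/test_harness_{h}_{c}.py")
    for h, c in product(harnesses, channels)
]
for name, path in jobs:
    print(name, path)
print(len(jobs))  # 6
```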
4 changes: 2 additions & 2 deletions .release-please-manifest.json
@@ -1,4 +1,4 @@
{
".": "0.14.0",
"adk": "0.13.2"
".": "0.15.0",
"adk": "0.14.0"
}
8 changes: 4 additions & 4 deletions .stats.yml
@@ -1,4 +1,4 @@
configured_endpoints: 64
openapi_spec_url: https://storage.googleapis.com/stainless-sdk-openapi-specs/sgp/agentex-sdk-ae2571b5ac5d337ba5ced527cec0ff6e3088296fa67c3c836ed5a06544b25cb8.yml
openapi_spec_hash: 962a2f20444c7823fd3a34f95365146e
config_hash: 138b7c0b394e7393133c8ff16a6d0eb3
configured_endpoints: 65
openapi_spec_url: https://storage.googleapis.com/stainless-sdk-openapi-specs/sgp/agentex-sdk-cd43ba4b554ca024dd7ee7b74e4f4700a743282c17def704a0967e6ff251c09b.yml
openapi_spec_hash: 9369ccc9c0289e9d6f641a526d244d1c
config_hash: 1ae003838971335aac550f3ad5872f54
32 changes: 32 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,38 @@

* **tracing:** emit OTel metrics for async span queue depth, batch drain, and SGP export success/failure (HTTP status labels). Disable SDK-side recording with ``AGENTEX_TRACING_METRICS=0``.

## 0.15.0 (2026-06-23)

Full Changelog: [agentex-client-v0.14.0...agentex-client-v0.15.0](https://github.com/scaleapi/scale-agentex-python/compare/agentex-client-v0.14.0...agentex-client-v0.15.0)

### Features

* **api:** add webhook endpoint ([37c7d9d](https://github.com/scaleapi/scale-agentex-python/commit/37c7d9d465943184ab84922ba1079b939516d534))
* **claude-code:** stream-json parser tap for the unified harness surface ([#420](https://github.com/scaleapi/scale-agentex-python/issues/420)) ([904339c](https://github.com/scaleapi/scale-agentex-python/commit/904339c21b8cd641a02d903c03d4a8730b4d7e84))
* **codex:** event-stream parser tap for the unified harness surface ([#421](https://github.com/scaleapi/scale-agentex-python/issues/421)) ([9b2b031](https://github.com/scaleapi/scale-agentex-python/commit/9b2b03144cc67bb497e0a301686207aba2629758))
* **harness:** public adk facade + docs for the unified harness surface (PR 9) ([#423](https://github.com/scaleapi/scale-agentex-python/issues/423)) ([fa60632](https://github.com/scaleapi/scale-agentex-python/commit/fa60632f9be84315a3fdc627745ae5b605994bd8))
* **harness:** unified harness surface — foundation (span derivation, delivery adapters, emitter) ([#412](https://github.com/scaleapi/scale-agentex-python/issues/412)) ([a9cacf4](https://github.com/scaleapi/scale-agentex-python/commit/a9cacf4eb71697351ee658a570636f04bbf31ad5))
* **langgraph:** migrate LangGraph harness onto unified surface ([#417](https://github.com/scaleapi/scale-agentex-python/issues/417)) ([d344228](https://github.com/scaleapi/scale-agentex-python/commit/d34422845de4b80ed69d2dccfdb0c680ef2fbca3))
* **openai-agents:** migrate onto the unified harness surface ([#416](https://github.com/scaleapi/scale-agentex-python/issues/416)) ([d10e151](https://github.com/scaleapi/scale-agentex-python/commit/d10e1510bd5da44ad5acc5cac638750122083fce))
* **pydantic-ai:** migrate onto unified harness surface (PR4) ([#415](https://github.com/scaleapi/scale-agentex-python/issues/415)) ([5ec62c2](https://github.com/scaleapi/scale-agentex-python/commit/5ec62c20781d24fc3e0b92734fcd444b1e791d70))
* **sdk:** add webhook helper for forward-route handlers ([#419](https://github.com/scaleapi/scale-agentex-python/issues/419)) ([514075d](https://github.com/scaleapi/scale-agentex-python/commit/514075de2189f33be4ade0ac84368019e55ed7ea))
* **streaming:** stream tool call argument deltas in TemporalStreamingModel ([#355](https://github.com/scaleapi/scale-agentex-python/issues/355)) ([c8de1d4](https://github.com/scaleapi/scale-agentex-python/commit/c8de1d4c9c3b5b3c16ad4aaf9644c1ba0d618757))
* **tracing:** skip Agentex span-start write by default (end-only ingest) ([#438](https://github.com/scaleapi/scale-agentex-python/issues/438)) ([10d22a2](https://github.com/scaleapi/scale-agentex-python/commit/10d22a27091c9c410ae808dab9cfce5dab3816a8))


### Bug Fixes

* **harness:** assert cross-channel (yield vs auto-send) conformance equivalence [AGX1-373] ([#414](https://github.com/scaleapi/scale-agentex-python/issues/414)) ([694960f](https://github.com/scaleapi/scale-agentex-python/commit/694960f913b8ba521d9236e876e5e00f57a3a3ff))
* **harness:** correct codex & openai reasoning stream envelopes ([#441](https://github.com/scaleapi/scale-agentex-python/issues/441)) ([1d86e8a](https://github.com/scaleapi/scale-agentex-python/commit/1d86e8a47a369814540b6e853cd20240c6098f27))
* **tests:** use relative import for assert_matches_type in webhooks test ([#440](https://github.com/scaleapi/scale-agentex-python/issues/440)) ([5954a9f](https://github.com/scaleapi/scale-agentex-python/commit/5954a9fc8c7961ef5ceb41abf3ca32e6e78590c5))
* **tracing:** fail open temporal span activities ([#437](https://github.com/scaleapi/scale-agentex-python/issues/437)) ([2d63eef](https://github.com/scaleapi/scale-agentex-python/commit/2d63eef53bdb919bb6568e04708e3b7abcb8075b))


### Refactors

* **cli:** migrate existing langgraph/pydantic-ai templates to unified surface ([#429](https://github.com/scaleapi/scale-agentex-python/issues/429)) ([ee41408](https://github.com/scaleapi/scale-agentex-python/commit/ee41408c420eba5c6b8fe8719c8ebd445dcd220c))
* **tutorials:** migrate to the unified harness surface + renumber ([#428](https://github.com/scaleapi/scale-agentex-python/issues/428)) ([ebaf617](https://github.com/scaleapi/scale-agentex-python/commit/ebaf617256c7971dde12fd7e25f02b05f2f42fca))

## 0.14.0 (2026-06-22)

Full Changelog: [agentex-client-v0.13.1...agentex-client-v0.14.0](https://github.com/scaleapi/scale-agentex-python/compare/agentex-client-v0.13.1...agentex-client-v0.14.0)
8 changes: 8 additions & 0 deletions adk/CHANGELOG.md
@@ -1,5 +1,13 @@
# Changelog

## 0.14.0 (2026-06-23)

Full Changelog: [agentex-sdk-v0.13.2...agentex-sdk-v0.14.0](https://github.com/scaleapi/scale-agentex-python/compare/agentex-sdk-v0.13.2...agentex-sdk-v0.14.0)

### Features

* **harness:** public adk facade + docs for the unified harness surface (PR 9) ([#423](https://github.com/scaleapi/scale-agentex-python/issues/423)) ([fa60632](https://github.com/scaleapi/scale-agentex-python/commit/fa60632f9be84315a3fdc627745ae5b605994bd8))

## 0.13.2 (2026-06-22)

Full Changelog: [agentex-sdk-v0.13.1...agentex-sdk-v0.13.2](https://github.com/scaleapi/scale-agentex-python/compare/agentex-sdk-v0.13.1...agentex-sdk-v0.13.2)
196 changes: 196 additions & 0 deletions adk/docs/harness.md
@@ -0,0 +1,196 @@
# Unified Harness Surface

The unified harness surface gives every agent harness (pydantic-ai, LangGraph, OpenAI Agents, and future parsers) a single, shared path to streaming, message persistence, and tracing. The Agentex `StreamTaskMessage*` event stream is the canonical wire format. A harness tap produces that stream once; the shared machinery delivers it and derives spans from it.

All public names are re-exported from `agentex.lib.adk`:

```python
from agentex.lib.adk import (
    UnifiedEmitter,
    SpanTracer,
    TurnUsage,
    TurnResult,
    HarnessTurn,
    StreamTaskMessage,
    OpenSpan,
    CloseSpan,
    SpanSignal,
)
```

The implementation lives at `src/agentex/lib/core/harness/`.

---

## The canonical stream: `StreamTaskMessage`

`StreamTaskMessage` is a union of the four wire-protocol update types:

```
StreamTaskMessageStart - opens a content slot (text, reasoning, tool request, ...)
StreamTaskMessageDelta - appends a token/fragment to an open slot
StreamTaskMessageFull - posts a complete message in one shot (tool response, ...)
StreamTaskMessageDone - closes an open slot
```

Every harness tap produces a sequence of these. Everything downstream (delivery, tracing) reads the same sequence.
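The sketch below replays a short stream into completed messages, which makes the slot lifecycle concrete. It uses local stand-in dataclasses (`Start`, `Delta`, `Full`, `Done`, with a string `kind` and an integer `index` per slot) rather than the real `StreamTaskMessage*` classes, whose constructors may differ. Treat it as an illustration of the Start/Delta/Done versus Full semantics, not of the wire schema.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical stand-ins for the four wire-protocol update types.
@dataclass
class Start:
    index: int
    kind: str  # e.g. "text", "reasoning", "tool_request"


@dataclass
class Delta:
    index: int
    text: str


@dataclass
class Full:
    index: int
    kind: str
    text: str


@dataclass
class Done:
    index: int


Event = Union[Start, Delta, Full, Done]


def replay(events: list[Event]) -> list[tuple[str, str]]:
    """Rebuild completed messages as (kind, text), in completion order."""
    open_slots: dict[int, tuple[str, list[str]]] = {}
    completed: list[tuple[str, str]] = []
    for ev in events:
        if isinstance(ev, Start):
            open_slots[ev.index] = (ev.kind, [])
        elif isinstance(ev, Delta):
            open_slots[ev.index][1].append(ev.text)  # KeyError if the slot was never opened
        elif isinstance(ev, Done):
            kind, parts = open_slots.pop(ev.index)
            completed.append((kind, "".join(parts)))
        elif isinstance(ev, Full):
            completed.append((ev.kind, ev.text))  # one-shot: no slot is opened
    return completed


stream = [
    Start(0, "text"),
    Delta(0, "Hel"),
    Delta(0, "lo"),
    Done(0),
    Full(1, "tool_response", '{"ok": true}'),
]
print(replay(stream))  # [('text', 'Hello'), ('tool_response', '{"ok": true}')]
```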

---

## Per-harness taps: `convert_<harness>_to_agentex_events`

A tap is an async generator that translates the harness's native event stream into `StreamTaskMessage*` events. The currently shipped taps are:

| Harness | Tap function | Exported from |
|---|---|---|
| pydantic-ai | `convert_pydantic_ai_to_agentex_events` | `agentex.lib.adk` |
| LangGraph | `convert_langgraph_to_agentex_events` | `agentex.lib.adk` |

Taps for claude-code and codex will be added in subsequent PRs (AGX1-420, AGX1-421) and exported from `agentex.lib.adk` in the same way.
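The shape of a tap can be sketched with invented inputs. Here the native events are plain dicts with made-up `"token"` and `"step_end"` types, and the output types are local stand-ins, so `convert_native_to_events` and everything it consumes are hypothetical. The sketch shows the pattern:

1. Open a slot lazily on the first token.
2. Append each token as a delta.
3. Close the slot at a step boundary, or at end of stream if the harness never sent one.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Union


# Hypothetical stand-ins for StreamTaskMessageStart / Delta / Done.
@dataclass
class Start:
    index: int
    kind: str


@dataclass
class Delta:
    index: int
    text: str


@dataclass
class Done:
    index: int


Event = Union[Start, Delta, Done]


async def convert_native_to_events(native: AsyncIterator[dict]) -> AsyncIterator[Event]:
    """Translate {"type": "token" | "step_end", ...} dicts into canonical events."""
    index = 0
    slot_open = False
    async for item in native:
        if item["type"] == "token":
            if not slot_open:
                yield Start(index, "text")
                slot_open = True
            yield Delta(index, item["text"])
        elif item["type"] == "step_end" and slot_open:
            yield Done(index)
            slot_open = False
            index += 1
    if slot_open:  # close a slot the native stream left dangling
        yield Done(index)


async def fake_native() -> AsyncIterator[dict]:
    for item in [{"type": "token", "text": "Hi"}, {"type": "token", "text": "!"}, {"type": "step_end"}]:
        yield item


async def main() -> list:
    return [ev async for ev in convert_native_to_events(fake_native())]


events = asyncio.run(main())
print(events)
# [Start(index=0, kind='text'), Delta(index=0, text='Hi'), Delta(index=0, text='!'), Done(index=0)]
```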

---

## `HarnessTurn` protocol

`HarnessTurn` is the interface a harness turn object must satisfy to plug into `UnifiedEmitter`:

```python
@runtime_checkable
class HarnessTurn(Protocol):
    @property
    def events(self) -> AsyncIterator[StreamTaskMessage]: ...

    def usage(self) -> TurnUsage: ...
```

`events` is the canonical stream for this turn. `usage()` is valid only after `events` has been exhausted. An async generator cannot `return` a value to its consumer, so usage travels out-of-band.
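The out-of-band usage rule is easiest to see in a toy turn. In this sketch:

- `EchoTurn` is hypothetical.
- `TurnUsage` is a reduced stand-in dataclass.
- Events are plain strings instead of `StreamTaskMessage` objects.

The generator stores usage on the instance once it finishes, and `usage()` refuses to answer before that.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Optional, Protocol, runtime_checkable


@dataclass
class TurnUsage:  # stand-in for agentex.lib.adk.TurnUsage (subset of fields)
    output_tokens: Optional[int] = None
    num_llm_calls: int = 0


@runtime_checkable
class HarnessTurn(Protocol):  # same shape as the protocol, but with str events
    @property
    def events(self) -> AsyncIterator[str]: ...

    def usage(self) -> TurnUsage: ...


class EchoTurn:
    """Hypothetical turn: streams words, then records usage once drained."""

    def __init__(self, text: str) -> None:
        self._text = text
        self._usage: Optional[TurnUsage] = None

    @property
    def events(self) -> AsyncIterator[str]:
        return self._run()

    async def _run(self) -> AsyncIterator[str]:
        words = self._text.split()
        for word in words:
            yield word
        # An async generator cannot `return` a value, so usage is stashed here.
        self._usage = TurnUsage(output_tokens=len(words), num_llm_calls=1)

    def usage(self) -> TurnUsage:
        if self._usage is None:
            raise RuntimeError("usage() is only valid after events is exhausted")
        return self._usage


async def main() -> tuple:
    turn = EchoTurn("hello unified harness")
    assert isinstance(turn, HarnessTurn)  # structural check via runtime_checkable
    seen = [word async for word in turn.events]
    return seen, turn.usage()


print(asyncio.run(main()))
# (['hello', 'unified', 'harness'], TurnUsage(output_tokens=3, num_llm_calls=1))
```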

---

## `TurnUsage`

Token counts and cost for one turn, harness-independent:

```python
class TurnUsage(BaseModel):
    model: str | None = None
    input_tokens: int | None = None
    output_tokens: int | None = None
    cached_input_tokens: int | None = None
    reasoning_tokens: int | None = None
    total_tokens: int | None = None
    cost_usd: float | None = None
    duration_ms: int | None = None
    num_llm_calls: int = 0
    num_tool_calls: int = 0
    num_reasoning_blocks: int = 0
```

Field names align with `agentex.lib.core.observability.llm_metrics` for easy conversion.
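All token fields are optional, so a harness that makes several LLM calls in one turn must decide how to combine a reported count with a missing one. One possible policy is to treat `None` as "not reported": sum what was reported, and leave a field `None` only if no call reported it. The sketch uses a reduced stand-in dataclass and a hypothetical `add_call` helper; neither is part of the documented API.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class TurnUsage:  # stand-in with a subset of the documented fields
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None
    num_llm_calls: int = 0


def _add(a: Optional[int], b: Optional[int]) -> Optional[int]:
    # None means "not reported": treat it as absent rather than as zero.
    if a is None:
        return b
    if b is None:
        return a
    return a + b


def add_call(turn: TurnUsage, input_tokens: Optional[int], output_tokens: Optional[int]) -> TurnUsage:
    """Hypothetical helper: fold one LLM call's counts into the turn total."""
    return replace(
        turn,
        input_tokens=_add(turn.input_tokens, input_tokens),
        output_tokens=_add(turn.output_tokens, output_tokens),
        num_llm_calls=turn.num_llm_calls + 1,
    )


usage = TurnUsage()
usage = add_call(usage, input_tokens=120, output_tokens=None)
usage = add_call(usage, input_tokens=80, output_tokens=40)
print(usage)  # TurnUsage(input_tokens=200, output_tokens=40, num_llm_calls=2)
```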

---

## `UnifiedEmitter`

`UnifiedEmitter` ties a turn's canonical stream, tracing context, and delivery mode together. Construct one per turn with the task/trace context from the request:

```python
emitter = UnifiedEmitter(
    task_id=params.task.id,
    trace_id=params.task.id,  # or None to disable tracing
    parent_span_id=turn_span.id if turn_span else None,
)
```

**Tracing is on by default** when `trace_id` is provided. To disable it explicitly, pass `tracer=False`. To inject a custom `SpanTracer` (e.g. in tests), pass it as `tracer=<instance>`.
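Those rules can be read as a small resolution function. `resolve_tracer` and the stand-in `SpanTracer` are hypothetical. The text also does not settle the precedence between an injected tracer and a missing `trace_id`, so this sketch assumes that no `trace_id` always means no tracing.

```python
from typing import Optional, Union


class SpanTracer:  # stand-in; the real class records spans via adk.tracing
    def __init__(self, trace_id: str, parent_span_id: Optional[str] = None) -> None:
        self.trace_id = trace_id
        self.parent_span_id = parent_span_id


def resolve_tracer(
    trace_id: Optional[str],
    parent_span_id: Optional[str] = None,
    tracer: Union[SpanTracer, bool, None] = None,
) -> Optional[SpanTracer]:
    """Hypothetical: return the tracer an emitter would use, or None for no tracing."""
    if tracer is False or trace_id is None:
        return None  # explicitly disabled, or nothing to attach spans to
    if isinstance(tracer, SpanTracer):
        return tracer  # injected instance, e.g. a test double
    return SpanTracer(trace_id, parent_span_id)  # default: tracing on


print(resolve_tracer(None))                       # None
print(resolve_tracer("task-1", tracer=False))     # None
print(resolve_tracer("task-1").trace_id)          # task-1
```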

### Delivery mode 1: `yield_turn` (sync HTTP ACP)

For sync ACP agents that return events directly over the HTTP response:

```python
@acp.on_message_send
async def handle(params):
    turn = MyHarnessTurn(params)  # implements HarnessTurn
    async for event in emitter.yield_turn(turn):
        yield event
```

`yield_turn` forwards each event to the caller unchanged and records spans as a side effect. When tracing is disabled (no `trace_id`, or `tracer=False`), it is a pure passthrough.
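The sketch below gives a rough model of the forward-plus-side-effect pattern. It wraps any async iterator, hands each event to an optional `observe` callback (standing in for the span tracer), and then yields the event unchanged. The helper name and the callback interface are assumptions; the real `yield_turn` takes a `HarnessTurn`.

```python
import asyncio
from typing import AsyncIterator, Callable, Optional, TypeVar

T = TypeVar("T")


async def yield_with_side_effect(
    events: AsyncIterator[T], observe: Optional[Callable[[T], None]] = None
) -> AsyncIterator[T]:
    """Forward every event unchanged; call `observe` on it first if one is set."""
    async for event in events:
        if observe is not None:
            observe(event)  # e.g. feed a span deriver
        yield event  # the caller sees exactly the original stream


async def source() -> AsyncIterator[str]:
    for e in ["start", "delta", "done"]:
        yield e


async def main() -> tuple:
    observed: list = []
    forwarded = [e async for e in yield_with_side_effect(source(), observed.append)]
    return forwarded, observed


print(asyncio.run(main()))  # (['start', 'delta', 'done'], ['start', 'delta', 'done'])
```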

### Delivery mode 2: `auto_send_turn` (async/Temporal)

For async or Temporal agents that push to the task stream via Redis:

```python
result: TurnResult = await emitter.auto_send_turn(turn, created_at=workflow.now())
```

`auto_send_turn` drives `adk.streaming` contexts for every message in the stream, derives and records spans, and returns a `TurnResult` with the final text and usage. Pass `created_at` under Temporal to back-date message timestamps deterministically.

---

## `TurnResult`

```python
class TurnResult(BaseModel):
    final_text: str = ""
    usage: TurnUsage = TurnUsage()
```

Returned by `auto_send_turn`. `final_text` is the last text segment of the turn (multi-step runs return only the final segment, matching `stream_langgraph_events` / `stream_pydantic_ai_events` semantics).
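Under the last-segment rule, text from earlier steps (for example a preamble before a tool call) is not part of `final_text`. A minimal sketch shows the effect: each completed text slot replaces the previous candidate. It runs over a simplified stream of `(event, index, payload)` tuples, which is a hypothetical encoding rather than the wire format.

```python
from typing import Iterable, Tuple

# (event, index, payload), with event in {"start", "delta", "done"}. The payload is
# the content kind for "start", the token for "delta", and unused for "done".
Step = Tuple[str, int, str]


def final_text(stream: Iterable[Step]) -> str:
    kinds: dict = {}
    buffers: dict = {}
    last = ""
    for event, index, payload in stream:
        if event == "start":
            kinds[index] = payload
            buffers[index] = []
        elif event == "delta":
            buffers[index].append(payload)
        elif event == "done" and kinds.get(index) == "text":
            last = "".join(buffers[index])  # a later text segment overwrites earlier ones
    return last


stream = [
    ("start", 0, "text"), ("delta", 0, "Let me check."), ("done", 0, ""),
    ("start", 1, "tool_request"), ("done", 1, ""),
    ("start", 2, "text"), ("delta", 2, "It is sunny."), ("done", 2, ""),
]
print(final_text(stream))  # It is sunny.
```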

---

## Tracing: span derivation

Spans are derived from the canonical stream by `SpanDeriver` (pure, no `adk` dependency) and dispatched to `adk.tracing` by `SpanTracer`. The mapping:

- `StreamTaskMessageStart(ToolRequestContent)` + `StreamTaskMessageDone` on that index -> tool span open (keyed by `tool_call_id`)
- `StreamTaskMessageFull(ToolResponseContent)` whose `tool_call_id` was opened -> tool span close
- `StreamTaskMessageFull(ToolRequestContent)` (harnesses that emit tool calls as Full) -> opens a tool span; matching `Full(ToolResponseContent)` closes it
- `StreamTaskMessageStart(ReasoningContent)` + `StreamTaskMessageDone` -> reasoning span

`SpanTracer` is `SpanDeriver`'s consumer. You can inject a custom `SpanTracer` via `UnifiedEmitter(tracer=<instance>)` for advanced use or testing.
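The mapping is a small state machine, sketched below as a pure function over local stand-in event types (`Start`, `Full`, `Done`, with string content kinds). The real `SpanDeriver` consumes `StreamTaskMessage*` objects and emits `OpenSpan`/`CloseSpan`, whose fields are not shown here, so the `("open" | "close", key)` tuples are a simplification. The sketch also assumes that the reasoning span opens at `Start` and closes at `Done`; the mapping only says the pair yields a reasoning span.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set, Tuple, Union


# Hypothetical stand-ins for StreamTaskMessageStart / Full / Done.
@dataclass
class Start:
    index: int
    kind: str  # "text" | "reasoning" | "tool_request"
    tool_call_id: Optional[str] = None


@dataclass
class Full:
    index: int
    kind: str  # "tool_request" | "tool_response" | ...
    tool_call_id: Optional[str] = None


@dataclass
class Done:
    index: int


Event = Union[Start, Full, Done]
Signal = Tuple[str, str]  # ("open" | "close", span key)


def derive_spans(events: Iterable[Event]) -> List[Signal]:
    """Pure function: turn a canonical stream into span open/close signals."""
    signals: List[Signal] = []
    pending: Dict[int, Start] = {}  # slot index -> Start still waiting for its Done
    open_tools: Set[Optional[str]] = set()  # tool_call_ids with an open span
    for ev in events:
        if isinstance(ev, Start) and ev.kind in ("tool_request", "reasoning"):
            pending[ev.index] = ev
            if ev.kind == "reasoning":
                signals.append(("open", f"reasoning:{ev.index}"))
        elif isinstance(ev, Done) and ev.index in pending:
            start = pending.pop(ev.index)
            if start.kind == "tool_request":
                # The tool span opens once the request is complete, keyed by tool_call_id.
                open_tools.add(start.tool_call_id)
                signals.append(("open", f"tool:{start.tool_call_id}"))
            else:
                signals.append(("close", f"reasoning:{ev.index}"))
        elif isinstance(ev, Full) and ev.kind == "tool_request":
            open_tools.add(ev.tool_call_id)
            signals.append(("open", f"tool:{ev.tool_call_id}"))
        elif isinstance(ev, Full) and ev.kind == "tool_response" and ev.tool_call_id in open_tools:
            open_tools.discard(ev.tool_call_id)
            signals.append(("close", f"tool:{ev.tool_call_id}"))
        # Anything else (text slots, unmatched responses) produces no span.
    return signals


stream = [
    Start(0, "reasoning"), Done(0),
    Start(1, "tool_request", tool_call_id="call_a"), Done(1),
    Full(2, "tool_response", tool_call_id="call_a"),
    Full(3, "tool_request", tool_call_id="call_b"),
    Full(4, "tool_response", tool_call_id="call_b"),
    Full(5, "tool_response", tool_call_id="call_x"),  # never opened: ignored
]
for signal in derive_spans(stream):
    print(signal)
```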

---

## Usage examples by channel

### Sync ACP (pydantic-ai tap)

```python
import agentex.lib.adk as adk
from agentex.lib.adk import UnifiedEmitter, convert_pydantic_ai_to_agentex_events

@acp.on_message_send
async def handle(params):
    task_id = params.task.id
    async with adk.tracing.span(trace_id=task_id, name="message", ...) as turn_span:
        emitter = UnifiedEmitter(
            task_id=task_id,
            trace_id=task_id,
            parent_span_id=turn_span.id if turn_span else None,
        )
        tap = convert_pydantic_ai_to_agentex_events(pydantic_stream)
        # wrap tap in a HarnessTurn then yield_turn, or yield directly:
        async for event in tap:
            yield event
```

For the pre-unified sync path the tap is still yielded directly; `UnifiedEmitter.yield_turn` is the forward-looking integration point when a `HarnessTurn` wrapper is available.

### Async Temporal (auto-send)

```python
from agentex.lib.adk import UnifiedEmitter

emitter = UnifiedEmitter(
    task_id=task_id,
    trace_id=task_id,
    parent_span_id=parent_span_id,
)
result = await emitter.auto_send_turn(turn, created_at=workflow.now())
# result.final_text — last text segment
# result.usage — TurnUsage (tokens, cost, ...)
```
2 changes: 1 addition & 1 deletion adk/pyproject.toml
@@ -4,7 +4,7 @@
# (agentex/{__init__.py, _*.py, types/, resources/}) ships from the slim
# sibling package `agentex-client` which is pinned as a runtime dep.
name = "agentex-sdk"
version = "0.13.2"
version = "0.14.0"
description = "Agent Development Kit (ADK) overlay for the Agentex API — FastACP server, Temporal workflows, LLM provider integrations, observability"
license = "Apache-2.0"
authors = [
12 changes: 12 additions & 0 deletions api.md
@@ -245,3 +245,15 @@ Methods:
- <code title="post /checkpoints/get-tuple">client.checkpoints.<a href="./src/agentex/resources/checkpoints.py">get_tuple</a>(\*\*<a href="src/agentex/types/checkpoint_get_tuple_params.py">params</a>) -> <a href="./src/agentex/types/checkpoint_get_tuple_response.py">Optional[CheckpointGetTupleResponse]</a></code>
- <code title="post /checkpoints/put">client.checkpoints.<a href="./src/agentex/resources/checkpoints.py">put</a>(\*\*<a href="src/agentex/types/checkpoint_put_params.py">params</a>) -> <a href="./src/agentex/types/checkpoint_put_response.py">CheckpointPutResponse</a></code>
- <code title="post /checkpoints/put-writes">client.checkpoints.<a href="./src/agentex/resources/checkpoints.py">put_writes</a>(\*\*<a href="src/agentex/types/checkpoint_put_writes_params.py">params</a>) -> None</code>

# Webhooks

Types:

```python
from agentex.types import WebhookCreateWebhookTriggerResponse
```

Methods:

- <code title="post /agent_api_keys/webhook-trigger">client.webhooks.<a href="./src/agentex/resources/webhooks.py">create_webhook_trigger</a>(\*\*<a href="src/agentex/types/webhook_create_webhook_trigger_params.py">params</a>) -> <a href="./src/agentex/types/webhook_create_webhook_trigger_response.py">WebhookCreateWebhookTriggerResponse</a></code>