Skip to content

Commit 91579ab

Browse files
declan-scaleclaude
andcommitted
docs: golden-agent adoption plan for the unified harness surface
Migration plan (implementation in agentex-agents/golden_agent): replace the bespoke HarnessEvent/AgentexStreamAdapter + per-provider parsers with the SDK taps + UnifiedEmitter, keeping all SGP-coupled orchestration (sandbox pool, secrets, MCP reauth, data-plane override). Phased: claude-code -> codex -> delete harness layer -> optional in-process OpenAI-Agents. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent 37421b6 commit 91579ab

1 file changed

Lines changed: 89 additions & 0 deletions

File tree

Lines changed: 89 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,89 @@
1+
# Golden Agent — Adopting the Unified Harness Surface
2+
3+
Date: 2026-06-22
4+
Status: Plan (implementation lands in `agentex-agents`, after the SDK stack merges + releases)
5+
SDK repo: `scale-agentex-python` · Agent repo: `agentex-agents` (`teams/sgp/agents/golden_agent`)
6+
7+
## Goal
8+
9+
Replace the golden agent's bespoke harness internals (its neutral `HarnessEvent` vocabulary, the `AgentexStreamAdapter`, and the per-provider stream parsers) with the now-first-class SDK unified surface — `convert_<harness>_to_agentex_events` taps + `<Harness>Turn` + `UnifiedEmitter` — while keeping every SGP-coupled orchestration concern (sandbox pool, sandbox setup, secret/MCP reauth, data-plane override) exactly where it is. Net effect: the golden agent stops maintaining its own parsing/streaming/tracing layer and consumes the SDK's, which now carries the AGX1-377/378 fixes and cross-channel conformance.
10+
11+
This is the payoff of the SDK harness-surface workstream (PRs: foundation #412, conformance #414, pydantic #415, openai #416, langgraph #417, claude-code #420, codex #421). The SDK *enables* the surface; this plan *consumes* it in production.
12+
13+
## Current golden-agent internals (what gets replaced vs kept)
14+
15+
Paths under `teams/sgp/agents/golden_agent/project/`.
16+
17+
| Area | File(s) | Disposition |
18+
|------|---------|-------------|
19+
| Neutral event vocabulary | `harness/events.py` (`HarnessEvent`) | **Delete** — superseded by the SDK canonical `StreamTaskMessage*` stream. |
20+
| Event→adk bridge | `harness/adapter.py` (`AgentexStreamAdapter`) | **Delete** — superseded by `UnifiedEmitter` (yield + auto_send) which drives `adk.streaming`/`adk.tracing` and derives spans. |
21+
| Provider protocol | `harness/protocol.py` (`HarnessProvider`) | **Delete or shrink** — providers no longer need to emit `HarnessEvent`; they produce the CLI's stdout stream and hand it to the SDK tap. |
22+
| claude-code parser | `harness/providers/claude.py` `_StreamJsonProcessor` | **Delete** — replaced by SDK `convert_claude_code_to_agentex_events` + `ClaudeCodeTurn`. Keep the sandbox/CLI-spawn parts of `ClaudeProvider`. |
23+
| codex parser | `harness/providers/codex.py` `_CodexEventProcessor` | **Delete** — replaced by SDK `convert_codex_to_agentex_events` + `CodexTurn`. Keep sandbox/CLI-spawn. |
24+
| Turn dispatch activity | `harness/activity.py` (`execute_agent_turn`) | **Simplify** — keep provider selection + heartbeat + metrics; replace the `provider→HarnessEvent→adapter` loop with `tap → <Harness>Turn → UnifiedEmitter.auto_send_turn`. |
25+
| In-process OpenAI-Agents harness | `harness/oai_mcp.py`, `oai_hooks.py`, `oai_streaming_model.py` | **Phase 2 (optional)** — could adopt the SDK openai tap; trickiest, deferred. |
26+
| Sandbox pool / setup / config / data-plane | `sandbox_pool.py`, `sandbox_setup.py`, `sandbox_config.py`, `sandbox_client_oai.py`, `pool_activities.py` | **Keep unchanged** (SGP-coupled; out of SDK scope). |
27+
| Secret / MCP reauth | `secrets.py`, `internal-packages/sgp_secrets_client` | **Keep unchanged** (SGP/identity-service-coupled). |
28+
| Capabilities / catalog / prompts / workflow / cron | `capabilities/*`, `prompts/*`, `workflow.py`, `cron.py`, `meta_activities.py` | **Keep unchanged.** |
29+
| Reconnect notices | `harness/notices.py` | **Keep** — independent of the harness stream (a standalone task message). |
30+
31+
## Target flow (per turn, inside the Temporal activity)
32+
33+
The golden agent runs under Temporal, so delivery uses the **auto_send** channel from inside the activity (the SDK `UnifiedEmitter.auto_send_turn`, which already runs correctly in an activity).
34+
35+
```
36+
execute_agent_turn (activity):
37+
1. Acquire/reconnect sandbox (pool), resolve secrets, render MCP config # KEEP — SGP-coupled
38+
2. Emit sandbox-setup steps as ToolRequestContent/ToolResponseContent # KEEP — now agentex content
39+
3. Spawn `claude -p --output-format stream-json` / `codex exec` in sandbox # KEEP — CLI spawn
40+
4. turn = ClaudeCodeTurn(chain(setup_events, sandbox.stdout_lines)) # NEW — SDK tap + Turn
41+
5. result = await UnifiedEmitter(task_id, trace_id, parent_span_id)\
42+
.auto_send_turn(turn, created_at=workflow.now()) # NEW — SDK delivery
43+
6. emit per-turn metrics from result.usage (TurnUsage) # KEEP — DogStatsD, now fed by TurnUsage
44+
```
45+
46+
### Sandbox-setup event interleaving
47+
Today the provider yields sandbox provisioning steps (reconnect / find / create / configure-git / clone) as `ToolStarted`/`ToolCompleted` `HarnessEvent`s that flow through the adapter so they appear in the UI + trace. Under the unified surface these become agent-produced `ToolRequestContent`/`ToolResponseContent` `StreamTaskMessage*` messages, **chained before** the harness tap's stream into one canonical stream for the turn (`chain(setup_events, convert_claude_code_to_agentex_events(stdout))`). `UnifiedEmitter` then delivers and traces the whole turn uniformly — setup steps keep showing in the UI and span tree.
48+
49+
### Determinism / timestamps
50+
Pass `created_at=workflow.now()` to `auto_send_turn` (AGX1-378) so the turn's messages carry deterministic Temporal timestamps, matching the prior dispenser behavior.
51+
52+
### Usage / metrics
53+
`auto_send_turn` returns a `TurnResult` with a normalized `TurnUsage`. The golden agent's per-turn DogStatsD metrics (`metrics.py`) read from `TurnUsage` instead of the old `TurnCompleted` event — one shape for traces + metrics.
54+
55+
## Phases
56+
57+
**Phase 0 — Prereqs (no golden-agent code yet)**
58+
- SDK stack merges to `next`/main and a version is released (or pin a pre-release).
59+
- Confirm the public import path (AGX1-375): import the surface from the public `adk.*` facade if available, else `agentex.lib.core.harness` / `adk._modules`.
60+
- Bump the golden agent's `scale-agentex` dependency to that version. Watch the `uv.lock` churn (commit it deliberately; see the team's lint-before-push rule).
61+
62+
**Phase 1 — claude-code provider**
63+
- In `ClaudeProvider`, keep sandbox acquisition, secret/MCP injection, and the `claude -p` spawn. Replace the `_StreamJsonProcessor` + `HarnessEvent` yielding with: produce the CLI's stdout as an async line iterator and wrap it in `ClaudeCodeTurn`.
64+
- In `execute_agent_turn`, replace the `AgentexStreamAdapter` loop with `UnifiedEmitter(...).auto_send_turn(turn, created_at=workflow.now())`; chain the sandbox-setup content before the tap stream.
65+
- Delete `_StreamJsonProcessor`.
66+
- Verify against the golden agent's existing turn tests + a live claude-code turn in a dev sandbox: UI streaming, tool spans, reasoning, usage/metrics all intact.
67+
68+
**Phase 2 — codex provider** — same as Phase 1 with `convert_codex_to_agentex_events` + `CodexTurn`; delete `_CodexEventProcessor`.
69+
70+
**Phase 3 — retire the bespoke harness layer** — delete `harness/events.py`, `harness/adapter.py`, and shrink/delete `harness/protocol.py`; simplify `harness/activity.py` to the tap→Turn→emitter shape. Confirm no remaining imports of the deleted symbols.
71+
72+
**Phase 4 (optional, later) — in-process OpenAI-Agents (litellm) harness** — adopt the SDK openai tap (`OpenAITurn` / `run_agent_streamed_auto_send`) in place of `oai_streaming_model.py`/`oai_hooks.py`. This path runs the OpenAI Agents SDK in-process inside the workflow and is the most coupled to Temporal context; treat as a separate, carefully-scoped follow-up.
73+
74+
## Testing
75+
- The SDK's cross-channel conformance (#414) + per-harness fixtures already prove the taps produce correct, channel-equivalent streams + spans + usage. The golden agent inherits that confidence by consuming them.
76+
- Golden-agent side: keep its existing turn/integration tests; add a dev-sandbox live smoke per provider (claude-code, codex) asserting streamed text + tool request/response + reasoning + a well-formed span tree + non-zero `TurnUsage`.
77+
- Regression watch: UI message shapes (text/reasoning/tool), span nesting, and per-turn metrics must match pre-migration behavior.
78+
79+
## What stays SGP-side permanently (never moves to the SDK)
80+
Per the original layering analysis: the sandbox pool + acquire modes, sandbox setup orchestration, `_override_data_plane` (ARP in-cluster routing), the sgp-secrets client, and user-scoped secret resolution / OAuth MCP reauth. The reauth refresh ideally migrates **identity-service-side** over time (so the agent becomes a dumb consumer of live tokens); the SDK only needs a generic "credential expired → emit reconnect notice" hook, not the sgp-secrets contract.
81+
82+
## Risks / watch-items
83+
- **Wire-shape parity:** the SDK `auto_send` delivers `Full` tool messages as open+close and streamed tool requests as `Start+Delta+Done` (both deliver equivalent content; AGX1-377 fixed the previously-dropped streamed shape). Validate the UI renders tool request/response identically to today.
84+
- **Reasoning/thinking mapping:** confirm claude-code `thinking` blocks and codex reasoning map to `ReasoningContent` the same way the old adapter did (the SDK taps were ported from these exact processors, so parity is expected — verify in a live turn).
85+
- **Heartbeat/timeout:** keep `execute_agent_turn`'s heartbeat pulse around the (now SDK-driven) consumption loop; long CLI turns must still heartbeat.
86+
- **uv.lock / dependency bump:** pin the SDK version explicitly; deliberate lock commit.
87+
88+
## Sequencing summary
89+
SDK PRs merge + release → bump golden agent dependency (Phase 0) → claude-code (Phase 1) → codex (Phase 2) → delete bespoke harness layer (Phase 3) → optional in-process OpenAI-Agents adoption (Phase 4). Each phase is independently shippable and reversible (the deleted code is recoverable from history until Phase 3 lands).

0 commit comments

Comments
 (0)