feat(openai-agents): migrate onto the unified harness surface by declan-scale · Pull Request #416 · scaleapi/scale-agentex-python

declan-scale · 2026-06-18T20:19:04Z

What

PR 6 of the unified-harness-surface series: migrate the OpenAI Agents SDK integration onto the shared harness surface.

Library

OpenAITurn (src/agentex/lib/adk/providers/_modules/openai_turn.py): a HarnessTurn adapter that wraps a Runner.run_streamed result. It converts the SDK's native events into the canonical StreamTaskMessage* stream via the existing convert_openai_to_agentex_events, and after the stream is exhausted reads result.raw_responses to aggregate per-response usage into a provider-independent TurnUsage.
- openai_usage_to_turn_usage(usage, model) maps agents.Usage -> TurnUsage with defensive getattr access so present-but-zero values (e.g. 0 output tokens on a cache hit) survive as 0, not None.
- _aggregate_usage(raw_responses) sums usage across ModelResponses via Usage.add, skipping responses without usage.
- Accepts either result= (a streamed run) or stream= (a pre-built canonical stream, for tests); raises ValueError if neither. coalesce_tool_requests is a no-op kept for API parity.
OpenAIService.run_agent_streamed_auto_send: replaced the ~270-line inline streaming/reasoning/span loop with UnifiedEmitter.auto_send_turn(OpenAITurn(result=result, model=model)). Guardrail tripwire handling and the RunResultStreaming return type are preserved. The created_at first-message ordering limitation under the unified path is documented in a comment. OpenAITurn is imported lazily inside the method to avoid a circular import at package init.
SyncStreamingModel / SyncStreamingProvider: docstring-deprecated (no runtime warning), pointing at the harness pattern.

Tests

tests/lib/adk/providers/test_openai_turn.py: usage mapping (full / None / real zeros), _aggregate_usage (empty / single / multiple), events driven by an injected canonical stream, usage() before/after exhaustion (including the result-backed path), and the ValueError guard.
tests/lib/core/harness/conformance/test_openai_conformance.py: text-only, tool-call, reasoning, and multi-step canonical fixtures; registers module-locally and parametrizes over its own list to avoid the cross-module global-registry hazard.
tests/lib/adk/providers/test_openai_activities.py: updated the streamed-auto-send activity test to the new contract (full tool messages are posted by opening a context with initial_content and closing it, no stream_update).

Tutorials

Three tutorials demonstrating the same OpenAITurn across delivery modes, each with an offline test (no server / Redis / Temporal / API key required):

examples/tutorials/00_sync/060_harness_openai — UnifiedEmitter.yield_turn
examples/tutorials/10_async/00_base/130_harness_openai — UnifiedEmitter.auto_send_turn
examples/tutorials/10_async/10_temporal/140_harness_openai — auto_send_turn inside a custom Temporal activity

Verification

./scripts/lint — clean (ruff + pyright, 0 errors)
Full tests/ suite — 1016 passed, 1376 skipped
All three tutorial offline tests pass individually

🤖 Generated with Claude Code

Greptile Summary

This PR migrates the OpenAI Agents SDK integration onto the shared harness surface by introducing OpenAITurn (a HarnessTurn adapter) and replacing the ~270-line inline streaming loop in run_agent_streamed_auto_send with UnifiedEmitter.auto_send_turn. It also adds three tutorial projects, a new unit-test module for OpenAITurn, and cross-channel conformance fixtures.

OpenAITurn wraps RunResultStreaming, converts native SDK events to canonical StreamTaskMessage* via convert_openai_to_agentex_events, and aggregates raw_responses usage after stream exhaustion; UnifiedEmitter.auto_send_turn correctly reads turn.usage() only after consuming the stream, fixing the previously reported stale-usage bug.
run_agent_streamed_auto_send now uses the 4-branch Runner.run_streamed pattern to restore previous_response_id forwarding, and auto_send_turn receives created_at directly (stamping all streaming contexts rather than only the first).
Tutorials (060_harness_openai, 130_harness_openai, 140_harness_openai) demonstrate sync, async, and Temporal delivery modes; the Temporal tutorial now correctly accumulates multi-turn conversation history via input_list + result.to_input_list().

Confidence Score: 5/5

Safe to merge; the two observations are edge-case behavioral changes that do not affect the core delivery path for normal (non-guardrail) turns.

The migration correctly fixes both previously reported defects (stale usage on auto_send and silent previous_response_id drop). The new OpenAITurn adapter, conformance tests, and updated activity tests are solid. The two flagged items are narrow: the Temporal heartbeat concern only materialises when heartbeat_timeout is explicitly configured AND streaming exceeds it; the created_at dispenser change only affects output-guardrail rejection message ordering, an uncommon path.

src/agentex/lib/core/services/adk/providers/openai.py — the heartbeat and created_at dispenser changes are worth a follow-up if Temporal heartbeat_timeout is configured in production for this activity.

Important Files Changed

Filename	Overview
src/agentex/lib/adk/providers/_modules/openai_turn.py	New HarnessTurn adapter wrapping RunResultStreaming; usage aggregation is correct (populated after stream exhaustion via _iter_events) and defensive getattr access is well-handled.
src/agentex/lib/core/services/adk/providers/openai.py	run_agent_streamed_auto_send migrated to UnifiedEmitter; previous_response_id forwarding restored (4-branch pattern), but per-event heartbeating from the old inline loop is gone and the created_at dispenser semantics changed for output guardrail rejection messages.
src/agentex/lib/adk/providers/_modules/sync_provider.py	Docstring deprecations added to SyncStreamingModel/SyncStreamingProvider; remaining changes are whitespace/quote normalization, no behavioral changes.
tests/lib/adk/providers/test_openai_turn.py	Comprehensive unit tests for OpenAITurn; covers usage mapping, aggregation, stream passthrough, and the ValueError guard. The result-backed usage test correctly monkeypatches the converter.
tests/lib/core/harness/conformance/test_openai_conformance.py	Cross-channel conformance fixtures (text, tool-call, reasoning, multi-step) parametrized over a module-local list to avoid global-registry hazards; safe and well-structured.
tests/lib/adk/providers/test_openai_activities.py	Activity tests updated to the unified harness contract; new tests for previous_response_id forwarding and created_at propagation are solid coverage additions.
examples/tutorials/10_async/10_temporal/140_harness_openai/project/activities.py	Multi-turn history is now correctly threaded through input_list; runner.to_input_list() is called after auto_send_turn exhausts the stream, which is the correct ordering.
examples/tutorials/10_async/10_temporal/140_harness_openai/project/workflow.py	Workflow accumulates self._messages and passes it to the activity on each turn; multi-turn memory is properly maintained across signals.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant W as Workflow / ACP
    participant S as OpenAIService
    participant R as Runner.run_streamed
    participant OT as OpenAITurn
    participant UE as UnifiedEmitter
    participant AS as auto_send
    participant ST as StreamingService

    W->>S: run_agent_streamed_auto_send(input, created_at)
    S->>R: run_streamed(agent, input, [max_turns], [prev_resp_id])
    R-->>S: RunResultStreaming
    S->>OT: OpenAITurn(result, model)
    S->>UE: UnifiedEmitter(task_id, tracer, streaming)
    S->>UE: auto_send_turn(turn, created_at)
    UE->>AS: "auto_send(turn.events, created_at=created_at)"
    loop For each canonical event
        AS->>OT: _iter_events() → convert_openai_to_agentex_events
        OT-->>AS: "StreamTaskMessage* (Start / Delta / Done / Full)"
        AS->>ST: "streaming_task_message_context(created_at=created_at)"
        AS->>ST: stream_update / close
    end
    Note over OT: After last event: aggregate raw_responses → TurnUsage
    AS-->>UE: "TurnResult(final_text, usage=empty_default)"
    UE->>OT: turn.usage()
    OT-->>UE: TurnUsage (populated)
    UE-->>S: TurnResult (with usage)
    S-->>W: RunResultStreaming

%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant W as Workflow / ACP
    participant S as OpenAIService
    participant R as Runner.run_streamed
    participant OT as OpenAITurn
    participant UE as UnifiedEmitter
    participant AS as auto_send
    participant ST as StreamingService

    W->>S: run_agent_streamed_auto_send(input, created_at)
    S->>R: run_streamed(agent, input, [max_turns], [prev_resp_id])
    R-->>S: RunResultStreaming
    S->>OT: OpenAITurn(result, model)
    S->>UE: UnifiedEmitter(task_id, tracer, streaming)
    S->>UE: auto_send_turn(turn, created_at)
    UE->>AS: "auto_send(turn.events, created_at=created_at)"
    loop For each canonical event
        AS->>OT: _iter_events() → convert_openai_to_agentex_events
        OT-->>AS: "StreamTaskMessage* (Start / Delta / Done / Full)"
        AS->>ST: "streaming_task_message_context(created_at=created_at)"
        AS->>ST: stream_update / close
    end
    Note over OT: After last event: aggregate raw_responses → TurnUsage
    AS-->>UE: "TurnResult(final_text, usage=empty_default)"
    UE->>OT: turn.usage()
    OT-->>UE: TurnUsage (populated)
    UE-->>S: TurnResult (with usage)
    S-->>W: RunResultStreaming

Comments Outside Diff (2)

src/agentex/lib/adk/providers/_modules/sync_provider.py, line 564-572 (link)

Reasoning spans are missed

The converter starts OpenAI reasoning output as TextContent, but the shared span derivation opens reasoning spans only when the start content has type reasoning. Real OpenAI reasoning streams therefore flow through as text starts, so the unified harness never derives the reasoning span that the new conformance fixture expects.

Prompt To Fix With AI

This is a comment left during a code review.
Path: src/agentex/lib/adk/providers/_modules/sync_provider.py
Line: 564-572

Comment:
**Reasoning spans are missed**

The converter starts OpenAI reasoning output as `TextContent`, but the shared span derivation opens reasoning spans only when the start content has type `reasoning`. Real OpenAI reasoning streams therefore flow through as text starts, so the unified harness never derives the reasoning span that the new conformance fixture expects.

How can I resolve this? If you propose a fix, please make it concise.

src/agentex/lib/core/services/adk/providers/openai.py, line 794 (link)

run_agent_streamed_auto_send silently drops previous_response_id
- Bug
  - previous_response_id is accepted as a parameter on line 681 but is never forwarded to Runner.run_streamed on lines 794-797. The migrated method uses only a 2-branch if/else (max_turns or not), while all three sibling methods use a 4-branch matrix that correctly forwards previous_response_id.
- Cause
  - During the migration to the unified harness (OpenAITurn + UnifiedEmitter), the Runner.run_streamed call was simplified to 2 branches, dropping the previous_response_id forwarding. The # noqa: ARG002 annotation on line 681 suppressed the linter warning that would have caught the unused argument.
- Fix
  - Replace the 2-branch if/else at lines 794-797 with the same 4-branch pattern used by run_agent_streamed (lines 632-646), forwarding previous_response_id to Runner.run_streamed when it is not None.
Artifacts

Supporting artifact from the T-Rex run
- Contains supporting evidence from the run (text/markdown; charset=utf-8).
_{Ran code and verified through T-Rex}

_{Reviews (11): Last reviewed commit: "fix(openai): forward previous_response_i..." | Re-trigger Greptile}

declan-scale · 2026-06-18T21:19:54Z

@greptile review

Add OpenAITurn, a HarnessTurn adapter that wraps an OpenAI Agents SDK streamed run (Runner.run_streamed) and converts its native events into the canonical StreamTaskMessage* stream via convert_openai_to_agentex_events, aggregating per-response usage into a provider-independent TurnUsage after stream exhaustion. Defensive getattr access preserves real zeros. Refactor OpenAIService.run_agent_streamed_auto_send to drive delivery, tracing, and usage through UnifiedEmitter.auto_send_turn(OpenAITurn(...)), replacing the ~270-line inline streaming loop. Guardrail tripwire handling and the RunResultStreaming return type are preserved; the created_at first-message ordering limitation under the unified path is documented. Docstring-deprecate SyncStreamingModel/SyncStreamingProvider (no runtime warning). Add unit tests for OpenAITurn + usage mapping, OpenAI conformance fixtures (module-local registry), update the streamed-auto-send activity test to the new full-message contract, and add three tutorials (sync 060, async 130, temporal 140) demonstrating OpenAITurn with yield_turn / auto_send_turn, each with an offline test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…378) Thread the workflow-supplied created_at through UnifiedEmitter.auto_send_turn(turn, created_at=created_at) so the first agent message of the turn is stamped with the deterministic timestamp (e.g. workflow.now()) just as the original inline loop did before the unified-harness migration. The foundation (b4b8b33) wired auto_send_turn to accept and forward created_at to every streaming_task_message_context call. This commit connects the call site in run_agent_streamed_auto_send to that new parameter, restoring the behaviour that the migration comment documented as a known trade-off. Update the stale limitation comment to reflect the fix. Add test_run_agent_streamed_auto_send_forwards_created_at, which drives the activity through a fake stream with a pinned datetime and asserts every streaming context receives that datetime. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ation Replace the old determinism-only test (derive_all) with the full cross-channel assertion pattern: register fixtures with per-module _OPENAI_FIXTURES, call run_cross_channel_conformance, and assert logical-delivery and span-signal equivalence across yield_events and auto_send — matching the pattern in test_conformance.py. Swap ReasoningSummaryDelta for ReasoningContentDelta so the runner's payload accumulator recognises the delta type and the payload comparison exercises the reasoning seeding path. Remove derive_all import. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… [greptile] The activity created a fresh agent each turn and passed only the latest user message to Runner.run_streamed, so the model had no memory of prior turns. Thread the running conversation through the workflow instance (self._messages): pass the prior input_list into the activity, build [*history, user_message] for the run, and return result.to_input_list() so the next turn continues the conversation. The activity now returns RunHarnessAgentResult (final_text + input_list); the workflow deserializes it via result_type. Note: the separate 06-22 "usage always empty in the auto_send path" comment is resolved by the foundation — UnifiedEmitter.auto_send_turn now reads turn.usage() AFTER auto_send drains the stream (no eager capture). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

test_run_agent_streamed_auto_send_forwards_created_at fed an empty stream, so auto_send opened zero streaming contexts and `all(ts == deterministic_ts for ts in recorded_created_ats)` was vacuously true — it could not catch a created_at regression. Emit a tool call + tool response so contexts are actually opened, and assert recorded_created_ats is non-empty before checking each value. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…send [greptile] The unified-surface migration of run_agent_streamed_auto_send dropped previous_response_id: it was accepted (suppressed by noqa: ARG002) but never passed to Runner.run_streamed, so any caller continuing a Responses-API conversation silently started a fresh one. Mirror the non-auto-send run_agent_streamed branching (max_turns x previous_response_id) and drop the now-incorrect noqa. The activity layer already forwarded params.previous_response_id. Adds a test asserting the id reaches Runner.run_streamed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

greptile-apps Bot reviewed Jun 18, 2026

View reviewed changes

Comment thread examples/tutorials/10_async/10_temporal/140_harness_openai/project/activities.py Outdated

declan-scale force-pushed the declan-scale/pr6-openai branch from d1c5c65 to ab92b50 Compare June 18, 2026 21:09

declan-scale force-pushed the declan-scale/unified-harness-surface branch from b4b8b33 to da780a1 Compare June 22, 2026 13:48

declan-scale force-pushed the declan-scale/pr6-openai branch from ab92b50 to cbc2a9b Compare June 22, 2026 13:53

declan-scale changed the base branch from declan-scale/unified-harness-surface to declan-scale/agx1-373-conformance-equivalence June 22, 2026 13:53

greptile-apps Bot reviewed Jun 22, 2026

View reviewed changes

Comment thread src/agentex/lib/adk/providers/_modules/openai_turn.py

declan-scale force-pushed the declan-scale/agx1-373-conformance-equivalence branch from 37421b6 to df3461c Compare June 22, 2026 14:13

declan-scale force-pushed the declan-scale/pr6-openai branch 2 times, most recently from dcc0b33 to e3c14a8 Compare June 22, 2026 14:37

declan-scale force-pushed the declan-scale/agx1-373-conformance-equivalence branch from ccbd5cf to e3fa1cc Compare June 22, 2026 15:14

declan-scale force-pushed the declan-scale/pr6-openai branch 2 times, most recently from d2f4389 to 045b29e Compare June 22, 2026 15:53

danielmillerp approved these changes Jun 22, 2026

View reviewed changes

declan-scale force-pushed the declan-scale/agx1-373-conformance-equivalence branch from c8c63d1 to 05120f3 Compare June 22, 2026 18:47

declan-scale force-pushed the declan-scale/pr6-openai branch from d01bc67 to 151aef7 Compare June 22, 2026 18:47

declan-scale force-pushed the declan-scale/agx1-373-conformance-equivalence branch from 05120f3 to c9a907c Compare June 22, 2026 19:54

declan-scale force-pushed the declan-scale/pr6-openai branch from 151aef7 to 54893b6 Compare June 22, 2026 19:54

declan-scale force-pushed the declan-scale/agx1-373-conformance-equivalence branch from c9a907c to a04bf5e Compare June 22, 2026 20:01

Base automatically changed from declan-scale/agx1-373-conformance-equivalence to next June 22, 2026 20:09

declan-scale and others added 6 commits June 22, 2026 16:10

declan-scale force-pushed the declan-scale/pr6-openai branch from 54893b6 to 9b3ec57 Compare June 22, 2026 20:11

declan-scale merged commit d10e151 into next Jun 22, 2026
48 checks passed

declan-scale deleted the declan-scale/pr6-openai branch June 22, 2026 22:21

stainless-app Bot mentioned this pull request Jun 22, 2026

chore: release main #424

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(openai-agents): migrate onto the unified harness surface#416

feat(openai-agents): migrate onto the unified harness surface#416
declan-scale merged 6 commits into
nextfrom
declan-scale/pr6-openai

declan-scale commented Jun 18, 2026 •

edited by greptile-apps Bot

Loading

Uh oh!

Uh oh!

declan-scale commented Jun 18, 2026

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

declan-scale commented Jun 18, 2026 • edited by greptile-apps Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Library

Tests

Tutorials

Verification

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Comments Outside Diff (2)

Uh oh!

Uh oh!

declan-scale commented Jun 18, 2026

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

declan-scale commented Jun 18, 2026 •

edited by greptile-apps Bot

Loading