feat(openai-agents): migrate onto the unified harness surface#416
Merged
Conversation
d1c5c65 to
ab92b50
Compare
Contributor
Author
|
@greptile review |
b4b8b33 to
da780a1
Compare
ab92b50 to
cbc2a9b
Compare
37421b6 to
df3461c
Compare
dcc0b33 to
e3c14a8
Compare
ccbd5cf to
e3fa1cc
Compare
d2f4389 to
045b29e
Compare
danielmillerp
approved these changes
Jun 22, 2026
c8c63d1 to
05120f3
Compare
d01bc67 to
151aef7
Compare
05120f3 to
c9a907c
Compare
151aef7 to
54893b6
Compare
c9a907c to
a04bf5e
Compare
Base automatically changed from
declan-scale/agx1-373-conformance-equivalence
to
next
June 22, 2026 20:09
Add OpenAITurn, a HarnessTurn adapter that wraps an OpenAI Agents SDK streamed run (Runner.run_streamed) and converts its native events into the canonical StreamTaskMessage* stream via convert_openai_to_agentex_events, aggregating per-response usage into a provider-independent TurnUsage after stream exhaustion. Defensive getattr access preserves real zeros. Refactor OpenAIService.run_agent_streamed_auto_send to drive delivery, tracing, and usage through UnifiedEmitter.auto_send_turn(OpenAITurn(...)), replacing the ~270-line inline streaming loop. Guardrail tripwire handling and the RunResultStreaming return type are preserved; the created_at first-message ordering limitation under the unified path is documented. Docstring-deprecate SyncStreamingModel/SyncStreamingProvider (no runtime warning). Add unit tests for OpenAITurn + usage mapping, OpenAI conformance fixtures (module-local registry), update the streamed-auto-send activity test to the new full-message contract, and add three tutorials (sync 060, async 130, temporal 140) demonstrating OpenAITurn with yield_turn / auto_send_turn, each with an offline test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…378) Thread the workflow-supplied created_at through UnifiedEmitter.auto_send_turn(turn, created_at=created_at) so the first agent message of the turn is stamped with the deterministic timestamp (e.g. workflow.now()) just as the original inline loop did before the unified-harness migration. The foundation (b4b8b33) wired auto_send_turn to accept and forward created_at to every streaming_task_message_context call. This commit connects the call site in run_agent_streamed_auto_send to that new parameter, restoring the behaviour that the migration comment documented as a known trade-off. Update the stale limitation comment to reflect the fix. Add test_run_agent_streamed_auto_send_forwards_created_at, which drives the activity through a fake stream with a pinned datetime and asserts every streaming context receives that datetime. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ation Replace the old determinism-only test (derive_all) with the full cross-channel assertion pattern: register fixtures with per-module _OPENAI_FIXTURES, call run_cross_channel_conformance, and assert logical-delivery and span-signal equivalence across yield_events and auto_send — matching the pattern in test_conformance.py. Swap ReasoningSummaryDelta for ReasoningContentDelta so the runner's payload accumulator recognises the delta type and the payload comparison exercises the reasoning seeding path. Remove derive_all import. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… [greptile] The activity created a fresh agent each turn and passed only the latest user message to Runner.run_streamed, so the model had no memory of prior turns. Thread the running conversation through the workflow instance (self._messages): pass the prior input_list into the activity, build [*history, user_message] for the run, and return result.to_input_list() so the next turn continues the conversation. The activity now returns RunHarnessAgentResult (final_text + input_list); the workflow deserializes it via result_type. Note: the separate 06-22 "usage always empty in the auto_send path" comment is resolved by the foundation — UnifiedEmitter.auto_send_turn now reads turn.usage() AFTER auto_send drains the stream (no eager capture). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
test_run_agent_streamed_auto_send_forwards_created_at fed an empty stream, so auto_send opened zero streaming contexts and `all(ts == deterministic_ts for ts in recorded_created_ats)` was vacuously true — it could not catch a created_at regression. Emit a tool call + tool response so contexts are actually opened, and assert recorded_created_ats is non-empty before checking each value. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…send [greptile] The unified-surface migration of run_agent_streamed_auto_send dropped previous_response_id: it was accepted (suppressed by noqa: ARG002) but never passed to Runner.run_streamed, so any caller continuing a Responses-API conversation silently started a fresh one. Mirror the non-auto-send run_agent_streamed branching (max_turns x previous_response_id) and drop the now-incorrect noqa. The activity layer already forwarded params.previous_response_id. Adds a test asserting the id reaches Runner.run_streamed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
54893b6 to
9b3ec57
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
PR 6 of the unified-harness-surface series: migrate the OpenAI Agents SDK integration onto the shared harness surface.
Library
OpenAITurn(src/agentex/lib/adk/providers/_modules/openai_turn.py): aHarnessTurnadapter that wraps aRunner.run_streamedresult. It converts the SDK's native events into the canonicalStreamTaskMessage*stream via the existingconvert_openai_to_agentex_events, and after the stream is exhausted readsresult.raw_responsesto aggregate per-response usage into a provider-independentTurnUsage.openai_usage_to_turn_usage(usage, model)mapsagents.Usage->TurnUsagewith defensivegetattraccess so present-but-zero values (e.g. 0 output tokens on a cache hit) survive as0, notNone._aggregate_usage(raw_responses)sums usage acrossModelResponses viaUsage.add, skipping responses without usage.result=(a streamed run) orstream=(a pre-built canonical stream, for tests); raisesValueErrorif neither.coalesce_tool_requestsis a no-op kept for API parity.OpenAIService.run_agent_streamed_auto_send: replaced the ~270-line inline streaming/reasoning/span loop withUnifiedEmitter.auto_send_turn(OpenAITurn(result=result, model=model)). Guardrail tripwire handling and theRunResultStreamingreturn type are preserved. Thecreated_atfirst-message ordering limitation under the unified path is documented in a comment.OpenAITurnis imported lazily inside the method to avoid a circular import at package init.SyncStreamingModel/SyncStreamingProvider: docstring-deprecated (no runtime warning), pointing at the harness pattern.Tests
tests/lib/adk/providers/test_openai_turn.py: usage mapping (full / None / real zeros),_aggregate_usage(empty / single / multiple),eventsdriven by an injected canonical stream,usage()before/after exhaustion (including the result-backed path), and theValueErrorguard.tests/lib/core/harness/conformance/test_openai_conformance.py: text-only, tool-call, reasoning, and multi-step canonical fixtures; registers module-locally and parametrizes over its own list to avoid the cross-module global-registry hazard.tests/lib/adk/providers/test_openai_activities.py: updated the streamed-auto-send activity test to the new contract (full tool messages are posted by opening a context withinitial_contentand closing it, nostream_update).Tutorials
Three tutorials demonstrating the same
OpenAITurnacross delivery modes, each with an offline test (no server / Redis / Temporal / API key required):examples/tutorials/00_sync/060_harness_openai—UnifiedEmitter.yield_turnexamples/tutorials/10_async/00_base/130_harness_openai—UnifiedEmitter.auto_send_turnexamples/tutorials/10_async/10_temporal/140_harness_openai—auto_send_turninside a custom Temporal activityVerification
./scripts/lint— clean (ruff + pyright, 0 errors)tests/suite — 1016 passed, 1376 skipped🤖 Generated with Claude Code
Greptile Summary
This PR migrates the OpenAI Agents SDK integration onto the shared harness surface by introducing
OpenAITurn(aHarnessTurnadapter) and replacing the ~270-line inline streaming loop inrun_agent_streamed_auto_sendwithUnifiedEmitter.auto_send_turn. It also adds three tutorial projects, a new unit-test module forOpenAITurn, and cross-channel conformance fixtures.OpenAITurnwrapsRunResultStreaming, converts native SDK events to canonicalStreamTaskMessage*viaconvert_openai_to_agentex_events, and aggregatesraw_responsesusage after stream exhaustion;UnifiedEmitter.auto_send_turncorrectly readsturn.usage()only after consuming the stream, fixing the previously reported stale-usage bug.run_agent_streamed_auto_sendnow uses the 4-branchRunner.run_streamedpattern to restoreprevious_response_idforwarding, andauto_send_turnreceivescreated_atdirectly (stamping all streaming contexts rather than only the first).060_harness_openai,130_harness_openai,140_harness_openai) demonstrate sync, async, and Temporal delivery modes; the Temporal tutorial now correctly accumulates multi-turn conversation history viainput_list+result.to_input_list().Confidence Score: 5/5
Safe to merge; the two observations are edge-case behavioral changes that do not affect the core delivery path for normal (non-guardrail) turns.
The migration correctly fixes both previously reported defects (stale usage on auto_send and silent previous_response_id drop). The new OpenAITurn adapter, conformance tests, and updated activity tests are solid. The two flagged items are narrow: the Temporal heartbeat concern only materialises when heartbeat_timeout is explicitly configured AND streaming exceeds it; the created_at dispenser change only affects output-guardrail rejection message ordering, an uncommon path.
src/agentex/lib/core/services/adk/providers/openai.py — the heartbeat and created_at dispenser changes are worth a follow-up if Temporal heartbeat_timeout is configured in production for this activity.
Important Files Changed
Sequence Diagram
%%{init: {'theme': 'neutral'}}%% sequenceDiagram participant W as Workflow / ACP participant S as OpenAIService participant R as Runner.run_streamed participant OT as OpenAITurn participant UE as UnifiedEmitter participant AS as auto_send participant ST as StreamingService W->>S: run_agent_streamed_auto_send(input, created_at) S->>R: run_streamed(agent, input, [max_turns], [prev_resp_id]) R-->>S: RunResultStreaming S->>OT: OpenAITurn(result, model) S->>UE: UnifiedEmitter(task_id, tracer, streaming) S->>UE: auto_send_turn(turn, created_at) UE->>AS: "auto_send(turn.events, created_at=created_at)" loop For each canonical event AS->>OT: _iter_events() → convert_openai_to_agentex_events OT-->>AS: "StreamTaskMessage* (Start / Delta / Done / Full)" AS->>ST: "streaming_task_message_context(created_at=created_at)" AS->>ST: stream_update / close end Note over OT: After last event: aggregate raw_responses → TurnUsage AS-->>UE: "TurnResult(final_text, usage=empty_default)" UE->>OT: turn.usage() OT-->>UE: TurnUsage (populated) UE-->>S: TurnResult (with usage) S-->>W: RunResultStreaming%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%% sequenceDiagram participant W as Workflow / ACP participant S as OpenAIService participant R as Runner.run_streamed participant OT as OpenAITurn participant UE as UnifiedEmitter participant AS as auto_send participant ST as StreamingService W->>S: run_agent_streamed_auto_send(input, created_at) S->>R: run_streamed(agent, input, [max_turns], [prev_resp_id]) R-->>S: RunResultStreaming S->>OT: OpenAITurn(result, model) S->>UE: UnifiedEmitter(task_id, tracer, streaming) S->>UE: auto_send_turn(turn, created_at) UE->>AS: "auto_send(turn.events, created_at=created_at)" loop For each canonical event AS->>OT: _iter_events() → convert_openai_to_agentex_events OT-->>AS: "StreamTaskMessage* (Start / Delta / Done / Full)" AS->>ST: "streaming_task_message_context(created_at=created_at)" AS->>ST: stream_update / close end Note over OT: After last event: aggregate raw_responses → TurnUsage AS-->>UE: "TurnResult(final_text, usage=empty_default)" UE->>OT: turn.usage() OT-->>UE: TurnUsage (populated) UE-->>S: TurnResult (with usage) S-->>W: RunResultStreamingComments Outside Diff (2)
src/agentex/lib/adk/providers/_modules/sync_provider.py, line 564-572 (link)The converter starts OpenAI reasoning output as
TextContent, but the shared span derivation opens reasoning spans only when the start content has typereasoning. Real OpenAI reasoning streams therefore flow through as text starts, so the unified harness never derives the reasoning span that the new conformance fixture expects.Prompt To Fix With AI
src/agentex/lib/core/services/adk/providers/openai.py, line 794 (link)previous_response_idis accepted as a parameter on line 681 but is never forwarded toRunner.run_streamedon lines 794-797. The migrated method uses only a 2-branch if/else (max_turns or not), while all three sibling methods use a 4-branch matrix that correctly forwardsprevious_response_id.Runner.run_streamedcall was simplified to 2 branches, dropping theprevious_response_idforwarding. The# noqa: ARG002annotation on line 681 suppressed the linter warning that would have caught the unused argument.run_agent_streamed(lines 632-646), forwardingprevious_response_idtoRunner.run_streamedwhen it is not None.Artifacts
Supporting artifact from the T-Rex run
Reviews (11): Last reviewed commit: "fix(openai): forward previous_response_i..." | Re-trigger Greptile