fix(harness): correct codex & openai reasoning stream envelopes by declan-scale · Pull Request #441 · scaleapi/scale-agentex-python

declan-scale · 2026-06-23T21:45:38Z

What

Fixes two reasoning-stream issues found while reviewing the unified-harness release (#424). Both are confined to the streaming envelope (the live StreamTaskMessage* sequence). Neither changes the persisted message type, but issue 1 (Codex) causes a duplicate message to be persisted.

1. Codex: duplicate / orphaned reasoning message (🔴 user-visible)

_codex_sync.py emitted Start(ReasoningContent, active) then Full(ReasoningContent, static) at the open index, with no Done. auto_send handles Full by opening its own throwaway streaming context (it ignores event.index), so the original Start context was never closed until end-of-turn teardown — persisting a second, near-empty reasoning message with a later timestamp (duplicate + out-of-order).

Fix: mirror the working _claude_code_sync.py pattern — stream the final reasoning as ReasoningSummaryDelta + ReasoningContentDelta on the open index, then close with Done. The open context accumulates the final ReasoningContent and closes cleanly as one message. The no-started case opens the Start lazily and closes it the same way; the empty-reasoning case still closes with a bare Done.
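To make the failure mode concrete, here is a toy model of the consumer side. It is only a sketch: `Start`, `Delta`, `Full`, `Done`, and `persist` are hypothetical stand-ins rather than the agentex classes or the real auto_send, and the summary and content deltas are collapsed into a single `Delta` kind. It assumes, as described above, that a Full is persisted through its own context regardless of index and that end-of-turn teardown persists whatever is still open.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the stream events; not the agentex classes.
@dataclass
class Start:
    index: int


@dataclass
class Delta:
    index: int
    text: str


@dataclass
class Full:
    index: int
    text: str


@dataclass
class Done:
    index: int


def persist(events):
    """Toy consumer: return persisted message texts in persistence order."""
    open_contexts = {}  # index -> text accumulated so far
    persisted = []
    for event in events:
        if isinstance(event, Start):
            open_contexts[event.index] = ""
        elif isinstance(event, Delta):
            open_contexts[event.index] += event.text
        elif isinstance(event, Full):
            # Assumed behaviour: Full uses a throwaway context and ignores
            # the index, so the context opened by Start stays open.
            persisted.append(event.text)
        elif isinstance(event, Done):
            persisted.append(open_contexts.pop(event.index))
    # End-of-turn teardown persists anything that was never closed.
    persisted.extend(open_contexts.values())
    return persisted


before = persist([Start(0), Full(0, "final reasoning")])
after = persist([Start(0), Delta(0, "final reasoning"), Done(0)])
```

In this model `before` holds the real message followed by an empty straggler, while `after` holds a single message.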

2. OpenAI: reasoning `Start` content type regressed `ReasoningContent` → `TextContent` (🟠)

convert_openai_to_agentex_events opened reasoning summary/content messages with a TextContent Start. On the migrated auto_send/Temporal path this regressed the pre-migration behavior (which started reasoning with ReasoningContent), so any consumer that branches on the start event's content type to show a "thinking" indicator now sees plain text. The persisted content is rebuilt from the reasoning deltas regardless, so only the live envelope was affected.

Fix: emit ReasoningContent(style="active") for both reasoning Starts. This aligns the converter with the codex/claude_code taps, the (already-corrected) langgraph-sync converter, and what the OpenAI conformance suite already treats as canonical.
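A minimal sketch of the kind of consumer this affects, assuming it picks a UI indicator from the Start event's content type. The classes below are simplified stand-ins that reuse the agentex names for readability, and `indicator_for` is a hypothetical helper, not part of the library.

```python
from dataclasses import dataclass


# Simplified stand-ins; the real agentex types carry more fields.
@dataclass
class TextContent:
    content: str = ""


@dataclass
class ReasoningContent:
    style: str = "active"


@dataclass
class StreamTaskMessageStart:
    index: int
    content: object


def indicator_for(start: StreamTaskMessageStart) -> str:
    """Hypothetical consumer branch: choose an indicator from the Start type."""
    if isinstance(start.content, ReasoningContent):
        return "thinking"
    return "text"


# Before the fix the reasoning Start carried TextContent.
regressed = indicator_for(StreamTaskMessageStart(0, TextContent()))
# After the fix it carries ReasoningContent(style="active").
fixed = indicator_for(StreamTaskMessageStart(0, ReasoningContent(style="active")))
```

With the TextContent Start the branch falls through to the plain-text case, which is the regression described above.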

Testing

Rewrote the codex reasoning tests to assert the Start + summary/content deltas + Done sequence and that no Full(ReasoningContent) is emitted; added an empty-block-closes-with-Done case.
tests/lib/adk/test_codex_sync.py, tests/lib/adk/providers/test_openai_turn.py, tests/lib/adk/test_codex_turn.py, tests/lib/adk/test_claude_code_sync.py, tests/lib/adk/providers/, and the full tests/lib/core/harness/ (incl. conformance) suites pass (203 + 87 green across runs).
ruff format + ruff check clean on all changed files.

🤖 Generated with Claude Code

Greptile Summary

This PR fixes two streaming envelope bugs in the reasoning message pipeline: a Codex duplicate-message issue caused by mixing Start+Full (which leaves the open context dangling via auto_send's index-ignoring behaviour), and an OpenAI regression where reasoning Start events were typed as TextContent instead of ReasoningContent.

Codex (_codex_sync.py): item.completed now emits ReasoningSummaryDelta + ReasoningContentDelta + Done on the already-open index, mirroring the _claude_code_sync.py pattern; a lazy Start is opened when no item.started preceded it, and an empty block still closes cleanly with a bare Done.
OpenAI (sync_provider.py): Both reasoning_summary and reasoning_content Start events now carry ReasoningContent(style="active") so downstream consumers that branch on the start-event type correctly render a thinking indicator.
Tests (test_codex_sync.py): Existing reasoning tests are rewritten to assert the full Start → deltas → Done sequence and explicitly assert no Full(ReasoningContent) is emitted; a new test covers the empty-block case.
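The three Codex cases summarized above (normal, lazy-open, empty block) can be sketched as a small generator. This is an illustration under assumptions, not the code in _codex_sync.py: the dict event shape, the tuple output, the index counter, and the choice to send the same text in both deltas are all hypothetical.

```python
def reasoning_events(codex_events):
    """Yield (kind, index, payload) tuples for Codex reasoning items.

    Hypothetical sketch: input events are dicts with a "type" key and an
    optional "text" key; the real processor emits StreamTaskMessage* objects.
    """
    open_index = None
    next_index = 0
    for event in codex_events:
        if event["type"] == "item.started":
            open_index = next_index
            next_index += 1
            yield ("start", open_index, None)
        elif event["type"] == "item.completed":
            if open_index is None:
                # Lazy-open path: no item.started preceded this item.
                open_index = next_index
                next_index += 1
                yield ("start", open_index, None)
            text = event.get("text", "")
            if text:
                # Assumption: both deltas carry the final reasoning text.
                yield ("summary_delta", open_index, text)
                yield ("content_delta", open_index, text)
            # Every path closes with Done, including the empty block.
            yield ("done", open_index, None)
            open_index = None


lazy = list(reasoning_events([{"type": "item.completed", "text": "orphan thought"}]))
empty = list(reasoning_events([{"type": "item.started"}, {"type": "item.completed", "text": ""}]))
```

The point of the structure is that no path emits a Full and every opened index is closed by exactly one Done.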

Confidence Score: 4/5

Safe to merge — both fixes are well-scoped to the streaming envelope layer and don't touch persisted message types; the production logic is correct.

The Codex and OpenAI changes are straightforward and well-explained. The lazy-open path in _codex_sync.py correctly opens a Start, emits both a ReasoningSummaryDelta and a ReasoningContentDelta, then closes with Done — but the corresponding test (test_reasoning_no_started_opens_and_closes_one_message) only asserts on the content delta, leaving the summary delta unverified on that path.

tests/lib/adk/test_codex_sync.py — the lazy-open reasoning test is missing the ReasoningSummaryDelta assertion that the normal-path test carries.

Important Files Changed

Filename	Overview
src/agentex/lib/adk/_modules/_codex_sync.py	Replaces Full(ReasoningContent) emission with Start+SummaryDelta+ContentDelta+Done sequence; adds lazy Start for no-started case and imports ReasoningSummaryDelta.
src/agentex/lib/adk/providers/_modules/sync_provider.py	Fixes reasoning Start events to use ReasoningContent instead of TextContent for both reasoning_summary and reasoning_content paths; imports ReasoningContent.
tests/lib/adk/test_codex_sync.py	Rewrites reasoning tests to assert Start+deltas+Done pattern and adds empty-block test; lazy-open path missing ReasoningSummaryDelta assertion.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Codex as Codex API
    participant Proc as _CodexStreamProcessor
    participant Consumer as auto_send / Consumer

    Note over Proc,Consumer: BEFORE (broken)
    Codex->>Proc: item.started (reasoning)
    Proc->>Consumer: "StreamTaskMessageStart(ReasoningContent, active) @ idx"
    Codex->>Proc: "item.completed (reasoning, text=...)"
    Proc->>Consumer: "StreamTaskMessageFull(ReasoningContent, static) @ idx"
    Note over Consumer: auto_send ignores idx on Full — Start @ idx stays dangling until teardown — duplicate message persisted

    Note over Proc,Consumer: AFTER (fixed)
    Codex->>Proc: item.started (reasoning)
    Proc->>Consumer: "StreamTaskMessageStart(ReasoningContent, active) @ idx"
    Codex->>Proc: "item.completed (reasoning, text=...)"
    Proc->>Consumer: "StreamTaskMessageDelta(ReasoningSummaryDelta) @ idx"
    Proc->>Consumer: "StreamTaskMessageDelta(ReasoningContentDelta) @ idx"
    Proc->>Consumer: "StreamTaskMessageDone @ idx"
    Note over Consumer: Open context accumulates deltas and closes cleanly — one message persisted

    Note over Proc,Consumer: OpenAI fix (sync_provider.py)
    Proc->>Consumer: "StreamTaskMessageStart(ReasoningContent, was TextContent) @ idx"
    Note over Consumer: Consumers see correct type for thinking-indicator branch



Reviews (1): Last reviewed commit: "fix(tests): narrow reasoning delta types..."

Greptile also left 1 inline comment on this PR.

… duplicate message

Codex reasoning emitted Start(active) then Full(static) at the open index with no Done. auto_send routes a Full into its own throwaway streaming context (ignoring the index), so the Start context survived until end-of-turn teardown and persisted a second, near-empty reasoning message (user-visible duplicate + out-of-order).

Mirror the claude_code pattern: stream the final reasoning as summary + content deltas on the open index, then close with a Done, so the open context accumulates the final ReasoningContent and closes cleanly as one message. The no-started case opens the Start lazily and closes it the same way.

Updated tests assert the Start + deltas + Done sequence and that no Full(ReasoningContent) is emitted.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

convert_openai_to_agentex_events opened reasoning summary/content messages with a TextContent Start. On the migrated auto_send/Temporal path this regressed the prior behavior (which started reasoning with ReasoningContent), so consumers branching on the start event's content type render reasoning as plain text. The final persisted content is rebuilt from the reasoning deltas regardless, so this only affects the live stream envelope.

Aligns with the codex/claude_code taps and the langgraph-sync converter, and matches what the openai conformance suite already treats as canonical.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ng tests

pyright does not narrow the StreamTaskMessageDelta.delta union through an isinstance filter inside a list comprehension, so accessing content_delta / summary_delta on the collected elements failed. Bind each delta to a local and assert isinstance before accessing the typed attribute.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
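The narrowing workaround from this commit can be sketched as follows. The dataclasses are simplified stand-ins that reuse the agentex names, the attribute names come from the commit message, and the sample text is made up. The type-checker behaviour is as the commit message describes it; at runtime both forms behave the same.

```python
from dataclasses import dataclass
from typing import List, Union


# Simplified stand-ins for the agentex delta types.
@dataclass
class ReasoningSummaryDelta:
    summary_delta: str


@dataclass
class ReasoningContentDelta:
    content_delta: str


@dataclass
class StreamTaskMessageDelta:
    index: int
    delta: Union[ReasoningSummaryDelta, ReasoningContentDelta]


events: List[StreamTaskMessageDelta] = [
    StreamTaskMessageDelta(0, ReasoningSummaryDelta("short summary")),
    StreamTaskMessageDelta(0, ReasoningContentDelta("full thought")),
]

# The filter selects the right events at runtime, but the element type is
# still StreamTaskMessageDelta, so `.delta` remains the full union statically.
content_deltas = [e for e in events if isinstance(e.delta, ReasoningContentDelta)]

# Bind the delta to a local and assert its type; the checker narrows the
# local, so the typed attribute access below is accepted.
content_delta = content_deltas[0].delta
assert isinstance(content_delta, ReasoningContentDelta)
text = content_delta.content_delta
```

The extra assert also doubles as a runtime check that the filter collected what the test expects.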

greptile-apps · 2026-06-23T22:00:38Z

+        assert len(starts) == 1
+        assert isinstance(starts[0].content, ReasoningContent)
+        assert reasoning_fulls == []
+        assert len(content_deltas) == 1
+        content_delta = content_deltas[0].delta
+        assert isinstance(content_delta, ReasoningContentDelta)
+        assert content_delta.content_delta == "orphan thought"
+        assert len(dones) == 1
+        assert dones[0].index == starts[0].index


Lazy-open path missing ReasoningSummaryDelta assertion

The production code for the idx is None case always emits a ReasoningSummaryDelta before the ReasoningContentDelta (lines 407–416 of _codex_sync.py). This test only asserts on content_deltas and ignores summary_deltas, so a future regression that accidentally drops the summary delta on this path would go undetected. Adding the same summary_deltas assertions that test_reasoning_start_deltas_done carries would close the gap.
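One way to close this gap is a shared envelope check that both the normal-path and lazy-open tests could call. The sketch below is hypothetical: `check_reasoning_envelope` is not a helper in the repository, the dataclasses are simplified stand-ins for the agentex types, and it deliberately does not assert the summary text, since the expected value on this path is not shown in the review.

```python
from dataclasses import dataclass
from typing import List, Union


# Simplified stand-ins for the agentex types used by the tests.
@dataclass
class ReasoningContent:
    style: str = "active"


@dataclass
class ReasoningSummaryDelta:
    summary_delta: str


@dataclass
class ReasoningContentDelta:
    content_delta: str


@dataclass
class StreamTaskMessageStart:
    index: int
    content: ReasoningContent


@dataclass
class StreamTaskMessageDelta:
    index: int
    delta: Union[ReasoningSummaryDelta, ReasoningContentDelta]


@dataclass
class StreamTaskMessageFull:
    index: int
    content: ReasoningContent


@dataclass
class StreamTaskMessageDone:
    index: int


def check_reasoning_envelope(events: List[object]) -> None:
    """Assert the Start -> summary delta -> content delta -> Done sequence."""
    starts = [e for e in events if isinstance(e, StreamTaskMessageStart)]
    fulls = [e for e in events if isinstance(e, StreamTaskMessageFull)]
    dones = [e for e in events if isinstance(e, StreamTaskMessageDone)]
    deltas = [e for e in events if isinstance(e, StreamTaskMessageDelta)]
    summary_deltas = [e for e in deltas if isinstance(e.delta, ReasoningSummaryDelta)]
    content_deltas = [e for e in deltas if isinstance(e.delta, ReasoningContentDelta)]

    assert len(starts) == 1 and isinstance(starts[0].content, ReasoningContent)
    assert fulls == []
    assert len(summary_deltas) == 1  # the assertion the lazy-open test lacks
    assert len(content_deltas) == 1
    # The summary delta must precede the content delta.
    assert events.index(summary_deltas[0]) < events.index(content_deltas[0])
    assert len(dones) == 1 and dones[0].index == starts[0].index
```

A stream that drops the summary delta fails the `len(summary_deltas) == 1` check, which is the regression the comment is worried about.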


declan-scale and others added 2 commits June 23, 2026 17:45

danielmillerp approved these changes Jun 23, 2026


greptile-apps Bot reviewed Jun 23, 2026


declan-scale merged commit 1d86e8a into next Jun 23, 2026
55 checks passed

declan-scale deleted the declan-scale/fix-codex-openai-reasoning-stream branch June 23, 2026 22:01

stainless-app Bot mentioned this pull request Jun 23, 2026

chore: release main #424

Merged


declan-scale merged 3 commits into next from declan-scale/fix-codex-openai-reasoning-stream
