
feat(agents-server-ui): stream model reasoning into the UI #4508

Open
kevin-dp wants to merge 9 commits into main from kevin/reasoning-content

kevin-dp commented Jun 4, 2026

Summary

While the model is "thinking" (Anthropic extended thinking, DeepSeek-R1 reasoning, Moonshot K2, OpenAI Responses summaries) the agent response now shows the reasoning text faded above the answer, with the existing Thinking shimmer heading plus elapsed-time ticker. Once the reasoning settles it collapses to ▸ Thought for 12s — click to expand. Multiple reasoning rows per run render independently in order (one per LLM step in tool-using turns). UX intentionally mirrors Claude Code + OpenCode patterns.

Implementation (end-to-end)

  • Schema: reasoning row gains run_id, encrypted (Anthropic redacted-thinking opaque payload, which must round-trip back to the model verbatim), and summary_title (extracted at write time). New reasoningDeltas collection mirrors textDeltas. Strictly additive.
  • Bridge: OutboundBridge gains onReasoningStart / onReasoningDelta / onReasoningEnd, parallel to the text path. A reasoning counter is added to OutboundIdSeed.
  • Adapter: pi-adapter.ts routes pi-ai's thinking_start / thinking_delta / thinking_end events to the bridge. It parses a **Title**\n\n<body> heading once at write time (OpenAI Responses; a no-op for Anthropic / DeepSeek / Moonshot). It is defensive: it handles a late thinking_delta without a preceding thinking_start, and closes an open reasoning row on message_end (e.g. provider abort).
  • Timeline: live reasoning: Collection<EntityTimelineReasoningItem> on EntityTimelineRunRow, with content built via the same delta-join pattern as EntityTimelineTextItem.content.
  • UI: new <ReasoningSection> renders above items in AgentResponseLive:
    • Live: faded markdown via Streamdown, with the ThinkingIndicator heading, summary title, and elapsed-time ticker.
    • Settled: ▸ Thought for Ns with click-to-expand. The duration is snapshotted as Date.now() - timestamp when the row settles, using the same sawStreamingRef trick from the elapsed-time PR. It is accurate for rows that settle during the session; rows already settled on first mount show a bare Thought, since no real end timestamp is available client-side.
    • Redacted: Anthropic safety-filter payloads render ⊘ Reasoning redacted by provider safety filters. The encrypted payload is still persisted server-side so the model gets it back on the next turn.
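
A minimal sketch of the settled-label rule above, assuming the reasoning row exposes its start timestamp in milliseconds and that a sawStreaming flag plays the role of sawStreamingRef. The name settledReasoningLabel is hypothetical, and rounding to whole seconds is an assumption; this is not the component's actual code.

```ts
// Hypothetical helper for the collapsed label. `startedAt` is the reasoning
// row's timestamp in ms; `sawStreaming` is true only if this component saw the
// row while it was still streaming (the role sawStreamingRef plays).
function settledReasoningLabel(
  sawStreaming: boolean,
  startedAt: number,
  settledAt: number = Date.now(),
): string {
  // A row that was already settled on first mount has no trustworthy end time
  // client-side, so show a bare label instead of a misleading duration.
  if (!sawStreaming) return "Thought";
  // Assumption: round to whole seconds and clamp clock skew to zero.
  const seconds = Math.max(0, Math.round((settledAt - startedAt) / 1000));
  return `Thought for ${seconds}s`;
}

console.log(settledReasoningLabel(true, 1_000, 13_400)); // prints "Thought for 12s"
console.log(settledReasoningLabel(false, 1_000, 13_400)); // prints "Thought"
```

Falling back to a bare label trades precision for honesty: without an end timestamp, any number shown for a row settled before mount would be a guess.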

Reference

Patterns informed by reading OpenCode's reasoning implementation:

  • 3-event streaming protocol (reasoning-start / reasoning-delta / reasoning-end)
  • ReasoningPart storage shape including encrypted for Anthropic round-trip
  • reasoningSummary() headline parser (5-line regex, OpenAI Responses only)
  • Collapsed-by-default UX with click-to-expand
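
The **Title**\n\n<body> heading parse from the Adapter bullet can be sketched as a single anchored regex. extractSummaryTitle and SUMMARY_HEADING are hypothetical names, and the exact pattern is an assumption, not the adapter's or OpenCode's regex. Text without the heading returns undefined, which is the no-op path for Anthropic, DeepSeek and Moonshot.

```ts
// Matches a bold heading at the very start, followed by a blank line:
// "**Title**\n\n<body>". The title may not contain "*" or a newline.
const SUMMARY_HEADING = /^\*\*([^*\n]+)\*\*\n\n/;

// Hypothetical helper: returns the summary title, or undefined when the
// reasoning text has no such heading.
function extractSummaryTitle(text: string): string | undefined {
  const match = SUMMARY_HEADING.exec(text);
  return match?.[1]?.trim();
}

console.log(extractSummaryTitle("**Planning the query**\n\nFirst, ...")); // prints "Planning the query"
console.log(extractSummaryTitle("Let me think step by step.")); // prints undefined
```

Running this once at write time, as the PR describes, means summary_title is stored on the row and the UI never re-parses streamed text.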

Test plan

  • pnpm typecheck clean in agents-runtime + agents-server-ui
  • pnpm test outbound-bridge pi-adapter entity-timeline in agents-runtime (95 passed: 18 bridge + 21 adapter + 56 timeline)
  • pnpm test in agents-server-ui (66 passed)
  • pnpm -C packages/agents-runtime build — dist artifacts emit cleanly
  • Manual: prompt Anthropic Claude with extended-thinking enabled; verify streaming reasoning appears faded above the answer with elapsed ticker, then collapses to Thought for Ns on settle
  • Manual: multi-step tool-using turn; verify each step's reasoning renders as a separate collapsible row

Notes

  • Cached AgentResponse (the non-Live path used for old scrollback sections) doesn't yet surface reasoning — historical rows recorded before this PR lack the data anyway. Follow-up if we discover sessions where this matters.
  • The pre-existing runtime-dsl.test.ts 401 failures (and dispatch-policy-routing.test.ts 500 failures) reproduce identically on clean main and were not introduced by this PR.

🤖 Generated with Claude Code

While the model is "thinking" (Anthropic extended thinking, DeepSeek-R1
reasoning_content, Moonshot K2, OpenAI Responses summaries) the agent
response now shows the reasoning text faded above the answer, with the
existing `Thinking` shimmer heading + elapsed-time ticker. Once the
reasoning settles, it collapses to `▸ Thought for 12s` — click to
expand. Multiple reasoning rows per run render independently in order
(one per LLM step in tool-using turns).

End-to-end plumbing:

- Schema: `reasoning` row gains `run_id`, `encrypted` (Anthropic
  redacted blocks must round-trip back to the model), and
  `summary_title` (extracted at write time). New `reasoningDeltas`
  collection mirrors `textDeltas` for streamed content.
- Bridge: `OutboundBridge` gains `onReasoningStart` / `onReasoningDelta`
  / `onReasoningEnd`, parallel to text.
- Adapter: `pi-adapter.ts` routes `thinking_start` / `thinking_delta` /
  `thinking_end` from pi-ai. Parses a `**Title**\n\n<body>` heading
  once at write time (OpenAI Responses; no-op for others).
- Timeline: live `reasoning: Collection<EntityTimelineReasoningItem>`
  on `EntityTimelineRunRow`, content built via delta-join.
- UI: new `<ReasoningSection>` renders above items in
  `AgentResponseLive`. Streamdown body, click-to-expand on settle,
  redacted-block placeholder for opaque Anthropic payloads.
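
A rough sketch of the adapter routing, including the two defensive cases (a `thinking_delta` with no preceding `thinking_start`, and `message_end` closing an open row). The event and bridge types are simplified, hypothetical stand-ins, not pi-ai's event payloads or `OutboundBridge`'s real signatures; closing a still-open row when a new `thinking_start` arrives is likewise an assumption of this sketch.

```ts
// Simplified, hypothetical stand-ins for pi-ai's stream events and for the
// reasoning half of OutboundBridge.
type StreamEvent =
  | { type: "thinking_start" }
  | { type: "thinking_delta"; delta: string }
  | { type: "thinking_end" }
  | { type: "message_end" };

interface ReasoningBridge {
  onReasoningStart(): string; // opens a reasoning row, returns its id
  onReasoningDelta(id: string, delta: string): void;
  onReasoningEnd(id: string): void;
}

function createReasoningRouter(bridge: ReasoningBridge): (event: StreamEvent) => void {
  let openId: string | undefined; // reasoning row currently streaming, if any

  // Opens a row on demand, so a late thinking_delta without a preceding
  // thinking_start still lands in a row.
  const ensureOpen = (): string => {
    const id = openId ?? bridge.onReasoningStart();
    openId = id;
    return id;
  };

  // Closes the open row (if any) exactly once.
  const close = (): void => {
    const id = openId;
    if (id === undefined) return;
    openId = undefined;
    bridge.onReasoningEnd(id);
  };

  return (event) => {
    switch (event.type) {
      case "thinking_start":
        close(); // assumption: a new start implicitly ends any dangling row
        ensureOpen();
        break;
      case "thinking_delta":
        bridge.onReasoningDelta(ensureOpen(), event.delta);
        break;
      case "thinking_end":
      case "message_end": // e.g. provider abort with reasoning still open
        close();
        break;
    }
  };
}

// Usage: a late delta followed by an abort still yields a well-formed row.
const log: string[] = [];
let next = 0;
const route = createReasoningRouter({
  onReasoningStart: () => `r${next++}`,
  onReasoningDelta: (id, delta) => {
    log.push(`${id}: ${delta}`);
  },
  onReasoningEnd: (id) => {
    log.push(`${id} closed`);
  },
});
route({ type: "thinking_delta", delta: "hmm" });
route({ type: "message_end" });
console.log(log.join(" | ")); // prints "r0: hmm | r0 closed"
```

Keeping the open row id in one variable routes every exit path (normal end, abort, or a new start) through the same `close()`, so a row is never ended twice or left open.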
github-actions Bot commented Jun 4, 2026

Electric Agents Desktop Builds

Build artifacts for commit aef3aab.

| Platform | Status | Artifact |
| --- | --- | --- |
| macOS Apple Silicon | Passed | DMG |
| macOS Intel | Passed | DMG |
| Windows x64 | Passed | Installer |
| Linux x64 | Passed | AppImage / deb |


codecov Bot commented Jun 4, 2026

Codecov Report

❌ Patch coverage is 46.00000% with 135 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@12f1d17).
⚠️ Report is 28 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| packages/agents-runtime/src/outbound-bridge.ts | 24.13% | 44 Missing ⚠️ |
| packages/agents-runtime/src/pi-adapter.ts | 10.86% | 41 Missing ⚠️ |
| ...ents-server-ui/src/components/ReasoningSection.tsx | 0.00% | 39 Missing ⚠️ |
| .../agents-server-ui/src/components/AgentResponse.tsx | 0.00% | 9 Missing ⚠️ |
| packages/agents/src/model-catalog.ts | 93.54% | 2 Missing ⚠️ |
Additional details and impacted files
```diff
@@           Coverage Diff           @@
##             main    #4508   +/-   ##
=======================================
  Coverage        ?   56.86%           
=======================================
  Files           ?      359           
  Lines           ?    39304           
  Branches        ?    11049           
=======================================
  Hits            ?    22351           
  Misses          ?    16882           
  Partials        ?       71           
```
| Flag | Coverage Δ |
| --- | --- |
| packages/agents | 71.14% <93.54%> (?) |
| packages/agents-mcp | 77.54% <ø> (?) |
| packages/agents-mobile | 66.92% <ø> (?) |
| packages/agents-runtime | 80.88% <50.29%> (?) |
| packages/agents-server | 73.98% <ø> (?) |
| packages/agents-server-ui | 6.19% <0.00%> (?) |
| packages/electric-ax | 46.42% <ø> (?) |
| packages/experimental | 87.73% <ø> (?) |
| packages/react-hooks | 86.48% <ø> (?) |
| packages/start | 82.83% <ø> (?) |
| packages/typescript-client | 91.83% <ø> (?) |
| packages/y-electric | 56.05% <ø> (?) |
| typescript | 56.86% <46.00%> (?) |
| unit-tests | 56.86% <46.00%> (?) |

Flags with carried forward coverage won't be shown.

github-actions Bot commented Jun 4, 2026

Electric Agents Mobile Build

Local mobile checks ran for commit aef3aab.

The EAS Android preview build was skipped because the mobile-eas-build label is not present.
Add the mobile-eas-build label to this PR to produce an installable preview build.


Previously `withProviderPayloadDefaults` short-circuited for any
provider other than OpenAI / OpenAI-Codex, so picking Claude with an
explicit `reasoningEffort` (anything other than `auto`) had no effect:
no `thinking` parameter was added to the request, so Anthropic ran in
standard mode and the model emitted no `thinking_delta` events. The
inbound reasoning plumbing that landed earlier in this PR was correct
but unreachable from Anthropic without this change.

Now: when the chosen model is an Anthropic model that supports reasoning
AND `reasoningEffort` is explicit (minimal/low/medium/high), inject

  thinking: { type: "enabled", budget_tokens: <by effort> }

into the payload. Budgets follow Anthropic's docs (≥ 1024 floor):
minimal=1024, low=2048, medium=8192, high=24576. `auto` stays opt-out
of thinking so default sessions don't silently incur the extra
reasoning tokens.
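
The mapping described in this commit, sketched as a lookup table. Only the budget numbers and the rule that `auto` injects nothing come from the message above; `anthropicThinkingFor`, `ReasoningEffort` and `ANTHROPIC_THINKING_BUDGET` are hypothetical names.

```ts
// Hypothetical effort type; the values mirror the ones named in this commit.
type ReasoningEffort = "auto" | "minimal" | "low" | "medium" | "high";

const ANTHROPIC_THINKING_BUDGET = {
  minimal: 1024, // Anthropic's floor for budget_tokens
  low: 2048,
  medium: 8192,
  high: 24576,
} as const;

type AnthropicThinking = { type: "enabled"; budget_tokens: number };

// Returns the `thinking` parameter to inject, or undefined to leave it out.
function anthropicThinkingFor(effort: ReasoningEffort): AnthropicThinking | undefined {
  // `auto` stays opt-out so default sessions don't pay for reasoning tokens.
  if (effort === "auto") return undefined;
  return { type: "enabled", budget_tokens: ANTHROPIC_THINKING_BUDGET[effort] };
}

console.log(anthropicThinkingFor("medium")?.budget_tokens); // prints 8192
```

A later commit in this PR changes `auto` to use the `minimal` budget instead of opting out.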

KyleAMathews left a comment

Lovely! Could you add a screenshot of the UI to the PR body?

kevin-dp and others added 3 commits June 8, 2026 14:53
Three latent bugs in the reasoning-content branch that together made
extended thinking and the assistant's answer text fail to render:

1. **Alias collision in the timeline live query** —
   `entity-timeline.ts` had two correlated sub-queries (one for
   `items.text.content`, one for `reasoning.content`) both using
   `chunk` as the `from({...})` alias. TanStack DB silently
   mis-bound the correlation when both were active in the same run
   projection, so `items.text.content` came back as an empty string
   even though the deltas were present in `db.collections.textDeltas`.
   Reasoning won the binding; the answer didn't render at all.

   Fix: rename the inner alias to `textChunk`, and hoist the union
   row's text fields to top-level scalars (`text_key`, `text_run_id`,
   …) so the correlation references a top-level field instead of a
   nested `item.text.key` (also a source of empty joins).

2. **Anthropic thinking opt-in instead of always-on**:
   `withProviderPayloadDefaults` short-circuited for Anthropic when
   `reasoningEffort` was `auto`, so in default sessions no `thinking`
   parameter reached the API. The OpenAI branch already defaulted `auto`
   to `minimal`; Anthropic now does the same (1024-token budget). `low`
   / `medium` / `high` scale the budget exactly as before.

3. **Anthropic `thinking` merge order** — pi-ai writes
   `thinking: { type: "disabled" }` into the request body by default.
   Our `onPayload` was merging `existingThinking` _last_, so the
   default `type: "disabled"` clobbered our `type: "enabled"` and
   the API rejected `budget_tokens` with
   `thinking.disabled.budget_tokens: Extra inputs are not permitted`.
   `existingThinking` is now spread first, followed by `type` and `budget_tokens`.
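
Bug 3 comes down to object-literal semantics: a later property or spread overwrites an earlier one with the same key. A small illustration with hypothetical variable names, where `existingThinking` stands in for whatever pi-ai wrote into the request body:

```ts
// Hypothetical, simplified shape of Anthropic's `thinking` request parameter.
type ThinkingParam = { type: "enabled" | "disabled"; budget_tokens?: number };

// Stand-in for the default pi-ai writes into the body. Typed as Partial
// because in general it may be absent or incomplete.
const existingThinking: Partial<ThinkingParam> = { type: "disabled" };
const budgetTokens = 1024;

// Buggy order: the spread comes last, so its `type: "disabled"` wins and the
// API then rejects `budget_tokens`.
const buggy: ThinkingParam = { type: "enabled", budget_tokens: budgetTokens, ...existingThinking };

// Fixed order: spread first, then set our own fields so they take precedence.
const fixed: ThinkingParam = { ...existingThinking, type: "enabled", budget_tokens: budgetTokens };

console.log(buggy.type); // prints "disabled"
console.log(fixed.type, fixed.budget_tokens); // prints "enabled 1024"
```

Spreading first still preserves any other keys already present on the existing value, which is presumably why it is merged at all rather than replaced.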

Tests:
- `entity-timeline.test.ts` — regression test exercises
  `createEntityTimelineQuery` end-to-end with text and reasoning rows
  in the same run; fails on the alias collision, passes with the
  rename + flat-field projection.
- `model-catalog.test.ts` — adds Anthropic-side coverage that mirrors
  the existing OpenAI tests: always-on minimal budget on `auto`,
  scaled budget on explicit effort, and `type: disabled` override
  for pre-existing `thinking` in the payload.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eltas

The reasoning sub-collection's `content` field — projected via
`concat(toArray(<correlated delta-join>))` — went stale in the
running app after the row's status flipped to `completed`, surfacing
`content: null` in the live query even though the deltas were still
present in the local DB. The expand-thought-block view rendered an
empty body until the user navigated away and back (forcing a fresh
live-query subscription), at which point the join evaluated cleanly.

Unit tests for the same projection pattern all pass — the bug only
reproduces in the running app, against an established live-query
graph with overlapping text/reasoning subscriptions. The sub-query
itself is correct (data is there after a fresh subscription), but
something about the long-lived subscription state makes the
correlated row binding stale.

Sidestep the unreliable projection entirely:

- **Timeline query** — drop the `content` field from
  `EntityTimelineReasoningItem`. Expose `run.reasoningDeltas` as a
  parallel sub-collection (mirroring `run.reasoning`), surfacing the
  raw deltas keyed by `reasoning_id`.
- **UI** — `AgentResponseLive` subscribes to both `run.reasoning` and
  `run.reasoningDeltas`, builds a `Map<reasoning_id, content>` from
  the deltas client-side, and merges it onto the reasoning rows
  before handing them to `<ReasoningSection>`. Reactive on every
  delta arrival, no stale state.
- **State lift** — `expanded` for the collapsed "Thought for Ns"
  toggle moves from `ReasoningEntryView` (per-entry) up to
  `ReasoningSection` (keyed by `entry.key`), so the user's choice
  survives any spurious unmount of the entry view (virtualizer
  measurement passes, brief entries-empty states, etc.).
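
The client-side assembly in the UI bullet can be sketched as two pure functions. Assumptions, all hypothetical: each delta carries `reasoning_id`, an ordering field `seq` and a text field `delta`, and a reasoning row's `key` equals the `reasoning_id` of its deltas. The real `AgentResponseLive` code may differ.

```ts
// Hypothetical shapes: `seq` and `delta` are assumed field names.
interface ReasoningDelta {
  reasoning_id: string;
  seq: number; // ordering within the stream
  delta: string; // text fragment
}

interface ReasoningRow {
  key: string;
  status: string;
}

// Concatenates each reasoning row's deltas, in `seq` order, into one string.
function buildReasoningContent(deltas: readonly ReasoningDelta[]): Map<string, string> {
  const ordered = [...deltas].sort((a, b) => a.seq - b.seq); // copy; don't mutate input
  const content = new Map<string, string>();
  for (const d of ordered) {
    content.set(d.reasoning_id, (content.get(d.reasoning_id) ?? "") + d.delta);
  }
  return content;
}

// Merges the assembled content onto the rows handed to <ReasoningSection>.
function withContent<R extends ReasoningRow>(
  rows: readonly R[],
  deltas: readonly ReasoningDelta[],
): Array<R & { content: string }> {
  const content = buildReasoningContent(deltas);
  // Rows with no deltas yet get an empty string rather than undefined.
  return rows.map((row) => ({ ...row, content: content.get(row.key) ?? "" }));
}

const merged = withContent(
  [{ key: "r1", status: "completed" }],
  [
    { reasoning_id: "r1", seq: 2, delta: " world" },
    { reasoning_id: "r1", seq: 1, delta: "hello" },
  ],
);
console.log(merged[0]?.content); // prints "hello world"
```

Recomputing the map from scratch on every delta is O(n log n) in the number of deltas, which should be fine for one run's reasoning and keeps no long-lived join state that could go stale.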

Tests:
- New regressions in `entity-timeline.test.ts` exercise the deltas
  sub-collection with the same shape as the failing production
  scenario: reasoning + text together, multi-step run-row updates,
  status transitions.

Follow-up: investigate why the original correlated sub-query goes
stale only against long-lived live-query graphs (passes in tests).
The `content` projection has been left commented-out in case we
want to restore it after fixing the underlying TanStack DB issue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The original `reasoning.content` projection used
`concat(toArray(<correlated delta-join>))`, which TanStack DB compiles
to a `buildIncludesSubquery(..., 'concat')` node — a specialized
differential-dataflow operator that incrementally maintains a
string-concatenation of a child query's projection.

Unit tests of the same projection shape pass cleanly: a fresh
`createLiveQueryCollection` evaluates the join correctly on initial
preload, and again after status flips. Tests do not reproduce the
production failure mode (long-lived subscription where `content`
silently goes from populated → null after the row's status flips,
recovering only after a full live-query teardown).

Leaving a placeholder test as a marker — when we have a repro, drop
the body in here and restore the `content` field in
`entity-timeline.ts:buildEntityTimelineQuery`. The current fix
sidesteps the issue by exposing `run.reasoningDeltas` and assembling
content client-side, which is reliable but bypasses what should be
a working server-side projection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
kevin-dp and others added 2 commits June 9, 2026 09:57
Restore the original nested-text shape on `runItemsSource`
(`text: caseWhen(text.key, {...})` and `textContent: concat(toArray(...))`
projected together on the union row) and undo the flat-scalar
hoist (`text_key`, `text_run_id`, `text_order`, `text_status`).
The `textChunk` alias on the delta-join stays, since that's the
load-bearing change that actually fixed the original `chunk`
alias collision with the reasoning sub-query.

When fixing the original alias-collision bug I made two changes in
one commit:

1. Renamed the text delta-join alias `chunk` → `textChunk` so it
   no longer collided with the `chunk` used in reasoning content.
2. Hoisted text fields to flat scalars on the union row so the join
   could move out of `runItemsSource`'s select and into the items
   consumer's select.

I never bisected the two. It turns out (1) alone is sufficient: the
nested `text: caseWhen(text.key, {...})` + co-located `textContent`
projection works fine once the alias collision is gone. The
flat-scalar hoist was unnecessary churn that just made the code harder
to read for no behavioral benefit.

Tested by reverting (2), running unit tests (60 still pass), and
verifying in the running app that text content still streams in
and renders correctly through a full Claude exchange.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ection

Reverts the client-side `run.reasoningDeltas` workaround in favor of
the server-side `concat(toArray(...))` projection on
`run.reasoning.content`.

Currently broken in production against `@tanstack/db@0.6.7` —
documented in `packages/agents-runtime/test/entity-timeline.test.ts`'s
`reasoning content remains populated after status flips to completed`
and friends. Unit tests against the projection pass cleanly; the bug
only surfaces in a long-lived stream-backed live query after the
parent row's `.update()`, with the field silently becoming `null`
even though deltas are present in the local DB. A fresh subscription
(navigate-away + back, or reload) recovers.

Holding this branch as a draft PR so the work isn't lost. Merge once
TanStack DB ships an upstream fix that makes the placeholder tests
pass against a long-lived production live query.

Diff vs `kevin/reasoning-content`:

- `entity-timeline.ts` — add `content: concat(toArray(<delta-join>))`
  back to `reasoning.select(...)`, drop the parallel
  `reasoningDeltas` sub-collection. Alias stays `reasoningChunk`
  (not the generic `chunk`) to avoid the alias-collision class of bug.
- `EntityTimelineReasoningItem` — `content: string` reinstated;
  `EntityTimelineReasoningDeltaItem` removed.
- `client.ts` — drop `EntityTimelineReasoningDeltaItem` export.
- `AgentResponseLive` — drop the `run.reasoningDeltas` subscription
  + client-side concat; `reasoningEntries` reads `content` straight
  off the projected row.
- Tests — three reasoning-content tests assert `reasoning[0].content`
  (rather than concatenating raw deltas).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
netlify Bot commented Jun 9, 2026
Deploy Preview for electric-next ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 7d8ef81 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/electric-next/deploys/6a27c7654807820008d20557 |
| 😎 Deploy Preview | https://deploy-preview-4508--electric-next.netlify.app |

Tracks down and fixes the bug that's been driving the
client-side-concat workaround in #4508 and blocking #4532.

## Root cause

TanStack DB's "includes" (fields whose value is a sub-query like
`concat(toArray(...))`) are deferred. A row carrying an include
arrives with the field set to `null` and a hidden
`Symbol(includesRouting)` marker describing how to compute it. The
include is only materialized when something downstream reads it
*in the right way*.

The empirical rule (figured out via DevTools probes: `.toArray` on
the sub-collection always showed the populated string, while
`useLiveQuery` output had `content: null`):

  **An include is materialized only when it's referenced inside a
  `caseWhen` object body in a downstream `.select(...)`. A bare
  top-level reference doesn't trigger it; the include is just
  aliased forward, still deferred.**

This is why `items.text.content` has always worked and reasoning
hasn't. The items consumer derefs `item.textContent` inside the
`text: caseWhen(item.text.key, { ..., content: item.textContent })`
body. The reasoning consumer had `content: concat(toArray(...))`
(or, after the source/consumer split,
`content: r.reasoningContent`) at the top level of its select, so
`useLiveQuery` handed the row to React with `content: null`.

## Fix

Wrap the include reference inside a `caseWhen` object body, mirroring
items:

```ts
reasoning: q
  .from({ r: runReasoningSource })
  ...
  .select(({ r }) => ({
    key: r.key,
    run_id: r.run_id,
    order: r.order,
    status: r.status,
    // Referencing the include inside a caseWhen object body forces it to
    // materialize; a bare top-level `content: r.reasoningContent` stays null.
    body: caseWhen(r.key, {
      content: r.reasoningContent,
    }),
    summary_title: r.summary_title,
    encrypted: r.encrypted,
  }))
```

`r.key` is always truthy on a real row, so the caseWhen is
effectively unconditional; its only purpose is being an object body
that forces the include reference to materialize.

UI reads `entry.body?.content` (via the type) and `AgentResponseLive`
maps it back into a flat `content: string` on `ReasoningEntry` so
`ReasoningSection`'s API is unchanged.
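
A sketch of that flattening step, assuming the projected row shape from the `select(...)` above. `toReasoningEntry` and the extra fields on both interfaces are hypothetical; only `body.content` → `content: string` comes from the text. Defaulting to an empty string when `body` is null is also an assumption.

```ts
// Hypothetical shape of one projected reasoning row; `body` is the caseWhen
// object and may be null.
interface ProjectedReasoningRow {
  key: string;
  run_id: string;
  order: number;
  status: string;
  body: { content: string | null } | null;
  summary_title: string | null;
  encrypted: string | null;
}

// Hypothetical flat shape that ReasoningSection consumes.
interface ReasoningEntry {
  key: string;
  status: string;
  content: string;
  summaryTitle: string | undefined;
  encrypted: string | undefined;
}

function toReasoningEntry(row: ProjectedReasoningRow): ReasoningEntry {
  return {
    key: row.key,
    status: row.status,
    content: row.body?.content ?? "", // unwrap the caseWhen body
    summaryTitle: row.summary_title ?? undefined,
    encrypted: row.encrypted ?? undefined,
  };
}

const entry = toReasoningEntry({
  key: "r1",
  run_id: "run1",
  order: 0,
  status: "completed",
  body: { content: "First, check the schema." },
  summary_title: null,
  encrypted: null,
});
console.log(entry.content); // prints "First, check the schema."
```

Doing the unwrap in one mapping function keeps the `caseWhen` workaround out of `ReasoningSection`, so it can be dropped in one place if the include behaviour changes upstream.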

This drops the need for the client-side concat workaround that was
the original target of #4532.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread packages/agents-runtime/src/outbound-bridge.ts
kevin-dp commented Jun 9, 2026

@KyleAMathews here are some screenshots showing how it displays while it's thinking and how it displays when it's done thinking (the "Thought for 2s" block is expandable on click).

[Screenshots: thinking-block, thought-block]

The entity-stream-db mock omitted the reasoning and reasoningDeltas
collections, so loadOutboundIdSeed crashed when reading
db.collections.reasoning.toArray under three process-wake scenarios.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>