fix(openai): skip realtime truncate when no audio played #1903
rosetta-livekit-bot[bot] wants to merge 1 commit into
} else {
  this.sendEvent({
    type: 'conversation.item.delete',
    item_id: _options.messageId,
    event_id: shortuuid('chat_ctx_delete_'),
  } as api_proto.ConversationItemDeleteEvent);
}
🔴 Deleting an interrupted message from the server while still adding it locally causes conversation history to desync
An interrupted conversation item is deleted from the remote API (type: 'conversation.item.delete' at plugins/openai/src/realtime/realtime_model.ts:921) when zero audio was reported, but the caller still inserts that same item into the local chat context (agents/src/voice/agent_activity.ts:3460), so future context syncs will re-create the deleted item on the server.
Impact: The AI model's conversation history accumulates ghost messages that were supposed to be deleted, potentially causing confusing or incoherent responses.
Mechanism: delete vs. truncate produces new local/remote desync
Previously, when audioEndMs was 0, the code sent conversation.item.truncate with audio_end_ms: 0. The server's conversation.item.truncated response is NOT handled in the event dispatcher (no case for it around realtime_model.ts:1230), so the item remained in remoteChatCtx. The local chatCtx also had the item — no desync.
With the new code, when audioEndMs === 0, a conversation.item.delete is sent instead. The server responds with conversation.item.deleted, which IS handled (realtime_model.ts:1234). handleConversationItemDeleted at realtime_model.ts:1540 removes the item from remoteChatCtx. Meanwhile, agent_activity.ts:3452-3462 still inserts the item (with its forwardedText) into this.agent._chatCtx.
This creates a new desync: the item exists locally but not remotely. When createChatCtxUpdateEvents (realtime_model.ts:667) runs next, it diffs local vs. remote via computeChatCtxDiff at line 692, sees the item only in local, and generates a conversation.item.create event to re-add it to the server.
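The re-creation step can be illustrated with a toy diff. This is only a sketch: `ChatItem` and `diffIds` are hypothetical stand-ins, not the real chat context types or `computeChatCtxDiff`, and the assumption is simply that the diff creates whatever exists locally but not remotely.

```typescript
// Toy model of the desync. All names are hypothetical stand-ins, not the
// real chat context types or computeChatCtxDiff.
type ChatItem = { id: string; text: string };

// Returns ids present locally but missing remotely (to create) and ids
// present remotely but missing locally (to delete).
function diffIds(
  local: ChatItem[],
  remote: ChatItem[],
): { toCreate: string[]; toDelete: string[] } {
  const localIds = new Set(local.map((i) => i.id));
  const remoteIds = new Set(remote.map((i) => i.id));
  return {
    toCreate: local.filter((i) => !remoteIds.has(i.id)).map((i) => i.id),
    toDelete: remote.filter((i) => !localIds.has(i.id)).map((i) => i.id),
  };
}

const interrupted: ChatItem = { id: 'item_1', text: 'Hello, I' };

// Old behavior: truncate leaves the item in the remote context, so the
// local and remote views agree and the diff is empty.
const afterTruncate = diffIds([interrupted], [interrupted]);

// New behavior: the delete removes the item remotely, but the caller still
// inserts it locally, so the diff asks to create it again.
const afterDelete = diffIds([interrupted], []);
```

In this model `afterDelete.toCreate` contains `item_1`, which corresponds to the conversation.item.create event that brings the deleted item back.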
The audioEndMs === 0 condition is hit when:
1. The avatar transport fails to report a playback position (NaN clamped to 0 at agent_activity.ts:3438-3440), even though the user may have heard audio and synchronizedTranscript is non-empty.
2. The audio was genuinely interrupted at position 0.
In scenario 1, the synchronized transcript is often non-empty (some words were forwarded before interruption), so the if (!forwardedText) continue guard at agent_activity.ts:3449 doesn't prevent the local insert.
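A minimal sketch of scenario 1, assuming the clamp behaves roughly like the hypothetical `toAudioEndMs` helper below (the actual code at agent_activity.ts:3438-3440 may differ): a missing playback position collapses to 0 while the forwarded text is still non-empty, so the remote delete and the local insert both happen.

```typescript
// Hypothetical helper: converts a playback position in seconds to
// audio_end_ms, clamping non-finite or negative values to 0.
function toAudioEndMs(playbackPositionInS: number): number {
  if (!Number.isFinite(playbackPositionInS) || playbackPositionInS < 0) {
    return 0;
  }
  return Math.floor(playbackPositionInS * 1000);
}

// Scenario 1: the transport reported no position, but some words were
// already forwarded before the interruption.
const audioEndMs = toAudioEndMs(NaN);
const forwardedText: string = 'Hello, I';

// The two decisions are made independently and disagree.
const willDeleteRemotely = audioEndMs === 0;
const willInsertLocally = forwardedText.length > 0;
```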
Prompt for agents
The truncate() method now sends conversation.item.delete when audioEndMs is 0, but the caller in agents/src/voice/agent_activity.ts (around line 3452-3462) still inserts the same item into the local agent chat context. This creates a local/remote chat context desync that causes re-creation of the deleted item on the next updateChatCtx call.
Two possible approaches to fix:
1. In agent_activity.ts addRealtimeMessageOutputs (around line 3422), skip local chatCtx insertion when audioEndMs would be 0 (i.e. when the item will be deleted rather than truncated). This could be done by checking if playbackPositionInS results in audioEndMs > 0 before adding to chatCtx.
2. Alternatively, instead of directly sending conversation.item.delete inside truncate(), emit an event or set a flag that the caller can check, so the caller knows not to add the item to local chatCtx. This keeps the logic centralized.
The key insight is that both the delete-from-remote and add-to-local decisions need to be coordinated. Currently they are made independently in different code paths.
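One way to coordinate the two decisions is to derive them from a single function, so the remote action and the local insert cannot disagree. The sketch below is an assumption about how approach 1 might look; `InterruptionPlan` and `planInterruption` are hypothetical names and do not exist in the codebase.

```typescript
// Hypothetical plan type: one place decides both the remote action and
// whether the item is added to the local chat context.
interface InterruptionPlan {
  remoteAction: 'truncate' | 'delete';
  insertLocally: boolean;
}

function planInterruption(audioEndMs: number, forwardedText: string): InterruptionPlan {
  if (audioEndMs > 0) {
    // Some audio played: truncate remotely and keep the item locally,
    // provided any text was actually forwarded.
    return { remoteAction: 'truncate', insertLocally: forwardedText.length > 0 };
  }
  // No audio played: the item is deleted remotely, so skip the local
  // insert to keep both contexts in agreement.
  return { remoteAction: 'delete', insertLocally: false };
}
```

A caller would consult the same plan both before sending the event and before touching the local chat context, instead of re-deriving the condition in two code paths.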
if (audioEndMs > 0) {
  this.sendEvent({
    type: 'conversation.item.truncate',
    content_index: 0,
    item_id: _options.messageId,
    audio_end_ms: audioEndMs,
  } as api_proto.ConversationItemTruncateEvent);
} else {
  this.sendEvent({
    type: 'conversation.item.delete',
    item_id: _options.messageId,
    event_id: shortuuid('chat_ctx_delete_'),
  } as api_proto.ConversationItemDeleteEvent);
}
🚩 Semantic shift from truncate-to-zero to full deletion is a significant behavioral change
The old code sent conversation.item.truncate with audio_end_ms: 0, which keeps the item in the OpenAI conversation but removes its audio content. The new code sends conversation.item.delete, which completely removes the item from the server's conversation history. This is a fundamentally different operation with different downstream effects — truncation preserves the item (and any text transcript) in the server context, while deletion removes all traces of it. The PR doesn't document whether this change was motivated by the OpenAI API rejecting audio_end_ms: 0 or by a product decision. If the API does reject 0, then deletion may be correct — but a log or comment explaining that motivation would help future readers. This also changes the contract of the truncate() method: callers using it as an abstract interface (agents/src/llm/realtime.ts:166) may not expect it to delete items.
Port of livekit/agents#6158.
Summary
conversation.item.truncate.
Testing
- pnpm --filter @livekit/agents-plugin-openai lint (passes with existing warnings)
- pnpm --filter @livekit/agents build
- pnpm --filter @livekit/agents-plugins-test build
- pnpm --filter @livekit/agents-plugin-silero build
- pnpm --filter @livekit/agents-plugin-openai build
Ported from livekit/agents#6158
Original PR description
Fixes #6157
Skip the realtime conversation.item.truncate when the generation is interrupted before any audio frame has played (audio_end_ms == 0), which the Realtime API rejects with unsupported_content_type ("Only model output audio messages can be truncated").