fix(openai): skip realtime truncate when no audio played #1903

Open
rosetta-livekit-bot[bot] wants to merge 1 commit into main from deployed-lamb-tenancy

@rosetta-livekit-bot rosetta-livekit-bot Bot commented Jun 29, 2026

Port of livekit/agents#6158.

Summary

  • Delete OpenAI realtime audio items when interruption playback duration is 0 ms instead of sending conversation.item.truncate.
  • Keep sending truncate events when audio has actually played.

Testing

  • pnpm --filter @livekit/agents-plugin-openai lint (passes with existing warnings)
  • pnpm --filter @livekit/agents build
  • pnpm --filter @livekit/agents-plugins-test build
  • pnpm --filter @livekit/agents-plugin-silero build
  • pnpm --filter @livekit/agents-plugin-openai build

Ported from livekit/agents#6158

Original PR description

Fixes #6157

Skip the realtime conversation.item.truncate when the generation is
interrupted before any audio frame has played (audio_end_ms == 0). The
Realtime API rejects a truncate in that case with unsupported_content_type
("Only model output audio messages can be truncated").

changeset-bot Bot commented Jun 29, 2026

⚠️ No Changeset found

Latest commit: e9ebacc

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

@rosetta-livekit-bot rosetta-livekit-bot Bot requested a review from longcw June 29, 2026 12:03

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 2 potential issues.

Comment on lines +919 to +925
} else {
  this.sendEvent({
    type: 'conversation.item.delete',
    item_id: _options.messageId,
    event_id: shortuuid('chat_ctx_delete_'),
  } as api_proto.ConversationItemDeleteEvent);
}

🔴 Deleting an interrupted message from the server while still adding it locally causes conversation history to desync

An interrupted conversation item is deleted from the remote API (type: 'conversation.item.delete' at plugins/openai/src/realtime/realtime_model.ts:921) when zero audio was reported, but the caller still inserts that same item into the local chat context (agents/src/voice/agent_activity.ts:3460), so future context syncs will re-create the deleted item on the server.

Impact: The AI model's conversation history accumulates ghost messages that were supposed to be deleted, potentially causing confusing or incoherent responses.

Mechanism: delete vs. truncate produces new local/remote desync

Previously, when audioEndMs was 0, the code sent conversation.item.truncate with audio_end_ms: 0. The server's conversation.item.truncated response is NOT handled in the event dispatcher (no case for it around realtime_model.ts:1230), so the item remained in remoteChatCtx. The local chatCtx also had the item — no desync.

With the new code, when audioEndMs === 0, a conversation.item.delete is sent instead. The server responds with conversation.item.deleted, which IS handled (realtime_model.ts:1234). handleConversationItemDeleted at realtime_model.ts:1540 removes the item from remoteChatCtx. Meanwhile, agent_activity.ts:3452-3462 still inserts the item (with its forwardedText) into this.agent._chatCtx.

This creates a new desync: the item exists locally but not remotely. When createChatCtxUpdateEvents (realtime_model.ts:667) runs next, it diffs local vs. remote via computeChatCtxDiff at line 692, sees the item only in local, and generates a conversation.item.create event to re-add it to the server.
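
The re-creation described above can be illustrated with a small sketch. It is a simplified stand-in for the diff step, not the actual computeChatCtxDiff; the names itemsToCreate and PendingCreate and the item IDs are hypothetical.

```typescript
// Hypothetical, simplified model of the local-vs-remote diff.
// It only tracks item IDs; the real diff also handles ordering and removals.
type ItemId = string;

interface PendingCreate {
  kind: 'create';
  itemId: ItemId;
}

// Items present locally but missing remotely are scheduled for re-creation.
function itemsToCreate(localIds: ItemId[], remoteIds: ItemId[]): PendingCreate[] {
  const remote = new Set(remoteIds);
  return localIds
    .filter((id) => !remote.has(id))
    .map((id) => ({ kind: 'create' as const, itemId: id }));
}

// Old behavior: truncate leaves msg_2 in both contexts, so nothing is re-created.
const beforeChange = itemsToCreate(['msg_1', 'msg_2'], ['msg_1', 'msg_2']);

// New behavior: the delete removed msg_2 remotely, but it was still inserted locally,
// so the next sync schedules it for creation again.
const afterChange = itemsToCreate(['msg_1', 'msg_2'], ['msg_1']);

console.log(beforeChange.length);
console.log(afterChange.map((e) => e.itemId));
```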

The audioEndMs === 0 condition is hit when:

  1. The avatar transport fails to report a playback position (NaN clamped to 0 at agent_activity.ts:3438-3440), even though the user may have heard audio and synchronizedTranscript is non-empty.
  2. The audio was genuinely interrupted at position 0.

In scenario 1, the synchronized transcript is often non-empty (some words were forwarded before interruption), so the if (!forwardedText) continue guard at agent_activity.ts:3449 doesn't prevent the local insert.
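
A minimal sketch of why the two scenarios are indistinguishable once the position has been clamped. The helper toAudioEndMs is hypothetical and only approximates the clamping described above; the real code in agent_activity.ts may differ.

```typescript
// Hypothetical helper: converts a reported playback position (seconds) to
// milliseconds, mapping NaN, non-finite, or negative values to 0.
function toAudioEndMs(playbackPositionInS: number): number {
  const ms = Math.floor(playbackPositionInS * 1000);
  return Number.isFinite(ms) && ms > 0 ? ms : 0;
}

// Scenario 1: the transport failed to report a position.
const missingPosition = toAudioEndMs(Number.NaN);

// Scenario 2: playback was genuinely interrupted at position 0.
const interruptedAtStart = toAudioEndMs(0);

// Audio that actually played keeps its position.
const playedAudio = toAudioEndMs(1.5);

console.log(missingPosition, interruptedAtStart, playedAudio);
```

Both scenarios yield 0, so a check on audioEndMs alone cannot tell "nothing played" apart from "position unknown".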

Prompt for agents
The truncate() method now sends conversation.item.delete when audioEndMs is 0, but the caller in agents/src/voice/agent_activity.ts (around line 3452-3462) still inserts the same item into the local agent chat context. This creates a local/remote chat context desync that causes re-creation of the deleted item on the next updateChatCtx call.

Two possible approaches to fix:

1. In agent_activity.ts addRealtimeMessageOutputs (around line 3422), skip local chatCtx insertion when audioEndMs would be 0 (i.e. when the item will be deleted rather than truncated). This could be done by checking if playbackPositionInS results in audioEndMs > 0 before adding to chatCtx.

2. Alternatively, instead of directly sending conversation.item.delete inside truncate(), emit an event or set a flag that the caller can check, so the caller knows not to add the item to local chatCtx. This keeps the logic centralized.

The key insight is that both the delete-from-remote and add-to-local decisions need to be coordinated. Currently they are made independently in different code paths.
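
One way to coordinate the two decisions is to compute them together and let both code paths consume the same result. The sketch below assumes a hypothetical planInterruption helper and InterruptionPlan type; neither exists in the codebase, and the sketch does not resolve the case where audioEndMs is 0 only because the playback position was unavailable.

```typescript
// Hypothetical: a single decision object shared by the realtime session
// (remote action) and the caller (local chat context insertion).
type InterruptionPlan =
  | { remote: 'truncate'; audioEndMs: number; addToLocalChatCtx: true }
  | { remote: 'delete'; addToLocalChatCtx: false };

function planInterruption(audioEndMs: number): InterruptionPlan {
  if (audioEndMs > 0) {
    // Audio played: truncate remotely and keep the item locally.
    return { remote: 'truncate', audioEndMs, addToLocalChatCtx: true };
  }
  // No audio played: delete remotely and skip the local insert,
  // so the next diff has nothing to re-create.
  return { remote: 'delete', addToLocalChatCtx: false };
}

const played = planInterruption(750);
const notPlayed = planInterruption(0);

console.log(played.remote, played.addToLocalChatCtx);
console.log(notPlayed.remote, notPlayed.addToLocalChatCtx);
```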
Comment on lines +912 to +925
if (audioEndMs > 0) {
  this.sendEvent({
    type: 'conversation.item.truncate',
    content_index: 0,
    item_id: _options.messageId,
    audio_end_ms: audioEndMs,
  } as api_proto.ConversationItemTruncateEvent);
} else {
  this.sendEvent({
    type: 'conversation.item.delete',
    item_id: _options.messageId,
    event_id: shortuuid('chat_ctx_delete_'),
  } as api_proto.ConversationItemDeleteEvent);
}

🚩 Semantic shift from truncate-to-zero to full deletion is a significant behavioral change

The old code sent conversation.item.truncate with audio_end_ms: 0, which keeps the item in the OpenAI conversation but removes its audio content. The new code sends conversation.item.delete, which completely removes the item from the server's conversation history. This is a fundamentally different operation with different downstream effects — truncation preserves the item (and any text transcript) in the server context, while deletion removes all traces of it. The PR doesn't document whether this change was motivated by the OpenAI API rejecting audio_end_ms: 0 or by a product decision. If the API does reject 0, then deletion may be correct — but a log or comment explaining that motivation would help future readers. This also changes the contract of the truncate() method: callers using it as an abstract interface (agents/src/llm/realtime.ts:166) may not expect it to delete items.
