fix(openai-realtime): cancel timed-out response.create to prevent late playback#6238

Open
tsushanth wants to merge 1 commit into
livekit:mainfrom
tsushanth:fix/cancel-timed-out-response

Conversation

@tsushanth

Fixes #6222

What is happening

When generate_reply times out (10 s default), _on_timeout sets an exception on the caller's future but never sends response.cancel to the server. The server continues processing the response and eventually delivers response.created. _handle_response_created detects that the matched future is already done and logs a one-line warning, but then unconditionally calls self.emit("generation_created", generation_ev), wiring up a SpeechHandle that plays back audio the caller has already given up on.

Fix

Two targeted changes to RealtimeSession in realtime_model.py:

1. _on_timeout: ask the server to stop early

After setting the exception on the future, send response.cancel. This is the same event the _on_fut_done callback sends when the future is explicitly cancelled (e.g. interrupted by the user). The timeout path was the only path that skipped it.
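
The timeout path can be modeled with a small, self-contained sketch. FakeSession, its pending map, the local RealtimeError class, and the dict-shaped events are hypothetical stand-ins, not the plugin's RealtimeSession or its event types. Only the ordering (set the exception, then send response.cancel) mirrors the change described here.

```python
import asyncio


class RealtimeError(Exception):
    """Local stand-in for the SDK's error type (assumption for this sketch)."""


class FakeSession:
    """Hypothetical, simplified model of a realtime session."""

    def __init__(self) -> None:
        self.sent: list[dict] = []  # events "sent" to the server
        self.pending: dict[str, asyncio.Future] = {}

    def send_event(self, event: dict) -> None:
        self.sent.append(event)

    def generate_reply(self, event_id: str, timeout: float = 10.0) -> asyncio.Future:
        loop = asyncio.get_running_loop()
        fut: asyncio.Future = loop.create_future()
        self.pending[event_id] = fut
        self.send_event({"type": "response.create", "event_id": event_id})

        def _on_timeout() -> None:
            if fut.done():
                return
            fut.set_exception(RealtimeError("generate_reply timed out."))
            # The step the timeout path was missing: ask the server to stop.
            self.send_event({"type": "response.cancel"})

        handle = loop.call_later(timeout, _on_timeout)
        # Stop the timer once the future settles for any reason.
        fut.add_done_callback(lambda _: handle.cancel())
        return fut


async def demo() -> list[str]:
    session = FakeSession()
    try:
        # A tiny timeout stands in for the 10 s default; no server ever answers.
        await session.generate_reply("evt_1", timeout=0.01)
    except RealtimeError:
        pass
    return [event["type"] for event in session.sent]


sent_types = asyncio.run(demo())
```

In this model the caller sees the timeout error and the outgoing event log ends with response.cancel, which is the behavior the first change aims for.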

2. _handle_response_created: suppress the late emission

When the matched future is already done (timed-out case), suppress the emit("generation_created") call so no SpeechHandle is created and no audio is queued for playback. A second response.cancel is also sent here as a belt-and-suspenders guard for the window between the timeout callback firing and the server actually cancelling.

_current_generation is intentionally kept alive rather than closed immediately. Any response.output_item.* events that arrive before the server honours the cancel would trip their assert self._current_generation is not None guards if we cleared it early, which is the same race that the comment in _handle_response_done already acknowledges. response.done arrives shortly after and closes the generation through the normal path.
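
A minimal sketch of the intended handler behavior follows. It is a simplified model with hypothetical names (FakeSession, futures, emitted, current_generation), and it assumes the already-done future can still be found when response.created arrives.

```python
import asyncio
from typing import Optional


class FakeSession:
    """Hypothetical model; attribute names are illustrative only."""

    def __init__(self) -> None:
        self.futures: dict[str, asyncio.Future] = {}
        self.sent: list[str] = []
        self.emitted: list[str] = []
        self.current_generation: Optional[str] = None

    def handle_response_created(self, response_id: str, client_event_id: Optional[str]) -> None:
        # The generation is tracked either way so later events can find it.
        self.current_generation = response_id
        timed_out = False
        fut = self.futures.pop(client_event_id, None) if client_event_id else None
        if fut is not None:
            if not fut.done():
                fut.set_result(response_id)
            else:
                # The caller already got a timeout error: cancel, do not announce.
                self.sent.append("response.cancel")
                timed_out = True
        if not timed_out:
            self.emitted.append(response_id)

    def handle_output_item_added(self, item_id: str) -> str:
        # Mirrors the assertion-style guard mentioned in the description.
        assert self.current_generation is not None
        return f"{self.current_generation}:{item_id}"

    def handle_response_done(self) -> None:
        # Normal close path for the generation.
        self.current_generation = None


async def demo():
    session = FakeSession()
    fut = asyncio.get_running_loop().create_future()
    fut.set_exception(TimeoutError("generate_reply timed out."))
    fut.exception()  # mark the exception as retrieved to avoid a warning log
    session.futures["evt_1"] = fut  # ASSUMPTION: the done future is still findable

    session.handle_response_created("resp_1", "evt_1")
    item = session.handle_output_item_added("item_1")  # guard does not trip
    session.handle_response_done()
    return session.emitted, session.sent, item, session.current_generation


result = asyncio.run(demo())
```

In this model nothing is emitted for the late response, one cancel is sent, the output-item guard still passes, and the generation is cleared only when response.done is handled.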

Testing

Reproduce per the reporter's steps: reduce the call_later timeout from 10.0 to 0.01 in generate_reply, call await agent_session.generate_reply(), and observe that no audio plays back after the RealtimeError("generate_reply timed out.") is raised.

…e playback

When generate_reply timed out (10 s default), the SDK set an exception on
the caller's future but never sent response.cancel to the server, so the
server continued processing and eventually emitted response.created.
_handle_response_created detected the "already done" future, logged a
warning, but still called emit("generation_created"), wiring up a
SpeechHandle that played back audio the caller never expected.

Two-part fix:
1. _on_timeout now sends response.cancel immediately after setting the
   exception, asking the server to stop the in-flight response early.
2. _handle_response_created suppresses emit("generation_created") when the
   matched future is already done (timed out).  _current_generation is kept
   alive so that any response.output_item.* events that arrive before the
   server honours the cancel do not trip their assertions; response.done will
   close the generation normally via the existing path.

Fixes livekit#6222
@tsushanth tsushanth requested a review from a team as a code owner June 26, 2026 02:21

@devin-ai-integration devin-ai-integration Bot left a comment


Devin Review found 2 potential issues.

Comment on lines 1735 to +1753
                 generation_ev.user_initiated = True
                 fut.set_result(generation_ev)
             else:
-                logger.warning("response of generate_reply received after it's timed out.")
+                # The generate_reply caller already received a timeout error. The
+                # server kept running and delivered response.created anyway. Cancel it
+                # so no audio frames are queued for playback. We keep
+                # _current_generation alive so that the subsequent response.output_item.*
+                # and response.done events (which may arrive before the server honours
+                # the cancel) do not trip their assertions; response.done will close the
+                # generation normally.
+                logger.warning(
+                    "response of generate_reply received after it's timed out; "
+                    "cancelling to prevent unexpected playback."
+                )
+                self.send_event(ResponseCancelEvent(type="response.cancel"))
+                timed_out = True

-        self.emit("generation_created", generation_ev)
+        if not timed_out:
+            self.emit("generation_created", generation_ev)



🔴 Timeout detection in response handler is unreachable, so unexpected audio playback still occurs after timeout

A late-arriving response is never detected as timed out. By the time _handle_response_created calls self._response_created_futures.pop(client_event_id, None) (livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/realtime/realtime_model.py:1732), the timeout callback has already removed the future from the dictionary (realtime_model.py:1584). The pop returns None, so the generation event is still emitted to listeners and triggers unintended speech playback.

Impact: After a generate_reply timeout, audio from the late server response can still play back to the user, which is the exact scenario the PR intends to prevent.

Detailed mechanism: _on_timeout pops the future before _handle_response_created can find it

In _on_timeout (lines 1583-1589):

  1. Line 1584: self._response_created_futures.pop(event_id, None) removes the future from the dict
  2. Line 1586: fut.set_exception(...) marks the future as done
  3. Line 1589: sends ResponseCancelEvent

Later, when the server delivers response.created (because the cancel wasn't processed yet), _handle_response_created runs:

  • Line 1732: self._response_created_futures.pop(client_event_id, None) returns None (already removed)
  • The walrus assignment makes the condition falsy, so the entire if block (lines 1729-1750) is skipped
  • timed_out stays False
  • Line 1753: self.emit("generation_created", generation_ev) fires with user_initiated=False

The _on_generation_created handler at livekit-agents/livekit/agents/voice/agent_activity.py:1744 then processes this event (since user_initiated is False, it doesn't return early at line 1745), creates a SpeechHandle, and starts a _realtime_generation_task, causing unexpected audio playback.

To fix this, _on_timeout should NOT pop the future from the dict, so that _handle_response_created can later detect it as done. Alternatively, keep a separate set of timed-out event_ids that _handle_response_created can check.

(Refers to lines 1728-1753)

Prompt for agents
The bug is that _on_timeout (line 1584) pops the future from self._response_created_futures BEFORE _handle_response_created can check it. This makes the else branch at line 1737 unreachable for timeouts.

The fix should ensure that when response.created arrives for a timed-out generate_reply, the code can detect it. Two possible approaches:

1. In _on_timeout, do NOT pop from self._response_created_futures. Instead, leave the (now-done) future in the dict so _handle_response_created can find it and check fut.done(). Then rely on _handle_response_created to pop it. You'd also need to ensure _on_fut_done doesn't pop it prematurely (since set_exception triggers the done callback).

2. Maintain a separate set (e.g., self._timed_out_event_ids) that _on_timeout adds to, and _handle_response_created checks. This avoids changing the existing pop semantics.

Either way, the goal is: when _handle_response_created runs and finds the client_event_id corresponds to a timed-out request, it should set timed_out=True, send the cancel, and suppress the generation_created emission.
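
The second approach could look roughly like the sketch below. It is a self-contained model under stated assumptions: FakeSession, timed_out_event_ids, and the string-based event log are hypothetical and do not reflect the plugin's actual attributes or event classes.

```python
import asyncio
from typing import Optional


class FakeSession:
    """Hypothetical model of approach 2: a separate set of timed-out event ids."""

    def __init__(self) -> None:
        self.futures: dict[str, asyncio.Future] = {}
        self.timed_out_event_ids: set[str] = set()
        self.sent: list[str] = []
        self.emitted: list[str] = []

    def on_timeout(self, event_id: str) -> None:
        fut = self.futures.pop(event_id, None)  # existing pop semantics are kept
        if fut is None or fut.done():
            return
        fut.set_exception(TimeoutError("generate_reply timed out."))
        self.timed_out_event_ids.add(event_id)  # remember it for the late response
        self.sent.append("response.cancel")

    def handle_response_created(self, response_id: str, client_event_id: Optional[str]) -> None:
        if client_event_id in self.timed_out_event_ids:
            # Late response for a request the caller gave up on: cancel and
            # skip the generation_created emission.
            self.timed_out_event_ids.discard(client_event_id)
            self.sent.append("response.cancel")
            return
        fut = self.futures.pop(client_event_id, None)
        if fut is not None and not fut.done():
            fut.set_result(response_id)
        self.emitted.append(response_id)


async def demo():
    session = FakeSession()
    fut = asyncio.get_running_loop().create_future()
    session.futures["evt_1"] = fut
    session.on_timeout("evt_1")
    fut.exception()  # mark the exception as retrieved to avoid a warning log

    session.handle_response_created("resp_late", "evt_1")  # suppressed
    session.handle_response_created("resp_vad", None)  # server-initiated, emitted
    return session.emitted, session.sent, session.timed_out_event_ids


outcome = asyncio.run(demo())
```

One design caveat in this sketch: if response.created never arrives for a timed-out request, its id stays in the set, so a real implementation would likely need some cleanup policy.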

Comment on lines +1587 to +1589
# The response.create was already sent; ask the server to cancel it so
# that any audio it produces does not arrive and play back unexpectedly.
self.send_event(ResponseCancelEvent(type="response.cancel"))


🚩 The cancel sent by _on_timeout may cancel an unrelated active response

At livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/realtime/realtime_model.py:1589, the newly added ResponseCancelEvent is sent without specifying a response_id. If the generate_reply's response hasn't actually started on the server (no response.created received yet), but another response IS active (e.g., a VAD-triggered server-initiated response), the cancel may target that unrelated response instead. The OpenAI Realtime API's response.cancel cancels the currently in-progress response, which may not be the one that timed out. This is the same pattern used elsewhere in the file (e.g., line 1610 in interrupt()), so it's consistent with existing behavior, but worth noting as a potential race condition in multi-response scenarios.
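
The race can be illustrated with a toy model. FakeServer and its cancel semantics are assumptions made for this sketch (one in-progress response at a time, an optional response_id on cancel, as the comment above implies), not a description of the actual server implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeServer:
    """Hypothetical server model with at most one in-progress response."""

    active_response_id: Optional[str] = None
    cancelled: List[str] = field(default_factory=list)

    def cancel(self, response_id: Optional[str] = None) -> None:
        # ASSUMPTION: without an id, the cancel hits whatever is in progress;
        # with an id, it only applies if that response is the active one.
        target = response_id if response_id is not None else self.active_response_id
        if target is not None and target == self.active_response_id:
            self.cancelled.append(target)
            self.active_response_id = None


# Untargeted cancel: the timed-out response has not started, but a
# VAD-triggered response is active, so that one is cancelled instead.
untargeted = FakeServer(active_response_id="resp_vad")
untargeted.cancel()

# Targeted cancel: naming the timed-out response leaves the active one alone.
targeted = FakeServer(active_response_id="resp_vad")
targeted.cancel(response_id="resp_timed_out")
```

In the timeout callback itself no response id is known yet, so a targeted cancel would only be possible once response.created has delivered one.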

Development

Successfully merging this pull request may close these issues.

Playback still happen for late response.create (after the timeout of its corresponding generate_reply)