fix(openai-realtime): cancel timed-out response.create to prevent late playback #6238
tsushanth wants to merge 1 commit into
When generate_reply timed out (10 s default), the SDK set an exception on
the caller's future but never sent response.cancel to the server, so the
server continued processing and eventually emitted response.created.
_handle_response_created detected the "already done" future, logged a
warning, but still called emit("generation_created"), wiring up a
SpeechHandle that played back audio the caller never expected.
Two-part fix:
1. _on_timeout now sends response.cancel immediately after setting the
exception, asking the server to stop the in-flight response early.
2. _handle_response_created suppresses emit("generation_created") when the
matched future is already done (timed out). _current_generation is kept
alive so that any response.output_item.* events that arrive before the
server honours the cancel do not trip their assertions; response.done will
close the generation normally via the existing path.
Fixes livekit#6222
| generation_ev.user_initiated = True | ||
| fut.set_result(generation_ev) | ||
| else: | ||
| logger.warning("response of generate_reply received after it's timed out.") | ||
| # The generate_reply caller already received a timeout error. The | ||
| # server kept running and delivered response.created anyway. Cancel it | ||
| # so no audio frames are queued for playback. We keep | ||
| # _current_generation alive so that the subsequent response.output_item.* | ||
| # and response.done events (which may arrive before the server honours | ||
| # the cancel) do not trip their assertions; response.done will close the | ||
| # generation normally. | ||
| logger.warning( | ||
| "response of generate_reply received after it's timed out; " | ||
| "cancelling to prevent unexpected playback." | ||
| ) | ||
| self.send_event(ResponseCancelEvent(type="response.cancel")) | ||
| timed_out = True | ||
| self.emit("generation_created", generation_ev) | ||
| if not timed_out: | ||
| self.emit("generation_created", generation_ev) |
🔴 Timeout detection in response handler is unreachable, so unexpected audio playback still occurs after timeout
The late-arriving response is never detected as timed out. The lookup self._response_created_futures.pop(client_event_id, None) at livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/realtime/realtime_model.py:1732 returns None because the timeout callback has already removed the future from the dictionary (realtime_model.py:1584), so the generation event is still emitted to listeners and triggers unintended speech playback.
Impact: After a generate_reply timeout, audio from the late server response can still play back to the user, which is the exact scenario the PR intends to prevent.
Detailed mechanism: _on_timeout pops the future before _handle_response_created can find it
In _on_timeout (lines 1583-1589):
- Line 1584: self._response_created_futures.pop(event_id, None) removes the future from the dict
- Line 1586: fut.set_exception(...) marks the future as done
- Line 1589: sends ResponseCancelEvent
Later, when the server delivers response.created (because the cancel wasn't processed yet), _handle_response_created runs:
- Line 1732: self._response_created_futures.pop(client_event_id, None) returns None (already removed)
- The walrus assignment makes the condition falsy, so the entire if block (lines 1729-1750) is skipped
- timed_out stays False
- Line 1753: self.emit("generation_created", generation_ev) fires with user_initiated=False
The _on_generation_created handler at livekit-agents/livekit/agents/voice/agent_activity.py:1744 then processes this event (since user_initiated is False, it doesn't return early at line 1745), creates a SpeechHandle, and starts a _realtime_generation_task, causing unexpected audio playback.
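The ordering problem described above can be reproduced in isolation. The following is a minimal sketch, not the plugin's code: `late_response_is_detected` and the `"evt_1"` key are hypothetical, and the two commented sections only imitate the shape of `_on_timeout` and `_handle_response_created`.

```python
import asyncio


def late_response_is_detected(pop_on_timeout: bool) -> bool:
    """Return True if the response.created step sees the timed-out future.

    Hypothetical stand-in for the real handlers; only the dict/future
    bookkeeping is modelled.
    """
    loop = asyncio.new_event_loop()
    try:
        futures = {}
        fut = loop.create_future()
        futures["evt_1"] = fut

        # Timeout callback: optionally pop the future, then mark it done.
        if pop_on_timeout:
            futures.pop("evt_1", None)
        fut.set_exception(TimeoutError("generate_reply timed out."))

        # response.created handler: the else branch needs the pop to succeed.
        timed_out = False
        if found := futures.pop("evt_1", None):
            if found.done():
                timed_out = True

        fut.exception()  # retrieve the exception so asyncio does not warn
        return timed_out
    finally:
        loop.close()
```

With `pop_on_timeout=True` the handler never finds the future, so the timed-out branch cannot run; with `pop_on_timeout=False` it does.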
To fix this, _on_timeout should NOT pop the future from the dict, so that _handle_response_created can later detect it as done. Alternatively, keep a separate set of timed-out event_ids that _handle_response_created can check.
(Refers to lines 1728-1753)
Prompt for agents
The bug is that _on_timeout (line 1584) pops the future from self._response_created_futures BEFORE _handle_response_created can check it. This makes the else branch at line 1737 unreachable for timeouts.
The fix should ensure that when response.created arrives for a timed-out generate_reply, the code can detect it. Two possible approaches:
1. In _on_timeout, do NOT pop from self._response_created_futures. Instead, leave the (now-done) future in the dict so _handle_response_created can find it and check fut.done(). Then rely on _handle_response_created to pop it. You'd also need to ensure _on_fut_done doesn't pop it prematurely (since set_exception triggers the done callback).
2. Maintain a separate set (e.g., self._timed_out_event_ids) that _on_timeout adds to, and _handle_response_created checks. This avoids changing the existing pop semantics.
Either way, the goal is: when _handle_response_created runs and finds the client_event_id corresponds to a timed-out request, it should set timed_out=True, send the cancel, and suppress the generation_created emission.
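Approach 2 could look roughly like the sketch below. It is an illustration under assumptions: `TimedOutTracker` and its fields are hypothetical, outgoing events are recorded as plain strings instead of being sent, and the real session would also need to clear stale ids (for example on response.done or when the session closes) so the set cannot grow without bound.

```python
from dataclasses import dataclass, field


@dataclass
class TimedOutTracker:
    """Hypothetical helper that remembers which client event ids timed out."""

    pending: dict = field(default_factory=dict)       # event_id -> caller state
    timed_out_ids: set = field(default_factory=set)   # ids that already timed out
    sent: list = field(default_factory=list)          # outgoing events (recorded)
    emitted: list = field(default_factory=list)       # generation_created emissions

    def on_timeout(self, event_id: str) -> None:
        # Existing pop semantics stay unchanged; the id is remembered separately.
        self.pending.pop(event_id, None)
        self.timed_out_ids.add(event_id)
        self.sent.append("response.cancel")

    def on_response_created(self, client_event_id: str) -> bool:
        """Return True if generation_created was emitted."""
        if client_event_id in self.timed_out_ids:
            self.timed_out_ids.discard(client_event_id)
            self.sent.append("response.cancel")  # cancel the late response
            return False                          # suppress the emission
        self.pending.pop(client_event_id, None)
        self.emitted.append(client_event_id)
        return True
```

Keeping the timed-out ids in their own set leaves the done-callback and pop logic untouched, at the cost of one more piece of state to clean up.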
| # The response.create was already sent; ask the server to cancel it so | ||
| # that any audio it produces does not arrive and play back unexpectedly. | ||
| self.send_event(ResponseCancelEvent(type="response.cancel")) |
🚩 The cancel sent by _on_timeout may cancel an unrelated active response
At livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/realtime/realtime_model.py:1589, the newly added ResponseCancelEvent is sent without specifying a response_id. If the generate_reply's response hasn't actually started on the server (no response.created received yet), but another response IS active (e.g., a VAD-triggered server-initiated response), the cancel may target that unrelated response instead. The OpenAI Realtime API's response.cancel cancels the currently in-progress response, which may not be the one that timed out. This is the same pattern used elsewhere in the file (e.g., line 1610 in interrupt()), so it's consistent with existing behavior, but worth noting as a potential race condition in multi-response scenarios.
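One way to narrow the race is to target the cancel once the response id is known, that is, when the late response.created arrives. The sketch below builds raw event payloads as dicts and assumes the optional `response_id` field on `response.cancel` is available in the Realtime API version in use; `build_cancel_event` and `cancel_for_late_response` are hypothetical helpers, not part of the plugin.

```python
from typing import Optional


def build_cancel_event(response_id: Optional[str] = None) -> dict:
    """Build a response.cancel payload, targeted if an id is known."""
    event = {"type": "response.cancel"}
    if response_id is not None:
        # Assumption: the server accepts response_id to cancel one specific response.
        event["response_id"] = response_id
    return event


def cancel_for_late_response(created_event: dict) -> dict:
    """Build a targeted cancel from a response.created server event."""
    return build_cancel_event(created_event["response"]["id"])
```

At timeout no id is available yet, so an untargeted cancel is the only option there; the targeted form only helps in the late response.created path.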
Fixes #6222
What is happening
When generate_reply times out (10 s default), _on_timeout sets an exception on the caller's future but never sends response.cancel to the server. The server continues processing the response and eventually delivers response.created. _handle_response_created detects that the matched future is already done and logs a one-line warning, but then unconditionally calls self.emit("generation_created", generation_ev), wiring up a SpeechHandle that plays back audio the caller has already given up on.
Fix
Two targeted changes to RealtimeSession in realtime_model.py:
1. _on_timeout: ask the server to stop early
After setting the exception on the future, send response.cancel. This is the same event the _on_fut_done callback sends when the future is explicitly cancelled (e.g. interrupted by the user). The timeout path was the only path that skipped it.
2. _handle_response_created: suppress the late emission
When the matched future is already done (timed-out case), suppress the emit("generation_created") call so no SpeechHandle is created and no audio is queued for playback. A second response.cancel is also sent here as a belt-and-suspenders guard for the window between the timeout callback firing and the server actually cancelling.
_current_generation is intentionally kept alive rather than closed immediately. Any response.output_item.* events that arrive before the server honours the cancel would trip their assert self._current_generation is not None guards if we cleared it early; this is the same race that the comment in _handle_response_done already acknowledges. response.done arrives shortly after and closes the generation through the normal path.
Testing
Reproduce per the reporter's steps: reduce the call_later timeout from 10.0 to 0.01 in generate_reply, call await agent_session.generate_reply(), and observe that no audio plays back after the RealtimeError("generate_reply timed out.") is raised.
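The manual reproduction could also be approximated without a server. The sketch below is a self-contained model, not a test against the plugin: `FakeSession`, its methods, and the use of `RuntimeError` in place of `RealtimeError` are all assumptions made for illustration.

```python
import asyncio


class FakeSession:
    """Stand-in for a realtime session; not the plugin's RealtimeSession."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.timed_out_ids = set()
        self.emitted = []

    async def generate_reply(self, event_id: str) -> str:
        loop = asyncio.get_running_loop()
        fut = loop.create_future()

        def on_timeout() -> None:
            if not fut.done():
                self.timed_out_ids.add(event_id)
                fut.set_exception(RuntimeError("generate_reply timed out."))

        handle = loop.call_later(self.timeout, on_timeout)
        try:
            return await fut
        finally:
            handle.cancel()

    def handle_response_created(self, client_event_id: str) -> None:
        if client_event_id in self.timed_out_ids:
            self.timed_out_ids.discard(client_event_id)
            return  # suppress generation_created for a timed-out request
        self.emitted.append(client_event_id)


async def reproduce() -> list:
    session = FakeSession(timeout=0.01)
    try:
        await session.generate_reply("evt_1")
    except RuntimeError:
        pass  # the caller has given up
    # The server's response.created arrives after the timeout fired.
    session.handle_response_created("evt_1")
    return session.emitted
```

Running `asyncio.run(reproduce())` should yield an empty list of emissions in this model; a regression in the suppression logic would show up as `"evt_1"` in the result.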