fix(openai-realtime): cancel timed-out response.create to prevent late playback by tsushanth · Pull Request #6238 · livekit/agents

tsushanth · 2026-06-26T02:21:16Z

What is happening

When generate_reply times out (10 s default), _on_timeout sets an exception on the caller's future but never sends response.cancel to the server. The server continues processing the response and eventually delivers response.created. _handle_response_created detects that the matched future is already done and logs a one-line warning, but then unconditionally calls self.emit("generation_created", generation_ev), wiring up a SpeechHandle that plays back audio the caller has already given up on.

Fix

Two targeted changes to RealtimeSession in realtime_model.py:

1. _on_timeout — ask the server to stop early

After setting the exception on the future, send response.cancel. This is the same event the _on_fut_done callback sends when the future is explicitly cancelled (e.g. interrupted by the user). The timeout path was the only path that skipped it.

2. _handle_response_created — suppress the late emission

When the matched future is already done (timed-out case), suppress the emit("generation_created") call so no SpeechHandle is created and no audio is queued for playback. A second response.cancel is also sent here as a belt-and-suspenders guard for the window between the timeout callback firing and the server actually cancelling.

_current_generation is intentionally kept alive rather than closed immediately. Any response.output_item.* events that arrive before the server honours the cancel would trip their assert self._current_generation is not None guards if we cleared it early — the same race that the comment in _handle_response_done already acknowledges. response.done arrives shortly after and closes the generation through the normal path.
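
The intended behaviour of the handler can be sketched on a stand-in class. This is a minimal illustration, not the code in realtime_model.py: SketchSession, its plain-string events, and the built-in TimeoutError are hypothetical, and the sketch assumes the already-done future is still in the lookup dict when response.created arrives.

```python
import asyncio
from typing import Optional


class SketchSession:
    """Hypothetical stand-in for RealtimeSession (handler side only)."""

    def __init__(self) -> None:
        self.sent = []       # events that would be sent to the server
        self.emitted = []    # events that would be emitted to listeners
        self.futures = {}    # client_event_id -> asyncio.Future
        self.current_generation: Optional[str] = None

    def handle_response_created(
        self, response_id: str, client_event_id: Optional[str]
    ) -> None:
        # The generation is created in every case, including the timed-out one.
        self.current_generation = response_id
        timed_out = False
        fut = self.futures.pop(client_event_id, None) if client_event_id else None
        if fut is not None:
            if not fut.done():
                fut.set_result(response_id)
            else:
                # The caller already gave up: cancel and suppress the emission.
                self.sent.append("response.cancel")
                timed_out = True
        if not timed_out:
            self.emitted.append("generation_created")

    def handle_output_item_added(self) -> None:
        # Mirrors the guard described above; it would fail if the
        # generation had been cleared early.
        assert self.current_generation is not None

    def handle_response_done(self) -> None:
        # Normal close path.
        self.current_generation = None


async def demo() -> SketchSession:
    session = SketchSession()
    late = asyncio.get_running_loop().create_future()
    late.set_exception(TimeoutError("generate_reply timed out."))
    late.exception()  # mark as retrieved so asyncio does not log a warning
    session.futures["evt_late"] = late  # assumption: still discoverable

    session.handle_response_created("resp_1", "evt_late")
    session.handle_output_item_added()  # generation is still alive here
    session.handle_response_done()
    return session


session = asyncio.run(demo())
```

In this sketch nothing is emitted for the late response, one response.cancel is recorded, and the generation is only cleared by handle_response_done.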

Testing

Reproduce per the reporter's steps: reduce the call_later timeout from 10.0 to 0.01 in generate_reply, call await agent_session.generate_reply(), and observe that no audio plays back after the RealtimeError("generate_reply timed out.") is raised.
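
The timeout path exercised by this test can also be sketched without a server. The block below is a toy model under stated assumptions: SketchSession and its dict-based events are hypothetical, and the built-in TimeoutError stands in for RealtimeError. Only the ordering (response.create, then the exception, then response.cancel) follows the description above.

```python
import asyncio


class SketchSession:
    """Hypothetical stand-in for RealtimeSession (timeout side only)."""

    def __init__(self) -> None:
        self.sent = []      # events that would be sent to the server
        self.futures = {}   # event_id -> asyncio.Future

    def send_event(self, event: dict) -> None:
        self.sent.append(event)

    def generate_reply(self, event_id: str, timeout: float) -> asyncio.Future:
        loop = asyncio.get_running_loop()
        fut = loop.create_future()
        self.futures[event_id] = fut
        self.send_event({"type": "response.create", "event_id": event_id})

        def _on_timeout() -> None:
            if fut.done():
                return
            self.futures.pop(event_id, None)
            fut.set_exception(TimeoutError("generate_reply timed out."))
            # The step added by this change: ask the server to stop.
            self.send_event({"type": "response.cancel"})

        handle = loop.call_later(timeout, _on_timeout)
        # Drop the timer once the future settles for any reason.
        fut.add_done_callback(lambda _: handle.cancel())
        return fut


async def demo() -> list:
    session = SketchSession()
    try:
        # Shortened timeout, analogous to 10.0 -> 0.01 in the steps above.
        await session.generate_reply("evt_1", timeout=0.01)
    except TimeoutError:
        pass
    return [event["type"] for event in session.sent]


sent_types = asyncio.run(demo())
```

Because no server ever answers in this model, the future always times out, and sent_types records the create followed by the cancel.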

…e playback

When generate_reply times out (10 s default), the SDK sets an exception on the caller's future but never sends response.cancel to the server, so the server continues processing and eventually emits response.created. _handle_response_created detects the "already done" future and logs a warning, but still calls emit("generation_created"), wiring up a SpeechHandle that plays back audio the caller never expected.

Two-part fix:

1. _on_timeout now sends response.cancel immediately after setting the exception, asking the server to stop the in-flight response early.

2. _handle_response_created suppresses emit("generation_created") when the matched future is already done (timed out). _current_generation is kept alive so that any response.output_item.* events that arrive before the server honours the cancel do not trip their assertions; response.done will close the generation normally via the existing path.

Fixes livekit#6222

devin-ai-integration

Devin Review found 2 potential issues.

devin-ai-integration · 2026-06-26T02:26:14Z

                generation_ev.user_initiated = True
                fut.set_result(generation_ev)
            else:
-                logger.warning("response of generate_reply received after it's timed out.")
+                # The generate_reply caller already received a timeout error.  The
+                # server kept running and delivered response.created anyway.  Cancel it
+                # so no audio frames are queued for playback.  We keep
+                # _current_generation alive so that the subsequent response.output_item.*
+                # and response.done events (which may arrive before the server honours
+                # the cancel) do not trip their assertions; response.done will close the
+                # generation normally.
+                logger.warning(
+                    "response of generate_reply received after it's timed out; "
+                    "cancelling to prevent unexpected playback."
+                )
+                self.send_event(ResponseCancelEvent(type="response.cancel"))
+                timed_out = True

-        self.emit("generation_created", generation_ev)
+        if not timed_out:
+            self.emit("generation_created", generation_ev)


🔴 Timeout detection in response handler is unreachable, so unexpected audio playback still occurs after timeout

The timed-out branch in _handle_response_created is never reached for a late-arriving response. The lookup self._response_created_futures.pop(client_event_id, None) at livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/realtime/realtime_model.py:1732 returns None because the timeout callback has already removed the future from the dictionary (realtime_model.py:1584). The generation event is therefore still emitted to listeners and triggers unintended speech playback.

Impact: After a generate_reply timeout, audio from the late server response can still play back to the user — the exact scenario the PR intends to prevent.

Detailed mechanism: _on_timeout pops the future before _handle_response_created can find it

In _on_timeout (lines 1583-1589):

Line 1584: self._response_created_futures.pop(event_id, None) removes the future from the dict

Line 1586: fut.set_exception(...) marks the future as done

Line 1589: sends ResponseCancelEvent

Later, when the server delivers response.created (because the cancel wasn't processed yet), _handle_response_created runs:

Line 1732: self._response_created_futures.pop(client_event_id, None) returns None (already removed)

The walrus assignment makes the condition falsy, so the entire if block (lines 1729-1750) is skipped

timed_out stays False

Line 1753: self.emit("generation_created", generation_ev) fires with user_initiated=False

The _on_generation_created handler at livekit-agents/livekit/agents/voice/agent_activity.py:1744 then processes this event (since user_initiated is False, it doesn't return early at line 1745), creates a SpeechHandle, and starts a _realtime_generation_task — causing unexpected audio playback.

To fix this, _on_timeout should NOT pop the future from the dict, so that _handle_response_created can later detect it as done. Alternatively, keep a separate set of timed-out event_ids that _handle_response_created can check.

(Refers to lines 1728-1753)
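
The second alternative (a separate set of timed-out event ids) might look like the sketch below. It is an illustration under assumptions, not the plugin's code: SketchSession, timed_out_event_ids, the string events, and the built-in TimeoutError are all hypothetical names chosen for the example.

```python
import asyncio
from typing import Optional


class SketchSession:
    """Hypothetical stand-in for RealtimeSession."""

    def __init__(self) -> None:
        self.futures = {}                   # event_id -> asyncio.Future
        self.timed_out_event_ids = set()    # hypothetical bookkeeping
        self.sent = []                      # events sent to the server
        self.emitted = []                   # events emitted to listeners

    def on_timeout(self, event_id: str) -> None:
        fut = self.futures.pop(event_id, None)  # existing pop semantics kept
        if fut is None or fut.done():
            return
        fut.set_exception(TimeoutError("generate_reply timed out."))
        self.timed_out_event_ids.add(event_id)  # remember the timeout
        self.sent.append("response.cancel")

    def handle_response_created(self, client_event_id: Optional[str]) -> None:
        if client_event_id in self.timed_out_event_ids:
            # Late response for a request the caller abandoned.
            self.timed_out_event_ids.discard(client_event_id)
            self.sent.append("response.cancel")
            return  # suppress generation_created
        fut = self.futures.pop(client_event_id, None) if client_event_id else None
        if fut is not None and not fut.done():
            fut.set_result(client_event_id)
        self.emitted.append("generation_created")


async def demo() -> SketchSession:
    session = SketchSession()
    fut = asyncio.get_running_loop().create_future()
    session.futures["evt_1"] = fut

    session.on_timeout("evt_1")
    fut.exception()  # mark as retrieved so asyncio does not log a warning

    session.handle_response_created("evt_1")  # late response: suppressed
    session.handle_response_created(None)     # server-initiated: emitted
    return session


session = asyncio.run(demo())
```

One design point the sketch leaves open: if response.created never arrives for a timed-out request, its id stays in the set, so a real implementation would presumably also clear the set when the session closes or reconnects.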

Prompt for agents

The bug is that _on_timeout (line 1584) pops the future from self._response_created_futures BEFORE _handle_response_created can check it. This makes the else branch at line 1737 unreachable for timeouts. The fix should ensure that when response.created arrives for a timed-out generate_reply, the code can detect it. Two possible approaches:

1. In _on_timeout, do NOT pop from self._response_created_futures. Instead, leave the (now-done) future in the dict so _handle_response_created can find it and check fut.done(). Then rely on _handle_response_created to pop it. You'd also need to ensure _on_fut_done doesn't pop it prematurely (since set_exception triggers the done callback).

2. Maintain a separate set (e.g., self._timed_out_event_ids) that _on_timeout adds to, and _handle_response_created checks. This avoids changing the existing pop semantics.

Either way, the goal is: when _handle_response_created runs and finds the client_event_id corresponds to a timed-out request, it should set timed_out=True, send the cancel, and suppress the generation_created emission.


devin-ai-integration · 2026-06-26T02:26:15Z

+                # The response.create was already sent; ask the server to cancel it so
+                # that any audio it produces does not arrive and play back unexpectedly.
+                self.send_event(ResponseCancelEvent(type="response.cancel"))


🚩 The cancel sent by _on_timeout may cancel an unrelated active response

At livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/realtime/realtime_model.py:1589, the newly added ResponseCancelEvent is sent without specifying a response_id. If the generate_reply's response hasn't actually started on the server (no response.created received yet), but another response IS active (e.g., a VAD-triggered server-initiated response), the cancel may target that unrelated response instead. The OpenAI Realtime API's response.cancel cancels the currently in-progress response, which may not be the one that timed out. This is the same pattern used elsewhere in the file (e.g., line 1610 in interrupt()), so it's consistent with existing behavior, but worth noting as a potential race condition in multi-response scenarios.
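
The race can be illustrated with a toy model. ToyServer below is hypothetical and deliberately simplistic: it assumes at most one response is in progress at a time and that a cancel without a response_id hits whichever response is active, as the comment above describes. It is not a model of the real server's scheduling.

```python
from typing import Optional


class ToyServer:
    """Hypothetical model: one in-progress response at a time."""

    def __init__(self) -> None:
        self.active: Optional[str] = None
        self.cancelled = []

    def start(self, response_id: str) -> None:
        self.active = response_id

    def cancel(self, response_id: Optional[str] = None) -> None:
        if self.active is None:
            return
        # A targeted cancel only applies to the matching response.
        if response_id is not None and response_id != self.active:
            return
        self.cancelled.append(self.active)
        self.active = None


server = ToyServer()

# A VAD-triggered response is already running when generate_reply times out;
# the timed-out request itself has not started on the server yet.
server.start("resp_vad")

# The timeout path sends an untargeted cancel.
server.cancel()

cancelled_untargeted = list(server.cancelled)

# For contrast: a targeted cancel for a different id leaves the active one alone.
server2 = ToyServer()
server2.start("resp_vad")
server2.cancel(response_id="resp_timed_out")
```

In this model the untargeted cancel removes resp_vad, the unrelated response. A targeted cancel would avoid that, but at timeout the client has not yet received a response id for the request it is abandoning.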


tsushanth requested a review from a team as a code owner June 26, 2026 02:21

devin-ai-integration Bot reviewed Jun 26, 2026

tsushanth wants to merge 1 commit into livekit:main from tsushanth:fix/cancel-timed-out-response
