fix(google): pause audio input during synchronous tool execution on t…#5556

Open
vedevpatel wants to merge 1 commit into livekit:main from vedevpatel:fix/gemini-3.1-sync-tool-audio-gate

Conversation

@vedevpatel
Contributor

Gemini 3.1 live model

Gemini 3.1 forces synchronous tool calling, which means the model blocks until tool responses arrive. The plugin's _send_task kept forwarding microphone audio while tools executed, which caused the server to treat the incoming audio as a new turn and cancel the pending tool call after ~12s. This resulted in duplicate tool execution with already-resolved call_ids, as well as corrupted conversation state.

  1. Adds a _tool_call_pending flag (for Gemini 3.1 only) that drops push_audio frames from the moment a toolCall is received until send_tool_response is flushed.
  2. Also clears the flag on tool_call_cancellation so the session never stalls. No behavior change for Gemini 2.5 models.
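
The gating described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's code: `AudioGate` and its `on_*` method names are hypothetical stand-ins, and frames are plain `bytes` rather than `rtc.AudioFrame`.

```python
from dataclasses import dataclass, field


@dataclass
class AudioGate:
    """Hypothetical stand-in for the session's audio path (not the plugin class)."""

    tool_call_pending: bool = False
    sent: list[bytes] = field(default_factory=list)

    def push_audio(self, frame: bytes) -> None:
        # Drop frames while a synchronous tool call is in flight.
        if self.tool_call_pending:
            return
        self.sent.append(frame)

    def on_tool_call(self) -> None:
        # Close the gate as soon as the tool call is received.
        self.tool_call_pending = True

    def on_tool_response_sent(self) -> None:
        # Reopen the gate once the tool response has been flushed.
        self.tool_call_pending = False

    def on_tool_call_cancellation(self) -> None:
        # Reopen on cancellation too, so audio never stays muted.
        self.tool_call_pending = False


gate = AudioGate()
gate.push_audio(b"a")         # forwarded
gate.on_tool_call()
gate.push_audio(b"b")         # dropped while the tool call is pending
gate.on_tool_response_sent()
gate.push_audio(b"c")         # forwarded again
```

Only the frames pushed outside the pending window end up in `gate.sent`.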

Copilot AI review requested due to automatic review settings April 26, 2026 13:49
devin-ai-integration[bot]

This comment was marked as resolved.

Copilot AI left a comment

Pull request overview

Updates the Google Gemini Realtime session to prevent microphone audio from being streamed while Gemini 3.1 synchronous tool calls are in-flight, avoiding server-side “new turn” detection that cancels pending tool calls and corrupts state.

Changes:

  • Added a _tool_call_pending flag to pause push_audio() during Gemini 3.1 tool execution.
  • Set the flag when a tool call is received (Gemini 3.1 only), clear it after sending tool responses.
  • Clear the flag on tool_call_cancellation to prevent stalling.

Comment on lines 678 to +680

def push_audio(self, frame: rtc.AudioFrame) -> None:
    if self._tool_call_pending:

Copilot AI Apr 26, 2026

When audio is dropped due to _tool_call_pending, the AudioByteStream (and potentially the resampler) may still contain buffered partial samples from before the tool call. When _tool_call_pending flips back to False, the next push_audio() call can combine that stale buffered audio with new audio, creating a discontinuity/corrupted stream. Consider clearing _bstream (and resetting _input_resampler if needed) when entering the pending state (or before returning early here).

Suggested change
def push_audio(self, frame: rtc.AudioFrame) -> None:
    if self._tool_call_pending:

def _clear_pending_audio_state(self) -> None:
    flush_bstream = getattr(self._bstream, "flush", None)
    if callable(flush_bstream):
        for _ in flush_bstream():
            pass
    input_resampler = getattr(self, "_input_resampler", None)
    if input_resampler is None:
        return
    reset_resampler = getattr(input_resampler, "reset", None)
    if callable(reset_resampler):
        reset_resampler()
        return
    flush_resampler = getattr(input_resampler, "flush", None)
    if callable(flush_resampler):
        for _ in flush_resampler():
            pass

def push_audio(self, frame: rtc.AudioFrame) -> None:
    if self._tool_call_pending:
        self._clear_pending_audio_state()
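
The stale-buffer concern can be illustrated with a toy chunker. This is only a sketch: `ChunkBuffer` is a hypothetical stand-in and makes no claim about how `AudioByteStream` or the resampler behave internally. It shows why leftover bytes from before the gate closed can end up merged with audio pushed after it reopens.

```python
class ChunkBuffer:
    """Toy fixed-size chunker; a stand-in, not the plugin's AudioByteStream."""

    def __init__(self, chunk_size: int) -> None:
        self._chunk_size = chunk_size
        self._buf = bytearray()

    def push(self, data: bytes) -> list[bytes]:
        # Accumulate bytes and emit only complete chunks.
        self._buf.extend(data)
        chunks = []
        while len(self._buf) >= self._chunk_size:
            chunks.append(bytes(self._buf[: self._chunk_size]))
            del self._buf[: self._chunk_size]
        return chunks

    def clear(self) -> None:
        # Discard any partial chunk that has not been emitted yet.
        self._buf.clear()


# Without clearing: a partial chunk from before the tool call survives.
stale = ChunkBuffer(4)
stale.push(b"AA")             # buffered, no complete chunk yet
mixed = stale.push(b"BBBB")   # old and new bytes are emitted together

# With clearing when the pending state is entered.
fresh = ChunkBuffer(4)
fresh.push(b"AA")
fresh.clear()
clean = fresh.push(b"BBBB")   # only post-gate audio is emitted
```

In the first case the emitted chunk starts with pre-gate bytes; in the second it contains only audio pushed after the clear.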

Comment on lines +498 to +501
# true while synchronous tool call is in flight for 3.1 only
# Audio frames dropped here to prevent server from thinking incoming audio is a
# new turn and cancelling the pending tool call
self._tool_call_pending = False

Copilot AI Apr 26, 2026

_tool_call_pending is only cleared on tool response send and server tool-call cancellation. If the session is restarted/disconnected while a tool call is pending (e.g., send/recv task errors trigger _mark_restart_needed(on_error=True)), the flag can remain True across reconnects and permanently mute push_audio() for Gemini 3.1. Suggest resetting _tool_call_pending as part of session restart/close (e.g., in _close_active_session, _mark_restart_needed, or at the start of each connect loop).
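
One way to picture the suggested reset is the sketch below. `ReconnectingSession` and `start_connection` are hypothetical names used for illustration only; where the reset actually belongs in the plugin (for example `_close_active_session` or `_mark_restart_needed`) is left to the author.

```python
class ReconnectingSession:
    """Hypothetical skeleton; names are illustrative, not the plugin's API."""

    def __init__(self) -> None:
        self.tool_call_pending = False
        self.forwarded: list[bytes] = []

    def on_tool_call(self) -> None:
        self.tool_call_pending = True

    def push_audio(self, frame: bytes) -> None:
        if self.tool_call_pending:
            return  # gated while the tool call is in flight
        self.forwarded.append(frame)

    def start_connection(self) -> None:
        # Any pending tool call belonged to the previous connection, so the
        # gate is reopened before audio flows on the new one.
        self.tool_call_pending = False


session = ReconnectingSession()
session.on_tool_call()
session.push_audio(b"during-tool")    # dropped
session.start_connection()            # reconnect while the call was pending
session.push_audio(b"after-restart")  # forwarded again
```

Without the reset in `start_connection`, the second frame would also be dropped and the session would stay muted.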

)
)
self._mark_current_generation_done()
if "3.1" in self._opts.model:

Copilot AI Apr 26, 2026

The model gating if "3.1" in self._opts.model is imprecise and can accidentally match non-Live models or future model names (the file already enumerates known Live model names via KNOWN_GEMINI_API_MODELS / LiveAPIModels). Prefer an exact match (or a well-scoped prefix check like model.startswith("gemini-3.1-")) to keep the behavior tightly bound to Gemini 3.1 Live only.

Suggested change
if "3.1" in self._opts.model:
if self._opts.model.startswith("gemini-3.1-"):
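
The difference between the two checks can be seen in a small comparison. The helper name `is_gemini_31` and the first and last model names in the list are made up for illustration; only the 2.5 model name appears in this thread.

```python
def is_gemini_31(model: str) -> bool:
    """Prefix check, as in the suggested change."""
    return model.startswith("gemini-3.1-")


candidates = [
    "gemini-3.1-example-live",  # hypothetical 3.1 model name
    "gemini-2.5-flash-native-audio-preview-12-2025",
    "example-13.1-model",       # hypothetical name that merely contains "3.1"
]

# Substring check: matches anything containing "3.1" anywhere in the name.
substring_hits = [m for m in candidates if "3.1" in m]

# Prefix check: matches only names that start with "gemini-3.1-".
prefix_hits = [m for m in candidates if is_gemini_31(m)]
```

The substring check also picks up the unrelated name, while the prefix check keeps only the first entry.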

@vedevpatel vedevpatel force-pushed the fix/gemini-3.1-sync-tool-audio-gate branch from 0dbb2e9 to 5ae5e69 Compare April 26, 2026 14:51
devin-ai-integration[bot]

This comment was marked as resolved.

@vedevpatel vedevpatel force-pushed the fix/gemini-3.1-sync-tool-audio-gate branch from 5ae5e69 to 9325de1 Compare April 26, 2026 14:56
)
)
self._mark_current_generation_done()
if "3.1" in self._opts.model:
Contributor

First of all, I don't think dropping audio while a tool call is in flight is the right solution, but I am also wondering why it's only applied to 3.1 here.

FYI, we support Gemini NON_BLOCKING tool calls via the tool_behavior option. Could you check that option instead of the model name? Or even make this configurable when the tool behavior is blocking?

@vedevpatel vedevpatel force-pushed the fix/gemini-3.1-sync-tool-audio-gate branch from 9325de1 to db77c65 Compare April 27, 2026 18:41
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 1 new potential issue.

Comment on lines +1306 to +1312
is_blocking = (
    not is_given(self._opts.tool_behavior)
    or self._opts.tool_behavior == types.Behavior.BLOCKING
)
if is_blocking:
    self._tool_call_pending = True
    self._bstream.clear()
Contributor

🔴 Audio silently dropped during blocking tool calls on all models, not just 3.1 as intended

The comment on line 498 says "for 3.1 only" and the commit message says "on the Gemini 3.1 live model", but the is_blocking check at lines 1306-1309 has no model guard — it evaluates to True for any model when tool_behavior is NOT_GIVEN (the default) or BLOCKING. This means push_audio silently drops all audio frames during blocking tool execution on the default 2.5 models (gemini-2.5-flash-native-audio-preview-12-2025) as well, where the underlying server issue (audio being interpreted as a new turn that cancels the pending tool call) may not exist. Users speaking during tool execution on 2.5 models will have their audio silently discarded.

Prompt for agents
The _handle_tool_calls method sets _tool_call_pending = True for all models with blocking tool behavior, but the comment and commit message state this should only apply to 3.1 models. The model name is available via self._opts.model. The fix should add a model check, e.g. checking if '3.1' is in self._opts.model (similar to how the RealtimeModel.__init__ uses '3.1 in model' to determine mutability at realtime_api.py:289). For example, the is_blocking check should also verify the model is a 3.1 model before setting _tool_call_pending = True and clearing the byte stream.
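
A combined guard along these lines might look like the sketch below. It is an assumption-laden illustration: `Behavior` and `NOT_GIVEN` are local stand-ins for `types.Behavior` and the plugin's not-given sentinel, `should_gate_audio` is a hypothetical helper, and the model test uses the substring check mentioned above (a stricter prefix check would be a drop-in alternative).

```python
from enum import Enum


class Behavior(Enum):
    """Local stand-in for types.Behavior (assumed to have these two members)."""

    BLOCKING = "BLOCKING"
    NON_BLOCKING = "NON_BLOCKING"


# Local stand-in for the plugin's "option not provided" sentinel.
NOT_GIVEN = object()


def should_gate_audio(model: str, tool_behavior: object = NOT_GIVEN) -> bool:
    """Gate audio only for blocking tool calls on a 3.1 model."""
    is_blocking = tool_behavior is NOT_GIVEN or tool_behavior is Behavior.BLOCKING
    return is_blocking and "3.1" in model


gate_31_default = should_gate_audio("gemini-3.1-example-live")  # hypothetical name
gate_25_default = should_gate_audio("gemini-2.5-flash-native-audio-preview-12-2025")
gate_31_non_blocking = should_gate_audio(
    "gemini-3.1-example-live", Behavior.NON_BLOCKING
)
```

With this guard, the default 2.5 model and any non-blocking configuration keep forwarding audio during tool execution.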
