🔴 Required Information
Describe the Bug
When using run_live() (or bidi_stream_query()) with tools that the model calls after finishing speech ("fire-and-forget" pattern), the function response sent via send_content() arrives after turnComplete has already broken the receive() loop. On re-entry via the while True loop in _receive_from_model(), the orphaned response is consumed by the model as fresh input, triggering a complete duplicate response.
This affects any tool where the model has finished speaking before calling it — UI suggestion chips, session state updates, FAQ/search tools on short answers. The duplication rate correlates with how many zero-audio tool calls occur per turn: 100% on greeting turns with 3 tools, 47-80% on other tool-calling turns.
This is distinct from the multi-agent transfer duplication fixed in v1.20.0 (cf21ca3). That fix addressed agent_transfer tool calls. This covers single-agent fire-and-forget tool calls, which follow the same code path but are not covered by the existing guard.
Reproduces on adk web (Google's own dev server) — not specific to any custom infrastructure.
Steps to Reproduce
- Create a file greeter/agent.py:
from google.adk.agents import Agent
def suggest_topics(topics: list[str]) -> dict:
    """Display follow-up topic suggestions to the user."""
    return {"displayed": topics}
root_agent = Agent(
    model="gemini-live-2.5-flash-native-audio",
    name="greeter",
    instruction=(
        "You are a friendly greeter. "
        "Greet the user warmly in 1-2 sentences. "
        "After greeting, call suggest_topics with 3 relevant follow-up topics."
    ),
    tools=[suggest_topics],
)
- Create an empty greeter/__init__.py.
- Start adk web:
export GOOGLE_GENAI_USE_VERTEXAI=True
export GOOGLE_CLOUD_PROJECT=<your-project>
export GOOGLE_CLOUD_LOCATION=us-central1
adk web . --port 9001 --no-reload
- Open http://127.0.0.1:9001 in a browser and select the greeter agent.
- Click the microphone button and say "Hello" (text input uses generateContent, which doesn't support the live audio model).
- Observe two symptoms:
  - Empty turns: The model calls suggest_topics, an empty turnComplete fires (no audio), then a second turnComplete delivers the actual greeting. The user hears one greeting but the session contains two turn cycles.
  - Lost tool results: On subsequent turns, ask the model to search for topics. It may say "Sure, let me find some topics," call the tool, and receive results, but then never verbalize them, so the user has to prompt again. The tool response is consumed by the feedback loop but produces no spoken output.
Alternatively, reproduce programmatically via Runner.run_live():
import asyncio
from google.adk.agents import Agent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.adk.agents.live_request_queue import LiveRequestQueue, LiveRequest
from google import genai
def suggest_topics(topics: list[str]) -> dict:
    """Display follow-up topic suggestions to the user."""
    return {"displayed": topics}
agent = Agent(
    model="gemini-live-2.5-flash-native-audio",
    name="greeter",
    instruction="Greet the user warmly. After greeting, call suggest_topics with 3 topics.",
    tools=[suggest_topics],
)
session_service = InMemorySessionService()
runner = Runner(agent=agent, app_name="repro", session_service=session_service)
async def run():
    session = await session_service.create_session(app_name="repro", user_id="test")
    queue = LiveRequestQueue()
    queue.send(LiveRequest(content=genai.types.Content(
        role="user", parts=[genai.types.Part(text="Hello")]
    )))
    turn_completes = 0
    async for event in runner.run_live(
        user_id="test", session_id=session.id, live_request_queue=queue
    ):
        if getattr(event, "turn_complete", None) is not None:
            turn_completes += 1
            print(f"turnComplete #{turn_completes}")
            if turn_completes >= 2:
                queue.close()
                print("BUG: Duplicate response detected")
                return
asyncio.run(run())
Expected Behavior
The model should respond once per user turn. Tool responses sent back to the model after turnComplete should not trigger a new model turn with duplicate content.
Observed Behavior
The observable symptoms vary depending on whether the orphaned tool response triggers a spoken response or an empty turn:
- Duplicate spoken responses (common in programmatic/text input): The model responds twice with substantially identical content, a full re-generation (34-70 audio chunks). Most visible in automated testing.
- Lost tool results (common in browser/mic input): The model calls a tool, an empty turnComplete fires immediately (fire-and-forget), and the tool results are consumed by the feedback loop but produce no spoken follow-up. The user has to re-prompt. Example from a live mic session: the user asks about topics, the model says "Sure, let me search," calls suggest_topics, receives results, but never verbalizes them.
- Silent extra turns: Empty turnComplete events (no audio, no transcription) appear between spoken turns, inflating turn counts and disrupting session state.
adk web session DB from a live microphone interaction shows the pattern clearly:
+ 0ms USER: "hello"
+ 29281ms FUNCTION_CALL: suggest_topics
+ 29283ms FUNCTION_RESPONSE: suggest_topics
+ 29287ms TURN_COMPLETE ← empty, no audio (fire-and-forget)
+ 30699ms TRANSCRIPTION: "Hello there! Welcome..."
+ 30703ms TURN_COMPLETE ← actual spoken response
... later in conversation:
+129273ms FUNCTION_CALL: suggest_topics
+129275ms FUNCTION_RESPONSE: suggest_topics
+129287ms TURN_COMPLETE ← empty
+129405ms TURN_COMPLETE ← empty again, results NEVER spoken
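One way to flag these empty turns automatically is to walk the event sequence and check whether any audio or transcription preceded each TURN_COMPLETE. The following is a minimal sketch, not ADK code: it assumes events have already been reduced to plain type strings like the labels in the log above, and the function name count_turns is hypothetical.

```python
def count_turns(events):
    """Return (spoken_turns, empty_turns) for a list of event-type strings.

    A turn counts as "spoken" if any AUDIO or TRANSCRIPTION event was seen
    since the previous TURN_COMPLETE, otherwise as "empty".
    """
    spoken = 0
    empty = 0
    had_output = False
    for kind in events:
        if kind in ("AUDIO", "TRANSCRIPTION"):
            had_output = True
        elif kind == "TURN_COMPLETE":
            if had_output:
                spoken += 1
            else:
                empty += 1
            had_output = False  # reset for the next turn
    return spoken, empty


# Event types transcribed from the two log excerpts above.
GREETING_TURN = [
    "USER", "FUNCTION_CALL", "FUNCTION_RESPONSE", "TURN_COMPLETE",
    "TRANSCRIPTION", "TURN_COMPLETE",
]
LATER_TURN = [
    "FUNCTION_CALL", "FUNCTION_RESPONSE", "TURN_COMPLETE", "TURN_COMPLETE",
]
```

For the greeting excerpt this reports one spoken and one empty turn. For the later excerpt it reports two empty turns and no spoken turn, which matches the "results never spoken" symptom.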
Traced event sequence from the programmatic minimal repro:
+1164ms FUNCTION_CALL: suggest_topics
+1164ms FUNCTION_RESPONSE: suggest_topics
+1174ms TURN_COMPLETE #1 ← fire-and-forget (10ms after tool call)
+6168ms TRANSCRIPTION: "Hello! It's great to chat with you..."
+6170ms TURN_COMPLETE #2 ← DUPLICATE (5s later, full greeting)
Sub-millisecond trace data from 5 instrumented trials on a production agent (3 tools):
| Trial | turnComplete (ms) | tool_response_sent (ms) | Delta | Outcome |
|---|---|---|---|---|
| 0 | 455716.8 | 455718.2 | +1.4ms | Orphaned (TC first) |
| 1 | 467824.1 | 467821.7 | -2.4ms | Sent first, model ignores |
| 2 | 486189.5 | 486190.1 | +0.6ms | Orphaned (TC first) |
| 3 | 500995.8 | 500997.8 | +2.0ms | Orphaned (TC first) |
| 4 | 515989.3 | 515989.8 | +0.5ms | Orphaned (TC first) |
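The Delta column is tool_response_sent minus turnComplete, so a positive value means turnComplete was seen first and the response was orphaned. As a sanity check on the five trials, the arithmetic can be reproduced with a short sketch. The timestamps are copied from the trace table, and the classify helper and its outcome labels are illustrative names, not part of ADK.

```python
# (turnComplete_ms, tool_response_sent_ms) per trial, from the trace table.
TRIALS = [
    (455716.8, 455718.2),
    (467824.1, 467821.7),
    (486189.5, 486190.1),
    (500995.8, 500997.8),
    (515989.3, 515989.8),
]


def classify(turn_complete_ms, tool_response_sent_ms):
    """Return (delta_ms, outcome) for one trial.

    delta_ms > 0 means turnComplete was observed before the tool response
    was sent, i.e. the response was orphaned.
    """
    delta = round(tool_response_sent_ms - turn_complete_ms, 1)
    outcome = "orphaned" if delta > 0 else "sent_first"
    return delta, outcome


RESULTS = [classify(tc, sent) for tc, sent in TRIALS]
ORPHANED = sum(1 for _, outcome in RESULTS if outcome == "orphaned")
```

Four of the five trials come out as orphaned, each with a gap of 2.0 ms or less.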
Environment Details
- ADK Library Version: 1.27.1 (also reproduced on 1.24.1)
- Desktop OS: macOS 15.4 (also reproduced on Debian 12 / Cloud Run)
- Python Version: 3.13.0 (also tested 3.12)
- genai SDK: google-genai 1.14.0
Model Information
- Are you using LiteLLM: No
- Which model: gemini-live-2.5-flash-native-audio (via Vertex AI, GOOGLE_GENAI_USE_VERTEXAI=True)
🟡 Optional Information
Regression
This affects ADK 1.24.1 through 1.27.1 (all versions tested). The v1.20.0 fix (cf21ca3) addressed multi-agent transfer duplication but did NOT cover this single-agent fire-and-forget pattern.
| ADK Version | Greeting duplication rate (3 tools) |
|---|---|
| 1.24.1 | 95% (50 trials) |
| 1.27.1 | 100% (10 trials) |
How often has this issue occurred?
Always (100%) on greeting turns with post-speech tool calls. 47-80% on other tool-calling turns depending on audio duration.
Minimal Reproduction Code
See Steps to Reproduce above — both adk web browser and programmatic Runner.run_live() reproductions included.
Logs
Traced output from the programmatic reproduction:
ADK Duplicate Response — Minimal Reproduction
Model: gemini-live-2.5-flash-native-audio
Iterations: 5
ADK: 1.27.1
+ 1164ms FUNCTION_CALL: suggest_topics
+ 1164ms FUNCTION_RESPONSE: suggest_topics
+ 1174ms TURN_COMPLETE #1
+ 6168ms TRANSCRIPTION_FINISHED: Hello! It's great to chat with you. How can I help you today?
+ 6170ms TURN_COMPLETE #2
Trial 0: DUPE (tc=2)
+ 1155ms FUNCTION_CALL: suggest_topics
+ 1155ms FUNCTION_RESPONSE: suggest_topics
+ 1156ms TURN_COMPLETE #1
+ 2806ms TRANSCRIPTION_FINISHED: Hello! I'm the greeter agent...
+ 2808ms TURN_COMPLETE #2
Trial 1: DUPE (tc=2)
BUG CONFIRMED: 5/5 trials produced duplicate responses.
Session DB from adk web live microphone interaction (ADK's own SQLite storage):
+ 0ms USER: hello
+ 29281ms FUNCTION_CALL: suggest_topics
+ 29283ms FUNCTION_RESPONSE: suggest_topics
+ 29287ms TURN_COMPLETE ← empty, fire-and-forget (no audio)
+ 30699ms TRANSCRIPTION: "Hello there! Welcome..."
+ 30703ms TURN_COMPLETE ← actual spoken response (2nd turn cycle)
Additional Context
Root cause analysis
The interaction between two code sections in base_llm_flow.py creates a feedback cycle:
Function response reinjection (lines ~536-543):
yield event
# send back the function response to models
if event.get_function_responses():
    invocation_context.live_request_queue.send_content(event.content)
The yield suspends the generator. While suspended, turnComplete arrives and is buffered. When execution resumes, send_content() fires but receive() has already exited.
The while-True loop (lines ~700+):
while True:
    async with llm_connection.receive() as resp:
        async for event in self._postprocess_live(...):
            yield event
    await asyncio.sleep(0)
After receive() returns on turnComplete, the loop re-enters receive(). The orphaned tool response lands in this new cycle as fresh input.
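The feedback cycle described above can be illustrated with a toy asyncio model. This is only a sketch under stated assumptions: FakeLiveConnection and receive_from_model are hypothetical stand-ins written for this issue, not ADK or google-genai classes, and the fake model is scripted to behave as the analysis describes (turnComplete already queued right behind a zero-audio tool call, and any pending content treated as fresh input on the next cycle). Unlike the real loop, the sketch returns when a cycle produces nothing, so that it terminates.

```python
import asyncio


class FakeLiveConnection:
    """Toy stand-in for a live model connection (NOT the real ADK API)."""

    def __init__(self):
        self.inbox = []          # content sent by the client, not yet consumed
        self.first_cycle = True

    def send_content(self, content):
        self.inbox.append(content)

    async def receive(self):
        """Yield the messages of one turn, ending at turn_complete."""
        if self.first_cycle:
            self.first_cycle = False
            # Zero-audio tool call: turn_complete is already queued behind it.
            yield "function_call"
            yield "turn_complete"
            return
        if self.inbox:
            # Assumption: pending content is treated as fresh input.
            self.inbox.clear()
            yield "audio"
            yield "turn_complete"


async def receive_from_model(conn, reenter):
    """Collect events; with reenter=True, mimic the while-True re-entry."""
    events = []
    while True:
        got_any = False
        async for msg in conn.receive():
            got_any = True
            events.append(msg)
            if msg == "function_call":
                # The tool response is sent, but this turn is already ending.
                conn.send_content("function_response")
        if not reenter or not got_any:
            return events


if __name__ == "__main__":
    looped = asyncio.run(receive_from_model(FakeLiveConnection(), reenter=True))
    single = asyncio.run(receive_from_model(FakeLiveConnection(), reenter=False))
    print(looped.count("turn_complete"), single.count("turn_complete"))
```

With re-entry enabled the toy model produces two turn_complete events for one user turn; with a single receive cycle it produces one, but the function response is left unconsumed.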
Why fire-and-forget happens: The model registers tools as BLOCKING (default), but when it has nothing to say alongside the tool call, it emits turnComplete with 0 audio and empty transcription simultaneously. The model treats the tool as a side-effect and does not wait.
Audio duration correlation: When the model is still streaming audio while calling a tool, the response has time to arrive before turnComplete. When the model has 0 remaining audio, the 0-2ms window is too narrow.
| Scenario | Audio chunks | Response arrives before TC? | Duplicate? |
|---|---|---|---|
| Tool called mid-speech | 52 chunks (~3s) | Yes | No |
| Tool called post-speech | 0 chunks | No (0-2ms race) | Yes |
Isolation testing
| Layer | Duplicates | Trials |
|---|---|---|
| Direct Gemini API (no ADK) | 0% | 40 |
| ADK with while-True loop removed | 0% | 275 |
| ADK unmodified (1.27.1) | 44% overall, 100% on greeting | 275 |
| adk web (Google's server) | 100% on greeting | 10 |
Proposed fixes
Option A: Send function response before yielding — TESTED, does NOT work
We tested moving send_content() before yield (10 trials). Result: 10/10 still duplicate. The race is not between send_content and yield — it's between the model sending turnComplete and ADK processing the tool call. By the time ADK sees the FUNCTION_CALL, the turnComplete is already buffered on the WebSocket. send_content() only enqueues to the LiveRequestQueue; it doesn't prevent the already-buffered turnComplete from being read by receive().
# Tested (still races: turnComplete is already buffered before send_content runs):
if event.get_function_responses():
    invocation_context.live_request_queue.send_content(event.content)
yield event
# Result: 10/10 DUPE
Option B: Drain the WebSocket receive buffer before re-entering the while-True loop. Ensure any pending send_content responses are delivered and acknowledged before receive() reads the next message.
Option C: Extend the v1.20.0 guard (cf21ca3) to cover fire-and-forget tool calls (not just agent transfers). Detect when a cycle had function responses + 0 audio output, and suppress re-entry.
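To make the Option C condition concrete, the detection step could look roughly like the predicate below. This is a sketch of the condition only. CycleStats and should_suppress_reentry are hypothetical names, and how the existing v1.20.0 guard tracks per-cycle state is not shown here and would need to be checked against the actual code.

```python
from dataclasses import dataclass


@dataclass
class CycleStats:
    """Hypothetical per-receive-cycle counters (not an ADK type)."""
    function_responses: int = 0
    audio_chunks: int = 0


def should_suppress_reentry(stats: CycleStats) -> bool:
    """True when the cycle had tool responses but produced no audio.

    This is the fire-and-forget signature described in this issue:
    function responses were sent back, yet the model emitted 0 audio chunks.
    """
    return stats.function_responses > 0 and stats.audio_chunks == 0


# The two scenarios from the audio-duration table.
MID_SPEECH = CycleStats(function_responses=1, audio_chunks=52)
POST_SPEECH = CycleStats(function_responses=1, audio_chunks=0)
```

Under this predicate only the post-speech scenario is suppressed. A cycle with no tool calls at all is left alone.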
Option D (validated): Remove the while-True loop entirely
We monkey-patched _receive_from_model() to process one receive() cycle and return (no while True re-entry). Result: 0% duplication across 275 trials (10 conversation patterns, 4 tiers, LLM-judged). Trade-off: +11pp no-response rate (the loop was implicitly retrying on silence), mitigable with client-side retry.
Happy to contribute a PR if the team confirms the preferred direction.
Related Issues