Describe the bug
run_live() contains a while True: reconnection loop intended for session resumption. However, this loop has no way to distinguish between:
- An intentional client-side shutdown via LiveRequestQueue.close()
- An unintentional network drop that should trigger reconnection
As a result, calling LiveRequestQueue.close() does not actually terminate the live session. After the application code believes the session has ended, run_live() silently re-establishes a new WebSocket connection (a "zombie" session) without any notification to the caller.
Steps to reproduce
- Start a live session with agent_runner.run_live() and consume events via async for event in live_events:
- Call live_request_queue.close() to signal session end
- Wait. The underlying WebSocket connection will be re-established automatically by the reconnect loop
- Observe in Cloud Audit Logs (or server-side logs): periodic "The operation was cancelled." errors at ~10-minute intervals indefinitely
A minimal reproduction is possible with the official bidi-demo sample:
https://github.com/google/adk-samples/tree/main/python/agents/bidi-demo/app
After sending a message and leaving idle, the zombie session and its periodic cancellations persist indefinitely.
Expected behavior
Calling LiveRequestQueue.close() should fully terminate the live session. run_live() should exit cleanly without reconnecting.
Observed behavior
Scenario A — session resumption handle present: run_live() catches APIError(1000) (normal WebSocket close), finds a session handle, and calls continue — reconnecting despite the intentional close.
Scenario B — no session resumption handle: run_live() catches APIError(1000), finds no handle, logs a spurious ERROR: APIError in live flow: 1000 None., and raises — treating a clean close as an error.
In both cases, a zombie connection is either kept alive or repeatedly re-established. The Gemini Live server cancels idle connections after ~10 minutes, which surfaces as:
ERROR: "The operation was cancelled." (gRPC code 1)
in Cloud Audit Logs — repeated indefinitely at ~10-minute intervals, even long after the application believes the session has ended.
The Google auth token refresh cycle visible in debug logs confirms the zombie connection remains active:
[DEBUG] google.auth.transport.requests: Making request... # every ~10 min
[DEBUG] google.auth.transport.requests: Response received...
No application-level logs appear — the zombie reconnect is completely transparent to user code.
Environment
- google-adk version: 1.22.1 (also reproduced on latest)
- google-genai version: 1.59.0+
- Python version: 3.12
- OS: Linux
- Model: gemini-live-2.5-flash (Vertex AI)
- Method: google.cloud.aiplatform.v1beta1.LlmBidiService.BidiGenerateContent
Regression
This affects all versions of google-adk that include the while True: reconnection loop in run_live() (introduced with session resumption support). PR #5007 did not address this case as it fixed the opposite direction (session resumption loop never iterating).
Logs
Cloud Audit Log (repeated every ~10 minutes after session is believed closed):
{
  "protoPayload": {
    "status": { "code": 1, "message": "The operation was cancelled." },
    "methodName": "google.cloud.aiplatform.v1beta1.LlmBidiService.BidiGenerateContent"
  },
  "severity": "ERROR"
}
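To check whether a project is affected, exported audit log entries can be filtered for this signature and the spacing between hits measured. The sketch below is a hypothetical helper, not part of ADK or any Google client library. It assumes each entry is a parsed JSON dict shaped like the one above and that it also carries the standard LogEntry `timestamp` field, which is not shown in the excerpt.

```python
from datetime import datetime

# Method name taken from the audit log entry above.
METHOD = "google.cloud.aiplatform.v1beta1.LlmBidiService.BidiGenerateContent"


def is_zombie_cancellation(entry):
    """Return True if a parsed log entry matches the cancellation signature."""
    payload = entry.get("protoPayload", {})
    status = payload.get("status", {})
    return (
        entry.get("severity") == "ERROR"
        and status.get("code") == 1  # gRPC CANCELLED
        and payload.get("methodName") == METHOD
    )


def cancellation_gaps_minutes(entries):
    """Return the gaps, in minutes, between consecutive matching entries.

    Assumes each matching entry has an RFC 3339 "timestamp" string
    (an assumption; the excerpt in this issue omits that field).
    """
    times = sorted(
        # Normalize a trailing "Z" so older Python versions can parse it.
        datetime.fromisoformat(e["timestamp"].replace("Z", "+00:00"))
        for e in entries
        if is_zombie_cancellation(e)
    )
    return [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]
```

Gaps that cluster around 10 minutes long after the application closed its sessions would be consistent with the zombie connection described here.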
Application debug logs (every ~10 minutes — auth refresh for zombie connection):
[DEBUG] google.auth.transport.requests: Making request...
[DEBUG] google.auth.transport.requests: Response received...
No application-level logs appear — the zombie reconnect is completely transparent to user code.
Root cause
In base_llm_flow.py, run_live()'s exception handlers cannot tell whether APIError(1000) / ConnectionClosed originated from:
- LiveRequestQueue.close() calling llm_connection.close() (intentional)
- A server-side or network-triggered close (unintentional)
    except errors.APIError as e:
        if e.code in [1000, 1006]:
            if invocation_context.live_session_resumption_handle:
                continue  # reconnects even after intentional close!
        logger.error('APIError in live flow: %s', e)  # spurious error if no handle
        raise
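The two scenarios can be reproduced without a network using a toy model of the loop. This is a simplified sketch, not the ADK source: `FakeAPIError`, `fake_connect_and_receive`, `run_live_model`, and `max_connects` are hypothetical names, and the cap exists only so the sketch terminates.

```python
import asyncio


class FakeAPIError(Exception):
    """Hypothetical stand-in for errors.APIError with a WebSocket close code."""

    def __init__(self, code):
        super().__init__(f"close code {code}")
        self.code = code


async def fake_connect_and_receive():
    """Hypothetical connection that always ends with a normal close (1000).

    The loop below cannot tell whether the client or the server caused it.
    """
    raise FakeAPIError(1000)


async def run_live_model(resumption_handle, max_connects=3):
    """Toy version of the reconnect loop; returns the number of connections."""
    connects = 0
    while True:
        connects += 1
        try:
            await fake_connect_and_receive()
        except FakeAPIError as e:
            if e.code in [1000, 1006]:
                if resumption_handle:
                    if connects >= max_connects:
                        return connects  # cap for this sketch only
                    continue  # Scenario A: reconnects regardless of cause
            raise  # Scenario B: a clean close surfaces as an error


def demo():
    """Run both scenarios and return their outcomes."""
    with_handle = asyncio.run(run_live_model(resumption_handle="handle"))
    try:
        asyncio.run(run_live_model(resumption_handle=None))
        without_handle = "returned"
    except FakeAPIError as e:
        without_handle = f"raised {e.code}"
    return with_handle, without_handle
```

With a handle, the model keeps reconnecting until the artificial cap stops it. Without one, the normal close propagates as an exception. Neither path has any input that reflects the caller's intent to stop.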
Proposed fix
PR #5226 addresses this by adding an is_closed flag to LiveRequestQueue that is set synchronously in close(). run_live()'s exception handlers check this flag before attempting to reconnect:
    if e.code == 1000 and invocation_context.live_request_queue.is_closed:
        logger.info('Live session for agent %s closed by client request.', ...)
        return  # clean exit, no reconnect
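The effect of the flag can be sketched with the same kind of toy model. This is an illustration of the idea only, not the code in PR #5226: `ToyRequestQueue`, `FakeAPIError`, `run_live_model`, `close_on_connect`, and `max_connects` are hypothetical.

```python
import asyncio


class FakeAPIError(Exception):
    """Hypothetical stand-in for errors.APIError with a WebSocket close code."""

    def __init__(self, code):
        super().__init__(f"close code {code}")
        self.code = code


class ToyRequestQueue:
    """Hypothetical stand-in for LiveRequestQueue with the proposed flag."""

    def __init__(self):
        self.is_closed = False

    def close(self):
        # Set synchronously, so the flag is visible before any close
        # error reaches the exception handler.
        self.is_closed = True


async def run_live_model(queue, resumption_handle, close_on_connect, max_connects=5):
    """Toy reconnect loop that checks the flag before reconnecting.

    Returns the number of connections made before a clean exit.
    """
    connects = 0
    while True:
        connects += 1
        try:
            if connects == close_on_connect:
                queue.close()  # the client ends the session on this connection
            raise FakeAPIError(1000)  # every connection ends with a normal close
        except FakeAPIError as e:
            if e.code == 1000 and queue.is_closed:
                return connects  # clean exit, no reconnect, no error log
            if e.code in [1000, 1006]:
                if resumption_handle and connects < max_connects:
                    continue  # unintentional drop: resumption still works
            raise
```

In this model, a close on the second connection lets the first (unintentional) drop reconnect as before and then exits cleanly. A close with no resumption handle also returns instead of raising, which covers Scenario B.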
Additional context
- Google Cloud Support confirmed: "simply pushing a 'close' message or sentinel to the LiveRequestQueue is not sufficient to fully terminate the underlying bidirectional streaming connection"
- Discussed in GitHub Discussion #4156