fix(voice): don't re-issue in-flight tool calls on a new turn#6227
Conversation
|
|
||
| tool_messages = new_calls + new_fnc_outputs | ||
| if fnc_executed_ev._reply_required: | ||
| if fnc_executed_ev._reply_required and not speech_handle.interrupted: |
There was a problem hiding this comment.
🚩 Interruption guard only applied to the pipeline path, not the realtime path
The new and not speech_handle.interrupted guard at livekit-agents/livekit/agents/voice/agent_activity.py:3130 prevents scheduling a tool response when the speech was interrupted during tool execution. However, the analogous realtime model path at livekit-agents/livekit/agents/voice/agent_activity.py:3781-3804 does not have this guard. The realtime path's fnc_executed_ev._reply_required check at line 3782 does not account for interruption. This is a pre-existing inconsistency (not introduced by this PR), but worth noting since this PR explicitly addresses the pipeline case.
Was this helpful? React with 👍 or 👎 to provide feedback.
When a new generation fires while a tool from a previous turn is still running, the in-flight call has no entry in the turn's chat context, so the model re-issues it and duplicates side effects. This injects an in-progress function_call/output pair for the running tool calls (from the session-wide _RunningTasks registry, covering both activity- and session-scoped executors) before the LLM call, so the model leaves the call alone. The pair is flagged and stripped again before the context is forwarded to the tool-reply turn, which re-injects from the live running set, so a placeholder is never persisted and can't go stale. Injecting in place rather than into a copy preserves any edits a custom llm_node makes to the context. It also stops spawning a tool-response generation for an interrupted turn: that reply would be dropped by the interrupted check anyway and waste an LLM call. The completed tool outputs are committed to the chat context directly instead, so an interrupted tool's result is preserved rather than lost.
09fd12d to
7e6e03f
Compare
gyx09212214-prog
left a comment
There was a problem hiding this comment.
Thanks for tackling the duplicate in-flight tool call case. One edge case I think still needs coverage: _inject_running_tool_calls(chat_ctx, ...) mutates chat_ctx before the LLM call, but _strip_running_tool_calls(chat_ctx) currently only runs inside the _reply_required and not speech_handle.interrupted branch. If a turn is interrupted, or if _reply_required is false, the injected function_call/function_call_output pair can remain in this activity's chat context even though the PR description says placeholders are stripped and never persisted/stale.
Could we move the strip into a finally/common cleanup path after inference, or add a regression test for interrupted / no-reply-required tool completion showing the placeholder is removed from chat_ctx?
the chat_ctx it modified is a copy and will be discarded after the pipeline generation task, the only chance it reused is passing it to next tool reply turn |
When a new generation fires while a tool from a previous turn is still running, the in-flight call has no entry in the turn's chat context, so the model re-issues it and duplicates side effects (e.g. booking a flight twice).
This injects an ephemeral in-progress
function_call/function_call_outputpair for the running tool calls (from the session-wide_RunningTasksregistry, covering both activity- and session-scoped executors) into the copy of the chat context fed to the LLM only. It is recomputed every turn and never persisted or forwarded, so it is superseded as soon as the real output lands and a placeholder can never go stale.It also stops spawning a tool-response generation for an interrupted turn: that reply would be dropped by the interrupted check anyway and waste an LLM call. The completed tool outputs are committed to the chat context directly instead, so an interrupted tool's result is preserved rather than lost.