Fix stale playout state across multi-step voice turns #5533
jayeshp19 wants to merge 3 commits into livekit:main
Do you mean that when a tool reply is created after a speech was paused, the paused_speech is still the same as the new tool reply? Can you share an example that demonstrates the issue this causes?
Yes, that is the case. A specific interleaving example is:
Before this change,

The visible effect in the regression test is that false-interruption resume transitions the agent from

With generation-step tracking, the tool-reply pause refreshes the paused state for the current generation, so resume restores the correct state. I rebased the PR and added a regression test for this specific interleaving.
@jayeshp19 thanks for the details! And yeah, I see the issue. It could be fixed in a simple way: we reset the paused speech in
Thanks, that makes sense for the bug I was trying to fix. One small note: I also tested an edge case where the false-interruption timer fires before the silent tool-call generation finishes. In that case the runtime can still do a

So I’m good with #5594 superseding this PR for the cross-step stale-state bug. I can close this PR once #5594 lands.
This PR fixes a bug where paused-speech state could become stale across generation steps within a single `SpeechHandle`. The key changes are:

- `current_generation_playout_active` flag on `SpeechHandle`: tracks whether the current generation's audio is actively playing, reset on each step advance.
- `generation_step` field on `_PausedSpeechInfo`: ensures paused speech is only applied/resumed if it matches the current generation step.
- `_on_generation_playout_started` / `_on_generation_playout_finished`: extracts duplicated playout lifecycle logic (state updates, audio recognition notifications, interruption toggling) into two reusable methods.

The audio-interruption condition now also gates on `current_generation_playout_active`, preventing interruptions when the agent isn't actually playing audio for the current generation.
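
For readers who want to see the idea in isolation, here is a minimal, self-contained sketch of generation-step tracking as summarized above. It is not the code from this PR: the classes are toy stand-ins, the `Activity` class, its method bodies, and the `"speaking"` / `"listening"` state strings are assumptions made for illustration, and only the field names `generation_step` and `current_generation_playout_active` are taken from the summary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeechHandle:
    """Toy stand-in for the real SpeechHandle (hypothetical, simplified)."""
    generation_step: int = 0
    current_generation_playout_active: bool = False

    def advance_step(self) -> None:
        # Moving to the next generation (e.g. after a tool call) resets the flag.
        self.generation_step += 1
        self.current_generation_playout_active = False


@dataclass
class PausedSpeechInfo:
    """Snapshot taken at pause time, tagged with the step it belongs to."""
    handle: SpeechHandle
    generation_step: int
    agent_state: str


class Activity:
    """Hypothetical owner of the pause/resume state, for illustration only."""

    def __init__(self) -> None:
        self.agent_state = "listening"
        self.paused: Optional[PausedSpeechInfo] = None

    def on_generation_playout_started(self, handle: SpeechHandle) -> None:
        handle.current_generation_playout_active = True
        self.agent_state = "speaking"

    def on_generation_playout_finished(self, handle: SpeechHandle) -> None:
        handle.current_generation_playout_active = False
        self.agent_state = "listening"

    def can_interrupt(self, handle: SpeechHandle) -> bool:
        # Only audio that is playing for the current generation can be interrupted.
        return handle.current_generation_playout_active

    def pause(self, handle: SpeechHandle) -> bool:
        if not self.can_interrupt(handle):
            return False
        self.paused = PausedSpeechInfo(handle, handle.generation_step, self.agent_state)
        self.agent_state = "listening"
        return True

    def resume(self, handle: SpeechHandle) -> bool:
        info, self.paused = self.paused, None
        if info is None or info.handle is not handle:
            return False
        if info.generation_step != handle.generation_step:
            # Snapshot belongs to an earlier step: discard it instead of restoring.
            return False
        self.agent_state = info.agent_state
        return True
```

In this sketch, a pause recorded during step 0 is discarded if the handle has advanced to step 1 before the resume fires, while a pause recorded during the current step restores the saved state as expected. Comparing the step number, rather than only the handle identity, is what distinguishes the two cases when one handle spans several generations.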