feat(agent_session): allow update_options(stt=, tts=) for runtime STT/TTS swap #6235
Open
SamarthUrs18 wants to merge 4 commits into
The activity method previously only rewired the audio_recognition pipeline, leaving session._stt / agent._stt untouched. A caller invoking update_options on the activity directly would see activity.stt resolve to the OLD instance on the next read, so the per-call STT lookup inside stt_node kept using the old WebSocket. Mirror the swap onto self._session._stt / self._agent._stt (and the tts counterparts) so the method is symmetric regardless of whether it was invoked via AgentSession.update_options (the normal path) or directly. Also tightens the tts parameter type from NotGivenOr[object] to NotGivenOr[tts.TTS | None] for symmetry with stt. Addresses the asymmetry flagged by Devin Review on PR livekit#6235.
Two new regression tests for the asymmetry flagged by Devin Review on PR livekit#6235: when update_options is invoked on the AgentActivity directly (rather than going through AgentSession), it must still update session._stt / agent._stt / session._tts / agent._tts, not just rewire the audio_recognition pipeline.
- test_update_options_directly_on_activity_swaps_state: session has its own STT/TTS, agent doesn't; verify session, agent, and activity all read the new instances after a direct activity.update_options call.
- test_update_options_directly_on_activity_with_agent_bound_stt: agent has its own STT; verify the agent-level mirror also fires from a direct activity call.
Motivation
There is currently no public API to swap STT/TTS at runtime. The workaround
requires reaching into private internals.
update_stt() already exists and is already the right mechanism; this PR
just exposes it through the existing update_options() surface.
Use case
Mid-call language switching. User says "speak in Kannada" → agent swaps the
Sarvam STT language code and the TTS voice without dropping the call.
Verified end-to-end with livekit-plugins-sarvam against a local LiveKit
server in console mode: the Sarvam STT WebSocket closes and reconnects
with the new language code on update_options(stt=...); TTS takes effect
on the next synthesis call.
A reference agent that uses this as an LLM-callable tool lives in
examples/dev/sarvam_language_swap.py (gitignored under examples/dev/*). It
exposes a switch_language(language_code) tool and swaps STT/TTS based on
the user's request.
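The reference agent itself is not part of the diff, so the following is only a rough sketch of what a switch_language tool could look like. It uses stand-in classes instead of the real LiveKit and Sarvam types: FakeSTT, FakeTTS, FakeSession, the VOICES table, and the voice names are all hypothetical. Only the update_options(stt=..., tts=...) call shape is taken from this PR.

```python
from dataclasses import dataclass


# Hypothetical stand-ins; a real agent would construct plugin STT/TTS objects.
@dataclass
class FakeSTT:
    language: str


@dataclass
class FakeTTS:
    language: str
    voice: str


@dataclass
class FakeSession:
    stt: FakeSTT
    tts: FakeTTS

    def update_options(self, *, stt=None, tts=None):
        # Simplification: None means "leave unchanged" in this stand-in only.
        if stt is not None:
            self.stt = stt
        if tts is not None:
            self.tts = tts


# Hypothetical mapping from language code to a TTS voice name.
VOICES = {"kn-IN": "voice-kn", "en-IN": "voice-en"}


def switch_language(session: FakeSession, language_code: str) -> str:
    """Swap STT and TTS together and return a message the LLM can relay."""
    voice = VOICES.get(language_code)
    if voice is None:
        # Reject unknown codes before touching the session.
        return f"Unsupported language: {language_code}"
    session.update_options(
        stt=FakeSTT(language=language_code),
        tts=FakeTTS(language=language_code, voice=voice),
    )
    return f"Switched to {language_code}"


session = FakeSession(stt=FakeSTT("en-IN"), tts=FakeTTS("en-IN", "voice-en"))
print(switch_language(session, "kn-IN"))
```

Swapping both objects in a single update_options call keeps the recognition language and the synthesis voice from drifting apart if the tool is invoked repeatedly.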
Changes
- AgentSession.update_options gains stt= and tts= kwargs.
- AgentActivity.update_options forwards them and calls audio_recognition.update_stt(...) for STT; TTS takes effect on the next synthesis call because tts_node reads activity.tts per call.
- The swap is also mirrored onto agent._stt / agent._tts so activity.stt / activity.tts continue to prefer the agent-bound instance (matching the existing resolution order).
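The interaction between the resolution order and the mirror can be illustrated with a toy model. This is a sketch under assumptions, not the code in the diff: Recognition, Holder, and Activity are hypothetical stand-ins, None is used to mean "no instance bound", and the mirror is applied unconditionally to both owners.

```python
_NOT_GIVEN = object()  # sentinel: "argument was not passed"


class Recognition:
    """Stand-in for the audio recognition pipeline."""

    def __init__(self, stt):
        self.stt = stt
        self.rewire_count = 0

    def update_stt(self, stt):
        # A real pipeline would tear down and reopen its STT stream here.
        self.stt = stt
        self.rewire_count += 1


class Holder:
    """Stand-in for both the session and the agent; each may hold an STT/TTS."""

    def __init__(self, stt=None, tts=None):
        self._stt = stt
        self._tts = tts


class Activity:
    def __init__(self, session, agent):
        self._session = session
        self._agent = agent
        self.recognition = Recognition(self.stt)

    @property
    def stt(self):
        # Resolution order: agent-bound instance first, then the session's.
        return self._agent._stt if self._agent._stt is not None else self._session._stt

    @property
    def tts(self):
        return self._agent._tts if self._agent._tts is not None else self._session._tts

    def update_options(self, *, stt=_NOT_GIVEN, tts=_NOT_GIVEN):
        if stt is not _NOT_GIVEN:
            # Mirror onto both owners, then rewire the running pipeline.
            self._session._stt = stt
            self._agent._stt = stt
            self.recognition.update_stt(stt)
        if tts is not _NOT_GIVEN:
            # No rewiring: the TTS is looked up again on each synthesis call.
            self._session._tts = tts
            self._agent._tts = tts


activity = Activity(Holder(stt="session-stt", tts="session-tts"), Holder(stt="agent-stt"))
activity.update_options(stt="new-stt")
print(activity.stt, activity.recognition.rewire_count)
```

In this model, updating only the session's slot would leave activity.stt resolving to "agent-stt", which is the stale-instance problem the mirror is meant to avoid.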
The stt / tts parameter names shadow the same-named module imports inside
the method body. Since the body never references the modules after the
parameter list, no aliasing is required; this is documented in a Note
block in the docstring.
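The shadowing behavior described here is ordinary Python scoping and can be reproduced in isolation. The sketch below uses the standard json module purely as an example; describe, encode, and the json_module alias are illustrative names that do not appear in the PR.

```python
import json
import json as json_module  # alias, needed only if the body must reach the module


def describe(*, json=None):
    # Inside this body, `json` is the keyword argument; the module is shadowed.
    return type(json).__name__


def encode(*, json=None):
    # The alias still reaches the module even though `json` is shadowed here.
    return json_module.dumps({"payload": json})


print(describe(json={"a": 1}))
print(encode(json="x"))
# Outside the functions, the name `json` still refers to the module.
print(json.loads("[1, 2]"))
```

When the body never needs the module, as the PR states for update_options, the first form is enough and the alias can be dropped.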
Tests
Five new tests in tests/test_agent_session.py:
- test_update_options_stt_swaps_session_and_rewires_activity: verifies session._stt is swapped, activity.stt resolves to the new STT, and audio_recognition.update_stt is called with the agent's bound stt_node.
- test_update_options_tts_swaps_session_and_agent_resolves_to_new_tts: TTS swap mirrors the session-level behavior.
- test_update_options_stt_mirrors_to_agent_when_agent_has_own_stt: confirms the agent-level mirror when the agent has its own STT.
- test_update_options_stt_none_disables_pipeline: stt=None clears the pipeline.
- test_update_options_stt_unchanged_when_not_provided: passing only endpointing_opts leaves STT/TTS untouched.
All 53 tests in test_agent_session.py pass; make check is clean.
Backwards compatibility
update_options was already sync (not async), so the new kwargs are
purely additive. All existing callers continue to work unchanged. The
parameter list keeps the keyword-only (*) style and uses NotGivenOr[T | None] to
match the rest of the codebase's "explicit None to disable" convention.
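The "explicit None to disable" convention relies on a sentinel that is distinct from None. The following is a simplified, self-contained imitation of that pattern, not the codebase's own definitions: the NotGiven class, NOT_GIVEN, is_given, and Options here are stand-ins written for illustration, and strings take the place of real STT/TTS objects.

```python
from typing import Optional, TypeVar, Union


class NotGiven:
    """Marker type for "the caller did not pass this argument"."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = NotGiven()
T = TypeVar("T")
NotGivenOr = Union[T, NotGiven]


def is_given(value: object) -> bool:
    return not isinstance(value, NotGiven)


class Options:
    def __init__(self) -> None:
        self.stt: Optional[str] = "stt-en"
        self.tts: Optional[str] = "tts-en"

    def update_options(
        self,
        *,  # keyword-only: positional calls raise TypeError
        stt: NotGivenOr[Optional[str]] = NOT_GIVEN,
        tts: NotGivenOr[Optional[str]] = NOT_GIVEN,
    ) -> None:
        if is_given(stt):
            self.stt = stt  # may be None, which disables STT
        if is_given(tts):
            self.tts = tts


opts = Options()
opts.update_options()          # nothing passed: both values kept
opts.update_options(stt=None)  # explicit None: STT disabled, TTS kept
print(opts.stt, opts.tts)
```

Because omitted arguments default to the sentinel, callers written before the new kwargs existed behave exactly as before, which is the additive property claimed above.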