feat(agent_session): allow update_options(stt=, tts=) for runtime STT/TTS swap #6235

Open
SamarthUrs18 wants to merge 4 commits into livekit:main from SamarthUrs18:feat/update-options-stt-tts
@SamarthUrs18

Motivation

There is currently no public API to swap STT/TTS at runtime. The workaround
requires reaching into private internals:

session._activity._audio_recognition.update_stt(new_stt_node)

update_stt() already exists and is already the right mechanism — this PR
just exposes it through the existing update_options() surface.

Use case

Mid-call language switching. User says "speak in Kannada" → agent swaps the
Sarvam STT language code and the TTS voice without dropping the call.

Verified end-to-end with livekit-plugins-sarvam against a local LiveKit
server in console mode: the Sarvam STT WebSocket closes and reconnects
with the new language code on update_options(stt=...); TTS takes effect
on the next synthesis call.

A reference agent that uses this as an LLM-callable tool lives in
examples/dev/sarvam_language_swap.py (gitignored under
examples/dev/*). It exposes a switch_language(language_code) tool and
swaps STT/TTS based on the user's request.
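
A rough sketch of what such a tool might look like. Everything here is a stand-in: FakeSTT, FakeTTS, FakeSession, and the VOICES table are hypothetical and only mimic the update_options(stt=, tts=) call shape proposed in this PR. The real example would presumably construct Sarvam plugin instances instead, and the voice names below are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeSTT:
    """Hypothetical stand-in for an STT plugin instance."""
    language: str


@dataclass(frozen=True)
class FakeTTS:
    """Hypothetical stand-in for a TTS plugin instance."""
    language: str
    voice: str


class FakeSession:
    """Stand-in for AgentSession; it only records the swap."""

    def __init__(self, stt: FakeSTT, tts: FakeTTS) -> None:
        self.stt = stt
        self.tts = tts

    def update_options(self, *, stt: FakeSTT, tts: FakeTTS) -> None:
        # Simplified: both kwargs are required here, unlike the real method.
        self.stt = stt
        self.tts = tts


# Illustrative language-code -> voice table (voice names are invented).
VOICES = {"en-IN": "voice-en", "hi-IN": "voice-hi", "kn-IN": "voice-kn"}


def switch_language(session: FakeSession, language_code: str) -> str:
    """Swap STT and TTS together; return a message the LLM can relay."""
    voice = VOICES.get(language_code)
    if voice is None:
        # Leave the current STT/TTS untouched on unknown codes.
        return f"Unsupported language: {language_code}"
    session.update_options(
        stt=FakeSTT(language=language_code),
        tts=FakeTTS(language=language_code, voice=voice),
    )
    return f"Switched to {language_code}"
```

Rejecting unknown codes before calling update_options keeps the call alive on a bad request: the session continues with whatever STT/TTS pair it already had.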

Changes

  • AgentSession.update_options gains stt= and tts= kwargs.
  • AgentActivity.update_options forwards them and calls
    audio_recognition.update_stt(...) for STT; TTS takes effect on the next
    synthesis call because tts_node reads activity.tts per call.
  • When the agent was constructed with its own STT/TTS instance, the swap is
    mirrored onto agent._stt / agent._tts so activity.stt /
    activity.tts continue to prefer the agent-bound instance (matching the
    existing resolution order).
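
The resolution order and the mirroring rule described in the list above can be modeled in a few lines. This is a toy model, not the PR's code: Session, Agent, Activity, and Recognition are hypothetical classes, the sketch checks `"stt" in changes` instead of using the NotGivenOr sentinel, and it passes the resolved STT instance to update_stt where the real code passes an stt_node.

```python
class Agent:
    def __init__(self, stt=None):
        # None means "no agent-bound STT; fall back to the session's".
        self._stt = stt


class Session:
    def __init__(self, stt=None):
        self._stt = stt


class Recognition:
    """Stand-in for the audio recognition pipeline."""

    def __init__(self):
        self.rewired_with = []

    def update_stt(self, stt):
        # Record each rewiring so the behavior can be inspected.
        self.rewired_with.append(stt)


class Activity:
    def __init__(self, session, agent):
        self._session = session
        self._agent = agent
        self.recognition = Recognition()

    @property
    def stt(self):
        # Agent-bound instance wins; otherwise use the session's.
        if self._agent._stt is not None:
            return self._agent._stt
        return self._session._stt

    def update_options(self, **changes):
        if "stt" in changes:
            new_stt = changes["stt"]
            self._session._stt = new_stt
            # Mirror onto the agent only if it had its own STT, so the
            # agent-first lookup above does not return the old instance.
            if self._agent._stt is not None:
                self._agent._stt = new_stt
            self.recognition.update_stt(self.stt)
```

Without the mirror step, an agent constructed with its own STT would keep shadowing the session-level swap, because the lookup consults the agent first.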

The stt/tts parameter names shadow the same-named module imports inside
the method body. Since the body never references the modules after the
parameter list, no aliasing is required; this is documented in a Note
block in the docstring.
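
For readers unfamiliar with the shadowing behavior, here is a generic illustration using the standard-library json module rather than the actual stt/tts modules. The function names describe and encode are made up for this sketch.

```python
import json  # module-level name


def describe(*, json=None):
    # Inside this body, `json` refers to the parameter, not the module.
    return type(json).__name__


def encode(value):
    # Other functions are unaffected and still see the module.
    return json.dumps(value)
```

The shadowing is local to the one function. If the body did need the module, an aliased import (for example `import json as json_module`) would be the usual way out.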

Tests

Five new tests in tests/test_agent_session.py:

  • test_update_options_stt_swaps_session_and_rewires_activity: verifies
    session._stt is swapped, activity.stt resolves to the new STT, and
    audio_recognition.update_stt is called with the agent's bound
    stt_node.
  • test_update_options_tts_swaps_session_and_agent_resolves_to_new_tts:
    the TTS swap mirrors the session-level behavior.
  • test_update_options_stt_mirrors_to_agent_when_agent_has_own_stt:
    confirms the agent-level mirror when the agent has its own STT.
  • test_update_options_stt_none_disables_pipeline: stt=None clears
    the pipeline.
  • test_update_options_stt_unchanged_when_not_provided: passing only
    endpointing_opts leaves STT/TTS untouched.

All 53 tests in test_agent_session.py pass; make check is clean.

Backwards compatibility

update_options was already sync (not async), so the new kwargs are
purely additive. All existing callers continue to work unchanged. The
parameter list keeps *-only style and uses NotGivenOr[T | None] to
match the rest of the codebase's "explicit None to disable" convention.
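
The three states this convention distinguishes (argument omitted, explicit None, new value) can be sketched with a small sentinel. The NotGiven, NOT_GIVEN, and NotGivenOr definitions below are local stand-ins written for this illustration and may differ from the library's own; Options is a hypothetical holder class, and strings stand in for STT/TTS instances.

```python
from typing import Optional, TypeVar, Union


class NotGiven:
    """Sentinel type meaning "the caller did not pass this argument"."""

    def __bool__(self) -> bool:
        return False

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = NotGiven()
T = TypeVar("T")
NotGivenOr = Union[T, NotGiven]


class Options:
    def __init__(self) -> None:
        self.stt: Optional[str] = "stt-a"
        self.tts: Optional[str] = "tts-a"

    def update_options(
        self,
        *,  # keyword-only, so new kwargs cannot break positional callers
        stt: NotGivenOr[Optional[str]] = NOT_GIVEN,
        tts: NotGivenOr[Optional[str]] = NOT_GIVEN,
    ) -> None:
        # Omitted -> leave as-is; None -> disable; anything else -> swap.
        if not isinstance(stt, NotGiven):
            self.stt = stt
        if not isinstance(tts, NotGiven):
            self.tts = tts
```

A plain `stt=None` default could not express "disable STT" separately from "leave STT alone", which is why a dedicated sentinel is needed.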

@SamarthUrs18 SamarthUrs18 requested a review from a team as a code owner June 25, 2026 19:56
@CLAassistant

CLAassistant commented Jun 25, 2026

CLA assistant check
All committers have signed the CLA.

devin-ai-integration[bot]

This comment was marked as resolved.

The activity method previously only rewired the audio_recognition pipeline,
leaving session._stt / agent._stt untouched. A caller invoking
update_options on the activity directly would see activity.stt resolve to
the OLD instance on the next read, so the per-call STT lookup inside
stt_node kept using the old WebSocket.

Mirror the swap onto self._session._stt / self._agent._stt (and the tts
counterparts) so the method is symmetric regardless of whether it was
invoked via AgentSession.update_options (the normal path) or directly.
Also tightens the tts parameter type from NotGivenOr[object] to
NotGivenOr[tts.TTS | None] for symmetry with stt.

Addresses the asymmetry flagged by Devin Review on PR livekit#6235.

Two new regression tests for the asymmetry flagged by Devin Review on
PR livekit#6235: when update_options is invoked on the AgentActivity directly
(rather than going through AgentSession), it must still update
session._stt / agent._stt / session._tts / agent._tts, not just rewire
the audio_recognition pipeline.

- test_update_options_directly_on_activity_swaps_state: session has its
  own STT/TTS, agent doesn't — verify session, agent, and activity all
  read the new instances after a direct activity.update_options call.
- test_update_options_directly_on_activity_with_agent_bound_stt: agent
  has its own STT — verify the agent-level mirror also fires from a
  direct activity call.