@chenghao-mou Excited to see this! A couple of questions:
Thanks for your patience! We don't have an official decision or timeline yet, but hopefully I can get it ready within a month or two. |
| base_url="http://0.0.0.0:8080/v1", | ||
| ), | ||
| endpointing={ | ||
| "min_delay": 1.5, |
Temporary change, will remove before merging
| self._opts = TurnDetectorOptions( | ||
| sample_rate=sample_rate, | ||
| base_url=lk_base_url, | ||
| api_key="devkey", | ||
| api_secret="devsecret", | ||
| conn_options=conn_options, | ||
| ) |
🔴 Hardcoded "devkey"/"devsecret" credentials override resolved API key/secret
The constructor resolves lk_api_key and lk_api_secret from environment variables and validates them (lines 59-72), but then discards these resolved values and hardcodes api_key="devkey" and api_secret="devsecret" in TurnDetectorOptions. These hardcoded values are then used in stream.py:242 to create JWT access tokens via create_access_token(self._opts.api_key, self._opts.api_secret) (see livekit-agents/livekit/agents/inference/_utils.py:61-68). This means authentication will always fail against any real LiveKit server since the token is signed with the wrong credentials. This is clearly a debugging leftover.
Suggested change:
| - self._opts = TurnDetectorOptions( |
| -     sample_rate=sample_rate, |
| -     base_url=lk_base_url, |
| -     api_key="devkey", |
| -     api_secret="devsecret", |
| -     conn_options=conn_options, |
| - ) |
| + self._opts = TurnDetectorOptions( |
| +     sample_rate=sample_rate, |
| +     base_url=lk_base_url, |
| +     api_key=lk_api_key, |
| +     api_secret=lk_api_secret, |
| +     conn_options=conn_options, |
| + ) |
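To illustrate the intent of this fix, here is a minimal, self-contained sketch of resolving credentials once and passing the resolved values through unchanged. It is not the PR's code: `DetectorOptions` and `resolve_credentials` are hypothetical stand-ins, and the environment variable names `LIVEKIT_API_KEY` and `LIVEKIT_API_SECRET` are assumptions, since the review only says the values come from environment variables.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass
class DetectorOptions:
    """Hypothetical stand-in for TurnDetectorOptions, reduced to the credentials."""

    api_key: str
    api_secret: str


def resolve_credentials(
    api_key: str | None = None,
    api_secret: str | None = None,
    env: Mapping[str, str] | None = None,
) -> DetectorOptions:
    """Resolve credentials from arguments first, then from the environment.

    The variable names below are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    key = api_key or env.get("LIVEKIT_API_KEY")
    secret = api_secret or env.get("LIVEKIT_API_SECRET")
    if not key or not secret:
        raise ValueError("api_key and api_secret are required")
    # Pass the resolved values through; never substitute literals such as "devkey".
    return DetectorOptions(api_key=key, api_secret=secret)
```

A regression test for this bug could construct the detector with known credentials and assert that the options object carries exactly those values.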
| def update_turn_detector(self, detector: MultimodalTurnDetector | None) -> None: | ||
| self._turn_detector = detector |
🔴 update_turn_detector(None) destroys non-Multimodal turn detectors set by update_options
In update_options, when turn_detection is a regular _TurnDetector (e.g. MultilingualModel from the turn-detector plugin), the code first correctly sets self._turn_detector = turn_detection (line 228), but then calls self.update_turn_detector(None) (line 232). Inside update_turn_detector at line 613, self._turn_detector = detector overwrites the just-assigned turn detector with None. This silently disables the text-based turn detector for all non-MultimodalTurnDetector instances passed through update_options.
Prompt for agents
The problem is that `update_turn_detector` unconditionally sets `self._turn_detector = detector` on line 613, which overwrites the value just set by `update_options` on line 228. The `update_turn_detector` method should only manage the streaming turn detection infrastructure (stream, channel, task) without touching `self._turn_detector`.
In `audio_recognition.py`, the `update_turn_detector` method at line 612-633 should not set `self._turn_detector = detector`. Instead, it should only manage `self._turn_detector_stream`, `self._turn_detection_ch`, and `self._turn_detection_atask`. Remove the line `self._turn_detector = detector` from `update_turn_detector`. The caller (`update_options` and `start`) is already responsible for setting `self._turn_detector` appropriately.
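The ownership split described above can be sketched as follows. This is a simplified model, not the actual `audio_recognition.py`: `AudioRecognitionSketch`, `TextTurnDetector`, and the placeholder stream object are hypothetical, and the real method also manages a channel and a task that are omitted here.

```python
from __future__ import annotations


class MultimodalTurnDetector:
    """Hypothetical stand-in for the streaming, audio-based detector."""


class TextTurnDetector:
    """Hypothetical stand-in for a text-based detector such as MultilingualModel."""


class AudioRecognitionSketch:
    def __init__(self) -> None:
        self._turn_detector: object | None = None
        self._turn_detector_stream: object | None = None

    def update_options(self, turn_detection: object | None) -> None:
        # The caller owns the assignment of the active detector.
        self._turn_detector = turn_detection
        # Only a multimodal detector needs streaming infrastructure.
        streaming = (
            turn_detection
            if isinstance(turn_detection, MultimodalTurnDetector)
            else None
        )
        self.update_turn_detector(streaming)

    def update_turn_detector(self, detector: MultimodalTurnDetector | None) -> None:
        # Manage streaming state only; self._turn_detector is left untouched,
        # so a text-based detector set by update_options survives this call.
        self._turn_detector_stream = object() if detector is not None else None
```

With this split, passing a text-based detector through `update_options` leaves it in place with no stream, while passing a multimodal detector sets both the detector and its stream.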
| # turn_detection=MultilingualModel(), | ||
| turn_detection=inference.MultimodalTurnDetector( | ||
| base_url="http://0.0.0.0:8080/v1", | ||
| ), | ||
| endpointing={ | ||
| "min_delay": 1.5, | ||
| "max_delay": 3.0, | ||
| }, |
🔴 Example basic_agent.py hardcodes localhost URL for turn detection
The basic example hardcodes base_url="http://0.0.0.0:8080/v1" for the MultimodalTurnDetector. This overrides the production default DEFAULT_BASE_URL = "https://agent-gateway.livekit.cloud/v1" (detector.py:24). Since this is the primary example users reference, anyone copying this code will get connection failures unless they happen to run a local turn detection service on port 8080. This appears to be a debugging leftover—the previous code used MultilingualModel() with no special URL.
Suggested change:
| - # turn_detection=MultilingualModel(), |
| - turn_detection=inference.MultimodalTurnDetector( |
| -     base_url="http://0.0.0.0:8080/v1", |
| - ), |
| - endpointing={ |
| -     "min_delay": 1.5, |
| -     "max_delay": 3.0, |
| - }, |
| + # turn_detection=MultilingualModel(), |
| + turn_detection=inference.MultimodalTurnDetector(), |
| + endpointing={ |
| +     "min_delay": 1.5, |
| +     "max_delay": 3.0, |
| + }, |
Requires livekit/protocol#1485
Add support for a multimodal and multilingual EOT model (Inference only)