Multimodal EOU #4722

Open
chenghao-mou wants to merge 31 commits into main from feat/AGT-2520-multimodal-EOU

chenghao-mou (Member) commented Feb 5, 2026

Requires livekit/protocol#1485

Add support for the multimodal and multilingual EOT model (inference only).

hsjun99 commented Feb 25, 2026

@chenghao-mou Excited to see this! A couple of questions:

  1. Will the multimodal EOT model be publicly accessible via model weights or agent-gateway.livekit.cloud, or in some other way?
  2. Any rough timeline for when MultiModalTurnDetector gets fully wired up?

chenghao-mou (Member, Author) replied:

> @chenghao-mou Excited to see this! A couple of questions:
>
>   1. Will the multimodal EOT model be publicly accessible via model weights or agent-gateway.livekit.cloud, or in some other way?
>   2. Any rough timeline for when MultiModalTurnDetector gets fully wired up?

Thanks for your patience! We don't have an official decision or timeline yet, but hopefully I can get it ready within a month or two.

chenghao-mou marked this pull request as ready for review April 22, 2026 07:38
chenghao-mou requested a review from a team April 22, 2026 07:38
chenghao-mou changed the title from "[WIP] multimodal EOU" to "Multimodal EOU" Apr 22, 2026

    base_url="http://0.0.0.0:8080/v1",
),
endpointing={
    "min_delay": 1.5,

chenghao-mou (Member, Author) commented:

Temporary change, will remove before merging

devin-ai-integration (Bot, Contributor) left a comment

Devin Review found 3 potential issues.

View 5 additional findings in Devin Review.

Comment on lines +74 to +80
self._opts = TurnDetectorOptions(
    sample_rate=sample_rate,
    base_url=lk_base_url,
    api_key="devkey",
    api_secret="devsecret",
    conn_options=conn_options,
)

🔴 Hardcoded "devkey"/"devsecret" credentials override resolved API key/secret

The constructor resolves lk_api_key and lk_api_secret from environment variables and validates them (lines 59-72), but then discards these resolved values and hardcodes api_key="devkey" and api_secret="devsecret" in TurnDetectorOptions. These hardcoded values are then used in stream.py:242 to create JWT access tokens via create_access_token(self._opts.api_key, self._opts.api_secret) (see livekit-agents/livekit/agents/inference/_utils.py:61-68). This means authentication will always fail against any real LiveKit server since the token is signed with the wrong credentials. This is clearly a debugging leftover.
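
To illustrate why a token signed with the wrong secret is rejected, here is a minimal standard-library sketch. It assumes the access token is an HMAC-signed JWT (HS256); the internals of `create_access_token` are not shown in this thread, so `sign_hs256` and `verify_hs256` are hypothetical helpers, not the library's API.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the compact JWT format requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _signature(signing_input: str, secret: str) -> str:
    """HMAC-SHA256 over the signing input, base64url-encoded."""
    digest = hmac.new(
        secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return _b64url(digest)


def sign_hs256(claims: dict, secret: str) -> str:
    """Hypothetical helper: build a compact HS256 JWT for the given claims."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return f"{signing_input}.{_signature(signing_input, secret)}"


def verify_hs256(token: str, secret: str) -> bool:
    """Hypothetical helper: True only if the signature matches this secret."""
    signing_input, _, signature = token.rpartition(".")
    return hmac.compare_digest(_signature(signing_input, secret), signature)


# A token signed with the placeholder secret does not verify against
# whatever secret the server actually holds.
token = sign_hs256({"iss": "devkey"}, "devsecret")
print(verify_hs256(token, "devsecret"))
print(verify_hs256(token, "the-servers-real-secret"))
```

Under that assumption, any server that does not itself use `devsecret` would reject the token, which matches the failure described above.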

Suggested change
- self._opts = TurnDetectorOptions(
-     sample_rate=sample_rate,
-     base_url=lk_base_url,
-     api_key="devkey",
-     api_secret="devsecret",
-     conn_options=conn_options,
- )
+ self._opts = TurnDetectorOptions(
+     sample_rate=sample_rate,
+     base_url=lk_base_url,
+     api_key=lk_api_key,
+     api_secret=lk_api_secret,
+     conn_options=conn_options,
+ )
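
The resolve-then-validate step that the comment says happens before this call might look roughly like the sketch below. The constructor body (lines 59-72) is not shown in this thread, so `resolve_credentials`, `ResolvedCredentials`, and the environment variable names `LIVEKIT_API_KEY` and `LIVEKIT_API_SECRET` are assumptions for illustration only.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class ResolvedCredentials:
    """Hypothetical container for the resolved key and secret."""

    api_key: str
    api_secret: str


def resolve_credentials(
    api_key: str | None = None,
    api_secret: str | None = None,
    env: Mapping[str, str] | None = None,
) -> ResolvedCredentials:
    """Prefer explicit arguments, fall back to the environment, fail if missing."""
    env = os.environ if env is None else env
    # Assumed variable names; the real constructor may read different ones.
    key = api_key or env.get("LIVEKIT_API_KEY", "")
    secret = api_secret or env.get("LIVEKIT_API_SECRET", "")
    if not key:
        raise ValueError("API key is required (argument or LIVEKIT_API_KEY)")
    if not secret:
        raise ValueError("API secret is required (argument or LIVEKIT_API_SECRET)")
    return ResolvedCredentials(api_key=key, api_secret=secret)
```

The point of the suggested change is that the values returned by a step like this, rather than string literals, are what get passed into `TurnDetectorOptions`.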

Comment on lines +612 to +613
def update_turn_detector(self, detector: MultimodalTurnDetector | None) -> None:
    self._turn_detector = detector

🔴 update_turn_detector(None) destroys non-Multimodal turn detectors set by update_options

In update_options, when turn_detection is a regular _TurnDetector (e.g. MultilingualModel from the turn-detector plugin), the code first correctly sets self._turn_detector = turn_detection (line 228), but then calls self.update_turn_detector(None) (line 232). Inside update_turn_detector at line 613, self._turn_detector = detector overwrites the just-assigned turn detector with None. This silently disables the text-based turn detector for all non-MultimodalTurnDetector instances passed through update_options.

Prompt for agents
The problem is that `update_turn_detector` unconditionally sets `self._turn_detector = detector` on line 613, which overwrites the value just set by `update_options` on line 228. The `update_turn_detector` method should only manage the streaming turn detection infrastructure (stream, channel, task) without touching `self._turn_detector`. 

In `audio_recognition.py`, the `update_turn_detector` method at lines 612-633 should not set `self._turn_detector = detector`. Instead, it should only manage `self._turn_detector_stream`, `self._turn_detection_ch`, and `self._turn_detection_atask`. Remove the line `self._turn_detector = detector` from `update_turn_detector`. The callers (`update_options` and `start`) are already responsible for setting `self._turn_detector` appropriately.
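
The ownership split proposed above can be sketched with stand-in classes. This is a simplified illustration, not the code in `audio_recognition.py`: `TextTurnDetector`, `StreamingTurnDetector`, and `Recognition` are hypothetical names, and the channel and task handling is omitted.

```python
from __future__ import annotations


class TextTurnDetector:
    """Hypothetical stand-in for a text-based detector such as MultilingualModel."""


class StreamingTurnDetector:
    """Hypothetical stand-in for MultimodalTurnDetector."""

    def stream(self) -> object:
        # Placeholder for whatever streaming handle the real detector returns.
        return object()


class Recognition:
    """Simplified stand-in for the audio recognition class."""

    def __init__(self) -> None:
        self._turn_detector: object | None = None
        self._turn_detector_stream: object | None = None

    def update_options(self, turn_detection: object | None) -> None:
        # The caller owns the assignment of the active detector.
        self._turn_detector = turn_detection
        streaming = (
            turn_detection
            if isinstance(turn_detection, StreamingTurnDetector)
            else None
        )
        self.update_turn_detector(streaming)

    def update_turn_detector(self, detector: StreamingTurnDetector | None) -> None:
        # Only the streaming state is managed here; self._turn_detector is
        # deliberately left untouched so a text-based detector survives.
        self._turn_detector_stream = detector.stream() if detector else None
```

With this split, passing a text-based detector through `update_options` leaves it in place while the streaming state is cleared.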

Comment on lines +100 to +107
# turn_detection=MultilingualModel(),
turn_detection=inference.MultimodalTurnDetector(
    base_url="http://0.0.0.0:8080/v1",
),
endpointing={
    "min_delay": 1.5,
    "max_delay": 3.0,
},

🔴 Example basic_agent.py hardcodes localhost URL for turn detection

The basic example hardcodes base_url="http://0.0.0.0:8080/v1" for the MultimodalTurnDetector. This overrides the production default DEFAULT_BASE_URL = "https://agent-gateway.livekit.cloud/v1" (detector.py:24). Since this is the primary example users reference, anyone copying this code will get connection failures unless they happen to run a local turn detection service on port 8080. This appears to be a debugging leftover—the previous code used MultilingualModel() with no special URL.

Suggested change
- # turn_detection=MultilingualModel(),
- turn_detection=inference.MultimodalTurnDetector(
-     base_url="http://0.0.0.0:8080/v1",
- ),
- endpointing={
-     "min_delay": 1.5,
-     "max_delay": 3.0,
- },
+ # turn_detection=MultilingualModel(),
+ turn_detection=inference.MultimodalTurnDetector(),
+ endpointing={
+     "min_delay": 1.5,
+     "max_delay": 3.0,
+ },
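
If a local endpoint is still wanted during development, one option is to keep it out of the example and read it from the environment instead. The sketch below assumes a hypothetical `TURN_DETECTOR_BASE_URL` variable and a hypothetical `resolve_base_url` helper; only `DEFAULT_BASE_URL` and its value come from the comment above.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Value quoted in the review comment (detector.py:24).
DEFAULT_BASE_URL = "https://agent-gateway.livekit.cloud/v1"


def resolve_base_url(
    explicit: str | None = None,
    env: Mapping[str, str] | None = None,
) -> str:
    """Pick the explicit URL, then the (hypothetical) env override, then the default."""
    env = os.environ if env is None else env
    # TURN_DETECTOR_BASE_URL is an illustrative name, not a documented variable.
    return explicit or env.get("TURN_DETECTOR_BASE_URL") or DEFAULT_BASE_URL
```

A developer could then export the override locally, while anyone copying the example unchanged would get the production default.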