Multimodal EOU by chenghao-mou · Pull Request #4722 · livekit/agents

chenghao-mou · 2026-02-05T15:09:40Z

Add support for a multimodal and multilingual EOT (end-of-turn) model (inference only).

hsjun99 · 2026-02-25T01:00:31Z

@chenghao-mou Excited to see this! A couple of questions:

Will the multimodal EOT model be publicly accessible via model weights or agent-gateway.livekit.cloud, or in some other way?
Any rough timeline for when MultiModalTurnDetector gets fully wired up?

chenghao-mou · 2026-02-25T10:07:07Z

> @chenghao-mou Excited to see this! A couple of questions:
>
> Will the multimodal EOT model be publicly accessible via model weights or agent-gateway.livekit.cloud, or in some other way?
>
> Any rough timeline for when MultiModalTurnDetector gets fully wired up?

Thanks for your patience! We don't have an official decision or timeline yet, but hopefully I can get it ready within a month or two.

chenghao-mou · 2026-04-22T07:40:46Z

+                base_url="http://0.0.0.0:8080/v1",
+            ),
+            endpointing={
+                "min_delay": 1.5,


Temporary change; will remove before merging.

devin-ai-integration

Devin Review found 3 potential issues.

View 5 additional findings in Devin Review.

devin-ai-integration · 2026-04-22T07:48:41Z

+        self._opts = TurnDetectorOptions(
+            sample_rate=sample_rate,
+            base_url=lk_base_url,
+            api_key="devkey",
+            api_secret="devsecret",
+            conn_options=conn_options,
+        )


🔴 Hardcoded "devkey"/"devsecret" credentials override resolved API key/secret

The constructor resolves lk_api_key and lk_api_secret from environment variables and validates them (lines 59-72), but then discards these resolved values and hardcodes api_key="devkey" and api_secret="devsecret" in TurnDetectorOptions. These hardcoded values are then used in stream.py:242 to create JWT access tokens via create_access_token(self._opts.api_key, self._opts.api_secret) (see livekit-agents/livekit/agents/inference/_utils.py:61-68). This means authentication will always fail against any real LiveKit server since the token is signed with the wrong credentials. This is clearly a debugging leftover.
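To see why the token is rejected, here is a minimal, stdlib-only sketch of HMAC-SHA256 (HS256) JWT signing, assuming that is the scheme the access token uses. `sign_hs256` and `verify_hs256` are hypothetical helpers, not LiveKit's `create_access_token`, and the claim names are illustrative. The server recomputes the signature with its own secret, so a token signed with "devsecret" only verifies against a server configured with that same secret.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    # JWT segments use unpadded base64url encoding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, secret: str) -> str:
    """Hypothetical helper: build a compact HS256 JWT (header.payload.signature)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = (
        _b64url(json.dumps(header, separators=(",", ":")).encode())
        + "."
        + _b64url(json.dumps(claims, separators=(",", ":")).encode())
    )
    sig = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + _b64url(sig)


def verify_hs256(token: str, secret: str) -> bool:
    """Recompute the signature with the server-side secret and compare in constant time."""
    signing_input, _, sig = token.rpartition(".")
    expected = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return hmac.compare_digest(_b64url(expected), sig)


claims = {"iss": "devkey", "exp": int(time.time()) + 600}
token = sign_hs256(claims, "devsecret")  # what the hardcoded options effectively do
print(verify_hs256(token, "devsecret"))  # True: only a server using the dev secret accepts it
print(verify_hs256(token, "real-project-secret"))  # False: a real deployment rejects it
```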

Suggested change

         self._opts = TurnDetectorOptions(
             sample_rate=sample_rate,
             base_url=lk_base_url,
-            api_key="devkey",
-            api_secret="devsecret",
+            api_key=lk_api_key,
+            api_secret=lk_api_secret,
             conn_options=conn_options,
         )
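A minimal sketch of the resolve-then-forward pattern the suggestion restores, assuming credentials come from an explicit argument or the `LIVEKIT_API_KEY` / `LIVEKIT_API_SECRET` environment variables (the PR's exact variable names and validation may differ). `build_options` and this trimmed `TurnDetectorOptions` are hypothetical stand-ins, not the PR's code.

```python
from __future__ import annotations

import os
from dataclasses import dataclass


@dataclass
class TurnDetectorOptions:
    # Hypothetical stand-in for the PR's options container (fields trimmed)
    base_url: str
    api_key: str
    api_secret: str


def build_options(
    base_url: str,
    api_key: str | None = None,
    api_secret: str | None = None,
) -> TurnDetectorOptions:
    # Explicit arguments win; otherwise fall back to the environment
    # (LIVEKIT_API_KEY / LIVEKIT_API_SECRET are assumed variable names).
    lk_api_key = api_key or os.environ.get("LIVEKIT_API_KEY", "")
    lk_api_secret = api_secret or os.environ.get("LIVEKIT_API_SECRET", "")
    if not lk_api_key:
        raise ValueError("api_key is required (argument or LIVEKIT_API_KEY)")
    if not lk_api_secret:
        raise ValueError("api_secret is required (argument or LIVEKIT_API_SECRET)")
    # The fix: forward the resolved values instead of hardcoded literals
    return TurnDetectorOptions(base_url=base_url, api_key=lk_api_key, api_secret=lk_api_secret)


opts = build_options(
    "https://agent-gateway.livekit.cloud/v1", api_key="my-key", api_secret="my-secret"
)
```

Validating at construction time surfaces a missing secret immediately, rather than as an opaque authentication failure on the first stream.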


devin-ai-integration · 2026-04-22T07:48:42Z

+    def update_turn_detector(self, detector: MultimodalTurnDetector | None) -> None:
+        self._turn_detector = detector


🔴 update_turn_detector(None) destroys non-Multimodal turn detectors set by update_options

In update_options, when turn_detection is a regular _TurnDetector (e.g. MultilingualModel from the turn-detector plugin), the code first correctly sets self._turn_detector = turn_detection (line 228), but then calls self.update_turn_detector(None) (line 232). Inside update_turn_detector at line 613, self._turn_detector = detector overwrites the just-assigned turn detector with None. This silently disables the text-based turn detector for all non-MultimodalTurnDetector instances passed through update_options.

Prompt for agents

The problem is that `update_turn_detector` unconditionally sets `self._turn_detector = detector` (line 613), overwriting the value that `update_options` has just assigned at line 228. `update_turn_detector` should manage only the streaming turn-detection infrastructure, not `self._turn_detector`. In `audio_recognition.py`, remove the line `self._turn_detector = detector` from `update_turn_detector` (lines 612-633) so that it touches only `self._turn_detector_stream`, `self._turn_detection_ch`, and `self._turn_detection_atask`. The callers (`update_options` and `start`) are already responsible for setting `self._turn_detector` appropriately.
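A minimal sketch of the suggested fix, using hypothetical stand-in classes rather than the PR's actual `audio_recognition.py`. The stream/channel/task trio is collapsed into a single `_turn_detector_stream` attribute, and closing the previous stream is omitted. The buggy version differs only by the extra assignment noted in the comment; with that line present, passing a text detector through `update_options` leaves `_turn_detector` as `None`.

```python
from __future__ import annotations


class TextTurnDetector:
    """Hypothetical stand-in for a text-based detector such as MultilingualModel."""


class MultimodalTurnDetector:
    """Hypothetical stand-in for inference.MultimodalTurnDetector."""

    def stream(self) -> object:
        return object()  # placeholder for a streaming session


class AudioRecognition:
    def __init__(self) -> None:
        self._turn_detector: TextTurnDetector | MultimodalTurnDetector | None = None
        self._turn_detector_stream: object | None = None

    def update_options(
        self, turn_detection: TextTurnDetector | MultimodalTurnDetector | None
    ) -> None:
        # The caller owns self._turn_detector
        self._turn_detector = turn_detection
        if isinstance(turn_detection, MultimodalTurnDetector):
            self.update_turn_detector(turn_detection)
        else:
            # Tear down streaming state left over from a previous multimodal detector
            self.update_turn_detector(None)

    def update_turn_detector(self, detector: MultimodalTurnDetector | None) -> None:
        # Fixed: manage only the streaming infrastructure.
        # The buggy version also ran `self._turn_detector = detector` here.
        self._turn_detector_stream = detector.stream() if detector is not None else None
```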


devin-ai-integration · 2026-04-22T07:48:44Z

+            # turn_detection=MultilingualModel(),
+            turn_detection=inference.MultimodalTurnDetector(
+                base_url="http://0.0.0.0:8080/v1",
+            ),
+            endpointing={
+                "min_delay": 1.5,
+                "max_delay": 3.0,
+            },


🔴 Example basic_agent.py hardcodes localhost URL for turn detection

The basic example hardcodes base_url="http://0.0.0.0:8080/v1" for the MultimodalTurnDetector. This overrides the production default DEFAULT_BASE_URL = "https://agent-gateway.livekit.cloud/v1" (detector.py:24). Since this is the primary example users reference, anyone copying this code will get connection failures unless they happen to run a local turn detection service on port 8080. This appears to be a debugging leftover—the previous code used MultilingualModel() with no special URL.

Suggested change

             # turn_detection=MultilingualModel(),
-            turn_detection=inference.MultimodalTurnDetector(
-                base_url="http://0.0.0.0:8080/v1",
-            ),
+            turn_detection=inference.MultimodalTurnDetector(),
             endpointing={
                 "min_delay": 1.5,
                 "max_delay": 3.0,
             },
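One way to keep the example pointed at production while still allowing a local server is to resolve the URL in order: explicit argument, then an environment override, then the default. This is a sketch, not the detector's actual logic; `resolve_base_url` and the `LIVEKIT_TURN_DETECTOR_URL` variable are hypothetical, and only the `DEFAULT_BASE_URL` value is taken from the review above.

```python
from __future__ import annotations

import os

# Value quoted from detector.py:24 in the review
DEFAULT_BASE_URL = "https://agent-gateway.livekit.cloud/v1"


def resolve_base_url(base_url: str | None = None) -> str:
    # 1. explicit argument (e.g. a local dev server during development)
    # 2. environment override (hypothetical variable name)
    # 3. production default
    return base_url or os.environ.get("LIVEKIT_TURN_DETECTOR_URL") or DEFAULT_BASE_URL
```

If the detector resolved its URL this way, the example could call `inference.MultimodalTurnDetector()` with no arguments, and a developer running a local service would set the environment variable instead of editing the example.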


add interface draft · 87068d5

chenghao-mou added 25 commits March 6, 2026 10:47

Merge branch 'main' into feat/AGT-2520-multimodal-EOU · e0d5ec1
draft · 8eebccc
fix type issues · f92fbc0
refactor stream to support turn detector protocol · d1086ff
minor fixes · 0a02bb1
minor fixes · 168d0d7
WIP: use only ws stream · 277db6e
Merge branch 'main' into feat/AGT-2520-multimodal-EOU · 03c0e2e
fix uv.lock bad merge · 56b4796
WIP: more refactoring · be9a550
fix mypy · 601229c
remove temp url · c4d92f8
disable turn detection when agent is still speaking · e963d85
minor refactoring · c529d79
fix type issues · 09baed8
wip · 3830638
clean up encoder · f214aa0
wip · c922f44
Merge branch 'main' into feat/AGT-2520-multimodal-EOU · f94a0dd
update protos · 604bfdc
minor fixes · f9ec64a
address comments · ddbf594
add text fallback · d465564
add text fallback · 6e7d6bf
fix threshold · 200d634

chenghao-mou marked this pull request as ready for review April 22, 2026 07:38

chenghao-mou requested a review from a team April 22, 2026 07:38

chenghao-mou changed the title ~~[WIP] multimodal EOU~~ Multimodal EOU Apr 22, 2026

chenghao-mou commented Apr 22, 2026


remove temp deps · dbd11b0

devin-ai-integration Bot reviewed Apr 22, 2026


chenghao-mou added 4 commits April 22, 2026 16:40

support realtime model · 60004dd
fix type issues · 6de53f4
add id in logs · 4ed8a82
use threaded audio encoder · 0db57ea

chenghao-mou wants to merge 31 commits into main from feat/AGT-2520-multimodal-EOU


2 participants
