
feat(room-io): add json_format option for timed transcription output#5472

Merged

longcw merged 3 commits into main from longc/json-text-output on Apr 23, 2026

feat(room-io): add json_format option for timed transcription output#5472
longcw merged 3 commits intomainfrom
longc/json-text-output

Conversation

@longcw (Contributor) commented Apr 17, 2026

Summary

  • Add json_format field to TextOutputOptions for the room text output chain
  • When enabled, each transcription chunk published on the lk.transcription datastream topic is a JSON object with text, and start_time/end_time if the chunk is a TimedString

needs livekit/protocol#1502

Adds `json_format` to `TextOutputOptions` so the transcription stream on
the `lk.transcription` topic emits each chunk as a JSON object with
`text` and optional `start_time`/`end_time` fields when the chunk is a
`TimedString`. This makes it easier for clients to consume TTS-aligned
timed transcripts.
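As a rough sketch of how a client might consume this output (the field names text/start_time/end_time come from the description above; the helper name and sample payload are illustrative, not part of the PR), the newline-delimited JSON chunks can be parsed like so:

```python
import json

def parse_transcription_chunks(payload: str) -> list[dict]:
    """Parse newline-delimited JSON transcription chunks.

    Each line is one JSON object with "text"; "start_time"/"end_time"
    are only present when the source chunk was a TimedString.
    """
    chunks = []
    for line in payload.splitlines():
        if not line.strip():
            continue
        chunks.append(json.loads(line))
    return chunks

# Illustrative payload: one timed chunk, one plain chunk.
payload = '{"text": "hello", "start_time": 0.0, "end_time": 0.42}\n{"text": "world"}\n'
for chunk in parse_transcription_chunks(payload):
    print(chunk.get("text"), chunk.get("start_time"), chunk.get("end_time"))
```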
@chenghao-mou chenghao-mou requested a review from a team April 17, 2026 02:19
@devin-ai-integration (Bot, Contributor) left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.


@longcw longcw marked this pull request as draft April 17, 2026 07:55
@longcw longcw marked this pull request as ready for review April 21, 2026 02:52
@chenghao-mou (Member) left a comment
LGTM. One small question.

ts_pb.confidence = text.confidence
if utils.is_given(text.start_time_offset):
ts_pb.start_time_offset = text.start_time_offset
text = json.dumps(MessageToDict(ts_pb, preserving_proto_field_name=True)) + "\n"
@chenghao-mou (Member):

Should we use always_print_fields_with_no_presence so the keys are always present?

@longcw (Contributor, Author):

Perhaps not; if the text is not a TimedString, we may not want start_time or end_time to be included in the dict.
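To illustrate the presence trade-off in this exchange with a plain dict rather than the real protobuf message (the helper below is hypothetical, not the PR's code): omitting the timing keys entirely lets a consumer distinguish a plain chunk from a TimedString that genuinely starts at 0.0, which always-emitting the keys with default values would erase:

```python
import json
from typing import Optional

def chunk_to_json(text: str, start_time: Optional[float] = None,
                  end_time: Optional[float] = None) -> str:
    """Serialize a transcription chunk, omitting absent timing fields.

    Keys are only included when the value is known, mirroring the
    "don't force keys with no presence" choice discussed above.
    """
    d = {"text": text}
    if start_time is not None:
        d["start_time"] = start_time
    if end_time is not None:
        d["end_time"] = end_time
    return json.dumps(d)

print(chunk_to_json("hello", 0.0, 0.42))  # timing keys present
print(chunk_to_json("hello"))             # timing keys omitted
```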

stt=inference.STT("deepgram/nova-3"),
llm=inference.LLM("google/gemini-2.5-flash"),
tts=inference.TTS("cartesia/sonic-3"),
tts=cartesia.TTS(),
@chenghao-mou (Member):

Does inference not support this? If not, we should let the team know.

@longcw (Contributor, Author):

We do have options to enable timestamps in TTS inference (added in #4949), but it seems no timestamps are returned when enabled. Will forward to the team.

@devin-ai-integration (Bot, Contributor) left a comment

Devin Review found 2 new potential issues.

View 7 additional findings in Devin Review.


Comment on lines +366 to +368
self._out_ch.send_nowait(
TimedString(word, end_time=time.time() - self._start_wall_time)
)

🔴 TimedString end_time does not subtract _paused_duration, producing incorrect timestamps after pause/resume

In _main_task, the newly added TimedString objects compute end_time as time.time() - self._start_wall_time, but fail to subtract self._paused_duration. The synchronization delay calculation on synchronizer.py:337 correctly uses elapsed = time.time() - self._start_wall_time - self._paused_duration, but the end_time written to the output TimedString at lines 332 and 367 omits this subtraction. When audio playback is paused and resumed (e.g., during barge-in via _SyncedAudioOutput.pause() at synchronizer.py:613-618), the reported end_time will be inflated by the total pause duration, producing incorrect timing data for downstream consumers like the JSON format transcription output.

Suggested change
self._out_ch.send_nowait(
TimedString(word, end_time=time.time() - self._start_wall_time)
)
self._out_ch.send_nowait(
TimedString(word, end_time=time.time() - self._start_wall_time - self._paused_duration)
)

@longcw (Contributor, Author):

This is intentional; we should include paused time in the timestamp from the synchronizer, since it reflects the actual send time of a transcript.

Comment thread livekit-agents/livekit/agents/voice/transcription/synchronizer.py
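A small sketch of the two timestamp conventions at play in this thread (the class and method names are illustrative, not the synchronizer's actual API): the sync-delay math subtracts paused time to track playback position, while the emitted end_time deliberately keeps wall-clock elapsed time, i.e. when the transcript was actually sent:

```python
import time

class TimestampClock:
    """Illustrates wall-clock elapsed time vs. playback position."""

    def __init__(self) -> None:
        self.start_wall_time = time.time()
        self.paused_duration = 0.0

    def record_pause(self, seconds: float) -> None:
        # Accumulate time spent paused (e.g. during barge-in).
        self.paused_duration += seconds

    def wall_elapsed(self) -> float:
        # Convention used for TimedString.end_time: includes pauses,
        # so it reflects the actual send time of the transcript.
        return time.time() - self.start_wall_time

    def playback_position(self) -> float:
        # Convention used for the sync-delay math: excludes pauses,
        # so it tracks how much audio has actually played.
        return self.wall_elapsed() - self.paused_duration
```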
@longcw longcw merged commit 489f1e8 into main Apr 23, 2026
25 checks passed
@longcw longcw deleted the longc/json-text-output branch April 23, 2026 06:42

🤖 This is an automated Claude Code routine created by @toubatbrian. Right now it is in experimentation stage.

This PR looks like a core runtime improvement (room_io text output chain) and is eligible for automatic porting. The automation will start porting this PR into agents-js automatically and will open a follow-up PR there shortly.


Generated by Claude Code


🤖 Port opened: livekit/agents-js#1305

