fix(stream): decode SSE/hex audio for --stream and handle EPIPE by moxi000 · Pull Request #55 · MiniMax-AI/cli

moxi000 · 2026-04-09T04:45:48Z

Summary

Fixes #54.

speech synthesize --stream and music generate --stream now write decoded raw audio bytes to stdout, matching the documentation and making the existing | mpv --no-terminal - example actually work. Previously they wrote the upstream SSE stream verbatim (data: {"data":{"audio":"<hex>"}} envelopes), so any audio player saw garbage and bailed with "Failed to recognize file format".
A new shared helper, src/utils/audio-stream.ts (pipeAudioSseToStdout), parses SSE frames, JSON-decodes each data: line, hex-decodes data.audio, and writes the bytes to stdout. It buffers across chunk boundaries so that events split between chunks are handled correctly.
The same helper installs an EPIPE handler on process.stdout, so consumers closing the pipe early (... | head, a player that exits, or mpv not being installed) result in a clean exit(0) instead of an unhandled 'error' event and Node stack trace.
Both commands/speech/synthesize.ts and commands/music/generate.ts had the same bug; both now use the shared helper.
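
As a rough illustration of the decode step described above, the sketch below turns one `data:` payload into raw bytes. The names `hexToBytes` and `decodeAudioPayload` are hypothetical and not taken from the PR; only the envelope shape `{"data":{"audio":"<hex>"}}` comes from the description. It assumes well-formed hex input.

```typescript
// Hypothetical helper: convert a hex string to bytes without Node-specific APIs.
// Assumes well-formed hex; a trailing odd nibble is ignored.
function hexToBytes(hex: string): Uint8Array {
  const len = Math.floor(hex.length / 2);
  const out = new Uint8Array(len);
  for (let i = 0; i < len; i++) {
    out[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return out;
}

// Hypothetical helper: JSON-decode one SSE `data:` payload and hex-decode `data.audio`.
// Returns null for the [DONE] sentinel, non-JSON payloads, and envelopes without audio.
function decodeAudioPayload(payload: string): Uint8Array | null {
  if (!payload || payload === '[DONE]') return null;
  let parsed: { data?: { audio?: unknown } } | null;
  try {
    parsed = JSON.parse(payload);
  } catch {
    return null;
  }
  const hex = parsed?.data?.audio;
  if (typeof hex !== 'string' || hex.length === 0) return null;
  return hexToBytes(hex);
}
```

For example, `decodeAudioPayload('{"data":{"audio":"494433"}}')` yields the three bytes 0x49 0x44 0x33, the ASCII string "ID3" that opens a tagged MP3 file.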

Test plan

Built locally with bun run build and verified against the live API:

mmx speech synthesize --text "..." --stream > out.mp3 produces a valid MP3 (file reports ID3 v2.4 / MPEG ADTS layer III, 128 kbps, 32 kHz).
mmx speech synthesize --text "..." --stream | mpv --no-terminal - plays correctly end-to-end.
mmx speech synthesize --text "..." --stream | head -c 5 > /dev/null exits 0 with no EPIPE crash.
Music streaming path uses the same helper; would appreciate a maintainer sanity-check against a real music generate --stream call since I only smoke-tested speech.

Notes

The SSE parser tolerates [DONE] sentinels, comment/keepalive lines, and the optional single space after data: per the SSE spec.
No changes to flags, defaults, or non-stream code paths.

`speech synthesize --stream` and `music generate --stream` were writing the upstream Server-Sent Events stream verbatim to stdout — JSON envelopes containing hex-encoded audio chunks — so the documented `| mpv -` pipe example produced "Failed to recognize file format". The decode logic already exists for `--out`; the stream path just never used it. This adds a shared `pipeAudioSseToStdout` helper that parses the SSE frames, hex-decodes `data.audio`, and writes raw audio bytes to stdout, buffering across chunk boundaries. Both stream commands now use it. Also installs an EPIPE handler on stdout so consumers that close the pipe early (`... | head`, a player exiting, mpv missing) cause a clean exit instead of an unhandled `'error'` event crash with a Node stack trace. Fixes MiniMax-AI#54

Copilot

Pull request overview

This PR fixes the --stream behavior for speech/music commands so they output decoded raw audio bytes to stdout (instead of the upstream SSE/JSON/hex envelopes) and adds stdout EPIPE handling to avoid crashes when downstream pipes close early.

Changes:

Added a shared streaming helper to parse SSE frames, JSON-decode data: payloads, hex-decode data.audio, and write raw bytes to stdout.
Updated speech synthesize --stream and music generate --stream to use the shared helper.
Added stdout EPIPE handling so early pipe-close exits cleanly.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

File	Description
`src/utils/audio-stream.ts`	New helper for decoding SSE hex audio to raw stdout + stdout EPIPE handling.
`src/commands/speech/synthesize.ts`	Routes `--stream` output through the new audio streaming helper.
`src/commands/music/generate.ts`	Routes `--stream` output through the new audio streaming helper.


Copilot · 2026-04-09T04:48:33Z

src/utils/audio-stream.ts

+export function installStdoutEpipeHandler(): void {
+  process.stdout.on('error', (err: NodeJS.ErrnoException) => {
+    if (err && err.code === 'EPIPE') {
+      process.exit(0);
+    }
+    throw err;
+  });


installStdoutEpipeHandler is described as “one-shot”, but it uses process.stdout.on('error', ...) and is called from pipeAudioSseToStdout, so repeated uses will register multiple listeners (risking MaxListenersExceededWarning in tests or long-lived processes). Consider guarding with a module-level flag and/or using once instead of on.
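
One possible shape for the module-level guard suggested here is sketched below. The `stream` and `exit` parameters are assumptions added only so the behaviour can be exercised against a fake emitter; the PR's function takes no arguments.

```typescript
import { EventEmitter } from 'node:events';

// Module-level flag: true once the listener has been registered.
let epipeHandlerInstalled = false;

// Sketch of an idempotent installer. `stream` and `exit` are hypothetical
// parameters (defaulting to the real stdout and exit) used for testability.
function installStdoutEpipeHandler(
  stream: EventEmitter = process.stdout,
  exit: () => void = () => process.exit(0),
): void {
  if (epipeHandlerInstalled) return; // later calls are no-ops
  epipeHandlerInstalled = true;
  stream.on('error', (err: Error & { code?: string }) => {
    if (err && err.code === 'EPIPE') {
      exit(); // downstream closed the pipe: leave quietly
      return;
    }
    throw err; // anything else is still a real error
  });
}
```

With the flag in place, calling the installer from every `pipeAudioSseToStdout` invocation leaves exactly one `'error'` listener on the stream.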

Copilot · 2026-04-09T04:48:33Z

src/utils/audio-stream.ts

+      // SSE events are separated by blank lines.
+      let sep: number;
+      while ((sep = buffer.indexOf('\n\n')) >= 0) {
+        const event = buffer.slice(0, sep);
+        buffer = buffer.slice(sep + 2);
+        writeEvent(event);
+      }


SSE framing here only looks for \n\n as the event separator. If the upstream stream uses CRLF (\r\n\r\n), events may never be split and writeEvent will receive strings containing \r, causing JSON.parse failures. Consider normalizing \r\n to \n before parsing and/or reusing the existing parseSSE helper in src/client/stream.ts to avoid duplicate (and more spec-complete) SSE parsing logic.
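
A minimal sketch of the normalization option, assuming the body has already been decoded to text. The `SseFramer` class is hypothetical and is not the repository's `parseSSE`; it folds `\r\n` into `\n` before looking for the blank-line separator and does not handle a lone `\r` as a line terminator.

```typescript
// Hypothetical incremental framer: feed decoded text chunks, get complete events back.
class SseFramer {
  private buffer = '';

  push(chunk: string): string[] {
    // Normalize CRLF on the whole buffer so a "\r" that ended the previous
    // chunk is still folded once its "\n" arrives.
    this.buffer = (this.buffer + chunk).replace(/\r\n/g, '\n');
    const events: string[] = [];
    let sep: number;
    while ((sep = this.buffer.indexOf('\n\n')) >= 0) {
      events.push(this.buffer.slice(0, sep));
      this.buffer = this.buffer.slice(sep + 2);
    }
    return events;
  }
}
```

Because normalization runs on the accumulated buffer rather than on each chunk, a CRLF pair split across two reads is still recognized.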

Copilot · 2026-04-09T04:48:33Z

src/utils/audio-stream.ts

+function writeEvent(event: string): void {
+  for (const rawLine of event.split('\n')) {
+    if (!rawLine.startsWith('data:')) continue;
+    // Per SSE spec, an optional single space after `data:` should be stripped.
+    const payload = rawLine.slice(5).replace(/^ /, '');
+    if (!payload || payload === '[DONE]') continue;
+
+    let parsed: { data?: { audio?: string } };
+    try {
+      parsed = JSON.parse(payload);
+    } catch {
+      // Non-JSON keepalive or comment — skip.
+      continue;


writeEvent parses each data: line independently. Per SSE spec, a single event can contain multiple data: lines that must be concatenated with \n; parsing line-by-line can drop valid payloads (e.g., pretty-printed JSON) and also fails if lines end with \r. Using the shared parseSSE generator (which already concatenates data lines) or trimming rawLine/payload would make this more robust.
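
To make the multi-line rule concrete, here is a small sketch that collects every `data:` line of one event and joins them with `\n` before any JSON parsing. `extractEventData` is a hypothetical name; the shared `parseSSE` generator is not reproduced here.

```typescript
// Hypothetical helper: return the concatenated data payload of one SSE event,
// or null when the event carries no data field (for example a comment-only keepalive).
function extractEventData(event: string): string | null {
  const dataLines: string[] = [];
  for (const rawLine of event.split('\n')) {
    // Tolerate a stray carriage return left over from CRLF line endings.
    const line = rawLine.endsWith('\r') ? rawLine.slice(0, -1) : rawLine;
    if (line.startsWith(':')) continue; // comment or keepalive
    if (line === 'data') {
      dataLines.push(''); // field name with no colon means an empty value
      continue;
    }
    if (!line.startsWith('data:')) continue; // other fields are ignored in this sketch
    const value = line.slice(5);
    // Strip at most one leading space after the colon.
    dataLines.push(value.startsWith(' ') ? value.slice(1) : value);
  }
  return dataLines.length > 0 ? dataLines.join('\n') : null;
}
```

A pretty-printed JSON payload spread over several `data:` lines then parses as a single document instead of failing line by line.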

Copilot · 2026-04-09T04:48:34Z

src/utils/audio-stream.ts

+    const hex = parsed?.data?.audio;
+    if (typeof hex === 'string' && hex.length > 0) {
+      process.stdout.write(Buffer.from(hex, 'hex'));
+    }


process.stdout.write(Buffer.from(hex, 'hex')) ignores backpressure. For long streams this can buffer large amounts in memory if stdout can’t keep up. Consider making the write path async and awaiting drain when stdout.write() returns false (similar to how file download handles backpressure elsewhere in the codebase).
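
A minimal sketch of a backpressure-aware write, assuming the caller can `await` between chunks. `writeWithBackpressure` is a hypothetical name; `Writable.write` and `events.once` are standard Node.js APIs.

```typescript
import { once } from 'node:events';
import { Writable } from 'node:stream';

// Write one chunk and, if the stream's internal buffer is full, wait for 'drain'
// before returning. events.once() also rejects if the stream emits 'error' while
// waiting, so a broken pipe does not leave the caller hanging.
async function writeWithBackpressure(stream: Writable, chunk: Uint8Array): Promise<void> {
  if (!stream.write(chunk)) {
    await once(stream, 'drain');
  }
}
```

In the streaming loop this would replace a bare `process.stdout.write(...)` with `await writeWithBackpressure(process.stdout, bytes)`, so a slow consumer throttles the producer instead of growing the in-memory buffer.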

Copilot · 2026-04-09T04:48:34Z

src/commands/speech/synthesize.ts

+      try {
+        await pipeAudioSseToStdout(res.body);
+      } catch (err) {
+        if (err instanceof Error && err.message === 'No response body') {
+          throw new CLIError('No response body', ExitCode.GENERAL);
+        }
+        throw err;


This error handling relies on matching err.message === 'No response body' from the helper. Using error-message strings for control flow is brittle; consider exporting a dedicated error class/type (e.g., NoResponseBodyError) from audio-stream.ts, or have pipeAudioSseToStdout throw CLIError directly so callers don’t need to pattern-match messages.
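
The dedicated-error option could look roughly like the sketch below. `NoResponseBodyError` is the name proposed in the comment; `requireBody` and `describeFailure` are hypothetical stand-ins for the helper's check and the caller's mapping to `CLIError`, which is not reproduced here.

```typescript
// Dedicated error type so callers can branch on the class, not the message text.
class NoResponseBodyError extends Error {
  constructor() {
    super('No response body');
    this.name = 'NoResponseBodyError';
    // Keeps instanceof working even when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical guard the helper might run before reading the stream.
function requireBody<T>(body: T | null | undefined): T {
  if (body == null) throw new NoResponseBodyError();
  return body;
}

// Hypothetical caller-side mapping; a real caller would throw CLIError instead.
function describeFailure(err: unknown): string {
  return err instanceof NoResponseBodyError ? 'no-body' : 'other';
}
```

A plain `new Error('No response body')` no longer matches, so rewording the message cannot silently change control flow.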

Copilot · 2026-04-09T04:48:34Z

src/commands/speech/synthesize.ts

    if (flags.stream) {
      const res = await request(config, { url, method: 'POST', body, stream: true });
-      const reader = res.body?.getReader();
-      if (!reader) throw new CLIError('No response body', ExitCode.GENERAL);
-      while (true) {
-        const { done, value } = await reader.read();
-        if (done) break;
-        process.stdout.write(value);
+      try {
+        await pipeAudioSseToStdout(res.body);
+      } catch (err) {


The --stream path now decodes SSE->JSON->hex->bytes, but there’s no automated test coverage for this behavior. Consider adding a test that mocks an SSE response (the test suite already has sseResponse in test/helpers/mock-server.ts) and asserts that the command writes the expected raw bytes to stdout and stops on [DONE].

Copilot · 2026-04-09T04:48:34Z

src/commands/music/generate.ts

+      try {
+        await pipeAudioSseToStdout(res.body);
+      } catch (err) {
+        if (err instanceof Error && err.message === 'No response body') {
+          throw new CLIError('No response body', ExitCode.GENERAL);
+        }
+        throw err;
      }


This error handling relies on matching err.message === 'No response body' from the helper. Using error-message strings for control flow is brittle; consider exporting a dedicated error class/type (e.g., NoResponseBodyError) from audio-stream.ts, or have pipeAudioSseToStdout throw CLIError directly so callers don’t need to pattern-match messages.

Copilot · 2026-04-09T04:48:35Z

src/commands/music/generate.ts

    if (flags.stream) {
      const res = await request(config, { url, method: 'POST', body, stream: true });
-      const reader = res.body?.getReader();
-      if (!reader) throw new CLIError('No response body', ExitCode.GENERAL);
-      while (true) {
-        const { done, value } = await reader.read();
-        if (done) break;
-        process.stdout.write(value);
+      try {
+        await pipeAudioSseToStdout(res.body);
+      } catch (err) {


The --stream path now decodes SSE->JSON->hex->bytes, but there’s no automated test coverage for this behavior. Consider adding a test that serves an SSE response via the existing mock server helper and verifies the generated bytes written to stdout for music generate --stream (including chunk-boundary buffering).
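
One way such a chunk-boundary check might be shaped is sketched below. It deliberately avoids the repository's mock server helper and test runner, whose signatures are not shown in this thread; `decodeChunks` and `checkAllSplits` are hypothetical, and the decoder is a simplified stand-in rather than the PR's helper.

```typescript
// Simplified stand-in decoder: buffer text chunks, split on blank lines,
// JSON-decode each data: line and hex-decode data.audio into bytes.
function decodeChunks(chunks: string[]): number[] {
  const out: number[] = [];
  let buffer = '';
  const flush = (event: string): void => {
    for (const line of event.split('\n')) {
      if (!line.startsWith('data:')) continue;
      const payload = line.slice(5).replace(/^ /, '');
      if (!payload || payload === '[DONE]') continue;
      let hex: unknown;
      try {
        hex = JSON.parse(payload)?.data?.audio;
      } catch {
        continue; // non-JSON payload
      }
      if (typeof hex !== 'string') continue;
      for (let i = 0; i + 1 < hex.length; i += 2) {
        out.push(parseInt(hex.slice(i, i + 2), 16));
      }
    }
  };
  for (const chunk of chunks) {
    buffer += chunk;
    let sep: number;
    while ((sep = buffer.indexOf('\n\n')) >= 0) {
      flush(buffer.slice(0, sep));
      buffer = buffer.slice(sep + 2);
    }
  }
  if (buffer.trim()) flush(buffer); // trailing event without a final blank line
  return out;
}

// Split the same body at every possible offset; the decoded bytes must not change.
function checkAllSplits(body: string, expected: number[]): boolean {
  for (let i = 0; i <= body.length; i++) {
    const got = decodeChunks([body.slice(0, i), body.slice(i)]);
    if (got.join(',') !== expected.join(',')) return false;
  }
  return true;
}
```

Running `checkAllSplits` over a short fixture exercises splits inside the JSON, inside the hex string, and between the two newlines of the separator.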

The MiniMax streaming TTS endpoint emits N incremental SSE events followed by a final summary event that re-sends the *entire* audio along with extra_info / metadata (this is what --out saves). Naively concatenating all events produced an MP3 with the full file appended after the streamed frames, leaving broken framing in the middle — ffprobe accepted the file but mpv reported 'mp3float: Header missing / Error decoding audio'. Discriminate the summary event by the presence of top-level 'extra_info' (every event carries 'trace_id', so it can't be used). Verified end-to-end: piping --stream into mpv now plays cleanly with no decoder warnings, and the resulting bytes contain only the incremental frames.

moxi000 · 2026-04-09T04:50:40Z

Pushed a follow-up fix (84920bc) after reproducing a remaining decode error reported by mpv:

[ffmpeg/audio] mp3float: Header missing
Error decoding audio.

Root cause: the streaming endpoint emits N incremental SSE events followed by a terminal "summary" event that re-sends the entire audio plus metadata (this is what --out saves). The first version of this PR concatenated all events, so the complete file was appended after the incremental frames — ffprobe accepted the result but mpv's decoder choked on the broken framing where the two segments meet.

Captured raw SSE structure for mmx speech synthesize --text "Stream me" --stream:

ev1 topKeys=['base_resp','data','trace_id']                dataKeys=['audio','ced','status'] audioLen=9306
ev2 topKeys=['base_resp','data','trace_id']                dataKeys=['audio','ced','status'] audioLen=24192
ev3 topKeys=['base_resp','data','trace_id']                dataKeys=['audio','ced','status'] audioLen=4608
ev4 topKeys=['base_resp','data','trace_id']                dataKeys=['audio','ced','status'] audioLen=0
ev5 topKeys=['base_resp','data','extra_info','trace_id']   dataKeys=['audio','ced','status'] audioLen=40296   ← summary

Note that trace_id is on every event, so it can't be used as the discriminator — only extra_info is unique to the summary event. The fix skips events with top-level extra_info.
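
The discriminator can be sketched as follows. The `SseEnvelope` type lists only the keys visible in the capture above, and `isSummaryEvent` and `incrementalAudioHex` are hypothetical names, not the PR's code.

```typescript
// Envelope shape inferred from the captured events; other fields may exist.
interface SseEnvelope {
  data?: { audio?: string };
  extra_info?: unknown;
  trace_id?: string;
}

// The summary event is the only one with a top-level extra_info key.
function isSummaryEvent(envelope: SseEnvelope): boolean {
  return 'extra_info' in envelope;
}

// Concatenate the hex audio of incremental events only, skipping the summary.
function incrementalAudioHex(envelopes: SseEnvelope[]): string {
  return envelopes
    .filter((e) => !isSummaryEvent(e))
    .map((e) => e.data?.audio ?? '')
    .join('');
}
```

With this filter, the summary's full-file copy never reaches stdout, so the output contains only the incremental frames.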

Re-verified after rebuild:

mmx speech synthesize --text "Stream me" --stream | mpv - plays cleanly, no Header missing / Error decoding audio
mpv on the captured .mp3 file: no decoder warnings
... | head -c 5 still exits 0 (EPIPE handling intact)

Worth confirming with maintainers whether the same envelope shape applies to music generate --stream (the helper is shared, so the same fix applies if so).

- Reuse the existing parseSSE generator from src/client/stream.ts instead of hand-rolling SSE framing. parseSSE already handles multi-line data fields, comments, and trailing buffer flush correctly, eliminating duplicated and less spec-complete parsing.
- Make installStdoutEpipeHandler idempotent via a module-level flag, so repeated calls don't register multiple listeners and trigger MaxListenersExceededWarning in tests or long-lived processes.
- Export a dedicated NoResponseBodyError class instead of relying on err.message string matching for control flow in callers.
- Honor stdout backpressure: await 'drain' when write() returns false, so long streams don't accumulate large buffers in memory.
- pipeAudioSseToStdout now takes the Response directly (parseSSE owns the body reader), simplifying both call sites.

moxi000 · 2026-04-09T04:54:38Z

Pushed 070699a addressing the Copilot review:

#2 + #3 (SSE parsing): Dropped the hand-rolled framer and now reuse the existing parseSSE generator from src/client/stream.ts, which already handles multi-line data: fields, comments, and trailing-buffer flush correctly. pipeAudioSseToStdout now takes the Response directly so parseSSE owns the body reader.
#1 (duplicate EPIPE listeners): installStdoutEpipeHandler is now idempotent via a module-level flag, so repeated calls don't accumulate listeners or trigger MaxListenersExceededWarning.
#5 + #7 (string-matched error): Exported a dedicated NoResponseBodyError class. Both call sites now instanceof-check it instead of comparing err.message.
#4 (backpressure): When process.stdout.write() returns false, the helper now awaits 'drain' before continuing, so long streams don't accumulate in memory.

Skipped for this round:

#6 + #8 (mock-server tests): Happy to add these if you'd like, but I'd appreciate guidance on the preferred shape (e.g., do you want a chunk-boundary case using sseResponse?). Let me know and I'll push a follow-up commit.

Re-verified end-to-end after rebuild:

mmx speech synthesize --text "Stream me" --stream | mpv - → plays cleanly, no decoder warnings
mpv on the captured .mp3 file → no warnings
mmx speech synthesize ... --stream | head -c 5 > /dev/null → exit 0, no EPIPE crash

ryanlee666 · 2026-04-09T10:44:24Z

Thanks for the commit. Please resolve the merge conflicts and push again.

Copilot AI review requested due to automatic review settings April 9, 2026 04:45

Copilot started reviewing on behalf of moxi000 April 9, 2026 04:46

Copilot AI reviewed Apr 9, 2026

moxi000 wants to merge 3 commits into MiniMax-AI:main from moxi000:fix/stream-raw-audio
