feat: add video SEF/subject-ref, image seed/size, speech clone/design, music-2.5+ by raylanlin · Pull Request #66 · MiniMax-AI/cli

raylanlin · 2026-04-09T14:08:54Z

Summary

This PR adds new features to the video, image, speech, and music generation commands, fixes seven bugs, and expands the test suite.

18 files changed, +1197/-143 lines, 11 commits, 109 tests pass

New Features

🎬 Video Generation — `src/commands/video/generate.ts` (+54 lines)

`--last-frame <path-or-url>` — SEF (Start-End Frame) Interpolation

Generates a video that smoothly transitions between a start frame (from prompt) and an end frame (provided image).

mmx video generate \
  --prompt "A flower blooming in spring garden" \
  --last-frame ./end-frame.jpg

Automatically switches model to hailuo-02 (required for SEF mode)
Accepts local file path or URL
Note: Explicit --model hailuo-02 is still required if your plan doesn't include this model

`--subject-image <path-or-url>` — Subject-to-Video (S2V)

Keeps a character/object consistent throughout the generated video.

mmx video generate \
  --prompt "walking through a neon-lit cyberpunk city" \
  --subject-image ./character.png

Automatically switches model to s2v-01 (required for S2V mode)
Local files are uploaded to /v1/files/upload automatically
Note: Explicit --model s2v-01 is still required if your plan doesn't include this model

Model Override Priority

An explicit --model flag now takes priority over automatic model switching. If you specify --model hailuo-01 with --last-frame, the CLI sends the request with hailuo-01 (and fails if the API doesn't support it) rather than silently switching models.
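The priority rule can be sketched as a small pure function. This is an illustration only, not the code in `src/commands/video/generate.ts`: the `VideoFlags` shape, the `resolveVideoModel` name, and the `defaultModel` parameter are hypothetical, and checking `--last-frame` before `--subject-image` is an assumption, since the PR does not say what happens when both are given.

```typescript
// Hypothetical flag shape; field names are illustrative, not the PR's actual types.
interface VideoFlags {
  model?: string;        // value of --model, if given
  lastFrame?: string;    // value of --last-frame, if given
  subjectImage?: string; // value of --subject-image, if given
}

// Resolve which model to request. An explicit --model always wins;
// otherwise a mode flag selects the model that mode requires.
function resolveVideoModel(flags: VideoFlags, defaultModel: string): string {
  if (flags.model !== undefined) return flags.model;
  if (flags.lastFrame !== undefined) return "hailuo-02"; // SEF mode
  if (flags.subjectImage !== undefined) return "s2v-01"; // S2V mode
  return defaultModel;
}

// The explicit flag is kept even though --last-frame is present.
const chosen = resolveVideoModel(
  { model: "hailuo-01", lastFrame: "./end-frame.jpg" },
  "some-default-model", // placeholder, not a real model name
);
// chosen === "hailuo-01"
```

Keeping the decision in one function makes the precedence easy to test without calling the API.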

🖼️ Image Generation — `src/commands/image/generate.ts` (+34 lines)

`--seed <n>` — Reproducible Generation

mmx image generate --prompt "A sunset" --seed 42 --out a.png
mmx image generate --prompt "A sunset" --seed 42 --out b.png
# a.png and b.png are identical (MD5 match)

`--width <px>` / `--height <px>` — Custom Dimensions

mmx image generate --prompt "Wide banner" --width 2048 --height 512

Range: [512, 2048], must be multiple of 8
Overrides --aspect-ratio when both are set
Only effective for image-01 model
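The dimension constraints above could be checked along these lines. This is a minimal sketch under stated assumptions: `validateDimension` and the error wording are hypothetical, and the integer check is an addition that the PR does not mention.

```typescript
const MIN_SIDE = 512;
const MAX_SIDE = 2048;

// Returns an error message, or null when the value is acceptable.
function validateDimension(flag: string, value: number): string | null {
  // Assumption: fractional pixel values are rejected up front.
  if (!Number.isInteger(value)) return `${flag} must be an integer`;
  if (value < MIN_SIDE || value > MAX_SIDE) {
    return `${flag} must be between ${MIN_SIDE} and ${MAX_SIDE}`;
  }
  if (value % 8 !== 0) return `${flag} must be a multiple of 8`;
  return null;
}

validateDimension("--width", 2048); // null (accepted)
validateDimension("--height", 500); // "--height must be between 512 and 2048"
validateDimension("--width", 1001); // "--width must be a multiple of 8"
```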

`--prompt-optimizer` — AI Prompt Enhancement

mmx image generate --prompt "A cat" --prompt-optimizer
# Prompt is sent through LLM enhancement before generation

`--aigc-watermark` — AI Content Watermark

mmx image generate --prompt "Logo design" --aigc-watermark
# Adds standard AI generation watermark per Chinese regulations

🗣️ TTS — New Commands

`speech clone` — Voice Cloning (`src/commands/speech/clone.ts`, 110 lines)

Clone a voice from an audio sample.

mmx speech clone --audio ./my-voice.mp3 --name "My Voice"
# → voice_id returned for use with speech synthesize

mmx speech clone --audio ./sample.wav --name "Carol" --quiet
# Output: vc_xxxxxxxx

Uploads audio to /v1/files/upload first, then calls voice_clone API
Supports mp3, wav, ogg formats
Returns voice_id for use with mmx speech synthesize --voice <voice_id>

`speech design` — Voice Design (`src/commands/speech/design.ts`, 70 lines)

Create a voice from a text description.

mmx speech design --prompt "Warm female voice, slightly raspy, suitable for audiobook narration"
# → voice_id returned for use with speech synthesize

mmx speech design --prompt "Deep male narrator voice" --gender male

Calls voice_design API with text description
Optional --gender hint (male/female)
Returns voice_id for use with mmx speech synthesize --voice <voice_id>

🎵 Music Generation — `src/commands/music/generate.ts` (+127 lines)

`music-2.5+` Model with Native Instrumental Support

# Instrumental music — no lyrics needed
mmx music generate --prompt "Cinematic orchestral, building tension" --instrumental

# With lyrics (music-2.5+)
mmx music generate \
  --prompt "Indie folk, melancholic" \
  --lyrics "[Verse]
Rain on the window pane
[Chorus]
I'm waiting for the sun to come"

`--lyrics-optimizer` — AI-Generated Lyrics

mmx music generate --prompt "Upbeat pop song about summer" --lyrics-optimizer --out summer.mp3
# Lyrics auto-generated from prompt, then used for music generation

`--output-format url` — Direct Download URL

mmx music generate --prompt "Lo-fi hip hop" --instrumental --output-format url
# → URL returned (24h expiry — download promptly!)

Expanded Lyric Tags (14 Total)

| Tag | Usage |
| --- | --- |
| `[Intro]` | Song opening |
| `[Verse]` | Main narrative section |
| `[Pre Chorus]` | Build-up to chorus |
| `[Chorus]` | Hook/main melody |
| `[Interlude]` | Instrumental break |
| `[Bridge]` | Contrasting section |
| `[Outro]` | Song ending |
| `[Post Chorus]` | After chorus |
| `[Transition]` | Section bridge |
| `[Break]` | Rhythmic pause |
| `[Hook]` | Catchy repeated phrase |
| `[Build Up]` | Tension building |
| `[Inst]` | Instrumental section |
| `[Solo]` | Solo performance |

⚠️ Tags must be clean — no descriptions inside brackets. [Verse: piano] will be sung as lyrics.
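A client-side check for unclean tags might look like the following. This is a sketch, not part of the PR (which leaves validation to the API): `findUncleanTags` is a hypothetical helper, and exact, case-sensitive matching against the 14 tags is an assumption.

```typescript
// The 14 structure tags listed in the table above.
const LYRIC_TAGS = new Set([
  "Intro", "Verse", "Pre Chorus", "Chorus", "Interlude", "Bridge", "Outro",
  "Post Chorus", "Transition", "Break", "Hook", "Build Up", "Inst", "Solo",
]);

// Collect the contents of every [...] group that is not exactly one of the 14 tags.
function findUncleanTags(lyrics: string): string[] {
  const bracket = /\[([^\]\n]*)\]/g;
  const unclean: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = bracket.exec(lyrics)) !== null) {
    if (!LYRIC_TAGS.has(match[1])) unclean.push(match[1]);
  }
  return unclean;
}

findUncleanTags("[Verse]\nRain on the window pane\n[Chorus]\nI'm waiting"); // []
findUncleanTags("[Verse: piano]\nRain on the window pane"); // ["Verse: piano"]
```

A caller could print a warning for each returned entry before sending the lyrics.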

Bug Fixes

| # | File | Bug | Fix |
| --- | --- | --- | --- |
| 1 | `src/client/endpoints.ts` | File upload endpoint `/v1/files` returned 404 | Changed to `/v1/files/upload` |
| 2 | `src/output/audio.ts` | `extra_info` field names didn't match the API response | `audio_length` → `music_duration`, `audio_size` → `music_size`, `audio_sample_rate` → `music_sample_rate` |
| 3 | `src/registry.ts` | Compile wrapper pointed to old `dist/minimax.mjs` | Updated to `dist/mmx.mjs` |
| 4 | `src/commands/music/generate.ts` | `--instrumental` with "无歌词" ("no lyrics") still sent the lyrics field | Set `lyrics = undefined` when using "无歌词" |
| 5 | `src/commands/video/generate.ts` | Explicit `--model` was overwritten by auto-switch | Explicit flag now takes priority |
| 6 | `src/commands/music/generate.ts` | URL output went to stderr | Changed to `console.log` (stdout) |
| 7 | `src/commands/music/generate.ts` | Auto-truncation of prompt/lyrics hid API errors | Removed truncation, let the API handle validation |
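To illustrate fix 2, here is a hedged sketch of reading the renamed fields. Only the three field names come from the table above. The `MusicExtraInfo` interface, the `describeExtraInfo` helper, the assumption that the values are numbers, and the output labels are hypothetical and do not reflect the actual code in `src/output/audio.ts`.

```typescript
// Assumed shape of `extra_info`; only the field names are taken from the fix table.
interface MusicExtraInfo {
  music_duration?: number;
  music_size?: number;
  music_sample_rate?: number;
}

// Build display lines, skipping any field the response did not include.
function describeExtraInfo(info: MusicExtraInfo): string[] {
  const lines: string[] = [];
  if (info.music_duration !== undefined) lines.push(`duration: ${info.music_duration}`);
  if (info.music_size !== undefined) lines.push(`size: ${info.music_size}`);
  if (info.music_sample_rate !== undefined) lines.push(`sample rate: ${info.music_sample_rate}`);
  return lines;
}
```

Reading the old `audio_*` names against such a response would yield `undefined` for every field, which matches the symptom described in the bug column.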

Tests

Coverage: 109 pass / 0 fail across 25 test files

| Test File | Tests | Status |
| --- | --- | --- |
| `test/commands/image/generate.test.ts` | 17 (expanded from 5) | ✅ |
| `test/commands/video/generate.test.ts` | 10 (expanded from 2) | ✅ |
| `test/commands/speech/clone.test.ts` | 7 (new) | ✅ |
| `test/commands/speech/design.test.ts` | 7 (new) | ✅ |
| Other existing tests | 68 | ✅ |

Test Highlights

Seed reproducibility: same seed + prompt → identical MD5 hash
Dimension validation: rejects non-multiples of 8, out-of-range values
Mutual exclusivity: --width + --aspect-ratio → warning
Model auto-switching: --last-frame → hailuo-02, --subject-image → s2v-01
Explicit model override: --model hailuo-01 --last-frame → uses hailuo-01 (not auto-switched)
Voice clone flow: file upload → voice_clone API call
Voice design flow: text description → voice_design API call

Documentation

`skill/SKILL.md` (+160 lines)

Full parameter tables for all new features
Usage examples with real-world scenarios
Piping patterns for agent workflows
Lyrics structure tag conventions, with warnings
Model compatibility matrix

`README.md` (+18 lines)

Updated feature list
New command examples for all modules
Quick start section with common workflows

`--help` (all commands)

music: 14 lyric tags list, "no descriptions in tags" warning, 3500 char limit
image: seed reproducibility note, width/height range [512,2048], 8-multiple requirement
video: SEF mode explanation, subject-image model requirements
speech synthesize: model characteristics, speed/volume/pitch ranges, supported formats
text chat: temperature range 0.0-1.0 (default 0.7), top-p default 0.95, OpenAI-compatible tools
search query: natural language support, pipeline examples

API Reference

All features were verified against the official MiniMax API documentation.

Commits

| Commit | Type | Description |
| --- | --- | --- |
| `281c6d2` | skill | Lyrics structure tag conventions and album song example |
| `4a04cc4` | feat | music-2.5+ support, lyrics-optimizer, output-format url |
| `3800064` | fix | Remove auto-truncation, let API handle length errors |
| `fb7ddd0` | fix | Clean up music-2.5+ instrumental, remove duplicate warnings |
| `41d762b` | fix | URL output to stdout, extra_info field names, compile wrapper |
| `7e7fb50` | feat | Video SEF/subject-ref, image seed/size/optimizer, speech clone/design |
| `847c9b4` | fix | File upload endpoint `/v1/files` → `/v1/files/upload` |
| `1e2462c` | test | Comprehensive tests + video model override fix |
| `f928de2` | docs | SKILL.md, README.md, --help sync for all features |
| `233ff94` | docs | Improve --help for music/image/video with detailed descriptions |
| `e98eb11` | docs | Improve --help for text chat/search/speech synthesize |

raylanlin closed this Apr 10, 2026

raylanlin force-pushed the main branch from 8abfaed to f07c4b3 on April 10, 2026 08:10

raylanlin wants to merge 0 commits into MiniMax-AI:main from raylanlin:main
