[Klaud Cold]minimaxm3-fp8-mi300x-vllm-mtp: day-zero MiniMax-M3 EAGLE3 (MTP) MI300X recipe by functionstackx · Pull Request #1749 · SemiAnalysisAI/InferenceX

functionstackx · 2026-06-14T04:13:08Z

Summary

Adds the EAGLE3 speculative-decoding (spec-decoding: mtp) sibling of minimaxm3-fp8-mi300x-vllm (#1746): MiniMax-M3 MXFP8 on MI300X (gfx942) single-node vLLM (ROCm), pairing MiniMaxAI/MiniMax-M3-MXFP8 with the Inferact/MiniMax-M3-EAGLE3 draft head. Based on the MI300X non-MTP recipe + the (merged) MI355X MTP recipe (#1745).

New benchmark script

benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_mi300x_mtp.sh, mirroring minimaxm3_fp8_mi300x.sh:

Keeps the MI300X serve deltas vs MI355X: the default BF16 KV cache (gfx942 has no calibrated ROCm FP8 attention scales — fallback scale 1.0 corrupts accuracy) and --no-enable-prefix-caching. Same --block-size 128, --language-model-only, --attention-backend TRITON_ATTN, --enforce-eager, minimax_m3 parsers.
Adds the EAGLE3 draft: --speculative-config '{"method":"eagle3","model":"Inferact/MiniMax-M3-EAGLE3","num_speculative_tokens":3}', draft download, --use-chat-template for realistic acceptance.
Carries the same in-place EAGLE3 patch as the MI355X MTP recipe (validated green there): the shipped vllm/vllm-openai-rocm:minimax-m3 image's AMD MiniMax-M3 class lacks SupportsEagle3, so the recipe patches the installed amd/model.py before serving. The patch comes from functionstackx/vllm#1 ("[AI generated draft] minimax_m3(amd): implement SupportsEagle3 for EAGLE3 spec decoding on ROCm"), upstreamed as vllm-project/vllm#45546 ("[Bug Fix] [MiniMax-M3] Implement EAGLE3 support on the AMD MiniMax M3"). It is idempotent and hard-fails on base drift. The image's amd/model.py is shared across MI300X/MI355X (same ROCm image), so the patch anchors match identically; this was verified locally with a dry run.
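The "idempotent; hard-fails on base drift" behaviour can be sketched roughly as follows. This is a minimal illustration, not the recipe's actual patch: the class name `MiniMaxM3ForCausalLM`, the anchor text, and the `# eagle3-patched` marker are all hypothetical stand-ins for whatever the real amd/model.py contains.

```python
import ast


class BaseDriftError(RuntimeError):
    """Raised when the expected anchor is not found exactly once."""


def apply_patch(source: str, anchor: str, replacement: str, marker: str) -> str:
    """Return patched source; no-op if already patched; fail on drift."""
    if marker in source:
        # Already patched (or upstream fix present): do nothing.
        return source
    count = source.count(anchor)
    if count != 1:
        # The base file changed under us: refuse to guess.
        raise BaseDriftError(f"expected exactly 1 anchor match, found {count}")
    return source.replace(anchor, replacement, 1)


# Hypothetical stand-in for the image's amd/model.py contents.
base = "class MiniMaxM3ForCausalLM(nn.Module):\n    pass\n"
anchor = "class MiniMaxM3ForCausalLM(nn.Module):"
marker = "# eagle3-patched"
replacement = f"class MiniMaxM3ForCausalLM(nn.Module, SupportsEagle3):  {marker}"

once = apply_patch(base, anchor, replacement, marker)
twice = apply_patch(once, anchor, replacement, marker)  # second run is a no-op
ast.parse(once)  # the patched file must still be valid Python
```

Checking the marker before the anchor is what makes a second run safe, and requiring exactly one anchor match is what turns image drift into a loud failure rather than a silent mis-patch.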

Config (`amd-master.yaml`)

minimaxm3-fp8-mi300x-vllm-mtp, with the same image and mi300x runner. The search space is TP8-only (192 GB per gfx942 GPU is memory-tight, as on H100), with TP8 latency rows starting at concurrency 1 (matching the H100/MI355X MTP recipes):

1k1k: TP8 (1–128), TP8+EP8 (256)
8k1k: TP8 (1–64), TP8+EP8 (128–256)
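As a rough cross-check of the search space above, the rows can be expanded into individual sweep points. The sketch assumes concurrency doubles at each step (1, 2, 4, ...); that step size is an inference, not something the PR states, although it does reproduce the 18 configs reported under Validation.

```python
def powers_of_two(lo: int, hi: int) -> list[int]:
    """Concurrency values from lo to hi inclusive, doubling each step (assumed)."""
    values = []
    conc = lo
    while conc <= hi:
        values.append(conc)
        conc *= 2
    return values


# (parallelism, min concurrency, max concurrency) per scenario, from the PR text.
SEARCH_SPACE = {
    "1k1k": [("TP8", 1, 128), ("TP8+EP8", 256, 256)],
    "8k1k": [("TP8", 1, 64), ("TP8+EP8", 128, 256)],
}


def enumerate_configs(space: dict) -> list[tuple[str, str, int]]:
    """Expand every row into (scenario, parallelism, concurrency) points."""
    return [
        (scenario, parallelism, conc)
        for scenario, rows in space.items()
        for parallelism, lo, hi in rows
        for conc in powers_of_two(lo, hi)
    ]


configs = enumerate_configs(SEARCH_SPACE)
print(len(configs))  # 9 points for 1k1k + 9 points for 8k1k
```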

Launcher fix

launch_mi300x-amds.sh hardcoded _mi300x.sh with no SPEC_SUFFIX — an mtp config would silently run the non-MTP script. Added SPEC_SUFFIX (same fix applied to the H100 launchers), so mtp routes to minimaxm3_fp8_mi300x_mtp.sh.
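The routing rule can be illustrated as below. The real launcher is a bash script; this Python sketch only mirrors the naming logic, and the function name, its parameters, and the `"none"` default are hypothetical.

```python
def benchmark_script(model_prefix: str, hardware: str, spec_decoding: str = "none") -> str:
    """Pick the benchmark script name, appending _mtp for MTP configs."""
    spec_suffix = "_mtp" if spec_decoding == "mtp" else ""
    return f"{model_prefix}_{hardware}{spec_suffix}.sh"


print(benchmark_script("minimaxm3_fp8", "mi300x"))         # base script
print(benchmark_script("minimaxm3_fp8", "mi300x", "mtp"))  # MTP script
```

Without the suffix step, both calls would resolve to the same base script, which is the silent mis-routing the fix addresses.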

Validation

generate_sweep_configs.py test-config → 18 configs, all spec-decoding=mtp on mi300x, min conc 1, scenario-trimmed max-model-len 2304 / 9472.
bash -n passes on the script + launcher; the embedded patch dry-run applies cleanly to the image's amd/model.py and the result passes ast.parse; launcher routing simulated (mtp → _mtp.sh).
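The two max-model-len values are consistent with input length + output length + a fixed margin. The 256-token margin and the reading of "1k"/"8k" as 1024/8192 tokens are assumptions inferred from the numbers, not taken from generate_sweep_configs.py.

```python
# Scenario name -> (input tokens, output tokens); 1k = 1024 is assumed.
SCENARIOS = {"1k1k": (1024, 1024), "8k1k": (8192, 1024)}
MARGIN = 256  # assumed headroom; inferred from 2304 - 2048 and 9472 - 9216


def max_model_len(isl: int, osl: int, margin: int = MARGIN) -> int:
    """Smallest context length that fits prompt, completion, and margin."""
    return isl + osl + margin


for name, (isl, osl) in SCENARIOS.items():
    print(name, max_model_len(isl, osl))
```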

Like #1745, this is a validation harness (runtime monkey-patch); once the upstream fix is in a rebuilt ROCm image, the in-place patch idempotently no-ops.

🤖 Generated with Claude Code

Note

Medium Risk
Benchmark-only changes, but the recipe mutates installed vLLM model code at runtime; drift in the image could fail jobs or mis-route MTP sweeps if launcher routing regresses.

Overview
Adds MiniMax-M3 MXFP8 EAGLE3 speculative decoding on MI300X as the spec-decoding: mtp sibling of the existing non-MTP MI300X recipe, pairing MiniMaxAI/MiniMax-M3-MXFP8 with Inferact/MiniMax-M3-EAGLE3 (3 speculative tokens).

A new sweep config minimaxm3-fp8-mi300x-vllm-mtp in amd-master.yaml keeps a TP8-only search space (memory-tight gfx942), with TP8 latency sweeps starting at conc 1 and high-concurrency TP8+EP8 rows for 1k1k and 8k1k.

The new script minimaxm3_fp8_mi300x_mtp.sh serves with EAGLE3 --speculative-config, downloads the draft model, uses chat-template prompts for realistic acceptance, and retains MI300X-specific choices (BF16 KV, TRITON_ATTN, no prefix caching). Because the shipped ROCm image’s AMD MiniMax-M3 model lacks SupportsEagle3, the recipe patches installed amd/model.py in-place before vllm serve (idempotent, aborts on anchor drift).
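Since --speculative-config takes a JSON object as a single shell argument, one way to build it without hand-quoting is sketched below. The field values come from the PR description; generating the flag from Python is only an illustration, as the recipe itself is a bash script.

```python
import json
import shlex

spec_config = {
    "method": "eagle3",
    "model": "Inferact/MiniMax-M3-EAGLE3",
    "num_speculative_tokens": 3,
}

# Compact separators keep the JSON free of spaces.
spec_json = json.dumps(spec_config, separators=(",", ":"))

# shlex.quote wraps the JSON in single quotes so the shell passes it as one argument.
flag = f"--speculative-config {shlex.quote(spec_json)}"
print(flag)
```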

launch_mi300x-amds.sh now sets SPEC_SUFFIX when SPEC_DECODING=mtp so MTP configs run *_mtp.sh instead of silently using the base script. perf-changelog.yaml documents the new config key.

Reviewed by Cursor Bugbot for commit 36265ae. Bugbot is set up for automated code reviews on this repo.

Adds the spec-decoding=mtp sibling of minimaxm3-fp8-mi300x-vllm, based on the MI300X non-MTP recipe + the MI355X MTP recipe.

Keeps the MI300X serve shape (BF16 KV cache, because gfx942 lacks calibrated ROCm FP8 attention scales, plus --no-enable-prefix-caching, TRITON_ATTN, --enforce-eager, minimax_m3 parsers) and adds the Inferact/MiniMax-M3-EAGLE3 draft via --speculative-config (method eagle3, 3 spec tokens) + chat-template prompts.

Carries the same in-place EAGLE3 patch as the MI355X MTP recipe: the shipped ROCm image's AMD MiniMax-M3 model lacks SupportsEagle3, so the recipe patches the installed amd/model.py before serving (functionstackx/vllm#1, upstream vllm-project/vllm#45546; validated green on MI355X). Idempotent; hard-fails on base drift.

TP8-only search space (gfx942 192 GB is memory-tight, like H100), TP8 latency rows started at conc 1, matching the H100/MI355X MTP recipes.

Also adds SPEC_SUFFIX to launch_mi300x-amds.sh so spec-decoding=mtp routes to the _mtp script (the launcher hardcoded _mi300x.sh).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-actions · 2026-06-14T04:13:16Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.


github-actions · 2026-06-14T04:13:48Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27487999086
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27487999086

github-actions · 2026-06-14T05:44:49Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27487999691
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27487999691

functionstackx · 2026-06-14T05:49:04Z

/reuse-sweep-run

github-actions · 2026-06-14T05:49:43Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27489883815
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27489883815


functionstackx requested a review from a team June 14, 2026 04:13

functionstackx requested review from 1am9trash, billishyahao, chunfangamd, seungrokj and yctseng0211 as code owners June 14, 2026 04:13

github-project-automation Bot added this to InferenceMAX Board Jun 14, 2026

perf-changelog: fill in PR link for minimaxm3-fp8-mi300x-vllm-mtp (#1749)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

d367a73

functionstackx added the full-sweep-enabled label Jun 14, 2026

functionstackx changed the title ~~[Klaud Cold][AI draft test] minimaxm3-fp8-mi300x-vllm-mtp: day-zero MiniMax-M3 EAGLE3 (MTP) MI300X recipe~~ [Klaud Cold]minimaxm3-fp8-mi300x-vllm-mtp: day-zero MiniMax-M3 EAGLE3 (MTP) MI300X recipe Jun 14, 2026

Merge remote-tracking branch 'origin/main' into pr-1749-reuse

36265ae

functionstackx merged commit f274a81 into main Jun 14, 2026
4 of 6 checks passed

functionstackx deleted the feat/minimax-m3-mi300-mtp-dayzero branch June 14, 2026 05:49

github-project-automation Bot moved this to Done in InferenceMAX Board Jun 14, 2026

functionstackx merged 3 commits into main from feat/minimax-m3-mi300-mtp-dayzero
