Add minimax M3 MXFP8 MI355X vLLM EAGLE3 (related PR for upstreaming patch https://github.com/vllm-project/vllm/pull/45546) by functionstackx · Pull Request #1745 · SemiAnalysisAI/InferenceX

functionstackx · 2026-06-13T19:55:34Z

Summary

PR for upstreaming patch vllm-project/vllm#45546

Test PR to validate the MiniMax-M3 EAGLE3 fix on real MI355X hardware before the ROCm image is rebuilt. (Re-opened from #1744, which hit a GitHub Actions glitch.) Built on the EAGLE3 MI355X recipe (60d9910f).

Background

EAGLE3 MTP on MI355X failed engine init (RuntimeError: Model does not support EAGLE3 interface but aux_hidden_state_outputs was requested). Root cause: the MiniMax-M3 impl is platform-split, and the AMD model class (vllm/models/minimax_m3/amd/model.py) doesn't implement SupportsEagle3, while the NVIDIA one does — which is why B300/B200/H100/H200 pass and MI355X doesn't. Fix authored upstream as a draft on the fork: functionstackx/vllm#1.

What this PR does

The recipe applies that fix in-place to the installed vLLM inside the container, immediately before vllm serve:

Adds EagleModelMixin to the inner MiniMaxM3Model + aux-hidden-state emission in forward(), and SupportsEagle3 to MiniMaxM3SparseForCausalLM / MiniMaxM3SparseForConditionalGeneration — mirroring nvidia/model.py.
Idempotent (skips if already patched) and hard-fails if the installed amd/model.py has drifted from the expected base, so we never silently run unpatched and mislabel the result.
Verified locally: the image's amd/model.py (commit g4a560dd8d) is byte-identical to the patch base, all 5 anchors match exactly, the patched source ast.parses, and the embedded patcher applies + is idempotent on a sandboxed copy of the image file.
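The idempotent, fail-on-drift behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the patcher embedded in the recipe: the `MARKER` sentinel, the `PatchDriftError` name, the `apply_patch` helper, and the toy class names are all hypothetical, and the real patcher's five anchors are not reproduced here.

```python
import ast


class PatchDriftError(RuntimeError):
    """The target source no longer matches the expected patch base."""


# Hypothetical sentinel; the real recipe may detect a prior patch differently.
MARKER = "# eagle3-runtime-patch: applied"


def apply_patch(source: str, edits: list[tuple[str, str]]) -> str:
    """Return patched source. No-op if already patched; raise on anchor drift."""
    if MARKER in source:
        return source  # already patched, so a second run changes nothing

    # Check every anchor against the unmodified source before touching it.
    for anchor, _ in edits:
        if source.count(anchor) != 1:
            raise PatchDriftError(f"anchor must match exactly once: {anchor!r}")

    patched = source
    for anchor, replacement in edits:
        patched = patched.replace(anchor, replacement, 1)

    patched = f"{MARKER}\n{patched}"
    ast.parse(patched)  # raises SyntaxError if the result is not valid Python
    return patched


# Toy stand-in for the real file; class names here are illustrative only.
base = "class Inner(Base):\n    pass\n"
edits = [("class Inner(Base):", "class Inner(Base, Mixin):")]

once = apply_patch(base, edits)
twice = apply_patch(once, edits)
```

Validating all anchors before applying any replacement means a drifted file is rejected untouched rather than left half-patched, which is the property that keeps an unpatched run from being mislabelled.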

Serve config unchanged from the EAGLE3 recipe: --speculative-config '{"method":"eagle3","model":"Inferact/MiniMax-M3-EAGLE3","num_speculative_tokens":3}', TRITON_ATTN, --enforce-eager, block-size 128.
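For readers who want to see how the speculative-decoding settings fit into a launch command, here is a minimal sketch that assembles the argument list in Python. The flag names and values are the ones quoted in this PR; the `build_serve_argv` helper is hypothetical, and the actual recipe is a shell script that may pass additional options not shown here.

```python
import json
import shlex

# Values quoted in the PR description.
SPEC_CONFIG = {
    "method": "eagle3",
    "model": "Inferact/MiniMax-M3-EAGLE3",
    "num_speculative_tokens": 3,
}


def build_serve_argv(target_model: str) -> list[str]:
    """Assemble a `vllm serve` argument list (hypothetical helper)."""
    return [
        "vllm", "serve", target_model,
        # Compact separators reproduce the JSON string as written in the PR.
        "--speculative-config", json.dumps(SPEC_CONFIG, separators=(",", ":")),
        "--attention-backend", "TRITON_ATTN",
        "--enforce-eager",
        "--block-size", "128",
    ]


argv = build_serve_argv("MiniMaxAI/MiniMax-M3-MXFP8")
print(shlex.join(argv))  # shell-quoted form, safe to paste into a terminal
```

Building the JSON with `json.dumps` rather than by hand avoids quoting mistakes when the command is later embedded in a shell script.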

Expected outcome

Green → the upstream patch is correct and the held #1742 recipe just needs that patch in the image. This PR is a validation harness, not meant to merge as-is (the runtime monkey-patch should be replaced by a rebuilt image).

🤖 Generated with Claude Code

Note

Medium Risk
The recipe mutates installed vLLM model code at runtime; drift or patch failure can break jobs or skew results, though it fails loudly on mismatch. Scope is benchmark/CI only, not production serving.

Overview
Adds MI355X EAGLE3 speculative-decoding coverage for MiniMax-M3 MXFP8 vLLM: a new minimaxm3-fp8-mi355x-vllm-mtp CI config and minimaxm3_fp8_mi355x_mtp.sh, plus a perf-changelog entry.

The config pairs MiniMaxAI/MiniMax-M3-MXFP8 with draft Inferact/MiniMax-M3-EAGLE3 (spec-decoding: mtp, 3 tokens). Search space follows the non-MTP MI355X recipe but caps high concurrency and drops tp2-ep2, aligned with other MiniMax-M3 MTP entries.

The shell recipe serves with EAGLE3 --speculative-config, downloads the draft model, uses TRITON_ATTN (no CUDA-style drafter backend override), and runs benchmarks with --use-chat-template. Because the pinned ROCm image’s AMD minimax_m3 model lacks SupportsEagle3, the job patches installed vllm in-place before vllm serve (idempotent, aborts if anchors drift)—intended as a hardware validation harness until upstream/image picks up the fix.

Reviewed by Cursor Bugbot for commit 5d1ddae. Bugbot is set up for automated code reviews on this repo.

Adds the spec-decoding=mtp sibling of minimaxm3-fp8-mi355x-vllm: same MXFP8 target and ROCm serve shape (--block-size 128, FP8 KV cache, --attention-backend TRITON_ATTN, --enforce-eager, minimax_m3 parsers), plus the Inferact/MiniMax-M3-EAGLE3 draft head via --speculative-config (method eagle3, 3 speculative tokens).

Unlike the CUDA recipes, the drafter needs no attention_backend override: the page-128/MHA limitation that forced FLASH_ATTN on Blackwell is FlashInfer-specific, and the whole server runs on TRITON_ATTN here, which serves the MHA draft fine. Benchmark prompts run through the chat template so acceptance reflects real text.

Search space mirrors the non-MTP entry trimmed at the extreme-concurrency end (tp2-ep2 dropped), matching the b300/b200 MTP precedent. The launcher needs no change: launch_mi355x-amds.sh already resolves the _mtp script via SPEC_SUFFIX.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Test PR built on the EAGLE3 MI355X recipe (60d9910). The shipped vllm/vllm-openai-rocm:minimax-m3 image lacks SupportsEagle3 on the AMD MiniMax-M3 model, so method=eagle3 aborts engine init.

Rather than wait for an image rebuild, the recipe applies the fix (functionstackx/vllm#1, ported from nvidia/model.py) in-place to the installed vllm before serving: it adds EagleModelMixin and aux-hidden-state emission to the inner model, and SupportsEagle3 to the two outer classes. The patch is idempotent and hard-fails if the installed amd/model.py has drifted from the expected base (verified byte-identical to the image commit g4a560dd8d).

Validates EAGLE3 + Inferact/MiniMax-M3-EAGLE3 on real MI355X hardware ahead of the upstream fix landing in the image.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-actions · 2026-06-13T19:55:41Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

github-actions · 2026-06-13T19:56:08Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27477411944
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27477411944

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 149e11e.

cursor · 2026-06-13T19:57:22Z

+if [[ "$MODEL" != /* ]]; then
+  hf download "$MODEL"
+  hf download "$DRAFT_MODEL"
+fi


Draft download lacks NFS retry

Medium Severity

The recipe fetches the unstaged Inferact/MiniMax-M3-EAGLE3 draft with a single hf download into the shared NFS HF cache, while the MI355X launcher mounts that cache for MiniMax-M3 runs. Parallel matrix jobs can contend on hub lock files the same way as other MiniMax MTP recipes on network storage, but this script omits the retry loop those siblings use for the draft.
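The kind of retry loop Bugbot refers to can be sketched as below. This is not the sibling recipes' implementation (those are shell scripts whose exact loop is not shown in this PR); the `retry` helper, the attempt count, and the delay are hypothetical values chosen for illustration. Only the `hf download` command itself comes from the diff above.

```python
import subprocess
import time
from typing import Callable


def retry(
    action: Callable[[], None],
    attempts: int = 5,           # hypothetical default
    base_delay: float = 10.0,    # hypothetical default, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `action` until it succeeds; return the number of attempts used."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            action()
            return attempt
        except Exception:
            if attempt == attempts:
                raise  # out of retries: surface the last failure
            sleep(base_delay * attempt)  # linear backoff between tries
    raise AssertionError("unreachable")


def download(repo: str) -> None:
    """Fetch a Hugging Face repo with the `hf` CLI; raises on non-zero exit."""
    subprocess.run(["hf", "download", repo], check=True)


# Example (not executed here):
# retry(lambda: download("Inferact/MiniMax-M3-EAGLE3"))
```

Spacing the attempts out gives a concurrent job holding the hub lock file on shared storage time to finish before the next try.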

github-actions · 2026-06-13T21:46:18Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27477412884
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27477412884

functionstackx · 2026-06-13T22:12:09Z

/reuse-sweep-run

github-actions · 2026-06-13T22:12:53Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27480644316
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27480644316

functionstackx and others added 4 commits June 13, 2026 15:48

perf-changelog: fill in PR link for minimaxm3-fp8-mi355x-vllm-mtp eagle3 test

651824f

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

perf-changelog: reset PR link for mi355x eagle3 test (fresh PR)

0822923

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

functionstackx requested a review from a team June 13, 2026 19:55

functionstackx requested review from 1am9trash, billishyahao, chunfangamd, seungrokj and yctseng0211 as code owners June 13, 2026 19:55

github-project-automation Bot added this to InferenceMAX Board Jun 13, 2026

perf-changelog: fill in PR link for mi355x eagle3 test (#1745)

149e11e

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

functionstackx added the full-sweep-enabled label Jun 13, 2026

cursor Bot reviewed Jun 13, 2026

functionstackx changed the title ~~[Klaud Cold][AI draft test] minimaxm3-fp8-mi355x-vllm-mtp: runtime-patch EAGLE3 to validate on MI355X~~ [Klaud Cold] minimaxm3-fp8-mi355x-vllm-mtp: runtime-patch EAGLE3 to validate on MI355X Jun 13, 2026

functionstackx changed the title ~~[Klaud Cold] minimaxm3-fp8-mi355x-vllm-mtp: runtime-patch EAGLE3 to validate on MI355X~~ Add minimax M3 MXFP8 MI355X vLLM EAGLE3 (related PR for upstreaming patch https://github.com/vllm-project/vllm/pull/45546) Jun 13, 2026

Merge branch 'main' into feat/minimax-m3-mi355-eagle3

5d1ddae

functionstackx merged commit cf3ad37 into main Jun 13, 2026
4 of 6 checks passed

functionstackx deleted the feat/minimax-m3-mi355-eagle3 branch June 13, 2026 22:12

github-project-automation Bot moved this to Done in InferenceMAX Board Jun 13, 2026

functionstackx merged 6 commits into main from feat/minimax-m3-mi355-eagle3
