
Minimax M2.5 MXFP4 benchmark for MI355x vLLM v0.19.1 (TP=1,2,4) #827

Open
functionstackx wants to merge 17 commits into main from claude/issue-826-20260301-0409

Conversation

@functionstackx
Contributor

@functionstackx functionstackx commented Mar 1, 2026

Add MiniMax M2.5 MXFP4 benchmark config for MI355x with vLLM v0.17.1, now that AMD's MXFP4 checkpoint is out: https://huggingface.co/amd/MiniMax-M2.5-MXFP4

  • Model: amd/MiniMax-M2.5-MXFP4
  • Image: vllm/vllm-openai-rocm:v0.17.1
  • TP=2 and TP=4 (matching MiniMax M2.5 FP8 pattern)
  • VLLM_ROCM_USE_AITER=1, with AITER MoE fallback for TP>=4
  • Seq lengths: 1k1k, 1k8k, 8k1k (conc 4-64)
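
A minimal sketch of the sweep these bullets describe, for readers who want to see how many benchmark points it implies. It is not the repository's actual config format. The concurrency steps (doubling from 4 to 64), the reading of "1k" as 1024 and "8k" as 8192 tokens, and the helper names `sweep` and `serve_command` are assumptions made for illustration; only the model path, the TP sizes, the sequence-length labels, and `VLLM_ROCM_USE_AITER=1` come from the description above.

```python
from itertools import product

MODEL = "amd/MiniMax-M2.5-MXFP4"
TP_SIZES = (2, 4)

# (input tokens, output tokens). Assumption: "1k" = 1024 and "8k" = 8192.
SEQ_LENS = {
    "1k1k": (1024, 1024),
    "1k8k": (1024, 8192),
    "8k1k": (8192, 1024),
}

# Assumption: concurrency doubles from 4 to 64; the PR only states the range.
CONCURRENCIES = (4, 8, 16, 32, 64)


def sweep():
    """Yield one dict per (TP size, sequence-length pair, concurrency) point."""
    for tp, (name, (isl, osl)), conc in product(
        TP_SIZES, SEQ_LENS.items(), CONCURRENCIES
    ):
        yield {
            "model": MODEL,
            "tp": tp,
            "seq": name,
            "isl": isl,
            "osl": osl,
            "conc": conc,
        }


def serve_command(tp):
    """Return (extra env vars, argv) for launching a vLLM server at a TP size.

    Hypothetical helper: the real benchmark script may pass more flags.
    """
    env = {"VLLM_ROCM_USE_AITER": "1"}
    argv = ["vllm", "serve", MODEL, "--tensor-parallel-size", str(tp)]
    return env, argv


if __name__ == "__main__":
    points = list(sweep())
    print(f"{len(points)} benchmark points")
    env, argv = serve_command(2)
    print(env, " ".join(argv))
```

Under these assumptions the matrix has 2 TP sizes x 3 sequence-length pairs x 5 concurrency levels, so dropping TP=4 halves the number of runs.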

Closes #826

Generated with Claude Code

@functionstackx functionstackx requested a review from a team March 1, 2026 04:14
@functionstackx functionstackx changed the title [AMD] Add MiniMax M2.1 MXFP4 benchmark for MI355x vLLM (TP=2,4) [WIP] Add MiniMax M2.1 MXFP4 benchmark for MI355x vLLM (TP=2,4) Mar 1, 2026
@functionstackx
Contributor Author

@claude can u comment out TP=4 and only do TP=2 for now since TP=4 is bugged vllm-project/vllm#35637

@Klaud-Cold
Collaborator

Klaud-Cold commented Mar 1, 2026

Claude finished @functionstackx's task in 53s


Commented out TP=4 for MiniMax M2.1 MXFP4

Due to vllm-project/vllm#35637, TP=4 has been disabled. Only TP=2 will run.

  • Read current benchmark files
  • Comment out TP=4 in .github/configs/amd-master.yaml (all 3 seq-len configs)
  • Update perf-changelog.yaml with PR link and description
  • Commit and push changes

Changes made:

@functionstackx
Contributor Author

@chunfangamd I am seeing good MXFP4 performance compared to FP8. Just waiting for AMD to release an MXFP4 version of MiniMax M2.5 now; currently AMD only has an MXFP4 version of MiniMax M2.1.

image

@functionstackx functionstackx marked this pull request as draft March 1, 2026 23:23
@chunfangamd chunfangamd marked this pull request as ready for review March 4, 2026 09:09
Collaborator

@chunfangamd chunfangamd left a comment


Looks good to me

@chunfangamd chunfangamd changed the title [WIP] Add MiniMax M2.1 MXFP4 benchmark for MI355x vLLM (TP=2,4) Add MiniMax M2.1 MXFP4 benchmark for MI355x vLLM (TP=2,4) Mar 4, 2026
@chunfangamd chunfangamd enabled auto-merge (squash) March 4, 2026 09:11
@functionstackx functionstackx changed the title Add MiniMax M2.1 MXFP4 benchmark for MI355x vLLM (TP=2,4) [Do Not Merge] [WIP till AMD releases MXFP4 of MiniMax M2.5] Add MiniMax M2.1 MXFP4 benchmark for MI355x vLLM (TP=2,4) Mar 4, 2026
@functionstackx functionstackx changed the title [Do Not Merge] [WIP till AMD releases MXFP4 of MiniMax M2.5] Add MiniMax M2.1 MXFP4 benchmark for MI355x vLLM (TP=2,4) Add MiniMax M2.5 MXFP4 benchmark for MI355x vLLM v0.17.1 (TP=2,4) Mar 20, 2026
@functionstackx functionstackx force-pushed the claude/issue-826-20260301-0409 branch 2 times, most recently from bd10495 to e849d65 Compare March 20, 2026 01:50
@functionstackx functionstackx force-pushed the claude/issue-826-20260301-0409 branch 2 times, most recently from 86cc700 to b82116b Compare March 20, 2026 01:57
@functionstackx functionstackx force-pushed the claude/issue-826-20260301-0409 branch from b82116b to 7dd6063 Compare March 20, 2026 01:59
@seungrokj
Collaborator

seungrokj commented Apr 20, 2026

@benenzhu vllm/vllm-openai-rocm:v0.19.1 is ready, can you submit a PR? cc @ajith-sirra-amd @functionstackx

@functionstackx
Contributor Author

Feel free to ping me for a quick review when u have a PR for minimax mxfp4 that is working, has passed PR validation, and has a vllm-project/recipes update (if needed)

@benenzhu
Collaborator

benenzhu commented Apr 20, 2026

https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24647578553/job/72063351980?pr=827
image

@seungrokj Hi, I retriggered the CI. It seems the mia1-p01-g12 machine's Hugging Face directory is broken now, so the model can't be loaded. Only this machine fails.

@functionstackx
Contributor Author

@chunfangamd can u take a look at debugging this? Probably previous runs have corrupted the huggingface model cache from @benenzhu testing. The fix is just rm this minimax ckpt folder on the cluster
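
A hedged sketch of the cleanup suggested here, assuming the runner uses the standard Hugging Face hub cache layout (`models--<org>--<name>` under the hub cache directory). The cluster's real cache location is not given in this thread, and the helper names below are hypothetical. The function defaults to a dry run so nothing is deleted by accident.

```python
import os
import shutil
from pathlib import Path


def hub_cache_dir():
    """Resolve the hub cache: HF_HUB_CACHE, then HF_HOME/hub, then the default."""
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"])
    hf_home = os.environ.get("HF_HOME")
    if hf_home:
        return Path(hf_home) / "hub"
    return Path.home() / ".cache" / "huggingface" / "hub"


def model_cache_path(repo_id, cache_dir=None):
    """Map a repo id such as 'org/name' to its folder inside the hub cache."""
    base = Path(cache_dir) if cache_dir else hub_cache_dir()
    return base / ("models--" + repo_id.replace("/", "--"))


def purge_model_cache(repo_id, cache_dir=None, dry_run=True):
    """Delete the cached checkpoint folder; with dry_run=True only report it."""
    path = model_cache_path(repo_id, cache_dir)
    if path.is_dir() and not dry_run:
        shutil.rmtree(path)
    return path


if __name__ == "__main__":
    # Dry run: prints the folder that would be removed on this machine.
    print(purge_model_cache("amd/MiniMax-M2.5-MXFP4"))
```

Calling `purge_model_cache("amd/MiniMax-M2.5-MXFP4", dry_run=False)` would remove the folder, so the next run downloads the checkpoint again.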

@benenzhu
Collaborator

> @chunfangamd can u take a look at debugging this? Probably previous runs have corrupted the huggingface model cache from @benenzhu testing. The fix is just rm this minimax ckpt folder on the cluster

image

@functionstackx Hi I think that's some gh runner problem in this machine, the docker command also fails.

@functionstackx
Contributor Author

> > @chunfangamd can u take a look at debugging this? Probably previous runs have corrupted the huggingface model cache from @benenzhu testing. The fix is just rm this minimax ckpt folder on the cluster
>
> image
>
> @functionstackx Hi I think that's some gh runner problem in this machine, the docker command also fails.

Both @chunfangamd and/or @cquil11 have cluster access to fix it. @chunfangamd can u help?

@functionstackx functionstackx changed the title [AMD] [DNM, still merge in 0.18 as trust_remote_code=True is not passed to quark] Add MiniMax M2.5 MXFP4 benchmark for MI355x vLLM v0.17.1 (TP=2,4) Minimax M2.5 MXFP4 benchmark for MI355x vLLM v0.19.1 (TP=2,4) Apr 20, 2026
@benenzhu benenzhu changed the title Minimax M2.5 MXFP4 benchmark for MI355x vLLM v0.19.1 (TP=2,4) Minimax M2.5 MXFP4 benchmark for MI355x vLLM v0.19.1 (TP=1,2,4) Apr 20, 2026
@benenzhu
Collaborator

benenzhu commented Apr 20, 2026

@functionstackx Hi, I have disabled the bad machine and the actions have passed. Could you help review it?

  1. Same script as minimax fp8 with no additional flags, so no update to the vLLM recipes is needed.
  2. Accuracy result:
image

https://inferencex.semianalysis.com/inference?unofficialRun=24648550630&g_rundate=2026-04-17&g_runid=24588340987&g_model=MiniMax-M2.5

@functionstackx
Contributor Author

@cquil11 or @Oseltamivir can u review this pr

@functionstackx
Contributor Author

I am the original author of this pull request so GitHub doesn't let me approve my own pr

@functionstackx
Contributor Author

@benenzhu one issue I see before I go to sleep is that vllm-project/recipes doesn't have a recipe for this minimax M2.5 mxfp4. We wanna ensure that the entire ml community benefits from ur hard work. Can u please create a recipe for it in that documentation

@benenzhu
Collaborator

> @benenzhu one issue I see before I go to sleep is that vllm-project/recipes doesn't have a recipe for this minimax M2.5 mxfp4. We wanna ensure that the entire ml community benefits from ur hard work. Can u please create a recipe for it in that documentation

Yeah, thanks for the review. I will raise one PR for this.

FP4 uses the model path amd/MiniMax-M2.5-MXFP4, so we should let ppl know our model's path.

Have a good night.

@benenzhu
Collaborator

vllm-project/recipes#300
image

@functionstackx functionstackx disabled auto-merge April 20, 2026 10:22
@functionstackx
Contributor Author

Thanks @benenzhu ! Can u please merge this Kimi K2.5 mxfp4 recipe? It's been stuck unmerged for 3 weeks.

vllm-project/recipes#296



Development

Successfully merging this pull request may close these issues.

mi355 fp4 minimax vllm single node

7 participants