[AMD] feat: MiniMax M3 day-zero benchmark for MI325X #1748
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you!
PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. Failures are often just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow
As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
Claude finished @cquil11's task in 2m 46s —— View job Review: PR #1748 — MiniMax M3 MI325X day-zero benchmark
LGTM — no blocking issues found. Validation summary:
Note (non-blocking): The |
Claude finished @cquil11's task in 1m 10s —— View job Review: PR #1748 — MiniMax M3 MI325X day-zero benchmark
LGTM — no blocking issues found. Validation summary:
Note (non-blocking): The |
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27483150384
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27483489611
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 5ec3e11. Configure here.
    --no-enable-prefix-caching \
    --tool-call-parser minimax_m3 \
    --reasoning-parser minimax_m3 \
    --enable-auto-tool-choice > "$SERVER_LOG" 2>&1 &
Missing FP8 KV cache flag
Medium Severity
The new MI325X vllm serve invocation omits --kv-cache-dtype fp8 even though the PR recipe alignment, changelog, and the existing minimaxm3_fp8_mi355x.sh baseline all specify FP8 KV cache. Without it, vLLM may use a non-FP8 KV default, skewing memory headroom and throughput versus the official MI325X MXFP8 recipe and other MiniMax M3 entries.
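One way to keep the FP8 KV cache decision explicit is to make it a single toggle, so the same flag set can be run with and without it. The sketch below is illustrative only: `serve_flags` and its optional argument are hypothetical names, the flags are the ones quoted in this thread, and the function only prints the flag list rather than launching `vllm serve`.

```shell
# Hypothetical helper: prints one serve flag per line.
# Pass a KV cache dtype (e.g. fp8) to append --kv-cache-dtype;
# pass nothing to leave the KV cache at vLLM's default dtype.
serve_flags() {
  kv_dtype="${1:-}"
  printf '%s\n' \
    "--block-size 128" \
    "--attention-backend TRITON_ATTN" \
    "--language-model-only" \
    "--no-enable-prefix-caching" \
    "--tool-call-parser minimax_m3" \
    "--reasoning-parser minimax_m3" \
    "--enable-auto-tool-choice"
  if [ -n "$kv_dtype" ]; then
    printf '%s\n' "--kv-cache-dtype $kv_dtype"
  fi
}

# Show the only difference between the two configurations.
serve_flags     > /tmp/flags_default.txt
serve_flags fp8 > /tmp/flags_fp8.txt
diff /tmp/flags_default.txt /tmp/flags_fp8.txt || true
```

Keeping the two variants one argument apart makes it easier to confirm that an A/B run changes only the KV cache dtype.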


Summary
- `minimaxm3-fp8-mi325x-vllm` for MiniMax M3 MXFP8 on MI325X
- `vllm/vllm-openai-rocm:minimax-m3` and the official MI325X MXFP8 recipe shape
- `/local-nvme/hf-hub-cache/` and runtime compiler caches to container-local `/tmp`
- `/dev/kfd` and `/dev/dri` explicitly for ROCm

Recipe Alignment
- `MiniMaxAI/MiniMax-M3-MXFP8`
- `vllm/vllm-openai-rocm:minimax-m3`
- `--block-size 128`
- `--attention-backend TRITON_ATTN`
- `--language-model-only`
- `--no-enable-prefix-caching`
- `--enforce-eager` workaround

Upstream reference: https://recipes.vllm.ai/MiniMaxAI/MiniMax-M3?hardware=mi325x&variant=mxfp8
Validation
Representative throughput smoke: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/27482912444
- `/local-nvme/hf-hub-cache`

Targeted DPA accuracy validation: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/27484953170
- 0.9575
- 0.9568

Failure diagnosis:
- `--kv-cache-dtype fp8` produced deterministic repetitive/cross-prompt generation corruption and 1-2% GSM8K. On the same node, image, weights, and layouts, removing only FP8 KV restored correct generation with and without CUDA graphs. The PR therefore leaves KV cache at vLLM's default dtype.

Additional validation:
- `git diff --check` pass
- `/enroot` resolves to local NVMe on every healthy compute node
- `XDG_CACHE_HOME` and `TRITON_CACHE_DIR` use per-job local paths, avoiding stale NFS compiler artifacts

Full PR sweep: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/27485135330
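The per-job cache layout mentioned under additional validation could look roughly like the following. This is a minimal sketch, not the PR's actual script: the `inferencex-cache` prefix, the `JOB_ID` variable, and the fallback from `SLURM_JOB_ID` to the shell PID are all assumptions made for illustration.

```shell
# Assumption: a Slurm-style job id is available; otherwise fall back to the shell PID.
JOB_ID="${SLURM_JOB_ID:-$$}"

# Create a fresh, job-private directory on node-local storage (hypothetical prefix).
JOB_CACHE_ROOT="$(mktemp -d "/tmp/inferencex-cache-${JOB_ID}-XXXXXX")"

# Point generic and Triton compiler caches at the job-private directory,
# so no artifacts are read from or written to a shared NFS location.
export XDG_CACHE_HOME="${JOB_CACHE_ROOT}/xdg"
export TRITON_CACHE_DIR="${JOB_CACHE_ROOT}/triton"
mkdir -p "$XDG_CACHE_HOME" "$TRITON_CACHE_DIR"

echo "compiler caches under: ${JOB_CACHE_ROOT}"
```

Because the directory is created per job, a stale artifact from an earlier image or run cannot be picked up, at the cost of recompiling kernels on each job.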
Changelog Integrity
`perf-changelog.yaml` is current main byte-for-byte, followed only by this PR's entry at the tail.
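An append-only property like this can be checked mechanically. The sketch below is one possible approach under stated assumptions: `check_append_only` is a hypothetical helper, not part of the repository, and it simply tests that the new file starts with the base file byte-for-byte and is strictly longer.

```shell
# Hypothetical helper: check_append_only BASE NEW
# Succeeds only if NEW begins with the exact bytes of BASE and has extra bytes after them.
check_append_only() {
  cao_base="$1"
  cao_new="$2"
  # Arithmetic expansion strips any padding that wc may emit.
  cao_base_size=$(( $(wc -c < "$cao_base") ))
  cao_new_size=$(( $(wc -c < "$cao_new") ))
  [ "$cao_new_size" -gt "$cao_base_size" ] || return 1
  # Compare the leading bytes of NEW against BASE.
  head -c "$cao_base_size" "$cao_new" | cmp -s - "$cao_base"
}

# Example usage (assumes a remote-tracking ref named origin/main exists):
#   git show origin/main:perf-changelog.yaml > /tmp/changelog_main.yaml
#   check_append_only /tmp/changelog_main.yaml perf-changelog.yaml && echo "append-only OK"
```

A prefix comparison catches both reordered entries and accidental whitespace edits in existing entries, which a line-count check would miss.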