[DO NOT MERGE] [Klaud Cold] experimental: MiniMax-M3 MI325X conc 4/8 — apply vllm#45639 (AITER AR + Gemma-RMS fusion) by functionstackx · Pull Request #1772 · SemiAnalysisAI/InferenceX

functionstackx · 2026-06-15T04:45:45Z

[DO NOT MERGE] — experimental hardware validation of an upstream WIP vLLM PR. Not for merge.

What

MI325X (gfx942) counterpart of #1770. Benchmarks MiniMax-M3 MXFP8 on MI325X at conc 4 and 8 (TP8), with vllm-project/vllm#45639 (AITER fused all-reduce + Gemma-RMSNorm) applied in place to the shipped vllm/vllm-openai-rocm:minimax-m3 image.

How

- benchmarks/single_node/fixed_seq_len/minimaxm3arf_fp8_mi325x.sh:
  - applies the vendored #45639 diff with patch -p1 (installing patch via apt if it is missing). The step is idempotent, so it proceeds if the diff is already applied. It hard-fails (exit 1) if the diff neither applies cleanly nor is already applied, which means the image has drifted from m3_release.
  - serves with the fusion enabled: VLLM_ROCM_USE_AITER=1, --compilation-config '{"custom_ops": ["-minimax_gemma_rms_norm"], "pass_config": {"fuse_allreduce_rms": true}}'. The KV cache stays BF16 because gfx942 has no calibrated FP8 attention scales.
  - otherwise mirrors minimaxm3_fp8_mi325x.sh, and carries a PROFILE=1 --profiler-config gate so the companion profiling PR can reuse the recipe.
- amd-master.yaml minimaxm3arf-fp8-mi325x-vllm: a distinct model-prefix (minimaxm3arf) routes to the new recipe; conc 4 and 8, TP8. The production recipe and config are untouched.
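To make the serving setup concrete, here is a minimal sketch of how the flags above might be assembled, including the PROFILE=1 gate. It is not the recipe's code. The helper name build_serve_cmd, the MODEL default and the PROFILER_CONFIG variable are hypothetical, because the PR does not show the model path or the profiler settings. Only the environment variable, the --compilation-config JSON and TP8 come from the description; --tensor-parallel-size is the standard vLLM flag for TP.

```bash
#!/usr/bin/env bash
# Sketch only: build_serve_cmd, MODEL's default and PROFILER_CONFIG are
# hypothetical; the real recipe may pass additional flags.
build_serve_cmd() {
    # Hypothetical placeholder; the recipe's actual model path is not shown.
    local model=${MODEL:-MiniMax-M3}
    # Disable the custom op so the fusion pass can match the norm, then enable the pass.
    local compilation_config='{"custom_ops": ["-minimax_gemma_rms_norm"], "pass_config": {"fuse_allreduce_rms": true}}'
    set -- env VLLM_ROCM_USE_AITER=1 \
        vllm serve "$model" \
        --tensor-parallel-size 8 \
        --compilation-config "$compilation_config"
    # PROFILE=1 gate: only the profiling run adds --profiler-config.
    if [ "${PROFILE:-0}" = "1" ]; then
        set -- "$@" --profiler-config "${PROFILER_CONFIG:?PROFILER_CONFIG must be set when PROFILE=1}"
    fi
    # One argument per line so callers can rebuild an array without re-quoting.
    printf '%s\n' "$@"
}
```

A caller could run it with `mapfile -t cmd < <(build_serve_cmd) && "${cmd[@]}"`. Printing one argument per line keeps the JSON argument intact without any eval.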

Validation

bash -n ✓; YAML parses ✓; test-config → 2 jobs (TP8, conc 4 + 8, mi325x, minimaxm3arf_1k1k) ✓.

🤖 Generated with Claude Code

Note

Low Risk
Benchmark-only experimental config and recipe; no changes to production MiniMax-M3 paths. Runtime in-container patching is isolated to marked experimental jobs.

Overview
Adds an experimental, do-not-merge MI325X smoke path to validate vllm#45639 (AITER fused all-reduce + Gemma-RMSNorm for MiniMax-M3) on real gfx942 hardware before the upstream change ships in a rebuilt image.

A new minimaxm3arf-fp8-mi325x-vllm entry in amd-master.yaml uses model-prefix minimaxm3arf so jobs run minimaxm3arf_fp8_mi325x.sh instead of the production MI325X MiniMax-M3 recipe. The sweep is narrow: TP8 at conc 4/8 for 1k1k perf, plus an 8k1k conc-16 row so lm-eval can run under the existing eval policy.

The recipe vendors the #45639 diff and applies it to the installed vLLM inside the minimax-m3 container (idempotent apply; hard-fail if the image no longer matches the patch base). Serving then turns on VLLM_ROCM_USE_AITER=1 and fuse_allreduce_rms, and disables the minimax_gemma_rms_norm custom op so the fusion pass can match the norm in the IR. DEBUG logging plus a post-startup “fusion-pass verdict” block (a grep of the server log) records whether the pass registered and how many patterns it replaced.

perf-changelog.yaml documents the new config key. Production minimaxm3* configs are unchanged.

Reviewed by Cursor Bugbot for commit eb422fe. Bugbot is set up for automated code reviews on this repo.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

… applied

MI325X (gfx942) counterpart of #1770. Validate vllm-project/vllm#45639 ("[ROCm][M3] Enable AITER AR + Gemma-RMS fusion for MiniMax-M3") on MI325X before an image rebuild, by applying the PR diff in-place to the shipped minimax-m3 image.

- patches/vllm-45639-aiter-ar-gemma-rms.diff: vendored PR diff.
- minimaxm3arf_fp8_mi325x.sh: applies the diff (idempotent; HARD-FAILS if it neither applies nor is already applied), serves with VLLM_ROCM_USE_AITER=1 + --compilation-config (custom_ops -minimax_gemma_rms_norm, pass_config.fuse_allreduce_rms). BF16 KV (gfx942). Includes a PROFILE=1 --profiler-config gate so the same recipe serves the companion profiling PR.
- amd-master.yaml minimaxm3arf-fp8-mi325x-vllm: model-prefix minimaxm3arf routes to the new recipe; conc 4 and 8, TP8.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>


cursor · 2026-06-15T04:50:27Z

+    echo "[vllm#45639] already applied to $VLLM_SP/vllm"
+elif ( cd "$VLLM_SP" && patch -p1 --dry-run < "$PATCH_FILE" >/dev/null 2>&1 ); then
+    ( cd "$VLLM_SP" && patch -p1 < "$PATCH_FILE" )
+    echo "[vllm#45639] applied to $VLLM_SP/vllm"


Patch apply errors ignored

High Severity

After a successful patch --dry-run, the script runs patch -p1 without checking its exit status and always prints that vllm#45639 was applied. A failed apply still starts vllm serve, so the job can finish while benchmarking an image that never received the fusion patch.
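The finding can be addressed by checking the exit status of the real apply as well as the dry-run. Below is a minimal sketch of an idempotent apply that does this. It assumes that "already applied" is detected with a reverse dry-run (patch -R --dry-run). The snippet above does not show how the recipe actually decides that, and apply_patch_once is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: apply a vendored -p1 diff exactly once, and fail loudly otherwise.
# Assumption: "already applied" is detected with a reverse dry-run; the real
# recipe's check may differ. apply_patch_once is a hypothetical helper name.
apply_patch_once() {
    local target_dir=$1 patch_file=$2
    # Make the patch path absolute, since each patch call runs after a cd.
    case $patch_file in
        /*) ;;
        *) patch_file=$PWD/$patch_file ;;
    esac
    # -f keeps patch from prompting (CI has no tty) and from auto-reversing.
    if ( cd "$target_dir" && patch -p1 -R -f --dry-run < "$patch_file" ) >/dev/null 2>&1; then
        echo "already applied to $target_dir"
        return 0
    fi
    if ! ( cd "$target_dir" && patch -p1 -f --dry-run < "$patch_file" ) >/dev/null 2>&1; then
        echo "ERROR: diff neither applies nor is already applied (base drifted?)" >&2
        return 1
    fi
    # Check the real apply's exit status too, instead of assuming that a clean
    # dry-run guarantees success.
    if ! ( cd "$target_dir" && patch -p1 -f < "$patch_file" ); then
        echo "ERROR: patch failed after a clean dry-run" >&2
        return 1
    fi
    echo "applied to $target_dir"
}
```

In the recipe this would be called as `apply_patch_once "$VLLM_SP" "$PATCH_FILE" || exit 1`. A failed apply then stops the job before vllm serve starts.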

Reviewed by Cursor Bugbot for commit 65be443.

github-actions · 2026-06-15T05:12:00Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27524660670
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27524660670

…ity w/ #1770)

Mirror the #1770 updates onto the MI325X #45639 sweep:

- VLLM_LOGGING_LEVEL=DEBUG + a post-server-ready grep of the server log into the job log, printing the AITER AR+RMS fusion verdict (registration bail warnings => 0 patterns; "Replaced N patterns" / "fusion pass matches" => match count).
- 8k1k conc-16 TP8 row so mark_eval_entries marks an lm-eval (validate #45639 fused-kernel correctness on gfx942); perf points stay at conc 4/8.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
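The "Replaced N patterns" half of that verdict can be sketched as a small log summariser. This assumes the pass logs the phrase "Replaced <N> patterns" at DEBUG level, as the commit message states. Other compilation passes may log the same phrase, so the total is an upper bound for this fusion alone. The registration-bail warning text is not quoted in the PR, so the sketch does not try to match it. fusion_verdict is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: summarise the fusion-pass verdict from a vLLM server log.
# Assumptions: the pass logs "Replaced <N> patterns" at DEBUG level; other
# passes may log the same phrase, so the total is an upper bound.
# fusion_verdict is a hypothetical helper name.
fusion_verdict() {
    local log_file=$1 total
    # Pull N out of every "Replaced N patterns" match and sum them (0 if none).
    total=$(grep -oE 'Replaced [0-9]+ patterns' "$log_file" \
        | grep -oE '[0-9]+' \
        | awk '{ s += $1 } END { print s + 0 }')
    if [ "$total" -gt 0 ]; then
        echo "fusion-pass verdict: ${total} pattern(s) replaced"
    else
        echo "fusion-pass verdict: 0 patterns replaced (pass not registered or no match)"
    fi
}
```

Run after the server is ready, `fusion_verdict "$SERVER_LOG"` prints one line into the job log. SERVER_LOG is a placeholder for wherever the recipe writes the server output.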

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit eb422fe.

cursor · 2026-06-15T05:24:36Z

+    - isl: 8192
+      osl: 1024
+      search-space:
+      - { tp: 8, conc-list: [ 16 ] }


conc-list breaks full-sweep

Medium Severity

The new single-node fixed-seq-len rows use only conc-list, but generate_full_sweep reads conc-start and conc-end for single-node entries and will raise KeyError when it hits this config during an unfiltered amd-master full sweep.

Reviewed by Cursor Bugbot for commit eb422fe.

github-actions · 2026-06-15T05:48:28Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27525819313
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27525819313

functionstackx requested a review from a team June 15, 2026 04:45

functionstackx requested review from 1am9trash, billishyahao, chunfangamd, seungrokj and yctseng0211 as code owners June 15, 2026 04:45

github-project-automation Bot added this to InferenceMAX Board Jun 15, 2026

functionstackx added the full-sweep-enabled label Jun 15, 2026

functionstackx added a commit that referenced this pull request Jun 15, 2026

perf-changelog: fill in PR link for minimaxm3arf-fp8-mi325x-vllm (#1772)

aee12aa

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

functionstackx and others added 2 commits June 15, 2026 00:47

perf-changelog: fill in PR link for minimaxm3arf-fp8-mi325x-vllm (#1772)

65be443

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

functionstackx force-pushed the experimental/minimaxm3-mi325-arfusion-45639-dnm branch from aee12aa to 65be443 Compare June 15, 2026 04:47

functionstackx mentioned this pull request Jun 15, 2026

[DO NOT MERGE] [Klaud Cold] experimental: profile MiniMax-M3 MI325X conc 4/8 — vllm#45639 (AITER AR + Gemma-RMS fusion) #1773

Open

cursor Bot reviewed Jun 15, 2026

functionstackx wants to merge 3 commits into main from experimental/minimaxm3-mi325-arfusion-45639-dnm
