[NV] update minimaxm2.5-fp8-b200-vllm #1068

Merged
Oseltamivir merged 9 commits into main from minimaxm2.5-fp8-b200-vllm-v2
Apr 22, 2026
Conversation

@hshrivastava-droid
Collaborator

No description provided.

@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR there first before we can merge yours into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that all GitHub Action jobs fully pass after merging. Much of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

If additional help is needed, PR authors can reach out to core maintainers over Slack.

PORT=${PORT:-8888}

export VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl
export VLLM_FLOAT32_MATMUL_PRECISION=high
Contributor


🔴 The B300 benchmark script (benchmarks/single_node/minimaxm2.5_fp8_b300.sh) was not updated to match the env var change made to the B200 script in this PR. The B300 script explicitly documents that it "reuses the existing MiniMax-M2.5 FP8 B200 vLLM recipe as-is", so it should be updated in the same PR to replace VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl with VLLM_FLOAT32_MATMUL_PRECISION=high.

Extended reasoning...

What the bug is and how it manifests

This PR updates benchmarks/single_node/minimaxm2.5_fp8_b200.sh (line 27) to replace export VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl with export VLLM_FLOAT32_MATMUL_PRECISION=high. However, the companion B300 script (benchmarks/single_node/minimaxm2.5_fp8_b300.sh) was not updated and still exports VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl at line 31. After this PR merges, the two scripts will be out of sync with different environment configurations.

The specific code path that triggers it

The B300 script contains an explicit design comment at lines 3–5: "this script reuses the existing MiniMax-M2.5 FP8 B200 vLLM recipe as-is until B300-specific tuning is available." This is corroborated by the perf-changelog.yaml entry for minimaxm2.5-fp8-b300-vllm which also states it reuses the B200 recipe as-is. The deliberate design intent is for B300 to mirror B200 until independent tuning is done.

Why existing code doesn't prevent it

There is no automated mechanism to enforce parity between the B200 and B300 scripts. The only enforcement is the human convention expressed in the B300 comment, which was overlooked in this PR when only the B200 script was modified.

What the impact would be

After this PR merges, running benchmarks on B300 will use the old VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl setting, while B200 uses the new VLLM_FLOAT32_MATMUL_PRECISION=high setting. Since the env var was presumably changed on B200 for correctness or performance reasons (possibly mnnvl was incorrect or suboptimal on this hardware), B300 will be benchmarked under suboptimal or incorrect conditions — producing results that are not comparable to B200 and do not reflect the intended configuration.

How to fix it

In benchmarks/single_node/minimaxm2.5_fp8_b300.sh at line 31, replace:

export VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl

with:

export VLLM_FLOAT32_MATMUL_PRECISION=high
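As a sketch, the suggested replacement could be applied mechanically; the script path and the exact export line here are taken from this review comment, not independently verified against the repository:

```shell
#!/usr/bin/env bash
# Hypothetical one-line fix for the B300 script (path assumed from the
# review comment above). GNU sed's -i edits the file in place.
f="benchmarks/single_node/minimaxm2.5_fp8_b300.sh"
if [ -f "$f" ]; then
  # Swap the stale allreduce-backend export for the new matmul-precision one.
  sed -i 's/^export VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl$/export VLLM_FLOAT32_MATMUL_PRECISION=high/' "$f"
fi
```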

Step-by-step proof

  1. Before this PR, B200 script (line 27) had: export VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl
  2. Before this PR, B300 script (line 31) had: export VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl — matching B200 as intended
  3. This PR changes B200 line 27 to: export VLLM_FLOAT32_MATMUL_PRECISION=high
  4. This PR does NOT change the B300 script
  5. After this PR: B200 uses VLLM_FLOAT32_MATMUL_PRECISION=high, B300 uses VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl
  6. B300's own comment says it should mirror B200 — contradiction.
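Since the comment above notes there is no automated mechanism enforcing parity between the two scripts, a CI guard along these lines could catch this class of drift. This is a hypothetical sketch: the script paths are assumptions taken from this thread, and the check only compares exported `VLLM_*` settings:

```shell
#!/usr/bin/env bash
# Hypothetical CI guard: fail the build when the B300 benchmark script stops
# mirroring the B200 recipe's exported VLLM_* environment variables.
# Script paths are assumptions taken from this PR thread.

# List the exported VLLM_* settings of a script, sorted for a stable diff.
vllm_exports() { grep -E '^export VLLM_' "$1" | sort; }

# Return 0 when both scripts export identical VLLM_* settings.
check_parity() { diff <(vllm_exports "$1") <(vllm_exports "$2") >/dev/null; }

b200="benchmarks/single_node/minimaxm2.5_fp8_b200.sh"
b300="benchmarks/single_node/minimaxm2.5_fp8_b300.sh"

if [ -f "$b200" ] && [ -f "$b300" ]; then
  if check_parity "$b200" "$b300"; then
    echo "OK: B300 mirrors B200"
  else
    echo "ERROR: B300 script no longer mirrors the B200 recipe" >&2
    exit 1
  fi
fi
```

A guard like this would only be appropriate while the B300 script's "reuse B200 as-is" comment holds; once B300-specific tuning lands, the check should be removed.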

@hshrivastava-droid
Collaborator Author

@claude - add PR description

@hshrivastava-droid hshrivastava-droid changed the title [WIP][NV] update minimaxm2.5-fp8-b200-vllm [NV] update minimaxm2.5-fp8-b200-vllm Apr 20, 2026
@hshrivastava-droid hshrivastava-droid changed the title [NV] update minimaxm2.5-fp8-b200-vllm [WIP][NV] update minimaxm2.5-fp8-b200-vllm Apr 20, 2026
Collaborator

@jgangani jgangani left a comment


LGTM

@hshrivastava-droid hshrivastava-droid changed the title [WIP][NV] update minimaxm2.5-fp8-b200-vllm [NV] update minimaxm2.5-fp8-b200-vllm Apr 20, 2026
@hshrivastava-droid
Collaborator Author

@functionstackx - could you please review?

@functionstackx
Contributor

@functionstackx - could you please review?

Can you update the vLLM recipe with this change?
[image attachment]
Also, it seems like PR validation is failing:

https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24691388125/job/72214032593?pr=1068

@Ankur-singh
Collaborator

@functionstackx already updated the recipe vllm-project/recipes#334

@hshrivastava-droid
Collaborator Author

hshrivastava-droid commented Apr 21, 2026

@functionstackx - could you please review? conflicts resolved.

Collaborator

@Oseltamivir Oseltamivir left a comment


lgtm

@Oseltamivir Oseltamivir merged commit 41ed82d into main Apr 22, 2026
3 checks passed
@Oseltamivir Oseltamivir deleted the minimaxm2.5-fp8-b200-vllm-v2 branch April 22, 2026 00:00