[WIP][NV] update minimaxm2.5 fp4 b200 vllm flag #1069
Open

hshrivastava-droid wants to merge 5 commits into main from minimaxm2.5_fp4_b200-v2
+17
−10
Commits (5)
- 7b40de9: update vllm flag (hshrivastava-droid)
- 3a29b80: update PR number (hshrivastava-droid)
- 3c10c2b: update conc (hshrivastava-droid)
- 5195d6e: conc update (hshrivastava-droid)
- ce080a0: Merge branch 'main' into minimaxm2.5_fp4_b200-v2 (hshrivastava-droid)
🔴 This PR adds `export VLLM_FLOAT32_MATMUL_PRECISION=high` to `minimaxm2.5_fp4_b200.sh` but omits the same change from its B300 counterpart (`minimaxm2.5_fp4_b300.sh`), which explicitly states it reuses the B200 recipe as-is. Please add the same export to `benchmarks/single_node/minimaxm2.5_fp4_b300.sh` after the PORT assignment, before the DP_ATTENTION conditional block.

Extended reasoning
What the bug is and how it manifests

The PR adds `export VLLM_FLOAT32_MATMUL_PRECISION=high` to `benchmarks/single_node/minimaxm2.5_fp4_b200.sh` (line 27) as a performance optimization for NVFP4 matmul operations on B200 hardware. However, the structurally identical B300 script (`benchmarks/single_node/minimaxm2.5_fp4_b300.sh`) does not receive the same update. After this PR merges, the two scripts diverge in a way that contradicts the B300 script's own documented intent.

The specific code path that triggers it
The B300 script (lines 3-5) includes an explicit comment: "At the time of submission ... this script reuses the existing MiniMax-M2.5 FP4 B200 vLLM recipe as-is until B300-specific tuning is available." Both scripts use the same Docker image (`vllm/vllm-openai:v0.19.0-cu130`), the same NVFP4 model, and identical `vllm serve` arguments. Before this PR they were functionally equivalent; after this PR, B200 benchmarks run with `VLLM_FLOAT32_MATMUL_PRECISION=high` set and B300 benchmarks do not.

Why existing code does not prevent it

There is no shared configuration or template mechanism that would automatically propagate the new env var to the B300 script. Each script is a standalone shell file, so the author must manually mirror changes. The B300 comment documents the intent to stay in sync with B200, but there is no enforcement.
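Since nothing enforces the sync, one low-cost guard would be a grep-based consistency check that fails when an export added to the B200 script is missing from the B300 one. The sketch below is illustrative only: it runs against two mock stand-in files created on the fly, because the real scripts under `benchmarks/single_node/` are not reproduced on this page.

```shell
# Illustrative divergence check on mock stand-ins for the two scripts.
tmpdir=$(mktemp -d)
b200="$tmpdir/minimaxm2.5_fp4_b200.sh"
b300="$tmpdir/minimaxm2.5_fp4_b300.sh"

# Mock contents: only the B200 stand-in carries the new export.
printf 'PORT=${PORT:-8888}\nexport VLLM_FLOAT32_MATMUL_PRECISION=high\n' > "$b200"
printf 'PORT=${PORT:-8888}\n' > "$b300"

has_flag() {
    # True if the given script exports the matmul-precision variable.
    grep -q '^export VLLM_FLOAT32_MATMUL_PRECISION=' "$1"
}

if has_flag "$b200" && ! has_flag "$b300"; then
    result="DIVERGED"
else
    result="IN_SYNC"
fi
echo "$result"

rm -rf "$tmpdir"
```

On the mock files above, the check reports the same B200-only divergence this review describes; wired into CI against the real script paths, it would have flagged this PR.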
What the impact would be

B200 and B300 benchmarks for the same MiniMax-M2.5 NVFP4 model will run under different PyTorch matmul precision settings, making the results incomparable. If `VLLM_FLOAT32_MATMUL_PRECISION=high` improves throughput or latency on B200 (the motivation for this change), the B300 numbers will be artificially lower than they should be, understating B300 performance relative to B200.

How to fix it
Add `export VLLM_FLOAT32_MATMUL_PRECISION=high` to `benchmarks/single_node/minimaxm2.5_fp4_b300.sh` immediately after line 30 (`PORT=${PORT:-8888}`), mirroring its placement in the B200 script. The `perf-changelog.yaml` entry should also reference the `minimaxm2.5-fp4-b300-vllm` config-key alongside `minimaxm2.5-fp4-b200-vllm`.

Step-by-step proof
1. The B300 script sets `SERVER_LOG` and `PORT`, then jumps directly to the DP_ATTENTION conditional block with no env var in between.
2. The PR adds `export VLLM_FLOAT32_MATMUL_PRECISION=high` between the PORT line and the DP_ATTENTION check in the B200 script only.
3. `grep VLLM_FLOAT32_MATMUL_PRECISION benchmarks/single_node/minimaxm2.5_fp4_b200.sh` returns a match; the same grep on the B300 script returns nothing.
4. A benchmark run of `minimaxm2.5_fp4_b300.sh` will therefore start the vLLM server without the high-precision matmul flag, while an equivalent B200 run benefits from it, which is inconsistent with the B300 script's stated goal of reusing the B200 recipe.
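The suggested fix can be sketched as a script excerpt. This is a hypothetical reconstruction: only the `PORT` assignment and the new export come from this review; the DP_ATTENTION conditional is a placeholder for the B300 script's actual block, which is not reproduced here.

```shell
# Hypothetical excerpt of benchmarks/single_node/minimaxm2.5_fp4_b300.sh
# after the fix. The conditional below is a stand-in for the script's real
# DP_ATTENTION block, which this fix leaves unchanged.
PORT=${PORT:-8888}

# Mirror the B200 recipe: run float32 matmuls at high precision.
export VLLM_FLOAT32_MATMUL_PRECISION=high

if [ "${DP_ATTENTION:-0}" = "1" ]; then
    : # DP-attention-specific vllm serve flags would go here
fi
```

Placing the export before the conditional keeps it unconditional, so every launch path of the script inherits the same precision setting as the B200 recipe.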