
Add B300 config: kimi-k2.5-fp4-vllm#1100

Merged
functionstackx merged 2 commits into main from reopen-1056-kimik2.5-fp4-b300
Apr 20, 2026

Conversation

Collaborator

@cquil11 cquil11 commented Apr 20, 2026

Warning

This PR is a reopen of #1056, which was reverted in #1099 due to an error with the first PR. The contents and description are otherwise identical to the original.

Summary

  • Add kimik2.5-fp4-b300-vllm benchmark config and the corresponding benchmarks/single_node/kimik2.5_fp4_b300.sh launch script
  • At the time of submission, the vLLM Kimi-K2.5 recipes page does not have a B300-specific recipe, so this reuses the existing Kimi-K2.5 FP4 (NVFP4) B200 vLLM recipe as-is until B300-specific tuning is available
  • Image: vllm/vllm-openai:v0.17.0 (same as B200), runner: b300, same TP/EP/concurrency search-space as B200
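A hedged sketch of what the new config block might look like; the key names here are assumptions inferred from the summary above (the actual schema follows the existing kimik2.5-fp4-b200-vllm entry in .github/configs/nvidia-master.yaml):

```yaml
# Hypothetical sketch only; real field names and the TP/EP/concurrency
# search-space mirror the existing kimik2.5-fp4-b200-vllm block.
kimik2.5-fp4-b300-vllm:
  image: vllm/vllm-openai:v0.17.0                      # same image as B200
  runner: b300                                         # only substantive change from B200
  script: benchmarks/single_node/kimik2.5_fp4_b300.sh  # new launch script
```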

Test plan

  • CI config validation passes
  • Run kimik2.5-fp4-b300-vllm single-node benchmark on a B300 node and confirm server starts, benchmark completes, and result file is produced

🤖 Generated with Claude Code

@cquil11 cquil11 requested a review from a team April 20, 2026 13:47
@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR to those docs first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

If additional help is needed, PR authors can reach out to core maintainers over Slack.

Contributor

@claude claude bot left a comment


LGTM — straightforward B300 config addition following established patterns; the stale pr-link in the changelog entry is a minor nit already flagged inline.

Extended reasoning...

Overview

This PR adds a new benchmark configuration kimik2.5-fp4-b300-vllm to three files: .github/configs/nvidia-master.yaml (new config block), benchmarks/single_node/kimik2.5_fp4_b300.sh (new launch script), and perf-changelog.yaml (new changelog entry). It is a direct re-addition of the change from PR #1056, which was reverted by PR #1099.

Security risks

None. These are benchmark configuration and shell script changes with no auth, crypto, or permission-sensitive code paths. The shell script follows the same pattern as all other single-node benchmark scripts in the repo.

Level of scrutiny

Low. The config block mirrors the existing kimik2.5-fp4-b200-vllm entry with only the runner changed to b300 and the image bumped to v0.19.0-cu130. The launch script is a copy of the B200 script with the same vLLM serve invocation. Comparable B300 entries for MiniMax-M2.5 FP4/FP8 (PRs #1054, #1055) used the identical approach.

Other factors

The only bug found is a [Nit]: the perf-changelog.yaml entry carries pr-link: .../pull/1056 (the reverted PR) instead of pull/1100. This is a cosmetic tracking issue that doesn't affect CI, benchmarks, or functionality. The inline comment already captures this.

Comment thread on perf-changelog.yaml:
- "Add Kimi-K2.5 FP4 (NVFP4) B300 vLLM benchmark"
- "Image: vllm/vllm-openai:v0.19.0-cu130"
- "At the time of submission, https://docs.vllm.ai/projects/recipes/en/latest/moonshotai/Kimi-K2.5.html does not have a B300-specific recipe, so this reuses the existing Kimi-K2.5 FP4 B200 vLLM recipe as-is"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1056
Contributor


🟡 The perf-changelog.yaml entry for kimik2.5-fp4-b300-vllm has pr-link pointing to PR #1056, which was explicitly reverted by PR #1099. Since this PR (#1100) is the one that will actually land the change, the pr-link should reference https://github.com/SemiAnalysisAI/InferenceX/pull/1100 instead.

Extended reasoning...

What the bug is and how it manifests

The new perf-changelog.yaml entry for kimik2.5-fp4-b300-vllm (added in this PR) sets pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1056. PR #1056 was the original submission of this B300 config, but it was subsequently reverted by PR #1099 due to an error. The PR description for #1100 explicitly states: "This PR is a reopen of #1056, which was reverted in #1099 due to an error with the first PR." This means PR #1056 exists on GitHub in a reverted/superseded state, while PR #1100 is the change that will actually merge the configuration into main.

The specific code path that triggers it

In the diff, the new changelog block reads:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1056

This was carried over from the original PR #1056 submission without being updated to reflect the current PR number (#1100).

Why existing code does not prevent it

The changelog is a manually maintained YAML file with no automated validation that cross-checks pr-link values against the actual PR being merged. There is no CI check to enforce that the link matches the current PR. The error is a straightforward copy-paste oversight when reopening the PR.
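Such a cross-check would be easy to automate; a hypothetical CI helper (check_pr_links is not an existing repo utility) might look like:

```python
import re

def check_pr_links(changelog_text: str, current_pr: int) -> list[str]:
    """Return pr-link values that do not point at the PR being merged.

    Hypothetical sketch of a validation step the repo currently lacks;
    it scans raw changelog text for pr-link fields and flags any link
    whose trailing path segment is not /pull/<current_pr>.
    """
    links = re.findall(r"pr-link:\s*(\S+)", changelog_text)
    return [link for link in links
            if not link.rstrip("/").endswith(f"/pull/{current_pr}")]

entry = "pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1056"
# Flags the stale link when validated against PR 1100.
print(check_pr_links(entry, 1100))
```

In CI this could run against only the lines added in the diff, so pre-existing entries pointing at their own (older) PRs are not flagged.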

What the impact would be

When anyone later looks up the changelog entry for kimik2.5-fp4-b300-vllm to understand when and how it was introduced, they will follow the pr-link and land on PR #1056, which GitHub shows as reverted. This is misleading — it appears the feature was reverted rather than merged. The actual merging PR (#1100) would not be linked anywhere in the changelog, making historical tracking inaccurate.

How to fix it

Change the pr-link in the new changelog entry from pull/1056 to pull/1100.
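As a changelog fragment, the one-line fix (other fields of the entry unchanged):

```yaml
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1100
```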

Step-by-step proof

  1. PR #1056 ("Add B300 config: kimi-k2.5-fp4-vllm") was opened and merged, adding the kimik2.5-fp4-b300-vllm config.
  2. PR #1099 (Revert "Add B300 config: kimi-k2.5-fp4-vllm (#1056)" [skip-sweep]) explicitly reverted PR #1056 (commit: "Revert 'Add B300 config: kimi-k2.5-fp4-vllm (#1056)'").
  3. PR #1100 (this PR) reopens the same change; its description confirms: "This PR is a reopen of #1056, which was reverted in #1099."
  4. The new changelog entry in this PR's diff sets pr-link to /pull/1056 (the reverted PR) instead of /pull/1100 (this PR).
  5. Comparable B300 entries in the same file correctly reference their own PR numbers: minimaxm2.5-fp8-b300-vllm -> #1054, minimaxm2.5-fp4-b300-vllm -> #1055, dsr1-fp8-b300-sglang-mtp -> #1059.
  6. The pr-link should therefore be updated to https://github.com/SemiAnalysisAI/InferenceX/pull/1100.

@xinli-sw

Hi @cquil11 , I think we could do two things to get better perf

  1. B300 could benefit from more parallel configs; we can run some more and report back
  2. We should also include changes from https://github.com/SemiAnalysisAI/InferenceX/pull/1047/changes

Depending on how fast you want the data, we can either have #1047 as a fast follow of this PR or just wait until that's ready. Thanks!

@cquil11
Collaborator Author

cquil11 commented Apr 20, 2026

@xinli-sw No real rush, we can wait for y'all.

@functionstackx
Contributor

hi @xinli-sw it would be great if https://recipes.vllm.ai/ had B300-specific configs so that ppl can know what flags to use specifically for B300

@xinli-sw

yup for sure, I think it's mostly just having better parallel configs to make use of B300's higher mem size, will update shortly

- Add kimik2.5-fp4-b300-vllm benchmark config and the corresponding benchmarks/single_node/kimik2.5_fp4_b300.sh launch script
- At the time of submission, the vLLM Kimi-K2.5 recipes page does not have a B300-specific recipe, so this reuses the existing Kimi-K2.5 FP4 (NVFP4) B200 vLLM recipe as-is until B300-specific tuning is available
- Image: vllm/vllm-openai:v0.17.0 (same as B200), runner: b300, same TP/EP/concurrency search-space as B200
@functionstackx functionstackx force-pushed the reopen-1056-kimik2.5-fp4-b300 branch from fc0ae0d to 3f24821 on April 20, 2026 at 22:33
@functionstackx
Contributor

@claude look at the fix in #1104 for b200, add it to this b300 PR too

Some attention backends require the KV-cache block count to be a multiple
of 128 tokens. With block_size=32 that means block_num must be divisible
by 4. The previous ISL + OSL + 200 padding did not guarantee this
alignment (e.g. ISL=8192, OSL=1024 rounds to 295 blocks, not divisible
by 4), triggering:

  RuntimeError: Expected block_num % (128 / block_size) == 0,
  got block_num=295 and block_size=32

Bumping the headroom to +256 (any multiple of 128) keeps the context
length aligned. Mirrors the fix from #1104.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Klaud-Cold
Collaborator

Klaud-Cold commented Apr 20, 2026

Claude finished @functionstackx's task in 2m 14s


Applying B200 fix from PR #1104 to this B300 PR

Result

The fix from PR #1104 (bump max-model-len headroom from +200 to +256 for block-size alignment) is already included in this branch via commit 69a2584:

| File | Status |
| --- | --- |
| benchmarks/benchmark_lib.sh (setup_eval_context) | +256 |
| benchmarks/single_node/gptoss_fp4_b200.sh | +256 |
| benchmarks/single_node/gptoss_fp4_h200.sh | +256 |
| utils/matrix_logic/generate_sweep_configs.py (4 sites) | +256 |
| utils/matrix_logic/test_generate_sweep_configs.py | +256 |

The B300 kimi script (kimik2.5_fp4_b300.sh) itself doesn't need a direct change because it receives MAX_MODEL_LEN from the environment variable set by generate_sweep_configs.py, which already uses the +256 headroom. No additional changes needed.

@functionstackx functionstackx merged commit f34a83b into main Apr 20, 2026
25 checks passed
@functionstackx functionstackx deleted the reopen-1056-kimik2.5-fp4-b300 branch April 20, 2026 23:22