Add B300 config: kimi-k2.5-int4-vllm by cquil11 · Pull Request #1071 · SemiAnalysisAI/InferenceX

cquil11 · 2026-04-17T21:59:52Z

Warning

This PR is a reopen of #1057, which was reverted in #1070 due to an error with the first PR. The contents and description are otherwise identical to the original.

Summary

Add kimik2.5-int4-b300-vllm benchmark config and the corresponding benchmarks/single_node/kimik2.5_int4_b300.sh launch script
At the time of submission, the vLLM Kimi-K2.5 recipes page does not have a B300-specific recipe, so this reuses the existing Kimi-K2.5 INT4 B200 vLLM recipe as-is until B300-specific tuning is available
Image: vllm/vllm-openai:v0.15.1 (same as B200), runner: b300, same TP=8 and concurrency 4-64 search-space as B200

Test plan

CI config validation passes
Run kimik2.5-int4-b300-vllm single-node benchmark on a B300 node and confirm server starts, benchmark completes, and result file is produced

🤖 Generated with Claude Code

github-actions · 2026-04-17T22:00:01Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

If additional help is needed, PR authors can reach out to core maintainers over Slack.

github-actions · 2026-04-17T22:00:01Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

If additional help is needed, PR authors can reach out to core maintainers over Slack.

github-actions · 2026-04-17T22:00:02Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

If additional help is needed, PR authors can reach out to core maintainers over Slack.

claude · 2026-04-17T22:07:48Z

+    - kimik2.5-int4-b300-vllm
+  description:
+    - "Add Kimi-K2.5 INT4 B300 vLLM benchmark"
+    - "Image: vllm/vllm-openai:v0.19.0-cu130"
+    - "At the time of submission, https://docs.vllm.ai/projects/recipes/en/latest/moonshotai/Kimi-K2.5.html does not have a B300-specific recipe, so this reuses the existing Kimi-K2.5 INT4 B200 vLLM recipe as-is"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1057
+
 - config-keys:
    - gptoss-fp4-mi300x-vllm


🟡 The new kimik2.5-int4-b300-vllm entry in perf-changelog.yaml has two documentation issues: (1) pr-link points to the reverted PR #1057 instead of the current PR #1071, and (2) the entry was inserted before the existing gptoss-fp4-mi300x-vllm entry rather than appended at the very end of the file, violating the AGENTS.md ordering convention. Both can be fixed by moving the new entry to the bottom of the file and updating the pr-link to https://github.com/SemiAnalysisAI/InferenceX/pull/1071.

Extended reasoning...

Issue 1: Wrong PR link

The new kimik2.5-int4-b300-vllm entry carries pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1057. PR #1057 was the original submission that was subsequently reverted entirely by PR #1070. The current PR, #1071, is explicitly described as a reopen of #1057 with otherwise identical contents. Since PR #1057 never landed in main (it was reverted), the canonical PR that actually introduces this config is #1071. The pr-link field is used as historical documentation linking a config to the PR that merged it, so pointing it at a reverted PR is incorrect — anyone following the link will reach a reverted (and confusing) PR rather than the one that introduced the change.

Issue 2: Entry inserted in the wrong position

AGENTS.md line 159 states: "The file is read in chronological order: oldest at the top, newest at the bottom. New entries MUST be appended to the END of the file — never insert in the middle or prepend." The diff shows the new kimik2.5-int4-b300-vllm block was inserted between the kimik2.5-fp4-b300-vllm entry (PR #1056) and the gptoss-fp4-mi300x-vllm entry (PR #1053). After the PR merges, gptoss-fp4-mi300x-vllm remains the last entry in the file, while the newer kimik2.5-int4-b300-vllm entry sits above it — a clear ordering inversion.

Why existing code doesn't prevent this

There is no automated enforcement of the append-only rule or the pr-link value in the CI pipeline. The AGENTS.md instruction is a human convention. When the author copied the entry from PR #1057, both the pr-link value and the insertion position were carried over as-is, bypassing the update.

Impact

The impact is limited to documentation accuracy. The wrong pr-link means anyone using the changelog to trace config history will land on the reverted PR, potentially causing confusion about whether the config is actually live. The ordering violation makes the changelog harder to read chronologically and sets a precedent for future out-of-order insertions.

How to fix

Move the entire kimik2.5-int4-b300-vllm block to the very bottom of perf-changelog.yaml (after the gptoss-fp4-mi300x-vllm entry) and update its pr-link to https://github.com/SemiAnalysisAI/InferenceX/pull/1071.

Step-by-step proof

The last entry before this PR was gptoss-fp4-mi300x-vllm (PR [AMD][MI300X] Expand GPT-OSS FP4 TP=1 concurrency from 64 to 256 #1053).

The diff shows the new block inserted at the position before gptoss-fp4-mi300x-vllm — confirmed by the diff context showing - config-keys: [gptoss-fp4-mi300x-vllm]… appearing immediately after the added block.

After the merge, tailing perf-changelog.yaml shows the order: …kimik2.5-fp4-b300-vllm → kimik2.5-int4-b300-vllm → gptoss-fp4-mi300x-vllm, so PR [AMD][MI300X] Expand GPT-OSS FP4 TP=1 concurrency from 64 to 256 #1053's entry is still last despite being older.

The pr-link value in the new entry is pull/1057, which maps to the reverted PR, not the current one (pull/1071).

claude · 2026-04-17T22:07:48Z

+# NOTE: At the time of submission, https://docs.vllm.ai/projects/recipes/en/latest/moonshotai/Kimi-K2.5.html
+# does not have a B300-specific recipe, so this config reuses the existing
+# Kimi-K2.5 INT4 B200 vLLM recipe as-is until B300-specific tuning is available.
+kimik2.5-int4-b300-vllm:
+  image: vllm/vllm-openai:v0.19.0-cu130


🟡 The PR description explicitly states "Image: vllm/vllm-openai:v0.15.1 (same as B200)", but the actual B300 config uses vllm/vllm-openai:v0.19.0-cu130 — a different image. The image choice for B300 is intentional and correct (cu130 is required for Blackwell hardware), but the PR description is factually inaccurate; the perf-changelog.yaml already correctly records v0.19.0-cu130, so updating the PR description to match would remove the inconsistency.

Extended reasoning...

What the bug is and how it manifests

The PR description says: "Image: vllm/vllm-openai:v0.15.1 (same as B200)", but the actual config in nvidia-master.yaml sets image: vllm/vllm-openai:v0.19.0-cu130. The B200 config (kimik2.5-int4-b200-vllm) uses vllm/vllm-openai:v0.15.1. These are two different images with different vLLM versions and different CUDA base layers.

The specific locations

.github/configs/nvidia-master.yaml lines 2010–2014: the kimik2.5-int4-b300-vllm entry carries image: vllm/vllm-openai:v0.19.0-cu130.

The PR description body literally says v0.15.1 (same as B200).

benchmarks/single_node/kimik2.5_int4_b300.sh header comment says the script reuses the B200 recipe "as-is".

perf-changelog.yaml (correctly) records the image as v0.19.0-cu130, contradicting the PR description.

Why existing code doesn't prevent it

No automated check compares the image name stated in a PR description against the YAML config, so the factual error in the PR description goes undetected.

Addressing the refutation

The refutation correctly notes that the "as-is" language in the inline comments refers to the serving parameters (TP=8, concurrency 4–64, GPU memory utilization, vLLM flags), not the Docker image tag, and that all other B300 configs follow the same pattern of bumping to a cu130 image while keeping the rest of the recipe unchanged. This interpretation is reasonable for the code comments. However, the PR description goes further: it explicitly names a specific image version (v0.15.1) and asserts "same as B200", which is unambiguously wrong regardless of interpretation. The perf-changelog entry in the same PR already records the correct image (v0.19.0-cu130), confirming the config itself is intentional and correct.

Impact

The actual runtime behavior is correct — the B300 job will use v0.19.0-cu130 as required by Blackwell architecture. The only harm is documentation inaccuracy: anyone reading the PR description to understand what changed will be told the wrong image version, potentially causing confusion when trying to reproduce results or audit the change history.

How to fix

Update the PR description to read: "Image: vllm/vllm-openai:v0.19.0-cu130 (cu130 required for B300; B200 uses v0.15.1)." Optionally, tighten the "as-is" comment to clarify that the serving parameters are reused as-is while the image is bumped to the standard B300 cu130 image.

Step-by-step proof

Look up kimik2.5-int4-b200-vllm in nvidia-master.yaml: image: vllm/vllm-openai:v0.15.1.

Look up kimik2.5-int4-b300-vllm in the same file (added by this PR): image: vllm/vllm-openai:v0.19.0-cu130.

Open the PR description: "Image: vllm/vllm-openai:v0.15.1 (same as B200)".

v0.15.1 ≠ v0.19.0-cu130 — the PR description is factually wrong.

The perf-changelog.yaml entry added in the same PR correctly records "Image: vllm/vllm-openai:v0.19.0-cu130", confirming the author knows the actual image but did not update the PR description.

xinli-sw · 2026-04-20T13:37:10Z

Hi @cquil11 , there were some recent fixes that went in to support this combo, we are waiting on a vLLM release. If you don't mind we can let you know when the release is ready

cquil11 · 2026-04-20T13:58:01Z

@xinli-sw Can you have a look at this one that I re-opened as well, please? #1100

- Add kimik2.5-int4-b300-vllm benchmark config and the corresponding benchmarks/single_node/kimik2.5_int4_b300.sh launch script - At the time of submission, the vLLM Kimi-K2.5 recipes page does not have a B300-specific recipe, so this reuses the existing Kimi-K2.5 INT4 B200 vLLM recipe as-is until B300-specific tuning is available - Image: vllm/vllm-openai:v0.15.1 (same as B200), runner: b300, same TP=8 and concurrency 4-64 search-space as B200

The B300 script pins vllm/vllm-openai:v0.19.0-cu130, which removed the --disable-log-requests CLI flag. Passing it causes the server to exit immediately: vllm: error: unrecognized arguments: --disable-log-requests Server died before becoming healthy. Exiting. Logging is off by default in newer vLLM releases, so removing the flag is sufficient. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cquil11 requested a review from a team April 17, 2026 21:59

cquil11 requested review from jgangani and kedarpotdar-nv as code owners April 17, 2026 21:59

github-project-automation Bot added this to InferenceMAX Board Apr 17, 2026

claude Bot reviewed Apr 17, 2026

View reviewed changes

functionstackx force-pushed the reopen-1057-kimik2.5-int4-b300 branch from 355b44e to 0f04bb7 Compare April 20, 2026 22:39

functionstackx added the sweep-enabled label Apr 20, 2026

cquil11 and others added 2 commits April 20, 2026 19:25

functionstackx force-pushed the reopen-1057-kimik2.5-int4-b300 branch from 0f04bb7 to ce47b76 Compare April 20, 2026 23:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add B300 config: kimi-k2.5-int4-vllm#1071

Add B300 config: kimi-k2.5-int4-vllm#1071
cquil11 wants to merge 2 commits intomainfrom
reopen-1057-kimik2.5-int4-b300

cquil11 commented Apr 17, 2026

Uh oh!

github-actions Bot commented Apr 17, 2026

Uh oh!

github-actions Bot commented Apr 17, 2026

Uh oh!

github-actions Bot commented Apr 17, 2026

Uh oh!

claude Bot Apr 17, 2026

Uh oh!

claude Bot Apr 17, 2026

Uh oh!

xinli-sw commented Apr 20, 2026

Uh oh!

cquil11 commented Apr 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

cquil11 commented Apr 17, 2026

Summary

Test plan

Uh oh!

github-actions Bot commented Apr 17, 2026

Uh oh!

github-actions Bot commented Apr 17, 2026

Uh oh!

github-actions Bot commented Apr 17, 2026

Uh oh!

claude Bot Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

xinli-sw commented Apr 20, 2026

Uh oh!

cquil11 commented Apr 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants