Add dsv4-fp8-h200-sglang single-node config by cquil11 · Pull Request #1136 · SemiAnalysisAI/InferenceX

cquil11 · 2026-04-24T07:21:35Z

Summary

Adds dsv4-fp8-h200-sglang to .github/configs/nvidia-master.yaml using lmsysorg/sglang:deepseek-v4-hopper and sgl-project/DeepSeek-V4-Flash-FP8
Adds benchmarks/single_node/dsv4_fp8_h200.sh following the DeepSeek-V4-Flash-FP8 H200 SGLang recipe, with prefix caching disabled (--disable-radix-cache) and speculative decoding disabled
Applies the /workspace/sglang editable-install workaround to launch_h200-cw.sh and launch_h200-nb.sh: mount the workspace at /ix when the image is deepseek-v4-hopper, otherwise at /workspace (matching the treatment of deepseek-v4-blackwell in the B200 runner)
Adds a perf-changelog.yaml entry

Test plan

Full sweep (full-sweep-enabled label) produces results for 1k/1k (conc 4-64) and 8k/1k (conc 4-32) at tp=4
Server launch logs show sglang module resolves correctly under the /ix mount
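The two sweeps in the test plan can be enumerated as below. This is a sketch that assumes concurrency doubles between the stated bounds (4, 8, 16, ...); sweep_points is a hypothetical helper, and the actual values presumably come from the dsv4-fp8-h200-sglang entry in nvidia-master.yaml.

```shell
#!/usr/bin/env bash
# Enumerate the benchmark points listed in the test plan.
# ASSUMPTION: concurrency doubles between the stated bounds; the real sweep
# values are defined by the config, not by this helper.
set -euo pipefail

TP=4

# sweep_points ISL OSL CONC_MIN CONC_MAX
# Prints one "isl/osl tp=N conc=N" line per point.
sweep_points() {
  local isl=$1 osl=$2 conc=$3 conc_max=$4
  while (( conc <= conc_max )); do
    printf '%s/%s tp=%s conc=%s\n' "$isl" "$osl" "$TP" "$conc"
    conc=$(( conc * 2 ))
  done
}

sweep_points 1k 1k 4 64   # 5 points under the doubling assumption
sweep_points 8k 1k 4 32   # 4 points under the doubling assumption
```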

🤖 Generated with Claude Code

Adds the DeepSeek-V4-Flash-FP8 H200 SGLang recipe from https://docs.sglang.io/cookbook/autoregressive/DeepSeek/DeepSeek-V4. The server launch follows the cookbook command (--tp N, --moe-runner-backend flashinfer_mxfp4, --chunked-prefill-size 4096, --disable-flashinfer-autotune, SGLANG_JIT_DEEPGEMM_PRECOMPILE=0, SGLANG_DSV4_FP4_EXPERTS=0). Speculative decoding is omitted and --disable-radix-cache is added for the no-spec / no-prefix-cache baseline.

Also applies the same /workspace mount workaround to the H200 runners (launch_h200-cw.sh and launch_h200-nb.sh): the deepseek-v4-hopper image installs sglang editable under /workspace/sglang/python, which our bind-mount would mask, so we mount at /ix for this image only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
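For reference, the launch described in the commit message could be assembled roughly as follows. This is a sketch, not the contents of dsv4_fp8_h200.sh: build_launch_cmd, LAUNCH_CMD and the port (SGLang's usual default, 30000) are assumptions; only the flags and environment variables are taken from the text above.

```shell
#!/usr/bin/env bash
# Sketch of the no-spec / no-prefix-cache server launch described above.
# ASSUMPTION: build_launch_cmd, LAUNCH_CMD and the default port are
# hypothetical; the real benchmark script may differ.
set -euo pipefail

MODEL=sgl-project/DeepSeek-V4-Flash-FP8

# Fill the global LAUNCH_CMD array for a given tensor-parallel size and port.
build_launch_cmd() {
  local tp=$1 port=${2:-30000}
  LAUNCH_CMD=(
    python3 -m sglang.launch_server
    --model-path "$MODEL"
    --tp "$tp"
    --port "$port"
    --moe-runner-backend flashinfer_mxfp4
    --chunked-prefill-size 4096
    --disable-flashinfer-autotune
    --disable-radix-cache
  )
  # Speculative-decoding flags are intentionally omitted.
}

# Environment variables from the cookbook recipe.
export SGLANG_JIT_DEEPGEMM_PRECOMPILE=0
export SGLANG_DSV4_FP4_EXPERTS=0

build_launch_cmd 4
# Print the command instead of running it; exec "${LAUNCH_CMD[@]}" would launch.
printf '%s ' "${LAUNCH_CMD[@]}"
echo
```

Keeping the command in an array preserves argument boundaries, so the same array can be printed for logs and then executed unchanged.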

github-actions · 2026-04-24T07:21:50Z

Thanks for the contribution! For vLLM and SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR there first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work. Thank you!

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

claude · 2026-04-24T07:30:35Z

+# TODO(Cam): lmsysorg/sglang:deepseek-v4-hopper installs sglang editable at
+# /workspace/sglang/python (prior sglang tags used /sgl-workspace/sglang), so
+# the default $GITHUB_WORKSPACE:/workspace/ bind-mount masks the install and
+# breaks `import sglang`. Mount this one image at /ix instead; drop the
+# conditional once the image stops installing editable under /workspace.
+if [[ "$IMAGE" == *deepseek-v4-hopper* ]]; then
+    CONTAINER_MOUNT_DIR=/ix
+else
+    CONTAINER_MOUNT_DIR=/workspace
+fi


🔴 The /ix mount workaround is applied to launch_h200-cw.sh and launch_h200-nb.sh but not to runners/launch_h200-dgxc-slurm.sh. Per .github/configs/runners.yaml, 14 of the 18 h200 pool runners are h200-dgxc-slurm_*, so the new dsv4-fp8-h200-sglang config (declared as runner: h200) will most often be scheduled onto the unfixed launcher, where /workspace bind-mount will mask /workspace/sglang/python and import sglang will fail — the exact failure this PR is trying to prevent. Apply the same conditional CONTAINER_MOUNT_DIR=/ix logic to the single-node else-branch (lines 289-295) of runners/launch_h200-dgxc-slurm.sh.

Extended reasoning...

The bug

This PR adds a conditional /ix mount for the lmsysorg/sglang:deepseek-v4-hopper image to two of the three H200 launchers (launch_h200-cw.sh, launch_h200-nb.sh) but leaves the third — runners/launch_h200-dgxc-slurm.sh — unpatched. Its single-node else-branch still hardcodes:

--container-mounts=$GITHUB_WORKSPACE:/workspace/,$HF_HUB_CACHE_MOUNT:$HF_HUB_CACHE --container-workdir=/workspace/

(see runners/launch_h200-dgxc-slurm.sh lines 291 and 293, inside the else branch that starts at line 262).

Why it matters: the majority of h200 runners hit the unfixed launcher

From .github/configs/runners.yaml the h200 pool (lines 29-47) is:

2× h200-cw_* — patched by this PR

2× h200-nb_* — patched by this PR

14× h200-dgxc-slurm_* — not patched

That is 14/18 ≈ 78% of the pool. The new dsv4-fp8-h200-sglang entry in .github/configs/nvidia-master.yaml declares runner: h200 (not h200-multinode), so it is schedulable onto any of these 18 runners.
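The 78% figure can be reproduced by grouping runner labels on the prefix before the first underscore. In this sketch the labels are placeholders shaped like runners.yaml entries (only the 2/2/14 split comes from the review), and the percentage uses integer round-to-nearest.

```shell
#!/usr/bin/env bash
# Tally an h200 runner pool by launcher prefix.
# ASSUMPTION: placeholder labels; only the 2/2/14 split is from the review.
set -euo pipefail

pool=()
for i in 0 1; do pool+=("h200-cw_$i" "h200-nb_$i"); done
for (( i = 0; i < 14; i++ )); do pool+=("h200-dgxc-slurm_$i"); done

declare -A count=()
for runner in "${pool[@]}"; do
  prefix=${runner%%_*}   # strip the "_N" suffix, as the workflow does
  count[$prefix]=$(( ${count[$prefix]:-0} + 1 ))
done

total=${#pool[@]}
unpatched=${count[h200-dgxc-slurm]}
# Round to nearest integer: (100*14 + 9) / 18 = 78
pct=$(( (100 * unpatched + total / 2) / total ))
echo "unpatched: $unpatched/$total (~${pct}%)"
```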

How it triggers

.github/workflows/benchmark-tmpl.yml:154 selects the launcher via bash ./runners/launch_${RUNNER_NAME%%_*}.sh, so a runner labeled h200-dgxc-slurm_7 executes runners/launch_h200-dgxc-slurm.sh. The new config is single-node (multinode: false), so that launcher takes the else branch (the single-node path, line 262 onward), which hardcodes /workspace.
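The dispatch relies on bash's ${var%%_*} expansion, which deletes the longest suffix beginning with an underscore. A minimal sketch, with launcher_for as a hypothetical wrapper and h200-cw_0 / h200-nb_1 as illustrative labels:

```shell
#!/usr/bin/env bash
# Map a runner label to its launcher script using the same
# ${RUNNER_NAME%%_*} expansion as benchmark-tmpl.yml.
# launcher_for is a hypothetical helper for illustration.
set -euo pipefail

launcher_for() {
  local runner_name=$1
  # %%_* removes the longest suffix starting at the first "_",
  # so "h200-dgxc-slurm_7" becomes "h200-dgxc-slurm".
  echo "./runners/launch_${runner_name%%_*}.sh"
}

for r in h200-cw_0 h200-nb_1 h200-dgxc-slurm_7; do
  echo "$r -> $(launcher_for "$r")"
done
```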

Step-by-step proof

GitHub Actions dispatches the dsv4-fp8-h200-sglang job with runner: h200.

The scheduler picks one of the 18 pool runners; 14 of 18 are h200-dgxc-slurm_N.

benchmark-tmpl.yml invokes runners/launch_h200-dgxc-slurm.sh.

IS_MULTINODE is not true (config declares multinode: false), so execution enters the else branch at line 262.

srun runs with --container-mounts=$GITHUB_WORKSPACE:/workspace/ and --container-workdir=/workspace/ (lines 291, 293).

The lmsysorg/sglang:deepseek-v4-hopper image installs the editable sglang at /workspace/sglang/python, but the bind-mount has masked that path with $GITHUB_WORKSPACE contents.

benchmarks/single_node/dsv4_fp8_h200.sh runs python3 -m sglang.launch_server ..., which errors with ModuleNotFoundError: No module named 'sglang' (the exact failure the PR's own TODO comment is guarding against).
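The failure in the last step comes down to path nesting: a bind mount hides whatever the image had at or below the mount target. The hypothetical masks_path helper below checks that relation for the two candidate targets; it models only the prefix relation, not real mount behavior.

```shell
#!/usr/bin/env bash
# Does mounting at TARGET hide INSTALL_PATH from the image?
# masks_path is a hypothetical helper: it only checks path nesting.
set -euo pipefail

# masks_path MOUNT_TARGET INSTALL_PATH -> exit 0 if the mount hides the install.
masks_path() {
  local target=${1%/} path=${2%/}   # normalize trailing slashes
  [[ $path == "$target" || $path == "$target"/* ]]
}

install=/workspace/sglang/python
for target in /workspace/ /ix; do
  if masks_path "$target" "$install"; then
    echo "$target: masks $install"
  else
    echo "$target: leaves $install visible"
  fi
done
```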

Why existing code doesn't prevent it

The two patched launchers define CONTAINER_MOUNT_DIR conditionally at the top level; the dgxc-slurm variant has no such conditional and still hardcodes /workspace in both the --container-mounts and --container-workdir flags of the single-node srun. Nothing else in the launcher rewrites these paths based on the image name.

How to fix

Apply to the single-node branch of runners/launch_h200-dgxc-slurm.sh the same two-step fix this PR already made to the other launchers:

Near the top of the file (or inside the else-branch before the srun), add:
if [[ "$IMAGE" == *deepseek-v4-hopper* ]]; then
    CONTAINER_MOUNT_DIR=/ix
else
    CONTAINER_MOUNT_DIR=/workspace
fi

Change lines 291 and 293 from /workspace/ to $CONTAINER_MOUNT_DIR (matching the pattern already used in launch_h200-cw.sh:53,55 and launch_h200-nb.sh:26,29).

This is a purely mechanical fix that mirrors the existing two-launcher patch and resolves the failure on the majority of the h200 pool.
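For illustration, the patched single-node srun could look roughly like this. It is a sketch: container_mount_dir, build_srun_cmd, SRUN_CMD, the placeholder cache paths and the benchmark entrypoint are assumptions, and the real launcher's srun line carries other flags not shown; only the --container-mounts/--container-workdir change comes from the review.

```shell
#!/usr/bin/env bash
# Sketch of the single-node srun in launch_h200-dgxc-slurm.sh with the /ix
# workaround applied. ASSUMPTION: helper names, placeholder defaults and the
# benchmark entrypoint are illustrative.
set -euo pipefail

IMAGE=${IMAGE:-lmsysorg/sglang:deepseek-v4-hopper}
GITHUB_WORKSPACE=${GITHUB_WORKSPACE:-$PWD}
HF_HUB_CACHE=${HF_HUB_CACHE:-/hf-cache}                  # placeholder
HF_HUB_CACHE_MOUNT=${HF_HUB_CACHE_MOUNT:-/mnt/hf-cache}  # placeholder

# Same rule as launch_h200-cw.sh / launch_h200-nb.sh: only the
# deepseek-v4-hopper image gets the /ix mount point.
container_mount_dir() {
  if [[ "$1" == *deepseek-v4-hopper* ]]; then
    echo /ix
  else
    echo /workspace
  fi
}

CONTAINER_MOUNT_DIR=$(container_mount_dir "$IMAGE")

# Uses $CONTAINER_MOUNT_DIR where the launcher currently hardcodes /workspace/.
build_srun_cmd() {
  SRUN_CMD=(
    srun
    --container-image="$IMAGE"
    --container-mounts="$GITHUB_WORKSPACE:$CONTAINER_MOUNT_DIR,$HF_HUB_CACHE_MOUNT:$HF_HUB_CACHE"
    --container-workdir="$CONTAINER_MOUNT_DIR"
    bash benchmarks/single_node/dsv4_fp8_h200.sh
  )
}

build_srun_cmd
# Print rather than run; the launcher would execute "${SRUN_CMD[@]}".
printf '%s\n' "${SRUN_CMD[@]}"
```

Factoring the image check into a function keeps the rule identical across all three launchers and makes it easy to drop once the TODO is resolved.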

claude · 2026-04-24T07:30:35Z

+- config-keys:
+    - dsv4-fp8-h200-sglang
+  description:
+    - "Add DeepSeek-V4-Flash-FP8 single-node H200 SGLang benchmark (TP4)"
+    - "Container: lmsysorg/sglang:deepseek-v4-hopper"
+    - "Model: sgl-project/DeepSeek-V4-Flash-FP8"
+    - "Recipe from https://docs.sglang.io/cookbook/autoregressive/DeepSeek/DeepSeek-V4"
+    - "Prefix caching and speculative decoding disabled for baseline numbers"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/TBD


🟡 The new dsv4-fp8-h200-sglang entry was prepended at lines 1-9 of perf-changelog.yaml, but AGENTS.md requires new entries to be appended to the END of the file. Please move this entry to the bottom of the file, alongside the other recent entries (e.g., #1043, #1120).

Extended reasoning...

What the bug is

AGENTS.md (line 160) contains an explicit, unambiguous rule for perf-changelog.yaml:

The file is read in chronological order: oldest at the top, newest at the bottom. New entries MUST be appended to the END of the file — never insert in the middle or prepend.

This PR's diff header @@ -1,3 +1,13 @@ shows that the new dsv4-fp8-h200-sglang entry was inserted at the top of perf-changelog.yaml (lines 1-9), immediately before the previous first entry (dsr1-fp8-h100-dynamo-trt / dsr1-fp8-h100-dynamo-sglang). That directly violates the documented convention.

Why existing code doesn't prevent it

perf-changelog.yaml is a plain YAML sequence, so order is stylistic/documentary rather than functional — process_changelog.py will still pick up the entry no matter where it sits. There is no lint or CI check that enforces the append-only convention; it relies on the rule in AGENTS.md.

Why this is the right interpretation (convention is still active)

At the end of the modified perf-changelog.yaml, the most recent entries are all properly appended:

PR #1040 (atom qwen fp8/fp8_mtp3 on mi355x): at the end

PR #1043 (glm5.1 fp4 atom on mi355x): at the end

PR #1120 (trigger H100 multinode evals): near the end

PRs #1106, #1107, #1068 and #1069 (minimaxm2.5 fp8/fp4 vLLM updates on B300 and B200): all clustered near the end

So the convention is still being actively followed by other contributors. The prepend in this PR is an outlier.

Proof (step-by-step)

Open AGENTS.md at line 160: the rule says entries "MUST be appended to the END of the file — never insert in the middle or prepend."

Open the PR diff for perf-changelog.yaml: the hunk header is @@ -1,3 +1,13 @@, meaning the 10 new lines start at line 1 of the new file — i.e. the top.

Look at the current tail of perf-changelog.yaml: the newest pre-existing entry (PR #1043, glm5.1-fp4-mi355x-atom) sits there, confirming the append convention is still in force.

Therefore this PR prepends rather than appends, in direct contradiction of AGENTS.md.
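The hunk header @@ -1,3 +1,13 @@ reads as an old range starting at line 1 with 3 lines and a new range starting at line 1 with 13 lines, so the hunk adds a net 10 lines at the very top of the file. A small parser makes that arithmetic explicit; hunk_summary is hypothetical and handles only headers that include both counts.

```shell
#!/usr/bin/env bash
# Summarize a unified-diff hunk header of the form "@@ -a,b +c,d @@".
# hunk_summary is a hypothetical helper; headers that omit a ",count" are
# rejected rather than parsed.
set -euo pipefail

hunk_summary() {
  local header=$1 re='^@@ -([0-9]+),([0-9]+) [+]([0-9]+),([0-9]+) @@'
  [[ $header =~ $re ]] || return 1
  local old_count=${BASH_REMATCH[2]}
  local new_start=${BASH_REMATCH[3]} new_count=${BASH_REMATCH[4]}
  # Net change in line count within this hunk.
  echo "new_start=$new_start net_added=$(( new_count - old_count ))"
}

hunk_summary '@@ -1,3 +1,13 @@'
```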

How to fix

Move the new entry block (from - config-keys: / dsv4-fp8-h200-sglang through pr-link:, currently lines 1-9) to the end of perf-changelog.yaml, after the glm5.1-fp4-mi355x-atom entry (PR #1043). While you're there, also update pr-link: from TBD to the actual PR URL (.../pull/1136).
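Since no CI check enforces the append-only rule, a quick local check after moving the entry could look like the sketch below. last_entry_has_key is a hypothetical helper, not an existing check; it assumes every entry begins with a top-level "- config-keys:" line and lists keys as indented "- <key>" items, as in the diff above.

```shell
#!/usr/bin/env bash
# Succeed only if KEY is listed in the LAST entry of a perf-changelog-style file.
# ASSUMPTION: entries start with "- config-keys:" at column 0 and keys are
# indented "- <key>" list items.
set -euo pipefail

last_entry_has_key() {
  local file=$1 key=$2
  awk -v key="$key" '
    /^- config-keys:/ { found = 0 }       # a new entry resets the flag
    $1 == "-" && $2 == key { found = 1 }  # matching "    - <key>" item
    END { exit !found }                   # status reflects the last entry only
  ' "$file"
}
```

Usage: last_entry_has_key perf-changelog.yaml dsv4-fp8-h200-sglang && echo "entry is last". Using a single awk pass avoids a pipe into grep -q, which can trip pipefail when the upstream command is killed by SIGPIPE.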

Impact

Functionally harmless — process_changelog.py will still process the entry correctly. But it's a documented-convention violation that makes the "newest at the bottom" ordering no longer reliable for readers or tooling that assumes chronological order (e.g. quick tail inspections). Hence nit severity.

cquil11 requested a review from a team April 24, 2026 07:21

cquil11 added the full-sweep-enabled label Apr 24, 2026

cquil11 requested review from jgangani and kedarpotdar-nv as code owners April 24, 2026 07:21

github-project-automation Bot added this to InferenceMAX Board Apr 24, 2026

cquil11 closed this Apr 24, 2026

github-project-automation Bot moved this to Done in InferenceMAX Board Apr 24, 2026

claude Bot reviewed Apr 24, 2026

cquil11 wants to merge 1 commit into main from chore/dsv4-sgl-h200-fp8
