[AMD][MI300X] Extend GPT-OSS FP4 TP=8 search to conc=1 (extends interactivity frontier to ~249 tps/user)#1092
Conversation
Previously the TP=8 sweep started at conc=4. At conc=1, TP=8 reaches
~249 tokens/sec/user, extending the interactivity Pareto frontier
beyond the prior best (~234 tps/user at TP=8 conc=4 → 219 tps/gpu).
Measured on a single MI300X node, vllm/vllm-openai-rocm:v0.17.0,
ROCM_AITER_UNIFIED_ATTN, fuse_rope_kvcache, inductor graph partition,
1k/1k random workload (range-ratio 0.8):
conc=1 (new):
Total TPS : 493
Output TPS : 247
TPS/GPU : 61.7
Mean ITL : 4.02 ms (= 248.9 tps/user)
TTFT : 33.6 ms
Completed : 64/64
This single conc=1 point widens the right end of the interactivity
frontier without requiring any code or config changes outside the
search-space bound. Existing benchmark script handles conc=1 directly.
The 8k/1k row is updated symmetrically.
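The derived numbers in the table follow from simple arithmetic (assuming the TP=8 group spans all 8 GPUs of the node); a quick sanity check, with the small gaps versus the reported values explained by rounding of the raw measurements:

```python
# Sanity-check the derived metrics from the conc=1 run above.
# Assumes TP=8 means all 8 GPUs of the MI300X node serve the request.

mean_itl_ms = 4.02   # mean inter-token latency, ms (as reported)
total_tps   = 493    # total throughput, input + output tokens/sec
num_gpus    = 8

# Per-user decode rate: one token every mean_itl_ms, i.e. 1000 / ITL.
tps_per_user = 1000.0 / mean_itl_ms
print(f"tps/user ≈ {tps_per_user:.1f}")  # ≈ 248.8 (reported: ~249)

# Per-GPU throughput: total TPS spread across the TP=8 group.
tps_per_gpu = total_tps / num_gpus
print(f"tps/GPU  ≈ {tps_per_gpu:.1f}")   # ≈ 61.6 (reported: 61.7)
```

At conc=1 the output TPS (247) and the ITL-derived per-user rate (~248.8) nearly coincide, as expected; TTFT accounts for the small difference.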
@claude /sweep test-config --config-files .github/configs/amd-master.yaml --config-keys gptoss-fp4-mi300x-vllm

Claude finished @chunfangamd's task in 0s. I'll analyze this and get back to you.

/sweep test-config --config-files .github/configs/amd-master.yaml --config-keys gptoss-fp4-mi300x-vllm

@seungrokj Kicking off a sweep. Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24657932304

@claude @Klaud-Cold update perf-changelog.yaml

@functionstackx @cquil11 can you plz merge this?

@ramineroane @seungrokj at the new interactivity range, h100 beats mi300 (~250 vs ~210 tok/s/user). any plans for improvement?
functionstackx
left a comment
approved. feel free to merge


Summary
Extends the TP=8 concurrency search for gptoss-fp4-mi300x-vllm from [4..16] to [1..16], adding a single new low-concurrency point that pushes the interactivity Pareto frontier rightward.

Motivation
The current best 1k/1k single-user point on the InferenceX gpt-oss-120b vLLM FP4 MI300X frontier is ~234 tokens/sec/user (TP=8 conc=4 → 219 TPS/GPU). The latency-optimal regime (smallest M, all GPUs, no batching) was not being measured.
Adding conc=1 captures that regime and gives the dashboard a true low-latency endpoint for users prioritizing interactive single-user use cases (chat, copilot, agentic).
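The change itself is just the search-bound edit. A hypothetical sketch of the relevant config fragment (the real key names and schema of amd-master.yaml are not shown in this PR and may differ):

```yaml
# .github/configs/amd-master.yaml — illustrative fragment only
gptoss-fp4-mi300x-vllm:
  tp: 8
  # was: concurrency_search: [4, 16]
  concurrency_search: [1, 16]  # lower bound extended to capture the latency-optimal conc=1 point
```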
Measured result
Single MI300X node, image
vllm/vllm-openai-rocm:v0.17.0. Bench harness: benchmarks/single_node/gptoss_fp4_mi300x.sh (no script change needed; it already accepts conc=1). Server flags (unchanged from existing config):
Bench params:
--random-input-len 1024 --random-output-len 1024 --random-range-ratio 0.8 --num-prompts 64 --max-concurrency 1 --num-warmups 1 --ignore-eos

Pareto positioning
Net effect: a new datapoint at the high-interactivity tail; no existing points are displaced.
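The "no existing points are displaced" claim can be checked mechanically: a point stays on the (tps/user, tps/GPU) frontier iff no other point is at least as good on both axes and strictly better on one. A minimal sketch with the two points quoted above (hypothetical helper, not part of the repo):

```python
def pareto_frontier(points):
    """Keep points not dominated on (tps_user, tps_gpu); higher is better on both axes."""
    def dominated(p, q):
        # q dominates p: q is >= p on both axes and differs somewhere
        # (given both >=, differing implies strictly better on at least one axis)
        return q[0] >= p[0] and q[1] >= p[1] and q != p
    return [p for p in points if not any(dominated(p, q) for q in points)]

points = [
    (234.0, 219.0),  # prior best interactivity point: TP=8 conc=4
    (248.9, 61.7),   # new point: TP=8 conc=1
]
print(pareto_frontier(points))  # both survive: conc=1 trades tps/GPU for tps/user
```

Neither point dominates the other, so the new conc=1 point joins the frontier at its high-interactivity end while the conc=4 point remains.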
Impact on CI
Only the concurrency search bound in .github/configs/amd-master.yaml changes; no benchmark script or workflow changes.

cc @seungrokj @functionstackx @chunfangamd