[WIP] agentx integration by cquil11 · Pull Request #1103 · SemiAnalysisAI/InferenceX

cquil11 · 2026-04-20T20:41:31Z

No description provided.

github-actions · 2026-04-20T20:41:41Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

If additional help is needed, PR authors can reach out to core maintainers over Slack.

- Add benchmarks/single_node/agentic/ with trace replay scripts for B200 FP4, H200 FP8, MI355X FP4/FP8
- Add utils/agentic-benchmark/ with metrics collector, analysis scripts, and Pareto frontier plotting
- Scripts reference utils/trace-replay (submodule) and utils/agentic-benchmark (support utilities)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

+        # for tp in sorted(df["tp"].unique()):
+        #     tp_data = df[df["tp"] == tp]
+        #     ax.scatter(tp_data[x_col], tp_data[y_col],
+        #                c=tp_colors.get(tp, "purple"),
+        #                marker=tp_markers.get(tp, "x"),
+        #                s=40, alpha=0.15, linewidths=0.3,
+        #                edgecolors="gray")


+from pathlib import Path
+
+import pandas as pd
+import numpy as np


+import sys
+import pandas as pd
+import matplotlib.pyplot as plt
+import numpy as np


+                    with open(metadata_file) as f:
+                        metadata = json.load(f)
+                    total_time_sec = metadata.get("benchmark_runtime_sec")
+                except Exception:


+                    with open(metadata_file) as f:
+                        metadata = json.load(f)
+                    total_time_sec = metadata.get("benchmark_runtime_sec")
+                except Exception:


+            self._task.cancel()
+            try:
+                await self._task
+            except asyncio.CancelledError:
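
The fragment above cancels a background task and then awaits it so the cancellation is fully processed. A minimal sketch of that idiom follows; the class name `PeriodicPoller`, its fields, and the polling loop are hypothetical stand-ins, and only the cancel-then-await pattern is taken from the diff.

```python
import asyncio
import contextlib
from typing import Optional


class PeriodicPoller:
    """Hypothetical background poller; only stop() mirrors the diff above."""

    def __init__(self, interval_sec: float = 0.01) -> None:
        self.interval_sec = interval_sec
        self.samples = 0
        self._task: Optional[asyncio.Task] = None

    async def _run(self) -> None:
        # Poll forever until the task is cancelled.
        while True:
            self.samples += 1
            await asyncio.sleep(self.interval_sec)

    def start(self) -> None:
        # Must be called from inside a running event loop.
        self._task = asyncio.get_running_loop().create_task(self._run())

    async def stop(self) -> None:
        if self._task is None:
            return
        self._task.cancel()
        # Awaiting the cancelled task raises CancelledError; suppress it
        # explicitly instead of leaving an empty except block.
        with contextlib.suppress(asyncio.CancelledError):
            await self._task
        self._task = None
```

Using `contextlib.suppress` makes the intent explicit: the cancellation is expected here and is not an error to report.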


+                with open(metadata_file) as f:
+                    metadata = json.load(f)
+                total_time_sec = metadata.get("benchmark_runtime_sec")
+            except Exception:
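
The fragments above read `benchmark_runtime_sec` from a metadata JSON file inside a broad `except Exception`. A minimal sketch of the same read with narrower exception handling, assuming a hypothetical helper name `read_runtime_sec` and assuming that a missing or malformed file should simply yield `None`:

```python
import json
from pathlib import Path
from typing import Optional


def read_runtime_sec(metadata_file: Path) -> Optional[float]:
    """Return benchmark_runtime_sec from a metadata JSON file, or None.

    Hypothetical helper: only the key name comes from the diff above.
    """
    try:
        with open(metadata_file) as f:
            metadata = json.load(f)
    except (OSError, ValueError):
        # OSError: file missing or unreadable.
        # ValueError: malformed JSON or undecodable bytes.
        return None
    if not isinstance(metadata, dict):
        return None
    value = metadata.get("benchmark_runtime_sec")
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None
    return float(value)
```

Catching only the expected failure modes keeps unrelated bugs (for example a typo in a variable name) from being silently swallowed.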


Add scenario-type input to benchmark-tmpl.yml (default: fixed-seq-len). When scenario-type is agentic-coding, SCENARIO_SUBDIR routes to benchmarks/single_node/agentic/ instead of benchmarks/single_node/. All 12 runner scripts updated to use ${SCENARIO_SUBDIR} in script paths. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
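
The routing described in this commit can be sketched in Python as follows. This illustrates the logic only; the real change lives in the workflow template and the runner shell scripts. The function name `script_path` and the script file name in the usage example are hypothetical, and only the two scenario types and the two directories come from the commit message.

```python
# Assumed mapping from scenario type to the sub-directory that plays the
# role of SCENARIO_SUBDIR in the runner scripts.
SCENARIO_SUBDIRS = {
    "fixed-seq-len": "",          # default: scripts live directly in single_node/
    "agentic-coding": "agentic/",  # agentic scripts live one level down
}


def script_path(scenario_type: str, script_name: str) -> str:
    """Build the benchmark script path for a given scenario type."""
    try:
        subdir = SCENARIO_SUBDIRS[scenario_type]
    except KeyError:
        raise ValueError(f"unknown scenario type: {scenario_type!r}") from None
    return f"benchmarks/single_node/{subdir}{script_name}"
```

For example, `script_path("agentic-coding", "example.sh")` resolves under `benchmarks/single_node/agentic/`, while the default `fixed-seq-len` keeps the existing location.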

Add dsr1-fp4-b200-vllm-agentic (nvidia) and dsr1-fp4-mi355x-vllm-agentic (amd) with agentic-coding scenarios. Remove trace-source from AgenticCodingConfig model (handled by scripts). Ported from experimental multiturn-agentic-trace.yaml B200/MI355X DSR1 configs with cpu-offloading on/off variants. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Rename multiturn_* to dsr1_* in benchmarks/single_node/agentic/ and update model-prefix from 'multiturn' to 'dsr1' in master configs so runner script path construction works correctly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Extract common agentic functions to benchmark_lib.sh (resolve_trace_source, install_agentic_deps, start/stop metrics collector, build_replay_cmd, trim_idle_metrics)
- Refactor all 4 agentic scripts to use shared helpers
- Remove --max-ttft and --max-new-tokens-per-period from replay command
- Remove vLLM version check and commented-out config blocks
- Rename model-prefix from 'multiturn' to 'dsr1' in master configs
- Rename config keys from *-vllm-agentic to *-vllm
- Switch submodule branch from neon-trace-support to agentx

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add AgenticMatrixEntry Pydantic model with users, offload-mode, scenario-type fields
- Implement agentic-coding matrix generation in generate_full_sweep() and generate_test_config_sweep()
- Skip agentic entries in mark_eval_entries() (no eval support)
- Generates 46 entries for B200 + 22 for MI355X from the agentic configs

Matrix entries include scenario-type: agentic-coding which the benchmark template uses to route to benchmarks/single_node/agentic/ scripts via SCENARIO_SUBDIR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
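
A rough sketch of how such matrix entries might be expanded from a config. The PR uses a Pydantic model named AgenticMatrixEntry; the dataclass `AgenticEntry` below is a standard-library stand-in, and `build_entries`, the user counts, and the "on"/"off" offload values are assumptions for illustration only.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterable, List


@dataclass(frozen=True)
class AgenticEntry:
    """Stand-in for the PR's Pydantic model (field names adapted to Python)."""

    users: int
    offload_mode: str
    scenario_type: str = "agentic-coding"


def build_entries(
    user_counts: Iterable[int], offload_modes: Iterable[str]
) -> List[AgenticEntry]:
    """Expand every (users, offload mode) combination into one entry."""
    return [
        AgenticEntry(users=users, offload_mode=mode)
        for users, mode in product(user_counts, offload_modes)
    ]


# Hypothetical sweep: three concurrency levels, offloading on and off.
entries = build_entries([1, 2, 4], ["on", "off"])
```

Each entry carries `scenario_type`, which is the field the template would inspect to pick the agentic script directory.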

- Add sweep-agentic job group in run-sweep.yml that dispatches agentic matrix entries to benchmark-tmpl.yml with scenario-type: agentic-coding
- Add offload-mode and total-cpu-dram-gb inputs to benchmark-tmpl.yml
- Add USERS, OFFLOAD_MODE, TOTAL_CPU_DRAM_GB env vars to template
- Route agentic entries to single_node['agentic'] in process_changelog.py
- Update ChangelogMatrixEntry to accept AgenticMatrixEntry in single_node

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Skip process_result.py and fixed-seq-len result file check for agentic
- Check status.txt for agentic scenario success/failure
- Add dedicated artifact upload step for agentic results (metrics CSV, detailed_results, debug_trace, workload distributions, etc.)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
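
The status.txt check could look roughly like the sketch below. The commit does not state what the file contains, so the `SUCCESS` marker, the function name `agentic_run_succeeded`, and the rule that a missing file counts as failure are all assumptions.

```python
from pathlib import Path


def agentic_run_succeeded(result_dir: Path, success_marker: str = "SUCCESS") -> bool:
    """Return True only if status.txt exists and holds the success marker.

    The marker value is an assumption; adjust it to whatever the agentic
    scripts actually write.
    """
    status_file = result_dir / "status.txt"
    if not status_file.is_file():
        # No status file at all is treated as a failed run.
        return False
    return status_file.read_text().strip() == success_marker
```

Treating a missing file as failure means a script that crashes before writing its status cannot be mistaken for a passing run.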

Produces bmk_agentic_*.json artifacts matching the naming convention of fixed-seq-len results. Includes:
- QPS statistics (mean, median, p90, p99, p99.9)
- Latency statistics (TTFT, TTLT, ITL with percentiles)
- Workload distribution (input/output token stats)
- KV cache hit rates (server-reported GPU/CPU and theoretical infinite)
- Throughput (total, per-GPU, input/output split)
- Request success counts

Wired into benchmark-tmpl.yml as "Process agentic result" step.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
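
The summary statistics listed above (mean, median, p90, p99, p99.9) can be computed as in this sketch. It uses linear interpolation between neighbouring ranks; which percentile method the PR's processing script uses is not stated, so that choice and the names `percentile` and `summarize` are assumptions.

```python
import statistics
from typing import Dict, Sequence


def percentile(sorted_values: Sequence[float], q: float) -> float:
    """Percentile q (0-100) of pre-sorted values, linearly interpolated."""
    if not sorted_values:
        raise ValueError("percentile of empty sequence")
    rank = (len(sorted_values) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Return mean, median, and tail percentiles for one metric."""
    ordered = sorted(values)
    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "p90": percentile(ordered, 90),
        "p99": percentile(ordered, 99),
        "p99.9": percentile(ordered, 99.9),
    }
```

The same `summarize` call would apply to any per-request metric such as TTFT or ITL; with few samples the p99 and p99.9 values sit very close to the maximum, so they are only informative for long runs.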

- Add agentic-config output to get-jobs step
- Filter agentic entries out of single-node bucket
- Add test-sweep-agentic job group with scenario-type routing

Enables running agentic benchmarks via:

    gh workflow run e2e-tests.yml --ref chore/agentx-integration \
      -f generate-cli-command='test-config --config-keys dsr1-fp4-b200-vllm --config-files .github/configs/nvidia-master.yaml --scenario-type agentic-coding'

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Agentic scripts expect RESULT_DIR but it wasn't set at the workflow level. Fixed-seq-len scripts set it internally via the runner, but agentic scripts need it from the environment. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

initial plumbing

c12bc12

github-project-automation bot added this to InferenceMAX Board Apr 20, 2026

github-code-quality bot found potential problems Apr 20, 2026

cquil11 and others added 17 commits April 20, 2026 16:13

fix: always include cache hit rate fields in agentic results (null when unavailable)

b5b7f6f

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
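
A small sketch of what "always include the fields, null when unavailable" can look like when building the result JSON. The field names in `CACHE_FIELDS` and the helper `with_cache_fields` are hypothetical; only the behaviour (keys always present, value null if not measured) comes from the commit title.

```python
import json
from typing import Any, Dict

# Hypothetical field names for the cache hit rates in the result file.
CACHE_FIELDS = (
    "gpu_cache_hit_rate",
    "cpu_cache_hit_rate",
    "theoretical_cache_hit_rate",
)


def with_cache_fields(result: Dict[str, Any], measured: Dict[str, Any]) -> Dict[str, Any]:
    """Copy result and add every cache field, using None when unmeasured."""
    out = dict(result)
    for name in CACHE_FIELDS:
        # dict.get returns None for missing keys; json.dumps writes it as null.
        out[name] = measured.get(name)
    return out


payload = json.dumps(with_cache_fields({"users": 4}, {"gpu_cache_hit_rate": 0.5}))
```

Keeping the keys present with a null value gives downstream consumers a stable schema, so they can distinguish "not measured" from "field missing because of a bug".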

fix: hardcode HF trace source to semianalysisai/cc-traces-weka-042026

1b3e92e

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: wait for agentic jobs before collect-results in e2e-tests

07c5ba9

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: use ROCm vLLM image for MI355X agentic config

d5b023d

vllm/vllm-openai:v0.19.1 is CUDA-only. MI355X needs vllm/vllm-openai-rocm:v0.19.0. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: bump MI355X agentic image to vllm-openai-rocm:v0.19.1

1fc8373

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: handle unset vars in check_env_vars with set -u

c9058c0

Use ${!var_name:-} to avoid 'unbound variable' error when scripts use set -u (set -euo pipefail). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: enable submodule checkout in benchmark template

3502e64

The trace-replay submodule wasn't being checked out, causing 'requirements.txt not found' errors on all agentic jobs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cquil11 wants to merge 19 commits into main from chore/agentx-integration