
[Others]Benchmark compare skill #7803

Open
Linboyan-trc wants to merge 5 commits into PaddlePaddle:develop from Linboyan-trc:benchmark_compare_skill

Conversation

@Linboyan-trc

Motivation

In day-to-day performance evaluation we frequently need to compare the performance of the FastDeploy and SGLang inference frameworks. Doing this by hand involves many steps (environment installation, service startup, health checks, benchmark execution, metric extraction, and report generation), which is tedious and error-prone. This PR adds an Agent Skill (.claude/skills/benchmark-compare/) that orchestrates the whole pipeline, so a performance comparison can be run and a visual HTML report generated from a single natural-language request or the /benchmark command.

Modifications

Adds the .claude/skills/benchmark-compare/ directory with the following files:

  • SKILL.md — main skill definition: the full 12-step workflow orchestration, parameter table, decision tree, and two working modes (fully automated test / report-only)
  • README.md — usage documentation
  • scripts/launch_service.sh — generic service launcher supporting both frameworks (FD/SG) and the single/TP/PD deployment modes
  • scripts/health_check.sh — service health check that polls the /v1/models endpoint
  • scripts/run_benchmark.sh — wrapper script around benchmark execution
  • scripts/extract_metrics.py — extracts the core metrics (throughput, latency, TTFT, etc.) from benchmark result files and writes them as JSON
  • scripts/generate_report.py — generates the multi-mode visual HTML comparison report
  • references/html_template.md — HTML report template (CSS/JS and placeholders)
  • references/model_profiles.md — recommended deployment parameters per model

Supported features:

  • Multiple deployment modes: single GPU, multi-GPU TP, and PD disaggregation
  • Quantization options such as BF16 / FP8
  • Automatic idle-GPU detection and assignment
  • Automatic matching of the hyperparameter YAML configuration
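The health-check step above is essentially a bounded polling loop against /v1/models. A minimal Python sketch of that idea, assuming nothing about the real shell script beyond "poll until healthy or time out" (function and parameter names here are illustrative):

```python
import time

def wait_for_service(probe, timeout_s=600, interval_s=5,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll `probe()` until it returns True or `timeout_s` elapses.

    `probe` is any zero-argument callable, e.g. one that issues
    GET /v1/models and returns True on HTTP 200.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        try:
            if probe():
                return True
        except OSError:
            pass  # service not accepting connections yet
        sleep(interval_s)
    return False

# Example with a fake probe that succeeds on the third attempt:
attempts = {"n": 0}
def fake_probe():
    attempts["n"] += 1
    return attempts["n"] >= 3

ok = wait_for_service(fake_probe, timeout_s=60, interval_s=0,
                      sleep=lambda s: None)
```

Injecting `clock` and `sleep` keeps the loop testable without real waiting; the actual skill implements the same logic in bash with curl.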

Usage or Command

Used as an Agent Skill (in Claude Code / Ducc):

# Option 1: slash command
/benchmark

# Option 2: natural language
Run a benchmark for me with the model at /path/to/GLM-4.7-Flash, TP=2, concurrency 64, with fp8 quantization enabled

# Option 3: report-only from existing data
Generate an HTML comparison report from these logs

Copilot AI review requested due to automatic review settings May 13, 2026 07:25
Contributor

Copilot AI left a comment


Pull request overview

This PR adds a benchmark-compare Skill under the repository's .claude/skills/, which automatically orchestrates FastDeploy and SGLang deployment, load testing, metric extraction, and HTML report generation, aiming to turn the routine performance-comparison workflow into a one-click process.

Changes:

  • Adds the main workflow-orchestration document for the benchmark comparison Skill (two working modes, parameter table, decision tree, and execution steps).
  • Adds service launch / health check / benchmark execution scripts, plus metric extraction and HTML report generation scripts.
  • Adds reference material (model profile table, HTML template spec) and a usage README.

PR title/description check (per repository requirements):

  • The title roughly follows the [CLASS]Title form, but it lacks a space after the tag and reads colloquially; a clearer action title such as [Others] Add benchmark comparison skill is recommended (and drop the outer quotes if they are only a display artifact).
  • The description is thorough, covering motivation, changes, and usage; note, however, that it includes internal proxy/path information that may not belong in repository docs (see the security review comment already filed).

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.

Show a summary per file

| File | Description |
|------|-------------|
| .claude/skills/benchmark-compare/SKILL.md | Main orchestration doc: full workflow steps, parameter table, modes A/B, script invocation examples |
| .claude/skills/benchmark-compare/README.md | Skill usage notes and examples |
| .claude/skills/benchmark-compare/scripts/launch_service.sh | Unified launcher for FastDeploy/SGLang services (TP/DP/PD, etc.) |
| .claude/skills/benchmark-compare/scripts/health_check.sh | Health check that polls /v1/models |
| .claude/skills/benchmark-compare/scripts/run_benchmark.sh | Wrapper script around benchmark_serving.py |
| .claude/skills/benchmark-compare/scripts/extract_metrics.py | Extracts metrics from benchmark output into JSON (including comparison computation) |
| .claude/skills/benchmark-compare/scripts/generate_report.py | Generates the interactive HTML report from multi-scenario data / log scans |
| .claude/skills/benchmark-compare/references/model_profiles.md | Recommended deployment parameters per model and profile-matching notes |
| .claude/skills/benchmark-compare/references/html_template.md | HTML report template / spec reference |
Comments suppressed due to low confidence (2)

.claude/skills/benchmark-compare/scripts/run_benchmark.sh:132

  • With set -euo pipefail in effect, eval "$CMD" causes the script to exit immediately (because of -e) whenever the benchmark process returns non-zero, so the EXIT_CODE/tail logic below is essentially unreachable. Temporarily disable -e around the benchmark command, or run it in a conditional so the exit code is captured, ensuring the last log lines are printed and the correct exit code is returned on failure.
```bash
echo "[INFO] Running: $CMD"
echo "[INFO] Redirecting output to: $OUTPUT"

eval "$CMD" > "$OUTPUT" 2>&1
EXIT_CODE=$?

if [[ $EXIT_CODE -eq 0 ]]; then
    echo "[INFO] Benchmark [$LABEL] finished"
else
    echo "[ERROR] Benchmark [$LABEL] failed (exit code: $EXIT_CODE)"
    echo "[ERROR] Last 20 lines of output:"
    tail -20 "$OUTPUT" 2>/dev/null || true
fi
```

.claude/skills/benchmark-compare/SKILL.md:349

  • extract_metrics.py writes metrics.json with a {model/config/raw_metrics/comparison} structure, but step 11 feeds that file directly to generate_report.py --data-json; generate_report expects a scenario map of the form {quant}_bs{concurrency}: {fd:{...}, sg:{...}}, so it will currently crash while parsing keys or produce a wrong report. Unify the data protocol between the two: either have extract_metrics emit the scenario format the report needs (wrapping even a single scenario in a key), or make generate_report accept the extract_metrics output structure.
```bash
python3 scripts/extract_metrics.py \
  --fd-result "$OUTPUT_DIR/$RESULT_FD" \
  --sg-result "$OUTPUT_DIR/$RESULT_SG" \
  --model-path "$MODEL_PATH" \
  --fd-config '{"gpu":"H800","tp":'$TP_SIZE',"concurrency":'$CONCURRENCY',"quantization":"'$QUANTIZATION'"}' \
  --sg-config '{"gpu":"H800","tp":'$TP_SIZE',"concurrency":'$CONCURRENCY',"quantization":"'$QUANTIZATION'"}' \
  --output "$OUTPUT_DIR/metrics.json"
```

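One way to resolve the mismatch the reviewer describes is a thin adapter that wraps a single extract_metrics result into the scenario map generate_report expects. A sketch assuming the two structures are exactly as quoted in the comment above (the fd/sg layout inside raw_metrics is an assumption for illustration):

```python
def to_scenario_map(metrics_json):
    """Wrap one extract_metrics.py result ({model, config, raw_metrics, ...})
    into the {quant}_bs{concurrency}: {fd: {...}, sg: {...}} map that
    generate_report.py expects."""
    cfg = metrics_json["config"]
    key = f"{cfg['quantization']}_bs{cfg['concurrency']}"
    raw = metrics_json["raw_metrics"]
    return {key: {"fd": raw["fd"], "sg": raw["sg"]}}

# Hypothetical single-scenario input shaped like the structure quoted above:
single = {
    "model": "GLM-4.7-Flash",
    "config": {"quantization": "bf16", "concurrency": 64, "tp": 2},
    "raw_metrics": {"fd": {"mean_ttft": 120.5}, "sg": {"mean_ttft": 131.0}},
}
scenarios = to_scenario_map(single)
```

With such an adapter, extract_metrics could keep its current output and generate_report would still receive a well-formed scenario map even for a single run.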
Comment on lines +103 to +123

```bash
CMD+=" --hyperparameter-path $HYPERPARAMS"
CMD+=" --num-prompts $NUM_PROMPTS"
CMD+=" --max-concurrency $CONCURRENCY"
CMD+=" --percentile-metrics ttft,tpot,itl,e2el,s_ttft,s_itl,s_e2el,s_decode,input_len,s_input_len,output_len"
CMD+=" --metric-percentiles 80,95,99,99.9,99.95,99.99"
CMD+=" --save-result"

# Multi-instance mode
if [[ -n "$IP_LIST" ]]; then
    CMD+=" --ip-list $IP_LIST"
fi

# Extra arguments
if [[ -n "$EXTRA_ARGS" ]]; then
    CMD+=" $EXTRA_ARGS"
fi

echo "[INFO] Running: $CMD"
echo "[INFO] Redirecting output to: $OUTPUT"

eval "$CMD" > "$OUTPUT" 2>&1
```
Comment on lines +314 to +318

```bash
bash scripts/run_benchmark.sh \
    --label fd \
    --model "$MODEL_PATH" \
    --port "$FD_PORT" \
    --dataset "$DATASET_PATH" \
```
Comment on lines +130 to +136
Before any operation that reaches the external network (git clone, pip install), the following must be set:
```bash
export no_proxy=localhost,bj.bcebos.com,su.bcebos.com,pypi.tuna.tsinghua.edu.cn,paddle-ci.gz.bcebos.com,0.0.0.0,baidu-int.com,aliyun.com,127.0.0.1,.baidu.com,.bcebos.com
export http_proxy=http://agent.baidu.com:8891
export https_proxy=http://agent.baidu.com:8891
git config --global http.proxy http://agent.baidu.com:8891
git config --global https.proxy http://agent.baidu.com:8891
```
Comment on lines +159 to +161

```bash
# Quantization
if [[ "$QUANTIZATION" != "none" ]]; then
    CMD+=" --quantization $QUANTIZATION"
```
Comment on lines +533 to +540

```javascript
const quantLabel = currentQuant === 'bf16' ? 'BF16' : currentQuant.toUpperCase();
document.getElementById('badge-quant').textContent = quantLabel;
document.getElementById('badge-bs').textContent = '\\u5e76\\u53d1 ' + currentBS;
document.getElementById('cfg-fd-bs').textContent = currentBS;
document.getElementById('cfg-sg-bs').textContent = currentBS;
document.getElementById('cfg-fd-quant').textContent = currentQuant === 'bf16' ? 'BF16' : 'Block-Wise FP8 (block_wise_fp8)';
document.getElementById('cfg-sg-quant').textContent = currentQuant === 'bf16' ? 'BF16' : 'FP8 (per-tensor)';
const d = getData();
```
Comment on lines +160 to +170
4 Chart.js charts:

| Chart | Type | X axis | Notes |
|------|------|------|------|
| Throughput comparison | bar | Total Token, Output Token, Request (×scale) | scale chosen from the magnitude of request_throughput |
| Latency comparison | bar | TTFT, TPOT (×10), ITL (×10), E2EL (/10) | normalized to a common scale for display |
| TTFT percentiles | line | Mean, Median, P80, P95, P99 | fill area |
| ITL percentiles | line | Mean, Median, P80, P95, P99 | fill area |

When switching the theme or the data set, `destroy()` the old charts before rebuilding them.

Comment on lines +147 to +160

```python
# Compute the percentage difference (FD relative to SG)
if sg_val != 0:
    diff_pct = round((fd_val - sg_val) / sg_val * 100, 2)
else:
    diff_pct = 0
entry["diff_pct"] = diff_pct

# Decide the winner
if key in higher_is_better:
    entry["winner"] = "fd" if fd_val > sg_val else "sg"
elif key in lower_is_better:
    entry["winner"] = "fd" if fd_val < sg_val else "sg"
else:
    entry["winner"] = "tie"
```
Comment on lines +13 to +20
```
Following benchmark_compare_skill, run a FastDeploy vs SGLang performance comparison:
Model: /root/paddlejob/share-storage/gpfs/system-public/changwenbin/models/GLM/GLM-4.7-Flash
Dataset: /root/paddlejob/share-storage/gpfs/system-public/yangrongjin/Downloads/Dataset/20260302_browsecomp_plus_processed_num_830_fd.jsonl
Concurrency: 64,512
Quantization: none (BF16), FP8
Use GPU5 and GPU6
```

@paddle-bot

paddle-bot Bot commented May 13, 2026

Thanks for your contribution!

@paddle-bot paddle-bot Bot added the contributor External developers label May 13, 2026
Copilot AI review requested due to automatic review settings May 13, 2026 16:49
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (2)

.claude/skills/benchmark-compare/scripts/generate_report.py:539

  • Anything that is not bf16 is rendered as Block-Wise FP8 (block_wise_fp8) / FP8 (per-tensor), but the parameter table actually allows quantization modes such as wint4 / wint8 (see the SKILL.md parameter table and model_profiles.md). When quant is wint4/wint8, the FD/SG config cards are wrongly shown with FP8 text that does not match the quantization actually run. Build a complete branch mapping on currentQuant, or fall back to currentQuant.toUpperCase() and special-case the known quantization modes.
```javascript
    document.getElementById('cfg-fd-quant').textContent = currentQuant === 'bf16' ? 'BF16' : 'Block-Wise FP8 (block_wise_fp8)';
    document.getElementById('cfg-sg-quant').textContent = currentQuant === 'bf16' ? 'BF16' : 'FP8 (per-tensor)';
```

.claude/skills/benchmark-compare/scripts/launch_service.sh:243

  • Only block_wise_fp8 is mapped to SGLang's fp8, but the wint4 / wint8 names allowed by the parameter table are FD-specific; SGLang does not accept them, and passing them straight to --quantization will make the service fail to start. Either warn/exit in the script for quantization modes SGLang does not support, or map them to values SGLang actually supports, such as awq / gptq_marlin.
```bash
    if [[ "$QUANTIZATION" != "none" ]]; then
        local SG_QUANT="$QUANTIZATION"
        # Map FD quantization names to SG names
        if [[ "$SG_QUANT" == "block_wise_fp8" ]]; then
            SG_QUANT="fp8"
        fi
        CMD+=" --quantization $SG_QUANT"
    fi
```
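The suggested fix boils down to an explicit mapping table with an error path for names SGLang cannot accept. A Python sketch of that logic (whether wint4/wint8 have real SGLang equivalents is exactly the reviewer's open question, so they are marked unsupported here rather than guessed):

```python
# FD quantization name -> SGLang --quantization value (None = unsupported)
FD_TO_SG_QUANT = {
    "block_wise_fp8": "fp8",
    "wint4": None,  # FD-specific; no SGLang equivalent assumed here
    "wint8": None,
}

def sg_quant_flag(fd_quant):
    """Return the SGLang quantization value, or raise for unsupported names."""
    if fd_quant == "none":
        return None  # pass no --quantization flag at all
    sg = FD_TO_SG_QUANT.get(fd_quant, fd_quant)
    if sg is None:
        raise ValueError(f"quantization '{fd_quant}' is not supported by SGLang")
    return sg
```

Failing fast here surfaces the configuration error before the service is launched, instead of letting SGLang die on an unknown flag value.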

Comment on lines +123 to +132

```bash
eval "$CMD" > "$OUTPUT" 2>&1
EXIT_CODE=$?

if [[ $EXIT_CODE -eq 0 ]]; then
    echo "[INFO] Benchmark [$LABEL] finished"
else
    echo "[ERROR] Benchmark [$LABEL] failed (exit code: $EXIT_CODE)"
    echo "[ERROR] Last 20 lines of output:"
    tail -20 "$OUTPUT" 2>/dev/null || true
fi
```
Comment on lines +43 to +83

```python
patterns = {
    "successful_requests": r"Successful requests:\s+([\d.]+)",
    "benchmark_duration": r"Benchmark duration \(s\):\s+([\d.]+)",
    "total_input_tokens": r"Total input tokens:\s+([\d.]+)",
    "total_generated_tokens": r"Total generated tokens:\s+([\d.]+)",
    "request_throughput": r"Request throughput \(req/s\):\s+([\d.]+)",
    "output_token_throughput": r"Output token throughput \(tok/s\):\s+([\d.]+)",
    "total_token_throughput": r"Total Token throughput \(tok/s\):\s+([\d.]+)",
    "mean_ttft": r"Mean TTFT \(ms\):\s+([\d.]+)",
    "median_ttft": r"Median TTFT \(ms\):\s+([\d.]+)",
    "p80_ttft": r"P80 TTFT \(ms\):\s+([\d.]+)",
    "p95_ttft": r"P95 TTFT \(ms\):\s+([\d.]+)",
    "p99_ttft": r"P99 TTFT \(ms\):\s+([\d.]+)",
    "mean_tpot": r"Mean TPOT \(ms\):\s+([\d.]+)",
    "median_tpot": r"Median TPOT \(ms\):\s+([\d.]+)",
    "p80_tpot": r"P80 TPOT \(ms\):\s+([\d.]+)",
    "p95_tpot": r"P95 TPOT \(ms\):\s+([\d.]+)",
    "p99_tpot": r"P99 TPOT \(ms\):\s+([\d.]+)",
    "mean_itl": r"Mean ITL \(ms\):\s+([\d.]+)",
    "median_itl": r"Median ITL \(ms\):\s+([\d.]+)",
    "p80_itl": r"P80 ITL \(ms\):\s+([\d.]+)",
    "p95_itl": r"P95 ITL \(ms\):\s+([\d.]+)",
    "p99_itl": r"P99 ITL \(ms\):\s+([\d.]+)",
    "mean_e2el": r"Mean E2EL \(ms\):\s+([\d.]+)",
    "median_e2el": r"Median E2EL \(ms\):\s+([\d.]+)",
    "p80_e2el": r"P80 E2EL \(ms\):\s+([\d.]+)",
    "p95_e2el": r"P95 E2EL \(ms\):\s+([\d.]+)",
    "p99_e2el": r"P99 E2EL \(ms\):\s+([\d.]+)",
    "mean_decode": r"Mean Decode \(tok/s\):\s+([\d.]+)",
    "median_decode": r"Median Decode \(tok/s\):\s+([\d.]+)",
    "p80_decode": r"P80 Decode \(tok/s\):\s+([\d.]+)",
    "p95_decode": r"P95 Decode \(tok/s\):\s+([\d.]+)",
    "p99_decode": r"P99 Decode \(tok/s\):\s+([\d.]+)",
}

for key, pattern in patterns.items():
    match = re.search(pattern, content)
    if match:
        metrics[key] = float(match.group(1))

return metrics
```
```bash
# ============================================================
if lsof -i :"$PORT" &>/dev/null; then
    echo "[INFO] Port $PORT is in use; cleaning up..."
    kill $(lsof -t -i :"$PORT") 2>/dev/null || true
```

@PaddlePaddle-bot

PaddlePaddle-bot commented May 13, 2026

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-15 20:43:11

The CI report is generated from the code below (updated every 30 minutes):


1 Task overview

⚠️ CI data temporarily unavailable: the GitHub Actions API request timed out, so the CI run status for this PR could not be fetched.

Possible causes:

  • GitHub API temporarily unavailable, or a network timeout
  • CI jobs not yet triggered (PR just submitted)
  • GitHub Actions run records not yet ready

Suggested actions

  1. Check the CI details page later to confirm whether CI has been triggered
  2. If CI has not been triggered, try /rebuild or push an empty commit to re-trigger it
  3. This agent will re-fetch and update this comment on its next scheduled run

2 Task status summary

| Status | Description |
|------|------|
| 🔴 Data unavailable | GitHub API timed out; task list could not be fetched |

3 Failure details

None (CI data unavailable; nothing to analyze)


Copilot AI review requested due to automatic review settings May 15, 2026 12:09
@Linboyan-trc Linboyan-trc force-pushed the benchmark_compare_skill branch from dee775a to 3f9531f Compare May 15, 2026 12:09
Contributor

Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.


@PaddlePaddle-bot PaddlePaddle-bot left a comment


🤖 Paddle-CI-Agent | pr_review | 2026-05-15 20:12:46

📋 Review summary

PR overview: adds the .claude/skills/benchmark-compare/ Agent Skill, which automates the full FastDeploy vs SGLang performance-comparison workflow, including one-click generation of a visual HTML report.
Change scope: .claude/skills/benchmark-compare/ (SKILL.md, README.md, scripts/, references/)
Impact tags: [Benchmark] [Others]

Issues

| Level | File | Summary |
|------|------|------|
| 🟡 Suggestion | scripts/extract_metrics.py:32 | The patterns dict in parse_benchmark_result fully duplicates generate_report.py:parse_benchmark_log (35+ regex patterns), violating DRY |
| ❓ Question | scripts/extract_metrics.py:141 | diff_pct uses (fd-sg)/sg*100 throughout, semantically opposite to the latency formula (sg-fd)/fd*100 in html_template.md |

📝 PR convention check

Two problems: ① the title [Others]Benchmark compare skill lacks a space between the tag and the text, and the diff content better matches the official [Benchmark] tag; ② the PR body is missing the two required sections `## Accuracy Tests` and `## Checklist`.

Suggested title (copy-paste ready):

  • [Benchmark] Add FastDeploy vs SGLang benchmark compare skill

Suggested PR description (copy-paste ready; must mirror the full structure of the checklist §D2 template):

## Motivation
In day-to-day performance evaluation we frequently need to compare the performance of the FastDeploy and SGLang inference frameworks. Doing this by hand involves many steps (environment installation, service startup, health checks, benchmark execution, metric extraction, and report generation), which is tedious and error-prone. This PR adds an Agent Skill (`.claude/skills/benchmark-compare/`) that orchestrates the whole pipeline, so a performance comparison can be run and a visual HTML report generated from a single natural-language request or the `/benchmark` command.

## Modifications
Adds the `.claude/skills/benchmark-compare/` directory with the following files:

- **SKILL.md** — main skill definition: the full 12-step workflow orchestration, parameter table, decision tree, and two working modes (fully automated test / report-only)
- **README.md** — usage documentation
- **scripts/launch_service.sh** — generic service launcher supporting both frameworks (FD/SG) and the single/TP/PD deployment modes
- **scripts/health_check.sh** — service health check that polls the `/v1/models` endpoint
- **scripts/run_benchmark.sh** — wrapper script around benchmark execution
- **scripts/extract_metrics.py** — extracts the core metrics (throughput, latency, TTFT, etc.) from benchmark result files and writes them as JSON
- **scripts/generate_report.py** — generates the multi-mode visual HTML comparison report
- **references/html_template.md** — HTML report template (CSS/JS and placeholders)
- **references/model_profiles.md** — recommended deployment parameters per model

## Usage or Command
```bash
# Option 1: slash command
/benchmark

# Option 2: natural language
Run a benchmark for me with the model at /path/to/GLM-4.7-Flash, TP=2, concurrency 64, with fp8 quantization enabled

# Option 3: report-only from existing data
Generate an HTML comparison report from these logs
```

## Accuracy Tests
N/A (this PR adds a tooling Skill and does not change model accuracy)

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [x] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

Overall assessment

The workflow orchestration is clear, the documentation thorough, and the HTML report carefully designed. There remain a maintainability issue (the duplicated patterns dict) and an open question about diff_pct semantics; recommended for merge once these are addressed.

```python
with open(filepath, "r") as f:
    content = f.read()

patterns = {
```


🟡 Suggestion: the 35+ regex patterns in parse_benchmark_result fully duplicate generate_report.py:parse_benchmark_log (line 43), violating DRY.

If the benchmark output format ever changes (e.g. a metric is added or a label renamed), both copies must be updated in sync, which is easy to miss and invites inconsistency.

Extract the shared patterns into a standalone module (e.g. scripts/metrics_patterns.py) that both scripts import:

```python
# scripts/metrics_patterns.py
BENCHMARK_PATTERNS = {
    "successful_requests": r"Successful requests:\s+([\d.]+)",
    # ... remaining patterns
}
```

Alternatively, unify the parsing function itself and import it from both scripts.
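For illustration, consuming such a shared module could look like the sketch below (the module content is inlined here as a stand-in for the import; the sample log text is hypothetical, formatted to match the regexes quoted in this PR):

```python
import re

# Stand-in for `from metrics_patterns import BENCHMARK_PATTERNS`
BENCHMARK_PATTERNS = {
    "successful_requests": r"Successful requests:\s+([\d.]+)",
    "mean_ttft": r"Mean TTFT \(ms\):\s+([\d.]+)",
}

def parse(content):
    """Extract every metric whose pattern matches the benchmark output."""
    metrics = {}
    for key, pattern in BENCHMARK_PATTERNS.items():
        match = re.search(pattern, content)
        if match:
            metrics[key] = float(match.group(1))
    return metrics

# Hypothetical fragment of benchmark output:
sample = "Successful requests:      830\nMean TTFT (ms):           112.34\n"
parsed = parse(sample)
```

With one shared dict, adding a metric means touching a single file and both scripts pick it up.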

```python
for key in sorted(all_keys):
    fd_val = fd_metrics.get(key)
    sg_val = sg_metrics.get(key)
```


❓ Question: diff_pct uses (fd - sg) / sg * 100 throughout, so for latency metrics (lower is better) a positive value means FD is slower (FD behind), which is the opposite of the convention in html_template.md:

Latency metrics (lower is better): diff = (sg - fd) / fd * 100; positive → FD ahead → green

If downstream reports or scripts consume the diff_pct field in comparison.json directly, they may misread which side wins. Align with html_template.md, or document the semantics explicitly on the field (e.g. rename it to fd_over_sg_pct and state what positive/negative means).
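To make the sign mismatch concrete, here is a sketch of the two conventions side by side (the formulas are the ones quoted above; the sample TTFT values are made up for illustration):

```python
def diff_fd_over_sg(fd, sg):
    # extract_metrics.py convention: (fd - sg) / sg * 100
    return round((fd - sg) / sg * 100, 2)

def diff_latency_template(fd, sg):
    # html_template.md convention for lower-is-better metrics: (sg - fd) / fd * 100
    return round((sg - fd) / fd * 100, 2)

# Hypothetical TTFTs: FD is faster (lower latency) than SG.
fd_ttft, sg_ttft = 100.0, 125.0
a = diff_fd_over_sg(fd_ttft, sg_ttft)        # negative even though FD wins
b = diff_latency_template(fd_ttft, sg_ttft)  # positive means FD ahead
```

Here `a` comes out -20.0 while `b` comes out 25.0: same measurement, opposite signs, which is exactly why a consumer reading diff_pct with the template's convention would call the winner wrong.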

```python
with open(filepath, "r") as f:
    content = f.read()

patterns = {
```


🟡 Suggestion: the patterns dict in parse_benchmark_log (lines 43-79) is identical to extract_metrics.py:parse_benchmark_result, roughly 40 lines of duplicated code.

See the comment at extract_metrics.py:32: extract the shared definition into a common module and reuse it from both scripts.


Labels

contributor External developers
