
[Others]Benchmark compare skill #7803

Open
Linboyan-trc wants to merge 5 commits into PaddlePaddle:develop from Linboyan-trc:benchmark_compare_skill

Conversation

@Linboyan-trc

Motivation

In day-to-day performance evaluation we frequently need to compare the performance of the FastDeploy and SGLang inference frameworks. Doing this by hand involves many steps (environment installation, service startup, health checks, benchmark execution, metric extraction, and report generation), which is tedious and error-prone. This PR adds an Agent Skill (.claude/skills/benchmark-compare/) that orchestrates the whole pipeline, so a performance comparison can be run and a visual HTML report generated from a single natural-language request or the /benchmark command.

Modifications

Adds the .claude/skills/benchmark-compare/ directory with the following files:

  • SKILL.md — main skill definition: the full 12-step workflow orchestration, parameter table, decision tree, and two working modes (fully automated test / report-only)
  • README.md — usage documentation
  • scripts/launch_service.sh — generic service launcher supporting both frameworks (FD/SG) and the single/TP/PD deployment modes
  • scripts/health_check.sh — service health check that polls the /v1/models endpoint
  • scripts/run_benchmark.sh — wrapper script around benchmark execution
  • scripts/extract_metrics.py — extracts the core metrics (throughput, latency, TTFT, etc.) from benchmark result files and writes them as JSON
  • scripts/generate_report.py — generates the multi-mode visual HTML comparison report
  • references/html_template.md — HTML report template (CSS/JS and placeholders)
  • references/model_profiles.md — recommended deployment parameters per model

Supported features:

  • Multiple deployment modes: single GPU, multi-GPU TP, and PD disaggregation
  • Quantization options such as BF16 / FP8
  • Automatic idle-GPU detection and assignment
  • Automatic matching of the hyperparameter YAML configuration
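The health-check step above is essentially a bounded polling loop against /v1/models. A minimal Python sketch of that idea, assuming nothing about the real shell script beyond "poll until healthy or time out" (function and parameter names here are illustrative):

```python
import time

def wait_for_service(probe, timeout_s=600, interval_s=5,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll `probe()` until it returns True or `timeout_s` elapses.

    `probe` is any zero-argument callable, e.g. one that issues
    GET /v1/models and returns True on HTTP 200.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        try:
            if probe():
                return True
        except OSError:
            pass  # service not accepting connections yet
        sleep(interval_s)
    return False

# Example with a fake probe that succeeds on the third attempt:
attempts = {"n": 0}
def fake_probe():
    attempts["n"] += 1
    return attempts["n"] >= 3

ok = wait_for_service(fake_probe, timeout_s=60, interval_s=0,
                      sleep=lambda s: None)
```

Injecting `clock` and `sleep` keeps the loop testable without real waiting; the actual skill implements the same logic in bash with curl.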

Usage or Command

Used as an Agent Skill (in Claude Code / Ducc):

# Option 1: slash command
/benchmark

# Option 2: natural language
Run a benchmark for me with the model at /path/to/GLM-4.7-Flash, TP=2, concurrency 64, with fp8 quantization enabled

# Option 3: report-only from existing data
Generate an HTML comparison report from these logs

Copilot AI review requested due to automatic review settings May 13, 2026 07:25
Contributor

Copilot AI left a comment


Pull request overview

This PR adds a benchmark-compare Skill under the repository's .claude/skills/, which automatically orchestrates FastDeploy and SGLang deployment, load testing, metric extraction, and HTML report generation, aiming to turn the routine performance-comparison workflow into a one-click process.

Changes:

  • Adds the main workflow-orchestration document for the benchmark comparison Skill (two working modes, parameter table, decision tree, and execution steps).
  • Adds service launch / health check / benchmark execution scripts, plus metric extraction and HTML report generation scripts.
  • Adds reference material (model profile table, HTML template spec) and a usage README.

PR title/description check (per repository requirements):

  • The title roughly follows the [CLASS]Title form, but it lacks a space after the tag and reads colloquially; a clearer action title such as [Others] Add benchmark comparison skill is recommended (and drop the outer quotes if they are only a display artifact).
  • The description is thorough, covering motivation, changes, and usage; note, however, that it includes internal proxy/path information that may not belong in repository docs (see the security review comment already filed).

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.

Show a summary per file

| File | Description |
|------|-------------|
| .claude/skills/benchmark-compare/SKILL.md | Main orchestration doc: full workflow steps, parameter table, modes A/B, script invocation examples |
| .claude/skills/benchmark-compare/README.md | Skill usage notes and examples |
| .claude/skills/benchmark-compare/scripts/launch_service.sh | Unified launcher for FastDeploy/SGLang services (TP/DP/PD, etc.) |
| .claude/skills/benchmark-compare/scripts/health_check.sh | Health check that polls /v1/models |
| .claude/skills/benchmark-compare/scripts/run_benchmark.sh | Wrapper script around benchmark_serving.py |
| .claude/skills/benchmark-compare/scripts/extract_metrics.py | Extracts metrics from benchmark output into JSON (including comparison computation) |
| .claude/skills/benchmark-compare/scripts/generate_report.py | Generates the interactive HTML report from multi-scenario data / log scans |
| .claude/skills/benchmark-compare/references/model_profiles.md | Recommended deployment parameters per model and profile-matching notes |
| .claude/skills/benchmark-compare/references/html_template.md | HTML report template / spec reference |
Comments suppressed due to low confidence (2)

.claude/skills/benchmark-compare/scripts/run_benchmark.sh:132

  • With set -euo pipefail in effect, eval "$CMD" causes the script to exit immediately (because of -e) whenever the benchmark process returns non-zero, so the EXIT_CODE/tail logic below is essentially unreachable. Temporarily disable -e around the benchmark command, or run it in a conditional so the exit code is captured, ensuring the last log lines are printed and the correct exit code is returned on failure.
```bash
echo "[INFO] Running: $CMD"
echo "[INFO] Redirecting output to: $OUTPUT"

eval "$CMD" > "$OUTPUT" 2>&1
EXIT_CODE=$?

if [[ $EXIT_CODE -eq 0 ]]; then
    echo "[INFO] Benchmark [$LABEL] finished"
else
    echo "[ERROR] Benchmark [$LABEL] failed (exit code: $EXIT_CODE)"
    echo "[ERROR] Last 20 lines of output:"
    tail -20 "$OUTPUT" 2>/dev/null || true
fi
```

.claude/skills/benchmark-compare/SKILL.md:349

  • extract_metrics.py writes metrics.json with a {model/config/raw_metrics/comparison} structure, but step 11 feeds that file directly to generate_report.py --data-json; generate_report expects a scenario map of the form {quant}_bs{concurrency}: {fd:{...}, sg:{...}}, so it will currently crash while parsing keys or produce a wrong report. Unify the data protocol between the two: either have extract_metrics emit the scenario format the report needs (wrapping even a single scenario in a key), or make generate_report accept the extract_metrics output structure.
```bash
python3 scripts/extract_metrics.py \
  --fd-result "$OUTPUT_DIR/$RESULT_FD" \
  --sg-result "$OUTPUT_DIR/$RESULT_SG" \
  --model-path "$MODEL_PATH" \
  --fd-config '{"gpu":"H800","tp":'$TP_SIZE',"concurrency":'$CONCURRENCY',"quantization":"'$QUANTIZATION'"}' \
  --sg-config '{"gpu":"H800","tp":'$TP_SIZE',"concurrency":'$CONCURRENCY',"quantization":"'$QUANTIZATION'"}' \
  --output "$OUTPUT_DIR/metrics.json"
```

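One way to resolve the mismatch the reviewer describes is a thin adapter that wraps a single extract_metrics result into the scenario map generate_report expects. A sketch assuming the two structures are exactly as quoted in the comment above (the fd/sg layout inside raw_metrics is an assumption for illustration):

```python
def to_scenario_map(metrics_json):
    """Wrap one extract_metrics.py result ({model, config, raw_metrics, ...})
    into the {quant}_bs{concurrency}: {fd: {...}, sg: {...}} map that
    generate_report.py expects."""
    cfg = metrics_json["config"]
    key = f"{cfg['quantization']}_bs{cfg['concurrency']}"
    raw = metrics_json["raw_metrics"]
    return {key: {"fd": raw["fd"], "sg": raw["sg"]}}

# Hypothetical single-scenario input shaped like the structure quoted above:
single = {
    "model": "GLM-4.7-Flash",
    "config": {"quantization": "bf16", "concurrency": 64, "tp": 2},
    "raw_metrics": {"fd": {"mean_ttft": 120.5}, "sg": {"mean_ttft": 131.0}},
}
scenarios = to_scenario_map(single)
```

With such an adapter, extract_metrics could keep its current output and generate_report would still receive a well-formed scenario map even for a single run.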
Comment on lines +103 to +123

```bash
CMD+=" --hyperparameter-path $HYPERPARAMS"
CMD+=" --num-prompts $NUM_PROMPTS"
CMD+=" --max-concurrency $CONCURRENCY"
CMD+=" --percentile-metrics ttft,tpot,itl,e2el,s_ttft,s_itl,s_e2el,s_decode,input_len,s_input_len,output_len"
CMD+=" --metric-percentiles 80,95,99,99.9,99.95,99.99"
CMD+=" --save-result"

# Multi-instance mode
if [[ -n "$IP_LIST" ]]; then
    CMD+=" --ip-list $IP_LIST"
fi

# Extra arguments
if [[ -n "$EXTRA_ARGS" ]]; then
    CMD+=" $EXTRA_ARGS"
fi

echo "[INFO] Running: $CMD"
echo "[INFO] Redirecting output to: $OUTPUT"

eval "$CMD" > "$OUTPUT" 2>&1
```
Comment on lines +314 to +318

```bash
bash scripts/run_benchmark.sh \
    --label fd \
    --model "$MODEL_PATH" \
    --port "$FD_PORT" \
    --dataset "$DATASET_PATH" \
```
Comment on lines +130 to +136
Before any operation that reaches the external network (git clone, pip install), the following must be set:
```bash
export no_proxy=localhost,bj.bcebos.com,su.bcebos.com,pypi.tuna.tsinghua.edu.cn,paddle-ci.gz.bcebos.com,0.0.0.0,baidu-int.com,aliyun.com,127.0.0.1,.baidu.com,.bcebos.com
export http_proxy=http://agent.baidu.com:8891
export https_proxy=http://agent.baidu.com:8891
git config --global http.proxy http://agent.baidu.com:8891
git config --global https.proxy http://agent.baidu.com:8891
```
Comment on lines +159 to +161

```bash
# Quantization
if [[ "$QUANTIZATION" != "none" ]]; then
    CMD+=" --quantization $QUANTIZATION"
```
Comment on lines +533 to +540

```javascript
const quantLabel = currentQuant === 'bf16' ? 'BF16' : currentQuant.toUpperCase();
document.getElementById('badge-quant').textContent = quantLabel;
document.getElementById('badge-bs').textContent = '\\u5e76\\u53d1 ' + currentBS;
document.getElementById('cfg-fd-bs').textContent = currentBS;
document.getElementById('cfg-sg-bs').textContent = currentBS;
document.getElementById('cfg-fd-quant').textContent = currentQuant === 'bf16' ? 'BF16' : 'Block-Wise FP8 (block_wise_fp8)';
document.getElementById('cfg-sg-quant').textContent = currentQuant === 'bf16' ? 'BF16' : 'FP8 (per-tensor)';
const d = getData();
```
Comment on lines +160 to +170
4 Chart.js charts:

| Chart | Type | X axis | Notes |
|------|------|------|------|
| Throughput comparison | bar | Total Token, Output Token, Request (×scale) | scale chosen from the magnitude of request_throughput |
| Latency comparison | bar | TTFT, TPOT (×10), ITL (×10), E2EL (/10) | normalized to a common scale for display |
| TTFT percentiles | line | Mean, Median, P80, P95, P99 | fill area |
| ITL percentiles | line | Mean, Median, P80, P95, P99 | fill area |

When switching the theme or the data set, `destroy()` the old charts before rebuilding them.

Comment on lines +147 to +160

```python
# Compute the percentage difference (FD relative to SG)
if sg_val != 0:
    diff_pct = round((fd_val - sg_val) / sg_val * 100, 2)
else:
    diff_pct = 0
entry["diff_pct"] = diff_pct

# Decide the winner
if key in higher_is_better:
    entry["winner"] = "fd" if fd_val > sg_val else "sg"
elif key in lower_is_better:
    entry["winner"] = "fd" if fd_val < sg_val else "sg"
else:
    entry["winner"] = "tie"
```
Comment on lines +13 to +20
```
Following benchmark_compare_skill, run a FastDeploy vs SGLang performance comparison:
Model: /root/paddlejob/share-storage/gpfs/system-public/changwenbin/models/GLM/GLM-4.7-Flash
Dataset: /root/paddlejob/share-storage/gpfs/system-public/yangrongjin/Downloads/Dataset/20260302_browsecomp_plus_processed_num_830_fd.jsonl
Concurrency: 64,512
Quantization: none (BF16), FP8
Use GPU5 and GPU6
```

@paddle-bot

paddle-bot Bot commented May 13, 2026

Thanks for your contribution!

@paddle-bot paddle-bot Bot added the contributor External developers label May 13, 2026
Copilot AI review requested due to automatic review settings May 13, 2026 16:49
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (2)

.claude/skills/benchmark-compare/scripts/generate_report.py:539

  • Anything that is not bf16 is rendered as Block-Wise FP8 (block_wise_fp8) / FP8 (per-tensor), but the parameter table actually allows quantization modes such as wint4 / wint8 (see the SKILL.md parameter table and model_profiles.md). When quant is wint4/wint8, the FD/SG config cards are wrongly shown with FP8 text that does not match the quantization actually run. Build a complete branch mapping on currentQuant, or fall back to currentQuant.toUpperCase() and special-case the known quantization modes.
```javascript
    document.getElementById('cfg-fd-quant').textContent = currentQuant === 'bf16' ? 'BF16' : 'Block-Wise FP8 (block_wise_fp8)';
    document.getElementById('cfg-sg-quant').textContent = currentQuant === 'bf16' ? 'BF16' : 'FP8 (per-tensor)';
```

.claude/skills/benchmark-compare/scripts/launch_service.sh:243

  • Only block_wise_fp8 is mapped to SGLang's fp8, but the wint4 / wint8 names allowed by the parameter table are FD-specific; SGLang does not accept them, and passing them straight to --quantization will make the service fail to start. Either warn/exit in the script for quantization modes SGLang does not support, or map them to values SGLang actually supports, such as awq / gptq_marlin.
```bash
    if [[ "$QUANTIZATION" != "none" ]]; then
        local SG_QUANT="$QUANTIZATION"
        # Map FD quantization names to SG names
        if [[ "$SG_QUANT" == "block_wise_fp8" ]]; then
            SG_QUANT="fp8"
        fi
        CMD+=" --quantization $SG_QUANT"
    fi
```
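The suggested fix boils down to an explicit mapping table with an error path for names SGLang cannot accept. A Python sketch of that logic (whether wint4/wint8 have real SGLang equivalents is exactly the reviewer's open question, so they are marked unsupported here rather than guessed):

```python
# FD quantization name -> SGLang --quantization value (None = unsupported)
FD_TO_SG_QUANT = {
    "block_wise_fp8": "fp8",
    "wint4": None,  # FD-specific; no SGLang equivalent assumed here
    "wint8": None,
}

def sg_quant_flag(fd_quant):
    """Return the SGLang quantization value, or raise for unsupported names."""
    if fd_quant == "none":
        return None  # pass no --quantization flag at all
    sg = FD_TO_SG_QUANT.get(fd_quant, fd_quant)
    if sg is None:
        raise ValueError(f"quantization '{fd_quant}' is not supported by SGLang")
    return sg
```

Failing fast here surfaces the configuration error before the service is launched, instead of letting SGLang die on an unknown flag value.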

Comment on lines +123 to +132

```bash
eval "$CMD" > "$OUTPUT" 2>&1
EXIT_CODE=$?

if [[ $EXIT_CODE -eq 0 ]]; then
    echo "[INFO] Benchmark [$LABEL] finished"
else
    echo "[ERROR] Benchmark [$LABEL] failed (exit code: $EXIT_CODE)"
    echo "[ERROR] Last 20 lines of output:"
    tail -20 "$OUTPUT" 2>/dev/null || true
fi
```
Comment on lines +43 to +83

```python
patterns = {
    "successful_requests": r"Successful requests:\s+([\d.]+)",
    "benchmark_duration": r"Benchmark duration \(s\):\s+([\d.]+)",
    "total_input_tokens": r"Total input tokens:\s+([\d.]+)",
    "total_generated_tokens": r"Total generated tokens:\s+([\d.]+)",
    "request_throughput": r"Request throughput \(req/s\):\s+([\d.]+)",
    "output_token_throughput": r"Output token throughput \(tok/s\):\s+([\d.]+)",
    "total_token_throughput": r"Total Token throughput \(tok/s\):\s+([\d.]+)",
    "mean_ttft": r"Mean TTFT \(ms\):\s+([\d.]+)",
    "median_ttft": r"Median TTFT \(ms\):\s+([\d.]+)",
    "p80_ttft": r"P80 TTFT \(ms\):\s+([\d.]+)",
    "p95_ttft": r"P95 TTFT \(ms\):\s+([\d.]+)",
    "p99_ttft": r"P99 TTFT \(ms\):\s+([\d.]+)",
    "mean_tpot": r"Mean TPOT \(ms\):\s+([\d.]+)",
    "median_tpot": r"Median TPOT \(ms\):\s+([\d.]+)",
    "p80_tpot": r"P80 TPOT \(ms\):\s+([\d.]+)",
    "p95_tpot": r"P95 TPOT \(ms\):\s+([\d.]+)",
    "p99_tpot": r"P99 TPOT \(ms\):\s+([\d.]+)",
    "mean_itl": r"Mean ITL \(ms\):\s+([\d.]+)",
    "median_itl": r"Median ITL \(ms\):\s+([\d.]+)",
    "p80_itl": r"P80 ITL \(ms\):\s+([\d.]+)",
    "p95_itl": r"P95 ITL \(ms\):\s+([\d.]+)",
    "p99_itl": r"P99 ITL \(ms\):\s+([\d.]+)",
    "mean_e2el": r"Mean E2EL \(ms\):\s+([\d.]+)",
    "median_e2el": r"Median E2EL \(ms\):\s+([\d.]+)",
    "p80_e2el": r"P80 E2EL \(ms\):\s+([\d.]+)",
    "p95_e2el": r"P95 E2EL \(ms\):\s+([\d.]+)",
    "p99_e2el": r"P99 E2EL \(ms\):\s+([\d.]+)",
    "mean_decode": r"Mean Decode \(tok/s\):\s+([\d.]+)",
    "median_decode": r"Median Decode \(tok/s\):\s+([\d.]+)",
    "p80_decode": r"P80 Decode \(tok/s\):\s+([\d.]+)",
    "p95_decode": r"P95 Decode \(tok/s\):\s+([\d.]+)",
    "p99_decode": r"P99 Decode \(tok/s\):\s+([\d.]+)",
}

for key, pattern in patterns.items():
    match = re.search(pattern, content)
    if match:
        metrics[key] = float(match.group(1))

return metrics
```
```bash
# ============================================================
if lsof -i :"$PORT" &>/dev/null; then
    echo "[INFO] Port $PORT is in use; cleaning up..."
    kill $(lsof -t -i :"$PORT") 2>/dev/null || true
```

@PaddlePaddle-bot

PaddlePaddle-bot commented May 13, 2026

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-15 20:43:11

The CI report is generated from the code below (updated every 30 minutes):


1 Task overview

⚠️ CI data temporarily unavailable: the GitHub Actions API request timed out, so the CI run status for this PR could not be fetched.

Possible causes:

  • GitHub API temporarily unavailable, or a network timeout
  • CI jobs not yet triggered (PR just submitted)
  • GitHub Actions run records not yet ready

Suggested actions

  1. Check the CI details page later to confirm whether CI has been triggered
  2. If CI has not been triggered, try /rebuild or push an empty commit to re-trigger it
  3. This agent will re-fetch and update this comment on its next scheduled run

2 Task status summary

| Status | Description |
|------|------|
| 🔴 Data unavailable | GitHub API timed out; task list could not be fetched |

3 Failure details

None (CI data unavailable; nothing to analyze)


Copilot AI review requested due to automatic review settings May 15, 2026 12:09
@Linboyan-trc Linboyan-trc force-pushed the benchmark_compare_skill branch from dee775a to 3f9531f Compare May 15, 2026 12:09
Contributor

Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.


@PaddlePaddle-bot PaddlePaddle-bot left a comment


🤖 Paddle-CI-Agent | pr_review | 2026-05-15 20:12:46

📋 Review summary

PR overview: adds the .claude/skills/benchmark-compare/ Agent Skill, which automates the full FastDeploy vs SGLang performance-comparison workflow, including one-click generation of a visual HTML report.
Change scope: .claude/skills/benchmark-compare/ (SKILL.md, README.md, scripts/, references/)
Impact tags: [Benchmark] [Others]

Issues

| Level | File | Summary |
|------|------|------|
| 🟡 Suggestion | scripts/extract_metrics.py:32 | The patterns dict in parse_benchmark_result fully duplicates generate_report.py:parse_benchmark_log (35+ regex patterns), violating DRY |
| ❓ Question | scripts/extract_metrics.py:141 | diff_pct uses (fd-sg)/sg*100 throughout, semantically opposite to the latency formula (sg-fd)/fd*100 in html_template.md |

📝 PR convention check

Two problems: ① the title [Others]Benchmark compare skill lacks a space between the tag and the text, and the diff content better matches the official [Benchmark] tag; ② the PR body is missing the two required sections `## Accuracy Tests` and `## Checklist`.

Suggested title (copy-paste ready):

  • [Benchmark] Add FastDeploy vs SGLang benchmark compare skill

Suggested PR description (copy-paste ready; must mirror the full structure of the checklist §D2 template):

## Motivation
In day-to-day performance evaluation we frequently need to compare the performance of the FastDeploy and SGLang inference frameworks. Doing this by hand involves many steps (environment installation, service startup, health checks, benchmark execution, metric extraction, and report generation), which is tedious and error-prone. This PR adds an Agent Skill (`.claude/skills/benchmark-compare/`) that orchestrates the whole pipeline, so a performance comparison can be run and a visual HTML report generated from a single natural-language request or the `/benchmark` command.

## Modifications
Adds the `.claude/skills/benchmark-compare/` directory with the following files:

- **SKILL.md** — main skill definition: the full 12-step workflow orchestration, parameter table, decision tree, and two working modes (fully automated test / report-only)
- **README.md** — usage documentation
- **scripts/launch_service.sh** — generic service launcher supporting both frameworks (FD/SG) and the single/TP/PD deployment modes
- **scripts/health_check.sh** — service health check that polls the `/v1/models` endpoint
- **scripts/run_benchmark.sh** — wrapper script around benchmark execution
- **scripts/extract_metrics.py** — extracts the core metrics (throughput, latency, TTFT, etc.) from benchmark result files and writes them as JSON
- **scripts/generate_report.py** — generates the multi-mode visual HTML comparison report
- **references/html_template.md** — HTML report template (CSS/JS and placeholders)
- **references/model_profiles.md** — recommended deployment parameters per model

## Usage or Command
```bash
# Option 1: slash command
/benchmark

# Option 2: natural language
Run a benchmark for me with the model at /path/to/GLM-4.7-Flash, TP=2, concurrency 64, with fp8 quantization enabled

# Option 3: report-only from existing data
Generate an HTML comparison report from these logs
```

## Accuracy Tests
N/A (this PR adds a tooling Skill and does not change model accuracy)

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [x] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

Overall assessment

The workflow orchestration is clear, the documentation thorough, and the HTML report carefully designed. There remain a maintainability issue (the duplicated patterns dict) and an open question about diff_pct semantics; recommended for merge once these are addressed.

```python
with open(filepath, "r") as f:
    content = f.read()

patterns = {
```


🟡 Suggestion: the 35+ regex patterns in parse_benchmark_result fully duplicate generate_report.py:parse_benchmark_log (line 43), violating DRY.

If the benchmark output format ever changes (e.g. a metric is added or a label renamed), both copies must be updated in sync, which is easy to miss and invites inconsistency.

Extract the shared patterns into a standalone module (e.g. scripts/metrics_patterns.py) that both scripts import:

```python
# scripts/metrics_patterns.py
BENCHMARK_PATTERNS = {
    "successful_requests": r"Successful requests:\s+([\d.]+)",
    # ... remaining patterns
}
```

Alternatively, unify the parsing function itself and import it from both scripts.
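For illustration, consuming such a shared module could look like the sketch below (the module content is inlined here as a stand-in for the import; the sample log text is hypothetical, formatted to match the regexes quoted in this PR):

```python
import re

# Stand-in for `from metrics_patterns import BENCHMARK_PATTERNS`
BENCHMARK_PATTERNS = {
    "successful_requests": r"Successful requests:\s+([\d.]+)",
    "mean_ttft": r"Mean TTFT \(ms\):\s+([\d.]+)",
}

def parse(content):
    """Extract every metric whose pattern matches the benchmark output."""
    metrics = {}
    for key, pattern in BENCHMARK_PATTERNS.items():
        match = re.search(pattern, content)
        if match:
            metrics[key] = float(match.group(1))
    return metrics

# Hypothetical fragment of benchmark output:
sample = "Successful requests:      830\nMean TTFT (ms):           112.34\n"
parsed = parse(sample)
```

With one shared dict, adding a metric means touching a single file and both scripts pick it up.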

```python
for key in sorted(all_keys):
    fd_val = fd_metrics.get(key)
    sg_val = sg_metrics.get(key)
```


❓ Question: diff_pct uses (fd - sg) / sg * 100 throughout, so for latency metrics (lower is better) a positive value means FD is slower (FD behind), which is the opposite of the convention in html_template.md:

Latency metrics (lower is better): diff = (sg - fd) / fd * 100; positive → FD ahead → green

If downstream reports or scripts consume the diff_pct field in comparison.json directly, they may misread which side wins. Align with html_template.md, or document the semantics explicitly on the field (e.g. rename it to fd_over_sg_pct and state what positive/negative means).
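To make the sign mismatch concrete, here is a sketch of the two conventions side by side (the formulas are the ones quoted above; the sample TTFT values are made up for illustration):

```python
def diff_fd_over_sg(fd, sg):
    # extract_metrics.py convention: (fd - sg) / sg * 100
    return round((fd - sg) / sg * 100, 2)

def diff_latency_template(fd, sg):
    # html_template.md convention for lower-is-better metrics: (sg - fd) / fd * 100
    return round((sg - fd) / fd * 100, 2)

# Hypothetical TTFTs: FD is faster (lower latency) than SG.
fd_ttft, sg_ttft = 100.0, 125.0
a = diff_fd_over_sg(fd_ttft, sg_ttft)        # negative even though FD wins
b = diff_latency_template(fd_ttft, sg_ttft)  # positive means FD ahead
```

Here `a` comes out -20.0 while `b` comes out 25.0: same measurement, opposite signs, which is exactly why a consumer reading diff_pct with the template's convention would call the winner wrong.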

```python
with open(filepath, "r") as f:
    content = f.read()

patterns = {
```


🟡 Suggestion: the patterns dict in parse_benchmark_log (lines 43-79) is identical to extract_metrics.py:parse_benchmark_result, roughly 40 lines of duplicated code.

See the comment at extract_metrics.py:32: extract the shared definition into a common module and reuse it from both scripts.


Labels

contributor External developers
