Add inner benchmark metrics component #7831

Open
Deleter-D wants to merge 1 commit into PaddlePaddle:release/2.6 from Deleter-D:2.6_inner_benchmark

Conversation

@Deleter-D
Collaborator

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Usage or Command

Accuracy Tests

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@PaddlePaddle-bot PaddlePaddle-bot left a comment

🤖 Paddle-CI-Agent | pr_review | 2026-05-15 15:48:31

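📋 Review Summary

PR overview: Adds a built-in benchmark metrics monitoring component that collects key latency metrics such as TTFT, TPOT, and E2EL in real time while the inference service is running and writes them as JSONL logs, along with formatting and plotting scripts.
Change scope: fastdeploy/metrics/ (new), fastdeploy/config.py, fastdeploy/engine/common_engine.py, fastdeploy/output/token_processor.py, benchmarks/ (new)
Impact tags: [FDConfig] [Engine] [DataProcessor] [Benchmark]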
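
For readers unfamiliar with the three latency metrics named in the overview, here is a minimal sketch of how they are commonly derived from per-token timestamps. It is not taken from this PR: the `RequestTiming` record, its field names, and the output key names are hypothetical, and the TPOT formula follows the usual serving-benchmark convention of averaging over the decode phase only.

```python
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """Hypothetical per-request record; field names are illustrative only."""
    arrival: float            # time the request was received, in seconds
    token_times: list[float]  # wall-clock time of each generated token, in seconds


def latency_metrics_ms(req: RequestTiming) -> dict[str, float]:
    """Return TTFT, TPOT and E2EL in milliseconds for one finished request."""
    ttft = req.token_times[0] - req.arrival    # time to first token
    e2el = req.token_times[-1] - req.arrival   # end-to-end latency
    n_out = len(req.token_times)
    # TPOT averages the decode phase only, so the first token is excluded.
    tpot = (e2el - ttft) / (n_out - 1) if n_out > 1 else 0.0
    return {"ttft_ms": ttft * 1e3, "tpot_ms": tpot * 1e3, "e2el_ms": e2el * 1e3}


timing = RequestTiming(arrival=10.0, token_times=[10.25, 10.5, 10.75, 11.0])
metrics = latency_metrics_ms(timing)
print(metrics)
```

The PR's own logger may define or name these fields differently; the sketch only fixes the vocabulary used in the rest of this review.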

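📝 PR Convention Check

The PR title has no tag, and the target branch is release/2.6 (not develop), so the title must follow the Cherry-Pick format. Every section of the PR body is empty; only the template placeholder text remains.

Suggested title (ready to copy):

  • `[Cherry-Pick][Benchmark] Add inner benchmark metrics component(#XXXX)`

⚠️ Replace #XXXX with the number of the original PR for this change on the develop branch. If the change has not been submitted to develop yet, submit it there first and then cherry-pick it to release/2.6.

Suggested PR description (ready to copy):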

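## Motivation
Adds a built-in benchmark metrics monitoring component that collects and aggregates key performance metrics such as TTFT, TPOT, and E2EL in real time while the inference service is running. It writes rolling-window logs in a JSONL format aligned with benchmark_serving.py, so service performance can be monitored continuously without an external benchmarking tool.

## Modifications
- `fastdeploy/metrics/benchmark_metrics_logger.py`: adds the `BenchmarkMetricsLogger` class, which uses a background daemon thread to compute sliding-window statistics and continuously append to a JSONL log
- `fastdeploy/config.py`: adds the `BenchmarkMetricsConfig` class with three options, `window_size`, `percentiles`, and `metrics`; `FDConfig` integrates this config
- `fastdeploy/engine/args_utils.py`: adds the `--benchmark-metrics-config` CLI argument
- `fastdeploy/engine/common_engine.py`: instantiates `BenchmarkMetricsLogger` according to the config and injects it into `TokenProcessor`
- `fastdeploy/output/token_processor.py`: on request completion, collects raw timing data such as ITL, TTFT, and E2EL and reports it to the logger
- `benchmarks/format_benchmark_metrics.py`: new script that reads the latest JSONL record and prints a formatted statistics summary
- `benchmarks/plot_benchmark_metrics.py`: new script that reads all JSONL records and plots trend line charts

## Usage or Command
Enable benchmark metrics collection when starting the service:
```bash
python -m fastdeploy.serve --model <model_path> \
    --benchmark-metrics-config '{"window_size": 100, "percentiles": "50,90,95,99", "metrics": "all"}'
```
View the metrics summary (reads $FD_LOG_DIR/benchmark_metrics.jsonl by default):
```bash
python benchmarks/format_benchmark_metrics.py
```
Plot trend charts:
```bash
python benchmarks/plot_benchmark_metrics.py --output-dir ./benchmark_plots
```

## Accuracy Tests
N/A (this PR adds a performance monitoring component and does not change model forward logic, so there are no accuracy tests)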

## Checklist

- [ ] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- [x] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

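Issues

| Level | File | Summary |
|---|---|---|
| 🔴 Bug | fastdeploy/metrics/benchmark_metrics_logger.py:101 | Incorrect sliding-window implementation: _window.clear() produces a tumbling window rather than a sliding window, so test_logger_sliding_window will fail |
| 🟡 Suggestion | fastdeploy/engine/common_engine.py:215 | Missing shutdown() lifecycle management; the log file may not be flushed/closed properly when the engine stops |
| 🟡 Suggestion | benchmarks/format_benchmark_metrics.py:46 | The key decode_speed is wrong; the actual key is s_decode, so the decode-speed column is printed without its unit suffix |
| ❓ Question | fastdeploy/metrics/benchmark_metrics_logger.py:157 | datetime.now() uses the local time zone, which can differ across nodes in a distributed multi-node environment (checklist §C, surface-level signal) |

Overall assessment

The feature design is structurally complete and test coverage is good, but the core sliding-window logic contains a P0 bug that makes a unit test fail; it must be fixed before merging.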

self._file.write(line + "\n")
if self.config.window_size > 0 and len(self._window) >= self.config.window_size:
    self._window.clear()
self._file.flush()

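🔴 Bug: The sliding window is implemented incorrectly; use deque(maxlen=window_size) instead of .clear().

The current implementation is a tumbling window (it clears everything each time N records accumulate), whereas the window_size documentation says "Number of recent requests to aggregate" (keep the most recent N), which implies a sliding window.

Measured: with 5 requests and window_size=3, the 5th record reports completed=2, but test_logger_sliding_window asserts completed==3, so the test fails.

Suggested fix:

# In __init__, replace the initialization
self._window: deque = deque(maxlen=config.window_size) if config.window_size > 0 else deque()

# In _process_pending, delete the following two lines
# if self.config.window_size > 0 and len(self._window) >= self.config.window_size:
#     self._window.clear()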
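
The difference between the two window semantics can be reproduced in isolation. The sketch below is not the PR's code: the two helper functions are hypothetical, and it assumes one record is emitted per completed request with `completed` equal to the current window length, as the measurement above suggests.

```python
from collections import deque


def completed_counts_tumbling(n_requests: int, window_size: int) -> list[int]:
    """Mimic the reviewed behaviour: emit a record, then clear once the window is full."""
    window: list[int] = []
    counts = []
    for i in range(n_requests):
        window.append(i)
        counts.append(len(window))  # value reported as "completed"
        if window_size > 0 and len(window) >= window_size:
            window.clear()
    return counts


def completed_counts_sliding(n_requests: int, window_size: int) -> list[int]:
    """Proposed behaviour: a bounded deque silently drops the oldest entry."""
    window = deque(maxlen=window_size) if window_size > 0 else deque()
    counts = []
    for i in range(n_requests):
        window.append(i)
        counts.append(len(window))
    return counts


print(completed_counts_tumbling(5, 3))  # [1, 2, 3, 1, 2]
print(completed_counts_sliding(5, 3))   # [1, 2, 3, 3, 3]
```

With `deque(maxlen=...)` no explicit eviction code is needed, which is why the fix consists of deleting the two `clear()` lines rather than rewriting them.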

log_dir=envs.FD_LOG_DIR,
dp_rank=self.cfg.parallel_config.local_data_parallel_id,
)
self.token_processor.set_benchmark_logger(self.benchmark_metrics_logger)

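🟡 Suggestion: shutdown() lifecycle management is missing.

BenchmarkMetricsLogger opens a file and starts a background thread in __init__, but the destructor/stop path of common_engine never calls shutdown(). When the engine exits normally, _pending may still hold unprocessed records, and the file is not explicitly flushed or closed.

Consider adding the following to the stop/destructor method of EngineService:

if hasattr(self, 'benchmark_metrics_logger') and self.benchmark_metrics_logger:
    self.benchmark_metrics_logger.shutdown()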
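
To illustrate why an explicit shutdown() matters for a logger backed by a daemon thread, here is a minimal, self-contained sketch. `MiniMetricsLogger` is a hypothetical stand-in, not the PR's `BenchmarkMetricsLogger`; it assumes a queue-fed writer thread and uses a sentinel object so that everything queued before shutdown is written before the file is closed.

```python
import os
import queue
import tempfile
import threading


class MiniMetricsLogger:
    """Hypothetical stand-in for a logger with a background writer thread."""

    _STOP = object()  # sentinel that tells the worker to exit

    def __init__(self, path: str) -> None:
        self._file = open(path, "a", encoding="utf-8")
        self._pending: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def record(self, line: str) -> None:
        self._pending.put(line)

    def _run(self) -> None:
        while True:
            item = self._pending.get()
            if item is self._STOP:
                break
            self._file.write(item + "\n")

    def shutdown(self) -> None:
        """Drain queued records, then flush and close. Safe to call twice."""
        if self._file.closed:
            return
        self._pending.put(self._STOP)  # FIFO: processed after earlier records
        self._thread.join()
        self._file.flush()
        self._file.close()


path = os.path.join(tempfile.mkdtemp(), "benchmark_metrics.jsonl")
logger = MiniMetricsLogger(path)
for i in range(3):
    logger.record('{"completed": %d}' % (i + 1))
logger.shutdown()
```

Because the worker is a daemon thread, the interpreter does not wait for it at exit; without the join in shutdown(), records still sitting in the queue could be lost.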

if not stats:
    return
suffix = "(ms)" if is_time else ""
if key == "decode_speed":

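🟡 Suggestion: key == "decode_speed" is wrong; the key actually passed in is "s_decode", so this condition is always False.

As a result, when print_stat_block(data, "s_decode", "Decode", "解码速度(tok/s)", is_time=False) is called, suffix stays an empty string and the Mean/Median/Pxx lines are printed without a unit suffix.

Suggested fix:

if key == "s_decode":
    suffix = "(tok/s)"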
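
A small regression check makes this kind of key mismatch easy to catch. The helper below is a hypothetical extraction of the suffix logic, not a function from the PR, and the `"ttft"` key is used purely for illustration.

```python
def unit_suffix(key: str, is_time: bool) -> str:
    """Pick the unit suffix for a stat block (hypothetical helper)."""
    if key == "s_decode":  # the key that callers actually pass for decode speed
        return "(tok/s)"
    return "(ms)" if is_time else ""


print(repr(unit_suffix("s_decode", is_time=False)))
print(repr(unit_suffix("ttft", is_time=True)))
```

Isolating the branch this way lets a unit test assert on the suffix directly instead of parsing printed output.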


result: dict[str, Any] = {
    "timestamp": datetime.now().isoformat(),
    "window_size": self.config.window_size,

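❓ Question: datetime.now() uses the local time zone.

In a distributed multi-node deployment, nodes may be in different time zones, which makes timestamps in the JSONL logs incomparable. Per checklist §C, consider using UTC:

from datetime import timezone
# replace with
datetime.now(timezone.utc).isoformat()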
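
The practical difference is that a naive timestamp carries no offset, while an aware one serializes its offset and can be compared across machines. A short sketch using only the standard library (the +08:00 offset is just an example of another node's zone):

```python
from datetime import datetime, timedelta, timezone

naive = datetime.now()               # no tzinfo; meaning depends on the host
aware = datetime.now(timezone.utc)   # explicit UTC
stamp = aware.isoformat()            # ISO 8601 string with a "+00:00" offset

# The same instant rendered in another zone still parses back to an equal value.
other_zone = aware.astimezone(timezone(timedelta(hours=8)))
same_instant = datetime.fromisoformat(other_zone.isoformat()) == datetime.fromisoformat(stamp)

print(naive.tzinfo, stamp, same_instant)
```

Offset-aware strings also sort and diff correctly after parsing, which matters when JSONL files from several nodes are merged for plotting.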

@PaddlePaddle-bot

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-15 16:06:08

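This CI report was generated from the following code (refreshed every 30 minutes):


1 Job Overview

❌ 1 Required job has failed and must be resolved before merging.

| Total runs (reruns) | Total jobs | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Pending | Skipped |
|---|---|---|---|---|---|---|
| 2 (0) | 2 | 0 | 2 | 0 | 0 | 0 |

⚠️ Note: while collecting CI results, job details for several workflows could not be fetched because the GitHub API returned 502/504 errors; the table above counts only the 2 jobs that were fetched successfully.

2 Job Status Summary

2.1 Required jobs: 0/1 passed

Required jobs block merging; failures must be handled first.

| Status | Job | Duration | Root cause | Suggested fix | Log | Rerun |
|---|---|---|---|---|---|---|
| ❌ | Approval | 8s | PR approval issue: needs an Approve from a FastDeploy RD member | Ask qingqing01/Jiang-Jia-Jun/heavengate to approve | Job | - |

2.2 Optional jobs: 0/1 passed

Optional jobs do not block merging; failures are informational only.

| Status | Job | Duration | Log | Rerun |
|---|---|---|---|---|
| ❌ | Trigger Jenkins for PR | 1m2s | Job | - |

3 Failure Details (Required only)

Approval: PR approval issue (confidence: high)

Approval

  • Status: ❌ Failed
  • Error type: code conventions
  • Confidence: high
  • Root cause summary: the PR approval check failed; approval from a FastDeploy RD member is required
  • Analyzer: generic analysis (fallback)

Root cause details:
The scripts/check_approval.sh script detected an approval error (exit code 6). The PR targets release/2.6, which is a Cherry-Pick scenario: the PR title must contain the [Cherry-Pick] tag and the original develop PR number, and a FastDeploy RD member (qingqing01, Jiang-Jia-Jun, or heavengate) must approve the PR. One approval error is currently detected, so the approval requirements are not yet met.

Key log excerpt: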

==> PR title: Add inner benchmark metrics component
0. Cherry-Pick PR must come from develop and the title must contain [Cherry-Pick]
   and the original develop PR number (e.g., #5010).
   Approval required from FastDeploy RD: qingqing01(dangqingqing),
   Jiang-Jia-Jun(jiangjiajun), heavengate(dengkaipeng).
There are 1 approved errors.
##[error]Process completed with exit code 6.

Suggested fixes:

  1. Ask a FastDeploy RD member (qingqing01 / Jiang-Jia-Jun / heavengate) to submit an Approve review on this PR on GitHub
  2. If this PR is indeed a Cherry-Pick, make sure the PR title contains [Cherry-Pick] and the original develop PR number

Fix summary: ask a FastDeploy RD member to approve the PR

Link: View log

@paddle-bot

paddle-bot Bot commented May 15, 2026

Thanks for your contribution!

@codecov-commenter

Codecov Report

❌ Patch coverage is 48.63388% with 94 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release/2.6@d02f3ba). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| fastdeploy/metrics/benchmark_metrics_logger.py | 45.13% | 79 Missing ⚠️ |
| fastdeploy/output/token_processor.py | 18.18% | 7 Missing and 2 partials ⚠️ |
| fastdeploy/engine/args_utils.py | 66.66% | 2 Missing and 1 partial ⚠️ |
| fastdeploy/engine/common_engine.py | 50.00% | 2 Missing ⚠️ |
| fastdeploy/config.py | 93.33% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@              Coverage Diff               @@
##             release/2.6    #7831   +/-   ##
==============================================
  Coverage               ?   72.94%           
==============================================
  Files                  ?      382           
  Lines                  ?    54327           
  Branches               ?     8493           
==============================================
  Hits                   ?    39627           
  Misses                 ?    11919           
  Partials               ?     2781           

| Flag | Coverage Δ |
|---|---|
| GPU | 72.94% <48.63%> (?) |

Flags with carried forward coverage won't be shown. Click here to find out more.
