Add inner benchmark metrics component by Deleter-D · Pull Request #7831 · PaddlePaddle/FastDeploy

Deleter-D · 2026-05-15T07:36:00Z

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Usage or Command

Accuracy Tests

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-05-15 15:48:31

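📋 Review Summary

PR overview: Adds a built-in benchmark metrics monitoring component that collects key latency metrics such as TTFT, TPOT, and E2EL in real time while the inference service is running and writes them as JSONL logs, along with scripts for formatted display and plotting.
Scope of changes: fastdeploy/metrics/ (new), fastdeploy/config.py, fastdeploy/engine/common_engine.py, fastdeploy/output/token_processor.py, benchmarks/ (new)
Affected tags: [FDConfig] [Engine] [DataProcessor] [Benchmark]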

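📝 PR Convention Check

The PR title has no tag, and the target branch is release/2.6 (not develop), so the Cherry-Pick format applies; every section of the PR body is empty, with only the template placeholder text left.

Suggested title (copy-ready):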

`[Cherry-Pick][Benchmark] Add inner benchmark metrics component(#XXXX)`

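⚠️ Replace #XXXX with the number of the original PR for this change on the develop branch; if it has not been submitted to develop yet, submit it there first and then cherry-pick it to release/2.6.

Suggested PR description (copy-ready):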

## Motivation
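Adds a built-in benchmark metrics monitoring component that collects and aggregates key performance metrics such as TTFT, TPOT, and E2EL while the inference service is running, and writes rolling-window logs in a JSONL format aligned with benchmark_serving.py, so service performance can be monitored continuously without an external benchmarking tool.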

## Modifications
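- `fastdeploy/metrics/benchmark_metrics_logger.py`: adds the `BenchmarkMetricsLogger` class, which uses a background daemon thread to compute sliding-window statistics and continuously write JSONL logs
- `fastdeploy/config.py`: adds the `BenchmarkMetricsConfig` configuration class with three options, `window_size`, `percentiles`, and `metrics`; `FDConfig` integrates this configuration
- `fastdeploy/engine/args_utils.py`: adds the `--benchmark-metrics-config` CLI argument
- `fastdeploy/engine/common_engine.py`: instantiates `BenchmarkMetricsLogger` according to the configuration and injects it into `TokenProcessor`
- `fastdeploy/output/token_processor.py`: collects raw timing data such as ITL/TTFT/E2EL when a request completes and reports it to the logger
- `benchmarks/format_benchmark_metrics.py`: new script that reads the latest JSONL record and prints a formatted statistics summary
- `benchmarks/plot_benchmark_metrics.py`: new script that reads all JSONL records and plots trend line charts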

## Usage or Command
Enable benchmark metrics collection when starting the service:
```bash
python -m fastdeploy.serve --model <model_path> \
    --benchmark-metrics-config '{"window_size": 100, "percentiles": "50,90,95,99", "metrics": "all"}'
```
View the metrics summary (reads $FD_LOG_DIR/benchmark_metrics.jsonl by default):
```bash
python benchmarks/format_benchmark_metrics.py
```
Plot trend charts:
```bash
python benchmarks/plot_benchmark_metrics.py --output-dir ./benchmark_plots
```

## Accuracy Tests
N/A (this PR adds a performance monitoring component and does not change the model forward logic, so there are no accuracy tests)

## Checklist

- [ ] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- [x] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
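
As a rough illustration of the windowed statistics the suggested description refers to, the sketch below parses the example `--benchmark-metrics-config` value and summarizes a handful of TTFT samples. It is a minimal sketch, not the PR's code: the helper names (`parse_percentiles`, `percentile`, `summarize`), the output key names, the sample values, and the linear-interpolation percentile convention are all assumptions made for illustration.

```python
import json


def parse_percentiles(raw_config):
    """Parse the comma-separated `percentiles` string from a JSON config."""
    config = json.loads(raw_config)
    return [float(p) for p in config["percentiles"].split(",")]


def percentile(values, pct):
    """Percentile with linear interpolation between the two closest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def summarize(name, values_ms, percentiles):
    """Build one flat stats record; the key names are illustrative only."""
    record = {
        f"mean_{name}_ms": sum(values_ms) / len(values_ms),
        f"median_{name}_ms": percentile(values_ms, 50),
    }
    for pct in percentiles:
        record[f"p{pct:g}_{name}_ms"] = percentile(values_ms, pct)
    return record


# Config string copied from the usage example; the TTFT samples are made up.
raw = '{"window_size": 100, "percentiles": "50,90,95,99", "metrics": "all"}'
stats = summarize("ttft", [10.0, 20.0, 30.0, 40.0, 50.0], parse_percentiles(raw))
print(json.dumps(stats))  # one JSONL-style line
```
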

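Issues

Severity	File	Summary
🔴 Bug	`fastdeploy/metrics/benchmark_metrics_logger.py:101`	Incorrect sliding-window implementation: `_window.clear()` gives a tumbling window rather than a sliding window, so the `test_logger_sliding_window` test will fail
🟡 Suggestion	`fastdeploy/engine/common_engine.py:215`	No `shutdown()` lifecycle management; the log file may not be flushed/closed properly when the engine stops
🟡 Suggestion	`benchmarks/format_benchmark_metrics.py:46`	The key `decode_speed` is wrong; the actual key is `s_decode`, so the decode-speed column is shown without a unit suffix
❓ Question	`fastdeploy/metrics/benchmark_metrics_logger.py:157`	`datetime.now()` uses the local time zone, which may differ across nodes in a distributed multi-node environment (checklist §C surface signal)

Overall Assessment

The feature is well structured and test coverage is fairly good, but the core sliding-window logic has a P0 bug that makes the unit test fail; it must be fixed before merging.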

PaddlePaddle-bot · 2026-05-15T07:52:04Z

+            self._file.write(line + "\n")
+            if self.config.window_size > 0 and len(self._window) >= self.config.window_size:
+                self._window.clear()
+        self._file.flush()


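🔴 Bug: The sliding window is implemented incorrectly; use deque(maxlen=window_size) instead of .clear().

The current implementation is a tumbling window (cleared every N records), whereas the window_size documentation describes it as "Number of recent requests to aggregate" (keep the most recent N), which implies a sliding window.

Verified: with 5 requests and window_size=3, the 5th record reports completed=2, but test_logger_sliding_window asserts completed==3, so the test fails.

Suggested fix:

# In __init__, replace the initialization with:
self._window: deque = deque(maxlen=config.window_size) if config.window_size > 0 else deque()
# In _process_pending, delete the following two lines:
# if self.config.window_size > 0 and len(self._window) >= self.config.window_size:
#     self._window.clear()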
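
The difference between the two window behaviours can be reproduced in isolation. The sketch below is a standalone illustration, not the PR's logger: both function names are hypothetical, and each returns the `completed` count that would be reported after every request.

```python
from collections import deque


def completed_counts_tumbling(num_requests, window_size):
    """Mimic the clear-when-full behaviour flagged in the review."""
    window = []
    counts = []
    for request_id in range(num_requests):
        window.append(request_id)
        counts.append(len(window))  # value reported for this record
        if window_size > 0 and len(window) >= window_size:
            window.clear()  # drops ALL history once the window fills
    return counts


def completed_counts_sliding(num_requests, window_size):
    """Keep only the most recent `window_size` requests via deque(maxlen=...)."""
    window = deque(maxlen=window_size) if window_size > 0 else deque()
    counts = []
    for request_id in range(num_requests):
        window.append(request_id)  # the oldest entry is evicted automatically
        counts.append(len(window))
    return counts


print(completed_counts_tumbling(5, 3))  # [1, 2, 3, 1, 2]
print(completed_counts_sliding(5, 3))   # [1, 2, 3, 3, 3]
```

With 5 requests and a window of 3, the tumbling variant reports 2 for the fifth record, while the `deque(maxlen=3)` variant reports 3, which is the value the review says the test asserts.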

PaddlePaddle-bot · 2026-05-15T07:52:04Z

+                log_dir=envs.FD_LOG_DIR,
+                dp_rank=self.cfg.parallel_config.local_data_parallel_id,
+            )
+            self.token_processor.set_benchmark_logger(self.benchmark_metrics_logger)


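🟡 Suggestion: shutdown() lifecycle management is missing.

BenchmarkMetricsLogger opens a file and starts a background thread in __init__, but the destructor/stop path of common_engine never calls shutdown(). When the engine exits normally, _pending may still hold unprocessed records, and the file is never explicitly flushed/closed.

Suggest adding the following to the stop/destructor method of EngineService:

if hasattr(self, 'benchmark_metrics_logger') and self.benchmark_metrics_logger:
    self.benchmark_metrics_logger.shutdown()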
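
To show why the explicit call matters, here is a minimal sketch of a logger with a background writer thread and an owner that shuts it down. `SketchMetricsLogger` and `SketchEngine` are hypothetical stand-ins, not the classes in this PR; the queue-plus-sentinel design is one assumed way to drain pending records before closing the file.

```python
import json
import os
import queue
import tempfile
import threading


class SketchMetricsLogger:
    """Hypothetical JSONL logger that writes from a background daemon thread."""

    _STOP = object()  # sentinel telling the writer thread to exit

    def __init__(self, path):
        self._file = open(path, "a", encoding="utf-8")
        self._pending = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def record(self, item):
        self._pending.put(item)

    def _run(self):
        while True:
            item = self._pending.get()
            if item is self._STOP:
                break
            self._file.write(json.dumps(item) + "\n")

    def shutdown(self):
        # The sentinel is queued behind pending records, so they are written first.
        self._pending.put(self._STOP)
        self._thread.join()
        self._file.flush()
        self._file.close()


class SketchEngine:
    """Hypothetical owner that releases the logger on stop."""

    def __init__(self, log_path=None):
        self.benchmark_metrics_logger = SketchMetricsLogger(log_path) if log_path else None

    def stop(self):
        logger = getattr(self, "benchmark_metrics_logger", None)
        if logger:
            logger.shutdown()
            self.benchmark_metrics_logger = None  # makes stop() safe to call twice


log_path = os.path.join(tempfile.mkdtemp(), "benchmark_metrics.jsonl")
engine = SketchEngine(log_path)
for i in range(5):
    engine.benchmark_metrics_logger.record({"request": i})
engine.stop()
```

Without the `shutdown()` call, a daemon thread can be killed at interpreter exit while records are still queued, which is the risk the review describes.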

PaddlePaddle-bot · 2026-05-15T07:52:04Z

+    if not stats:
+        return
+    suffix = "(ms)" if is_time else ""
+    if key == "decode_speed":


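🟡 Suggestion: key == "decode_speed" is wrong; the key actually passed in is "s_decode", so this condition is always False.

As a result, when print_stat_block(data, "s_decode", "Decode", "解码速度(tok/s)", is_time=False) is called, suffix stays an empty string and the printed Mean/Median/Pxx rows carry no unit suffix.

Suggested fix:

if key == "s_decode":
    suffix = "(tok/s)"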
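
The effect of the mismatched key can be checked with a small standalone helper. `unit_suffix` is a hypothetical function written for this illustration; it only mirrors the suffix logic quoted in the diff and is not part of the script.

```python
def unit_suffix(key, is_time):
    """Return the unit label for a stat block (illustrative helper only)."""
    if key == "s_decode":  # the key the caller actually passes
        return "(tok/s)"
    return "(ms)" if is_time else ""


print(unit_suffix("s_decode", is_time=False))  # (tok/s)
print(unit_suffix("ttft", is_time=True))       # (ms)
```
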

PaddlePaddle-bot · 2026-05-15T07:52:04Z

+
+        result: dict[str, Any] = {
+            "timestamp": datetime.now().isoformat(),
+            "window_size": self.config.window_size,


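❓ Question: datetime.now() uses the local time zone.

In distributed multi-node deployments, nodes may be in different time zones, which makes the timestamps in the JSONL logs incomparable. Per checklist §C, UTC time is recommended:

from datetime import timezone
# Replace with:
datetime.now(timezone.utc).isoformat()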
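
A short sketch of the difference, using only the standard library: a naive `datetime.now()` carries no offset, whereas `datetime.now(timezone.utc)` serializes with an explicit `+00:00` that survives a round trip. The helper name `utc_timestamp` is an assumption for illustration.

```python
from datetime import datetime, timezone


def utc_timestamp():
    """ISO 8601 timestamp with an explicit UTC offset."""
    return datetime.now(timezone.utc).isoformat()


stamp = utc_timestamp()
parsed = datetime.fromisoformat(stamp)  # round-trips as a timezone-aware value
naive = datetime.now()                  # local wall-clock time, no tzinfo

print(stamp)
print(parsed.tzinfo is not None, naive.tzinfo is None)
```
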

PaddlePaddle-bot · 2026-05-15T08:07:17Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-15 16:06:08

This CI report was generated from the following code (refreshed every 30 minutes):

PR commit: 2b635b8
Merge base: d02f3ba (branch: release/2.6)
View full diff
CI details

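1 Task Overview

❌ 1 Required task has failed and must be resolved before merging.

Total runs (reruns)	Total tasks	✅ Passed	❌ Failed	⏳ Running	⏸️ Pending	Skipped
2(0)	2	0	2	0	0	0

⚠️ Note: While fetching CI results, job details for several workflows could not be retrieved because the GitHub API returned 502/504 errors; the table above counts only the 2 tasks that were fetched successfully.

2 Task Status Summary

2.1 Required tasks: 0/1 passed

Required tasks block merging; failures must be handled first.

Status	Task	Duration	Root cause	Suggested fix	Log	Rerun
❌	`Approval`	8s	PR approval issue: approval from a FastDeploy RD member is required	Ask qingqing01/Jiang-Jia-Jun/heavengate to approve	Job	-

2.2 Optional tasks: 0/1 passed

Optional tasks do not block merging; failures are for reference only.

Status	Task	Duration	Log	Rerun
❌	`Trigger Jenkins for PR`	1m2s	Job	-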

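3 Failure Details (Required only)

Approval: PR approval issue (confidence: high)

Status: ❌ Failed
Error type: Code convention
Confidence: High
Root cause summary: The PR approval check failed; approval from a FastDeploy RD member is required
Analyzer: Generic analysis (fallback)

Root cause details:
The scripts/check_approval.sh script detected an approval error (exit code 6). The PR targets release/2.6, which is a Cherry-Pick scenario: the PR title must contain the [Cherry-Pick] tag and the original develop PR number, and a FastDeploy RD member (qingqing01, Jiang-Jia-Jun, or heavengate) must approve the PR. One approval error is currently detected, so the approval requirement is not yet met.

Key log: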

==> PR title: Add inner benchmark metrics component
0. Cherry-Pick PR must come from develop and the title must contain [Cherry-Pick]
   and the original develop PR number (e.g., #5010).
   Approval required from FastDeploy RD: qingqing01(dangqingqing),
   Jiang-Jia-Jun(jiangjiajun), heavengate(dengkaipeng).
There are 1 approved errors.
##[error]Process completed with exit code 6.

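Suggested fix:

Ask a FastDeploy RD member (qingqing01 / Jiang-Jia-Jun / heavengate) to submit an approving review on this PR on GitHub
If this PR is indeed a Cherry-Pick, make sure the PR title contains [Cherry-Pick] and the original develop PR number

Suggested fix summary: Ask a FastDeploy RD member to approve the PR

Link: View log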

paddle-bot · 2026-05-15T08:28:57Z

Thanks for your contribution!

codecov-commenter · 2026-05-15T09:27:50Z

Codecov Report

❌ Patch coverage is 48.63388% with 94 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release/2.6@d02f3ba). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
fastdeploy/metrics/benchmark_metrics_logger.py	45.13%	79 Missing ⚠️
fastdeploy/output/token_processor.py	18.18%	7 Missing and 2 partials ⚠️
fastdeploy/engine/args_utils.py	66.66%	2 Missing and 1 partial ⚠️
fastdeploy/engine/common_engine.py	50.00%	2 Missing ⚠️
fastdeploy/config.py	93.33%	0 Missing and 1 partial ⚠️
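
The per-file counts in the table can be cross-checked against the headline figure. The sketch below sums missing and partial lines per file and back-computes the patch size from the reported percentage; the total of 183 patch lines is derived here, not stated in the report.

```python
# (missing, partial) line counts per file, copied from the table above.
uncovered = {
    "fastdeploy/metrics/benchmark_metrics_logger.py": (79, 0),
    "fastdeploy/output/token_processor.py": (7, 2),
    "fastdeploy/engine/args_utils.py": (2, 1),
    "fastdeploy/engine/common_engine.py": (2, 0),
    "fastdeploy/config.py": (0, 1),
}

total_uncovered = sum(missing + partial for missing, partial in uncovered.values())

reported_patch_pct = 48.63388
# Derived, not reported: total patch lines implied by the percentage.
patch_lines = round(total_uncovered / (1 - reported_patch_pct / 100))
covered_lines = patch_lines - total_uncovered

print(total_uncovered, patch_lines, covered_lines)
print(round(100 * covered_lines / patch_lines, 5))
```
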

Additional details and impacted files

@@              Coverage Diff               @@
##             release/2.6    #7831   +/-   ##
==============================================
  Coverage               ?   72.94%           
==============================================
  Files                  ?      382           
  Lines                  ?    54327           
  Branches               ?     8493           
==============================================
  Hits                   ?    39627           
  Misses                 ?    11919           
  Partials               ?     2781

Flag	Coverage Δ
GPU	`72.94% <48.63%> (?)`


Add inner benchmark metrics component

2b635b8

Deleter-D had a problem deploying to Metax_ci May 15, 2026 07:36 — with GitHub Actions Failure

PaddlePaddle-bot suggested changes May 15, 2026

Deleter-D wants to merge 1 commit into PaddlePaddle:release/2.6 from Deleter-D:2.6_inner_benchmark
