Add inner benchmark metrics component #7831
PaddlePaddle-bot
left a comment
🤖 Paddle-CI-Agent | pr_review |
2026-05-15 15:48:31
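📋 Review Summary
PR overview: Adds a built-in benchmark metrics monitoring component that collects key latency metrics (TTFT, TPOT, E2EL, and others) in real time while the inference service runs, writes them as JSONL logs, and ships with formatting and plotting scripts.
Scope of changes: fastdeploy/metrics/ (new), fastdeploy/config.py, fastdeploy/engine/common_engine.py, fastdeploy/output/token_processor.py, benchmarks/ (new)
Affected tags: [FDConfig] [Engine] [DataProcessor] [Benchmark]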
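📝 PR Convention Check
The PR title has no tag, and the target branch is release/2.6 (not develop), so the title must follow the Cherry-Pick format. Every section of the PR body is empty; only the template placeholder text remains.
Suggested title (copy as-is):
`[Cherry-Pick][Benchmark] Add inner benchmark metrics component(#XXXX)`
⚠️ Replace #XXXX with the number of the original PR for this change on the develop branch. If the change has not been submitted to develop yet, submit it there first, then cherry-pick it to release/2.6.
Suggested PR description (copy as-is):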
## Motivation
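Adds a built-in benchmark metrics monitoring component that collects and aggregates key performance metrics such as TTFT, TPOT, and E2EL in real time while the inference service is running. It writes rolling-window logs in JSONL format aligned with benchmark_serving.py, so service performance can be monitored continuously without an external benchmarking tool.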
## Modifications
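- `fastdeploy/metrics/benchmark_metrics_logger.py`: adds the `BenchmarkMetricsLogger` class, which uses a background daemon thread to compute sliding-window statistics and continuously write JSONL logs
- `fastdeploy/config.py`: adds the `BenchmarkMetricsConfig` class with three options, `window_size`, `percentiles`, and `metrics`; `FDConfig` integrates this config
- `fastdeploy/engine/args_utils.py`: adds the `--benchmark-metrics-config` CLI argument
- `fastdeploy/engine/common_engine.py`: instantiates `BenchmarkMetricsLogger` according to the config and injects it into `TokenProcessor`
- `fastdeploy/output/token_processor.py`: collects raw timing data (ITL, TTFT, E2EL, and others) when a request completes and reports it to the logger
- `benchmarks/format_benchmark_metrics.py`: new script that reads the latest JSONL record and prints a formatted statistics summary
- `benchmarks/plot_benchmark_metrics.py`: new script that reads all JSONL records and plots trend line charts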
## Usage or Command
Enable benchmark metrics collection when starting the service:
```bash
python -m fastdeploy.serve --model <model_path> \
--benchmark-metrics-config '{"window_size": 100, "percentiles": "50,90,95,99", "metrics": "all"}'
```
View the metrics summary (reads $FD_LOG_DIR/benchmark_metrics.jsonl by default):
```bash
python benchmarks/format_benchmark_metrics.py
```
Plot the trend charts:
```bash
python benchmarks/plot_benchmark_metrics.py --output-dir ./benchmark_plots
```
## Accuracy Tests
N/A (this PR adds a performance monitoring component and does not change the model's forward logic, so no accuracy tests apply)
## Checklist
- [ ] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- [x] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

Issues
| Severity | File | Summary |
|---|---|---|
| 🔴 Bug | fastdeploy/metrics/benchmark_metrics_logger.py:101 | Incorrect sliding-window implementation: `_window.clear()` produces a tumbling window rather than a sliding window, so `test_logger_sliding_window` will fail |
| 🟡 Suggestion | fastdeploy/engine/common_engine.py:215 | Missing `shutdown()` lifecycle management; the log file may not be flushed/closed properly when the engine stops |
| 🟡 Suggestion | benchmarks/format_benchmark_metrics.py:46 | The key `decode_speed` is wrong; the actual key is `s_decode`, so the decode-speed column is shown without a unit suffix |
| ❓ Question | fastdeploy/metrics/benchmark_metrics_logger.py:157 | `datetime.now()` uses the local time zone, which may differ across nodes in a distributed multi-node environment (checklist §C, surface-level signal) |
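Overall assessment
The feature design is structurally complete and test coverage is good, but the core sliding-window logic has a P0 bug that makes the unit test fail. It must be fixed before merging.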

fastdeploy/metrics/benchmark_metrics_logger.py:101

    self._file.write(line + "\n")
    if self.config.window_size > 0 and len(self._window) >= self.config.window_size:
        self._window.clear()
    self._file.flush()

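🔴 Bug: The sliding window is implemented incorrectly; use `deque(maxlen=window_size)` instead of `.clear()`.
The current implementation is a tumbling window (it empties the window every time N records accumulate), whereas the documentation for window_size reads "Number of recent requests to aggregate" (keep the most recent N), which implies a sliding window.
Measured: with 5 requests and window_size=3, the 5th record reports completed=2, but test_logger_sliding_window asserts completed==3, so the test fails.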
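
The counts above can be reproduced with a minimal sketch that isolates the window logic. Both helper functions are hypothetical stand-ins, not the PR's code: `tumbling_counts` mirrors the append-then-clear pattern in the quoted diff, and `sliding_counts` uses `collections.deque(maxlen=...)`. It assumes the logged `completed` value equals the number of records in the window at write time.

```python
from __future__ import annotations

from collections import deque


def tumbling_counts(num_requests: int, window_size: int) -> list[int]:
    """Hypothetical stand-in for the current logic: clear the window once it is full."""
    window: list[int] = []
    counts = []
    for request_id in range(num_requests):
        window.append(request_id)
        counts.append(len(window))  # value that would be logged as `completed`
        if window_size > 0 and len(window) >= window_size:
            window.clear()
    return counts


def sliding_counts(num_requests: int, window_size: int) -> list[int]:
    """Hypothetical stand-in for the fix: deque(maxlen=N) evicts the oldest entry."""
    window: deque[int] = deque(maxlen=window_size) if window_size > 0 else deque()
    counts = []
    for request_id in range(num_requests):
        window.append(request_id)
        counts.append(len(window))
    return counts


print(tumbling_counts(5, 3))  # [1, 2, 3, 1, 2]
print(sliding_counts(5, 3))   # [1, 2, 3, 3, 3]
```

With the tumbling variant the 5th record sees only 2 requests, which matches the failing assertion; with the bounded deque it sees 3.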
Suggested fix:

    # In __init__, replace the initialization
    self._window: deque = deque(maxlen=config.window_size) if config.window_size > 0 else deque()
    # In _process_pending, delete the following two lines
    # if self.config.window_size > 0 and len(self._window) >= self.config.window_size:
    #     self._window.clear()

fastdeploy/engine/common_engine.py:215

        log_dir=envs.FD_LOG_DIR,
        dp_rank=self.cfg.parallel_config.local_data_parallel_id,
    )
    self.token_processor.set_benchmark_logger(self.benchmark_metrics_logger)

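🟡 Suggestion: Missing `shutdown()` lifecycle management.
BenchmarkMetricsLogger opens a file and starts a background thread in __init__, but the destructor/stop path in common_engine never calls shutdown(). When the engine exits normally, _pending may still hold unprocessed records, and the file is not explicitly flushed or closed.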
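
To illustrate why an explicit shutdown matters, here is a minimal sketch of a queue-backed logger. `SketchMetricsLogger` is hypothetical and its internals (a `queue.Queue` plus a stop sentinel) are assumptions; the review does not show how `BenchmarkMetricsLogger` is actually built. The point is only that a daemon thread is killed at interpreter exit, so queued records survive only if something drains the queue first.

```python
import json
import os
import queue
import tempfile
import threading


class SketchMetricsLogger:
    """Hypothetical logger: a daemon thread drains a queue into a JSONL file."""

    _STOP = object()  # sentinel that tells the worker to exit

    def __init__(self, path: str) -> None:
        self._file = open(path, "a", encoding="utf-8")
        self._pending: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def log(self, record: dict) -> None:
        self._pending.put(record)

    def _run(self) -> None:
        while True:
            item = self._pending.get()
            if item is self._STOP:
                break
            self._file.write(json.dumps(item) + "\n")

    def shutdown(self) -> None:
        """Drain queued records, then flush and close the file."""
        self._pending.put(self._STOP)  # FIFO: processed after all earlier records
        self._thread.join()
        self._file.flush()
        self._file.close()


path = os.path.join(tempfile.mkdtemp(), "benchmark_metrics.jsonl")
logger = SketchMetricsLogger(path)
for i in range(3):
    logger.log({"completed": i + 1})
logger.shutdown()

with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()
print(len(lines))  # 3
```

Because the sentinel is enqueued behind every pending record, `join()` returns only after all of them have been written.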
Suggested addition to the stop/destructor method of EngineService:

    if hasattr(self, 'benchmark_metrics_logger') and self.benchmark_metrics_logger:
        self.benchmark_metrics_logger.shutdown()

benchmarks/format_benchmark_metrics.py:46

    if not stats:
        return
    suffix = "(ms)" if is_time else ""
    if key == "decode_speed":

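🟡 Suggestion: `key == "decode_speed"` is wrong; the key actually passed in is "s_decode", so this condition is always False.
As a result, when print_stat_block(data, "s_decode", "Decode", "解码速度(tok/s)", is_time=False) is called (the label string means "decode speed"), suffix stays an empty string and the Mean/Median/Pxx lines are printed without a unit suffix.
Suggested fix:

    if key == "s_decode":
        suffix = "(tok/s)"
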
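
One way to make this kind of key mismatch harder to reintroduce is to keep the per-key units in a single lookup table. The sketch below is an assumption-laden illustration, not the script's code: `UNIT_SUFFIX_BY_KEY` and `unit_suffix` are hypothetical names, and `"ttft"` is used only as an example of a time-valued key. The precedence matches the quoted snippet, where a key-specific suffix overrides the `is_time` default.

```python
# Hypothetical table; only the "s_decode" key is confirmed by the quoted call site.
UNIT_SUFFIX_BY_KEY = {"s_decode": "(tok/s)"}


def unit_suffix(key: str, is_time: bool) -> str:
    """Return the unit suffix for a stat key, falling back to the time default."""
    if key in UNIT_SUFFIX_BY_KEY:
        return UNIT_SUFFIX_BY_KEY[key]
    return "(ms)" if is_time else ""


print(unit_suffix("s_decode", is_time=False))            # (tok/s)
print(repr(unit_suffix("decode_speed", is_time=False)))  # ''
print(unit_suffix("ttft", is_time=True))                 # (ms)
```

The second call shows the symptom described above: a key that is not in the table silently produces an empty suffix.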
fastdeploy/metrics/benchmark_metrics_logger.py:157

    result: dict[str, Any] = {
        "timestamp": datetime.now().isoformat(),
        "window_size": self.config.window_size,

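❓ Question: `datetime.now()` uses the local time zone.
In a distributed multi-node deployment, nodes may be configured with different time zones, which makes the timestamps in the JSONL logs not directly comparable. Per checklist §C, prefer UTC:

    from datetime import timezone
    # Replace with
    datetime.now(timezone.utc).isoformat()
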
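
A short sketch of the difference, using only the standard library. The fixed instant and the UTC+8 offset are illustrative assumptions chosen so the output is deterministic; they simulate what a node configured for UTC+8 would log with a naive `datetime.now()`.

```python
from datetime import datetime, timedelta, timezone

# One fixed instant, expressed in UTC.
instant = datetime(2026, 5, 15, 7, 48, 31, tzinfo=timezone.utc)

# Simulate a naive local timestamp on a node set to UTC+8:
# the wall-clock value is kept but the offset is dropped.
utc_plus_8 = timezone(timedelta(hours=8))
local_naive = instant.astimezone(utc_plus_8).replace(tzinfo=None)

print(local_naive.isoformat())  # 2026-05-15T15:48:31
print(instant.isoformat())      # 2026-05-15T07:48:31+00:00

# The aware timestamp round-trips with its offset intact.
parsed = datetime.fromisoformat(instant.isoformat())
print(parsed == instant)        # True

# datetime.now(timezone.utc) returns an aware datetime.
print(datetime.now(timezone.utc).tzinfo is timezone.utc)  # True
```

The naive string carries no offset, so a reader cannot tell which zone produced it, while the UTC string is self-describing.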
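CI report, generated from the following code (refreshed every 30 minutes):
1 Task overview
❌ 1 Required task has failed and must be resolved before merging.
2 Task status summary
2.1 Required tasks: 0/1 passed
2.2 Optional tasks: 0/1 passed
3 Failure details (Required only)
Approval: PR approval issue (confidence: high)
Suggested fix: a FastDeploy RD member needs to approve the PR.

Thanks for your contribution!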
Codecov Report
❌ Patch coverage is [value missing in source]
Additional details and impacted files

    @@            Coverage Diff             @@
    ##           release/2.6    #7831   +/-   ##
    ==============================================
      Coverage             ?   72.94%
    ==============================================
      Files                ?      382
      Lines                ?    54327
      Branches             ?     8493
    ==============================================
      Hits                 ?    39627
      Misses               ?    11919
      Partials             ?     2781
