feat(metrics): add model_name label and new throughput/cache metrics #1344

Merged
shihaobai merged 5 commits into main from feat/metrics-model-name-and-new-metrics
Jun 11, 2026
@sufubao sufubao commented Jun 11, 2026

Collaborator

Summary

Ported metrics improvements from the qwen35 branch to main.

Changes

lightllm/server/metrics/metrics.py

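  • All Prometheus metrics (Histogram, Counter, Gauge) now carry a model_name label, so monitoring data can be told apart per model in multi-model deployments
  • Added a counter_inc_by(name, amount) method for incrementing a counter by an arbitrary amount
  • Added 5 new metrics:
    • lightllm_prompt_tokens_total: cumulative number of prefill tokens
    • lightllm_generation_tokens_total: cumulative number of generation tokens
    • lightllm_cache_hit_rate: prefix cache hit rate
    • lightllm_gen_throughput: generation throughput (tokens/s)
    • lightllm_num_running_reqs: number of currently running requests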
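
As a rough illustration of the two ideas in this list (a model_name label on every series, and a counter that can be incremented by an arbitrary amount), here is a stdlib-only sketch. It is not the code from this PR: `LabeledCounters` is a hypothetical in-memory stand-in for the Prometheus counters, and only the method name `counter_inc_by(name, amount)` and the metric name are taken from the description.

```python
from collections import defaultdict


class LabeledCounters:
    """Hypothetical in-memory stand-in for counters carrying a model_name label."""

    def __init__(self, model_name: str) -> None:
        self.model_name = model_name
        # Keyed by (metric name, model_name), so two models never share a series.
        self._values = defaultdict(float)

    def counter_inc_by(self, name: str, amount: float) -> None:
        # Counters are monotonic, so a negative amount is rejected.
        if amount < 0:
            raise ValueError("counter increments must be non-negative")
        self._values[(name, self.model_name)] += amount

    def get(self, name: str) -> float:
        return self._values[(name, self.model_name)]


counters = LabeledCounters(model_name="demo-model")
counters.counter_inc_by("lightllm_prompt_tokens_total", 128)
counters.counter_inc_by("lightllm_prompt_tokens_total", 64)
print(counters.get("lightllm_prompt_tokens_total"))
```

Incrementing by the whole batch size in one call avoids issuing one increment per token, which is presumably why a by-amount method is useful for token totals.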

lightllm/server/metrics/manager.py

  • MetricServer: added the exposed_counter_inc_by RPC method
  • MetricClient: added the async counter_inc_by method
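
The server/client split can be pictured with the sketch below. It assumes nothing about the real transport: `FakeMetricServer` and `FakeMetricClient` are hypothetical classes, the "remote" call is a plain in-process method call, and only the method names `exposed_counter_inc_by` and `counter_inc_by` come from the description.

```python
import asyncio


class FakeMetricServer:
    """Hypothetical stand-in for the RPC service; only the method name is from the PR."""

    def __init__(self) -> None:
        self.counters = {}

    def exposed_counter_inc_by(self, name: str, amount: int) -> None:
        self.counters[name] = self.counters.get(name, 0) + amount


class FakeMetricClient:
    """Hypothetical client that forwards calls without blocking the event loop."""

    def __init__(self, server: FakeMetricServer) -> None:
        self._server = server

    async def counter_inc_by(self, name: str, amount: int) -> None:
        # A real client would make a remote call here; this sketch runs the
        # blocking call in a worker thread so the event loop stays responsive.
        await asyncio.to_thread(self._server.exposed_counter_inc_by, name, amount)


async def main() -> dict:
    server = FakeMetricServer()
    client = FakeMetricClient(server)
    await client.counter_inc_by("lightllm_generation_tokens_total", 8)
    await client.counter_inc_by("lightllm_generation_tokens_total", 4)
    return server.counters


result = asyncio.run(main())
print(result)
```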

Test plan

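  • After starting the service, query the /metrics endpoint and confirm the new metrics appear with the model_name label
  • After sending requests, confirm lightllm_prompt_tokens_total / lightllm_generation_tokens_total increase correctly
  • Verify label compatibility for existing metrics (e.g. lightllm_request_duration)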
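
The first check in this plan could be automated along the following lines. This is only a sketch: the sample text is a made-up /metrics payload (the metric names are from this PR, the values and the model name are invented), and `metrics_missing_label` is a hypothetical helper that handles simple label values only.

```python
import re

# Hypothetical /metrics output; values and model name are made up.
SAMPLE = """\
# HELP lightllm_prompt_tokens_total Total prefill tokens
# TYPE lightllm_prompt_tokens_total counter
lightllm_prompt_tokens_total{model_name="demo-model"} 192.0
lightllm_num_running_reqs{model_name="demo-model"} 3.0
lightllm_gen_throughput 41.5
"""

# Metric name, optional {label set}, then whitespace before the value.
SAMPLE_LINE = re.compile(
    r"^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s"
)


def metrics_missing_label(text: str, label: str) -> list:
    """Return names of sample lines that do not carry the given label."""
    label_pattern = re.compile(rf'(?:^|,)\s*{re.escape(label)}="')
    missing = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue
        labels = match.group("labels") or ""
        if not label_pattern.search(labels):
            missing.append(match.group("name"))
    return missing


print(metrics_missing_label(SAMPLE, "model_name"))
```

In a real check, the sample string would be replaced by the body fetched from the running service's /metrics endpoint.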

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces several new metrics (such as prompt/generation token totals, cache hit rate, throughput, and running requests) and updates the metrics manager to support incrementing counters by a specific amount. Additionally, all metrics are updated to include a model_name label. The review feedback highlights a critical issue where args.model_name could be None or missing, which would cause runtime errors in prometheus_client when updating metrics. It is recommended to use a safe fallback value like 'unknown' to prevent crashes.

        self.init_metrics(args)

    def init_metrics(self, args):
        self.model_name = args.model_name

high

If args.model_name is None (e.g., if the --model_name argument is not provided at startup) or if the attribute does not exist on args, self.model_name will be None or raise an AttributeError. In prometheus_client, passing None as a label value (e.g., model_name=None) will raise a ValueError: Invalid label value: None at runtime when any metric is updated, crashing the metric server or the background metric thread. To prevent this, use getattr with a safe fallback string like 'unknown'.

Suggested change
- self.model_name = args.model_name
+ self.model_name = getattr(args, "model_name", None) or "unknown"
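
To see how the suggested expression behaves for the cases the reviewer mentions, here is a small standalone sketch. `resolve_model_name` is a hypothetical helper wrapping the same `getattr(...) or "unknown"` expression, and `argparse.Namespace` stands in for the real `args` object.

```python
from argparse import Namespace


def resolve_model_name(args: Namespace, default: str = "unknown") -> str:
    # getattr covers a missing attribute; `or` covers None and the empty string.
    return getattr(args, "model_name", None) or default


print(resolve_model_name(Namespace(model_name="demo-model")))  # demo-model
print(resolve_model_name(Namespace(model_name=None)))          # unknown
print(resolve_model_name(Namespace()))                         # unknown
```

Note that the `or` also maps an empty string to the fallback, which a plain `getattr(args, "model_name", "unknown")` would not do.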

@sufubao sufubao force-pushed the feat/metrics-model-name-and-new-metrics branch 3 times, most recently from 7157c37 to cc5a4df Compare June 11, 2026 08:24
- Add model_name label to all Prometheus metrics (histograms, counters, gauges)
  so metrics can be distinguished when multiple models are served
- Add counter_inc_by() method to Monitor, MetricServer and MetricClient
  for incrementing counters by arbitrary amounts
- Add new metrics:
  - lightllm_prompt_tokens_total: total prefill tokens processed
  - lightllm_generation_tokens_total: total generation tokens processed
  - lightllm_cache_hit_rate: prefix cache hit rate
  - lightllm_gen_throughput: generation throughput (tokens/s)
  - lightllm_num_running_reqs: number of running requests

Ported from qwen35 branch.
@sufubao sufubao force-pushed the feat/metrics-model-name-and-new-metrics branch from cc5a4df to 075b405 Compare June 11, 2026 08:27
sufubao and others added 4 commits June 11, 2026 08:42
Port the metric-reporting part of qwen35's SystemStatusReporter so the
new metrics actually receive values on main:

- lightllm_prompt_tokens_total: incremented with batch.input_tokens()
  when a prefill batch is dispatched
- lightllm_generation_tokens_total: incremented per decode step with
  the number of running requests
- lightllm_cache_hit_rate / lightllm_gen_throughput /
  lightllm_num_running_reqs: gauges refreshed every log_stats_interval
  seconds (min 5s), same cadence and semantics as the qwen35 branch

Unlike qwen35, main's existing router logging is left untouched; only
the /metrics reporting is ported.
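
One way to picture the reporting cadence described in this commit message is the sketch below. It is an assumption-heavy illustration, not the ported SystemStatusReporter: `StatsWindow`, its field names, and the formulas for hit rate (cached tokens over prompt tokens) and throughput (generated tokens over elapsed seconds) are all hypothetical. Only the metric names, the 5-second floor, and the "one increment per running request per decode step" rule come from the text.

```python
class StatsWindow:
    """Hypothetical accumulator; field names and formulas are assumptions."""

    def __init__(self, log_stats_interval: float) -> None:
        # The commit message states a 5-second floor on the refresh interval.
        self.interval = max(5.0, log_stats_interval)
        self.prompt_tokens = 0
        self.cached_tokens = 0
        self.generated_tokens = 0

    def on_prefill(self, input_tokens: int, cached_tokens: int) -> None:
        self.prompt_tokens += input_tokens
        self.cached_tokens += cached_tokens

    def on_decode_step(self, running_reqs: int) -> None:
        # One token per running request per decode step.
        self.generated_tokens += running_reqs

    def snapshot(self, elapsed_s: float) -> dict:
        """Compute gauge values for the window, then reset the accumulators."""
        hit_rate = self.cached_tokens / self.prompt_tokens if self.prompt_tokens else 0.0
        throughput = self.generated_tokens / elapsed_s if elapsed_s > 0 else 0.0
        self.prompt_tokens = self.cached_tokens = self.generated_tokens = 0
        return {
            "lightllm_cache_hit_rate": hit_rate,
            "lightllm_gen_throughput": throughput,
        }


window = StatsWindow(log_stats_interval=1.0)
window.on_prefill(input_tokens=200, cached_tokens=50)
for _ in range(10):
    window.on_decode_step(running_reqs=4)
gauges = window.snapshot(elapsed_s=window.interval)
print(window.interval, gauges)
```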
Previously the paused-request gauge was set inside the per-dp debug loop, so in
multi-dp deployments it held only the last dp's paused count. Align
with qwen35 by reporting the total via _get_paused_req_num().
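
The difference between the old and new behaviour can be shown with plain numbers. The per-dp counts below are invented, and treating the "total" as a simple sum across dp ranks is an assumption about what _get_paused_req_num() returns.

```python
# Hypothetical paused-request counts, one entry per dp rank.
paused_per_dp = [2, 0, 5]

# Before: the gauge was overwritten on every loop iteration, so only the
# last dp rank's value survived.
gauge_before = 0
for paused in paused_per_dp:
    gauge_before = paused

# After: report the total across all dp ranks in one update.
gauge_after = sum(paused_per_dp)

print(gauge_before, gauge_after)
```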
@shihaobai shihaobai merged commit d471c21 into main Jun 11, 2026
1 check passed
@shihaobai shihaobai deleted the feat/metrics-model-name-and-new-metrics branch June 11, 2026 11:45