add monitor by sufubao · Pull Request #1360 · ModelTC/LightLLM

sufubao · 2026-06-16T11:17:52Z

python ./tools/lightllm_monitor.py

gemini-code-assist

Code Review

This pull request introduces lightllm_monitor.py, a real-time terminal dashboard for monitoring LightLLM metrics using the rich library. Feedback on the implementation highlights a critical bug where filtering metrics ending in _count or _sum inadvertently discards independent counters like lightllm_request_count and lightllm_batch_inference_count. Additionally, reviewers recommended using a regex-based parser to robustly handle commas in Prometheus label values, and extracting a shared calculate_rate helper function to eliminate duplicate rate calculation logic and handle counter resets gracefully.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-16T11:19:52Z

+        if name.endswith("_bucket"):
+            base = name[: -len("_bucket")]
+            le = labels.get("le", "+Inf")
+            le_f = float("inf") if le == "+Inf" else float(le)
+            hist[base][le_f] = hist[base].get(le_f, 0.0) + value
+        elif name.endswith("_sum") or name.endswith("_count"):
+            continue  # quantiles come from the buckets, rates from the base counter; sum/count are not needed
+        else:
+            scalars[name] += value


Critical Bug: The check name.endswith("_sum") or name.endswith("_count") incorrectly filters out actual independent counters like lightllm_request_count (displayed as req total) and lightllm_batch_inference_count (displayed as infer steps). As a result, these metrics will always display as — on the dashboard.

Since keeping the histogram _sum and _count metrics in the scalars dictionary is harmless (they are simply ignored during rendering), you can safely remove this elif block entirely to ensure all independent counters are correctly aggregated.

    if name.endswith("_bucket"):
        base = name[: -len("_bucket")]
        le = labels.get("le", "+Inf")
        le_f = float("inf") if le == "+Inf" else float(le)
        hist[base][le_f] = hist[base].get(le_f, 0.0) + value
    else:
        scalars[name] += value
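To make the failure mode concrete, the following minimal sketch runs both versions of the branch over a few made-up samples. The `aggregate` function, its `drop_sum_count` flag, the sample values, and the histogram name `lightllm_request_duration` are assumptions for illustration only; they are not taken from the PR.

```python
from collections import defaultdict


def aggregate(samples, drop_sum_count):
    """Aggregate (name, labels, value) samples into histogram buckets and scalars.

    drop_sum_count=True mimics the original branch; False mimics the suggested fix.
    """
    hist = defaultdict(dict)
    scalars = defaultdict(float)
    for name, labels, value in samples:
        if name.endswith("_bucket"):
            base = name[: -len("_bucket")]
            le = labels.get("le", "+Inf")
            le_f = float("inf") if le == "+Inf" else float(le)
            hist[base][le_f] = hist[base].get(le_f, 0.0) + value
        elif drop_sum_count and (name.endswith("_sum") or name.endswith("_count")):
            continue  # also swallows standalone counters whose names end in _count
        else:
            scalars[name] += value
    return hist, scalars


# Hypothetical scrape: two standalone counters plus one histogram.
samples = [
    ("lightllm_request_count", {}, 42.0),
    ("lightllm_batch_inference_count", {}, 7.0),
    ("lightllm_request_duration_bucket", {"le": "0.5"}, 3.0),
    ("lightllm_request_duration_bucket", {"le": "+Inf"}, 5.0),
    ("lightllm_request_duration_count", {}, 5.0),
]

hist, before = aggregate(samples, drop_sum_count=True)
_, after = aggregate(samples, drop_sum_count=False)

print("lightllm_request_count" in before)  # the counter is missing with the filter
print(after["lightllm_request_count"])     # the counter survives without it
```

With the filter in place the two counters never reach `scalars`, which matches the placeholder the review describes. Without it, the histogram's `_count` series also lands in `scalars`, but nothing renders it.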

gemini-code-assist · 2026-06-16T11:19:53Z

+        labels = {}
+        if labels_str:
+            for kv in labels_str.split(","):
+                if "=" in kv:
+                    k, v = kv.split("=", 1)
+                    labels[k.strip()] = v.strip().strip('"')


Robustness Issue: Splitting labels_str by , to parse Prometheus labels will fail if any label value contains a comma (e.g., a prompt or metadata string containing a comma).

Using a regular expression to extract key="value" pairs is much more robust and correctly handles commas and escaped quotes within label values.

    labels = {}
    if labels_str:
        import re
        for k, v in re.findall(r'([a-zA-Z_][a-zA-Z0-9_]*)\s*=\s*"((?:[^"\\]|\\.)*)"', labels_str):
            labels[k] = v.replace('\\"', '"').replace('\\\\', '\\')
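As a rough comparison of the two approaches, the sketch below parses the same label string with the comma split and with the regular expression from the suggestion. The function names `parse_labels_split` and `parse_labels_regex` and the label values are hypothetical, chosen only to show a comma and an escaped quote inside a value.

```python
import re

# Same pattern as in the suggestion: key="value", where the value may contain
# escaped characters such as \" or \\.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)\s*=\s*"((?:[^"\\]|\\.)*)"')


def parse_labels_split(labels_str):
    """Original approach: split on commas, then on the first '='."""
    labels = {}
    for kv in labels_str.split(","):
        if "=" in kv:
            k, v = kv.split("=", 1)
            labels[k.strip()] = v.strip().strip('"')
    return labels


def parse_labels_regex(labels_str):
    """Suggested approach: extract key="value" pairs with a regex."""
    return {
        k: v.replace('\\"', '"').replace('\\\\', '\\')
        for k, v in LABEL_RE.findall(labels_str)
    }


labels_str = 'model="llama,7b",le="0.5"'  # made-up value containing a comma
print(parse_labels_split(labels_str))  # the model value is cut at the comma
print(parse_labels_regex(labels_str))  # the model value is kept whole
```

The split version silently truncates the value rather than raising, so the problem would only show up as wrong or merged label sets downstream.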

gemini-code-assist · 2026-06-16T11:19:53Z

+def fmt_float(x, prec=1):
+    if x is None:
+        return "[dim]—[/dim]"
+    return f"{x:.{prec}f}"
+


Improvement: To avoid duplicating the rate calculation logic between build_panel and main(), and to robustly handle counter resets (e.g., when the LightLLM server restarts and counters reset to 0), we can introduce a shared calculate_rate helper function.

Suggested change

    def fmt_float(x, prec=1):
        if x is None:
            return "[dim]—[/dim]"
        return f"{x:.{prec}f}"


    def calculate_rate(name, prev, scalars, now):
        if name in prev and name in scalars:
            pv, pt = prev[name]
            dt = now - pt
            if dt > 0:
                diff = scalars[name] - pv
                return diff / dt if diff >= 0 else 0.0
        return None
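A short sketch of how the suggested helper behaves in the three cases the review cares about: a normal increase, a counter reset, and a missing previous sample. The helper body is copied from the suggestion; the sample values and timestamps are invented, and `prev` is assumed to map each metric name to a `(value, timestamp)` pair, as the unpacking `pv, pt = prev[name]` implies.

```python
def calculate_rate(name, prev, scalars, now):
    """Per-second rate of a counter between the previous sample and now.

    Returns None when there is no usable previous sample, and 0.0 when the
    counter went backwards (for example after a server restart).
    """
    if name in prev and name in scalars:
        pv, pt = prev[name]
        dt = now - pt
        if dt > 0:
            diff = scalars[name] - pv
            return diff / dt if diff >= 0 else 0.0
    return None


name = "lightllm_generation_tokens_total"
prev = {name: (1000.0, 10.0)}  # hypothetical: value 1000 observed at t = 10 s

normal = calculate_rate(name, prev, {name: 1500.0}, 12.0)    # 500 tokens in 2 s
after_reset = calculate_rate(name, prev, {name: 20.0}, 12.0)  # counter dropped
no_history = calculate_rate(name, {}, {name: 20.0}, 12.0)     # first poll

print(normal, after_reset, no_history)  # 250.0 0.0 None
```

Clamping to `0.0` on a reset avoids a large negative spike in the dashboard; the first sample after a restart is simply reported as zero throughput rather than estimated.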

gemini-code-assist · 2026-06-16T11:19:53Z

+    # -- rate helper (difference between two consecutive counter samples / dt) --
+    def rate(name):
+        if name in prev and name in scalars:
+            pv, pt = prev[name]
+            dt = now - pt
+            if dt > 0:
+                return (scalars[name] - pv) / dt
+        return None
+
+    gen_tps = rate("lightllm_generation_tokens_total")
+    in_tps = rate("lightllm_prompt_tokens_total")
+    tpm = gen_tps * 60 if gen_tps is not None else None


Improvement: Use the newly introduced calculate_rate helper function to simplify the code and remove the nested rate function.

    gen_tps = calculate_rate("lightllm_generation_tokens_total", prev, scalars, now)
    in_tps = calculate_rate("lightllm_prompt_tokens_total", prev, scalars, now)
    tpm = gen_tps * 60 if gen_tps is not None else None

gemini-code-assist · 2026-06-16T11:19:53Z

+                    # instantaneous gen tok/s, used for the trend
+                    gen_tps = None
+                    if "lightllm_generation_tokens_total" in prev:
+                        pv, pt = prev["lightllm_generation_tokens_total"]
+                        dt = now - pt
+                        if dt > 0:
+                            gen_tps = (scalars["lightllm_generation_tokens_total"] - pv) / dt


Improvement: Use the shared calculate_rate helper function here to eliminate duplicate rate calculation logic.

Suggested change

-                    # instantaneous gen tok/s, used for the trend
-                    gen_tps = None
-                    if "lightllm_generation_tokens_total" in prev:
-                        pv, pt = prev["lightllm_generation_tokens_total"]
-                        dt = now - pt
-                        if dt > 0:
-                            gen_tps = (scalars["lightllm_generation_tokens_total"] - pv) / dt
+                    # instantaneous gen tok/s, used for the trend
+                    gen_tps = calculate_rate("lightllm_generation_tokens_total", prev, scalars, now)

add monitor

de9ecf0

gemini-code-assist Bot reviewed Jun 16, 2026


sufubao added 2 commits June 16, 2026 19:20

move to tools

f5cebbb

monitor: add MTP avg accepted len (tokens/step)

b8f9f34

sufubao force-pushed the QPS branch from 8a37014 to b8f9f34 · June 16, 2026 11:38

sufubao wants to merge 3 commits into ModelTC:main from sufubao:QPS

1 participant
