
obs(grafana): pair stat + timeseries on Pebble internals dashboard #609

Open
bootjp wants to merge 1 commit into main from obs/grafana-pebble-stat-timeseries

Conversation

@bootjp (Owner) commented Apr 23, 2026

Summary

Follow-up to #597, which landed before this improvement could ride along. The Pebble Internals dashboard rendered as flat stat-only rows (5 node values per metric), which is a bad fit for LSM signals that matter as trends (L0 build-up, compaction debt, memtable count approaching write-stall).

This PR restructures every gauge panel into a stat (current value, colour-coded) + timeseries (per-node history with threshold line) pair.

Changes

Single file: monitoring/grafana/dashboards/elastickv-pebble-internals.json.

Final shape: 26 panels — 5 collapsible rows + 21 data panels (10 stat + 11 timeseries).

| Metric | Paired | Threshold line |
| --- | --- | --- |
| elastickv_pebble_l0_sublevels | ✓ | red @ 20 (write-stall start) |
| elastickv_pebble_l0_num_files | ✓ | red @ 1000 |
| elastickv_pebble_memtable_count | ✓ | red @ 5 |
| elastickv_pebble_memtable_size_bytes | ✓ | |
| elastickv_pebble_memtable_zombie_count | ✓ | |
| elastickv_pebble_compact_estimated_debt_bytes | ✓ | |
| elastickv_pebble_compact_in_progress | ✓ | |
| elastickv_pebble_block_cache_hit_rate | ✓ (pre-existing) | |
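
Each paired stat carries colour thresholds like the following sketch (values shown for `elastickv_pebble_l0_sublevels`, which steps yellow at 10 and red at 15 per the commit message; the surrounding panel fields are elided, so this is illustrative rather than the verbatim panel JSON):

```json
{
  "fieldConfig": {
    "defaults": {
      "thresholds": {
        "mode": "absolute",
        "steps": [
          { "color": "green", "value": null },
          { "color": "yellow", "value": 10 },
          { "color": "red", "value": 15 }
        ]
      }
    }
  }
}
```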

Rate counters (compact_count_total, block_cache_hits_total, block_cache_misses_total) left as timeseries-only — an instantaneous rate stat is meaningless.

elastickv_store_write_conflict_total is labelled by key_prefix, not node_id, so a per-node stat reduction doesn't apply; it stays a single timeseries.

elastickv_pebble_fsm_apply_sync_mode stays a single stat (boolean mode indicator).

Layout: two 12-wide × 8-high panels per row, grouped under Block Cache / LSM Level State (collapsed) / Compaction / FSM Apply / Store Write Conflicts rows. Queries are standardised on `max by (node_id)`.
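
The standardised query shape, as it appears in the memtable size panel's `expr` (quoted in the review excerpt below), looks like this — the `job` and `node_id` matchers come from the dashboard's template variables:

```promql
# One series per node: coalesce the group / node_address dimensions.
max by (node_id) (
  elastickv_pebble_memtable_size_bytes{job="elastickv", node_id=~"$node_id"}
)
```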

Test plan

  • JSON valid: `python3 -c "import json; json.load(open('monitoring/grafana/dashboards/elastickv-pebble-internals.json'))"`
  • Import into Grafana, panels render without errors (operator verification)
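
The stated panel shape (26 panels: 5 rows + 10 stat + 11 timeseries) can also be checked mechanically. A minimal sketch — the helper name is an assumption, not part of this PR, and the real check would `json.load()` the dashboard file instead of the inline stand-in:

```python
import json
from collections import Counter

def panel_type_counts(dashboard: dict) -> Counter:
    """Count panel types, descending into rows' nested child panels."""
    counts = Counter()
    for panel in dashboard.get("panels", []):
        counts[panel["type"]] += 1
        # Collapsed rows keep their children inline under "panels".
        for child in panel.get("panels", []):
            counts[child["type"]] += 1
    return counts

# Tiny stand-in dashboard; swap in the real file for the actual check.
demo = {"panels": [
    {"type": "row", "panels": [{"type": "stat"}, {"type": "timeseries"}]},
    {"type": "stat"},
    {"type": "timeseries"},
]}
counts = panel_type_counts(demo)
print(dict(counts))  # {'row': 1, 'stat': 2, 'timeseries': 2}
```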

/gemini review
@codex review

…l thresholds

Restructures the Pebble Internals dashboard so every gauge-style signal
is represented by two panels: a stat (current per-node value with colour
thresholds) and a timeseries (per-node trend, with red threshold lines
at the Pebble write-stall points where applicable).

Signals paired:
- L0 sublevels (stat thresholds 10/15, line at 20)
- L0 num files (stat thresholds 500/1000, line at 1000)
- Compaction estimated debt (stat thresholds 100/500 MiB)
- Compactions in progress (stat thresholds 1/3)
- Memtable count (stat thresholds 3/5, line at 5)
- Memtable size bytes
- Memtable zombie count

Rate-of-counter metrics (compaction rate, cache hits/misses rate, store
write conflicts by prefix) and the FSM apply sync mode indicator stay
single-panel because a point-in-time rate or boolean has no meaningful
stat companion.

Panels are grouped into rows: Block Cache, LSM Level State (collapsed),
Compaction, FSM Apply, Store Write Conflicts. Queries use
`max by (node_id)` to coalesce the group / node_address dimensions the
Pebble collector emits. Grid layout is two 12-wide panels per visual
row at height 8.
@gemini-code-assist (Contributor) commented

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@coderabbitai Bot commented Apr 23, 2026

Warning

Rate limit exceeded

@bootjp has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 20 minutes and 8 seconds before requesting another review.


ℹ️ Review info
  • Configuration used: defaults
  • Review profile: CHILL
  • Plan: Pro
  • Run ID: 8c29060f-8626-497e-a4e1-b779fc4a998e

📥 Commits

Reviewing files that changed from the base of the PR and between 99c0560 and 050b4d9.

📒 Files selected for processing (1)
  • monitoring/grafana/dashboards/elastickv-pebble-internals.json


@chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 050b4d93a2


```json
{
  "datasource": "$datasource",
  "editorMode": "code",
  "expr": "max by (node_id) (elastickv_pebble_memtable_size_bytes{job=\"elastickv\",node_id=~\"$node_id\"})",
```

P2: Aggregate memtable bytes across groups per node

On nodes that host multiple Raft groups, elastickv_pebble_memtable_size_bytes is emitted per group, so max by (node_id) only keeps the largest group and drops the rest. That under-reports true per-node memtable footprint (e.g., two 200MiB groups still display 200MiB), which can hide pressure in both the new stat thresholds and the paired trend panel; this should be sum by (node_id) if the panel is intended to represent per-node memory usage.
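
The under-reporting is easy to see with toy numbers (the node and group names below are hypothetical samples, not real metric output):

```python
# Per-group memtable sizes on one node: two Raft groups at 200 MiB each.
MIB = 1 << 20
samples = {("node-1", "g1"): 200 * MIB, ("node-1", "g2"): 200 * MIB}

by_node_max: dict[str, int] = {}
by_node_sum: dict[str, int] = {}
for (node, _group), size in samples.items():
    by_node_max[node] = max(by_node_max.get(node, 0), size)
    by_node_sum[node] = by_node_sum.get(node, 0) + size

# max by (node_id) keeps only the largest group ...
print(by_node_max["node-1"] // MIB)  # 200
# ... while sum by (node_id) reports the true per-node footprint.
print(by_node_sum["node-1"] // MIB)  # 400
```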

