obs(grafana): pair stat + timeseries on Pebble internals dashboard#609
…l thresholds

Restructures the Pebble Internals dashboard so every gauge-style signal is represented by two panels: a stat (current per-node value with colour thresholds) and a timeseries (per-node trend, with red threshold lines at the Pebble write-stall points where applicable).

Signals paired:
- L0 sublevels (stat thresholds 10/15, line at 20)
- L0 num files (stat thresholds 500/1000, line at 1000)
- Compaction estimated debt (stat thresholds 100/500 MiB)
- Compactions in progress (stat thresholds 1/3)
- Memtable count (stat thresholds 3/5, line at 5)
- Memtable size bytes
- Memtable zombie count

Rate-of-counter metrics (compaction rate, cache hits/misses rate, store write conflicts by prefix) and the FSM apply sync mode indicator stay single-panel because a point-in-time rate or boolean has no meaningful stat companion.

Panels are grouped into rows: Block Cache, LSM Level State (collapsed), Compaction, FSM Apply, Store Write Conflicts. Queries use `max by (node_id)` to coalesce the group / node_address dimensions the Pebble collector emits. Grid layout is two 12-wide panels per visual row at height 8.
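For readers who want to see the shape of one stat + timeseries pair, here is a minimal Python sketch that builds the two panel dicts for L0 sublevels. The helper name `paired_panels`, the panel titles, and the colour names are assumptions for illustration; the query pattern, thresholds (10/15, line at 20), and the 12-wide by 8-high grid come from the description above. It is not the generator used for the actual dashboard file.

```python
import json


def paired_panels(title, metric, warn, crit, stall_line, y):
    """Return a (stat, timeseries) panel pair for one gauge metric.

    Hypothetical helper: the field names follow the Grafana panel JSON
    schema, but the titles and colours are illustrative assumptions.
    """
    expr = f'max by (node_id) ({metric}{{job="elastickv",node_id=~"$node_id"}})'
    target = {"datasource": "$datasource", "editorMode": "code", "expr": expr}

    def steps(*pairs):
        # Grafana threshold steps: the first step has a null (None) base value.
        return [{"color": color, "value": value} for color, value in pairs]

    stat = {
        "type": "stat",
        "title": f"{title} (current)",
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": y},  # left half of the row
        "targets": [dict(target)],
        "fieldConfig": {"defaults": {"thresholds": {
            "mode": "absolute",
            "steps": steps(("green", None), ("yellow", warn), ("red", crit)),
        }}},
    }
    timeseries = {
        "type": "timeseries",
        "title": f"{title} (trend)",
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": y},  # right half of the row
        "targets": [dict(target)],
        "fieldConfig": {"defaults": {
            # Draw the threshold as a line on the graph.
            "custom": {"thresholdsStyle": {"mode": "line"}},
            "thresholds": {
                "mode": "absolute",
                "steps": steps(("transparent", None), ("red", stall_line)),
            },
        }},
    }
    return stat, timeseries


stat, ts = paired_panels("L0 sublevels", "elastickv_pebble_l0_sublevels", 10, 15, 20, y=0)
print(json.dumps(stat["targets"][0], indent=2))
```

The two panels share one query and differ only in type, grid position, and threshold presentation, which is what keeps the pair consistent when a threshold changes.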
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 050b4d93a2
{
  "datasource": "$datasource",
  "editorMode": "code",
  "expr": "max by (node_id) (elastickv_pebble_memtable_size_bytes{job=\"elastickv\",node_id=~\"$node_id\"})",
Aggregate memtable bytes across groups per node
On nodes that host multiple Raft groups, elastickv_pebble_memtable_size_bytes is emitted per group, so max by (node_id) only keeps the largest group and drops the rest. That under-reports true per-node memtable footprint (e.g., two 200MiB groups still display 200MiB), which can hide pressure in both the new stat thresholds and the paired trend panel; this should be sum by (node_id) if the panel is intended to represent per-node memory usage.
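To make the difference concrete, the following Python sketch imitates the two PromQL aggregations on a handful of made-up samples. The sample values, node names, and the `aggregate_by_node` helper are hypothetical; only the `node_id` and `group` label names and the 200 MiB example come from this thread.

```python
from collections import defaultdict

MIB = 1024 * 1024

# Hypothetical samples: one series per (node_id, group), as the review describes.
samples = [
    ({"node_id": "n1", "group": "1"}, 200 * MIB),
    ({"node_id": "n1", "group": "2"}, 200 * MIB),
    ({"node_id": "n2", "group": "1"}, 64 * MIB),
]


def aggregate_by_node(samples, op):
    """Group sample values by node_id and reduce each group with op,
    mirroring `<op> by (node_id) (...)` in PromQL."""
    buckets = defaultdict(list)
    for labels, value in samples:
        buckets[labels["node_id"]].append(value)
    return {node: op(values) for node, values in buckets.items()}


by_max = aggregate_by_node(samples, max)  # what the current query shows
by_sum = aggregate_by_node(samples, sum)  # what the review suggests

for node in sorted(by_max):
    print(node, by_max[node] // MIB, "MiB (max)", by_sum[node] // MIB, "MiB (sum)")
```

For node `n1`, the max reduction reports 200 MiB while the sum reports 400 MiB; for a node hosting a single group the two agree.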
Summary
Follow-up to #597 that landed before this improvement could ride along. The Pebble Internals dashboard rendered as flat stat-only rows (5 node values per metric), which is a bad fit for LSM signals that matter as trends (L0 build-up, compaction debt, memtable count approaching write-stall).
This PR restructures every gauge panel into a stat (current value, colour-coded) + timeseries (per-node history with threshold line) pair.
Changes
Single file: `monitoring/grafana/dashboards/elastickv-pebble-internals.json`.

Final shape: 26 panels, made up of 5 collapsible rows + 21 data panels (10 stat + 11 timeseries).

Paired gauge metrics:
- `elastickv_pebble_l0_sublevels`
- `elastickv_pebble_l0_num_files`
- `elastickv_pebble_memtable_count`
- `elastickv_pebble_memtable_size_bytes`
- `elastickv_pebble_memtable_zombie_count`
- `elastickv_pebble_compact_estimated_debt_bytes`
- `elastickv_pebble_compact_in_progress`
- `elastickv_pebble_block_cache_hit_rate`

Rate counters (`compact_count_total`, `block_cache_hits_total`, `block_cache_misses_total`) are left as timeseries-only, since an instantaneous rate stat is meaningless.

`elastickv_store_write_conflict_total` is labelled by `key_prefix`, not `node_id`, so per-node stat reduction doesn't apply. Single timeseries.

`elastickv_pebble_fsm_apply_sync_mode` stays a single stat (boolean mode indicator).

Layout: two 12-wide × 8-high panels per row, grouped under Block Cache / LSM Level State (collapsed) / Compaction / FSM Apply / Store Write Conflicts rows. Queries standardised on `max by (node_id)`.

Test plan

python3 -c "import json; json.load(open('monitoring/grafana/dashboards/elastickv-pebble-internals.json'))"

/gemini review

@codex review
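The test plan only confirms that the file parses as JSON. A slightly stronger check, sketched below in Python, would also count panels by type so the "5 rows + 10 stat + 11 timeseries" shape can be verified. The function name `count_panel_types` and the inline `sample` dashboard are hypothetical; the sketch assumes collapsed rows nest their children under the row's own `panels` key, which is how Grafana serialises them.

```python
from collections import Counter


def count_panel_types(dashboard):
    """Count panels by type, including panels nested inside collapsed rows."""
    counts = Counter()
    for panel in dashboard.get("panels", []):
        counts[panel["type"]] += 1
        # Collapsed rows carry their child panels in a nested "panels" list.
        for child in panel.get("panels", []):
            counts[child["type"]] += 1
    return counts


# Tiny stand-in dashboard (hypothetical); the real file would be loaded with json.load.
sample = {
    "panels": [
        {"type": "row", "collapsed": True, "panels": [
            {"type": "stat"},
            {"type": "timeseries"},
        ]},
        {"type": "row", "collapsed": False, "panels": []},
        {"type": "timeseries"},
    ]
}

counts = count_panel_types(sample)
print(dict(counts))
```

Run against the real dashboard file, the counts would be expected to match the shape stated in the description: 5 `row`, 10 `stat`, and 11 `timeseries`.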