
obs(grafana): pair stat + timeseries on Pebble internals dashboard #609

Open
bootjp wants to merge 1 commit into main from obs/grafana-pebble-stat-timeseries

Conversation

@bootjp (Owner) commented Apr 23, 2026

Summary

Follow-up to #597, which landed before this improvement could ride along. The Pebble Internals dashboard rendered as flat stat-only rows (5 node values per metric), which is a bad fit for LSM signals that matter as trends (L0 build-up, compaction debt, memtable count approaching write-stall).

This PR restructures every gauge panel into a stat (current value, colour-coded) + timeseries (per-node history with threshold line) pair.

Changes

Single file: monitoring/grafana/dashboards/elastickv-pebble-internals.json.

Final shape: 26 panels — 5 collapsible rows + 21 data panels (10 stat + 11 timeseries).

| Metric | Paired | Threshold line |
| --- | --- | --- |
| elastickv_pebble_l0_sublevels | ✓ | red @ 20 (write-stall start) |
| elastickv_pebble_l0_num_files | ✓ | red @ 1000 |
| elastickv_pebble_memtable_count | ✓ | red @ 5 |
| elastickv_pebble_memtable_size_bytes | ✓ | |
| elastickv_pebble_memtable_zombie_count | ✓ | |
| elastickv_pebble_compact_estimated_debt_bytes | ✓ | |
| elastickv_pebble_compact_in_progress | ✓ | |
| elastickv_pebble_block_cache_hit_rate | ✓ (pre-existing) | |
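
Each paired stat carries colour thresholds like the following sketch (values shown for `elastickv_pebble_l0_sublevels`, which steps yellow at 10 and red at 15 per the commit message; the surrounding panel fields are elided, so this is illustrative rather than the verbatim panel JSON):

```json
{
  "fieldConfig": {
    "defaults": {
      "thresholds": {
        "mode": "absolute",
        "steps": [
          { "color": "green", "value": null },
          { "color": "yellow", "value": 10 },
          { "color": "red", "value": 15 }
        ]
      }
    }
  }
}
```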

Rate counters (compact_count_total, block_cache_hits_total, block_cache_misses_total) left as timeseries-only — an instantaneous rate stat is meaningless.

elastickv_store_write_conflict_total is labelled by key_prefix, not node_id, so a per-node stat reduction doesn't apply; it stays a single timeseries.

elastickv_pebble_fsm_apply_sync_mode stays a single stat (boolean mode indicator).

Layout: two 12-wide × 8-high panels per row, grouped under Block Cache / LSM Level State (collapsed) / Compaction / FSM Apply / Store Write Conflicts rows. Queries are standardised on `max by (node_id)`.
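
The standardised query shape, as it appears in the memtable size panel's `expr` (quoted in the review excerpt below), looks like this — the `job` and `node_id` matchers come from the dashboard's template variables:

```promql
# One series per node: coalesce the group / node_address dimensions.
max by (node_id) (
  elastickv_pebble_memtable_size_bytes{job="elastickv", node_id=~"$node_id"}
)
```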

Test plan

  • JSON valid: `python3 -c "import json; json.load(open('monitoring/grafana/dashboards/elastickv-pebble-internals.json'))"`
  • Import into Grafana, panels render without errors (operator verification)
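
The stated panel shape (26 panels: 5 rows + 10 stat + 11 timeseries) can also be checked mechanically. A minimal sketch — the helper name is an assumption, not part of this PR, and the real check would `json.load()` the dashboard file instead of the inline stand-in:

```python
import json
from collections import Counter

def panel_type_counts(dashboard: dict) -> Counter:
    """Count panel types, descending into rows' nested child panels."""
    counts = Counter()
    for panel in dashboard.get("panels", []):
        counts[panel["type"]] += 1
        # Collapsed rows keep their children inline under "panels".
        for child in panel.get("panels", []):
            counts[child["type"]] += 1
    return counts

# Tiny stand-in dashboard; swap in the real file for the actual check.
demo = {"panels": [
    {"type": "row", "panels": [{"type": "stat"}, {"type": "timeseries"}]},
    {"type": "stat"},
    {"type": "timeseries"},
]}
counts = panel_type_counts(demo)
print(dict(counts))  # {'row': 1, 'stat': 2, 'timeseries': 2}
```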

/gemini review
@codex review

…l thresholds

Restructures the Pebble Internals dashboard so every gauge-style signal
is represented by two panels: a stat (current per-node value with colour
thresholds) and a timeseries (per-node trend, with red threshold lines
at the Pebble write-stall points where applicable).

Signals paired:
- L0 sublevels (stat thresholds 10/15, line at 20)
- L0 num files (stat thresholds 500/1000, line at 1000)
- Compaction estimated debt (stat thresholds 100/500 MiB)
- Compactions in progress (stat thresholds 1/3)
- Memtable count (stat thresholds 3/5, line at 5)
- Memtable size bytes
- Memtable zombie count

Rate-of-counter metrics (compaction rate, cache hits/misses rate, store
write conflicts by prefix) and the FSM apply sync mode indicator stay
single-panel because a point-in-time rate or boolean has no meaningful
stat companion.

Panels are grouped into rows: Block Cache, LSM Level State (collapsed),
Compaction, FSM Apply, Store Write Conflicts. Queries use
`max by (node_id)` to coalesce the group / node_address dimensions the
Pebble collector emits. Grid layout is two 12-wide panels per visual
row at height 8.
@gemini-code-assist (Contributor) commented

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@coderabbitai Bot commented Apr 23, 2026

Warning

Rate limit exceeded

@bootjp has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 20 minutes and 8 seconds before requesting another review.


ℹ️ Review info
  • Configuration used: defaults
  • Review profile: CHILL
  • Plan: Pro
  • Run ID: 8c29060f-8626-497e-a4e1-b779fc4a998e

📥 Commits

Reviewing files that changed from the base of the PR and between 99c0560 and 050b4d9.

📒 Files selected for processing (1)
  • monitoring/grafana/dashboards/elastickv-pebble-internals.json


@chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 050b4d93a2


```json
{
  "datasource": "$datasource",
  "editorMode": "code",
  "expr": "max by (node_id) (elastickv_pebble_memtable_size_bytes{job=\"elastickv\",node_id=~\"$node_id\"})",
```

P2: Aggregate memtable bytes across groups per node

On nodes that host multiple Raft groups, elastickv_pebble_memtable_size_bytes is emitted per group, so max by (node_id) only keeps the largest group and drops the rest. That under-reports true per-node memtable footprint (e.g., two 200MiB groups still display 200MiB), which can hide pressure in both the new stat thresholds and the paired trend panel; this should be sum by (node_id) if the panel is intended to represent per-node memory usage.
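
The under-reporting is easy to see with toy numbers (the node and group names below are hypothetical samples, not real metric output):

```python
# Per-group memtable sizes on one node: two Raft groups at 200 MiB each.
MIB = 1 << 20
samples = {("node-1", "g1"): 200 * MIB, ("node-1", "g2"): 200 * MIB}

by_node_max: dict[str, int] = {}
by_node_sum: dict[str, int] = {}
for (node, _group), size in samples.items():
    by_node_max[node] = max(by_node_max.get(node, 0), size)
    by_node_sum[node] = by_node_sum.get(node, 0) + size

# max by (node_id) keeps only the largest group ...
print(by_node_max["node-1"] // MIB)  # 200
# ... while sum by (node_id) reports the true per-node footprint.
print(by_node_sum["node-1"] // MIB)  # 400
```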

