obs(pebble): expose LSM internals as prometheus metrics #578
Conversation
Add an elastickv_pebble_* metrics family that periodically polls each
raft group's *pebble.DB.Metrics() snapshot and mirrors the most
operationally useful fields into Prometheus so operators can graph and
alert on write-stall precursors without restarting the process for a
heap dump.
Fields exposed (per-group label):
Gauges (overwritten each tick):
l0_sublevels - canonical write-stall precursor
l0_num_files - L0 fan-in
compact_estimated_debt_bytes - compactions falling behind ingest
compact_in_progress
memtable_count / memtable_size_bytes / memtable_zombie_count
block_cache_size_bytes
Counters (positive-delta only, reset-safe):
compact_count_total
block_cache_hits_total / block_cache_misses_total
Rationale: L0 sublevels and compaction debt are the incident signals
for write stalls; memtable size + zombies catch memtable-related
stalls and long-lived iterators; block cache size/hit/miss track
working-set fit.
Note: pebble v2.1.4 does not expose WriteStallCount / WriteStallDuration
on the Metrics struct (only as EventListener callbacks), so those are
omitted from this pass. L0 sublevels is the more actionable proxy.
Wiring mirrors the existing DispatchCollector shape:
- store.pebbleStore.Metrics() accessor (guarded by dbMu so a
concurrent Restore/Close cannot race the DB swap)
- monitoring.PebbleMetricsSource interface satisfied by
*pebbleStore
- monitoring.PebbleCollector started from main.go next to the
DispatchCollector, on the same 5 s cadence
Tests use a fakePebbleSource returning canned *pebble.Metrics and
assert both the gauge-overwrite semantics and the reset-safe counter
deltas.
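The reset-safe counter semantics described above can be sketched in isolation. This is a minimal stdlib-only illustration, not the PR's code: the `pebbleSnapshot` field names follow the PR, while `observeDelta` and the `add` closure stand in for the real `prometheus.CounterVec.Add` path.

```go
package main

import "fmt"

// pebbleSnapshot mirrors the cumulative counters captured per tick (names from the PR).
type pebbleSnapshot struct {
	compactCount     int64
	blockCacheHits   int64
	blockCacheMisses int64
}

// observeDelta emits only the positive delta between two observations.
// A smaller current value means the source was reset (store reopened);
// the delta is dropped so the downstream counter stays monotonic.
func observeDelta(prev, curr int64, add func(float64)) {
	if curr > prev {
		add(float64(curr - prev))
	}
}

func main() {
	var total float64
	add := func(v float64) { total += v }

	prev := pebbleSnapshot{}
	for _, curr := range []pebbleSnapshot{
		{compactCount: 100}, // first tick: emits the full value
		{compactCount: 130}, // +30
		{compactCount: 5},   // reset: negative delta dropped, rebase silently
		{compactCount: 25},  // +20 from the rebased baseline
	} {
		observeDelta(prev.compactCount, curr.compactCount, add)
		prev = curr
	}
	fmt.Println(total) // 150
}
```

The rebase step (storing `curr` as the new baseline even when the delta is dropped) is what keeps a single reset from suppressing all later deltas.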
Code Review
This pull request introduces a PebbleCollector to monitor Pebble LSM internals, such as L0 sublevels, compaction debt, and block cache usage, exposing them as Prometheus metrics. The implementation includes metrics definitions, polling logic, registry integration, and comprehensive unit tests. A typo was identified in the pebbleSnapshot struct field name blockCacheMisse which should be corrected for consistency.
```go
type pebbleSnapshot struct {
	compactCount    int64
	blockCacheHits  int64
	blockCacheMisse int64
```
Addressed the typo in

/gemini review
Code Review
This pull request introduces a PebbleCollector to monitor and expose Pebble LSM internal metrics, such as L0 pressure, compaction debt, and cache usage, via Prometheus. The implementation includes logic to handle monotonic counters and source resets. Feedback suggests improving naming consistency for block cache metrics and optimizing performance by pre-calculating group ID strings to reduce allocations during the polling loop.
```go
// Block cache.
blockCacheSizeBytes *prometheus.GaugeVec
blockCacheHitsTotal *prometheus.CounterVec
blockCacheMissTotal *prometheus.CounterVec
```
The field name blockCacheMissTotal is inconsistent with blockCacheHitsTotal (plural). For better consistency with the other fields and the Prometheus metric name itself, consider renaming this to blockCacheMissesTotal.
```diff
-blockCacheMissTotal *prometheus.CounterVec
+blockCacheMissesTotal *prometheus.CounterVec
```
```go
type PebbleSource struct {
	GroupID uint64
	Source  PebbleMetricsSource
}
```
To avoid repeated allocations during the polling loop, consider pre-calculating the string representation of the GroupID and storing it in the PebbleSource struct. strconv.FormatUint is called for every group on every 5-second tick.
```diff
 type PebbleSource struct {
 	GroupID    uint64
+	GroupIDStr string
 	Source     PebbleMetricsSource
 }
```
```go
if snap == nil {
	continue
}
group := strconv.FormatUint(src.GroupID, 10)
```
```go
out = append(out, monitoring.PebbleSource{
	GroupID: runtime.spec.id,
	Source:  src,
})
```
Addresses gemini-code-assist medium-priority review on PR #578: the polling loop formatted src.GroupID with strconv.FormatUint on every tick (every 5 s per group), causing avoidable allocations. Store the pre-computed decimal string on PebbleSource at construction time and reuse it as the "group" Prometheus label in observeOnce.
- monitoring/pebble.go: add PebbleSource.GroupIDStr; consume it in observeOnce; drop the now-unused strconv import.
- main.go: populate GroupIDStr in pebbleMonitorSources.
- monitoring/pebble_test.go: populate GroupIDStr in test fixtures.
Addressed gemini's latest medium-priority review (2026-04-21T20:53:10Z) in 2af6f78. Change: precompute
Verification: /gemini review
Pull request overview
Adds first-class observability for Pebble/LSM health by exporting selected *pebble.DB.Metrics() internals as Prometheus metrics, integrated into the existing monitoring registry and started alongside other periodic collectors.
Changes:
- Add `pebbleStore.Metrics()` accessor guarded by `dbMu` to safely snapshot Pebble metrics.
- Introduce `monitoring.PebbleMetrics` + `PebbleCollector` to translate Pebble snapshots into `elastickv_pebble_*` gauges/counters.
- Wire the new collector into `main.go` and add unit tests validating gauge overwrite + reset-safe counter deltas.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| store/lsm_store.go | Exposes a thread-safe Metrics() snapshot method on pebbleStore. |
| monitoring/registry.go | Registers Pebble metrics and exposes Registry.PebbleCollector(). |
| monitoring/pebble.go | Implements Prometheus vectors + polling collector for Pebble metrics. |
| monitoring/pebble_test.go | Adds unit tests for overwrite semantics, delta counters, and nil safety. |
| main.go | Starts the Pebble collector and builds per-group Pebble sources. |
```go
// Returns nil while the store is closed or between a Restore/swap (no
// current DB). Callers must handle nil.
//
// Safe for concurrent use: takes the dbMu read lock to protect against
// Restore/Close swapping the DB pointer.
```
The docstring says this returns nil while the store is closed or mid Restore/swap, but the implementation only returns nil when s.db == nil. Restore holds dbMu exclusively (callers will block rather than observe nil), and Close() doesn’t set s.db to nil, so a post-Close call would still hit s.db.Metrics(). Please either update the comment to match actual behavior, or set s.db = nil as part of Close()/swap so the nil contract is true (and avoids calling Metrics() on a closed DB).
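The second option the reviewer mentions (clearing the pointer under the write lock in Close so the nil contract holds) can be sketched as follows. This is a stdlib-only illustration, not the PR's code: `fakeDB` stands in for `*pebble.DB`, and the real `Close()` would also close the underlying DB before dropping the pointer.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeDB stands in for *pebble.DB; Metrics returns a dummy snapshot.
type fakeDB struct{}

func (d *fakeDB) Metrics() *string { s := "snapshot"; return &s }

type pebbleStore struct {
	dbMu sync.RWMutex
	db   *fakeDB
}

// Metrics returns nil once the store is closed; callers must handle nil.
func (s *pebbleStore) Metrics() *string {
	s.dbMu.RLock()
	defer s.dbMu.RUnlock()
	if s.db == nil {
		return nil
	}
	return s.db.Metrics()
}

// Close clears the pointer under the write lock so the nil contract
// documented on Metrics is actually true after Close.
func (s *pebbleStore) Close() {
	s.dbMu.Lock()
	defer s.dbMu.Unlock()
	// the real store would call s.db.Close() here before dropping the pointer
	s.db = nil
}

func main() {
	s := &pebbleStore{db: &fakeDB{}}
	fmt.Println(s.Metrics() != nil) // true
	s.Close()
	fmt.Println(s.Metrics() == nil) // true
}
```

This also avoids calling `Metrics()` on a closed DB, which is the sharper of the two hazards the reviewer raises.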
```go
// dashboard.
//
// The point-in-time fields (Sublevels, NumFiles, EstimatedDebt,
// MemTable.*, NumInProgress, BlockCache.Size/Count) are exposed as
```
The header comment mentions exposing BlockCache.Size/Count as gauges, but this collector only exports block_cache_size_bytes (no block cache count metric). Please update the comment to match what’s actually exported, or add the missing metric if intended.
```diff
-// MemTable.*, NumInProgress, BlockCache.Size/Count) are exposed as
+// MemTable.*, NumInProgress, BlockCache.Size) are exposed as
```
```go
	blockCacheSizeBytes *prometheus.GaugeVec
	blockCacheHitsTotal *prometheus.CounterVec
	blockCacheMissTotal *prometheus.CounterVec
}
```
Field name blockCacheMissTotal is inconsistent with the exported metric name (…_misses_total) and the nearby snapshot field (blockCacheMisses). Consider renaming to blockCacheMissesTotal for clarity/consistency.
- Metrics() docstring now matches actual behavior (nil only pre-Open or after a failed Open; Restore callers block on dbMu).
- Drop "Count" from the collector header comment; only BlockCache.Size is exported.
- Rename blockCacheMissTotal -> blockCacheMissesTotal for consistency with blockCacheHitsTotal, the snapshot field blockCacheMisses, and the exported metric name (..._misses_total).
Addressed Copilot's 3 review items in be3c19b.

Verification: /gemini review
Code Review
This pull request introduces a PebbleCollector to monitor Pebble LSM internals, such as L0 sublevels, compaction debt, and memtable/block cache usage, via Prometheus. It includes the necessary metrics definitions, a collector that polls these metrics on a fixed interval, and integration into the main application lifecycle. Feedback was provided regarding a concurrency issue in the metrics collection loop: holding a mutex while calling potentially blocking Metrics() methods could stall observability for all raft groups on a node. It is recommended to use more granular locking to ensure that a single slow or blocked store does not impact the monitoring of others.
```go
c.mu.Lock()
defer c.mu.Unlock()
```
Holding the PebbleCollector mutex across the entire loop is problematic because src.Source.Metrics() can block. In pebbleStore, Metrics() acquires dbMu.RLock(). If a raft group is performing a long-running operation that holds dbMu exclusively (like a Restore or a large Compact), this collector will block on that group while holding c.mu. This prevents metrics from being updated for all other raft groups on the same node, leading to a total loss of Pebble observability during critical recovery operations. Per repository guidelines, ensure that the blocking semantic of Metrics() is explicitly stated in its interface documentation. Additionally, when performing updates to the previous state, the lock must be held continuously from the read to the assignment to prevent race conditions.
```go
func (c *PebbleCollector) observeOnce(sources []PebbleSource) {
	if c == nil || c.metrics == nil {
		return
	}
	for _, src := range sources {
		if src.Source == nil {
			continue
		}
		snap := src.Source.Metrics()
		if snap == nil {
			continue
		}
		group := src.GroupIDStr

		// L0 pressure: gauges, overwritten each tick.
		c.metrics.l0Sublevels.WithLabelValues(group).Set(float64(snap.Levels[0].Sublevels))
		c.metrics.l0NumFiles.WithLabelValues(group).Set(float64(snap.Levels[0].TablesCount))

		// Compaction.
		c.metrics.compactEstimatedDebt.WithLabelValues(group).Set(float64(snap.Compact.EstimatedDebt))
		c.metrics.compactInProgress.WithLabelValues(group).Set(float64(snap.Compact.NumInProgress))

		// Memtable.
		c.metrics.memtableCount.WithLabelValues(group).Set(float64(snap.MemTable.Count))
		c.metrics.memtableSizeBytes.WithLabelValues(group).Set(float64(snap.MemTable.Size))
		c.metrics.memtableZombieCount.WithLabelValues(group).Set(float64(snap.MemTable.ZombieCount))

		// Block cache gauge.
		c.metrics.blockCacheSizeBytes.WithLabelValues(group).Set(float64(snap.BlockCache.Size))

		// Monotonic counters: emit only the positive delta. A smaller
		// value means the source was reset (store reopened); rebase
		// silently without emitting negative.
		c.mu.Lock()
		prev := c.previous[src.GroupID]
		curr := pebbleSnapshot{
			compactCount:     snap.Compact.Count,
			blockCacheHits:   snap.BlockCache.Hits,
			blockCacheMisses: snap.BlockCache.Misses,
		}
		if curr.compactCount > prev.compactCount {
			c.metrics.compactCountTotal.WithLabelValues(group).Add(float64(curr.compactCount - prev.compactCount))
		}
		if curr.blockCacheHits > prev.blockCacheHits {
			c.metrics.blockCacheHitsTotal.WithLabelValues(group).Add(float64(curr.blockCacheHits - prev.blockCacheHits))
		}
		if curr.blockCacheMisses > prev.blockCacheMisses {
			c.metrics.blockCacheMissesTotal.WithLabelValues(group).Add(float64(curr.blockCacheMisses - prev.blockCacheMisses))
		}
		c.previous[src.GroupID] = curr
		c.mu.Unlock()
	}
}
```

References
- When an interface method is intended to block until a certain condition is met, explicitly state this blocking semantic in the documentation or interface description to avoid ambiguity for implementers.
- When performing atomic operations based on a version or state variable, do not release the lock between reading the variable and performing the conditional check to prevent race conditions and the use of stale data.
/gemini reivew |
It looks like you made a typo in the command |
Summary
- `elastickv_pebble_*` Prometheus metrics family that periodically polls each raft group's `*pebble.DB.Metrics()` snapshot so operators can graph and alert on write-stall precursors.
- Mirrors the `DispatchCollector` shape: a `PebbleMetricsSource` interface (satisfied by the `pebbleStore` via a new `Metrics()` accessor guarded by `dbMu`), a `PebbleCollector` that polls on the same 5 s cadence from `main.go`.
- Tests: a `fakePebbleSource` returning canned `*pebble.Metrics`; asserts gauge-overwrite semantics and reset-safe counter deltas.

Exposed metrics (per `group` label)

Gauges (overwritten each tick):
- `elastickv_pebble_l0_sublevels` - canonical write-stall precursor
- `elastickv_pebble_l0_num_files`
- `elastickv_pebble_compact_estimated_debt_bytes`
- `elastickv_pebble_compact_in_progress`
- `elastickv_pebble_memtable_count` / `_size_bytes` / `_zombie_count`
- `elastickv_pebble_block_cache_size_bytes`

Counters (positive-delta only, reset-safe across Restore/swap):
- `elastickv_pebble_compact_count_total`
- `elastickv_pebble_block_cache_hits_total` / `_misses_total`

Deviations from the task description
- `WriteStallCount` / `WriteStallDuration`: pebble v2.1.4 does not expose those on the `Metrics` struct (they are only delivered via `EventListener.WriteStallBegin/End` callbacks). Wiring an event listener is doable but it would require modifying `defaultPebbleOptions()` to install one - that is a larger surface than this observability pass. L0 sublevels and compaction debt are the more actionable write-stall proxies and are included. Can follow up with an `EventListener`-based stall counter in a separate PR if desired.
- (… `_total` = 100); subsequent observations emit only the positive delta. A source reset (`Restore` reopens the DB with fresh counters) causes a smaller observed value - that delta is silently dropped so the Prometheus counter stays monotonic, mirroring `DispatchCollector` behaviour.

Test plan
- `go build ./...`
- `go test -race -count=1 -short ./monitoring/... ./store/...`
- `make lint` (0 issues)