Skip to content

perf(store): add ELASTICKV_FSM_SYNC_MODE for FSM apply fsync opt-out#592

Merged
bootjp merged 4 commits intomainfrom
perf/fsm-apply-group-commit
Apr 23, 2026
Merged

perf(store): add ELASTICKV_FSM_SYNC_MODE for FSM apply fsync opt-out#592
bootjp merged 4 commits intomainfrom
perf/fsm-apply-group-commit

Conversation

@bootjp
Copy link
Copy Markdown
Owner

@bootjp bootjp commented Apr 23, 2026

Root cause

Every committed raft entry triggered a pebble.Sync inside store.ApplyMutations / store.DeletePrefixAt (see store/lsm_store.go:1056, 1108), on top of the raft WAL fsync that etcd/raft already performs per Ready batch in persistReadyToWAL (internal/raftengine/etcd/wal_store.go:376).

The raft Ready loop in drainReady (internal/raftengine/etcd/engine.go:1389) already batches multiple entries per Ready, so the raft WAL fsync is fine. The hot fsync is the FSM-side b.Commit(pebble.Sync) that the apply loop (applyCommitted) hits once per entry. A prior cleanup (docs/review_todo.md section 3.4) intentionally kept ApplyMutations on pebble.Sync; this CL makes that choice tunable.

Microbenchmark (Apple M1 Max, APFS tempdir, BenchmarkApplyMutations_SyncMode):

mode ns/op allocs/op
sync 16292899 7
nosync 16293 8

~1000x on this platform. Real hardware fsync latency varies, but the sync/nosync ratio is consistently large on any WAL that enforces platform durability.

Durability argument

Pebble's FSM-commit fsync is redundant with the raft WAL under this codebase's crash-recovery model:

  1. Raft WAL (etcd/raft) fsyncs every committed entry via persist.Save before Advance.
  2. On restart, newMemoryStorage (internal/raftengine/etcd/persistence.go:352) reloads the snapshot + all WAL entries. newRawNode does not set Config.Applied, so etcdraft defaults it to snapshot.Metadata.Index.
  3. The engine sets e.applied = maxAppliedIndex(LocalSnap) and every committed entry past the snapshot is re-delivered through CommittedEntries on the first Ready.
  4. kv/fsm.applyCommitWithIdempotencyFallback treats an already-committed key (LatestCommitTS >= commitTS) as an idempotent retry, so replaying an entry whose effect survived the crash is safe.
  5. FSM snapshots are fsynced (writeFSMSnapshotFile then f.Sync() in fsm_snapshot_file.go).

Therefore a crash that loses the unfsynced tail of Pebble's own WAL is recoverable: raft replays from the last fsynced FSM snapshot onwards, and the idempotent apply path re-materialises the lost state. Pebble on the FSM commit path effectively becomes a volatile cache of applied state whose durability boundary is the raft WAL.

Other pebble.Sync call sites (snapshot-batch commit, metadata-restore writes, compaction commitSnapshotBatch) are untouched: those are orthogonal durability boundaries (e.g. restore-directory swap) and are not per-proposal cost.

Env var + default

  • ELASTICKV_FSM_SYNC_MODE=sync (default) - current behaviour.
  • ELASTICKV_FSM_SYNC_MODE=nosync - b.Commit(pebble.NoSync) on the FSM hot path. Raft WAL remains the durability boundary.

Unknown values fall back to sync (fail-safe toward durability). Parsing is case-insensitive and whitespace-tolerant.

A Prometheus gauge elastickv_fsm_apply_sync_mode{mode="sync"|"nosync"} is set at NewRegistry time via store.FSMApplySyncModeLabel(), so dashboards can alert if a rolling deploy accidentally flips the durability posture.

Test plan

  • go test ./store/... ./monitoring/... ./kv/... ./internal/raftengine/... -count=1
  • Env var parsing: sync/nosync/mixed-case/whitespace/unknown then sync default
  • Functional equivalence of sync vs nosync on a Pebble store
  • Clean-shutdown reopen visibility (NoSync + Close + reopen preserves writes)
  • Prometheus gauge mutual exclusivity across successive SetFSMApplySyncMode calls
  • (Follow-up) Jepsen-style OS-level crash test for unfsynced-tail recovery - tracked in JEPSEN_TODO.md

Benchmark

go test ./store -run='^$' -bench='BenchmarkApplyMutations_SyncMode' -benchtime=2s -benchmem
BenchmarkApplyMutations_SyncMode/sync-10      141   16292899 ns/op   368 B/op   7 allocs/op
BenchmarkApplyMutations_SyncMode/nosync-10    129262     16293 ns/op   284 B/op   8 allocs/op

Related

Summary by CodeRabbit

  • New Features

    • Configuration option for FSM commit durability mode with adjustable sync and no-sync settings
    • New Prometheus metric exposing the current durability mode on the /metrics endpoint
  • Tests

    • Unit tests validating FSM durability mode functionality
    • Integration tests ensuring functional correctness across different durability modes
    • Performance benchmarks evaluating FSM commit operation characteristics

Every committed raft entry triggered a pebble.Sync in ApplyMutations /
DeletePrefixAt, on top of the raft WAL fsync that etcd upstream
already performs per Ready batch. Under write-heavy workloads this
per-proposal fsync dominates p50 latency (Apple M1 Max: 16 ms/op on
an APFS tempdir vs 16 us/op with NoSync -- three orders of magnitude).

Because raft re-delivers all committed entries past the last FSM
snapshot on restart (Config.Applied defaults to snapshot.Metadata.Index
and kv/fsm.applyCommitWithIdempotencyFallback treats already-committed
writes as idempotent retries), Pebble durability on the FSM commit
path is redundant with raft-WAL durability: a crash that loses the
tail of the Pebble WAL is recoverable via raft-log replay from the
fsynced FSM snapshot.

This CL exposes that lever behind ELASTICKV_FSM_SYNC_MODE, defaulting
to "sync" so production behaviour is unchanged. Operators can opt in
to "nosync" to trade FSM-state crash durability (re-covered from raft)
for write throughput. Other pebble.Sync call sites (snapshot commit,
metadata boundary writes, compaction) are untouched: their durability
contract is orthogonal to the raft log.

Also adds an elastickv_fsm_apply_sync_mode gauge so dashboards alert
on unexpected posture changes, wired at NewRegistry time via
store.FSMApplySyncModeLabel().

Tests: env-var parsing (sync / nosync / case / whitespace / unknown
fallback), functional equivalence of both modes, and clean-shutdown
reopen visibility. Crash-recovery via un-fsynced WAL tail loss is an
OS-level scenario tracked in JEPSEN_TODO and not a Go unit test.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 23, 2026

Warning

Rate limit exceeded

@bootjp has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 41 minutes and 51 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 41 minutes and 51 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b62e6cbc-70cb-497d-90fb-96995377acf0

📥 Commits

Reviewing files that changed from the base of the PR and between 88910f9 and 1236dae.

📒 Files selected for processing (3)
  • monitoring/pebble.go
  • monitoring/pebble_test.go
  • store/lsm_store_sync_mode_benchmark_test.go
📝 Walkthrough

Walkthrough

This PR adds environment-controlled Pebble WAL durability configuration for FSM commits through a new ELASTICKV_FSM_SYNC_MODE environment variable supporting "sync" and "nosync" modes, with corresponding Prometheus metrics to expose the active mode at startup and runtime.

Changes

Cohort / File(s) Summary
Startup Configuration
main.go, monitoring/registry.go
Adds registry method SetFSMApplySyncMode() to propagate resolved FSM sync mode to metrics during initialization, with nil-safety checks.
Pebble Metrics
monitoring/pebble.go, monitoring/pebble_test.go
Introduces elastickv_fsm_apply_sync_mode Prometheus GaugeVec metric and SetFSMApplySyncMode() method to track active durability mode; includes tests for mode flipping and nil-safety.
LSM Store Durability Control
store/lsm_store.go, store/lsm_store_env_test.go
Adds environment-driven FSM apply write options configuration via ELASTICKV_FSM_SYNC_MODE, parsing logic with fallback to sync, package-level fsmApplyWriteOpts variable, and FSMApplySyncModeLabel() accessor for metrics integration; updates ApplyMutations() and DeletePrefixAt() to use configured write options.
Sync Mode Testing & Benchmarking
store/lsm_store_sync_mode_test.go, store/lsm_store_sync_mode_benchmark_test.go
Adds functional tests validating data visibility and durability under NoSync mode, plus write-heavy benchmark (BenchmarkApplyMutations_SyncMode) measuring per-operation latency and throughput across sync modes.

Sequence Diagram(s)

sequenceDiagram
    participant main as Startup Process
    participant store as LSM Store<br/>(init)
    participant registry as Metrics Registry
    participant pebble as Pebble Metrics
    participant metrics as /metrics Endpoint

    main->>store: Load process
    activate store
    store->>store: Parse ELASTICKV_FSM_SYNC_MODE env var<br/>(default: "sync")
    store->>store: Resolve to fsmApplyWriteOpts<br/>& fsmApplySyncModeLabel
    deactivate store

    main->>registry: NewRegistry()
    activate registry
    registry->>pebble: NewPebbleMetrics()
    activate pebble
    pebble->>pebble: Register elastickv_fsm_apply_sync_mode<br/>GaugeVec
    deactivate pebble
    deactivate registry

    main->>store: FSMApplySyncModeLabel()
    store-->>main: Return active mode label<br/>("sync" or "nosync")

    main->>registry: SetFSMApplySyncMode(activeLabel)
    activate registry
    registry->>pebble: SetFSMApplySyncMode(activeLabel)
    activate pebble
    pebble->>pebble: Set gauge[activeLabel]=1,<br/>others=0
    deactivate pebble
    deactivate registry

    metrics->>pebble: Scrape /metrics
    pebble-->>metrics: elastickv_fsm_apply_sync_mode{mode="sync"}=1
Loading
sequenceDiagram
    participant client as Client
    participant store as PebbleStore<br/>(ApplyMutations)
    participant batch as Pebble Batch
    participant wal as Pebble WAL

    client->>store: ApplyMutations(mutations)
    activate store
    store->>batch: NewBatch()
    loop Each mutation
        batch->>batch: Apply put/delete
    end
    store->>batch: Commit(fsmApplyWriteOpts)
    activate batch
    alt fsmApplyWriteOpts = Sync
        batch->>wal: Sync write (durable)
    else fsmApplyWriteOpts = NoSync
        batch->>wal: Async write (performance)
    end
    batch-->>store: ✓ Committed
    deactivate batch
    store-->>client: Success
    deactivate store
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

A rabbit hops through metrics bright, 🐰
Sync or NoSync—choosing durability's right,
Environment variables guide the way,
Pebble WAL dances, come what may,
Benchmarks measure each eager hop's delight! 📊✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and specifically describes the main change: introducing an environment variable ELASTICKV_FSM_SYNC_MODE that allows opting out of fsync on the FSM apply path for performance.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch perf/fsm-apply-group-commit

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
store/lsm_store.go (1)

1177-1180: Add matching NoSync coverage for DeletePrefixAt.

DeletePrefixAt now uses the same tunable FSM commit option as ApplyMutations, but the new NoSync functional/reopen tests shown only exercise ApplyMutations. A small test that applies keys, calls DeletePrefixAt under pebble.NoSync, then verifies visibility after clean reopen would catch regressions where this path accidentally goes back to pebble.Sync.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@store/lsm_store.go` around lines 1177 - 1180, Add a functional test that
mirrors the existing NoSync coverage for ApplyMutations but targets
DeletePrefixAt: create keys, call DeletePrefixAt using the same tunable FSM
commit option (pebble.NoSync / the fsmApplyWriteOpts variant for NoSync),
perform a clean reopen of the DB, and assert that deletions are visible after
reopen; this ensures DeletePrefixAt (the method) uses the same commit path as
ApplyMutations rather than falling back to pebble.Sync. Include references to
DeletePrefixAt, ApplyMutations, fsmApplyWriteOpts and pebble.NoSync when
locating where to add the test and to reuse the same commit-option setup.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@monitoring/pebble.go`:
- Around line 176-195: SetFSMApplySyncMode on PebbleMetrics currently accepts
any activeLabel and can create arbitrary metric labels; change it so unknown
values fallback to "sync" before touching the gauge. In the SetFSMApplySyncMode
function, validate activeLabel against "sync" and "nosync" (treat anything else
as "sync"), then zero the known labels via
m.fsmApplySyncMode.WithLabelValues("sync").Set(0) and ("nosync").Set(0) and
finally set the validated label to 1; keep the nil checks for m and
m.fsmApplySyncMode intact.

---

Nitpick comments:
In `@store/lsm_store.go`:
- Around line 1177-1180: Add a functional test that mirrors the existing NoSync
coverage for ApplyMutations but targets DeletePrefixAt: create keys, call
DeletePrefixAt using the same tunable FSM commit option (pebble.NoSync / the
fsmApplyWriteOpts variant for NoSync), perform a clean reopen of the DB, and
assert that deletions are visible after reopen; this ensures DeletePrefixAt (the
method) uses the same commit path as ApplyMutations rather than falling back to
pebble.Sync. Include references to DeletePrefixAt, ApplyMutations,
fsmApplyWriteOpts and pebble.NoSync when locating where to add the test and to
reuse the same commit-option setup.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 941ef56a-2487-4849-9307-87010ec04347

📥 Commits

Reviewing files that changed from the base of the PR and between 306eb0b and 88910f9.

📒 Files selected for processing (8)
  • main.go
  • monitoring/pebble.go
  • monitoring/pebble_test.go
  • monitoring/registry.go
  • store/lsm_store.go
  • store/lsm_store_env_test.go
  • store/lsm_store_sync_mode_benchmark_test.go
  • store/lsm_store_sync_mode_test.go

Comment thread monitoring/pebble.go
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an optional "nosync" mode for FSM commits, allowing operators to prioritize performance over local durability when Raft log replay is sufficient for recovery. The changes include environment variable resolution for ELASTICKV_FSM_SYNC_MODE, updates to the Pebble store's commit logic, and new Prometheus metrics for observability. Review feedback points out a discrepancy between the metrics update documentation and its implementation regarding label resetting, and highlights potential thread-safety concerns in tests due to the use of global variables for configuration.

Comment thread monitoring/pebble.go Outdated
Comment on lines +182 to +183
// env var. Invoking again is safe and idempotent: the new label goes to
// 1 and all previously-set labels go to 0.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The docstring claims that invoking this method again is idempotent and that "all previously-set labels go to 0". However, the implementation only explicitly zeros the "sync" and "nosync" labels. If a custom label was previously set (which the method allows), it will remain at 1, violating the mutual exclusivity property described. Consider updating the docstring to clarify that only the documented modes are reset, or use m.fsmApplySyncMode.Reset() if the intention is to clear all series (though Reset() may cause gaps in Prometheus scrapes).

Comment thread store/lsm_store_env_test.go Outdated
Comment on lines +97 to +107
func setFSMApplyWriteOptsForTest(t *testing.T, opts *pebble.WriteOptions, label string) {
t.Helper()
prevOpts := fsmApplyWriteOpts
prevLabel := fsmApplySyncModeLabel
fsmApplyWriteOpts = opts
fsmApplySyncModeLabel = label
t.Cleanup(func() {
fsmApplyWriteOpts = prevOpts
fsmApplySyncModeLabel = prevLabel
})
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The test helper setFSMApplyWriteOptsForTest modifies package-level global variables (fsmApplyWriteOpts and fsmApplySyncModeLabel). While it uses t.Cleanup to restore state, this approach is not thread-safe if tests within this package are ever run in parallel (e.g., via t.Parallel()). Although current tests do not appear to use parallelism, this global mutation is a potential source of flakiness for future tests. Consider refactoring the store to accept WriteOptions as a dependency in NewPebbleStore to avoid global state in tests. Note that per repository guidelines, wide-reaching API changes should be deferred to a dedicated follow-up pull request to manage complexity.

References
  1. During large refactorings, limit the scope of a single pull request. Defer related but wide-reaching API changes to dedicated follow-up pull requests to manage complexity and avoid scope creep.

bootjp added 2 commits April 23, 2026 23:08
gosec G115 flagged uint64(i) * 2 on the benchmark's MVCC timestamp
math. The loop counter is non-negative by construction, but the
linter cannot prove it. Add an explicit if i < 0 guard so the
conversion is provably safe and the reviewdog warning clears.

Addresses PR #592 review feedback from coderabbit / reviewdog.
SetFSMApplySyncMode previously accepted any label verbatim, which
broke the promised two-row gauge shape in two ways: (1) a prior
unknown label was left at 1 because only sync/nosync were zeroed
explicitly, and (2) the store's resolver maps unknown
ELASTICKV_FSM_SYNC_MODE values to sync, so the monitoring gauge
could disagree with the actual WriteOptions in use.

Coerce any label outside {sync, nosync} to sync before setting,
matching store.resolveFSMApplyWriteOpts's fallback. Update the
docstring and add a dedicated test that exercises the coercion
against a primed nosync posture.

Addresses PR #592 review feedback from coderabbit and gemini.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 23, 2026

Thanks for the reviews. Pushed 56af1db and 1236dae addressing the actionable items:

1. gosec G115 in lsm_store_sync_mode_benchmark_test.go (coderabbit / reviewdog)

Added an explicit if i < 0 guard before the uint64(i) * 2 conversion in BenchmarkApplyMutations_SyncMode. The counter is non-negative by construction, but the explicit guard is what the linter needs to clear G115.

2. SetFSMApplySyncMode unknown-label handling (coderabbit minor + gemini medium)

Previously, an unrecognised activeLabel was accepted verbatim: it stayed pinned at 1 across subsequent calls (only "sync" / "nosync" were explicitly zeroed), and the gauge could disagree with the actual pebble.WriteOptions the store uses, since store.resolveFSMApplyWriteOpts maps unknown ELASTICKV_FSM_SYNC_MODE values to sync.

Fixed by coercing any label outside {"sync","nosync"} to "sync" before setting — same fallback the store resolver uses, so the two paths agree. Docstring updated; new test TestSetFSMApplySyncMode_UnknownLabelCoercesToSync primes the gauge with nosync, calls SetFSMApplySyncMode("batch"), and asserts sync=1 / nosync=0.

3. Global-state test helper setFSMApplyWriteOptsForTest (gemini medium) — deferred

Leaving this as-is for this PR. The reviewer's own "References" section on that comment cites the repo guideline: "During large refactorings, limit the scope of a single pull request. Defer related but wide-reaching API changes to dedicated follow-up pull requests to manage complexity and avoid scope creep." Changing NewPebbleStore to accept WriteOptions as a dependency is exactly that kind of wide-reaching API change — it ripples through every store construction site (main, engine bootstrap, tests, benchmarks) and is orthogonal to the durability opt-out this PR introduces. Happy to do it in a follow-up; not in this PR.

Local verification:

  • go test -race ./store/... ./monitoring/... — pass
  • golangci-lint run ./store/... ./monitoring/... — 0 issues

/gemini review
@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Nice work!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a configurable FSM apply sync mode via the ELASTICKV_FSM_SYNC_MODE environment variable, allowing operators to choose between sync and nosync for Pebble write operations. It includes updates to the metrics registry to surface the active durability posture, comprehensive unit tests for environment variable parsing and functional equivalence, and benchmarks comparing the performance of the two modes. Feedback was provided regarding the use of package-level variables for core configuration, suggesting they be moved into the pebbleStore struct to improve test safety and avoid global state mutation.

Comment thread store/lsm_store.go Outdated
Comment on lines +103 to +110
// fsmApplyWriteOpts is the Pebble WriteOptions value applied on the FSM
// commit path. Resolved once from ELASTICKV_FSM_SYNC_MODE at init() and
// then treated as read-only. Exposed as a package variable so tests can
// swap it via setFSMApplyWriteOptsForTest; production code must not
// mutate it after init().
//
// The zero (unset) state is `pebble.Sync`, preserving legacy behaviour.
var fsmApplyWriteOpts = pebble.Sync
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using a package-level variable for core write path configuration is brittle and can lead to race conditions or flakiness in tests. While the init() logic ensures it is set once at startup, the test helpers (setFSMApplyWriteOptsForTest) mutate this global state, which is not thread-safe if tests are run in parallel.

Consider moving applyWriteOpts into the pebbleStore struct and initializing it in NewPebbleStore. This would allow for per-instance configuration and safer testing without relying on global state mutation.

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 23, 2026

Re-raising the same item. This PR intentionally defers the NewPebbleStore(WriteOptions) dependency-injection refactor to a follow-up.

Justification (from the previous round): gemini's own reference note says "During large refactorings, limit the scope of a single pull request. Defer related but wide-reaching API changes to dedicated follow-up pull requests to manage complexity and avoid scope creep." This PR's scope is the ELASTICKV_FSM_SYNC_MODE knob + the mutual-exclusivity gauge shape; changing NewPebbleStore's signature would touch every call site across store/, kv/, adapter/, main.go, plus every test harness that constructs a store. That diff would dominate the diff, obscure the perf change, and invite merge conflicts with other in-flight work.

Concretely for test safety: setFSMApplyWriteOptsForTest already uses t.Cleanup to restore the prior value, and no test in this package uses t.Parallel(). The race window is zero under the current harness. If future parallelism lands, the helper can be swapped for struct-injected options in a dedicated refactor PR without changing the production semantics of this one.

Leaving item as-is. No new commits.

/gemini review
@codex review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a configurable FSM apply sync mode via the ELASTICKV_FSM_SYNC_MODE environment variable, allowing operators to choose between "sync" (default) and "nosync" for Pebble write operations. The changes update ApplyMutations and DeletePrefixAt to respect the selected mode and add a Prometheus gauge to monitor the active durability posture. Feedback was provided to simplify the environment variable resolution logic by merging redundant switch cases.

Comment thread store/lsm_store.go
Comment on lines +136 to +143
switch strings.ToLower(strings.TrimSpace(envVal)) {
case fsmSyncModeNoSync:
return pebble.NoSync, fsmSyncModeNoSync
case "", fsmSyncModeSync:
return pebble.Sync, fsmSyncModeSync
default:
return pebble.Sync, fsmSyncModeSync
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The switch statement can be simplified by merging the redundant cases that fall back to the default sync behavior. This improves readability and maintainability by removing unnecessary code paths.

func resolveFSMApplyWriteOpts(envVal string) (*pebble.WriteOptions, string) {
	switch strings.ToLower(strings.TrimSpace(envVal)) {
	case fsmSyncModeNoSync:
		return pebble.NoSync, fsmSyncModeNoSync
	default:
		return pebble.Sync, fsmSyncModeSync
	}
}
References
  1. Remove dead or unreachable code paths to improve code clarity and maintainability.

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Chef's kiss.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

…re field

Per the gemini review on PR #592, migrate the FSM apply write options
(and its sync-mode label) from package-level globals onto the
pebbleStore struct so each store instance owns its resolved
ELASTICKV_FSM_SYNC_MODE state. NewPebbleStore now calls
resolveFSMApplyWriteOpts directly and stores the result; the
ApplyMutations and DeletePrefixAt commit paths read s.fsmApplyWriteOpts
instead of the package var.

- Remove the package-level fsmApplyWriteOpts / fsmApplySyncModeLabel
  vars and the init() assignment that populated them.
- Add fsmApplyWriteOpts and fsmApplySyncModeLabel fields on
  pebbleStore plus a (*pebbleStore).FSMApplySyncModeLabel() accessor.
- Drop setFSMApplyWriteOptsForTest / setBenchFSMApplyWriteOpts helpers;
  replace with newPebbleStoreWithFSMApplyWriteOptsForTest which builds
  a real store and overrides the fields, so tests exercise the full
  construction path without mutating globals.
- Main wiring now reads the label off the first built shard store via
  a fsmApplySyncModeLabeler type assertion after buildShardGroups,
  skipping the gauge update if no store exposes the accessor.

Env-var parsing coverage (sync / nosync / mixed-case / whitespace /
unknown / empty / garbage) remains exercised directly against
resolveFSMApplyWriteOpts, and the NoSync functional-equivalence and
reopen-visibility tests continue to validate the commit-path behaviour
through the per-store override.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 23, 2026

@bootjp bootjp enabled auto-merge April 23, 2026 16:21
@bootjp bootjp merged commit bee7ead into main Apr 23, 2026
7 checks passed
@bootjp bootjp deleted the perf/fsm-apply-group-commit branch April 23, 2026 16:22
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: de873f5478

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread store/lsm_store.go
// via ELASTICKV_FSM_SYNC_MODE=nosync when the raft WAL's durability is
// considered sufficient (raft-log replay from the last FSM snapshot
// re-applies any entries lost from Pebble after a crash).
if err := b.Commit(s.fsmApplyWriteOpts); err != nil {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep non-Raft ApplyMutations commits synced

Switching ApplyMutations to s.fsmApplyWriteOpts makes ELASTICKV_FSM_SYNC_MODE=nosync affect callers that do not have Raft-log replay as a durability backstop. A concrete production path is distribution.EnsureCatalogSnapshot -> CatalogStore.Save -> store.ApplyMutations, which writes catalog state directly; if the process crashes before Pebble flushes, those acknowledged writes can be lost with no Raft entry to re-apply. This change should be scoped to true FSM/raft-apply call paths only, or direct callers must continue to force pebble.Sync.

Useful? React with 👍 / 👎.

bootjp added a commit that referenced this pull request Apr 23, 2026
Codex flagged that the "Unsupported Commands by Name" (Redis summary) and
"FSM Apply Sync Mode" (Pebble internals) panels query metrics not yet on
main, so during an incident operators could mistake the empty panels for
a scrape failure. Append a sentence to each panel's description calling
out the dependency and that the panel will populate automatically once
PR #594 / PR #592 merge.
bootjp added a commit that referenced this pull request Apr 23, 2026
…ctor Redis/Cluster dashboards (#597)

## Summary

- Add **Elastickv Pebble Internals** dashboard covering block cache hit
rate/capacity, L0 pressure, compactions, memtables, FSM apply sync mode,
and store write conflicts.
- Add **Unsupported Commands by Name** panel to the Redis dashboard
(driven by PR #594 `elastickv_redis_unsupported_commands_total`) and
annotate the legacy **Errors by Command** panel so operators know the
`unknown` bucket is broken down by real name in the new panel.
- Fold the standalone Redis hot-path (PR #560) dashboard into a
collapsed row at the bottom of the Redis summary dashboard, delete the
standalone file, and retitle the summary to **Elastickv Redis** so there
is a single Redis-facing dashboard.
- Rename the misnamed `elastickv-cluster-summary.json` (which was
already a pure DynamoDB dashboard) to `elastickv-dynamodb.json` /
**Elastickv DynamoDB** with `uid=elastickv-dynamodb`, and retitle
`elastickv-cluster-overview.json` to **Elastickv Cluster** now that it
is the sole cluster landing page.

## Before / After file list

| State | File |
| --- | --- |
| Added |
`monitoring/grafana/dashboards/elastickv-pebble-internals.json` |
| Renamed | `elastickv-cluster-summary.json` to
`elastickv-dynamodb.json` |
| Retitled | `elastickv-cluster-overview.json` (title now `Elastickv
Cluster`) |
| Modified | `elastickv-redis-summary.json` (title `Elastickv Redis`;
added Unsupported Commands panel and a collapsed Hot Path row) |
| Deleted | `monitoring/grafana/dashboards/elastickv-redis-hotpath.json`
|
| Unchanged | `monitoring/grafana/dashboards/elastickv-raft-status.json`
|

## What each dashboard now covers

- **Elastickv Cluster** - leader identity, cluster-wide latency/error
posture, per-node Raft health, and an embedded Redis row.
- **Elastickv DynamoDB** - per-operation throughput/latency and
per-table error/volume breakdown for the DynamoDB-compatible API.
- **Elastickv Raft Status** - unchanged: membership, leader changes,
failed proposals, node state, index drift, backlog, leader contact.
- **Elastickv Redis** - per-command throughput/latency/errors,
unsupported-commands breakdown, secondary-write and Lua diagnostics, and
a collapsible `Hot Path (legacy PR #560)` row with the 9 GET-fast-path
panels preserved verbatim.
- **Elastickv Pebble Internals** - block cache hit rate and
size-vs-capacity, hits/misses rate, L0 sublevels/files with write-stall
thresholds, compaction debt/progress/rate, memtable count/size/zombies,
FSM apply sync mode, store write conflicts by prefix.

## Dead-metric warnings

Two metrics referenced by panels in this PR are **not yet present in
`monitoring/*.go` on `main`**:

- `elastickv_redis_unsupported_commands_total` (PR #594) - the spec
asked for a panel targeting this and this PR adds one; it will stay
empty until #594 lands.
- `elastickv_pebble_fsm_apply_sync_mode` (PR #592) - same situation;
panel is in place and will populate once #592 lands.

Both are noted here so reviewers can decide whether to sequence this PR
after the upstream ones.

## Test plan

- [x] `python3 -c "import json; json.load(open('<file>'))"` passes on
every modified dashboard file.
- [x] Every dashboard exposes `schemaVersion`, `panels`, `time`,
`title`, `uid`.
- [x] Panel IDs are unique within each dashboard (including nested
hot-path panels).
- [x] No overlapping gridPos cells in the modified Redis dashboard
(checked with a programmatic sweep).
- [x] `go build ./...` succeeds after updating the four source comments
/ doc page that referenced the deleted hotpath JSON by path.
- [x] `go test ./...` passes, including `./monitoring/...`.
- [ ] Import each of the five JSON files into Grafana (10.x,
schemaVersion 41) via UI; every panel renders without errors and the Hot
Path row expands/collapses cleanly.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 23, 2026

Follow-up on the codex P2 correctness concern raised during this PR's review (ELASTICKV_FSM_SYNC_MODE=nosync bleeding into non-raft callers).

Addressed in #600: splits ApplyMutations / DeletePrefixAt (now unconditionally pebble.Sync) from new ApplyMutationsRaft / DeletePrefixAtRaft entry points that observe s.fsmApplyWriteOpts. kv/fsm.go is now the only caller of the *Raft variants, so the nosync knob cannot weaken durability for the CatalogStore.Save / EnsureCatalogSnapshot path (or any other direct caller) that has no raft-log replay backstop.

Tests + lint green. See #600 for the before/after table and new coverage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant