perf(store): add ELASTICKV_FSM_SYNC_MODE for FSM apply fsync opt-out #592
Every committed raft entry triggered a pebble.Sync in ApplyMutations / DeletePrefixAt, on top of the raft WAL fsync that etcd upstream already performs per Ready batch. Under write-heavy workloads this per-proposal fsync dominates p50 latency (Apple M1 Max: 16 ms/op on an APFS tempdir vs 16 us/op with NoSync, roughly three orders of magnitude).
Because raft re-delivers all committed entries past the last FSM snapshot on restart (Config.Applied defaults to snapshot.Metadata.Index and kv/fsm.applyCommitWithIdempotencyFallback treats already-committed writes as idempotent retries), Pebble durability on the FSM commit path is redundant with raft-WAL durability: a crash that loses the tail of the Pebble WAL is recoverable via raft-log replay from the fsynced FSM snapshot.
This CL exposes that lever behind ELASTICKV_FSM_SYNC_MODE, defaulting to "sync" so production behaviour is unchanged. Operators can opt in to "nosync" to trade FSM-state crash durability (recovered from raft) for write throughput. Other pebble.Sync call sites (snapshot commit, metadata boundary writes, compaction) are untouched: their durability contract is orthogonal to the raft log.
Also adds an elastickv_fsm_apply_sync_mode gauge so dashboards can alert on unexpected posture changes, wired at NewRegistry time via store.FSMApplySyncModeLabel().
Tests: env-var parsing (sync / nosync / case / whitespace / unknown fallback), functional equivalence of both modes, and clean-shutdown reopen visibility. Crash recovery after loss of the un-fsynced WAL tail is an OS-level scenario tracked in JEPSEN_TODO, not a Go unit test.
📝 Walkthrough
This PR adds environment-controlled Pebble WAL durability configuration for FSM commits through a new
Changes
Sequence Diagram(s)
sequenceDiagram
participant main as Startup Process
participant store as LSM Store<br/>(init)
participant registry as Metrics Registry
participant pebble as Pebble Metrics
participant metrics as /metrics Endpoint
main->>store: Load process
activate store
store->>store: Parse ELASTICKV_FSM_SYNC_MODE env var<br/>(default: "sync")
store->>store: Resolve to fsmApplyWriteOpts<br/>& fsmApplySyncModeLabel
deactivate store
main->>registry: NewRegistry()
activate registry
registry->>pebble: NewPebbleMetrics()
activate pebble
pebble->>pebble: Register elastickv_fsm_apply_sync_mode<br/>GaugeVec
deactivate pebble
deactivate registry
main->>store: FSMApplySyncModeLabel()
store-->>main: Return active mode label<br/>("sync" or "nosync")
main->>registry: SetFSMApplySyncMode(activeLabel)
activate registry
registry->>pebble: SetFSMApplySyncMode(activeLabel)
activate pebble
pebble->>pebble: Set gauge[activeLabel]=1,<br/>others=0
deactivate pebble
deactivate registry
metrics->>pebble: Scrape /metrics
pebble-->>metrics: elastickv_fsm_apply_sync_mode{mode="sync"}=1
sequenceDiagram
participant client as Client
participant store as PebbleStore<br/>(ApplyMutations)
participant batch as Pebble Batch
participant wal as Pebble WAL
client->>store: ApplyMutations(mutations)
activate store
store->>batch: NewBatch()
loop Each mutation
batch->>batch: Apply put/delete
end
store->>batch: Commit(fsmApplyWriteOpts)
activate batch
alt fsmApplyWriteOpts = Sync
batch->>wal: Sync write (durable)
else fsmApplyWriteOpts = NoSync
batch->>wal: Write without fsync (performance)
end
batch-->>store: ✓ Committed
deactivate batch
store-->>client: Success
deactivate store
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 1
🧹 Nitpick comments (1)
store/lsm_store.go (1)
1177-1180: Add matching NoSync coverage for DeletePrefixAt.
DeletePrefixAt now uses the same tunable FSM commit option as ApplyMutations, but the new NoSync functional/reopen tests shown only exercise ApplyMutations. A small test that applies keys, calls DeletePrefixAt under pebble.NoSync, then verifies visibility after clean reopen would catch regressions where this path accidentally goes back to pebble.Sync.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@store/lsm_store.go` around lines 1177 - 1180, Add a functional test that mirrors the existing NoSync coverage for ApplyMutations but targets DeletePrefixAt: create keys, call DeletePrefixAt using the same tunable FSM commit option (pebble.NoSync / the fsmApplyWriteOpts variant for NoSync), perform a clean reopen of the DB, and assert that deletions are visible after reopen; this ensures DeletePrefixAt (the method) uses the same commit path as ApplyMutations rather than falling back to pebble.Sync. Include references to DeletePrefixAt, ApplyMutations, fsmApplyWriteOpts and pebble.NoSync when locating where to add the test and to reuse the same commit-option setup.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@monitoring/pebble.go`:
- Around line 176-195: SetFSMApplySyncMode on PebbleMetrics currently accepts
any activeLabel and can create arbitrary metric labels; change it so unknown
values fallback to "sync" before touching the gauge. In the SetFSMApplySyncMode
function, validate activeLabel against "sync" and "nosync" (treat anything else
as "sync"), then zero the known labels via
m.fsmApplySyncMode.WithLabelValues("sync").Set(0) and ("nosync").Set(0) and
finally set the validated label to 1; keep the nil checks for m and
m.fsmApplySyncMode intact.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 941ef56a-2487-4849-9307-87010ec04347
📒 Files selected for processing (8)
- main.go
- monitoring/pebble.go
- monitoring/pebble_test.go
- monitoring/registry.go
- store/lsm_store.go
- store/lsm_store_env_test.go
- store/lsm_store_sync_mode_benchmark_test.go
- store/lsm_store_sync_mode_test.go
Code Review
This pull request introduces an optional "nosync" mode for FSM commits, allowing operators to prioritize performance over local durability when Raft log replay is sufficient for recovery. The changes include environment variable resolution for ELASTICKV_FSM_SYNC_MODE, updates to the Pebble store's commit logic, and new Prometheus metrics for observability. Review feedback points out a discrepancy between the metrics update documentation and its implementation regarding label resetting, and highlights potential thread-safety concerns in tests due to the use of global variables for configuration.
// env var. Invoking again is safe and idempotent: the new label goes to
// 1 and all previously-set labels go to 0.
The docstring claims that invoking this method again is idempotent and that "all previously-set labels go to 0". However, the implementation only explicitly zeros the "sync" and "nosync" labels. If a custom label was previously set (which the method allows), it will remain at 1, violating the mutual exclusivity property described. Consider updating the docstring to clarify that only the documented modes are reset, or use m.fsmApplySyncMode.Reset() if the intention is to clear all series (though Reset() may cause gaps in Prometheus scrapes).
func setFSMApplyWriteOptsForTest(t *testing.T, opts *pebble.WriteOptions, label string) {
	t.Helper()
	prevOpts := fsmApplyWriteOpts
	prevLabel := fsmApplySyncModeLabel
	fsmApplyWriteOpts = opts
	fsmApplySyncModeLabel = label
	t.Cleanup(func() {
		fsmApplyWriteOpts = prevOpts
		fsmApplySyncModeLabel = prevLabel
	})
}
The test helper setFSMApplyWriteOptsForTest modifies package-level global variables (fsmApplyWriteOpts and fsmApplySyncModeLabel). While it uses t.Cleanup to restore state, this approach is not thread-safe if tests within this package are ever run in parallel (e.g., via t.Parallel()). Although current tests do not appear to use parallelism, this global mutation is a potential source of flakiness for future tests. Consider refactoring the store to accept WriteOptions as a dependency in NewPebbleStore to avoid global state in tests. Note that per repository guidelines, wide-reaching API changes should be deferred to a dedicated follow-up pull request to manage complexity.
References
- During large refactorings, limit the scope of a single pull request. Defer related but wide-reaching API changes to dedicated follow-up pull requests to manage complexity and avoid scope creep.
gosec G115 flagged uint64(i) * 2 on the benchmark's MVCC timestamp math. The loop counter is non-negative by construction, but the linter cannot prove it. Add an explicit if i < 0 guard so the conversion is provably safe and the reviewdog warning clears. Addresses PR #592 review feedback from coderabbit / reviewdog.
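The guard described in this commit can be sketched as follows. This is a minimal stand-alone illustration, not the benchmark's actual code: the helper name `benchTimestamp` and the sentinel `errNegativeIndex` are hypothetical, and only the `uint64(i) * 2` expression and the `i < 0` check come from the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// errNegativeIndex is a hypothetical sentinel used only in this sketch.
var errNegativeIndex = errors.New("negative loop index")

// benchTimestamp converts a loop counter into an MVCC-style timestamp.
// The explicit i < 0 check rules out a negative value before the
// int -> uint64 conversion, which is the pattern gosec G115 looks for.
func benchTimestamp(i int) (uint64, error) {
	if i < 0 {
		return 0, errNegativeIndex
	}
	return uint64(i) * 2, nil
}

func main() {
	for i := 0; i < 3; i++ {
		ts, err := benchTimestamp(i)
		if err != nil {
			panic(err)
		}
		fmt.Println(i, ts)
	}
}
```

In a benchmark loop the error branch is unreachable in practice; returning an error rather than panicking is just one way to make the conversion visibly safe.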
SetFSMApplySyncMode previously accepted any label verbatim, which
broke the promised two-row gauge shape in two ways: (1) a prior
unknown label was left at 1 because only sync/nosync were zeroed
explicitly, and (2) the store's resolver maps unknown
ELASTICKV_FSM_SYNC_MODE values to sync, so the monitoring gauge
could disagree with the actual WriteOptions in use.
Coerce any label outside {sync, nosync} to sync before setting,
matching store.resolveFSMApplyWriteOpts's fallback. Update the
docstring and add a dedicated test that exercises the coercion
against a primed nosync posture.
Addresses PR #592 review feedback from coderabbit and gemini.
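The coercion described above can be illustrated without the Prometheus dependency. In this sketch a plain map stands in for the `GaugeVec`; the type `syncModeGauge` and its `Set` method are hypothetical names, and only the behaviour (unknown label falls back to `sync`, both known rows are zeroed, the active row is set to 1) is taken from the commit message.

```go
package main

import "fmt"

const (
	modeSync   = "sync"
	modeNoSync = "nosync"
)

// syncModeGauge is a stand-in for a Prometheus GaugeVec keyed by "mode".
type syncModeGauge struct {
	rows map[string]float64
}

func newSyncModeGauge() *syncModeGauge {
	return &syncModeGauge{rows: map[string]float64{}}
}

// Set coerces any label outside {sync, nosync} to sync, zeroes both
// known rows, then marks the active row as 1. Because unknown labels
// never reach the map, the gauge always has exactly two rows.
func (g *syncModeGauge) Set(active string) {
	if g == nil || g.rows == nil {
		return
	}
	if active != modeSync && active != modeNoSync {
		active = modeSync
	}
	g.rows[modeSync] = 0
	g.rows[modeNoSync] = 0
	g.rows[active] = 1
}

func main() {
	g := newSyncModeGauge()
	g.Set("nosync")  // prime a nosync posture
	g.Set("garbage") // unknown label is coerced to sync
	fmt.Println(g.rows)
}
```

Coercing before touching the map is what keeps the gauge in agreement with the store-side resolver, which applies the same fallback.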
|
Thanks for the reviews. Pushed 56af1db and 1236dae addressing the actionable items: 1. gosec G115 in
|
|
Codex Review: Didn't find any major issues. Nice work!
Code Review
This pull request introduces a configurable FSM apply sync mode via the ELASTICKV_FSM_SYNC_MODE environment variable, allowing operators to choose between sync and nosync for Pebble write operations. It includes updates to the metrics registry to surface the active durability posture, comprehensive unit tests for environment variable parsing and functional equivalence, and benchmarks comparing the performance of the two modes. Feedback was provided regarding the use of package-level variables for core configuration, suggesting they be moved into the pebbleStore struct to improve test safety and avoid global state mutation.
// fsmApplyWriteOpts is the Pebble WriteOptions value applied on the FSM
// commit path. Resolved once from ELASTICKV_FSM_SYNC_MODE at init() and
// then treated as read-only. Exposed as a package variable so tests can
// swap it via setFSMApplyWriteOptsForTest; production code must not
// mutate it after init().
//
// The zero (unset) state is `pebble.Sync`, preserving legacy behaviour.
var fsmApplyWriteOpts = pebble.Sync
Using a package-level variable for core write path configuration is brittle and can lead to race conditions or flakiness in tests. While the init() logic ensures it is set once at startup, the test helpers (setFSMApplyWriteOptsForTest) mutate this global state, which is not thread-safe if tests are run in parallel.
Consider moving applyWriteOpts into the pebbleStore struct and initializing it in NewPebbleStore. This would allow for per-instance configuration and safer testing without relying on global state mutation.
|
Re-raising the same item. This PR intentionally defers the Justification (from the previous round): gemini's own reference note says "During large refactorings, limit the scope of a single pull request. Defer related but wide-reaching API changes to dedicated follow-up pull requests to manage complexity and avoid scope creep." This PR's scope is the Concretely for test safety: Leaving item as-is. No new commits. /gemini review |
Code Review
This pull request introduces a configurable FSM apply sync mode via the ELASTICKV_FSM_SYNC_MODE environment variable, allowing operators to choose between "sync" (default) and "nosync" for Pebble write operations. The changes update ApplyMutations and DeletePrefixAt to respect the selected mode and add a Prometheus gauge to monitor the active durability posture. Feedback was provided to simplify the environment variable resolution logic by merging redundant switch cases.
switch strings.ToLower(strings.TrimSpace(envVal)) {
case fsmSyncModeNoSync:
	return pebble.NoSync, fsmSyncModeNoSync
case "", fsmSyncModeSync:
	return pebble.Sync, fsmSyncModeSync
default:
	return pebble.Sync, fsmSyncModeSync
}
The switch statement can be simplified by merging the redundant cases that fall back to the default sync behavior. This improves readability and maintainability by removing unnecessary code paths.
func resolveFSMApplyWriteOpts(envVal string) (*pebble.WriteOptions, string) {
	switch strings.ToLower(strings.TrimSpace(envVal)) {
	case fsmSyncModeNoSync:
		return pebble.NoSync, fsmSyncModeNoSync
	default:
		return pebble.Sync, fsmSyncModeSync
	}
}
References
- Remove dead or unreachable code paths to improve code clarity and maintainability.
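That the merged switch behaves identically to the three-case version can be checked with a small stand-alone sketch. To stay free of the Pebble dependency it returns only the label string, and it assumes the constants `fsmSyncModeSync` and `fsmSyncModeNoSync` hold `"sync"` and `"nosync"`; the function names `resolveThreeCase` and `resolveMerged` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Assumed values; the PR only shows the constant names.
const (
	fsmSyncModeSync   = "sync"
	fsmSyncModeNoSync = "nosync"
)

// resolveThreeCase mirrors the original switch, label only.
func resolveThreeCase(envVal string) string {
	switch strings.ToLower(strings.TrimSpace(envVal)) {
	case fsmSyncModeNoSync:
		return fsmSyncModeNoSync
	case "", fsmSyncModeSync:
		return fsmSyncModeSync
	default:
		return fsmSyncModeSync
	}
}

// resolveMerged mirrors the suggested simplification, label only.
func resolveMerged(envVal string) string {
	switch strings.ToLower(strings.TrimSpace(envVal)) {
	case fsmSyncModeNoSync:
		return fsmSyncModeNoSync
	default:
		return fsmSyncModeSync
	}
}

func main() {
	inputs := []string{"", "sync", "nosync", " NoSync ", "SYNC", "garbage"}
	for _, in := range inputs {
		fmt.Printf("%q -> %s / %s\n", in, resolveThreeCase(in), resolveMerged(in))
	}
}
```

The explicit `"", fsmSyncModeSync` case in the original is arguably documentation rather than logic, which may be a reason to keep it even though both forms agree.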
|
Codex Review: Didn't find any major issues. Chef's kiss.
…re field
Per the gemini review on PR #592, migrate the FSM apply write options (and its sync-mode label) from package-level globals onto the pebbleStore struct so each store instance owns its resolved ELASTICKV_FSM_SYNC_MODE state. NewPebbleStore now calls resolveFSMApplyWriteOpts directly and stores the result; the ApplyMutations and DeletePrefixAt commit paths read s.fsmApplyWriteOpts instead of the package var.
- Remove the package-level fsmApplyWriteOpts / fsmApplySyncModeLabel vars and the init() assignment that populated them.
- Add fsmApplyWriteOpts and fsmApplySyncModeLabel fields on pebbleStore plus a (*pebbleStore).FSMApplySyncModeLabel() accessor.
- Drop setFSMApplyWriteOptsForTest / setBenchFSMApplyWriteOpts helpers; replace with newPebbleStoreWithFSMApplyWriteOptsForTest which builds a real store and overrides the fields, so tests exercise the full construction path without mutating globals.
- Main wiring now reads the label off the first built shard store via a fsmApplySyncModeLabeler type assertion after buildShardGroups, skipping the gauge update if no store exposes the accessor.
Env-var parsing coverage (sync / nosync / mixed-case / whitespace / unknown / empty / garbage) remains exercised directly against resolveFSMApplyWriteOpts, and the NoSync functional-equivalence and reopen-visibility tests continue to validate the commit-path behaviour through the per-store override.
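The shape of this refactor can be sketched with the standard library alone. Everything here is illustrative: `writeOptions` stands in for `*pebble.WriteOptions`, and `store`, `newStore`, and `newStoreWithOptsForTest` are hypothetical simplifications of `pebbleStore`, `NewPebbleStore`, and `newPebbleStoreWithFSMApplyWriteOptsForTest`.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// writeOptions is a stand-in for pebble.WriteOptions.
type writeOptions struct{ Sync bool }

var (
	syncOpts   = &writeOptions{Sync: true}
	noSyncOpts = &writeOptions{Sync: false}
)

// resolve maps the env value to options and a label, defaulting to sync.
func resolve(envVal string) (*writeOptions, string) {
	if strings.ToLower(strings.TrimSpace(envVal)) == "nosync" {
		return noSyncOpts, "nosync"
	}
	return syncOpts, "sync"
}

// store owns its resolved sync mode instead of reading a package global.
type store struct {
	fsmApplyWriteOpts     *writeOptions
	fsmApplySyncModeLabel string
}

// newStore resolves the mode once, at construction time.
func newStore() *store {
	opts, label := resolve(os.Getenv("ELASTICKV_FSM_SYNC_MODE"))
	return &store{fsmApplyWriteOpts: opts, fsmApplySyncModeLabel: label}
}

// newStoreWithOptsForTest runs the normal constructor, then overrides
// the per-instance fields, so no shared state is mutated.
func newStoreWithOptsForTest(opts *writeOptions, label string) *store {
	s := newStore()
	s.fsmApplyWriteOpts = opts
	s.fsmApplySyncModeLabel = label
	return s
}

// FSMApplySyncModeLabel exposes the active label for metrics wiring.
func (s *store) FSMApplySyncModeLabel() string { return s.fsmApplySyncModeLabel }

func main() {
	a := newStoreWithOptsForTest(noSyncOpts, "nosync")
	b := newStoreWithOptsForTest(syncOpts, "sync")
	fmt.Println(a.FSMApplySyncModeLabel(), b.FSMApplySyncModeLabel())
}
```

Because each store carries its own options, two tests can hold stores in different modes at the same time, which is the property the package-level variable could not offer under `t.Parallel()`.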
|
|
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: de873f5478
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
// via ELASTICKV_FSM_SYNC_MODE=nosync when the raft WAL's durability is
// considered sufficient (raft-log replay from the last FSM snapshot
// re-applies any entries lost from Pebble after a crash).
if err := b.Commit(s.fsmApplyWriteOpts); err != nil {
Keep non-Raft ApplyMutations commits synced
Switching ApplyMutations to s.fsmApplyWriteOpts makes ELASTICKV_FSM_SYNC_MODE=nosync affect callers that do not have Raft-log replay as a durability backstop. A concrete production path is distribution.EnsureCatalogSnapshot -> CatalogStore.Save -> store.ApplyMutations, which writes catalog state directly; if the process crashes before Pebble flushes, those acknowledged writes can be lost with no Raft entry to re-apply. This change should be scoped to true FSM/raft-apply call paths only, or direct callers must continue to force pebble.Sync.
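One possible way to scope the tunable to raft-apply callers is sketched below. This is an assumption-laden illustration, not necessarily what any follow-up PR does: `ApplyMutationsRaft` is a hypothetical method name, `writeOptions` stands in for `*pebble.WriteOptions`, and the `commits` slice exists only so the sketch can show which option each path used.

```go
package main

import "fmt"

// writeOptions is a stand-in for pebble.WriteOptions.
type writeOptions struct{ Sync bool }

type store struct {
	fsmApplyWriteOpts *writeOptions
	commits           []bool // Sync flag used by each commit, for the demo
}

// commit records which durability option a write path used.
func (s *store) commit(opts *writeOptions) {
	s.commits = append(s.commits, opts.Sync)
}

// ApplyMutations is the direct path (for example, catalog writes).
// It has no raft-log replay backstop, so it always commits with fsync.
func (s *store) ApplyMutations() {
	s.commit(&writeOptions{Sync: true})
}

// ApplyMutationsRaft (hypothetical) is the raft-apply path, the only
// one allowed to honour the operator-selected sync mode.
func (s *store) ApplyMutationsRaft() {
	s.commit(s.fsmApplyWriteOpts)
}

func main() {
	s := &store{fsmApplyWriteOpts: &writeOptions{Sync: false}} // nosync mode
	s.ApplyMutations()
	s.ApplyMutationsRaft()
	fmt.Println(s.commits)
}
```

The design point is that the durability decision follows the caller's recovery story: a write that raft can replay may skip the fsync, while a write with no log entry behind it may not.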
Codex flagged that the "Unsupported Commands by Name" (Redis summary) and "FSM Apply Sync Mode" (Pebble internals) panels query metrics not yet on main, so during an incident operators could mistake the empty panels for a scrape failure. Append a sentence to each panel's description calling out the dependency and that the panel will populate automatically once PR #594 / PR #592 merge.
…ctor Redis/Cluster dashboards (#597)

## Summary
- Add **Elastickv Pebble Internals** dashboard covering block cache hit rate/capacity, L0 pressure, compactions, memtables, FSM apply sync mode, and store write conflicts.
- Add **Unsupported Commands by Name** panel to the Redis dashboard (driven by PR #594 `elastickv_redis_unsupported_commands_total`) and annotate the legacy **Errors by Command** panel so operators know the `unknown` bucket is broken down by real name in the new panel.
- Fold the standalone Redis hot-path (PR #560) dashboard into a collapsed row at the bottom of the Redis summary dashboard, delete the standalone file, and retitle the summary to **Elastickv Redis** so there is a single Redis-facing dashboard.
- Rename the misnamed `elastickv-cluster-summary.json` (which was already a pure DynamoDB dashboard) to `elastickv-dynamodb.json` / **Elastickv DynamoDB** with `uid=elastickv-dynamodb`, and retitle `elastickv-cluster-overview.json` to **Elastickv Cluster** now that it is the sole cluster landing page.

## Before / After file list
| State | File |
| --- | --- |
| Added | `monitoring/grafana/dashboards/elastickv-pebble-internals.json` |
| Renamed | `elastickv-cluster-summary.json` to `elastickv-dynamodb.json` |
| Retitled | `elastickv-cluster-overview.json` (title now `Elastickv Cluster`) |
| Modified | `elastickv-redis-summary.json` (title `Elastickv Redis`; added Unsupported Commands panel and a collapsed Hot Path row) |
| Deleted | `monitoring/grafana/dashboards/elastickv-redis-hotpath.json` |
| Unchanged | `monitoring/grafana/dashboards/elastickv-raft-status.json` |

## What each dashboard now covers
- **Elastickv Cluster** - leader identity, cluster-wide latency/error posture, per-node Raft health, and an embedded Redis row.
- **Elastickv DynamoDB** - per-operation throughput/latency and per-table error/volume breakdown for the DynamoDB-compatible API.
- **Elastickv Raft Status** - unchanged: membership, leader changes, failed proposals, node state, index drift, backlog, leader contact.
- **Elastickv Redis** - per-command throughput/latency/errors, unsupported-commands breakdown, secondary-write and Lua diagnostics, and a collapsible `Hot Path (legacy PR #560)` row with the 9 GET-fast-path panels preserved verbatim.
- **Elastickv Pebble Internals** - block cache hit rate and size-vs-capacity, hits/misses rate, L0 sublevels/files with write-stall thresholds, compaction debt/progress/rate, memtable count/size/zombies, FSM apply sync mode, store write conflicts by prefix.

## Dead-metric warnings
Two metrics referenced by panels in this PR are **not yet present in `monitoring/*.go` on `main`**:
- `elastickv_redis_unsupported_commands_total` (PR #594) - the spec asked for a panel targeting this and this PR adds one; it will stay empty until #594 lands.
- `elastickv_pebble_fsm_apply_sync_mode` (PR #592) - same situation; panel is in place and will populate once #592 lands.

Both are noted here so reviewers can decide whether to sequence this PR after the upstream ones.

## Test plan
- [x] `python3 -c "import json; json.load(open('<file>'))"` passes on every modified dashboard file.
- [x] Every dashboard exposes `schemaVersion`, `panels`, `time`, `title`, `uid`.
- [x] Panel IDs are unique within each dashboard (including nested hot-path panels).
- [x] No overlapping gridPos cells in the modified Redis dashboard (checked with a programmatic sweep).
- [x] `go build ./...` succeeds after updating the four source comments / doc page that referenced the deleted hotpath JSON by path.
- [x] `go test ./...` passes, including `./monitoring/...`.
- [ ] Import each of the five JSON files into Grafana (10.x, schemaVersion 41) via UI; every panel renders without errors and the Hot Path row expands/collapses cleanly.
|
Follow-up on the codex P2 correctness concern raised during this PR's review ( Addressed in #600: splits Tests + lint green. See #600 for the before/after table and new coverage. |
Root cause
Every committed raft entry triggered a `pebble.Sync` inside `store.ApplyMutations` / `store.DeletePrefixAt` (see `store/lsm_store.go:1056, 1108`), on top of the raft WAL fsync that `etcd/raft` already performs per `Ready` batch in `persistReadyToWAL` (`internal/raftengine/etcd/wal_store.go:376`).
The raft Ready loop in `drainReady` (`internal/raftengine/etcd/engine.go:1389`) already batches multiple entries per `Ready`, so the raft WAL fsync is fine. The hot fsync is the FSM-side `b.Commit(pebble.Sync)` that the apply loop (`applyCommitted`) hits once per entry. A prior cleanup (`docs/review_todo.md` section 3.4) intentionally kept `ApplyMutations` on `pebble.Sync`; this CL makes that choice tunable.
Microbenchmark (Apple M1 Max, APFS tempdir, `BenchmarkApplyMutations_SyncMode`): ~1000x on this platform. Real hardware fsync latency varies, but the sync/nosync ratio is consistently large on any WAL that enforces platform durability.
Durability argument
Pebble's FSM-commit fsync is redundant with the raft WAL under this codebase's crash-recovery model:
- The raft WAL (`etcd/raft`) fsyncs every committed entry via `persist.Save` before `Advance`.
- `newMemoryStorage` (`internal/raftengine/etcd/persistence.go:352`) reloads the snapshot + all WAL entries.
- `newRawNode` does not set `Config.Applied`, so `etcdraft` defaults it to `snapshot.Metadata.Index`.
- `e.applied = maxAppliedIndex(LocalSnap)` and every committed entry past the snapshot is re-delivered through `CommittedEntries` on the first `Ready`.
- `kv/fsm.applyCommitWithIdempotencyFallback` treats an already-committed key (`LatestCommitTS >= commitTS`) as an idempotent retry, so replaying an entry whose effect survived the crash is safe.
- FSM snapshots are themselves fsynced (`writeFSMSnapshotFile` then `f.Sync()` in `fsm_snapshot_file.go`).
Therefore a crash that loses the unfsynced tail of Pebble's own WAL is recoverable: raft replays from the last fsynced FSM snapshot onwards, and the idempotent apply path re-materialises the lost state. Pebble on the FSM commit path effectively becomes a volatile cache of applied state whose durability boundary is the raft WAL.
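The recovery argument can be made concrete with a toy model. The sketch below is not the elastickv FSM: `entry`, `fsm`, and `replay` are hypothetical types and functions, the state is a plain map rather than Pebble, and the only rule borrowed from the description is the idempotency check (an entry is skipped when the key's latest commit timestamp is already at or past the entry's commit timestamp).

```go
package main

import "fmt"

// entry is a committed raft log entry carrying one write.
type entry struct {
	Index    uint64
	Key      string
	Value    string
	CommitTS uint64
}

// versioned is the latest committed version of a key.
type versioned struct {
	value    string
	commitTS uint64
}

// fsm is a toy state machine backed by a map.
type fsm struct {
	data map[string]versioned
}

// apply writes the entry unless the key already holds a version at or
// past the entry's commit timestamp, in which case the entry is treated
// as an idempotent retry. It reports whether state changed.
func (f *fsm) apply(e entry) bool {
	if cur, ok := f.data[e.Key]; ok && cur.commitTS >= e.CommitTS {
		return false
	}
	f.data[e.Key] = versioned{value: e.Value, commitTS: e.CommitTS}
	return true
}

// replay re-delivers every log entry past the snapshot index and
// returns how many of them actually changed state.
func replay(f *fsm, log []entry, snapshotIndex uint64) int {
	applied := 0
	for _, e := range log {
		if e.Index <= snapshotIndex {
			continue
		}
		if f.apply(e) {
			applied++
		}
	}
	return applied
}

func main() {
	log := []entry{
		{Index: 1, Key: "a", Value: "1", CommitTS: 10},
		{Index: 2, Key: "b", Value: "2", CommitTS: 20},
		{Index: 3, Key: "a", Value: "3", CommitTS: 30},
	}
	// Post-crash state: snapshot at index 1, entry 2 survived in the
	// store, entry 3 was lost with the un-fsynced WAL tail.
	f := &fsm{data: map[string]versioned{
		"a": {value: "1", commitTS: 10},
		"b": {value: "2", commitTS: 20},
	}}
	n := replay(f, log, 1)
	fmt.Println(n, f.data["a"].value, f.data["b"].value) // prints "1 3 2"
}
```

Entry 2 is re-delivered but skipped, and entry 3 is re-applied, so the state converges to what it would have been had the lost write been fsynced.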
Other `pebble.Sync` call sites (snapshot-batch commit, metadata-restore writes, compaction `commitSnapshotBatch`) are untouched: those are orthogonal durability boundaries (e.g. restore-directory swap) and are not per-proposal cost.
Env var + default
- `ELASTICKV_FSM_SYNC_MODE=sync` (default) - current behaviour.
- `ELASTICKV_FSM_SYNC_MODE=nosync` - `b.Commit(pebble.NoSync)` on the FSM hot path. Raft WAL remains the durability boundary.
Unknown values fall back to `sync` (fail-safe toward durability). Parsing is case-insensitive and whitespace-tolerant.
A Prometheus gauge `elastickv_fsm_apply_sync_mode{mode="sync"|"nosync"}` is set at `NewRegistry` time via `store.FSMApplySyncModeLabel()`, so dashboards can alert if a rolling deploy accidentally flips the durability posture.
Test plan
- `go test ./store/... ./monitoring/... ./kv/... ./internal/raftengine/... -count=1`
- `JEPSEN_TODO.md`
Benchmark
Related
- `docs/review_todo.md` section 3.4 (ApplyMutations retained `pebble.Sync`).
Summary by CodeRabbit
New Features
- `/metrics` endpoint
Tests