test(jepsen): add ZSet safety workload with model-based checker by bootjp · Pull Request #550 · bootjp/elastickv

bootjp · 2026-04-19T21:15:09Z

Summary

Adds a Jepsen workload (elastickv-zset-safety-test) that verifies ZSet-specific safety properties under faults (network partitions, node kills), using a custom model-based checker. Goes beyond the simple add->read visibility check.

Properties verified:

Score correctness: a ZRANGE result's score for a member must equal the model's latest committed score, OR equal a score written by an operation concurrent with the read.
Order preservation: ZRANGE 0 -1 must be sorted by (score asc, member lex asc).
ZRANGEBYSCORE correctness: bounded range queries return exactly the members whose score falls in the bound, modulo concurrent mutations.
No phantom members: every read member must have been introduced by some successful or in-flight ZADD/ZINCRBY.
Atomicity: the checker treats every :ok operation as atomic; any visible inconsistency is reported.

Concurrent-ZADD handling uses an invoke/complete windowing approach. A mutation is "committed before" a read iff its :complete index is strictly less than the read's :invoke index. Mutations whose intervals overlap are "concurrent" and contribute to a per-member allowed-score set. Indeterminate (:info) mutations are treated as possibly-concurrent. ZINCRBY whose response is unknown sets :unknown-score? so the checker skips the strict score check for concurrent reads. ZREM carries the actual removed? boolean from the server reply so a no-op ZREM does not falsely mark the member as deleted.

Files

New: jepsen/src/elastickv/redis_zset_safety_workload.clj
New: jepsen/test/elastickv/redis_zset_safety_workload_test.clj (test-spec construction + checker edge cases: no-op ZREM, :info ZINCRBY, deterministic score mismatch).
Modified: jepsen/src/elastickv/jepsen_test.clj (entry point added).
Modified: .github/workflows/jepsen-test.yml (5s smoke run on every push).
Modified: .github/workflows/jepsen-test-scheduled.yml (150s default run every 6h).

Running locally

The workload is invoked directly via its own -main, not through jepsen-test/-main:

cd jepsen
lein run -m elastickv.redis-zset-safety-workload \
  --time-limit 60 --rate 10 --concurrency 5 \
  --ports 63791,63792,63793 --host 127.0.0.1

(elastickv.jepsen-test exposes elastickv-zset-safety-test only as a Clojure function for REPL use; CI and ad-hoc runs use the namespace's own -main.

Test plan

passes (Java 21).
Run with partition + kill faults: confirm checker emits a clear failure when a stale-leader read returns a divergent score.
EOF
)

Summary by CodeRabbit

Tests
- Added a new safety test suite for Redis ZSet operations, validating data consistency and correctness under concurrent operations and fault conditions.
- Extended CI/CD pipeline to automatically run ZSet safety tests in both scheduled and on-demand test workflows.

Adds a Jepsen workload that goes beyond add->read visibility and verifies ZSet-specific safety properties under faults (network partitions, node kills): - score correctness: a ZRANGE result's score for a member must equal the model's latest committed score for that member, OR equal a score written by an operation that is concurrent with the read (since the client cannot linearise concurrent writes to the same member). - order preservation: ZRANGE 0 -1 must be sorted by (score asc, member lex asc). - ZRANGEBYSCORE correctness: bounded range queries return exactly the members whose score falls in the bound, modulo concurrent mutations. - no phantom members: every read member must have been introduced by some successful or in-flight ZADD/ZINCRBY. Concurrent-ZADD handling uses an invoke/complete windowing approach: mutations whose complete index < read's invoke index are committed before the read; mutations whose intervals overlap are concurrent and contribute to the per-member allowed-score set. Indeterminate (:info) mutations are treated as possibly-concurrent. Workload entry point added to jepsen_test.clj as elastickv-zset-safety-test.

gemini-code-assist · 2026-04-19T21:15:12Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

coderabbitai · 2026-04-19T21:15:15Z

Warning

Rate limit exceeded

@bootjp has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 38 minutes and 41 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 38 minutes and 41 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a177dc39-c2e2-45cc-8b55-8052a88e4000

📥 Commits

Reviewing files that changed from the base of the PR and between 9bfcc13 and 86d23dd.

📒 Files selected for processing (2)

jepsen/src/elastickv/redis_zset_safety_workload.clj
jepsen/test/elastickv/redis_zset_safety_workload_test.clj

📝 Walkthrough

Walkthrough

This pull request introduces a new Redis ZSet safety Jepsen workload (elastickv.redis-zset-safety-workload) to test elastickv's Redis implementation under faults. It includes CI integration in two scheduled/on-demand workflows, the workload implementation with a model-based checker, and comprehensive unit tests validating checker behavior.

Changes

Cohort / File(s)	Summary
CI Workflow Integration `.github/workflows/jepsen-test-scheduled.yml`, `.github/workflows/jepsen-test.yml`	Added new CI steps to run the Redis ZSet safety workload with timeout controls, rate limiting, and port/host configuration alongside existing Redis workload tests.
Workload Registration `jepsen/src/elastickv/jepsen_test.clj`	Added namespace require for `elastickv.redis-zset-safety-workload` and exposed a new `elastickv-zset-safety-test` function to integrate the ZSet safety workload into the CLI.
Workload Implementation `jepsen/src/elastickv/redis_zset_safety_workload.clj`	Implemented new Jepsen workload namespace with client operations (`:zadd`, `:zincrby`, `:zrem`, `:zrange-all`, `:zrangebyscore`), a model-based checker validating ordering/completeness/score consistency, fault configuration, and CLI option parsing.
Test Coverage `jepsen/test/elastickv/redis_zset_safety_workload_test.clj`	Added unit tests for workload construction, option overrides, and checker validation logic across edge cases (no-op ZREM, uncertain scores from `:info` operations, score mismatches).

Sequence Diagram

sequenceDiagram
    participant Client
    participant Redis
    participant Nemesis as Nemesis (Fault Injection)
    participant Checker

    Client->>Redis: ZADD/ZINCRBY/ZREM/ZRANGE operation
    Redis-->>Client: Response (or timeout/error)
    Note over Client: Record invoke/complete<br/>with type (:ok/:info)
    
    Nemesis->>Redis: Inject faults (partition/stop/etc.)
    Redis-->>Client: Degraded/failed responses
    
    Nemesis->>Redis: Heal/restore
    Redis-->>Client: Recover operations
    
    Checker->>Checker: Extract mutation windows<br/>per operation
    Checker->>Checker: Build per-member<br/>committed state
    Checker->>Checker: Validate ordering<br/>(score ↑, then member ↑)
    Checker->>Checker: Check score correctness<br/>vs committed/concurrent
    Checker->>Checker: Enforce completeness<br/>(no phantom members)
    Checker-->>Checker: Produce :valid?

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 A hop through the ZSet maze!

New safety workload hops in place,
Checking scores with fuzzy grace,
When faults crash through, the checker knows,
Each member's rank and all it owes—
Redis ZSets now stand the test! 🎯

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'test(jepsen): add ZSet safety workload with model-based checker' accurately summarizes the main change: introducing a new Jepsen test workload for ZSet safety with a model-based checker, which is reflected across all modified and added files.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/jepsen-zset-safety

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

- jepsen-test.yml: 5s smoke run on every push, mirroring the other workloads. - jepsen-test-scheduled.yml: 150s default run (overridable via workflow_dispatch inputs) every 6 hours. Workload entry: elastickv.redis-zset-safety-workload (added in the previous commit via -main).

Copilot

Pull request overview

Adds a new Jepsen workload to validate elastickv’s Redis ZSet safety properties (scores, ordering, range correctness, and phantom detection) under faults using a custom model-based checker, and wires a helper entrypoint into the Jepsen test namespace.

Changes:

Introduces redis_zset_safety_workload.clj with a Carmine-based client, randomized op generator, and a custom checker for ZSet safety properties.
Adds a new wrapper function in jepsen_test.clj intended to expose the new workload.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File	Description
jepsen/src/elastickv/redis_zset_safety_workload.clj	New ZSet safety Jepsen workload with custom model/checker and CLI entrypoint.
jepsen/src/elastickv/jepsen_test.clj	Adds a wrapper function and require for the new ZSet safety test.

+(defn elastickv-zset-safety-test
+  "Builds a Jepsen test map that drives elastickv's Redis ZSet safety
+  workload."
+  ([] (elastickv-zset-safety-test {}))
+  ([opts]
+   (let [nodes       (or (:nodes opts) default-nodes)
+         redis-ports (or (:redis-ports opts)
+                         (repeat (count nodes) (or (:redis-port opts) 6379)))
+         node->port  (or (:node->port opts)
+                         (cli/ports->node-map redis-ports nodes))
+         local?      (:local opts)
+         db          (if local?
+                       jdb/noop
+                       (ekdb/db {:grpc-port  (or (:grpc-port opts) 50051)
+                                 :redis-port node->port
+                                 :raft-groups (:raft-groups opts)
+                                 :shard-ranges (:shard-ranges opts)}))
+         rate        (double (or (:rate opts) 10))
+         time-limit  (or (:time-limit opts) 60)
+         faults      (if local?
+                       []
+                       (cli/normalize-faults (or (:faults opts) [:partition :kill])))
+         nemesis-p   (when-not local?
+                       (combined/nemesis-package {:db       db
+                                                  :faults   faults
+                                                  :interval (or (:fault-interval opts) 40)}))
+         nemesis-gen (if nemesis-p
+                       (:generator nemesis-p)
+                       (gen/once {:type :info :f :noop}))
+         workload    (elastickv-zset-safety-workload
+                       (assoc opts :node->port node->port))]
+     (merge workload
+            {:name        (or (:name opts) "elastickv-redis-zset-safety")
+             :nodes       nodes
+             :db          db
+             :redis-host  (:redis-host opts)
+             :os          (if local? os/noop debian/os)
+             :net         (if local? net/noop net/iptables)
+             :ssh         (merge {:username             "vagrant"
+                                  :private-key-path     "/home/vagrant/.ssh/id_rsa"
+                                  :strict-host-key-checking false}
+                                 (when local? {:dummy true})
+                                 (:ssh opts))
+             :remote      control/ssh
+             :nemesis     (if nemesis-p (:nemesis nemesis-p) nemesis/noop)
+             :final-generator nil
+             :concurrency (or (:concurrency opts) 5)
+             :generator   (->> (:generator workload)
+                               (gen/nemesis nemesis-gen)
+                               (gen/stagger (/ rate))
+                               (gen/time-limit time-limit))}))))


+            :zrem
+            (let [m (:value invoke)]
+              {:f :zrem :member m :score nil :zrem? true
+               :type t :invoke-idx inv-idx :complete-idx cmp-idx})))))


+            :zincrby
+            (let [[m _delta] (:value invoke)
+                  s (when (and (= :ok t) (vector? (:value complete)))
+                      (second (:value complete)))]
+              {:f :zincrby :member m :score (some-> s double)
+               :type t :invoke-idx inv-idx :complete-idx cmp-idx})


+  strictly precede `read-inv-idx`. Model maps member -> {:score s} or
+  marks member as :deleted. Returns {:members map :ok-members set}.
+  Only considers :ok mutations for the authoritative model; :info
+  mutations are treated as uncertain (neither strictly applied nor not)."


+(defn elastickv-zset-safety-test []
+  (zset-safety-workload/elastickv-zset-safety-test {}))
+
 (defn -main
  [& args]
  (cli/run! (cli/single-test-cmd {:test-fn elastickv-test}) args))


Three Copilot findings on PR #550: 1. :zincrby indeterminate handling. Pending or :info ZINCRBY left the resulting score unknown, but the checker still required the observed read score to be in the finite allowed-scores set. A read that legitimately observed an in-flight increment was flagged as a score mismatch (false positive). completed-mutation-window now sets :unknown-score? on a ZINCRBY when the completion is :info or pending. allowed-scores-for-member returns :unknown-score? when any concurrent ZINCRBY carries the flag, and check-zrange-all / check-zrangebyscore skip the strict score-membership check in that case. 2. :zrem no-op handling. ZREM of a never-added member returns 0 server-side (no-op). The previous model treated every ZREM as a deletion, producing missing-member false positives and score-mismatch false negatives. invoke! already exposes the actual removed? boolean as the second element of the completion value. completed-mutation-window now threads :removed? through, and the new apply-mutation-to-state helper leaves state unchanged when :removed? is false. 3. model-before docstring claimed it returned {:members map :ok-members set}, but it returned the model map directly. Docstring rewritten to match the actual return value. Adds jepsen/test/elastickv/redis_zset_safety_workload_test.clj covering test-spec construction, the no-op ZREM edge case, the :info ZINCRBY skip, and a positive-control score-mismatch detection. The checker tests bypass the timeline.html sub-checker (which writes to the test store) by invoking zset-safety-checker directly.

bootjp · 2026-04-19T21:59:39Z

94be1bd で Copilot 指摘を fix:

:zincrby indeterminate false positive: pending/:info ZINCRBY の resulting score 不明分を :unknown-score? フラグで伝播。allowed-scores-for-member が concurrent に unknown ZINCRBY を含む場合、strict score check を skip。
:zrem no-op false positive: invoke! が既に返している removed? boolean を mutation record に thread。:removed? false の ZREM は state を変更しない (apply-mutation-to-state ヘルパーで分岐)。
model-before docstring: 実装に合わせて修正。
PR description: lein run -m elastickv.redis-zset-safety-workload 直接呼び出し方式を明記 (CI と整合)。
Workload unit test: jepsen/test/elastickv/redis_zset_safety_workload_test.clj 追加。test-spec 構築、no-op ZREM, :info ZINCRBY, positive-control score mismatch を deterministic に検証。timeline.html サブチェッカーは store を要求するので bypass し、zset-safety-checker を直接呼び出し。

lein test elastickv.redis-zset-safety-workload-test 全 5 件 PASS (Java 21 環境で確認)。

/gemini review

gemini-code-assist · 2026-04-19T21:59:42Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

…re-mismatch CI failure on 94be1bd surfaced a remaining checker false positive: a read that observes a member whose only prior ops are no-op ZREMs was classified as :score-mismatch with :allowed #{} rather than treated as a never-existed member. allowed-scores-for-member returned any-known? = true because (seq concurrent) was truthy even when every concurrent op was a :zrem that didn't actually remove anything. Fix: any-known? (renamed internally to existence-evidence?) now only counts concurrent mutations that provide evidence the member ever existed -- :zadd, :zincrby, or a :zrem whose :removed? boolean is true. A concurrent no-op ZREM contributes nothing. Adds no-op-zrem-alone-does-not-false-positive as a regression test. All 6 workload unit tests pass under Java 21.

bootjp · 2026-04-20T06:16:01Z

9bfcc13 で CI 失敗原因の checker false positive を追加 fix:

no-op ZREM のみで score-mismatch 誤検出: CI 結果 :allowed #{} の score-mismatch が発生。any-known? が (seq concurrent) truthy を返していたため、concurrent が全部 no-op ZREM でも score 判定パスに入っていた。
修正: existence-evidence? に名前を改め、concurrent mutation のうち そのメンバが存在したことの証拠 だけカウント (:zadd / :zincrby / :removed? true の :zrem)。no-op ZREM は存在証拠にならない。
regression test 追加 (no-op-zrem-alone-does-not-false-positive)。unit test 計 6 件 PASS。

/gemini review

gemini-code-assist · 2026-04-20T06:16:04Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

jepsen/test/elastickv/redis_zset_safety_workload_test.clj (1)

42-89: Add a direct :zrangebyscore checker regression.

The checker has a separate bounded-range path, but these edge-case tests only exercise :zrange-all. A small missing-member or out-of-range regression would protect the advertised ZRANGEBYSCORE property.

Example test to cover bounded-range completeness

+(deftest zrangebyscore-missing-member-is-detected
+  (let [history [{:type :invoke :process 0 :f :zadd :value ["m1" 5] :index 0}
+                 {:type :ok     :process 0 :f :zadd :value ["m1" 5] :index 1}
+                 {:type :invoke :process 0 :f :zrangebyscore :value [0 10] :index 2}
+                 {:type :ok     :process 0 :f :zrangebyscore
+                  :value {:bounds [0 10] :members []}
+                  :index 3}]
+        result  (run-checker history)]
+    (is (not (:valid? result)) (str "expected range mismatch, got: " result))))

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@jepsen/test/elastickv/redis_zset_safety_workload_test.clj` around lines 42 -
89, Add a new test that exercises the bounded-range code path by invoking
:zrangebyscore instead of :zrange-all so the checker’s ZRANGEBYSCORE logic is
covered; create a test (e.g. noop-zrem-does-not-flag-bounded-zrangebyscore or
similar) that mirrors one of the existing edge-case histories but uses
:zrangebyscore (with appropriate score bounds/values) and asserts run-checker
returns :valid? or not as expected, ensuring you reference the same run-checker
invocation and operation symbols (:zadd, :zrem, :zincrby, :zrangebyscore) so the
new test hits the bounded-range branch.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@jepsen/src/elastickv/redis_zset_safety_workload.clj`:
- Around line 386-416: The read result must be checked for duplicate members to
prevent a ZSet read returning the same member twice; update check-zrange-all
(and the similar function handling ranges around lines 430-464) to detect
duplicate member entries before per-entry validation by scanning entries and
maintaining a seen-members set, and if a member is already seen swap! the errors
atom with a {:kind :duplicate :index cmp-idx :member member :entries entries}
(or similar) error entry so duplicate-member reads are reported and rejected.
- Around line 313-318: The current logic builds applied as (->> muts (filter ...
(< (:complete-idx %) read-inv-idx)) (sort-by :complete-idx)) and then reduces
via apply-mutation-to-state, which incorrectly linearizes overlapping :ok
mutations by completion time; change this to detect overlapping committed
mutations (using their :invoke-idx and :complete-idx intervals from muts) and do
not sort/serialize ambiguous pairs by :complete-idx. Instead compute either the
set of all possible latest states for the read by merging non-overlapping
mutations deterministically and treating overlapping/conflicting writes
conservatively (e.g., allow values from any write whose interval is not ordered
before the read), and update the applied construction and reduction via
apply-mutation-to-state to use that conservative/possible-states approach; apply
the same fix at the other occurrence around lines 347-352.

---

Nitpick comments:
In `@jepsen/test/elastickv/redis_zset_safety_workload_test.clj`:
- Around line 42-89: Add a new test that exercises the bounded-range code path
by invoking :zrangebyscore instead of :zrange-all so the checker’s ZRANGEBYSCORE
logic is covered; create a test (e.g.
noop-zrem-does-not-flag-bounded-zrangebyscore or similar) that mirrors one of
the existing edge-case histories but uses :zrangebyscore (with appropriate score
bounds/values) and asserts run-checker returns :valid? or not as expected,
ensuring you reference the same run-checker invocation and operation symbols
(:zadd, :zrem, :zincrby, :zrangebyscore) so the new test hits the bounded-range
branch.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e12c0d39-43a2-4d57-a8a3-f49ade75baba

📥 Commits

Reviewing files that changed from the base of the PR and between 173fbbc and 9bfcc13.

📒 Files selected for processing (5)

.github/workflows/jepsen-test-scheduled.yml
.github/workflows/jepsen-test.yml
jepsen/src/elastickv/jepsen_test.clj
jepsen/src/elastickv/redis_zset_safety_workload.clj
jepsen/test/elastickv/redis_zset_safety_workload_test.clj

Three Major-severity CodeRabbit findings on PR #550: 1. Duplicate-member detection (line 416): a ZSet read must return each member at most once. Previously, if ZRANGE returned the same member twice with an allowed score, the checker accepted it because sort and score-membership checks passed independently per entry. duplicate-members helper now flags :duplicate-members (and :duplicate-members-range for ZRANGEBYSCORE) before the per-entry loop. 2. Overlapping committed writes (line 318): two :ok mutations whose invoke/complete windows overlap have ambiguous serialization order. Pinning allowed-scores to a single last-wins linearization by :complete-idx was unsound. allowed-scores-for-member now unions all :zadd/:ok-:zincrby scores from committed mutations (over-approx that stays sound), and must-be-present? is relaxed when any pair of committed writes for the same member overlaps in time. 3. Pre-read :info mutations (line 328): a mutation recorded as :info whose completion precedes a later read's invoke may have taken effect server-side. Previously it was ignored by both model-before (:ok only) and the concurrent window (complete-idx >= read-inv-idx required). Now collected as pre-read-info, contributing to allowed scores and flipping unknown-score? for :zincrby with unknown resulting score. 3 new regression tests (duplicate-members-are-flagged, overlapping- committed-zadds-allow-either-score, info-before-read-is-considered- uncertain). Workload unit test count now 9, all PASS under Java 21.

bootjp · 2026-04-20T06:24:55Z

6d0b4c3 で CodeRabbit Major 3 件を fix:

重複メンバー: duplicate-members ヘルパーで ZRANGE / ZRANGEBYSCORE 結果の重複を :duplicate-members / :duplicate-members-range として明示検知。sort / score-membership の個別チェックでは検知できなかったケースを閉じた。
重複する committed writes の線形化: :complete-idx 順の単一直列化は 2 つの :ok mutation の window が重なる場合に unsafe。allowed-scores-for-member を「committed の全 :zadd / :ok :zincrby score の union」に広げ (sound な over-approx)、must-be-present? は committed 間に重なりがあるときリラックス。
read 前完了の :info: pre-read-info として抽出し、allowed-scores と unknown-score? の判定に含めた。以前は model-before (:ok のみ) にも concurrent 窓 (complete-idx >= read-inv-idx 必須) にも入らず無視されていた。

regression test 3 件追加 (duplicate-members-are-flagged / overlapping-committed-zadds-allow-either-score / info-before-read-is-considered-uncertain)。unit test 計 9 件 PASS。

/gemini review

gemini-code-assist · 2026-04-20T06:24:58Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

bootjp · 2026-04-20T09:33:02Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a new Jepsen workload and model-based checker to verify Redis ZSet safety properties in elastickv. It includes a client for Redis operations, a generator for high-contention workloads, and a checker that validates score correctness, ordering, and atomicity. Feedback focuses on handling infinite scores in Redis to prevent parsing exceptions, improving the soundness of the linearizability model in allowed-scores-for-member to prevent stale reads, and optimizing the efficiency of mutation checks.

gemini-code-assist · 2026-04-20T09:34:49Z

+(defn- parse-withscores
+  "Carmine returns a flat [member score member score ...] vector for
+  ZRANGE WITHSCORES. Convert to a sorted vector of [member (double score)]
+  preserving server-returned order (score ascending, then member)."
+  [flat]
+  (->> flat
+       (partition 2)
+       (mapv (fn [[m s]]
+               [(if (bytes? m) (String. ^bytes m) (str m))
+                (Double/parseDouble (str s))]))))


The use of Double/parseDouble directly on Redis score strings will fail if the score is infinite. Redis returns "inf", "+inf", or "-inf" for infinite values, but Java's Double/parseDouble expects "Infinity" or "-Infinity". This will cause a NumberFormatException during the test if infinite scores are encountered.

(defn- parse-double-safe "Parses a Redis score string into a Double, handling 'inf' representations." [s] (let [s-str (str/lower-case (str s))] (cond (str/includes? s-str "inf") (if (str/starts-with? s-str "-") Double/NEGATIVE_INFINITY Double/POSITIVE_INFINITY) :else (Double/parseDouble (str s))))) (defn- parse-withscores "Carmine returns a flat [member score member score ...] vector for ZRANGE WITHSCORES. Convert to a sorted vector of [member (double score)] preserving server-returned order (score ascending, then member)." [flat] (->> flat (partition 2) (mapv (fn [[m s]] [(if (bytes? m) (String. ^bytes m) (str m)) (parse-double-safe s)]))))

gemini-code-assist · 2026-04-20T09:34:49Z

+          (let [[member delta] (:value op)
+                new-score (car/wcar cs (car/zincrby zset-key (double delta) member))]
+            (assoc op :type :ok
+                   :value [member (Double/parseDouble (str new-score))]))


As noted in the parse-withscores feedback, Double/parseDouble should be replaced with a safe parser that handles Redis's "inf" strings.

:value [member (parse-double-safe new-score)]]))

gemini-code-assist · 2026-04-20T09:34:49Z

+(defn- allowed-scores-for-member
+  "Compute the set of scores considered valid for `member` by a read
+  whose window is [read-inv-idx, read-cmp-idx], based on committed state
+  and any concurrent mutations.
+
+  Returns:
+    :scores            - set of acceptable scores (committed + concurrent
+                         :zadd / :ok :zincrby).
+    :unknown-score?    - true iff any concurrent ZINCRBY's resulting score
+                         is unknown (in-flight or :info). When set, the
+                         caller MUST skip the strict score-membership
+                         check to stay sound.
+    :must-be-present?  - committed state says present and no concurrent
+                         mutation could have removed/changed it.
+    :any-known?        - some op claims to have touched this member."
+  [mutations-by-m member read-inv-idx read-cmp-idx]
+  (let [muts (get mutations-by-m member [])
+        ;; :ok mutations that completed strictly before the read. They
+        ;; may have overlapped with each other in wall-clock time, so
+        ;; the serialization order among them is ambiguous.
+        committed (->> muts
+                       (filter #(and (= :ok (:type %))
+                                     (some? (:complete-idx %))
+                                     (< (:complete-idx %) read-inv-idx))))
+        ;; :info mutations that completed before the read: they may or
+        ;; may not have taken effect server-side. We must account for
+        ;; their possible scores just like concurrent ones.
+        pre-read-info (->> muts
+                           (filter #(and (= :info (:type %))
+                                         (some? (:complete-idx %))
+                                         (< (:complete-idx %) read-inv-idx))))
+        ;; Concurrent mutations: windows overlap the read. Include both
+        ;; :ok and :info since either may have taken effect.
+        concurrent (concurrent-mutations-for-member muts read-inv-idx read-cmp-idx)
+        ;; A conservative last-wins linearization for the must-be-present?
+        ;; check only. Ambiguous when committed writes overlap each other.
+        committed-sorted (sort-by :complete-idx committed)
+        committed-state (reduce apply-mutation-to-state nil committed-sorted)
+        committed-overlap? (boolean
+                             (some (fn [[a b]]
+                                     (and (not (identical? a b))
+                                          (<= (:invoke-idx a) (:complete-idx b))
+                                          (<= (:invoke-idx b) (:complete-idx a))))
+                                   (for [a committed, b committed] [a b])))
+        ;; Union of every score that any committed / pre-read :info /
+        ;; concurrent op could have produced. This over-approximates the
+        ;; legitimate post-state set when writes overlap, keeping the
+        ;; checker sound at the cost of being slightly less strict on
+        ;; overlapping concurrent writers.
+        add-scores (fn [acc m]
+                     (case (:f m)
+                       :zadd    (conj acc (:score m))
+                       :zincrby (cond-> acc (some? (:score m)) (conj (:score m)))
+                       :zrem    acc))
+        scores (as-> #{} s
+                 (reduce add-scores s committed)
+                 (reduce add-scores s pre-read-info)
+                 (reduce add-scores s concurrent))
+        unknown-score? (or
+                         (some #(and (= :zincrby (:f %)) (:unknown-score? %))
+                               concurrent)
+                         (some #(and (= :zincrby (:f %)) (:unknown-score? %))
+                               pre-read-info))
+        ;; any-known? must only be true when something provides evidence
+        ;; the member actually existed at some point. A no-op ZREM
+        ;; (:removed? false) does NOT prove existence.
+        existence-evidence? (or (some #(case (:f %)
+                                         (:zadd :zincrby) true
+                                         :zrem            (:removed? %))
+                                      committed)
+                                (some #(case (:f %)
+                                         (:zadd :zincrby) true
+                                         :zrem            (:removed? %))
+                                      pre-read-info)
+                                (some #(case (:f %)
+                                         (:zadd :zincrby) true
+                                         :zrem            (:removed? %))
+                                      concurrent))]
+      {:scores scores
+       :unknown-score? (boolean unknown-score?)
+       ;; must-be-present? is relaxed when committed writes overlap
+       ;; among themselves or when any :info / concurrent mutation could
+       ;; have removed the member before the read.
+       :must-be-present? (boolean (and committed-state
+                                       (:present? committed-state)
+                                       (not committed-overlap?)
+                                       (empty? pre-read-info)
+                                       (empty? concurrent)))
+       :any-known? (boolean existence-evidence?)}))


The current implementation of allowed-scores-for-member is too permissive and potentially unsound for linearizability checks.

Stale Reads: It includes all successful committed scores in the scores set (line 384), even if they were strictly followed by a later committed mutation. In a linearizable system, a read should only observe scores from the "potentially latest" committed writes (those not strictly followed by another) or concurrent ones.

Stale Presence: The checker does not verify if a member is allowed to be present. If a member was added and then deleted (both committed before the read), any-known? remains true, and the read will accept the member even though it should be absent.

Efficiency: The committed-overlap? check (line 368) is $O(N^2)$ relative to the number of mutations for a member. While the member universe is small, this can be optimized to $O(N)$ by tracking the maximum invoke-idx of preceding mutations.

(defn- allowed-scores-for-member "Compute the set of scores considered valid for `member` by a read whose window is [read-inv-idx, read-cmp-idx], based on committed state and any concurrent mutations." [mutations-by-m member read-inv-idx read-cmp-idx] (let [muts (get mutations-by-m member []) ;; Mutations that completed strictly before the read started. preceding (filter #(and (= :ok (:type %)) (some? (:complete-idx %)) (< (:complete-idx %) read-inv-idx)) muts) ;; A mutation is a candidate for the 'latest' state if no other ;; committed mutation strictly follows it (m2.invoke > m.complete). max-inv (reduce max -1 (map :invoke-idx preceding)) candidates (filter #(>= (:complete-idx %) max-inv) preceding) concurrent (concurrent-mutations-for-member muts read-inv-idx read-cmp-idx) pre-read-info (filter #(and (= :info (:type %)) (some? (:complete-idx %)) (< (:complete-idx %) read-inv-idx)) muts) ;; Scores allowed by committed state or uncertain ops. scores (into (set (keep :score candidates)) (concat (keep :score concurrent) (keep :score pre-read-info))) unknown-score? (some #(and (= :zincrby (:f %)) (:unknown-score? %)) (concat concurrent pre-read-info)) ;; Allowed to be present if any candidate or concurrent/info op is a write. can-be-present? (or (some #(#{:zadd :zincrby} (:f %)) candidates) (some #(#{:zadd :zincrby} (:f %)) concurrent) (some #(#{:zadd :zincrby} (:f %)) pre-read-info)) ;; Must be present if all candidates are writes and no concurrent/info op can delete. must-be-present? (and (seq candidates) (every? #(#{:zadd :zincrby} (:f %)) candidates) (not (some #(= :zrem (:f %)) concurrent)) (not (some #(= :zrem (:f %)) pre-read-info)))] {:scores scores :unknown-score? (boolean unknown-score?) :must-be-present? must-be-present? :can-be-present? can-be-present?}))

gemini-code-assist · 2026-04-20T09:34:49Z

+      (let [{:keys [scores any-known? unknown-score?]}
+            (allowed-scores-for-member mutations-by-m member inv-idx cmp-idx)]
+        (cond
+          (not any-known?)
+          (swap! errors conj {:kind :phantom
+                              :index cmp-idx
+                              :member member
+                              :score score})


The phantom check should be replaced with a more robust can-be-present? check to catch both phantoms (never existed) and stale reads (present after a committed deletion).

(doseq [[member score] entries] (let [{:keys [scores can-be-present? unknown-score?]} (allowed-scores-for-member mutations-by-m member inv-idx cmp-idx)] (cond (not can-be-present?) (swap! errors conj {:kind :unexpected-presence :index cmp-idx :member member :score score})

bootjp requested a review from Copilot April 19, 2026 21:15

Copilot started reviewing on behalf of bootjp April 19, 2026 21:16 View session

Copilot AI reviewed Apr 19, 2026

View reviewed changes

coderabbitai Bot reviewed Apr 20, 2026

View reviewed changes

Comment thread jepsen/src/elastickv/redis_zset_safety_workload.clj

Comment thread jepsen/src/elastickv/redis_zset_safety_workload.clj

Comment thread jepsen/src/elastickv/redis_zset_safety_workload.clj

gemini-code-assist Bot reviewed Apr 20, 2026

View reviewed changes

Merge branch 'main' into feat/jepsen-zset-safety

86d23dd

Conversation

bootjp commented Apr 19, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Files

Running locally

Test plan

Summary by CodeRabbit

Uh oh!

gemini-code-assist Bot commented Apr 19, 2026

Uh oh!

coderabbitai Bot commented Apr 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Poem

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

bootjp commented Apr 19, 2026

Uh oh!

gemini-code-assist Bot commented Apr 19, 2026

Uh oh!

bootjp commented Apr 20, 2026

Uh oh!

gemini-code-assist Bot commented Apr 20, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

bootjp commented Apr 20, 2026

Uh oh!

gemini-code-assist Bot commented Apr 20, 2026

Uh oh!

bootjp commented Apr 20, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

bootjp commented Apr 19, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 19, 2026 •

edited

Loading