test(jepsen): add ZSet safety workload with model-based checker #550
Adds a Jepsen workload that goes beyond add->read visibility and verifies ZSet-specific safety properties under faults (network partitions, node kills):
- score correctness: a ZRANGE result's score for a member must equal the model's latest committed score for that member, OR equal a score written by an operation that is concurrent with the read (since the client cannot linearise concurrent writes to the same member).
- order preservation: ZRANGE 0 -1 must be sorted by (score asc, member lex asc).
- ZRANGEBYSCORE correctness: bounded range queries return exactly the members whose score falls in the bound, modulo concurrent mutations.
- no phantom members: every read member must have been introduced by some successful or in-flight ZADD/ZINCRBY.

Concurrent-ZADD handling uses an invoke/complete windowing approach: mutations whose complete index < read's invoke index are committed before the read; mutations whose intervals overlap are concurrent and contribute to the per-member allowed-score set. Indeterminate (:info) mutations are treated as possibly-concurrent.

Workload entry point added to jepsen_test.clj as elastickv-zset-safety-test.
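The windowing rule above can be sketched as a small classifier. This is an illustration only: `classify-mutation` is a hypothetical helper, not a function from this PR, and the map keys `:type`, `:invoke-idx` and `:complete-idx` are assumed to mirror the mutation windows the checker builds.

```clojure
;; Hypothetical helper, not part of the PR: classifies one mutation
;; window relative to a read window [read-inv-idx, read-cmp-idx].
(defn classify-mutation
  [m read-inv-idx read-cmp-idx]
  (let [{:keys [invoke-idx complete-idx]} m]
    (cond
      ;; Invoked only after the read completed: the read cannot see it.
      (> invoke-idx read-cmp-idx) :after
      ;; Indeterminate or never-completed ops may or may not have applied.
      (or (= :info (:type m)) (nil? complete-idx)) :concurrent
      ;; Completed strictly before the read was invoked.
      (< complete-idx read-inv-idx) :before
      ;; Otherwise the two intervals overlap.
      :else :concurrent)))

;; Example calls against a read whose window is [2, 3]:
(classify-mutation {:type :ok   :invoke-idx 0 :complete-idx 1} 2 3)
(classify-mutation {:type :ok   :invoke-idx 1 :complete-idx 4} 2 3)
(classify-mutation {:type :info :invoke-idx 0 :complete-idx 1} 2 3)
```

Treating every `:info` operation as possibly concurrent is the conservative choice: it can only widen the set of scores a read is allowed to observe.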
Walkthrough
This pull request introduces a new Redis ZSet safety Jepsen workload.
Sequence Diagram
sequenceDiagram
participant Client
participant Redis
participant Nemesis as Nemesis (Fault Injection)
participant Checker
Client->>Redis: ZADD/ZINCRBY/ZREM/ZRANGE operation
Redis-->>Client: Response (or timeout/error)
Note over Client: Record invoke/complete<br/>with type (:ok/:info)
Nemesis->>Redis: Inject faults (partition/stop/etc.)
Redis-->>Client: Degraded/failed responses
Nemesis->>Redis: Heal/restore
Redis-->>Client: Recover operations
Checker->>Checker: Extract mutation windows<br/>per operation
Checker->>Checker: Build per-member<br/>committed state
Checker->>Checker: Validate ordering<br/>(score ↑, then member ↑)
Checker->>Checker: Check score correctness<br/>vs committed/concurrent
Checker->>Checker: Enforce completeness<br/>(no phantom members)
Checker-->>Checker: Produce :valid?
Estimated code review effort: 4 (Complex) | ~50 minutes
- jepsen-test.yml: 5s smoke run on every push, mirroring the other workloads.
- jepsen-test-scheduled.yml: 150s default run (overridable via workflow_dispatch inputs) every 6 hours.

Workload entry: elastickv.redis-zset-safety-workload (added in the previous commit via -main).
Pull request overview
Adds a new Jepsen workload to validate elastickv’s Redis ZSet safety properties (scores, ordering, range correctness, and phantom detection) under faults using a custom model-based checker, and wires a helper entrypoint into the Jepsen test namespace.
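As an illustration of the ordering property, a minimal sketch of a sortedness check over a ZRANGE WITHSCORES result is shown here. `sorted-by-score-then-member?` is a hypothetical name, and the `[member score]` entry shape is an assumption. Clojure compares strings by UTF-16 code unit, which matches Redis's byte-wise member ordering for ASCII members only.

```clojure
;; Hypothetical helper, not part of the PR.
(defn sorted-by-score-then-member?
  "True when `entries`, a vector of [member score] pairs, is in
  non-decreasing (score, member) order."
  [entries]
  (every? (fn [[[m1 s1] [m2 s2]]]
            ;; Each adjacent pair must not be out of order.
            (not (pos? (compare [s1 m1] [s2 m2]))))
          (partition 2 1 entries)))

(sorted-by-score-then-member? [["a" 1.0] ["b" 1.0] ["a" 2.0]])
(sorted-by-score-then-member? [["b" 1.0] ["a" 1.0]])
```

The first call describes a correctly ordered result; the second has two members with equal scores in the wrong lexical order.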
Changes:
- Introduces `redis_zset_safety_workload.clj` with a Carmine-based client, randomized op generator, and a custom checker for ZSet safety properties.
- Adds a new wrapper function in `jepsen_test.clj` intended to expose the new workload.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| jepsen/src/elastickv/redis_zset_safety_workload.clj | New ZSet safety Jepsen workload with custom model/checker and CLI entrypoint. |
| jepsen/src/elastickv/jepsen_test.clj | Adds a wrapper function and require for the new ZSet safety test. |
(defn elastickv-zset-safety-test
  "Builds a Jepsen test map that drives elastickv's Redis ZSet safety
  workload."
  ([] (elastickv-zset-safety-test {}))
  ([opts]
   (let [nodes       (or (:nodes opts) default-nodes)
         redis-ports (or (:redis-ports opts)
                         (repeat (count nodes) (or (:redis-port opts) 6379)))
         node->port  (or (:node->port opts)
                         (cli/ports->node-map redis-ports nodes))
         local?      (:local opts)
         db          (if local?
                       jdb/noop
                       (ekdb/db {:grpc-port    (or (:grpc-port opts) 50051)
                                 :redis-port   node->port
                                 :raft-groups  (:raft-groups opts)
                                 :shard-ranges (:shard-ranges opts)}))
         rate        (double (or (:rate opts) 10))
         time-limit  (or (:time-limit opts) 60)
         faults      (if local?
                       []
                       (cli/normalize-faults (or (:faults opts) [:partition :kill])))
         nemesis-p   (when-not local?
                       (combined/nemesis-package {:db       db
                                                  :faults   faults
                                                  :interval (or (:fault-interval opts) 40)}))
         nemesis-gen (if nemesis-p
                       (:generator nemesis-p)
                       (gen/once {:type :info :f :noop}))
         workload    (elastickv-zset-safety-workload
                      (assoc opts :node->port node->port))]
     (merge workload
            {:name            (or (:name opts) "elastickv-redis-zset-safety")
             :nodes           nodes
             :db              db
             :redis-host      (:redis-host opts)
             :os              (if local? os/noop debian/os)
             :net             (if local? net/noop net/iptables)
             :ssh             (merge {:username "vagrant"
                                      :private-key-path "/home/vagrant/.ssh/id_rsa"
                                      :strict-host-key-checking false}
                                     (when local? {:dummy true})
                                     (:ssh opts))
             :remote          control/ssh
             :nemesis         (if nemesis-p (:nemesis nemesis-p) nemesis/noop)
             :final-generator nil
             :concurrency     (or (:concurrency opts) 5)
             :generator       (->> (:generator workload)
                                   (gen/nemesis nemesis-gen)
                                   (gen/stagger (/ rate))
                                   (gen/time-limit time-limit))}))))
:zrem
(let [m (:value invoke)]
  {:f :zrem :member m :score nil :zrem? true
   :type t :invoke-idx inv-idx :complete-idx cmp-idx})))))

:zincrby
(let [[m _delta] (:value invoke)
      s (when (and (= :ok t) (vector? (:value complete)))
          (second (:value complete)))]
  {:f :zincrby :member m :score (some-> s double)
   :type t :invoke-idx inv-idx :complete-idx cmp-idx})

strictly precede `read-inv-idx`. Model maps member -> {:score s} or
marks member as :deleted. Returns {:members map :ok-members set}.
Only considers :ok mutations for the authoritative model; :info
mutations are treated as uncertain (neither strictly applied nor not)."
(defn elastickv-zset-safety-test []
  (zset-safety-workload/elastickv-zset-safety-test {}))

(defn -main
  [& args]
  (cli/run! (cli/single-test-cmd {:test-fn elastickv-test}) args))
Three Copilot findings on PR #550:
1. :zincrby indeterminate handling. Pending or :info ZINCRBY left the resulting score unknown, but the checker still required the observed read score to be in the finite allowed-scores set. A read that legitimately observed an in-flight increment was flagged as a score mismatch (false positive). completed-mutation-window now sets :unknown-score? on a ZINCRBY when the completion is :info or pending. allowed-scores-for-member returns :unknown-score? when any concurrent ZINCRBY carries the flag, and check-zrange-all / check-zrangebyscore skip the strict score-membership check in that case.
2. :zrem no-op handling. ZREM of a never-added member returns 0 server-side (no-op). The previous model treated every ZREM as a deletion, producing missing-member false positives and score-mismatch false negatives. invoke! already exposes the actual removed? boolean as the second element of the completion value. completed-mutation-window now threads :removed? through, and the new apply-mutation-to-state helper leaves state unchanged when :removed? is false.
3. model-before docstring claimed it returned {:members map :ok-members set}, but it returned the model map directly. Docstring rewritten to match the actual return value.

Adds jepsen/test/elastickv/redis_zset_safety_workload_test.clj covering test-spec construction, the no-op ZREM edge case, the :info ZINCRBY skip, and a positive-control score-mismatch detection. The checker tests bypass the timeline.html sub-checker (which writes to the test store) by invoking zset-safety-checker directly.
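To make the no-op ZREM rule concrete, here is a minimal sketch of a per-member state fold. It is not the PR's `apply-mutation-to-state`; the name `apply-mutation` and the `{:present? ... :score ...}` state shape are assumptions chosen for illustration.

```clojure
;; Hypothetical sketch, not the PR's implementation.
(defn apply-mutation
  "Folds one mutation into a per-member state map. `state` is nil when
  the member has never been written."
  [state {:keys [f score removed?]}]
  (case f
    :zadd    {:present? true :score score}
    :zincrby {:present? true :score score}
    ;; A ZREM that removed nothing (server replied 0) leaves state as-is.
    :zrem    (if removed?
               {:present? false :score nil}
               state)))

;; A no-op ZREM followed by a ZADD: the member ends up present.
(reduce apply-mutation nil
        [{:f :zrem :removed? false}
         {:f :zadd :score 5.0}])
```

Keeping the state untouched on a no-op ZREM also means a never-added member stays `nil`, so the no-op cannot be mistaken for a real deletion.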
Fixed the Copilot findings in 94be1bd.
/gemini review
…re-mismatch

CI failure on 94be1bd surfaced a remaining checker false positive: a read that observes a member whose only prior ops are no-op ZREMs was classified as :score-mismatch with :allowed #{} rather than treated as a never-existed member. allowed-scores-for-member returned any-known? = true because (seq concurrent) was truthy even when every concurrent op was a :zrem that didn't actually remove anything.

Fix: any-known? (renamed internally to existence-evidence?) now only counts concurrent mutations that provide evidence the member ever existed: :zadd, :zincrby, or a :zrem whose :removed? boolean is true. A concurrent no-op ZREM contributes nothing.

Adds no-op-zrem-alone-does-not-false-positive as a regression test. All 6 workload unit tests pass under Java 21.
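The existence-evidence rule can be sketched as a single predicate. `evidence-of-existence?` is a hypothetical name used only for this illustration; the mutation maps are assumed to carry `:f` and, for ZREM, `:removed?`.

```clojure
;; Hypothetical predicate, not part of the PR.
(defn evidence-of-existence?
  "True when at least one mutation shows the member existed at some
  point: any ZADD or ZINCRBY, or a ZREM that actually removed it."
  [mutations]
  (boolean (some (fn [{:keys [f removed?]}]
                   (case f
                     (:zadd :zincrby) true
                     :zrem            removed?
                     false))
                 mutations)))

(evidence-of-existence? [{:f :zrem :removed? false}])
(evidence-of-existence? [{:f :zrem :removed? true}])
```

With a predicate like this, a history containing only no-op ZREMs for a member yields no evidence, so a read that returns that member can be reported as a phantom instead of a score mismatch with an empty allowed set.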
Additional fix in 9bfcc13 for the checker false positive that caused the CI failure.
/gemini review
Actionable comments posted: 3
🧹 Nitpick comments (1)
jepsen/test/elastickv/redis_zset_safety_workload_test.clj (1)
42-89: Add a direct `:zrangebyscore` checker regression.
The checker has a separate bounded-range path, but these edge-case tests only exercise `:zrange-all`. A small missing-member or out-of-range regression would protect the advertised ZRANGEBYSCORE property.
Example test to cover bounded-range completeness:
+(deftest zrangebyscore-missing-member-is-detected
+  (let [history [{:type :invoke :process 0 :f :zadd :value ["m1" 5] :index 0}
+                 {:type :ok :process 0 :f :zadd :value ["m1" 5] :index 1}
+                 {:type :invoke :process 0 :f :zrangebyscore :value [0 10] :index 2}
+                 {:type :ok :process 0 :f :zrangebyscore
+                  :value {:bounds [0 10] :members []}
+                  :index 3}]
+        result (run-checker history)]
+    (is (not (:valid? result)) (str "expected range mismatch, got: " result))))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@jepsen/test/elastickv/redis_zset_safety_workload_test.clj` around lines 42 - 89, Add a new test that exercises the bounded-range code path by invoking :zrangebyscore instead of :zrange-all so the checker’s ZRANGEBYSCORE logic is covered; create a test (e.g. noop-zrem-does-not-flag-bounded-zrangebyscore or similar) that mirrors one of the existing edge-case histories but uses :zrangebyscore (with appropriate score bounds/values) and asserts run-checker returns :valid? or not as expected, ensuring you reference the same run-checker invocation and operation symbols (:zadd, :zrem, :zincrby, :zrangebyscore) so the new test hits the bounded-range branch.
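For reference, the expected result of a bounded-range read against a fully known model can be sketched as follows. `members-in-bounds` is a hypothetical helper, and the model is assumed to be a plain map from member to score with inclusive bounds and no concurrent mutations in flight.

```clojure
;; Hypothetical helper, not part of the PR: the members a ZRANGEBYSCORE
;; lo hi read should return when the model is exactly known.
(defn members-in-bounds
  [model lo hi]
  (->> model
       ;; Keep members whose score lies in the inclusive range [lo, hi].
       (filter (fn [[_member score]] (<= lo score hi)))
       ;; Order by score, then member, as a sorted set would.
       (sort-by (fn [[member score]] [score member]))
       (mapv first)))

(members-in-bounds {"b" 5.0 "a" 5.0 "c" 11.0} 0 10)
```

A regression test for the bounded path can compare a value like this against the `:members` of the read completion.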
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@jepsen/src/elastickv/redis_zset_safety_workload.clj`:
- Around line 386-416: The read result must be checked for duplicate members to
prevent a ZSet read returning the same member twice; update check-zrange-all
(and the similar function handling ranges around lines 430-464) to detect
duplicate member entries before per-entry validation by scanning entries and
maintaining a seen-members set, and if a member is already seen swap! the errors
atom with a {:kind :duplicate :index cmp-idx :member member :entries entries}
(or similar) error entry so duplicate-member reads are reported and rejected.
- Around line 313-318: The current logic builds applied as (->> muts (filter ...
(< (:complete-idx %) read-inv-idx)) (sort-by :complete-idx)) and then reduces
via apply-mutation-to-state, which incorrectly linearizes overlapping :ok
mutations by completion time; change this to detect overlapping committed
mutations (using their :invoke-idx and :complete-idx intervals from muts) and do
not sort/serialize ambiguous pairs by :complete-idx. Instead compute either the
set of all possible latest states for the read by merging non-overlapping
mutations deterministically and treating overlapping/conflicting writes
conservatively (e.g., allow values from any write whose interval is not ordered
before the read), and update the applied construction and reduction via
apply-mutation-to-state to use that conservative/possible-states approach; apply
the same fix at the other occurrence around lines 347-352.
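The duplicate-member finding can be illustrated with a short sketch. `duplicated-members` is a hypothetical name and not necessarily how the PR implements the check; entries are assumed to be `[member score]` pairs.

```clojure
;; Hypothetical helper, not part of the PR.
(defn duplicated-members
  "Returns the set of members that appear more than once in `entries`."
  [entries]
  (->> entries
       (map first)
       frequencies
       ;; Keep only members counted more than once.
       (keep (fn [[member n]] (when (> n 1) member)))
       set))

(duplicated-members [["m1" 1.0] ["m2" 2.0] ["m1" 1.0]])
```

Running this before the per-entry validation matters because each duplicate entry can pass the score check on its own.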
Three Major-severity CodeRabbit findings on PR #550:
1. Duplicate-member detection (line 416): a ZSet read must return each member at most once. Previously, if ZRANGE returned the same member twice with an allowed score, the checker accepted it because sort and score-membership checks passed independently per entry. duplicate-members helper now flags :duplicate-members (and :duplicate-members-range for ZRANGEBYSCORE) before the per-entry loop.
2. Overlapping committed writes (line 318): two :ok mutations whose invoke/complete windows overlap have ambiguous serialization order. Pinning allowed-scores to a single last-wins linearization by :complete-idx was unsound. allowed-scores-for-member now unions all :zadd/:ok-:zincrby scores from committed mutations (over-approx that stays sound), and must-be-present? is relaxed when any pair of committed writes for the same member overlaps in time.
3. Pre-read :info mutations (line 328): a mutation recorded as :info whose completion precedes a later read's invoke may have taken effect server-side. Previously it was ignored by both model-before (:ok only) and the concurrent window (complete-idx >= read-inv-idx required). Now collected as pre-read-info, contributing to allowed scores and flipping unknown-score? for :zincrby with unknown resulting score.

3 new regression tests (duplicate-members-are-flagged, overlapping-committed-zadds-allow-either-score, info-before-read-is-considered-uncertain). Workload unit test count now 9, all PASS under Java 21.
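A small sketch of the overlapping-writes case, under assumed names: `windows-overlap?` and `allowed-committed-scores` are hypothetical helpers, and each mutation is assumed to carry `:invoke-idx`, `:complete-idx` and `:score`.

```clojure
;; Hypothetical helpers, not part of the PR.
(defn windows-overlap?
  "True when the invoke/complete intervals of a and b intersect."
  [a b]
  (and (<= (:invoke-idx a) (:complete-idx b))
       (<= (:invoke-idx b) (:complete-idx a))))

(defn allowed-committed-scores
  "Over-approximation: the union of every score written by a committed
  mutation, ignoring mutations without a known score."
  [committed]
  (into #{} (keep :score) committed))

;; Two ZADDs on the same member whose windows overlap: either score
;; may be the final one, so both are allowed.
(def w1 {:f :zadd :score 1.0 :invoke-idx 0 :complete-idx 3})
(def w2 {:f :zadd :score 2.0 :invoke-idx 1 :complete-idx 2})

(windows-overlap? w1 w2)
(allowed-committed-scores [w1 w2])
```

The union is deliberately loose: it never rejects a legal read, at the cost of accepting some stale scores when writes are strictly ordered.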
Fixed the three CodeRabbit Major findings in 6d0b4c3.
Added 3 regression tests.
/gemini review
/gemini review
Code Review
This pull request introduces a new Jepsen workload and model-based checker to verify Redis ZSet safety properties in elastickv. It includes a client for Redis operations, a generator for high-contention workloads, and a checker that validates score correctness, ordering, and atomicity. Feedback focuses on handling infinite scores in Redis to prevent parsing exceptions, improving the soundness of the linearizability model in allowed-scores-for-member to prevent stale reads, and optimizing the efficiency of mutation checks.
(defn- parse-withscores
  "Carmine returns a flat [member score member score ...] vector for
  ZRANGE WITHSCORES. Convert to a sorted vector of [member (double score)]
  preserving server-returned order (score ascending, then member)."
  [flat]
  (->> flat
       (partition 2)
       (mapv (fn [[m s]]
               [(if (bytes? m) (String. ^bytes m) (str m))
                (Double/parseDouble (str s))]))))
The use of Double/parseDouble directly on Redis score strings will fail if the score is infinite. Redis returns "inf", "+inf", or "-inf" for infinite values, but Java's Double/parseDouble expects "Infinity" or "-Infinity". This will cause a NumberFormatException during the test if infinite scores are encountered.
(defn- parse-double-safe
"Parses a Redis score string into a Double, handling 'inf' representations."
[s]
(let [s-str (str/lower-case (str s))]
(cond
(str/includes? s-str "inf") (if (str/starts-with? s-str "-")
Double/NEGATIVE_INFINITY
Double/POSITIVE_INFINITY)
:else (Double/parseDouble (str s)))))
(defn- parse-withscores
"Carmine returns a flat [member score member score ...] vector for
ZRANGE WITHSCORES. Convert to a sorted vector of [member (double score)]
preserving server-returned order (score ascending, then member)."
[flat]
(->> flat
(partition 2)
(mapv (fn [[m s]]
[(if (bytes? m) (String. ^bytes m) (str m))
(parse-double-safe s)]))))
(let [[member delta] (:value op)
      new-score (car/wcar cs (car/zincrby zset-key (double delta) member))]
  (assoc op :type :ok
         :value [member (Double/parseDouble (str new-score))]))
(defn- allowed-scores-for-member
  "Compute the set of scores considered valid for `member` by a read
  whose window is [read-inv-idx, read-cmp-idx], based on committed state
  and any concurrent mutations.

  Returns:
    :scores           - set of acceptable scores (committed + concurrent
                        :zadd / :ok :zincrby).
    :unknown-score?   - true iff any concurrent ZINCRBY's resulting score
                        is unknown (in-flight or :info). When set, the
                        caller MUST skip the strict score-membership
                        check to stay sound.
    :must-be-present? - committed state says present and no concurrent
                        mutation could have removed/changed it.
    :any-known?       - some op claims to have touched this member."
  [mutations-by-m member read-inv-idx read-cmp-idx]
  (let [muts (get mutations-by-m member [])
        ;; :ok mutations that completed strictly before the read. They
        ;; may have overlapped with each other in wall-clock time, so
        ;; the serialization order among them is ambiguous.
        committed (->> muts
                       (filter #(and (= :ok (:type %))
                                     (some? (:complete-idx %))
                                     (< (:complete-idx %) read-inv-idx))))
        ;; :info mutations that completed before the read: they may or
        ;; may not have taken effect server-side. We must account for
        ;; their possible scores just like concurrent ones.
        pre-read-info (->> muts
                           (filter #(and (= :info (:type %))
                                         (some? (:complete-idx %))
                                         (< (:complete-idx %) read-inv-idx))))
        ;; Concurrent mutations: windows overlap the read. Include both
        ;; :ok and :info since either may have taken effect.
        concurrent (concurrent-mutations-for-member muts read-inv-idx read-cmp-idx)
        ;; A conservative last-wins linearization for the must-be-present?
        ;; check only. Ambiguous when committed writes overlap each other.
        committed-sorted (sort-by :complete-idx committed)
        committed-state (reduce apply-mutation-to-state nil committed-sorted)
        committed-overlap? (boolean
                            (some (fn [[a b]]
                                    (and (not (identical? a b))
                                         (<= (:invoke-idx a) (:complete-idx b))
                                         (<= (:invoke-idx b) (:complete-idx a))))
                                  (for [a committed, b committed] [a b])))
        ;; Union of every score that any committed / pre-read :info /
        ;; concurrent op could have produced. This over-approximates the
        ;; legitimate post-state set when writes overlap, keeping the
        ;; checker sound at the cost of being slightly less strict on
        ;; overlapping concurrent writers.
        add-scores (fn [acc m]
                     (case (:f m)
                       :zadd (conj acc (:score m))
                       :zincrby (cond-> acc (some? (:score m)) (conj (:score m)))
                       :zrem acc))
        scores (as-> #{} s
                 (reduce add-scores s committed)
                 (reduce add-scores s pre-read-info)
                 (reduce add-scores s concurrent))
        unknown-score? (or
                        (some #(and (= :zincrby (:f %)) (:unknown-score? %))
                              concurrent)
                        (some #(and (= :zincrby (:f %)) (:unknown-score? %))
                              pre-read-info))
        ;; any-known? must only be true when something provides evidence
        ;; the member actually existed at some point. A no-op ZREM
        ;; (:removed? false) does NOT prove existence.
        existence-evidence? (or (some #(case (:f %)
                                         (:zadd :zincrby) true
                                         :zrem (:removed? %))
                                      committed)
                                (some #(case (:f %)
                                         (:zadd :zincrby) true
                                         :zrem (:removed? %))
                                      pre-read-info)
                                (some #(case (:f %)
                                         (:zadd :zincrby) true
                                         :zrem (:removed? %))
                                      concurrent))]
    {:scores scores
     :unknown-score? (boolean unknown-score?)
     ;; must-be-present? is relaxed when committed writes overlap
     ;; among themselves or when any :info / concurrent mutation could
     ;; have removed the member before the read.
     :must-be-present? (boolean (and committed-state
                                     (:present? committed-state)
                                     (not committed-overlap?)
                                     (empty? pre-read-info)
                                     (empty? concurrent)))
     :any-known? (boolean existence-evidence?)}))
The current implementation of `allowed-scores-for-member` is too permissive and potentially unsound for linearizability checks.
- Stale Reads: It includes all successful committed scores in the `scores` set (line 384), even if they were strictly followed by a later committed mutation. In a linearizable system, a read should only observe scores from the "potentially latest" committed writes (those not strictly followed by another) or concurrent ones.
- Stale Presence: The checker does not verify if a member is allowed to be present. If a member was added and then deleted (both committed before the read), `any-known?` remains true, and the read will accept the member even though it should be absent.
- Efficiency: The `committed-overlap?` check (line 368) is $O(N^2)$ relative to the number of mutations for a member. While the member universe is small, this can be optimized to $O(N)$ by tracking the maximum `invoke-idx` of preceding mutations.
(defn- allowed-scores-for-member
"Compute the set of scores considered valid for `member` by a read
whose window is [read-inv-idx, read-cmp-idx], based on committed state
and any concurrent mutations."
[mutations-by-m member read-inv-idx read-cmp-idx]
(let [muts (get mutations-by-m member [])
;; Mutations that completed strictly before the read started.
preceding (filter #(and (= :ok (:type %))
(some? (:complete-idx %))
(< (:complete-idx %) read-inv-idx))
muts)
;; A mutation is a candidate for the 'latest' state if no other
;; committed mutation strictly follows it (m2.invoke > m.complete).
max-inv (reduce max -1 (map :invoke-idx preceding))
candidates (filter #(>= (:complete-idx %) max-inv) preceding)
concurrent (concurrent-mutations-for-member muts read-inv-idx read-cmp-idx)
pre-read-info (filter #(and (= :info (:type %))
(some? (:complete-idx %))
(< (:complete-idx %) read-inv-idx))
muts)
;; Scores allowed by committed state or uncertain ops.
scores (into (set (keep :score candidates))
(concat (keep :score concurrent)
(keep :score pre-read-info)))
unknown-score? (some #(and (= :zincrby (:f %)) (:unknown-score? %))
(concat concurrent pre-read-info))
;; Allowed to be present if any candidate or concurrent/info op is a write.
can-be-present? (or (some #(#{:zadd :zincrby} (:f %)) candidates)
(some #(#{:zadd :zincrby} (:f %)) concurrent)
(some #(#{:zadd :zincrby} (:f %)) pre-read-info))
;; Must be present if all candidates are writes and no concurrent/info op can delete.
must-be-present? (and (seq candidates)
(every? #(#{:zadd :zincrby} (:f %)) candidates)
(not (some #(= :zrem (:f %)) concurrent))
(not (some #(= :zrem (:f %)) pre-read-info)))]
{:scores scores
:unknown-score? (boolean unknown-score?)
:must-be-present? must-be-present?
:can-be-present? can-be-present?}))
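On the efficiency point, one way to avoid the pairwise comparison is a single pass over mutations sorted by invoke index. The sketch below is an assumption of this edit rather than the reviewer's or the PR's code: `any-overlap?` is a hypothetical name, and it tracks the largest `:complete-idx` seen so far rather than the maximum `invoke-idx`. The scan itself is linear; the sort adds $O(N \log N)$ unless the input is already in invoke order.

```clojure
;; Hypothetical helper, not part of the PR or the review suggestion.
(defn any-overlap?
  "True when any two mutation windows in `muts` intersect."
  [muts]
  (loop [[m & more] (sort-by :invoke-idx muts)
         max-cmp    nil]
    (cond
      ;; Ran out of mutations without finding an intersection.
      (nil? m) false
      ;; This op was invoked before an earlier op had completed.
      (and max-cmp (<= (:invoke-idx m) max-cmp)) true
      :else (recur more
                   (if max-cmp
                     (max max-cmp (:complete-idx m))
                     (:complete-idx m))))))

(any-overlap? [{:invoke-idx 0 :complete-idx 3} {:invoke-idx 1 :complete-idx 2}])
(any-overlap? [{:invoke-idx 0 :complete-idx 1} {:invoke-idx 2 :complete-idx 3}])
```

The sketch assumes every mutation has a non-nil `:complete-idx`, which holds for the committed set because it is filtered on completion.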
(let [{:keys [scores any-known? unknown-score?]}
      (allowed-scores-for-member mutations-by-m member inv-idx cmp-idx)]
  (cond
    (not any-known?)
    (swap! errors conj {:kind :phantom
                        :index cmp-idx
                        :member member
                        :score score})
The phantom check should be replaced with a more robust can-be-present? check to catch both phantoms (never existed) and stale reads (present after a committed deletion).
(doseq [[member score] entries]
(let [{:keys [scores can-be-present? unknown-score?]}
(allowed-scores-for-member mutations-by-m member inv-idx cmp-idx)]
(cond
(not can-be-present?)
(swap! errors conj {:kind :unexpected-presence
:index cmp-idx
:member member
:score score})
Summary
Adds a Jepsen workload (`elastickv-zset-safety-test`) that verifies ZSet-specific safety properties under faults (network partitions, node kills), using a custom model-based checker. Goes beyond the simple add->read visibility check.

Properties verified:

Concurrent-ZADD handling uses an invoke/complete windowing approach. A mutation is "committed before" a read iff its `:complete` index is strictly less than the read's `:invoke` index. Mutations whose intervals overlap are "concurrent" and contribute to a per-member allowed-score set. Indeterminate (`:info`) mutations are treated as possibly-concurrent. ZINCRBY whose response is unknown sets `:unknown-score?` so the checker skips the strict score check for concurrent reads. ZREM carries the actual `removed?` boolean from the server reply so a no-op ZREM does not falsely mark the member as deleted.

Files
- `jepsen/src/elastickv/redis_zset_safety_workload.clj`
- `jepsen/test/elastickv/redis_zset_safety_workload_test.clj` (test-spec construction + checker edge cases: no-op ZREM, :info ZINCRBY, deterministic score mismatch).
- `jepsen/src/elastickv/jepsen_test.clj` (entry point added).
- `.github/workflows/jepsen-test.yml` (5s smoke run on every push).
- `.github/workflows/jepsen-test-scheduled.yml` (150s default run every 6h).

Running locally
The workload is invoked directly via its own `-main`, not through `jepsen-test/-main`:

    cd jepsen
    lein run -m elastickv.redis-zset-safety-workload \
      --time-limit 60 --rate 10 --concurrency 5 \
      --ports 63791,63792,63793 --host 127.0.0.1

(`elastickv.jepsen-test` exposes `elastickv-zset-safety-test` only as a Clojure function for REPL use; CI and ad-hoc runs use the namespace's own `-main`.)

Test plan