test(grpc): absorb leader churn in consistency loops without weakening the check#632
Conversation
Test_consistency_satisfy_write_after_read_sequence and Test_grpc_transaction both fire 9999 Put/Get (and Delete/Get) round trips against a 3-node Raft cluster. On a busy CI runner a Raft re-election can fire mid-loop; the in-flight RPC then surfaces as "rpc error: code = Unknown desc = leader not found" or "etcd raft engine is not leader". Today both tests treat this as a hard failure and abort, even though the error is purely an availability hiccup: nothing was committed, so consistency cannot be violated.

Wrap every RPC in the existing retryNotLeader helper so transient leader churn is absorbed within leaderChurnRetryTimeout. The post-RPC assertions (assert.Equal for the value-read invariant, assert.Nil for the post-delete invariant) are unchanged: once the Put / Delete eventually commits, the subsequent Get must agree, and a stale read still fails the test loudly. This is the explicit user constraint, "without compromising the consistency check itself" (一貫性確認そのものは損なわず), and it holds because retryNotLeader only inspects RPC error codes, never the response payload.

The fix is symmetric with how rpushEventually / lpushEventually already wrap the Lua list tests for the same class of CI flake. Build / vet / lint clean.
📝 Walkthrough: The test functions are updated to use the test context instead of the background context, and RPC operations are wrapped with retry logic that handles transient "not leader" errors during Raft leader transitions, improving test reliability.
Code Review
This pull request wraps gRPC calls in the Test_consistency_satisfy_write_after_read_sequence and Test_grpc_transaction tests with a retryNotLeader helper to improve resilience against transient Raft leader re-elections. The review feedback suggests using a context that supports cancellation or deadlines instead of context.Background() within the retry closures to ensure better resource management and responsiveness during test execution.
```go
err := retryNotLeader(context.Background(), func() error {
	_, perr := c.RawPut(context.Background(),
		&pb.RawPutRequest{Key: key, Value: want})
	return perr
})
```
The closure passed to retryNotLeader uses context.Background() for the RawPut call. Following the project rule to use context.Context for managing deadlines and cancellation, it is better practice to use a context that can be cancelled or one that respects the retry helper's timeout. While this is a test, using context.Background() inside the closure prevents the RPC call from being cancelled if the retry loop times out or the test is interrupted.
References
- Use context.Context for managing deadlines and cancellation to ensure proper resource management and responsiveness.
```go
err = retryNotLeader(context.Background(), func() error {
	var gerr error
	resp, gerr = c.RawGet(context.Background(), &pb.RawGetRequest{Key: key})
	return gerr
})
```
Similar to the RawPut call, RawGet here uses context.Background() inside the retry closure. In alignment with the project rule to use context.Context for managing deadlines and cancellation, consider using a context that is tied to the test lifecycle or the retry helper's timeout to ensure timely cancellation of in-flight requests during churn or timeouts.
References
- Use context.Context for managing deadlines and cancellation to ensure proper resource management and responsiveness.
Codex Review: Didn't find any major issues. Hooray!
Replace context.Background() in Test_consistency_satisfy_write_after_read_sequence and Test_grpc_transaction with the per-test t.Context(). Both tests now derive a single ctx at the top of the test body and thread it through every RPC and retryNotLeader call. Two effects:

- Test-level cancellation (timeout, parent stopping the test, t.Cleanup draining) propagates into the in-flight RPC instead of leaking the goroutine through context.Background() until the gRPC call eventually errors out on its own.
- The retry helper observes the same cancel signal, so a stalled shutdown does not chase 9999 iterations of "leader not found" retries past the point the test has been told to stop.

Behaviour-equivalent on the happy path; the consistency assertions (assert.Equal / assert.Nil) are unchanged. Build / vet / lint clean; both tests still pass at 9999 iterations locally.
Summary
Test_consistency_satisfy_write_after_read_sequence and Test_grpc_transaction both run a 9999-iteration Put/Get loop against a 3-node Raft cluster. On a busy CI runner a Raft re-election can fire mid-loop and the in-flight RPC surfaces as "rpc error: code = Unknown desc = leader not found" (also "etcd raft engine is not leader"). Today both tests abort on the first such error even though it is purely an availability hiccup: nothing committed, so consistency cannot be violated.

Wrap every RPC in the existing retryNotLeader helper so transient leader churn is absorbed within leaderChurnRetryTimeout. The consistency assertions (assert.Equal for value reads, assert.Nil for post-delete reads) are unchanged: once the Put / Delete eventually commits, the subsequent Get must agree, and a stale read still fails the test loudly. This is the explicit constraint, "without compromising the consistency check itself" (一貫性確認そのものは損なわず), and it holds because retryNotLeader only inspects RPC error codes, never the response payload.

The fix is symmetric with how rpushEventually / lpushEventually already wrap the Lua list tests for the same class of CI flake.

Test plan

- golangci-lint clean.
- go test ./adapter/ -run 'Test_grpc_transaction|Test_consistency_satisfy_write_after_read_sequence' -count=1 -timeout 600s passes (the same race-mode shape that was failing on https://github.com/bootjp/elastickv/actions/runs/24930560663/job/73007486310).

/gemini review
@codex review