test(grpc): absorb leader churn in consistency loops without weakening the check by bootjp · Pull Request #632 · bootjp/elastickv

bootjp · 2026-04-25T12:35:22Z

Summary

Test_consistency_satisfy_write_after_read_sequence and Test_grpc_transaction both run a 9999-iteration Put/Get loop against a 3-node Raft cluster. On a busy CI runner, a Raft re-election can fire mid-loop, and the in-flight RPC then surfaces as rpc error: code = Unknown desc = leader not found (or as etcd raft engine is not leader). Today both tests abort on the first such error even though it is purely an availability hiccup: nothing was committed, so consistency cannot be violated.

Wrap every RPC in the existing retryNotLeader helper so transient leader churn is absorbed within leaderChurnRetryTimeout. The consistency assertions (assert.Equal for value reads, assert.Nil for post-delete reads) are unchanged: once the Put / Delete eventually commits, the subsequent Get must agree, and a stale read still fails the test loudly. This is the explicit constraint (the consistency check itself must not be compromised), and it holds because retryNotLeader only inspects RPC error codes, never the response payload.

The fix is symmetric with how rpushEventually / lpushEventually already wrap Lua list tests for the same class of CI flake.

Test plan

Build / vet / golangci-lint clean.
Locally: go test ./adapter/ -run 'Test_grpc_transaction|Test_consistency_satisfy_write_after_read_sequence' -count=1 -timeout 600s passes (this is the same race-mode shape that was failing on https://github.com/bootjp/elastickv/actions/runs/24930560663/job/73007486310).
No change to the consistency invariants — only the RPC failure path is retried.

/gemini review
@codex review

Summary by CodeRabbit

Tests
- Enhanced robustness of gRPC transaction tests by implementing retry logic for transient failures during leader elections, reducing flaky test failures and improving overall test reliability in distributed cluster scenarios.

… the check

Test_consistency_satisfy_write_after_read_sequence and Test_grpc_transaction both fire 9999 Put/Get (and Delete/Get) round trips against a 3-node Raft cluster. On a busy CI runner, a Raft re-election can fire mid-loop; the in-flight RPC then surfaces as "rpc error: code = Unknown desc = leader not found" or "etcd raft engine is not leader". Today both tests treat this as a hard failure and abort, even though the error is purely an availability hiccup: nothing was committed, so consistency cannot be violated.

Wrap every RPC in the existing retryNotLeader helper so transient leader churn is absorbed within leaderChurnRetryTimeout. The post-RPC assertions (assert.Equal for the value-read invariant, assert.Nil for the post-delete invariant) are unchanged: once the Put / Delete eventually commits, the subsequent Get must agree, and a stale read still fails the test loudly. This is the explicit user constraint, "without compromising the consistency check itself", and it holds because retryNotLeader only inspects RPC error codes, never the response payload.

The fix is symmetric with how rpushEventually / lpushEventually already wrap Lua list tests for the same class of CI flake.

Build / vet / lint clean.

coderabbitai · 2026-04-25T12:35:29Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 36a5cf10-6a2d-403d-9f69-cd8ffead0e20

📥 Commits

Reviewing files that changed from the base of the PR and between 8a9bab4 and 1aed386.

📒 Files selected for processing (1)

adapter/grpc_test.go

📝 Walkthrough

Walkthrough

The test functions are updated to use the test context instead of background context and wrap RPC operations with retry logic to handle transient "not leader" errors during Raft leader transitions, improving test reliability.

Changes

Cohort / File(s)	Summary
Test Resilience `adapter/grpc_test.go`	Added `retryNotLeader()` wrapper around Put/Get/Delete RPC calls in `Test_consistency_satisfy_write_after_read_sequence` and `Test_grpc_transaction` to gracefully handle transient leader unavailability; replaced `context.Background()` with `t.Context()` for RPC execution.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

Hide transient leader errors behind bounded retries in coordinator #618: Modifies adapter/grpc_test.go's handling of transient "not leader" errors, complementing this PR's test-side retry logic by implementing server-side bounded retry mechanisms.

Poem

🐰 A test that once would flicker and fail,
Now bounces back through the Raft's leader trail—
When "not leader" whispers its transient cry,
Our retries catch it and give it a try! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly and specifically describes the main change: wrapping consistency test loops with retry logic to handle transient leader changes without compromising test assertions.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


gemini-code-assist

Code Review

This pull request wraps gRPC calls in the Test_consistency_satisfy_write_after_read_sequence and Test_grpc_transaction tests with a retryNotLeader helper to improve resilience against transient Raft leader re-elections. The review feedback suggests using a context that supports cancellation or deadlines instead of context.Background() within the retry closures to ensure better resource management and responsiveness during test execution.

gemini-code-assist · 2026-04-25T12:40:42Z

+		err := retryNotLeader(context.Background(), func() error {
+			_, perr := c.RawPut(context.Background(),
+				&pb.RawPutRequest{Key: key, Value: want})
+			return perr
+		})


The closure passed to retryNotLeader uses context.Background() for the RawPut call. Following the project rule to use context.Context for managing deadlines and cancellation, it is better practice to use a context that can be cancelled or one that respects the retry helper's timeout. While this is a test, using context.Background() inside the closure prevents the RPC call from being cancelled if the retry loop times out or the test is interrupted.

References

Use context.Context for managing deadlines and cancellation to ensure proper resource management and responsiveness.

gemini-code-assist · 2026-04-25T12:40:42Z

+		err = retryNotLeader(context.Background(), func() error {
+			var gerr error
+			resp, gerr = c.RawGet(context.Background(), &pb.RawGetRequest{Key: key})
+			return gerr
+		})


Similar to the RawPut call, RawGet here uses context.Background() inside the retry closure. In alignment with the project rule to use context.Context for managing deadlines and cancellation, consider using a context that is tied to the test lifecycle or the retry helper's timeout to ensure timely cancellation of in-flight requests during churn or timeouts.

References

Use context.Context for managing deadlines and cancellation to ensure proper resource management and responsiveness.

chatgpt-codex-connector · 2026-04-25T12:43:23Z

Codex Review: Didn't find any major issues. Hooray!


Replace context.Background() in Test_consistency_satisfy_write_after_read_sequence and Test_grpc_transaction with the per-test t.Context(). Both tests now derive a single ctx at the top of the test body and thread it through every RPC and retryNotLeader call.

Two effects:
- Test-level cancel (timeout, parent stopping the test, t.Cleanup draining) propagates into the in-flight RPC instead of leaking the goroutine through context.Background until the gRPC call eventually errors out on its own.
- The retry helper observes the same cancel signal, so a stalled shutdown does not chase 9999 iterations of "leader not found" retries past the point the test has been told to stop.

Behaviour-equivalent on the happy path; the consistency assertions (assert.Equal / assert.Nil) are unchanged. Build / vet / lint clean, both tests still pass at 9999 iterations locally.

chatgpt-codex-connector · 2026-04-25T13:04:05Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

gemini-code-assist Bot reviewed Apr 25, 2026


bootjp merged commit dbe4725 into main Apr 25, 2026
8 checks passed

bootjp deleted the test/grpc-flake-retry branch April 25, 2026 13:34

bootjp merged 2 commits into main from test/grpc-flake-retry
