fix(testcontainers): wait for vshard storages to complete handshake before declaring cluster ready #97
Draft
dkasimovskiy wants to merge 1 commit into
…laring cluster ready

VshardClusterConfigurator#configure() previously stopped at router-up + vshard.router.bootstrap() + crud._VERSION. None of those verifies that individual storages finished the vshard handshake, so a CRUD request right after configure() could fail with VHANDSHAKE_NOT_COMPLETE (vshard code 40).

Add waitUntilVshardStoragesAreReady: polls vshard.router.info() until every replica is status='available' and info.bucket.unreachable == 0, 120s budget.
`VshardClusterConfigurator#configure()` previously declared the cluster ready after three checks: the router is up, `vshard.router.bootstrap()` returns cleanly, and `crud._VERSION` is reachable on the router. None of these verifies that individual storages have completed the vshard handshake during the initial rebalance; the router can answer "bootstrap ok" while some storages are still in the `VHANDSHAKE_NOT_COMPLETE` (code 40) state, and a CRUD request that targets such a storage fails immediately.

Observed in `tests-crud-integration` (3.5.0).

Add a fourth readiness step: `waitUntilVshardStoragesAreReady` polls `vshard.router.info()` until every replica in every replicaset has `status='available'` and there are no unreachable buckets, with a 120s budget. This guarantees that any subsequent CRUD request hits a storage that has completed the handshake.

Changes:
- `VshardClusterContainer`: add the `VSHARD_STORAGES_READY_COMMAND` Lua probe, `TIMEOUT_VSHARD_STORAGES_READY_IN_SECONDS = 120`, and the `waitUntilVshardStoragesAreReady` / `vshardStoragesAreReady` methods.
- `VshardClusterConfigurator#configure()`: invoke the new readiness check as the final step before marking the cluster configured.

I haven't forgotten about:
Related issues:
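
For illustration only, the polling readiness step described in this PR could be sketched roughly as below. This is not the code from the PR: the class `StorageReadinessSketch`, the `RouterInfo` holder, and the methods `storagesAreReady` / `waitUntilReady` are hypothetical names, and `RouterInfo` is an assumed, heavily simplified stand-in for what `vshard.router.info()` returns (only per-replicaset replica statuses and the unreachable-bucket count). The probe is passed in as a `Supplier`, so the sketch runs without a Tarantool cluster.

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of a "poll until every storage is ready" step.
final class StorageReadinessSketch {

    // Assumed, simplified view of vshard.router.info():
    // replicaset id -> statuses of its replicas, plus info.bucket.unreachable.
    static final class RouterInfo {
        final Map<String, List<String>> replicaStatuses;
        final long unreachableBuckets;

        RouterInfo(Map<String, List<String>> replicaStatuses, long unreachableBuckets) {
            this.replicaStatuses = replicaStatuses;
            this.unreachableBuckets = unreachableBuckets;
        }
    }

    // Ready only if at least one replicaset is known, every replica reports
    // "available", and no bucket is unreachable.
    static boolean storagesAreReady(RouterInfo info) {
        if (info == null || info.replicaStatuses.isEmpty()) {
            return false;
        }
        boolean allAvailable = info.replicaStatuses.values().stream()
                .allMatch(s -> !s.isEmpty() && s.stream().allMatch("available"::equals));
        return allAvailable && info.unreachableBuckets == 0;
    }

    // Polls the probe until it reports ready or the budget is spent.
    // Probe failures count as "not ready yet" and are retried.
    static void waitUntilReady(Supplier<RouterInfo> probe, Duration budget, Duration interval) {
        long deadline = System.nanoTime() + budget.toNanos();
        while (true) {
            RouterInfo info;
            try {
                info = probe.get();
            } catch (RuntimeException e) {
                info = null;
            }
            if (storagesAreReady(info)) {
                return;
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new IllegalStateException("storages not ready within " + budget);
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for storages", e);
            }
        }
    }

    public static void main(String[] args) {
        // Simulated probe: one replica still unreachable, then everything available.
        Iterator<RouterInfo> answers = Arrays.asList(
                new RouterInfo(Map.of("rs1", List.of("available", "unreachable")), 0),
                new RouterInfo(Map.of("rs1", List.of("available", "available")), 0)
        ).iterator();
        // The PR uses a 120s budget; the simulation finishes on the second poll.
        waitUntilReady(answers::next, Duration.ofSeconds(120), Duration.ofMillis(10));
        System.out.println("storages ready");
    }
}
```

Treating probe exceptions as "not ready yet" is a design choice of this sketch: early in startup the router may not answer at all, and the budget bounds the total wait either way.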