[fix][proxy] Avoid blocking the proxy IO thread on a cold broker cache #26052
Merged
lhotari merged 1 commit on Jun 18, 2026
lhotari approved these changes on Jun 18, 2026
Motivation
MetadataStoreCacheLoader.getAvailableBrokers() returns the cached active-broker list, but on a cold/empty cache it fell back to a synchronous getChildren() metadata read (getChildrenAsync(...).get(operationTimeoutMs)). Its clean List signature hid that it could block.

It is reached on the proxy's Netty IO thread via ProxyConnection.completeConnect() (when checkActiveBrokers is enabled) → isBrokerActive() → getAvailableBrokers(), and also from BrokerDiscoveryProvider.nextBroker(). A cold cache could therefore stall a proxy IO thread on a metadata round-trip during connection setup.

Modifications
getAvailableBrokers() no longer performs a synchronous metadata read. It returns the snapshot maintained by init() and the metadata-store children/session listeners and, only when the snapshot is empty, triggers a single background refresh (guarded by an AtomicBoolean so a burst of calls during an empty-cache window does not queue many reads). The former tryUpdate lambda is extracted into a reloadBrokers() method shared by the listeners, the initial fetch, and the background refresh.

The fail-fast isBrokerActive check (which already expects the client to retry after a back-off) now sees the current snapshot instead of blocking. An empty snapshot means there are genuinely no active brokers, since the cache is populated by init() and kept current by the listener.

Verifying this change
Added MetadataStoreCacheLoaderTest, which verifies that getAvailableBrokers() serves the cache without ever invoking the synchronous getChildren(...), and does not block on an empty cache.

Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
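
For readers who want a concrete picture of the read path described under Modifications, the following is a minimal sketch, not the code from this pull request. The class name BrokerSnapshotSketch, the childrenReader supplier (standing in for an asynchronous metadata children read), and the refreshInFlight field are hypothetical names chosen for illustration; only the ideas of a snapshot, an AtomicBoolean guard, and a shared reloadBrokers() method come from the description above.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical stand-in for the cache loader; not the actual Pulsar class.
class BrokerSnapshotSketch {
    // Snapshot replaced wholesale by reloadBrokers(); reading it never blocks.
    private volatile List<String> availableBrokers = List.of();
    // Allows at most one background refresh to be in flight at a time.
    private final AtomicBoolean refreshInFlight = new AtomicBoolean(false);
    // Assumed abstraction over an asynchronous metadata children read.
    private final Supplier<CompletableFuture<List<String>>> childrenReader;

    BrokerSnapshotSketch(Supplier<CompletableFuture<List<String>>> childrenReader) {
        this.childrenReader = childrenReader;
    }

    // Returns the current snapshot immediately. If it is empty, starts one
    // background refresh; concurrent callers lose the compareAndSet and
    // simply return the (still empty) snapshot.
    List<String> getAvailableBrokers() {
        List<String> snapshot = availableBrokers;
        if (snapshot.isEmpty() && refreshInFlight.compareAndSet(false, true)) {
            CompletableFuture<Void> refresh;
            try {
                refresh = reloadBrokers();
            } catch (RuntimeException e) {
                // A reader that throws synchronously must not leave the guard set.
                refresh = CompletableFuture.failedFuture(e);
            }
            refresh.whenComplete((ignored, error) -> refreshInFlight.set(false));
        }
        return snapshot;
    }

    // Shared reload step: read the children asynchronously and swap the snapshot.
    CompletableFuture<Void> reloadBrokers() {
        return childrenReader.get()
                .thenAccept(brokers -> availableBrokers = List.copyOf(brokers));
    }

    public static void main(String[] args) {
        CompletableFuture<List<String>> pending = new CompletableFuture<>();
        BrokerSnapshotSketch loader = new BrokerSnapshotSketch(() -> pending);
        // The read is still pending, so this returns at once with the empty snapshot.
        System.out.println("cold cache: " + loader.getAvailableBrokers());
        pending.complete(List.of("broker-1:8080"));
        System.out.println("after refresh: " + loader.getAvailableBrokers());
    }
}
```

In this sketch a caller on an IO thread only ever reads a volatile field. The cost is that a call made during a cold-cache window may see an empty list, which matches the fail-fast, client-retries behaviour the description relies on.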