feat(distributor): add goroutine worker pool for query fan-out to ing…#7623
Merged
alanprot merged 1 commit on Jun 16, 2026
…esters
Add experimental -distributor.num-query-workers flag to reuse pooled goroutines (with pre-grown stacks) for the read-path fan-out from distributors/rulers to ingesters. This is the query-path analogue of PR cortexproject#6406 which did the same for the push path.
Profiling rulers with wide ingester fan-out shows runtime.copystack at ~8% of CPU, triggered by each freshly-spawned goroutine growing its stack during gRPC stream setup + snappy encoding. A worker pool amortizes this cost across requests.
When no worker is available the pool falls back to spawning a goroutine, so behavior is identical to today under saturation (no regression risk).
Changes:
- Add ReplicationSet.DoWithExecutor() that accepts a util.AsyncExecutor
- Keep ReplicationSet.Do() as backward-compatible wrapper (uses noOpExecutor)
- Add NumQueryWorkers config, queryWorkers field, init/stop in Distributor
- Route all 3 query fan-out call sites through DoWithExecutor
Signed-off-by: Alan Protasio <alanprot@gmail.com>
68d3982 to
ca96bab
danielblando
approved these changes
Jun 15, 2026
What this PR does
Adds an experimental `-distributor.num-query-workers` flag to use a goroutine worker pool for the read-path fan-out from distributors (queriers and rulers) to ingesters. This is the query-path analogue of #6406, which did the same for the push path.
Why
Looking at the ruler flame graph, we can see that `runtime.copystack` accounts for around 8-10% of CPU.
Each query evaluation fans out to ingesters via `ReplicationSet.Do`, which spawns a fresh goroutine per instance. These goroutines start with small stacks and immediately grow them (`runtime.newstack` → `runtime.copystack`) when entering the gRPC stream setup + snappy encoding frames. With wide ingester fan-out and many concurrent evaluations, this stack-growth cost becomes significant.
A worker pool of long-lived goroutines (with already-grown stacks) amortizes this cost across requests. When no worker is available, the pool falls back to spawning a new goroutine, so behavior is identical to today under saturation (no regression risk).
Changes
- Add `ReplicationSet.DoWithExecutor()` that accepts a `util.AsyncExecutor`
- Keep `ReplicationSet.Do()` as a backward-compatible wrapper (delegates via the existing `noOpExecutor`)
- Add `NumQueryWorkers` config field, `queryWorkers` executor, init/stop in Distributor
- Route all 3 query fan-out call sites (`queryIngestersExemplars`, `queryIngesterStream`, `ForReplicationSet`) through `DoWithExecutor`
How to use
Monitor `cortex_worker_pool_fallback_total{name="distributor-query"}`; if it's high relative to query volume, increase the worker count.
Expected impact
~6–8% CPU reduction on rulers/queriers with wide ingester fan-out. Not a latency fix (the workload is wait-bound at peak), but reclaims meaningful fleet capacity.