
feat(distributor): add goroutine worker pool for query fan-out to ing…#7623

Merged
alanprot merged 1 commit into cortexproject:master from alanprot:read-path-query-worker-pool on Jun 16, 2026

@alanprot (Member)

What this PR does

Adds an experimental -distributor.num-query-workers flag that uses a goroutine worker pool for the read-path fan-out from the distributor (as used by queriers and rulers) to ingesters. This is the query-path analogue of #6406, which did the same for the push path.

Why

In the ruler flame graph, runtime.copystack accounts for roughly 8-10% of CPU.

Each query evaluation fans out to ingesters via ReplicationSet.Do, which spawns a fresh goroutine per instance. These goroutines start with small stacks and immediately have to grow them (runtime.newstack → runtime.copystack) once they enter the gRPC stream-setup and snappy-encoding frames. With wide ingester fan-out and many concurrent evaluations, this stack-growth cost becomes significant.

A worker pool of long-lived goroutines (whose stacks are already grown) amortizes this cost across requests. When no worker is available, the pool falls back to spawning a new goroutine, so behavior under saturation is identical to today's (no regression risk).
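The pool-with-fallback behavior described above can be sketched as a small executor. This is a minimal, self-contained illustration, not the Cortex implementation: the names asyncExecutor, workerPool, newWorkerPool and runBatch are hypothetical, and the interface only assumes a Submit/Stop shape like the one implied by util.AsyncExecutor and noOpExecutor. Idle workers receive tasks over an unbuffered channel; if none is idle, Submit spawns a goroutine and counts a fallback.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// asyncExecutor is a hypothetical stand-in for util.AsyncExecutor.
type asyncExecutor interface {
	Submit(f func())
	Stop()
}

// noOpExecutor reproduces today's behavior: one fresh goroutine per task.
type noOpExecutor struct{}

func (noOpExecutor) Submit(f func()) { go f() }
func (noOpExecutor) Stop()           {}

// workerPool keeps long-lived goroutines that are reused across tasks.
type workerPool struct {
	jobs      chan func() // unbuffered: a send succeeds only if a worker is idle
	wg        sync.WaitGroup
	fallbacks atomic.Int64
}

func newWorkerPool(numWorkers int) *workerPool {
	p := &workerPool{jobs: make(chan func())}
	for i := 0; i < numWorkers; i++ {
		p.wg.Add(1)
		go func() {
			defer p.wg.Done()
			for f := range p.jobs { // exits when Stop closes the channel
				f()
			}
		}()
	}
	return p
}

// Submit hands f to an idle worker, or spawns a goroutine if all are busy.
func (p *workerPool) Submit(f func()) {
	select {
	case p.jobs <- f:
	default:
		p.fallbacks.Add(1) // plays the role of a fallback counter metric
		go f()
	}
}

// Stop shuts the workers down. Callers must not Submit after Stop,
// because sending on the closed channel would panic.
func (p *workerPool) Stop() {
	close(p.jobs)
	p.wg.Wait()
}

func (p *workerPool) Fallbacks() int64 { return p.fallbacks.Load() }

// runBatch submits n tasks and waits for all of them, returning how many ran.
func runBatch(ex asyncExecutor, n int) int {
	var wg sync.WaitGroup
	var done atomic.Int64
	for i := 0; i < n; i++ {
		wg.Add(1)
		ex.Submit(func() {
			defer wg.Done()
			done.Add(1)
		})
	}
	wg.Wait()
	return int(done.Load())
}

func main() {
	pool := newWorkerPool(4)
	defer pool.Stop()
	fmt.Println("completed:", runBatch(pool, 100)) // prints "completed: 100"
	fmt.Println("fallbacks:", pool.Fallbacks())    // varies with scheduling
}
```

The unbuffered channel is what makes the fallback fire exactly when no worker is idle; a buffered channel would instead queue tasks behind busy workers and add latency rather than spawning a goroutine.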

Changes

  • Add ReplicationSet.DoWithExecutor() that accepts a util.AsyncExecutor
  • Keep ReplicationSet.Do() as a backward-compatible wrapper (delegates via the existing noOpExecutor)
  • Add the NumQueryWorkers config field and a queryWorkers executor that the Distributor initializes and stops
  • Route all 3 query fan-out call sites (queryIngestersExemplars, queryIngesterStream, ForReplicationSet) through DoWithExecutor
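The Do/DoWithExecutor split in the list above can be illustrated with a simplified fan-out. This is a hypothetical model, not ring.ReplicationSet: the real method also deals with zones, delays and partial data, and its signature differs. The names replicationSet, doWithExecutor, do, goExecutor and queryAll are illustrative only; do keeps the old one-goroutine-per-instance behavior by delegating with a goroutine-spawning executor, which is the role the PR describes for noOpExecutor.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// asyncExecutor is a hypothetical stand-in for util.AsyncExecutor.
type asyncExecutor interface{ Submit(f func()) }

// goExecutor spawns one goroutine per task (the pre-PR behavior).
type goExecutor struct{}

func (goExecutor) Submit(f func()) { go f() }

// replicationSet is a simplified model: succeed once enough instances answer.
type replicationSet struct {
	instances []string
	maxErrors int
}

type result struct {
	val any
	err error
}

// doWithExecutor runs f against every instance via ex. It returns once
// len(instances)-maxErrors calls have succeeded, or returns an error as soon
// as more than maxErrors calls have failed.
func (r replicationSet) doWithExecutor(ctx context.Context, ex asyncExecutor,
	f func(context.Context, string) (any, error)) ([]any, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel() // tell stragglers to give up once we have an answer
	// Buffered so late tasks never block after we return.
	results := make(chan result, len(r.instances))
	for _, inst := range r.instances {
		inst := inst
		ex.Submit(func() {
			v, err := f(ctx, inst)
			results <- result{v, err}
		})
	}
	need := len(r.instances) - r.maxErrors
	var out []any
	errs := 0
	for range r.instances {
		res := <-results
		if res.err != nil {
			errs++
			if errs > r.maxErrors {
				return nil, res.err
			}
			continue
		}
		out = append(out, res.val)
		if len(out) >= need {
			return out, nil
		}
	}
	return out, nil
}

// do keeps the old entry point by delegating with the goroutine-per-task executor.
func (r replicationSet) do(ctx context.Context,
	f func(context.Context, string) (any, error)) ([]any, error) {
	return r.doWithExecutor(ctx, goExecutor{}, f)
}

// queryAll is a hypothetical caller in which the instance named bad fails.
func queryAll(r replicationSet, bad string) (int, error) {
	res, err := r.do(context.Background(), func(_ context.Context, inst string) (any, error) {
		if inst == bad {
			return nil, errors.New(inst + " unavailable")
		}
		return inst, nil
	})
	return len(res), err
}

func main() {
	rs := replicationSet{instances: []string{"ingester-1", "ingester-2", "ingester-3"}, maxErrors: 1}
	n, err := queryAll(rs, "ingester-2")
	fmt.Println(n, err) // prints "2 <nil>"
}
```

Because the executor is injected, switching a call site to a pooled executor changes only how tasks are scheduled; the quorum logic is untouched.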

How to use

distributor:
  num_query_workers: 500  # tune based on fleet; fallback metric guides sizing

Monitor cortex_worker_pool_fallback_total{name="distributor-query"}: if it is high relative to query volume, increase the worker count.
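The YAML setting above corresponds to the -distributor.num-query-workers flag. The sketch below shows one plausible way to wire it with Go's standard flag package. The flag name and YAML key come from this PR; the Config struct, the default of 0 (pool disabled), the help text, and the parseConfig and executorKind helpers are assumptions for illustration, not Cortex's actual registration code.

```go
package main

import (
	"flag"
	"fmt"
)

// Config is a hypothetical subset of the distributor config.
type Config struct {
	// NumQueryWorkers sizes the query fan-out pool; 0 is assumed to disable it.
	NumQueryWorkers int `yaml:"num_query_workers"`
}

// RegisterFlags binds the experimental flag to the config field.
func (c *Config) RegisterFlags(f *flag.FlagSet) {
	f.IntVar(&c.NumQueryWorkers, "distributor.num-query-workers", 0,
		"EXPERIMENTAL: number of long-lived goroutines for query fan-out to ingesters (0 disables the pool).")
}

// parseConfig parses command-line style arguments into a Config.
func parseConfig(args []string) (Config, error) {
	var cfg Config
	fs := flag.NewFlagSet("distributor", flag.ContinueOnError)
	cfg.RegisterFlags(fs)
	err := fs.Parse(args)
	return cfg, err
}

// executorKind reports which executor a distributor would build at start-up.
func executorKind(cfg Config) string {
	if cfg.NumQueryWorkers > 0 {
		return fmt.Sprintf("worker pool (%d workers)", cfg.NumQueryWorkers)
	}
	return "goroutine per request"
}

func main() {
	cfg, err := parseConfig([]string{"-distributor.num-query-workers=500"})
	if err != nil {
		panic(err)
	}
	fmt.Println(executorKind(cfg)) // prints "worker pool (500 workers)"
}
```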

Expected impact

An estimated ~6–8% CPU reduction on rulers/queriers with wide ingester fan-out. This is not a latency fix (the workload is wait-bound at peak), but it reclaims meaningful fleet capacity.

…esters

Add experimental -distributor.num-query-workers flag to reuse pooled
goroutines (with pre-grown stacks) for the read-path fan-out from
distributors/rulers to ingesters. This is the query-path analogue of
PR cortexproject#6406 which did the same for the push path.

Profiling rulers with wide ingester fan-out shows runtime.copystack
at ~8% of CPU, triggered by each freshly-spawned goroutine growing its
stack during gRPC stream setup + snappy encoding. A worker pool
amortizes this cost across requests.

When no worker is available the pool falls back to spawning a goroutine,
so behavior is identical to today under saturation (no regression risk).

Changes:
- Add ReplicationSet.DoWithExecutor() that accepts a util.AsyncExecutor
- Keep ReplicationSet.Do() as backward-compatible wrapper (uses noOpExecutor)
- Add NumQueryWorkers config, queryWorkers field, init/stop in Distributor
- Route all 3 query fan-out call sites through DoWithExecutor

Signed-off-by: Alan Protasio <alanprot@gmail.com>
alanprot force-pushed the read-path-query-worker-pool branch from 68d3982 to ca96bab on June 15, 2026 22:45
alanprot marked this pull request as ready for review on June 15, 2026 22:49
alanprot merged commit ab8aa36 into cortexproject:master on Jun 16, 2026
68 of 69 checks passed