Add a metastore read replica role for read-only routing #6548
Draft
shuheiktgw wants to merge 12 commits into
Introduce the `metastore_read_replica` service role and the `metastore_read_replica_uri` node config option: the foundation for routing read-only metastore traffic to a PostgreSQL read replica.
- Add `QuickwitService::MetastoreReadReplica`, parsed from `metastore_read_replica` / `metastore-read-replica`.
- Split `QuickwitService::default_services()` from `supported_services()` so the new role is opt-in and never enabled implicitly on all-in-one nodes.
- Add the optional `metastore_read_replica_uri` field (env `QW_METASTORE_READ_REPLICA_URI`), redacted alongside `metastore_uri`.
- Validate that the role requires a PostgreSQL `metastore_read_replica_uri` and cannot be co-located with the `metastore` role.
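As a rough illustration of the role parsing, the default/supported split, and the validation rules listed above, here is a minimal sketch. It is not the PR's code: the service list is trimmed, and `validate_read_replica_role`, the error strings, and the URI-prefix check are assumptions made for the example.

```rust
use std::collections::HashSet;
use std::str::FromStr;

// Trimmed, hypothetical service list; the real enum has more roles.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum QuickwitService {
    Indexer,
    Searcher,
    Metastore,
    MetastoreReadReplica,
}

impl QuickwitService {
    /// Every role a node may enable explicitly.
    fn supported_services() -> HashSet<QuickwitService> {
        HashSet::from([
            Self::Indexer,
            Self::Searcher,
            Self::Metastore,
            Self::MetastoreReadReplica,
        ])
    }

    /// Roles enabled implicitly; the replica role is opt-in, so it is left out.
    fn default_services() -> HashSet<QuickwitService> {
        let mut services = Self::supported_services();
        services.remove(&Self::MetastoreReadReplica);
        services
    }
}

impl FromStr for QuickwitService {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "indexer" => Ok(Self::Indexer),
            "searcher" => Ok(Self::Searcher),
            "metastore" => Ok(Self::Metastore),
            // Both spellings map to the same role.
            "metastore_read_replica" | "metastore-read-replica" => Ok(Self::MetastoreReadReplica),
            other => Err(format!("unknown service `{other}`")),
        }
    }
}

/// Assumed validation helper: the replica role needs a PostgreSQL URI and
/// cannot share a node with the `metastore` role.
fn validate_read_replica_role(
    services: &HashSet<QuickwitService>,
    read_replica_uri: Option<&str>,
) -> Result<(), String> {
    if !services.contains(&QuickwitService::MetastoreReadReplica) {
        return Ok(());
    }
    if services.contains(&QuickwitService::Metastore) {
        return Err("`metastore` and `metastore_read_replica` cannot run on the same node".to_string());
    }
    match read_replica_uri {
        Some(uri) if uri.starts_with("postgres://") || uri.starts_with("postgresql://") => Ok(()),
        Some(uri) => Err(format!("`metastore_read_replica_uri` must be a PostgreSQL URI, got `{uri}`")),
        None => Err("`metastore_read_replica` requires `metastore_read_replica_uri`".to_string()),
    }
}
```

Deriving the defaults from the supported set keeps the two lists from drifting apart when a role is added.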
Add the resolution plumbing for connecting to a PostgreSQL read replica
over a read-only connection.
- Add `MetastoreFactoryOptions { read_only }` and thread it through
`MetastoreFactory::resolve`.
- Add `MetastoreResolver::resolve_read_only`, which rejects any non-
PostgreSQL backend.
- Key the PostgreSQL factory cache on `(uri, options)` so the read-write
and read-only clients get distinct connection pools.
- Add `PostgresqlMetastore::new_read_only`: a read-only connection pool
with migrations skipped (the replica is migrated by the primary).
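The cache-keying change can be pictured with the following sketch. It assumes a simplified, synchronous factory: `Pool` is a stand-in struct rather than a real connection pool, and the method bodies are illustrative, not the PR's implementation.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

#[derive(Debug, Clone, Copy, Default, PartialEq, Eq, Hash)]
struct MetastoreFactoryOptions {
    read_only: bool,
}

/// Stand-in for a connection pool (assumption for the sketch).
#[derive(Debug)]
struct Pool {
    uri: String,
    read_only: bool,
    run_migrations: bool,
}

#[derive(Default)]
struct PostgresFactory {
    // Keyed on (uri, options): the same URI yields separate read-write and
    // read-only pools.
    cache: Mutex<HashMap<(String, MetastoreFactoryOptions), Arc<Pool>>>,
}

impl PostgresFactory {
    fn resolve(&self, uri: &str, options: MetastoreFactoryOptions) -> Arc<Pool> {
        let mut cache = self.cache.lock().unwrap();
        cache
            .entry((uri.to_string(), options))
            .or_insert_with(|| {
                Arc::new(Pool {
                    uri: uri.to_string(),
                    read_only: options.read_only,
                    // The replica is migrated by the primary, so a read-only
                    // pool never runs migrations.
                    run_migrations: !options.read_only,
                })
            })
            .clone()
    }

    /// Rejects any non-PostgreSQL URI before resolving a read-only pool.
    fn resolve_read_only(&self, uri: &str) -> Result<Arc<Pool>, String> {
        if !(uri.starts_with("postgres://") || uri.starts_with("postgresql://")) {
            return Err(format!("read-only metastore requires a PostgreSQL URI, got `{uri}`"));
        }
        Ok(self.resolve(uri, MetastoreFactoryOptions { read_only: true }))
    }
}
```

With a cache keyed on the URI alone, whichever client was resolved first would be handed back for both modes, which is the situation the composite key avoids.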
Resolve and expose a metastore gRPC server when the `metastore_read_replica` role is enabled, backed by a read-only connection to `metastore_read_replica_uri`.
- The read replica server reuses the metrics + load-shed layers but omits the control-plane event layers, which only wrap write RPCs.
- A read-replica node is exempted from the control-plane connectivity wait, like a primary metastore node, so dedicated replica pods start independently.
- Extract `metastore_max_in_flight_requests` shared by both roles.
Add `ReadReplicaRoutingMetastore`, a `MetastoreService` wrapper that routes the stale-tolerant reads issued by the search and analytics paths (`index_metadata`, `indexes_metadata`, `list_indexes_metadata`, `list_splits`, `list_metrics_splits`, `list_sketch_splits`) to read replica nodes when any are connected, and everything else (writes and non-hot-path reads) to the primary.
- Routing is decided per request from the read replica balance channel's live connection set, so it degrades to the primary when no replica is deployed. The check is a synchronous `watch` read with no borrow held across an await.
- Wire it into the searcher service and the DataFusion session builder, so all search (REST, Elasticsearch, gRPC) and metrics analytics benefit, while REST admin handlers keep read-your-writes against the primary.
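The per-request rule in this commit can be sketched as below. This is a simplified model, not the wrapper itself: an `AtomicUsize` stands in for the `watch`-based connection set, and the `Rpc` enum, `Target`, and `Router` names are hypothetical and cover only a few RPCs.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Target {
    Primary,
    ReadReplica,
}

// Hypothetical, trimmed RPC list used only for this sketch.
#[derive(Debug, Clone, Copy)]
enum Rpc {
    IndexMetadata,
    ListSplits,
    PublishSplits,
    DeleteIndex,
}

impl Rpc {
    /// Reads issued by search/analytics that tolerate replica lag.
    fn is_stale_tolerant_read(self) -> bool {
        matches!(self, Rpc::IndexMetadata | Rpc::ListSplits)
    }
}

struct Router {
    // Stand-in for the live connection set of the replica balance channel.
    connected_replicas: Arc<AtomicUsize>,
}

impl Router {
    /// Decided per request: replicas serve stale-tolerant reads when at least
    /// one is connected; everything else goes to the primary.
    fn route(&self, rpc: Rpc) -> Target {
        let has_replica = self.connected_replicas.load(Ordering::Relaxed) > 0;
        if has_replica && rpc.is_stale_tolerant_read() {
            Target::ReadReplica
        } else {
            Target::Primary
        }
    }
}
```

Because the check is a plain synchronous load, nothing is held across an await point, which mirrors the property the commit message calls out for the `watch` read.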
- Document `metastore_read_replica_uri` in the node config reference and the example `quickwit.yaml`.
- Add a PostgreSQL-gated test that resolves a read-only metastore against a real database and verifies it serves read RPCs.
Replace the full `MetastoreService` implementation on `ReadReplicaRoutingMetastore` (which forced ~45 delegating methods, most of them writes) with a narrow, read-only `MetastoreReadService` trait covering the read-only subset of the metastore RPCs (`index_metadata`, `list_indexes_metadata`, `list_splits`, `list_metrics_splits`, `list_sketch_splits`).
- `MetastoreServiceClient` implements `MetastoreReadService`; `MetastoreReadServiceClient = Arc<dyn MetastoreReadService>`.
- `ReadReplicaRoutingMetastore` now implements only the 5-method trait, so writes are excluded at the type level rather than delegated.
- The search and DataFusion read paths take `MetastoreReadServiceClient` / `&dyn MetastoreReadService`; `list_parquet_splits_*` take `&dyn MetastoreReadService`. `single_node_search` keeps its concrete `MetastoreServiceClient` parameter and adapts internally.
Addresses review feedback that the wrapper should expose a Go-style read-only interface instead of reimplementing the whole service.
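A minimal sketch of what "excluded at the type level" means here, under stated assumptions: the real trait is async and uses request/response types, whereas this version is synchronous, uses plain strings, and shows only two of the five methods. `publish_split` and `count_splits` are hypothetical names.

```rust
use std::sync::Arc;

/// Narrow read-only interface (two of the five reads, simplified signatures).
trait MetastoreReadService: Send + Sync {
    fn index_metadata(&self, index_id: &str) -> Result<String, String>;
    fn list_splits(&self, index_id: &str) -> Vec<String>;
}

type MetastoreReadServiceClient = Arc<dyn MetastoreReadService>;

/// Stand-in for the full client: it has write methods of its own and also
/// implements the read-only trait.
struct MetastoreServiceClient {
    splits: Vec<(String, String)>, // (index_id, split_id)
}

impl MetastoreServiceClient {
    // Write path: deliberately not part of `MetastoreReadService`.
    fn publish_split(&mut self, index_id: &str, split_id: &str) {
        self.splits.push((index_id.to_string(), split_id.to_string()));
    }
}

impl MetastoreReadService for MetastoreServiceClient {
    fn index_metadata(&self, index_id: &str) -> Result<String, String> {
        if self.splits.iter().any(|(id, _)| id == index_id) {
            Ok(format!("index:{index_id}"))
        } else {
            Err(format!("index `{index_id}` not found"))
        }
    }

    fn list_splits(&self, index_id: &str) -> Vec<String> {
        self.splits
            .iter()
            .filter(|(id, _)| id == index_id)
            .map(|(_, split_id)| split_id.clone())
            .collect()
    }
}

/// A read path that only sees the narrow trait cannot call `publish_split`;
/// attempting to do so fails to compile.
fn count_splits(metastore: &dyn MetastoreReadService, index_id: &str) -> usize {
    metastore.list_splits(index_id).len()
}
```

A caller holding a `MetastoreReadServiceClient` can pass `client.as_ref()` to `count_splits`, and any type implementing the trait, including a routing wrapper, can be substituted without touching the read path.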
c0ae4dd to
d14a422
Compare
shuheiktgw commented Jun 26, 2026
# # DataFusion when enabled, to nodes running the `metastore_read_replica`
# # service. Searchers require at least one `metastore_read_replica` node at
# # startup and do not fall back to the primary metastore.
# use_metastore_read_replica: false
We need this flag in addition to `metastore_read_replica_uri` because users can set `metastore_read_replica_uri` via the `QW_METASTORE_READ_REPLICA_URI` environment variable. Without this flag, searchers would need access to that same environment variable even though they don't need the secret itself.
Description
This PR adds support for deploying metastore read replicas as a dedicated `metastore_read_replica` service and lets searchers opt into using them through `searcher.use_metastore_read_replica`.

It is an alternative to #6538's per-request gRPC-header routing. The current implementation makes the routing decision once at startup: searchers use the primary metastore by default, or, when the flag is enabled, build their search/DataFusion read-only metastore client against nodes advertising `metastore_read_replica`.

How it works
- Read-replica nodes run with `enabled_services: [metastore_read_replica]` and `metastore_read_replica_uri` pointing at a PostgreSQL read replica. The config validator rejects read-replica nodes without that URI and rejects running `metastore` and `metastore_read_replica` on the same node.
- These nodes resolve `metastore_read_replica_uri` with `resolve_read_only`, skip migrations, and expose the normal `quickwit.metastore.MetastoreService` over gRPC. Write-side control-plane event layers are not installed on this read-only service.
- Searchers use read replicas only when `searcher.use_metastore_read_replica` is true. When enabled on a searcher node, `build_metastore_client` waits for nodes advertising `metastore_read_replica` and builds the search/DataFusion `MetastoreReadServiceClient` from that service.
- If no `metastore_read_replica` node is connected within the 300s startup wait, startup fails with `could not find any metastore_read_replica node in the cluster`.
- The read path is limited to `MetastoreReadService`, which includes only stale-tolerant search/analytics reads: `index_metadata`, `list_indexes_metadata`, `list_splits`, `list_metrics_splits`, and `list_sketch_splits`. Writes and non-searcher roles continue using the primary metastore path.

Operational notes
- `searcher.use_metastore_read_replica` is an availability commitment: the searcher expects `metastore_read_replica` nodes to be present at startup.

Testing
- `quickwit-config`: role parsing/defaults and validation for `metastore_read_replica_uri`, env overrides, and mutually exclusive `metastore`/`metastore_read_replica` services.
- `quickwit-metastore`: PostgreSQL-gated coverage for resolving a read-only metastore connection without running migrations on the replica.
- `quickwit-serve`: readiness-client coverage for searchers with and without `use_metastore_read_replica`, including multi-role nodes that still depend on the primary for their other roles.

Known limitation / follow-up
There is no automated end-to-end multi-node test for searcher -> `metastore_read_replica` over gRPC. The `ClusterSandbox` harness uses an in-memory metastore, while this feature is PostgreSQL-only. Covering it needs a PostgreSQL-backed sandbox variant; manual verification with separate primary/read-replica metastore roles against a real database is the interim check.
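
For readers skimming the description, the startup-time selection it describes (use the primary by default, wait for replica nodes when the flag is on, fail if none appears) might look roughly like this. It is a blocking, std-only sketch and not `build_metastore_client` itself: `select_read_client_source`, `ReadClientSource`, the polling loop, and the `connected_replicas` callback are assumptions for illustration; only the error message is taken from the description.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq, Eq)]
enum ReadClientSource {
    Primary,
    ReadReplica,
}

/// Decide once, at startup, where the search read client points.
/// `connected_replicas` is a hypothetical probe returning how many nodes
/// currently advertise `metastore_read_replica`.
fn select_read_client_source(
    use_metastore_read_replica: bool,
    startup_wait: Duration,
    poll_interval: Duration,
    mut connected_replicas: impl FnMut() -> usize,
) -> Result<ReadClientSource, String> {
    // Flag off: no replica lookup at all, and no replica URI needed.
    if !use_metastore_read_replica {
        return Ok(ReadClientSource::Primary);
    }
    let deadline = Instant::now() + startup_wait;
    loop {
        if connected_replicas() > 0 {
            return Ok(ReadClientSource::ReadReplica);
        }
        if Instant::now() >= deadline {
            // No fallback to the primary: the flag is an availability commitment.
            return Err("could not find any metastore_read_replica node in the cluster".to_string());
        }
        sleep(poll_interval);
    }
}
```

In this model a searcher would call the function with `Duration::from_secs(300)` to match the startup wait quoted above. Note that the decision depends only on the boolean flag and cluster membership, so the searcher never has to read `metastore_read_replica_uri`.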