
Add a metastore read replica role for read-only routing #6548

Draft
shuheiktgw wants to merge 12 commits into quickwit-oss:main from shuheiktgw:metastore-read-replica-role

shuheiktgw (Collaborator) commented Jun 25, 2026

Description

This PR adds support for deploying metastore read replicas as a dedicated metastore_read_replica service and lets searchers opt into using them through searcher.use_metastore_read_replica.

It is an alternative to #6538's per-request gRPC-header routing. The current implementation makes the routing decision once at startup: searchers use the primary metastore by default, or, when the flag is enabled, build their search/DataFusion read-only metastore client against nodes advertising metastore_read_replica.

How it works

  1. Operators deploy standalone nodes with enabled_services: [metastore_read_replica] and metastore_read_replica_uri pointing at a PostgreSQL read replica. The config validator rejects read-replica nodes without that URI and rejects running metastore and metastore_read_replica on the same node.
  2. Read-replica nodes resolve metastore_read_replica_uri with resolve_read_only, skip migrations, and expose the normal quickwit.metastore.MetastoreService over gRPC. Write-side control-plane event layers are not installed on this read-only service.
  3. Searchers continue to use the primary metastore unless searcher.use_metastore_read_replica is true. When enabled on a searcher node, build_metastore_client waits for nodes advertising metastore_read_replica and builds the search/DataFusion MetastoreReadServiceClient from that service.
  4. Routing is not done per request, and there is no automatic fallback to the primary once the flag is enabled. If no metastore_read_replica node is connected within the 300s startup wait, startup fails with could not find any metastore_read_replica node in the cluster.
  5. The read path is constrained through MetastoreReadService, which includes only stale-tolerant search/analytics reads: index_metadata, list_indexes_metadata, list_splits, list_metrics_splits, and list_sketch_splits. Writes and non-searcher roles continue using the primary metastore path.
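The two config rules in step 1 can be sketched as follows. This is a minimal illustration, not the PR's validator: `Service`, `NodeConfig`, and `validate` are hypothetical stand-ins, and the accepted URI schemes (`postgres://`, `postgresql://`) are an assumption.

```rust
use std::collections::HashSet;

/// Hypothetical, reduced set of service roles (not the real enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Service {
    Metastore,
    MetastoreReadReplica,
    Searcher,
}

/// Hypothetical node config holding only the fields relevant to the rules.
struct NodeConfig {
    enabled_services: HashSet<Service>,
    metastore_read_replica_uri: Option<String>,
}

/// Rejects read-replica nodes that are co-located with `metastore` or that
/// lack a PostgreSQL `metastore_read_replica_uri`.
fn validate(config: &NodeConfig) -> Result<(), String> {
    // Nodes without the read-replica role are not affected by these rules.
    if !config.enabled_services.contains(&Service::MetastoreReadReplica) {
        return Ok(());
    }
    if config.enabled_services.contains(&Service::Metastore) {
        return Err(
            "`metastore` and `metastore_read_replica` cannot run on the same node".to_string(),
        );
    }
    match config.metastore_read_replica_uri.as_deref() {
        // Assumption: PostgreSQL URIs are recognized by their scheme prefix.
        Some(uri) if uri.starts_with("postgres://") || uri.starts_with("postgresql://") => Ok(()),
        Some(_) => Err("`metastore_read_replica_uri` must be a PostgreSQL URI".to_string()),
        None => Err("`metastore_read_replica` requires `metastore_read_replica_uri`".to_string()),
    }
}
```

In this sketch a searcher-only node passes validation regardless of the URI, which matches the intent that searchers opt in through a separate flag rather than through the replica URI.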

Operational notes

  • The feature is disabled by default. Enabling searcher.use_metastore_read_replica is an availability commitment: the searcher expects metastore_read_replica nodes to be present at startup.
  • Search-path reads may lag by the PostgreSQL replication window. This is scoped to searchers/DataFusion; the control plane, indexers, janitor, and write paths continue using the primary metastore.
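The startup-time commitment described above can be sketched roughly as follows. This is a simplified, blocking illustration rather than the PR's `build_metastore_client`: `ReadTarget`, `choose_read_target`, and the polling closure are hypothetical, and the real code is presumably async and driven by cluster membership changes instead of polling.

```rust
use std::time::{Duration, Instant};

/// Hypothetical outcome of the one-time routing decision.
#[derive(Debug, PartialEq, Eq)]
enum ReadTarget {
    Primary,
    ReadReplica,
}

/// Decides once, at startup, where search-path metastore reads go.
/// `replica_connected` is a hypothetical probe that reports whether any
/// `metastore_read_replica` node is currently connected.
fn choose_read_target(
    use_metastore_read_replica: bool,
    mut replica_connected: impl FnMut() -> bool,
    timeout: Duration,
    poll_interval: Duration,
) -> Result<ReadTarget, String> {
    // Default: the flag is off, so reads stay on the primary metastore.
    if !use_metastore_read_replica {
        return Ok(ReadTarget::Primary);
    }
    let deadline = Instant::now() + timeout;
    loop {
        if replica_connected() {
            return Ok(ReadTarget::ReadReplica);
        }
        // No fallback to the primary: a missing replica fails startup.
        if Instant::now() >= deadline {
            return Err(
                "could not find any metastore_read_replica node in the cluster".to_string(),
            );
        }
        std::thread::sleep(poll_interval);
    }
}
```

With the timeout set to the 300s mentioned in the description, a searcher with the flag enabled either ends up bound to the read replicas or refuses to start. There is no third state in which it silently reads from the primary.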

Testing

  • quickwit-config: role parsing/defaults and validation for metastore_read_replica_uri, env overrides, and mutually exclusive metastore / metastore_read_replica services.
  • quickwit-metastore: PostgreSQL-gated coverage for resolving a read-only metastore connection without running migrations on the replica.
  • quickwit-serve: readiness-client coverage for searchers with and without use_metastore_read_replica, including multi-role nodes that still depend on the primary for their other roles.
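As an illustration of what the role parsing and defaults coverage might assert, here is a reduced stand-in named after the `QuickwitService` enum. The variant list is shortened, and the return types and error type are assumptions. Only the two accepted spellings and the split between default and supported services come from this PR.

```rust
use std::str::FromStr;

/// Reduced stand-in for the service enum; the real enum has more roles.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum QuickwitService {
    Indexer,
    Searcher,
    Metastore,
    MetastoreReadReplica,
}

impl QuickwitService {
    /// Every role a node may be configured with.
    fn supported_services() -> Vec<QuickwitService> {
        vec![
            QuickwitService::Indexer,
            QuickwitService::Searcher,
            QuickwitService::Metastore,
            QuickwitService::MetastoreReadReplica,
        ]
    }

    /// Roles enabled when `enabled_services` is not set. The read replica
    /// role is opt-in, so it is filtered out here.
    fn default_services() -> Vec<QuickwitService> {
        Self::supported_services()
            .into_iter()
            .filter(|service| *service != QuickwitService::MetastoreReadReplica)
            .collect()
    }
}

impl FromStr for QuickwitService {
    type Err = String;

    fn from_str(value: &str) -> Result<Self, Self::Err> {
        match value {
            "indexer" => Ok(QuickwitService::Indexer),
            "searcher" => Ok(QuickwitService::Searcher),
            "metastore" => Ok(QuickwitService::Metastore),
            // Both spellings map to the same role.
            "metastore_read_replica" | "metastore-read-replica" => {
                Ok(QuickwitService::MetastoreReadReplica)
            }
            other => Err(format!("unknown service `{other}`")),
        }
    }
}
```

Keeping `default_services()` separate from `supported_services()` is what prevents an all-in-one node from implicitly starting the read replica role.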

Known limitation / follow-up

There is no automated end-to-end multi-node test for searcher -> metastore_read_replica over gRPC. The ClusterSandbox harness uses an in-memory metastore, while this feature is PostgreSQL-only. Covering it needs a PostgreSQL-backed sandbox variant; manual verification with separate primary/read-replica metastore roles against a real database is the interim check.

Introduce the `metastore_read_replica` service role and the
`metastore_read_replica_uri` node config option: the foundation for
routing read-only metastore traffic to a PostgreSQL read replica.

- Add `QuickwitService::MetastoreReadReplica`, parsed from
  `metastore_read_replica` / `metastore-read-replica`.
- Split `QuickwitService::default_services()` from `supported_services()`
  so the new role is opt-in and never enabled implicitly on all-in-one
  nodes.
- Add the optional `metastore_read_replica_uri` field (env
  `QW_METASTORE_READ_REPLICA_URI`), redacted alongside `metastore_uri`.
- Validate that the role requires a PostgreSQL `metastore_read_replica_uri`
  and cannot be co-located with the `metastore` role.

Add the resolution plumbing for connecting to a PostgreSQL read replica
over a read-only connection.

- Add `MetastoreFactoryOptions { read_only }` and thread it through
  `MetastoreFactory::resolve`.
- Add `MetastoreResolver::resolve_read_only`, which rejects any non-
  PostgreSQL backend.
- Key the PostgreSQL factory cache on `(uri, options)` so the read-write
  and read-only clients get distinct connection pools.
- Add `PostgresqlMetastore::new_read_only`: a read-only connection pool
  with migrations skipped (the replica is migrated by the primary).

Resolve and expose a metastore gRPC server when the
`metastore_read_replica` role is enabled, backed by a read-only
connection to `metastore_read_replica_uri`.

- The read replica server reuses the metrics + load-shed layers but omits
  the control-plane event layers, which only wrap write RPCs.
- A read-replica node is exempted from the control-plane connectivity
  wait, like a primary metastore node, so dedicated replica pods start
  independently.
- Extract `metastore_max_in_flight_requests` shared by both roles.

Add `ReadReplicaRoutingMetastore`, a `MetastoreService` wrapper that
routes the stale-tolerant reads issued by the search and analytics paths
(`index_metadata`, `indexes_metadata`, `list_indexes_metadata`,
`list_splits`, `list_metrics_splits`, `list_sketch_splits`) to read
replica nodes when any are connected, and everything else (writes and
non-hot-path reads) to the primary.

- Routing is decided per request from the read replica balance channel's
  live connection set, so it degrades to the primary when no replica is
  deployed. The check is a synchronous `watch` read with no borrow held
  across an await.
- Wire it into the searcher service and the DataFusion session builder, so
  all search (REST, Elasticsearch, gRPC) and metrics analytics benefit,
  while REST admin handlers keep read-your-writes against the primary.
- Document `metastore_read_replica_uri` in the node config reference and
  the example `quickwit.yaml`.
- Add a PostgreSQL-gated test that resolves a read-only metastore against
  a real database and verifies it serves read RPCs.

Replace the full `MetastoreService` implementation on
`ReadReplicaRoutingMetastore` (which forced ~45 delegating methods, most
of them writes) with a narrow, read-only `MetastoreReadService` trait —
the read-only subset of the metastore RPCs (`index_metadata`,
`list_indexes_metadata`, `list_splits`, `list_metrics_splits`,
`list_sketch_splits`).

- `MetastoreServiceClient` implements `MetastoreReadService`;
  `MetastoreReadServiceClient = Arc<dyn MetastoreReadService>`.
- `ReadReplicaRoutingMetastore` now implements only the 5-method trait, so
  writes are excluded at the type level rather than delegated.
- The search and DataFusion read paths take `MetastoreReadServiceClient` /
  `&dyn MetastoreReadService`; `list_parquet_splits_*` take
  `&dyn MetastoreReadService`. `single_node_search` keeps its concrete
  `MetastoreServiceClient` parameter and adapts internally.

Addresses review feedback that the wrapper should expose a Go-style
read-only interface instead of reimplementing the whole service.
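The idea of excluding writes at the type level can be sketched with a narrow trait object. Everything below is hypothetical and synchronous for brevity: `ReadService`, `FullClient`, and `count_indexes` only mirror the shape of `MetastoreReadService`, `MetastoreServiceClient`, and a search-path caller, and the real RPCs are async.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical read-only subset of a larger service.
trait ReadService: Send + Sync {
    fn list_indexes(&self) -> Vec<String>;
}

/// Hypothetical full client that can both read and write.
#[derive(Clone, Default)]
struct FullClient {
    indexes: Arc<Mutex<Vec<String>>>,
}

impl FullClient {
    /// A write operation that is deliberately absent from `ReadService`.
    fn create_index(&self, index_id: &str) {
        self.indexes.lock().unwrap().push(index_id.to_string());
    }
}

/// The full client also satisfies the narrow read-only interface.
impl ReadService for FullClient {
    fn list_indexes(&self) -> Vec<String> {
        self.indexes.lock().unwrap().clone()
    }
}

/// Analogous in shape to `MetastoreReadServiceClient = Arc<dyn MetastoreReadService>`.
type ReadServiceClient = Arc<dyn ReadService>;

/// A search-path function: it can only call read methods, so a write
/// through this parameter does not compile.
fn count_indexes(metastore: &dyn ReadService) -> usize {
    metastore.list_indexes().len()
}
```

A caller holding a `ReadServiceClient` has no way to reach `create_index`, so a wrapper that routes to a replica only needs to implement the read methods instead of delegating every write.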
shuheiktgw force-pushed the metastore-read-replica-role branch 2 times, most recently from c0ae4dd to d14a422 on June 25, 2026 16:13
Comment thread config/quickwit.yaml
# # DataFusion when enabled, to nodes running the `metastore_read_replica`
# # service. Searchers require at least one `metastore_read_replica` node at
# # startup and do not fall back to the primary metastore.
# use_metastore_read_replica: false

shuheiktgw (Collaborator, Author) commented:

The reason we need this flag in addition to metastore_read_replica_uri is that users can set metastore_read_replica_uri via the QW_METASTORE_READ_REPLICA_URI environment variable. Without this flag, searchers would need access to that same environment variable even though they don’t need the secret itself.
