
Fix/batch inprogress mutations #1431

Merged
Slach merged 3 commits into Altinity:master from ruslanen:fix/batch-inprogress-mutations
Jun 30, 2026

Conversation

@ruslanen

Contributor

Batch system.mutations lookup to avoid O(N²) on installations with many tables

Problem

During create, in-progress mutations are fetched once per table via
GetInProgressMutations(ctx, database, table):

SELECT mutation_id, command FROM system.mutations WHERE is_done=0 AND database=? AND table=?

system.mutations is a virtual table: every query against it enumerates all tables on the
server, so the per-table WHERE database/table filter does not bound the work that ClickHouse
actually does. With N tables in the backup we issue N such queries, and each one costs
~O(total tables) → overall O(N²).
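
The scaling argument above can be illustrated with a toy cost model. This is only a sketch: it counts "table visits" as a stand-in for the work ClickHouse does per query, and the function names are hypothetical, not part of this PR.

```go
package main

import "fmt"

// perTableVisits models the old path: one query per backed-up table,
// where each query enumerates every table on the server.
func perTableVisits(backupTables, totalTables int) int {
	return backupTables * totalTables
}

// batchVisits models the new path: a single server-wide scan.
func batchVisits(totalTables int) int {
	return totalTables
}

func main() {
	// Assumption for illustration: every table on the server is in the backup.
	for _, n := range []int{100, 1000, 10000} {
		fmt.Printf("tables=%d per-table=%d batch=%d\n",
			n, perTableVisits(n, n), batchVisits(n))
	}
}
```

Under this model the per-table path grows quadratically while the batch path grows linearly, which is the gap the fix targets.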

On installations with many tables this single query family dominates create wall-clock. Observed
on a real cluster: ~240 ms per call across tens of thousands of tables, collapsing create
throughput from ~250 tables/s to ~26 tables/s.

Fix

Fetch the whole in-progress mutation set once per backup with a single scan:

SELECT database, table, mutation_id, command FROM system.mutations WHERE is_done=0

GetInProgressMutationsBatch returns a map["database.table"][]MutationMetadata; the per-table
code path now does an in-memory map lookup instead of a query. Query count drops from N → 1.
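
A minimal sketch of the grouping and lookup described above, written as a standalone program. The struct fields and the sample rows are assumptions for illustration; the real types live in the repository's metadata and clickhouse packages and may differ.

```go
package main

import "fmt"

// MutationMetadata holds the two columns both queries select (field names assumed).
type MutationMetadata struct {
	MutationId string
	Command    string
}

// inProgressMutationRow is one row of the server-wide scan (field names assumed).
type inProgressMutationRow struct {
	Database   string
	Table      string
	MutationId string
	Command    string
}

// groupMutationsByTable buckets rows under "database.table", keeping row order
// within each bucket.
func groupMutationsByTable(rows []inProgressMutationRow) map[string][]MutationMetadata {
	result := make(map[string][]MutationMetadata, len(rows))
	for _, r := range rows {
		key := r.Database + "." + r.Table
		result[key] = append(result[key], MutationMetadata{
			MutationId: r.MutationId,
			Command:    r.Command,
		})
	}
	return result
}

func main() {
	// Hypothetical rows standing in for the result of the single scan.
	rows := []inProgressMutationRow{
		{"db1", "events", "0000000001", "DELETE WHERE id = 1"},
		{"db1", "users", "0000000002", "UPDATE name = 'x' WHERE id = 2"},
		{"db1", "events", "0000000003", "DELETE WHERE id = 3"},
	}
	byTable := groupMutationsByTable(rows)

	// Per-table code path: a map lookup instead of a query.
	// A table with no in-progress mutations yields a nil slice.
	fmt.Println(len(byTable["db1.events"]), len(byTable["db1.users"]), len(byTable["db1.missing"]))
}
```

A lookup for a table that has no in-progress mutations returns an empty result, matching what the per-table query would have returned.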

Behavior unchanged

  • Same WHERE is_done=0 filter as before.
  • Same per-table Mutations written into TableMetadata.
  • The batch query runs only when the per-table query would have run before (BackupMutations
    enabled and not schema/rbac/configs/named-collections-only).
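
The gating condition in the last bullet could be written as a single predicate. This is a sketch only: createOptions and its field names are hypothetical and are not the identifiers used in create.go.

```go
package main

import "fmt"

// createOptions is a hypothetical bundle of the settings named in the bullet above.
type createOptions struct {
	BackupMutations      bool
	SchemaOnly           bool
	RBACOnly             bool
	ConfigsOnly          bool
	NamedCollectionsOnly bool
}

// shouldFetchMutations reports whether the batch query should run:
// mutations backup is enabled and the run is not restricted to
// schema, RBAC, configs, or named collections.
func shouldFetchMutations(o createOptions) bool {
	return o.BackupMutations &&
		!o.SchemaOnly && !o.RBACOnly && !o.ConfigsOnly && !o.NamedCollectionsOnly
}

func main() {
	fmt.Println(shouldFetchMutations(createOptions{BackupMutations: true}))
	fmt.Println(shouldFetchMutations(createOptions{BackupMutations: true, SchemaOnly: true}))
}
```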

Tests

Added TestGroupMutationsByTable (+ empty-input case) covering the pure groupMutationsByTable
helper: rows from one server-wide scan are bucketed to the correct database.table with no
cross-table leakage and in stable order.

Impact (measured)

55k-table / 93.6 GiB local backup: create ~35 min → ~230 s.
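
As a rough consistency check, the throughput figures from the Problem section line up with this measurement if one assumes they apply to the same 55k-table backup (the description does not state this explicitly). The helper name is hypothetical.

```go
package main

import "fmt"

// secondsAt returns how long it takes to process the given number of tables
// at a constant throughput in tables per second.
func secondsAt(tables int, tablesPerSecond float64) float64 {
	return float64(tables) / tablesPerSecond
}

func main() {
	const tables = 55000 // "55k-table" backup from the measurement above

	before := secondsAt(tables, 26) // ~26 tables/s with per-table queries
	after := secondsAt(tables, 250) // ~250 tables/s without them

	fmt.Printf("before: %.0f s (%.1f min)\n", before, before/60)
	fmt.Printf("after:  %.0f s\n", after)
}
```

At about 26 tables/s the estimate is roughly 35 minutes, and at about 250 tables/s it is 220 s, close to the reported ~230 s.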

Files

  • pkg/clickhouse/clickhouse.go: GetInProgressMutationsBatch, inProgressMutationRow, groupMutationsByTable
  • pkg/backup/create.go: single batch call, per-table map lookup
  • pkg/clickhouse/clickhouse_test.go: unit tests

ruslanen added 2 commits June 19, 2026 16:28
…bles

GetInProgressMutations was called once per table during `create`. Each query
against system.mutations enumerates every table on the server, so the per-table
WHERE database/table filter does not bound the work: cost is
O(total tables) per call * N calls = O(N^2). On installations with many tables
this query family dominates `create` wall-clock (observed ~240ms/call across
tens of thousands of tables).

Fetch the whole in-progress mutation set once per backup via a new
GetInProgressMutationsBatch (single system.mutations scan, same WHERE is_done=0
filter) and look it up per table from an in-memory map keyed by
"database.table". Behavior is unchanged (same per-table Mutations in
TableMetadata); only the query count changes (N -> 1).

Measured on a 55k-table / 93.6GiB local backup: create ~35min -> ~230s.

Extract groupMutationsByTable (pure, no I/O) from GetInProgressMutationsBatch and
add unit tests: two tables (one with two mutations) bucket to the correct
database.table with no cross-table leakage, plus an empty-input case.
@Slach Slach added this to the 2.7.3 milestone Jun 30, 2026
Comment thread pkg/clickhouse/clickhouse.go Outdated
func groupMutationsByTable(rows []inProgressMutationRow) map[string][]metadata.MutationMetadata {
	result := make(map[string][]metadata.MutationMetadata, len(rows))
	for _, r := range rows {
		key := r.Database + "." + r.Table

Collaborator


Look at metadata.TableTitle.

r.Database + "." + r.Table

This is a bad key.

It does not cover a lot of corner cases.
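
One corner case a concatenated string key can hit, shown as a sketch: when names contain a dot, two different tables can produce the same "database.table" string. The tableTitle type here is a hypothetical stand-in for a two-field struct key, not the repository's metadata.TableTitle definition, and this may not be the exact case the reviewer has in mind.

```go
package main

import "fmt"

// tableTitle is a hypothetical two-field key; Go structs with comparable
// fields can be used directly as map keys.
type tableTitle struct {
	Database string
	Table    string
}

// stringKey reproduces the concatenated key from the snippet under review.
func stringKey(database, table string) string {
	return database + "." + table
}

func main() {
	// Two distinct tables whose names contain dots.
	a := tableTitle{Database: "a.b", Table: "c"}
	b := tableTitle{Database: "a", Table: "b.c"}

	// Both concatenate to "a.b.c", so the string keys collide.
	fmt.Println(stringKey(a.Database, a.Table) == stringKey(b.Database, b.Table))

	// The struct keys stay distinct.
	counts := map[tableTitle]int{}
	counts[a]++
	counts[b]++
	fmt.Println(a == b, len(counts))
}
```

With a struct key no separator is needed, so no choice of characters in database or table names can merge two buckets.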

@Slach Slach left a comment

Collaborator


Thanks for the contribution.

But actually, if you have more than 1000 mutations, it means something is wrong with your architecture.

Scanning the in-memory system.mutations for each table is not so dangerous.

Comment thread pkg/clickhouse/clickhouse.go
@Slach

Slach commented Jun 30, 2026

Collaborator

OK, I figured out the O(N^2) issue with system.mutations, you are right. Let me fix the TableTitle approach.

@Slach Slach merged commit 42dcee3 into Altinity:master Jun 30, 2026
53 of 56 checks passed
2 participants