Commit 7c1f7a3 enables preloading of the primary key on ClickHouse startup.
Quoting clickhouse/install:
# Force synchronous startup loading so the cold timer doesn't catch
# work that should have been amortized into ./start.
#
# Two independent layers of laziness contribute to the cold-query floor:
#
# 1. async_load_databases (server-level): with the default 1, the server
# binds its listen port and answers SELECT 1 before user databases
# have finished loading. ./check passes, then the first query stalls
# waiting for the part loader.
#
# 2. primary_key_lazy_load and columns_and_secondary_indices_sizes_lazy_calculation
# (MergeTree-level): even after parts are loaded, the in-memory
# primary key and the per-column .size streams are populated lazily
# on first query. With ~25 parts × ~80 columns × multiple metadata
# files per column that's >1.8k file opens on the first query path,
# contributing several hundred ms even on local NVMe.
#
# Both are eager-load toggles, not caching shortcuts: the same I/O
# happens either way, just before query timing instead of during it.
# Together they brought Q40 cold from ~3 s to ~1.5 s on c6a.4xlarge.
README.md defines true cold runs differently:
2.a) True cold runs. Before each first run of each query, all [...] database caches (e.g. buffer pools) are cleared.
In the current configuration, ClickHouse pre-warms the primary key data at startup, which defeats the clearing of all database caches. The measured cold run times are no longer truly cold. While the size calculations might be considered metadata, primary key data is definitely data that the benchmark queries also read.
I'm not sure there is a good definition of which data may be cached for a true cold run and which may not.
Two suggestions:
- Include startup times in "true cold" runs. This effectively prevents cheating by preloading data at startup.
- Drop the cold run altogether. "Cold" clearly means different things for different systems, so it does not compare performance apples-to-apples.
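
To make the first suggestion concrete, here is a minimal sketch of a cold-run timer whose clock starts before the server is launched, so work moved from the first query into startup still counts toward the total. It is not the benchmark's actual harness: the function name `cold_run_with_startup` and its parameters are hypothetical, and it assumes the caller has already stopped the server and cleared caches.

```python
import subprocess
import time


def cold_run_with_startup(start_cmd, ready_cmd, query_cmd,
                          poll_interval=0.05, timeout=300.0):
    """Return (startup_seconds, query_seconds, total_seconds) for one cold run.

    Hypothetical helper: each *_cmd is an argv list supplied by the caller.
    The server must already be stopped and caches cleared before calling.
    """
    t0 = time.perf_counter()

    # Launch the server; this is expected to return once it is started
    # (or daemonized), not once it is ready to answer queries.
    subprocess.run(start_cmd, check=True)

    # Poll the readiness probe until it exits with status 0.
    while subprocess.run(ready_cmd,
                         stdout=subprocess.DEVNULL,
                         stderr=subprocess.DEVNULL).returncode != 0:
        if time.perf_counter() - t0 > timeout:
            raise TimeoutError("server did not become ready in time")
        time.sleep(poll_interval)
    t1 = time.perf_counter()

    # Run the first query; its output is discarded, only the time matters.
    subprocess.run(query_cmd, check=True, stdout=subprocess.DEVNULL)
    t2 = time.perf_counter()

    return t1 - t0, t2 - t1, t2 - t0
```

A call might look like `cold_run_with_startup(["./start"], ["./check"], query_cmd)`, where `query_cmd` stands for whatever command runs a single benchmark query. Reporting the total rather than only the query time means eager loading at startup shifts cost between the two parts without reducing the reported cold time.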