
ClickHouse: Cold runs preload data into memory before the timer #941

@pfent


Commit 7c1f7a3 enables preloading of the primary key on ClickHouse startup.

Quoting clickhouse/install:

```shell
# Force synchronous startup loading so the cold timer doesn't catch
# work that should have been amortized into ./start.
#
# Two independent layers of laziness contribute to the cold-query floor:
#
# 1. async_load_databases (server-level): with the default 1, the server
#    binds its listen port and answers SELECT 1 before user databases
#    have finished loading. ./check passes, then the first query stalls
#    waiting for the part loader.
#
# 2. primary_key_lazy_load and columns_and_secondary_indices_sizes_lazy_calculation
#    (MergeTree-level): even after parts are loaded, the in-memory
#    primary key and the per-column .size streams are populated lazily
#    on first query. With ~25 parts × ~80 columns × multiple metadata
#    files per column that's >1.8k file opens on the first query path,
#    contributing several hundred ms even on local NVMe.
#
# Both are eager-load toggles, not caching shortcuts: the same I/O
# happens either way, just before query timing instead of during it.
# Together they brought Q40 cold from ~3 s to ~1.5 s on c6a.4xlarge.
```
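For context, a minimal sketch of how the three toggles named in the quote can be applied through a ClickHouse `config.d` override. The default directory, the file name `eager-load.xml`, and the helper `write_eager_load_config` are hypothetical; this is not necessarily how commit 7c1f7a3 sets them. Server-level settings sit at the top level of the config, while MergeTree defaults go under `<merge_tree>`.

```shell
#!/usr/bin/env bash
# Sketch only: writes a config override that disables lazy loading.
# CONFIG_DIR default and the file name eager-load.xml are hypothetical,
# not taken from commit 7c1f7a3.
CONFIG_DIR="${CONFIG_DIR:-/etc/clickhouse-server/config.d}"

write_eager_load_config() {
    mkdir -p "$CONFIG_DIR"
    cat > "$CONFIG_DIR/eager-load.xml" <<'EOF'
<clickhouse>
    <!-- Layer 1 (server-level): finish loading user databases before serving queries -->
    <async_load_databases>0</async_load_databases>
    <merge_tree>
        <!-- Layer 2 (MergeTree-level): load primary keys and column sizes at table load -->
        <primary_key_lazy_load>0</primary_key_lazy_load>
        <columns_and_secondary_indices_sizes_lazy_calculation>0</columns_and_secondary_indices_sizes_lazy_calculation>
    </merge_tree>
</clickhouse>
EOF
}
```

With this file in place, the primary-key loading described above happens during server startup, which is exactly the work that then falls outside the cold-run timer.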

README.md defines true cold runs differently:

2.a) True cold runs. Before each first run of each query, all [...] database caches (e.g. buffer pools) are cleared.

In the current configuration, ClickHouse pre-warms the primary key data at startup, which defeats the clearing of all database caches, so the measured cold-run times are no longer truly cold. While the size calculations might be considered metadata, primary key data is definitely data that the benchmark queries also read.

I'm not sure there is a good definition of which data may be cached for a true cold run and which may not.
Two suggestions:

  1. Include startup times in "true cold" runs. This effectively prevents cheating by preloading data at startup.
  2. Drop the cold run altogether. "Cold" clearly means different things for different systems, so cold runs do not compare performance apples-to-apples.
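Suggestion 1 could be measured roughly like this: a minimal Bash sketch, assuming the server is stopped and the OS page cache has already been dropped, that starts the timer before the server starts and stops it once the first query returns. `START_CMD`, `CHECK_CMD`, `QUERY_CMD` and both helper functions are hypothetical placeholders, not ClickBench's actual scripts; the millisecond clock relies on GNU `date`'s `%N`, so it assumes Linux.

```shell
#!/usr/bin/env bash
# Sketch: time a "true cold" run that includes server startup.
# Hypothetical inputs (not ClickBench's real scripts):
#   START_CMD - starts the server (e.g. ./start)
#   CHECK_CMD - succeeds once the server answers queries (e.g. ./check)
#   QUERY_CMD - runs the first benchmark query
# Dropping the page cache beforehand is left to the caller (needs root).

now_ms() {
    # GNU date: nanoseconds since the epoch, converted to milliseconds
    echo $(( $(date +%s%N) / 1000000 ))
}

cold_run_with_startup_ms() {
    local start end tries=0
    start=$(now_ms)
    eval "$START_CMD" >/dev/null 2>&1 || return 1
    # Poll until ready; this wait counts toward the measured time
    until eval "$CHECK_CMD" >/dev/null 2>&1; do
        tries=$((tries + 1))
        if [ "$tries" -ge 600 ]; then
            return 1  # give up after roughly 60 s of polling
        fi
        sleep 0.1
    done
    eval "$QUERY_CMD" >/dev/null 2>&1 || return 1
    end=$(now_ms)
    echo $(( end - start ))
}
```

For example, with `START_CMD=./start`, `CHECK_CMD=./check` and `QUERY_CMD` set to a `clickhouse-client --query` invocation, the result covers database loading, primary-key loading and the query itself, so moving work into startup no longer shortens the reported cold time.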
