console: add k6 load test harness for cluster detail performance #36796
Open
leedqin wants to merge 1 commit into
Replays the console's per-page /api/sql traffic against Materialize at concurrent virtual-user counts. k6 isolates server-side concurrency on mz_catalog_server.
Ships:
* dump-queries.spec.ts: Playwright spec that captures live /api/sql POSTs against the cluster detail page and writes them to e2e-tests/k6/queries.json (gitignored, environment-specific).
* cluster-detail.js: k6 script that replays the captured POSTs at ramping VU counts (default 5→100 over 6m), with per-label SLA thresholds and a per-tag latency breakdown.
* baselines/workload_replay_2026_05_29.md: first baseline run on 50 cc mz_catalog_server replica size.

Setup and run instructions: console/doc/guide-testing.md under "Running Load Tests (k6)".
Motivation
Motivated by the console scalability initiative. Fixes CNS-86.
Tips For The Reviewer
This PR has two critical pieces. The first records the console's /api/sql requests and dumps them to a JSON file. The second has k6 play those captured HTTP requests back from many concurrent virtual users.
Piece 1 — Recording (dump-queries.spec.ts)
A Playwright test. Opens the console's cluster detail page in a real browser, watches every /api/sql POST for 75 seconds, and writes the latest one per query label to queries.json. That file ends up looking like:
Why we need this step at all: the console builds its SQL dynamically with Kysely (no SQL files anywhere) and bakes in environment-specific values like cluster IDs and replica names. The only way to faithfully replay what the browser actually sends is to record it once. The file is gitignored because those values are specific to the environment it was recorded against.
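For reviewers who want a mental model of the "latest one per query label" step, here is a minimal sketch in plain JavaScript. It is not the code from dump-queries.spec.ts: the record shape (`label`, `body`, `capturedAt`), the label names, and the helper `latestPerLabel` are all hypothetical, and the real queries.json fields may differ.

```javascript
// Hypothetical record shape: { label, body, capturedAt }.
// The real spec's field names are not shown in this PR description.
function latestPerLabel(captured) {
  const byLabel = new Map();
  for (const req of captured) {
    const prev = byLabel.get(req.label);
    // Keep the most recently captured request for each label.
    if (!prev || req.capturedAt >= prev.capturedAt) {
      byLabel.set(req.label, req);
    }
  }
  // Sort by label so the written file is stable across runs.
  return [...byLabel.values()].sort((a, b) => a.label.localeCompare(b.label));
}

// Illustrative capture log: the same label is seen twice as the page polls.
const captured = [
  { label: "clusterReplicas", body: { query: "SELECT 1" }, capturedAt: 1000 },
  { label: "clusterMetrics", body: { query: "SELECT 2" }, capturedAt: 2000 },
  { label: "clusterReplicas", body: { query: "SELECT 3" }, capturedAt: 6000 },
];

// This string is what a recorder along these lines would write to disk.
const queriesJson = JSON.stringify(latestPerLabel(captured), null, 2);
```

Keeping only the latest capture per label means a polled query that fires many times during the 75-second window still yields a single replayable entry.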
Piece 2 — Playback (cluster-detail.js)
A k6 script. Reads queries.json and POSTs those exact requests, but at increasing concurrent virtual-user counts:
* Each "VU" = one simulated browser tab
* Each iteration = one polling tick (~5 seconds)
* Iteration 0 = fires every query (cold-mount burst, simulates page load)
* Later iterations = fires each query only when its real polling interval has elapsed (5s for most, 60s for the heavy ones)
* Every request is tagged with its label so the summary shows per-query latency
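The iteration rules above can be sketched as a small scheduling function. This is an illustration only, not the logic in cluster-detail.js: `TICK_SECONDS`, the `intervalSeconds` field, the label names, and the helper `dueQueries` are assumptions, and the sketch leaves out the k6 HTTP calls so it runs as plain JavaScript.

```javascript
// Assumed tick length, taken from "~5 seconds" in the list above.
const TICK_SECONDS = 5;

// Hypothetical entries; the real intervals come from the console's polling config.
const queries = [
  { label: "fastQuery", intervalSeconds: 5 },
  { label: "heavyQuery", intervalSeconds: 60 },
];

// Returns the queries that should fire on this iteration and records
// when each one last fired. lastFiredAt is a Map of label -> seconds.
function dueQueries(iteration, queries, lastFiredAt) {
  const now = iteration * TICK_SECONDS;
  const due = [];
  for (const q of queries) {
    const last = lastFiredAt.get(q.label);
    // Iteration 0: nothing has fired yet, so every query is due (cold mount).
    // Later: a query is due only once its polling interval has elapsed.
    if (last === undefined || now - last >= q.intervalSeconds) {
      due.push(q);
      lastFiredAt.set(q.label, now);
    }
  }
  return due;
}
```

With these assumed intervals, iteration 0 fires both queries, iterations 1 to 11 fire only `fastQuery`, and iteration 12 (60 seconds in) fires both again. Because each k6 VU runs its own copy of the script, a per-VU `Map` like `lastFiredAt` would give every simulated tab its own polling clock.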
The script ramps from 5 → 100 VUs over 6 minutes, then prints per-label percentiles.
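For context on what a ramp with per-label thresholds can look like in k6, here is a hedged sketch of an options object. The stage split, the SLA budgets, the label names, and the tag name `label` are assumptions (the PR text only states 5 → 100 VUs over 6 minutes and that requests are tagged with their label); in an actual k6 script the object would be exported as `options`.

```javascript
// Assumed stage split; durations are illustrative and only sum to the stated 6 minutes.
const stages = [
  { duration: "1m", target: 5 },
  { duration: "4m", target: 100 },
  { duration: "1m", target: 100 },
];

// Hypothetical per-label p95 budgets in milliseconds.
const slaMs = { fastQuery: 500, heavyQuery: 2000 };

// Builds one k6 threshold per label using the sub-metric form metric{tag:value}.
function buildThresholds(budgets) {
  const thresholds = {};
  for (const [label, ms] of Object.entries(budgets)) {
    thresholds[`http_req_duration{label:${label}}`] = [`p(95)<${ms}`];
  }
  return thresholds;
}

// In a k6 script this would be: export const options = { ... };
const options = { stages, thresholds: buildThresholds(slaMs) };
```

Declaring a threshold on a tagged sub-metric also makes k6 report that sub-metric in the end-of-test summary, which is one way to get a per-label latency breakdown.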