From f4cb375a758edcfe15d21d40bdd17c1e88653119 Mon Sep 17 00:00:00 2001 From: Eric Allam Date: Fri, 15 May 2026 18:11:01 +0100 Subject: [PATCH] docs(clickhouse): require max+1 numbering and idempotent DDL for migrations MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Codify two rules that came out of the 029/030 ordering incident (TRI-9367 test cloud deploy): 1. ClickHouse migration files must be numbered max(existing)+1. Goose runs in strict mode in the deploy pipeline and refuses to apply a missing version below the current version, which blocks the deploy. Branches that fell behind main need to renumber before merging. 2. DDL must be idempotent (ADD COLUMN IF NOT EXISTS, DROP COLUMN IF EXISTS, CREATE TABLE IF NOT EXISTS, etc.) so a retry or out-of-order apply (e.g. goose up --allow-missing for local recovery) is a no-op rather than an error. Rules go in internal-packages/clickhouse/CLAUDE.md; reviewer-facing summary added to .claude/REVIEW.md as a new 🔴 finding. --- .claude/REVIEW.md | 1 + internal-packages/clickhouse/CLAUDE.md | 34 +++++++++++++++++++++++--- 2 files changed, 32 insertions(+), 3 deletions(-) diff --git a/.claude/REVIEW.md b/.claude/REVIEW.md index 67f7a9f15cb..19edf00a52e 100644 --- a/.claude/REVIEW.md +++ b/.claude/REVIEW.md @@ -11,6 +11,7 @@ Reserve 🔴 for things that would page someone or block a rollback. In this cod - A Redis data shape used by both versions changes in place. New shapes need a new key namespace. - A migration is not backward-compatible with the prior image. - **Schema / migration safety.** Prisma migrations must be backward-compatible with the prior deploy. Adding NOT NULL without a default, dropping a column an old image still reads, renaming a column — all 🔴. +- **ClickHouse migration ordering + idempotency.** Goose runs in strict mode in the deploy pipeline and refuses to apply a missing version below the current version — slotting a new file in below the latest already-applied version blocks the deploy. New ClickHouse migration files MUST use the next available number (`max(files in internal-packages/clickhouse/schema/) + 1`); if main has added migrations while you've been on a branch, renumber yours. DDL must also be idempotent (`ADD COLUMN IF NOT EXISTS`, `DROP COLUMN IF EXISTS`, `CREATE TABLE IF NOT EXISTS`, `ADD INDEX IF NOT EXISTS`) so a partial / `--allow-missing` apply elsewhere doesn't fail on retry. Either fault is 🔴 — both break test/prod deploys. Rules live in `internal-packages/clickhouse/CLAUDE.md`. - **Queue / concurrency correctness.** RunQueue, MarQS (V1, legacy), redis-worker — any change to enqueue / dequeue / locking semantics. Re-derive the invariant on paper before flagging or accepting. - **Missing index on a hot table.** New Prisma queries against `TaskRun`, `TaskRunExecutionSnapshot`, `JobRun`, `Project`, etc. must use an existing index. Check `internal-packages/database/prisma/schema.prisma` for the relevant `@@index` lines — don't guess and don't propose `EXPLAIN`. - **Recovery-path queries.** Any `TaskRun.findFirst` / `findMany` added to a schedule, run-recovery, or restart loop. Recovery fan-outs (Redis crash, restart storms) turn "rare indexed query" into a DB incident. 🔴 even if indexed. diff --git a/internal-packages/clickhouse/CLAUDE.md b/internal-packages/clickhouse/CLAUDE.md index c3e4f9125a0..9b482b8a2d7 100644 --- a/internal-packages/clickhouse/CLAUDE.md +++ b/internal-packages/clickhouse/CLAUDE.md @@ -4,18 +4,46 @@ ## Migrations -Goose-format SQL migrations live in `schema/`. Create new numbered files: +Goose-format SQL migrations live in `schema/`. Two rules below are load-bearing — both can block a deploy. + +### Rule 1: number to `max + 1`, never slot in + +Goose runs in strict mode in the deploy pipeline. If a migration file numbered *below* the version currently recorded in `goose_db_version` ever shows up, goose refuses to apply it and the deploy fails: + +``` +goose run: error: found 1 missing migrations before current version 30: + version 29: 029_add_task_kind_to_task_runs_v2.sql +``` + +When adding a migration: + +1. Look at `schema/` and take the largest existing number, call it `N`. +2. Name your file `0(N+1)_descriptive_name.sql`. +3. If you've been on a branch while main added migrations, **rebase and renumber** before opening the PR — a file numbered below the new max will block the next deploy after your PR merges. + +### Rule 2: DDL must be idempotent + +Migrations can be applied out of order in some environments (`goose up --allow-missing` for local recovery, manual fixups, etc.) and may be retried. Always use idempotent forms so a re-apply is a no-op: ```sql -- +goose Up ALTER TABLE trigger_dev.your_table -ADD COLUMN new_column String DEFAULT ''; + ADD COLUMN IF NOT EXISTS new_column String DEFAULT ''; -- +goose Down ALTER TABLE trigger_dev.your_table -DROP COLUMN new_column; + DROP COLUMN IF EXISTS new_column; ``` +Equivalent forms for other DDL: + +- `CREATE TABLE IF NOT EXISTS …` +- `DROP TABLE IF EXISTS …` +- `ADD INDEX IF NOT EXISTS …` / `DROP INDEX IF EXISTS …` +- `CREATE MATERIALIZED VIEW IF NOT EXISTS …` / `DROP VIEW IF EXISTS …` + +ClickHouse supports `IF [NOT] EXISTS` on all of the above. Older migrations in this directory predate the rule and are not idempotent — leave them as-is unless you're explicitly hardening one. + ## Naming Conventions - `raw_` prefix for input tables (where data lands first)