From f27dc3352b25c17b419af8dac8f7398e4c5f923f Mon Sep 17 00:00:00 2001
From: zlc
Date: Fri, 12 Jun 2026 09:54:49 +0000
Subject: [PATCH 1/7] docs: add how-to for replicated ClickHouse cluster with embedded ClickHouse Keeper

Covers coordination topology choice (external ZooKeeper / standalone Keeper / embedded Keeper), a full ClickHouseInstallation manifest with init-container-generated raft configuration, quorum and replication verification, and recommended settings for log storage workloads. Notes that the platform ClickHouse Operator (0.20.x) does not include the upstream ClickHouseKeeperInstallation controller, so embedded or standalone Keeper is required.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 ...Cluster_with_Embedded_ClickHouse_Keeper.md | 375 ++++++++++++++++++
 1 file changed, 375 insertions(+)
 create mode 100644 docs/en/solutions/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md

diff --git a/docs/en/solutions/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md b/docs/en/solutions/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md
new file mode 100644
index 00000000..a90f1ba1
--- /dev/null
+++ b/docs/en/solutions/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md
@@ -0,0 +1,375 @@
+---
+kind:
+  - How To
+products:
+  - Alauda Container Platform
+  - Alauda Application Services
+ProductsVersion:
+  - 4.2.x
+---
+# How to Deploy a Replicated ClickHouse Cluster with Embedded ClickHouse Keeper
+
+## Purpose
+
+This document explains how to deploy a replicated ClickHouse cluster that uses ClickHouse Keeper instead of an external ZooKeeper ensemble, using the ClickHouse Operator on Alauda Container Platform. Keeper runs embedded inside each ClickHouse Pod, so no separate coordination workload needs to be deployed or operated.
+
+This topology is well suited for log storage scenarios where a small replicated cluster and minimal operational footprint are the priority.
+
+The procedure covers:
+
+- Choosing a coordination topology (external ZooKeeper, standalone Keeper, or embedded Keeper).
+- Deploying a `ClickHouseInstallation` with embedded Keeper.
+- Verifying the Keeper quorum and table replication.
+- Recommended ClickHouse settings for log storage workloads.
+
+## Environment
+
+- Alauda Container Platform 4.2 or later
+- ClickHouse Operator based on Altinity clickhouse-operator 0.20.x
+- ClickHouse Server 23.x or later (ClickHouse Keeper is bundled in the server image; validated with 25.x)
+- Cluster access via `kubectl`
+
+## Resolution
+
+### 1. Overview
+
+Replicated tables (`ReplicatedMergeTree` family) and `ON CLUSTER` DDL in ClickHouse require a coordination service. ClickHouse Keeper is the ZooKeeper-compatible replacement implemented inside ClickHouse itself: it speaks the ZooKeeper client protocol, uses Raft instead of ZAB, and can run in the same process or Pod as the ClickHouse server.
+
+Three topologies are possible on Kubernetes:
+
+| Topology | Description | Trade-offs |
+|----------|-------------|------------|
+| External ZooKeeper | A separate ZooKeeper StatefulSet (3 nodes); the `ClickHouseInstallation` points at `zookeeper:2181` | Mature and battle-tested, but one more stateful workload to operate |
+| Standalone Keeper | A separate ClickHouse Keeper StatefulSet (3 nodes); the `ClickHouseInstallation` points at `keeper:9181` | Resource isolation between Keeper and ClickHouse, but still two workloads, and the Keeper manifests are managed manually |
+| Embedded Keeper (this document) | Every ClickHouse Pod also runs a Keeper Raft member; the `ClickHouseInstallation` points at its own Keeper Service | One workload only; Keeper shares Pod resources with ClickHouse and scaling of the two is coupled |
+
+> **Note on operator-managed Keeper:** newer upstream versions of the Altinity operator introduce a `ClickHouseKeeperInstallation` (CHK) custom resource with its own controller. The ClickHouse Operator 0.20.x delivered with the platform does not include this controller, and the CHK CRD is not installed — `kubectl api-resources --api-group=clickhouse-keeper.altinity.com` returns no resources. Keeper therefore cannot be deployed as an operator-managed CR on this version; use the embedded topology described here (or a standalone StatefulSet) instead. The only Keeper-related field the operator consumes is `spec.configuration.zookeeper`, which works unchanged against Keeper because the client protocol is ZooKeeper-compatible.
+
+How the embedded topology works:
+
+- The cluster layout is 1 shard × 3 replicas, so the 3 ClickHouse Pods form a 3-member Keeper quorum (an odd member count that tolerates the loss of one node).
+- A static `keeper_config.xml` fragment is injected into every shard through the `ClickHouseInstallation` `files` mechanism.
+- The dynamic part of the Keeper configuration — `server_id` and the `raft_configuration` member list — depends on the Pod identity, so it is generated at Pod start by an init container and pulled in through `include_from`.
+- A dedicated headless Service exposes the Keeper client port (9181), and `spec.configuration.zookeeper.nodes` points at that Service.
+
+### 2. Prerequisites
+
+| Item | Description | Example |
+|------|-------------|---------|
+| Namespace | Namespace for the ClickHouse cluster | `` |
+| ClickHouseInstallation name | Name of the CHI resource | `` |
+| ClickHouse cluster name | Cluster name inside the CHI spec | `` |
+| ClickHouse image | Server image (Keeper is included) | `` |
+| StorageClass | StorageClass for data volumes | `` |
+| Storage size | PVC size per replica | `` |
+
+```bash
+export NAMESPACE=""
+export CHI_NAME=""
+export CLUSTER_NAME=""
+```
+
+The ClickHouse Operator must already be installed and watching the target namespace.
+
+### 3. Create the Keeper client Service
+
+Replicas reach the Keeper quorum through a headless Service that selects all ready ClickHouse Pods of this installation. The `clickhouse.altinity.com/role: keeper` label is added to the Pods by the pod template in the next step.
+
+```yaml
+apiVersion: v1
+kind: Service
+metadata:
+  name: ${CHI_NAME}-keeper
+  namespace: ${NAMESPACE}
+spec:
+  clusterIP: None
+  type: ClusterIP
+  ports:
+    - name: keeper
+      port: 9181
+      protocol: TCP
+      targetPort: 9181
+  selector:
+    clickhouse.altinity.com/chi: ${CHI_NAME}
+    clickhouse.altinity.com/namespace: ${NAMESPACE}
+    clickhouse.altinity.com/ready: "yes"
+    clickhouse.altinity.com/role: keeper
+```
+
+### 4. Deploy the ClickHouseInstallation
+
+The manifest below deploys 1 shard × 3 replicas with embedded Keeper. Key points are explained after the manifest.
+ +```yaml +apiVersion: "clickhouse.altinity.com/v1" +kind: "ClickHouseInstallation" +metadata: + name: ${CHI_NAME} + namespace: ${NAMESPACE} +spec: + configuration: + zookeeper: + nodes: + - host: ${CHI_NAME}-keeper + port: 9181 + clusters: + - name: ${CLUSTER_NAME} + templates: + podTemplate: pod-template + dataVolumeClaimTemplate: data-volumeclaim-template + layout: + shardsCount: 1 + replicasCount: 3 + shards: + - files: + keeper_config.xml: | + + /tmp/clickhouse/keeper_dynamic_configuration.xml + + /var/lib/clickhouse-keeper + 9181 + * + + information + + + + settings: + # Keep system tables bounded; without TTLs they grow indefinitely. + asynchronous_metric_log/database: system + asynchronous_metric_log/table: asynchronous_metric_log + asynchronous_metric_log/ttl: "event_date + INTERVAL 7 DAY DELETE" + metric_log/database: system + metric_log/table: metric_log + metric_log/ttl: "event_date + INTERVAL 7 DAY DELETE" + trace_log/database: system + trace_log/table: trace_log + trace_log/ttl: "event_date + INTERVAL 7 DAY DELETE" + profiles: + default/max_execution_time: 120 + default/allow_unrestricted_reads_from_keeper: "1" + defaults: + templates: + podTemplate: pod-template + dataVolumeClaimTemplate: data-volumeclaim-template + templates: + podTemplates: + - name: pod-template + podDistribution: + - scope: Shard + topologyKey: kubernetes.io/hostname + type: ShardAntiAffinity + metadata: + labels: + clickhouse.altinity.com/role: keeper + spec: + containers: + - name: clickhouse + image: + env: + - name: RAFT_PORT + value: "9444" + ports: + - name: http + containerPort: 8123 + - name: client + containerPort: 9000 + - name: interserver + containerPort: 9009 + - name: ch-keeper + containerPort: 9181 + - name: raft + containerPort: 9444 + volumeMounts: + - name: data-volumeclaim-template + mountPath: /var/lib/clickhouse + - name: keeper-dynamic-config + mountPath: /tmp/clickhouse + readinessProbe: + tcpSocket: + port: 9444 + initialDelaySeconds: 10 + timeoutSeconds: 
5 + periodSeconds: 10 + failureThreshold: 3 + initContainers: + - name: keeper-config-initializer + image: + env: + - name: RAFT_PORT + value: "9444" + - name: SHARDS_COUNT + value: "1" + - name: REPLICAS_COUNT + value: "3" + command: + - /bin/bash + - -c + - | + set -euo pipefail + OUT="/tmp/config/keeper_dynamic_configuration.xml" + + HOST=$(hostname -s) + DOMAIN=$(hostname -d) + # StatefulSet Pod hostname: ---- + if [[ $HOST =~ (.*)-([0-9]+)-([0-9]+)-([0-9]+)$ ]]; then + SHARD=${BASH_REMATCH[2]} + REPLICA=${BASH_REMATCH[3]} + else + echo "Failed to parse shard/replica from hostname $HOST"; exit 1 + fi + # Pod FQDN domain: ---..svc. + if [[ $DOMAIN =~ ^(.*)-([0-9]+)-([0-9]+)\.(.*)$ ]]; then + DOMAIN_NAME=${BASH_REMATCH[1]} + DOMAIN_SUFFIX=.${BASH_REMATCH[4]} + else + echo "Failed to parse domain $DOMAIN"; exit 1 + fi + + MY_ID=$((SHARD * REPLICAS_COUNT + REPLICA + 1)) + KEEPER_ID=1 + { + echo "" + echo " " + echo " ${MY_ID}" + echo " " + for (( i=0; i" + echo " ${KEEPER_ID}" + echo " ${DOMAIN_NAME}-${i}-${j}${DOMAIN_SUFFIX}" + echo " ${RAFT_PORT}" + echo " " + KEEPER_ID=$((KEEPER_ID + 1)) + done + done + echo " " + echo " " + echo "" + } > "$OUT" + echo "Keeper dynamic configuration generated for server_id=${MY_ID}" + volumeMounts: + - name: keeper-dynamic-config + mountPath: /tmp/config + volumes: + - name: keeper-dynamic-config + emptyDir: + medium: Memory + serviceTemplates: + - name: replica-service-template + spec: + type: ClusterIP + ports: + - name: http + port: 8123 + - name: tcp + port: 9000 + - name: interserver + port: 9009 + - name: clickhouse-keeper + port: 9181 + - name: raft + port: 9444 + volumeClaimTemplates: + - name: data-volumeclaim-template + spec: + accessModes: + - ReadWriteOnce + storageClassName: + resources: + requests: + storage: +``` + +Key points: + +- **Quorum and layout.** `shardsCount: 1` and `replicasCount: 3` produce 3 Pods, each a Keeper Raft member. Keep an odd member count; 3 members tolerate one failure. 
If you add shards, every replica of every shard joins the quorum, and the `server_id` formula `SHARD * REPLICAS_COUNT + REPLICA + 1` stays unique as long as `REPLICAS_COUNT` in the init container matches the real layout. +- **Static vs dynamic Keeper config.** The static part (`tcp_port`, data `path`, coordination settings) is injected per shard through `layout.shards[].files`. The identity-dependent part (`server_id`, `raft_configuration`) is generated by the init container into an in-memory `emptyDir` and merged via `include_from`. Replica changes therefore require only updating `REPLICAS_COUNT`, not editing the static fragment. +- **Coordination endpoint.** `spec.configuration.zookeeper.nodes` points at the Keeper Service on port 9181. The operator renders this into the server `zookeeper` configuration; ClickHouse does not need to know whether the backend is ZooKeeper or Keeper. +- **Anti-affinity.** `ShardAntiAffinity` on `kubernetes.io/hostname` spreads the replicas (and therefore the Keeper members) across nodes, so a single node failure cannot break the quorum. +- **Readiness.** The readiness probe checks the Raft port (9444), so a Pod only becomes ready after its Keeper member is up. +- **Per-replica Service ports.** The replica Service template exposes 9181 and 9444 in addition to the ClickHouse ports, so quorum members can reach each other through their per-replica DNS names. + +Apply the manifests: + +```bash +kubectl apply -n "$NAMESPACE" -f keeper-service.yaml +kubectl apply -n "$NAMESPACE" -f chi.yaml +``` + +### 5. 
Verify the Keeper quorum + +Wait until the CHI reaches `Completed`: + +```bash +kubectl -n "$NAMESPACE" get clickhouseinstallation "$CHI_NAME" -w +``` + +Check the quorum with the `mntr` four-letter command on any Pod (this is why `four_letter_word_white_list` is enabled): + +```bash +kubectl -n "$NAMESPACE" exec chi-${CHI_NAME}-${CLUSTER_NAME}-0-0-0 -c clickhouse -- \ + bash -c 'exec 3<>/dev/tcp/localhost/9181; printf mntr >&3; cat <&3' | egrep 'zk_server_state|zk_synced_followers' +``` + +Expected output: one Pod reports `zk_server_state leader` with `zk_synced_followers 2`; the other two report `follower`. + +Confirm ClickHouse can reach the coordination service: + +```bash +kubectl -n "$NAMESPACE" exec chi-${CHI_NAME}-${CLUSTER_NAME}-0-0-0 -c clickhouse -- \ + clickhouse-client -q "SELECT * FROM system.zookeeper WHERE path = '/'" +``` + +### 6. Verify replication + +Create a replicated table across the cluster, write on one replica, and read on another: + +```bash +kubectl -n "$NAMESPACE" exec chi-${CHI_NAME}-${CLUSTER_NAME}-0-0-0 -c clickhouse -- clickhouse-client -q " +CREATE TABLE default.keeper_smoke ON CLUSTER '${CLUSTER_NAME}' +(ts DateTime, msg String) +ENGINE = ReplicatedMergeTree('/clickhouse/tables/{cluster}/{shard}/default/keeper_smoke', '{replica}') +ORDER BY ts" + +kubectl -n "$NAMESPACE" exec chi-${CHI_NAME}-${CLUSTER_NAME}-0-0-0 -c clickhouse -- \ + clickhouse-client -q "INSERT INTO default.keeper_smoke VALUES (now(), 'hello')" + +kubectl -n "$NAMESPACE" exec chi-${CHI_NAME}-${CLUSTER_NAME}-0-1-0 -c clickhouse -- \ + clickhouse-client -q "SELECT count() FROM default.keeper_smoke" +``` + +The count on the second replica must be `1`. Also confirm replica health: + +```bash +kubectl -n "$NAMESPACE" exec chi-${CHI_NAME}-${CLUSTER_NAME}-0-0-0 -c clickhouse -- \ + clickhouse-client -q "SELECT database, table, is_readonly, absolute_delay FROM system.replicas" +``` + +`is_readonly` must be `0` and `absolute_delay` close to `0`. 
Clean up the smoke-test table afterwards: + +```bash +kubectl -n "$NAMESPACE" exec chi-${CHI_NAME}-${CLUSTER_NAME}-0-0-0 -c clickhouse -- \ + clickhouse-client -q "DROP TABLE default.keeper_smoke ON CLUSTER '${CLUSTER_NAME}' SYNC" +``` + +### 7. Recommended settings for log storage workloads + +The manifest above already includes the settings that matter most for log storage. Rationale and additional options: + +| Setting | Recommendation | Why | +|---------|----------------|-----| +| `asynchronous_metric_log/ttl`, `metric_log/ttl`, `trace_log/ttl` | 7-day `DELETE` TTL | Self-observability system tables grow indefinitely by default and will eventually fill the data volume | +| `default/max_execution_time` | 120 seconds | Prevents a single slow query from starving ingestion on a small cluster | +| `default/allow_unrestricted_reads_from_keeper` | `1` (optional) | Allows broad `system.zookeeper` reads for troubleshooting; low risk | +| `default/max_parallel_replicas` | Match replica count (optional) | Can accelerate reads on a single-shard, multi-replica layout | +| Table TTLs on log tables | Per-category retention (e.g. `event_date + INTERVAL 7 DAY DELETE`) | Retention is the primary capacity control for log data | +| Quotas (`interval/duration`, `interval/queries`) | Optional guardrail | Caps runaway query volume per user | + +Constraints and trade-offs of the embedded topology: + +- Keeper shares CPU, memory, and disk I/O with the ClickHouse server. Size the Pod resources for both, and keep the Keeper data path (`/var/lib/clickhouse-keeper`) on the same persistent volume. +- Replica count and quorum size are coupled. Do not scale replicas to an even number or below 3, and keep `REPLICAS_COUNT` in the init container in sync with the layout. +- Rolling restarts restart Keeper members together with ClickHouse. The default `maxUnavailable` behavior of the operator (one host at a time) keeps the quorum alive; do not force parallel restarts. 
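The rule above against even or sub-3 replica counts follows from Raft majority arithmetic, which a few lines of shell can illustrate. This is a sketch only, not part of the patch: the function names `majority` and `tolerated_failures` are hypothetical helpers introduced here, and the only assumption is the standard Raft rule that progress needs a strict majority of voters.

```bash
#!/usr/bin/env bash
# Sketch: Raft quorum arithmetic for a Keeper ensemble of N voting members.
# majority and tolerated_failures are illustrative names, not ClickHouse tooling.

# Smallest number of members that forms a strict majority of N.
majority() { echo $(( $1 / 2 + 1 )); }

# Members that can be lost while a majority is still reachable.
tolerated_failures() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 1 2 3 4 5 6; do
  printf 'members=%d majority=%d tolerated_failures=%d\n' \
    "$n" "$(majority "$n")" "$(tolerated_failures "$n")"
done
```

Going from 3 to 4 members raises the required majority without tolerating any additional failure, which is why the embedded layout is kept at an odd replica count.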
+
+If you need Keeper and ClickHouse to scale independently, or you run many ClickHouse installations against a shared coordination service, deploy a standalone 3-node Keeper (or ZooKeeper) StatefulSet instead and point `spec.configuration.zookeeper.nodes` at it; everything else in this document stays the same.

From 6550ff417252a215ab32c365b5b061dda324b7a8 Mon Sep 17 00:00:00 2001
From: zlc
Date: Mon, 15 Jun 2026 02:09:22 +0000
Subject: [PATCH 2/7] docs: reference the ACP-shipped operator version instead of upstream chop number

Use the platform clickhouse-operator component version (v4.2.3 on ACP 4.2) in the Environment and operator-managed-Keeper note, since the upstream Altinity 0.20.x base version is not meaningful to ACP users.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 ...ated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/en/solutions/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md b/docs/en/solutions/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md
index a90f1ba1..89dc3281 100644
--- a/docs/en/solutions/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md
+++ b/docs/en/solutions/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md
@@ -25,7 +25,7 @@ The procedure covers:
 ## Environment
 
 - Alauda Container Platform 4.2 or later
-- ClickHouse Operator based on Altinity clickhouse-operator 0.20.x
+- ClickHouse Operator shipped with the platform — the `clickhouse-operator` component (validated with image `clickhouse-operator:v4.2.3` on ACP 4.2)
 - ClickHouse Server 23.x or later (ClickHouse Keeper is bundled in the server image; validated with 25.x)
 - Cluster access via `kubectl`
 
@@ -43,7 +43,7 @@ Three topologies are possible on Kubernetes:
 | Standalone Keeper | A separate ClickHouse Keeper StatefulSet (3 nodes); the `ClickHouseInstallation` points at `keeper:9181` | Resource isolation between Keeper and ClickHouse, but still two workloads, and the Keeper manifests are managed manually |
 | Embedded Keeper (this document) | Every ClickHouse Pod also runs a Keeper Raft member; the `ClickHouseInstallation` points at its own Keeper Service | One workload only; Keeper shares Pod resources with ClickHouse and scaling of the two is coupled |
 
-> **Note on operator-managed Keeper:** newer upstream versions of the Altinity operator introduce a `ClickHouseKeeperInstallation` (CHK) custom resource with its own controller. The ClickHouse Operator 0.20.x delivered with the platform does not include this controller, and the CHK CRD is not installed — `kubectl api-resources --api-group=clickhouse-keeper.altinity.com` returns no resources. Keeper therefore cannot be deployed as an operator-managed CR on this version; use the embedded topology described here (or a standalone StatefulSet) instead. The only Keeper-related field the operator consumes is `spec.configuration.zookeeper`, which works unchanged against Keeper because the client protocol is ZooKeeper-compatible.
+> **Note on operator-managed Keeper:** newer versions of the upstream Altinity operator introduce a `ClickHouseKeeperInstallation` (CHK) custom resource with its own controller. The ClickHouse Operator shipped with this platform version does not include that controller, and the CHK CRD is not installed — `kubectl api-resources --api-group=clickhouse-keeper.altinity.com` returns no resources. Keeper therefore cannot be deployed as an operator-managed CR; use the embedded topology described here (or a standalone StatefulSet) instead. The only Keeper-related field the operator consumes is `spec.configuration.zookeeper`, which works unchanged against Keeper because the client protocol is ZooKeeper-compatible.
 
 How the embedded topology works:
 

From 691682188441cbbbbcc41327aba2ee51fd19499d Mon Sep 17 00:00:00 2001
From: zlc
Date: Mon, 15 Jun 2026 03:15:28 +0000
Subject: [PATCH 3/7] docs: use Keeper-native verification (clickhouse-keeper-client, system.zookeeper_connection)

Lead the quorum verification with the dedicated clickhouse-keeper-client CLI, clarify that mntr is ClickHouse Keeper's own 4lw command (not an external ZooKeeper), and switch the connection check to system.zookeeper_connection.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 ...Cluster_with_Embedded_ClickHouse_Keeper.md | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/docs/en/solutions/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md b/docs/en/solutions/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md
index 89dc3281..7cc86c92 100644
--- a/docs/en/solutions/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md
+++ b/docs/en/solutions/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md
@@ -305,22 +305,33 @@ Wait until the CHI reaches `Completed`:
 kubectl -n "$NAMESPACE" get clickhouseinstallation "$CHI_NAME" -w
 ```
 
-Check the quorum with the `mntr` four-letter command on any Pod (this is why `four_letter_word_white_list` is enabled):
+Use the dedicated `clickhouse-keeper-client` CLI (bundled in the ClickHouse server image) to confirm the embedded Keeper answers on a Pod:
+
+```bash
+kubectl -n "$NAMESPACE" exec chi-${CHI_NAME}-${CLUSTER_NAME}-0-0-0 -c clickhouse -- \
+  clickhouse-keeper-client -h localhost -p 9181 -q "ls /"
+```
+
+A healthy Keeper returns its root znodes (for example `keeper clickhouse`).
+
+Check the Raft role and quorum size. The `mntr` command here is **ClickHouse Keeper's own** four-letter-word (4lw) implementation — it is served by the Keeper Raft engine, not by any ZooKeeper process, and is what the `four_letter_word_white_list` in `keeper_config.xml` enables. The `zk_`-prefixed keys are retained only so existing monitoring tooling can parse the output:
 
 ```bash
 kubectl -n "$NAMESPACE" exec chi-${CHI_NAME}-${CLUSTER_NAME}-0-0-0 -c clickhouse -- \
   bash -c 'exec 3<>/dev/tcp/localhost/9181; printf mntr >&3; cat <&3' | egrep 'zk_server_state|zk_synced_followers'
 ```
 
-Expected output: one Pod reports `zk_server_state leader` with `zk_synced_followers 2`; the other two report `follower`.
+Expected output: exactly one Pod reports `zk_server_state leader` with `zk_synced_followers 2`; the other two report `follower`.
 
-Confirm ClickHouse can reach the coordination service:
+Confirm the ClickHouse server is connected to the Keeper quorum. The client-side view is exposed through `system.zookeeper_connection` — the table keeps the historical `zookeeper` name but reports the Keeper endpoint and works against Keeper unchanged:
 
 ```bash
 kubectl -n "$NAMESPACE" exec chi-${CHI_NAME}-${CLUSTER_NAME}-0-0-0 -c clickhouse -- \
-  clickhouse-client -q "SELECT * FROM system.zookeeper WHERE path = '/'"
+  clickhouse-client -q "SELECT * FROM system.zookeeper_connection FORMAT Vertical"
 ```
 
+The reported `host` must be the Keeper Service (`${CHI_NAME}-keeper`) on port `9181`.
+
 ### 6. Verify replication
 
 Create a replicated table across the cluster, write on one replica, and read on another:

From d8b545d719b2e4010c800ff7d56622e61269999e Mon Sep 17 00:00:00 2001
From: zlc
Date: Mon, 15 Jun 2026 03:23:12 +0000
Subject: [PATCH 4/7] docs: unify manifest placeholders to environment variables

- Parameterize the image, StorageClass, and PVC size with ${CLICKHOUSE_IMAGE}/${STORAGE_CLASS}/${STORAGE_SIZE} instead of bare <...> placeholders, matching the existing ${NAMESPACE}/${CHI_NAME} style.
- List all required environment variables up front in Prerequisites.
- Render manifests with envsubst using an explicit variable allow-list so the init container's runtime shell variables are preserved.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 ...Cluster_with_Embedded_ClickHouse_Keeper.md | 42 +++++++++++--------
 1 file changed, 24 insertions(+), 18 deletions(-)

diff --git a/docs/en/solutions/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md b/docs/en/solutions/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md
index 7cc86c92..efce08c1 100644
--- a/docs/en/solutions/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md
+++ b/docs/en/solutions/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md
@@ -52,25 +52,30 @@ How the embedded topology works:
 - The dynamic part of the Keeper configuration — `server_id` and the `raft_configuration` member list — depends on the Pod identity, so it is generated at Pod start by an init container and pulled in through `include_from`.
 - A dedicated headless Service exposes the Keeper client port (9181), and `spec.configuration.zookeeper.nodes` points at that Service.
 
-### 2. Prerequisites
+### 2. Prerequisites and required environment variables
 
-| Item | Description | Example |
-|------|-------------|---------|
-| Namespace | Namespace for the ClickHouse cluster | `` |
-| ClickHouseInstallation name | Name of the CHI resource | `` |
-| ClickHouse cluster name | Cluster name inside the CHI spec | `` |
-| ClickHouse image | Server image (Keeper is included) | `` |
-| StorageClass | StorageClass for data volumes | `` |
-| Storage size | PVC size per replica | `` |
+The ClickHouse Operator must already be installed and watching the target namespace.
+
+Every manifest and command in this procedure is parameterized with the environment variables below. Set all of them first; the manifests are rendered with `envsubst` before being applied (step 4), so each variable must be exported.
+
+| Environment variable | Description | Example |
+|----------------------|-------------|---------|
+| `NAMESPACE` | Namespace for the ClickHouse cluster | `` |
+| `CHI_NAME` | Name of the `ClickHouseInstallation` resource | `` |
+| `CLUSTER_NAME` | Cluster name inside the CHI spec | `` |
+| `CLICKHOUSE_IMAGE` | ClickHouse server image (Keeper is bundled in it) | `` |
+| `STORAGE_CLASS` | StorageClass for the data volumes | `` |
+| `STORAGE_SIZE` | PVC size per replica | `` |
 
 ```bash
 export NAMESPACE=""
 export CHI_NAME=""
 export CLUSTER_NAME=""
+export CLICKHOUSE_IMAGE=""
+export STORAGE_CLASS=""
+export STORAGE_SIZE=""
 ```
 
-The ClickHouse Operator must already be installed and watching the target namespace.
-
 ### 3. Create the Keeper client Service
 
 Replicas reach the Keeper quorum through a headless Service that selects all ready ClickHouse Pods of this installation. The `clickhouse.altinity.com/role: keeper` label is added to the Pods by the pod template in the next step.
@@ -165,7 +170,7 @@ spec:
         spec:
           containers:
             - name: clickhouse
-              image:
+              image: ${CLICKHOUSE_IMAGE}
               env:
                 - name: RAFT_PORT
                   value: "9444"
@@ -194,7 +199,7 @@ spec:
                 failureThreshold: 3
           initContainers:
             - name: keeper-config-initializer
-              image:
+              image: ${CLICKHOUSE_IMAGE}
               env:
                 - name: RAFT_PORT
                   value: "9444"
@@ -275,10 +280,10 @@ spec:
         spec:
           accessModes:
             - ReadWriteOnce
-          storageClassName:
+          storageClassName: ${STORAGE_CLASS}
           resources:
             requests:
-              storage:
+              storage: ${STORAGE_SIZE}
 ```
 
 Key points:
@@ -290,11 +295,12 @@ Key points:
 - **Readiness.** The readiness probe checks the Raft port (9444), so a Pod only becomes ready after its Keeper member is up.
 - **Per-replica Service ports.** The replica Service template exposes 9181 and 9444 in addition to the ClickHouse ports, so quorum members can reach each other through their per-replica DNS names.
 
-Apply the manifests:
+Save the Service from step 3 as `keeper-service.yaml` and the `ClickHouseInstallation` above as `chi.yaml`, then render and apply them. Pass an explicit variable list to `envsubst` so it substitutes only the configuration variables and leaves the init container's runtime shell variables (such as `${MY_ID}`, `${SHARD}`, and `${RAFT_PORT}`) intact:
 
 ```bash
-kubectl apply -n "$NAMESPACE" -f keeper-service.yaml
-kubectl apply -n "$NAMESPACE" -f chi.yaml
+RENDER_VARS='${NAMESPACE} ${CHI_NAME} ${CLUSTER_NAME} ${CLICKHOUSE_IMAGE} ${STORAGE_CLASS} ${STORAGE_SIZE}'
+envsubst "$RENDER_VARS" < keeper-service.yaml | kubectl apply -n "$NAMESPACE" -f -
+envsubst "$RENDER_VARS" < chi.yaml | kubectl apply -n "$NAMESPACE" -f -
 ```
 
 ### 5. Verify the Keeper quorum

From 23d8be11898850109f64d6381335c977cc7f9e1d Mon Sep 17 00:00:00 2001
From: zlc
Date: Mon, 15 Jun 2026 06:35:39 +0000
Subject: [PATCH 5/7] docs: drop the operator-managed Keeper (CHK) capability note

Remove the note about the upstream ClickHouseKeeperInstallation controller not being present; keep only a neutral description of how the operator consumes spec.configuration.zookeeper.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 ...icated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/en/solutions/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md b/docs/en/solutions/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md
index efce08c1..6d357ce2 100644
--- a/docs/en/solutions/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md
+++ b/docs/en/solutions/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md
@@ -43,7 +43,7 @@ Three topologies are possible on Kubernetes:
 | Standalone Keeper | A separate ClickHouse Keeper StatefulSet (3 nodes); the `ClickHouseInstallation` points at `keeper:9181` | Resource isolation between Keeper and ClickHouse, but still two workloads, and the Keeper manifests are managed manually |
 | Embedded Keeper (this document) | Every ClickHouse Pod also runs a Keeper Raft member; the `ClickHouseInstallation` points at its own Keeper Service | One workload only; Keeper shares Pod resources with ClickHouse and scaling of the two is coupled |
 
-> **Note on operator-managed Keeper:** newer versions of the upstream Altinity operator introduce a `ClickHouseKeeperInstallation` (CHK) custom resource with its own controller. The ClickHouse Operator shipped with this platform version does not include that controller, and the CHK CRD is not installed — `kubectl api-resources --api-group=clickhouse-keeper.altinity.com` returns no resources. Keeper therefore cannot be deployed as an operator-managed CR; use the embedded topology described here (or a standalone StatefulSet) instead. The only Keeper-related field the operator consumes is `spec.configuration.zookeeper`, which works unchanged against Keeper because the client protocol is ZooKeeper-compatible.
+The operator consumes a single Keeper-related field, `spec.configuration.zookeeper`, and points ClickHouse at the coordination endpoint through it. This works unchanged against ClickHouse Keeper because the client protocol is ZooKeeper-compatible.
 
 How the embedded topology works:
 

From ce1c5818daecde6503bce318aef8751f9deca1ff Mon Sep 17 00:00:00 2001
From: zlc
Date: Mon, 15 Jun 2026 07:14:01 +0000
Subject: [PATCH 6/7] docs: clarify shard scaling requires SHARDS_COUNT and note quorum sizing

Address review feedback: the init container's raft member list loops over both SHARDS_COUNT and REPLICAS_COUNT, so both must match the layout. Add a note that embedded Keeper suits small clusters and many-shard clusters should use a standalone Keeper quorum.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 ...ted_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/en/solutions/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md b/docs/en/solutions/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md
index 6d357ce2..aa5ca63d 100644
--- a/docs/en/solutions/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md
+++ b/docs/en/solutions/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md
@@ -288,8 +288,9 @@ spec:
 
 Key points:
 
-- **Quorum and layout.** `shardsCount: 1` and `replicasCount: 3` produce 3 Pods, each a Keeper Raft member. Keep an odd member count; 3 members tolerate one failure. If you add shards, every replica of every shard joins the quorum, and the `server_id` formula `SHARD * REPLICAS_COUNT + REPLICA + 1` stays unique as long as `REPLICAS_COUNT` in the init container matches the real layout.
-- **Static vs dynamic Keeper config.** The static part (`tcp_port`, data `path`, coordination settings) is injected per shard through `layout.shards[].files`. The identity-dependent part (`server_id`, `raft_configuration`) is generated by the init container into an in-memory `emptyDir` and merged via `include_from`. Replica changes therefore require only updating `REPLICAS_COUNT`, not editing the static fragment.
+- **Quorum and layout.** `shardsCount: 1` and `replicasCount: 3` produce 3 Pods, each a Keeper Raft member. Keep an odd member count; 3 members tolerate one failure. The init container builds the member list by looping over **both** `SHARDS_COUNT` and `REPLICAS_COUNT`, so if you change the layout you must update both env values to match `shardsCount`/`replicasCount` — otherwise the generated `raft_configuration` is truncated. The `server_id` formula `SHARD * REPLICAS_COUNT + REPLICA + 1` stays unique as long as both values match the real layout.
+- **Embedded Keeper suits small clusters.** Every shard replica becomes a Keeper voter, so a 2-shard × 3-replica layout means a 6-member quorum. Keeper performs best with a small odd quorum (3 or 5). For a cluster with many shards, run a dedicated 3- or 5-node standalone Keeper and point `spec.configuration.zookeeper.nodes` at it instead of growing the quorum with every data Pod.
+- **Static vs dynamic Keeper config.** The static part (`tcp_port`, data `path`, coordination settings) is injected per shard through `layout.shards[].files`. The identity-dependent part (`server_id`, `raft_configuration`) is generated by the init container into an in-memory `emptyDir` and merged via `include_from`. Layout changes therefore require only updating the `SHARDS_COUNT`/`REPLICAS_COUNT` env values, not editing the static fragment.
 - **Coordination endpoint.** `spec.configuration.zookeeper.nodes` points at the Keeper Service on port 9181. The operator renders this into the server `zookeeper` configuration; ClickHouse does not need to know whether the backend is ZooKeeper or Keeper.
 - **Anti-affinity.** `ShardAntiAffinity` on `kubernetes.io/hostname` spreads the replicas (and therefore the Keeper members) across nodes, so a single node failure cannot break the quorum.
 - **Readiness.** The readiness probe checks the Raft port (9444), so a Pod only becomes ready after its Keeper member is up.

From a22e48319d774296a9e77df27e243811a30fe623 Mon Sep 17 00:00:00 2001
From: zlc
Date: Mon, 15 Jun 2026 07:55:47 +0000
Subject: [PATCH 7/7] docs: move embedded ClickHouse Keeper how-to under ecosystem/clickhouse

Match the ecosystem layout (kafka/redis/zookeeper/... each have their own directory) by placing the article under docs/en/solutions/ecosystem/clickhouse/.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 ...plicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename docs/en/solutions/{ => ecosystem/clickhouse}/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md (100%)

diff --git a/docs/en/solutions/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md b/docs/en/solutions/ecosystem/clickhouse/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md
similarity index 100%
rename from docs/en/solutions/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md
rename to docs/en/solutions/ecosystem/clickhouse/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md