
docs: How to deploy a replicated ClickHouse cluster with embedded ClickHouse Keeper #790

Open
zlcnju wants to merge 7 commits into main from ck-embedded-keeper-cluster

Conversation

@zlcnju zlcnju commented Jun 12, 2026

Collaborator

Summary

Adds a How-To article: How to Deploy a Replicated ClickHouse Cluster with Embedded ClickHouse Keeper (docs/en/solutions/).

  • Coordination topology comparison: external ZooKeeper vs standalone Keeper vs embedded Keeper, with the embedded topology recommended for log storage scenarios.
  • Documents that the ClickHouse Operator shipped with the platform (Altinity 0.20.x base) does not include the upstream ClickHouseKeeperInstallation (CHK) controller — the CHK CRD is not installed — so Keeper must run embedded in the ClickHouse Pods or as a standalone StatefulSet.
  • Full ClickHouseInstallation manifest: 1 shard × 3 replicas forming a 3-member Keeper Raft quorum, static keeper_config.xml injected per shard, dynamic server_id/raft_configuration generated by an init container via include_from, anti-affinity, raft-port readiness probe, per-replica Service exposing 9181/9444.
  • Verification steps: mntr four-letter command, system.zookeeper, ReplicatedMergeTree smoke test across replicas, system.replicas health.
  • Recommended settings for log storage workloads (system-table TTLs, max_execution_time, retention TTLs) and the trade-offs/constraints of the embedded topology.

Validation

Validated on an ACP 4.2 environment (ClickHouse Operator v4.2.3 / chop 0.20.0, ClickHouse Server 25.x):

  • Confirmed the CHK API group returns 404 and only CHI/CHIT/CHOP CRDs are installed.
  • Manifest structure derived from the log-storage ClickHouse chart in production use, genericized with placeholders.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Documentation
    • Added comprehensive guide for deploying replicated ClickHouse clusters with embedded ClickHouse Keeper, covering coordination topologies, environment configuration, service setup, health verification procedures, replication testing, and operational best practices for production deployments.

…kHouse Keeper

Covers coordination topology choice (external ZooKeeper / standalone
Keeper / embedded Keeper), a full ClickHouseInstallation manifest with
init-container-generated raft configuration, quorum and replication
verification, and recommended settings for log storage workloads.

Notes that the platform ClickHouse Operator (0.20.x) does not include
the upstream ClickHouseKeeperInstallation controller, so embedded or
standalone Keeper is required.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented Jun 12, 2026

Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: cf8fc8f1-c7cb-47aa-a36c-a2c6aa75b8a7

📥 Commits

Reviewing files that changed from the base of the PR and between 23d8be1 and a22e483.

📒 Files selected for processing (1)
  • docs/en/solutions/ecosystem/clickhouse/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md

Walkthrough

A new Markdown guide is added that documents deploying a replicated ClickHouse cluster using ClickHouse Keeper embedded in each ClickHouse Pod via the ClickHouse Operator. The guide covers topology selection, prerequisite environment variables, a headless Service manifest, a full ClickHouseInstallation YAML with init-container-generated Keeper configuration, and verification and operational guidance.

Embedded Keeper Deployment Documentation

| Layer / File(s) | Summary |
|---|---|
| Introduction, prerequisites, and Keeper client Service (`docs/en/solutions/ecosystem/clickhouse/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md`) | Front matter, guide purpose, coordination topology comparison (ZooKeeper vs standalone vs embedded Keeper), required environment variables, and the headless Service manifest exposing Keeper port 9181. |
| ClickHouseInstallation manifest and deployment (`docs/en/solutions/ecosystem/clickhouse/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md`) | Full ClickHouseInstallation YAML for 1-shard × 3-replica embedded Keeper: shard-level static Keeper config files, init container generating server_id and raft_configuration from Pod hostname, readiness probes on the raft port, service templates, volumeClaimTemplates, key deployment points, and envsubst rendering instructions. |
| Keeper quorum verification and replication smoke test (`docs/en/solutions/ecosystem/clickhouse/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md`) | Keeper raft leader/follower status via mntr, system.zookeeper_connection connectivity check, ReplicatedMergeTree table creation with ON CLUSTER, insert/read validation across replicas, system.replicas health checks, and cleanup commands. |
| Recommended settings and embedded topology constraints (`docs/en/solutions/ecosystem/clickhouse/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md`) | Log-storage-oriented settings (TTL, execution time, Keeper read permissions, parallel replicas), resource coupling and restart behavior constraints, and guidance for switching to standalone Keeper or ZooKeeper for independent scaling. |
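
The leader/follower check mentioned in the third row can be scripted. The following is a minimal sketch, assuming `mntr` output consists of tab-separated `key<TAB>value` lines that include `zk_server_state`. The helper name `keeper_role` is hypothetical, and the sample text is illustrative rather than captured from a real cluster.

```shell
#!/bin/sh
# Hypothetical helper: read mntr output on stdin and print the node's role.
keeper_role() {
  awk -F '\t' '$1 == "zk_server_state" { print $2 }'
}

# Illustrative sample only, not output from a real cluster.
sample=$(printf 'zk_version\tsample\nzk_server_state\tleader\nzk_followers\t2\n')

printf '%s\n' "$sample" | keeper_role

# Against a live Pod, one common approach (assuming nc exists in the image):
#   echo mntr | nc localhost 9181 | keeper_role
```

In a healthy 3-member quorum one would expect exactly one Pod to report `leader` and the other two to report `follower`.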

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐇 Hop hop, the Keeper joins the Pod so near,
No separate ZooKeeper needed, have no fear!
Three replicas raft together in a row,
Init containers whisper: "Here's your server_id, go!"
The tables replicate, the quorum stands tall —
One embedded guide to coordinate them all! 🌟

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and accurately summarizes the main change: adding documentation for deploying a replicated ClickHouse cluster with embedded ClickHouse Keeper, matching the PR objectives. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@docs/en/solutions/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md`:
- Around line 286-287: Update the paragraph to explicitly state that scaling
shards requires updating both SHARDS_COUNT and REPLICAS_COUNT because the Raft
config generation loop uses SHARDS_COUNT as well as REPLICAS_COUNT; mention the
server_id formula `SHARD * REPLICAS_COUNT + REPLICA + 1`, the init container
that writes into the in-memory emptyDir and merges via `include_from`, and that
static fragments under `layout.shards[].files` remain unchanged—so when adding
shards you must increase SHARDS_COUNT (not just REPLICAS_COUNT) to avoid
truncating the generated member list.
- Around line 212-241: The init script uses DOMAIN=$(hostname -d), which returns
only the DNS domain part of the name (the FQDN without the host label), breaking the regex that expects the full pod FQDN;
change DOMAIN assignment to use the full hostname (DOMAIN=$(hostname -f) or
`hostname --fqdn`) so the existing regex that extracts DOMAIN_NAME and
DOMAIN_SUFFIX from the full FQDN works correctly; keep the current regex and
downstream variables (DOMAIN_NAME, DOMAIN_SUFFIX, MY_ID, KEEPER_ID) unchanged so
the generated Raft peer hostnames are correct.
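
To make the first finding concrete, the arithmetic can be sketched in shell. This is a minimal illustration under assumptions: `server_id` and `member_count` are hypothetical helper names, the shard and replica indices are assumed to be zero-based (as the `+ 1` in the formula suggests), and the article's actual init script is not reproduced here.

```shell
#!/bin/sh
# Hypothetical helper: the server_id formula quoted in the review,
# assuming zero-based SHARD and REPLICA indices.
server_id() {
  shard=$1; replica=$2; replicas_count=$3
  echo $(( shard * replicas_count + replica + 1 ))
}

# Hypothetical helper: number of members a loop over both counts emits.
member_count() {
  shards_count=$1; replicas_count=$2
  count=0
  s=0
  while [ "$s" -lt "$shards_count" ]; do
    r=0
    while [ "$r" -lt "$replicas_count" ]; do
      count=$(( count + 1 ))
      r=$(( r + 1 ))
    done
    s=$(( s + 1 ))
  done
  echo "$count"
}

# Layout assumed for illustration: 2 shards x 3 replicas.
server_id 0 0 3    # first replica of shard 0 -> 1
server_id 1 2 3    # last replica of shard 1  -> 6
member_count 2 3   # both counts match the layout -> 6
member_count 1 3   # SHARDS_COUNT left at 1 after adding a shard -> 3
```

With a stale SHARDS_COUNT of 1, only 3 of the 6 members are generated, which is the truncation the comment warns about.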

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6cef2b9e-dc4b-42d4-bc22-4ceea6233807

📥 Commits

Reviewing files that changed from the base of the PR and between 4d9cdd9 and f27dc33.

📒 Files selected for processing (1)
  • docs/en/solutions/How_to_Deploy_a_Replicated_ClickHouse_Cluster_with_Embedded_ClickHouse_Keeper.md

…chop number

Use the platform clickhouse-operator component version (v4.2.3 on ACP 4.2)
in the Environment and operator-managed-Keeper note, since the upstream
Altinity 0.20.x base version is not meaningful to ACP users.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…m.zookeeper_connection)

Lead the quorum verification with the dedicated clickhouse-keeper-client
CLI, clarify that mntr is ClickHouse Keeper's own 4lw command (not an
external ZooKeeper), and switch the connection check to
system.zookeeper_connection.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- Parameterize the image, StorageClass, and PVC size with
  ${CLICKHOUSE_IMAGE}/${STORAGE_CLASS}/${STORAGE_SIZE} instead of bare
  <...> placeholders, matching the existing ${NAMESPACE}/${CHI_NAME} style.
- List all required environment variables up front in Prerequisites.
- Render manifests with envsubst using an explicit variable allow-list so
  the init container's runtime shell variables are preserved.
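
The allow-list behavior can be illustrated with a small sketch. It assumes GNU gettext's `envsubst` is installed; the two-line template stands in for the real manifest, and the values `demo-ns` and `demo-chi` are placeholders chosen for illustration.

```shell
#!/bin/sh
# Placeholder values for illustration only.
export NAMESPACE=demo-ns CHI_NAME=demo-chi

# Stand-in for the manifest: one line with render-time variables and one
# line with a runtime variable the init container's shell must still see.
template='name: ${CHI_NAME}.${NAMESPACE}
script: echo ${MY_ID}'

# Only variables named in the allow-list argument are substituted;
# ${MY_ID} is left untouched for the init container to expand at runtime.
rendered=$(printf '%s\n' "$template" | envsubst '${NAMESPACE} ${CHI_NAME}')
printf '%s\n' "$rendered"
```

Without the allow-list argument, `envsubst` would replace every `${...}` reference it finds, including unset runtime variables, which would then render as empty strings.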

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Remove the note about the upstream ClickHouseKeeperInstallation controller
not being present; keep only a neutral description of how the operator
consumes spec.configuration.zookeeper.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Address review feedback: the init container's raft member list loops over
both SHARDS_COUNT and REPLICAS_COUNT, so both must match the layout. Add a
note that embedded Keeper suits small clusters and many-shard clusters
should use a standalone Keeper quorum.
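
A rough sketch of such a double loop follows, emitting one `<server>` entry per shard/replica pair. The function name `gen_raft_servers` and the hostname pattern are hypothetical placeholders rather than the article's actual script or naming; port 9444 is the raft port named in the PR description.

```shell
#!/bin/sh
# Sketch only: emit one <server> entry per shard/replica pair.
# The hostname pattern is a hypothetical placeholder.
gen_raft_servers() {
  shards_count=$1; replicas_count=$2; prefix=$3
  s=0
  while [ "$s" -lt "$shards_count" ]; do
    r=0
    while [ "$r" -lt "$replicas_count" ]; do
      # Zero-based indices are assumed here.
      id=$(( s * replicas_count + r + 1 ))
      printf '<server><id>%d</id><hostname>%s-%d-%d</hostname><port>9444</port></server>\n' \
        "$id" "$prefix" "$s" "$r"
      r=$(( r + 1 ))
    done
    s=$(( s + 1 ))
  done
}

# 1 shard x 3 replicas, matching the layout described in the PR.
gen_raft_servers 1 3 keeper-demo
```

Because the number of emitted entries is the product of the two counts, the quorum grows with every added shard, which is one reason a standalone Keeper quorum may suit many-shard clusters better.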

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Match the ecosystem layout (kafka/redis/zookeeper/... each have their own
directory) by placing the article under docs/en/solutions/ecosystem/clickhouse/.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

zlcnju deployed to translate on June 15, 2026 07:55 with GitHub Actions (Active)