From df21c126d6034ef1034f153f7934af570d5964cc Mon Sep 17 00:00:00 2001 From: Dianjin Wang Date: Wed, 20 May 2026 12:33:39 +0800 Subject: [PATCH 1/7] DevOps: add Ansible playbook for cluster deployment Introduces Ansible-based automation for deploying Apache Cloudberry on bare-metal or virtual machines, covering the full deployment workflow from OS configuration to cluster initialization. Key features: - Automates all pre-deployment OS tuning: SELinux, firewall, sysctl, PAM limits, THP, IPC, SSH thresholds, and chronyd - Dynamically calculates memory-dependent sysctl parameters per host: kernel.shmall/shmmax (_PHYS_PAGES derived from MemTotal and PAGE_SIZE), vm.overcommit_ratio (via gp_vmem formula), vm.dirty_* (bytes mode for >64GB, ratio mode for <=64GB), and vm.min_free_kbytes (3% of MemTotal) - Installs RPM/DEB packages and configures N-N passwordless SSH via gpssh-exkeys - Initializes the cluster with gpinitsystem, including the standby coordinator (gpinitstandby with sync verification) - Supports variable cluster sizes via inventory/hosts Files added: - ansible/site.yml main playbook - ansible/group_vars/all.yml deployment variables with inline documentation - ansible/inventory/hosts sample inventory for a 5-node cluster - ansible/ansible.cfg disables host key checking on the first run - ansible/README.md usage guide and prerequisites Assisted-by: Claude Code --- devops/deploy/ansible/README.md | 103 ++++ devops/deploy/ansible/ansible.cfg | 19 + devops/deploy/ansible/group_vars/all.yml | 76 +++ devops/deploy/ansible/inventory/hosts | 37 ++ devops/deploy/ansible/site.yml | 570 +++++++++++++++++++++++ 5 files changed, 805 insertions(+) create mode 100644 devops/deploy/ansible/README.md create mode 100644 devops/deploy/ansible/ansible.cfg create mode 100644 devops/deploy/ansible/group_vars/all.yml create mode 100644 devops/deploy/ansible/inventory/hosts create mode 100644 devops/deploy/ansible/site.yml diff --git a/devops/deploy/ansible/README.md b/devops/deploy/ansible/README.md new
file mode 100644 index 00000000000..44423c5d75f --- /dev/null +++ b/devops/deploy/ansible/README.md @@ -0,0 +1,103 @@ + + +# Apache Cloudberry Deployment Via Ansible + +This directory contains an Ansible playbook for deploying Apache Cloudberry on physical or virtual machines via Ansible. + +## Quick Start + +```bash +# 1. Edit inventory and variables +vi ansible/inventory/hosts # set hostnames and IPs +vi ansible/group_vars/all.yml # set password, disk, segments, etc. + +# 2. Run the playbook +ansible-playbook ansible/site.yml -i ansible/inventory/hosts \ + -e package_path=./apache-cloudberry-db-incubating-2.1.0-1.el9.x86_64.rpm +``` + +## Cluster Layout (default) + +| Host | Role | +|------|------| +| cdw | Coordinator | +| scdw | Standby Coordinator | +| sdw1 | Segment Host 1 | +| sdw2 | Segment Host 2 | +| sdw3 | Segment Host 3 | + +Each segment host runs 2 primary segments and 2 mirror segments (spread mirroring). + +## Prerequisites + +- Ansible installed on the control machine (tested with ansible-core 2.14+) +- Root SSH access from the control machine to all hosts +- All hosts have Rocky Linux 8/9 or compatible OS installed +- Apache Cloudberry RPM/DEB package downloaded to the control machine + +Ansible 2.10+ requires the following collections to be installed separately: + +```bash +ansible-galaxy collection install ansible.posix community.general community.crypto +``` + +To suppress the Python `crypt` module deprecation warning, install `passlib`: + +```bash +pip3 install passlib +``` + +## Directory Structure + +``` +ansible/ +├── ansible.cfg # disable host key checking +├── site.yml # main playbook +├── inventory/ +│ └── hosts # hostnames and IPs +└── group_vars/ + └── all.yml # deployment variables +``` + +## What the Playbook Does + +1. Disable SELinux and firewall +2. Configure hostnames and `/etc/hosts` +3. Set kernel parameters (`sysctl`) +4. Set resource limits (`limits.conf`) +5. Configure XFS mount and disk I/O settings +6. 
Disable Transparent Huge Pages +7. Disable IPC object removal +8. Configure SSH thresholds +9. Synchronize system clocks (chronyd) +10. Create `gpadmin` user with sudo +11. Install Apache Cloudberry package on all hosts +12. Configure passwordless SSH for gpadmin (N-N) +13. Create data storage directories +14. Initialize the cluster with `gpinitsystem` +15. Set environment variables in `.bashrc` + +## After Deployment + +```bash +su - gpadmin +psql -d warehouse # connect to the database +gpstate -s # check cluster status +``` diff --git a/devops/deploy/ansible/ansible.cfg b/devops/deploy/ansible/ansible.cfg new file mode 100644 index 00000000000..bbc14a5f7f7 --- /dev/null +++ b/devops/deploy/ansible/ansible.cfg @@ -0,0 +1,19 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +[defaults] +host_key_checking = False diff --git a/devops/deploy/ansible/group_vars/all.yml b/devops/deploy/ansible/group_vars/all.yml new file mode 100644 index 00000000000..188f2a9cc3a --- /dev/null +++ b/devops/deploy/ansible/group_vars/all.yml @@ -0,0 +1,76 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. 
See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +# Apache Cloudberry Deployment Variables +# +# Note: kernel.shmall, kernel.shmmax, vm.overcommit_ratio, vm.min_free_kbytes, +# and vm.dirty_* parameters are NOT set here. They are dynamically calculated +# at deploy time based on each host's actual RAM and swap size. +# See the "Calculate dynamic sysctl values" tasks in site.yml. + +# Cloudberry version +cloudberry_version: "2.1.0" + +# Admin user +cloudberry_admin_user: "gpadmin" +cloudberry_admin_password: "changeme" + +# Package path (override via -e package_path=... on the command line) +# Example: -e package_path=/root/apache-cloudberry-db-incubating-2.1.0-1.el9.x86_64.rpm +# package_path: /home/gpadmin/apache-cloudberry-db-incubating-2.1.0-1.el9.x86_64.rpm + +# Data disk device and mount point. +# Must be set manually before running the playbook. +# Run 'lsblk' on each host to identify the correct device name. 
+# +# Common device names by environment: +# /dev/sdb — physical servers, VMware +# /dev/vdb — KVM / OpenStack / Cloud ECS +# /dev/nvme0n1 — NVMe SSD (physical or cloud) +# /dev/xvdb — AWS EC2 (older instance types) +# +# If you are using a cloud VM with only a single system disk (no dedicated +# data disk), leave data_disk empty and create the data directory manually: +# mkdir -p /data && chown -R gpadmin:gpadmin /data +# The XFS formatting and mount steps in site.yml will be skipped automatically. +data_disk: "/dev/sdb" +data_mount: "/data" + +# Data directories +# coordinator_data_dir is used on both cdw and scdw. +# primary_data_dir and mirror_data_dir are used on segment hosts. +coordinator_data_dir: "/data/coordinator" +primary_data_dir: "/data/primary" +mirror_data_dir: "/data/mirror" + +# Number of primary segment instances per segment host. +# Mirror instances are created 1:1 with primaries. +# Recommended: set to the number of CPU cores / 4, or 2 for test environments. +segments_per_host: 2 + +# Ports +# Ensure these ranges do not overlap with net.ipv4.ip_local_port_range (10000-65535). +coordinator_port: 5432 +port_base: 6000 +mirror_port_base: 7000 + +# Default database created during gpinitsystem +database_name: "warehouse" + +# Coordinator and standby hostnames (must match inventory/hosts and /etc/hosts) +coordinator_hostname: "cdw" +standby_hostname: "scdw" diff --git a/devops/deploy/ansible/inventory/hosts b/devops/deploy/ansible/inventory/hosts new file mode 100644 index 00000000000..b1d4848208e --- /dev/null +++ b/devops/deploy/ansible/inventory/hosts @@ -0,0 +1,37 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +[coordinator] +cdw ansible_host=192.168.1.x + +[standby] +scdw ansible_host=192.168.1.x + +[segments] +sdw1 ansible_host=192.168.1.x +sdw2 ansible_host=192.168.1.x +sdw3 ansible_host=192.168.1.x + +[cloudberry:children] +coordinator +standby +segments + +[cloudberry:vars] +ansible_user=root +ansible_password=your_root_password_here +ansible_ssh_common_args='-o StrictHostKeyChecking=no' diff --git a/devops/deploy/ansible/site.yml b/devops/deploy/ansible/site.yml new file mode 100644 index 00000000000..cbf08100bf3 --- /dev/null +++ b/devops/deploy/ansible/site.yml @@ -0,0 +1,570 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+# +# Apache Cloudberry - Bare Metal Deployment Playbook +# +# Usage: +# ansible-playbook site.yml -i inventory/hosts \ +# -e package_path=./apache-cloudberry-db-incubating-2.1.0-1.el8.x86_64.rpm + +--- +- name: Configure all hosts + hosts: cloudberry + become: yes + tasks: + + # --- SELinux --- + - name: Disable SELinux + selinux: + state: disabled + when: ansible_os_family == "RedHat" + + # --- Firewall --- + - name: Stop and disable firewalld + systemd: + name: firewalld + state: stopped + enabled: no + ignore_errors: yes + + # --- Hosts file --- + - name: Set hostname + hostname: + name: "{{ inventory_hostname }}" + + - name: Add cluster hosts to /etc/hosts + lineinfile: + path: /etc/hosts + line: "{{ hostvars[item].ansible_host }} {{ item }}" + state: present + loop: "{{ groups['cloudberry'] }}" + + # --- Kernel parameters --- + # Dynamically calculate memory-dependent sysctl values based on each host's + # actual physical memory and swap, following the official documentation formulas: + # kernel.shmall = _PHYS_PAGES / 2 + # kernel.shmmax = (_PHYS_PAGES / 2) * PAGE_SIZE + # vm.overcommit_ratio = (RAM - 0.026 * gp_vmem) / RAM * 100 + # vm.dirty_* uses bytes mode for >64GB RAM, ratio mode for <=64GB RAM + # vm.min_free_kbytes = MemTotal * 3% + + - name: Get system PAGE_SIZE + command: getconf PAGE_SIZE + register: _page_size_result + changed_when: false + + - name: Collect RAM, swap, and page size facts + set_fact: + _ram_mb: "{{ ansible_memtotal_mb }}" + _swap_mb: "{{ ansible_swaptotal_mb }}" + _page_size: "{{ _page_size_result.stdout | int }}" + + - name: Calculate kernel.shmall and kernel.shmmax + set_fact: + _shmall: "{{ ((_ram_mb | int * 1024 * 1024 / (_page_size | int)) / 2) | int }}" + _shmmax: "{{ ((_ram_mb | int * 1024 * 1024 / (_page_size | int)) / 2 * (_page_size | int)) | int }}" + + - name: Convert RAM and swap sizes to GB + set_fact: + _ram_gb: "{{ (_ram_mb | int / 1024) | float }}" + _swap_gb: "{{ (_swap_mb | int / 1024) | float }}" + + - name: Calculate gp_vmem and
overcommit_ratio + set_fact: + _gp_vmem: >- + {{ ((_swap_gb | float + _ram_gb | float) - (7.5 + 0.05 * (_ram_gb | float))) + / ((_ram_gb | float >= 256) | ternary(1.17, 1.7)) }} + + - name: Calculate final overcommit_ratio + set_fact: + _overcommit_ratio: "{{ ((_ram_gb | float - 0.026 * (_gp_vmem | float)) / (_ram_gb | float) * 100) | int }}" + + - name: Calculate vm.min_free_kbytes (3% of MemTotal in kB) + set_fact: + _min_free_kbytes: "{{ (_ram_mb | int * 1024 * 0.03) | round | int }}" + + - name: Set sysctl parameters (fixed values) + sysctl: + name: "{{ item.key }}" + value: "{{ item.value }}" + sysctl_set: yes + reload: yes + loop: "{{ sysctl_fixed | dict2items }}" + vars: + sysctl_fixed: + kernel.shmmni: "4096" + vm.overcommit_memory: "2" + net.ipv4.ip_local_port_range: "10000 65535" + kernel.sem: "250 2048000 200 8192" + kernel.sysrq: "1" + kernel.core_uses_pid: "1" + kernel.msgmnb: "65536" + kernel.msgmax: "65536" + kernel.msgmni: "2048" + net.ipv4.tcp_syncookies: "1" + net.ipv4.conf.default.accept_source_route: "0" + net.ipv4.tcp_max_syn_backlog: "4096" + net.ipv4.conf.all.arp_filter: "1" + net.ipv4.ipfrag_high_thresh: "41943040" + net.ipv4.ipfrag_low_thresh: "31457280" + net.ipv4.ipfrag_time: "60" + net.core.netdev_max_backlog: "10000" + net.core.rmem_max: "2097152" + net.core.wmem_max: "2097152" + vm.swappiness: "10" + vm.zone_reclaim_mode: "0" + vm.dirty_expire_centisecs: "500" + vm.dirty_writeback_centisecs: "100" + kernel.core_pattern: "/var/core/core.%h.%t" + + - name: Set sysctl parameters (memory-dependent) + sysctl: + name: "{{ item.key }}" + value: "{{ item.value }}" + sysctl_set: yes + reload: yes + loop: + - { key: "kernel.shmall", value: "{{ _shmall }}" } + - { key: "kernel.shmmax", value: "{{ _shmmax }}" } + - { key: "vm.overcommit_ratio", value: "{{ _overcommit_ratio }}" } + - { key: "vm.min_free_kbytes", value: "{{ _min_free_kbytes }}" } + + - name: Set sysctl dirty parameters for hosts with more than 64GB RAM (bytes mode) + sysctl: + name: 
"{{ item.key }}" + value: "{{ item.value }}" + sysctl_set: yes + reload: yes + loop: + - { key: "vm.dirty_background_ratio", value: "0" } + - { key: "vm.dirty_ratio", value: "0" } + - { key: "vm.dirty_background_bytes", value: "1610612736" } + - { key: "vm.dirty_bytes", value: "4294967296" } + when: _ram_mb | int > 65536 + + - name: Set sysctl dirty parameters for hosts with 64GB RAM or less (ratio mode) + sysctl: + name: "{{ item.key }}" + value: "{{ item.value }}" + sysctl_set: yes + reload: yes + loop: + - { key: "vm.dirty_background_ratio", value: "3" } + - { key: "vm.dirty_ratio", value: "10" } + - { key: "vm.dirty_background_bytes", value: "0" } + - { key: "vm.dirty_bytes", value: "0" } + when: _ram_mb | int <= 65536 + + - name: Create core dump directory + file: + path: /var/core + state: directory + mode: "1777" + + # --- Resource limits --- + - name: Set PAM limits + pam_limits: + domain: "*" + limit_type: "{{ item.type }}" + limit_item: "{{ item.item }}" + value: "{{ item.value }}" + loop: + - { type: soft, item: nofile, value: "524288" } + - { type: hard, item: nofile, value: "524288" } + - { type: soft, item: nproc, value: "131072" } + - { type: hard, item: nproc, value: "131072" } + - { type: soft, item: core, value: unlimited } + + # --- XFS mount --- + - name: Create XFS filesystem on data disk + filesystem: + fstype: xfs + dev: "{{ data_disk }}" + force: no + ignore_errors: yes + + - name: Mount data disk + mount: + path: "{{ data_mount }}" + src: "{{ data_disk }}" + fstype: xfs + opts: rw,nodev,noatime,inode64 + state: mounted + + # --- Disk I/O --- + - name: Set blockdev read-ahead + command: "/sbin/blockdev --setra 16384 {{ data_disk }}" + changed_when: false + + - name: Set I/O scheduler via grubby + command: grubby --update-kernel=ALL --args="elevator=mq-deadline" + ignore_errors: yes + changed_when: false + + # --- THP --- + - name: Disable Transparent Huge Pages via grubby + command: grubby --update-kernel=ALL 
--args="transparent_hugepage=never" + ignore_errors: yes + changed_when: false + + # --- IPC --- + - name: Disable IPC object removal + lineinfile: + path: /etc/systemd/logind.conf + regexp: "^#?RemoveIPC=" + line: "RemoveIPC=no" + notify: restart systemd-logind + + # --- SSH threshold --- + - name: Set SSH MaxStartups + lineinfile: + path: /etc/ssh/sshd_config + regexp: "^#?MaxStartups" + line: "MaxStartups 10:30:200" + notify: restart sshd + + - name: Set SSH MaxSessions + lineinfile: + path: /etc/ssh/sshd_config + regexp: "^#?MaxSessions" + line: "MaxSessions 200" + notify: restart sshd + + # --- Clock sync --- + - name: Enable and start chronyd + systemd: + name: chronyd + state: started + enabled: yes + + # --- gpadmin user --- + - name: Create gpadmin group + group: + name: "{{ cloudberry_admin_user }}" + state: present + + - name: Create gpadmin user + user: + name: "{{ cloudberry_admin_user }}" + group: "{{ cloudberry_admin_user }}" + system: yes + create_home: yes + password: "{{ cloudberry_admin_password | password_hash('sha512') }}" + + - name: Add gpadmin to wheel group + user: + name: "{{ cloudberry_admin_user }}" + groups: wheel + append: yes + + - name: Ensure wheel group has NOPASSWD sudo + lineinfile: + path: /etc/sudoers + regexp: "^%wheel" + line: "%wheel ALL=(ALL) NOPASSWD: ALL" + validate: "visudo -cf %s" + + - name: Set data directory ownership + file: + path: "{{ data_mount }}" + owner: "{{ cloudberry_admin_user }}" + group: "{{ cloudberry_admin_user }}" + recurse: yes + + handlers: + - name: restart systemd-logind + service: + name: systemd-logind + state: restarted + + - name: restart sshd + service: + name: sshd + state: restarted + +# --- Install Apache Cloudberry --- +- name: Install Apache Cloudberry package + hosts: cloudberry + become: yes + tasks: + - name: Copy package to host + copy: + src: "{{ package_path }}" + dest: "/tmp/{{ package_path | basename }}" + + - name: Install package (RPM) + yum: + name: "/tmp/{{ package_path | 
basename }}" + state: present + disable_gpg_check: yes + when: ansible_os_family == "RedHat" + + - name: Install package (DEB) + apt: + deb: "/tmp/{{ package_path | basename }}" + state: present + when: ansible_os_family == "Debian" + + - name: Cleanup package file + file: + path: "/tmp/{{ package_path | basename }}" + state: absent + + - name: Set installation directory ownership + shell: chown -R {{ cloudberry_admin_user }}:{{ cloudberry_admin_user }} /usr/local/cloudberry* + changed_when: false + +# --- Configure SSH and initialize (coordinator only) --- +- name: Configure passwordless SSH and initialize cluster + hosts: coordinator + become: yes + become_user: "{{ cloudberry_admin_user }}" + vars: + cloudberry_admin_user: "gpadmin" + tasks: + - name: Create .ssh directory for gpadmin + file: + path: "/home/{{ cloudberry_admin_user }}/.ssh" + state: directory + owner: "{{ cloudberry_admin_user }}" + group: "{{ cloudberry_admin_user }}" + mode: "0700" + + - name: Generate SSH key for gpadmin + openssh_keypair: + path: "/home/{{ cloudberry_admin_user }}/.ssh/id_rsa" + type: rsa + size: 4096 + owner: "{{ cloudberry_admin_user }}" + group: "{{ cloudberry_admin_user }}" + + - name: Fetch gpadmin public key from coordinator + slurp: + src: "/home/{{ cloudberry_admin_user }}/.ssh/id_rsa.pub" + register: gpadmin_pubkey + + - name: Distribute SSH public key to all hosts + authorized_key: + user: "{{ cloudberry_admin_user }}" + key: "{{ gpadmin_pubkey.content | b64decode }}" + delegate_to: "{{ item }}" + loop: "{{ groups['cloudberry'] }}" + + - name: Create hostfile_exkeys + copy: + content: "{{ groups['cloudberry'] | join('\n') }}\n" + dest: "/home/{{ cloudberry_admin_user }}/hostfile_exkeys" + + - name: Create hostfile_gpinitsystem + copy: + content: "{{ groups['segments'] | join('\n') }}\n" + dest: "/home/{{ cloudberry_admin_user }}/hostfile_gpinitsystem" + + - name: Scan and trust host keys for all cluster nodes + shell: | + ssh-keyscan -H {{ groups['cloudberry'] | 
join(' ') }} >> /home/{{ cloudberry_admin_user }}/.ssh/known_hosts + chown {{ cloudberry_admin_user }}:{{ cloudberry_admin_user }} /home/{{ cloudberry_admin_user }}/.ssh/known_hosts + chmod 600 /home/{{ cloudberry_admin_user }}/.ssh/known_hosts + args: + executable: /bin/bash + + - name: Run gpssh-exkeys + shell: | + source /usr/local/cloudberry-db/cloudberry-env.sh + gpssh-exkeys -f /home/{{ cloudberry_admin_user }}/hostfile_exkeys + args: + executable: /bin/bash + + - name: Create data root directory as root + file: + path: "{{ coordinator_data_dir | dirname }}" + state: directory + owner: "{{ cloudberry_admin_user }}" + group: "{{ cloudberry_admin_user }}" + mode: "0755" + become: yes + become_user: root + + - name: Create coordinator data directory + file: + path: "{{ coordinator_data_dir }}" + state: directory + owner: "{{ cloudberry_admin_user }}" + group: "{{ cloudberry_admin_user }}" + +- name: Create data directories on standby + hosts: standby + become: yes + tasks: + - name: Create data root directory on standby + file: + path: "{{ coordinator_data_dir | dirname }}" + state: directory + owner: "{{ cloudberry_admin_user }}" + group: "{{ cloudberry_admin_user }}" + mode: "0755" + + - name: Create coordinator data directory on standby + file: + path: "{{ coordinator_data_dir }}" + state: directory + owner: "{{ cloudberry_admin_user }}" + group: "{{ cloudberry_admin_user }}" + + - name: Create .ssh directory for gpadmin on standby + file: + path: "/home/{{ cloudberry_admin_user }}/.ssh" + state: directory + owner: "{{ cloudberry_admin_user }}" + group: "{{ cloudberry_admin_user }}" + mode: "0700" + +- name: Create data directories on segments + hosts: segments + become: yes + tasks: + - name: Create data root directory on segments + file: + path: "{{ primary_data_dir | dirname }}" + state: directory + owner: "{{ cloudberry_admin_user }}" + group: "{{ cloudberry_admin_user }}" + mode: "0755" + + - name: Create primary data directory + file: + path: "{{ 
primary_data_dir }}" + state: directory + owner: "{{ cloudberry_admin_user }}" + group: "{{ cloudberry_admin_user }}" + + - name: Create mirror data directory + file: + path: "{{ mirror_data_dir }}" + state: directory + owner: "{{ cloudberry_admin_user }}" + group: "{{ cloudberry_admin_user }}" + +- name: Initialize Apache Cloudberry cluster + hosts: coordinator + become: yes + become_user: "{{ cloudberry_admin_user }}" + vars: + cloudberry_admin_user: "gpadmin" + tasks: + - name: Create gpconfigs directory + file: + path: "/home/{{ cloudberry_admin_user }}/gpconfigs" + state: directory + + - name: Create gpinitsystem config + copy: + content: | + SEG_PREFIX=gpseg + PORT_BASE={{ port_base }} + declare -a DATA_DIRECTORY=({% for i in range(segments_per_host) %}{{ primary_data_dir }} {% endfor %}) + COORDINATOR_HOSTNAME={{ coordinator_hostname }} + COORDINATOR_DIRECTORY={{ coordinator_data_dir }} + COORDINATOR_PORT={{ coordinator_port }} + TRUSTED_SHELL=ssh + CHECK_POINT_SEGMENTS=8 + ENCODING=UNICODE + MIRROR_PORT_BASE={{ mirror_port_base }} + declare -a MIRROR_DATA_DIRECTORY=({% for i in range(segments_per_host) %}{{ mirror_data_dir }} {% endfor %}) + DATABASE_NAME={{ database_name }} + dest: "/home/{{ cloudberry_admin_user }}/gpconfigs/gpinitsystem_config" + + - name: Run gpinitsystem + shell: | + source /usr/local/cloudberry-db/cloudberry-env.sh + yes | gpinitsystem -c /home/{{ cloudberry_admin_user }}/gpconfigs/gpinitsystem_config \ + -h /home/{{ cloudberry_admin_user }}/hostfile_gpinitsystem \ + -s {{ standby_hostname }} --mirror-mode=spread + args: + executable: /bin/bash + register: gpinitsystem_result + failed_when: "'successfully created' not in gpinitsystem_result.stdout" + + - name: Set environment variables in .bashrc + blockinfile: + path: "/home/{{ cloudberry_admin_user }}/.bashrc" + block: | + source /usr/local/cloudberry-db/cloudberry-env.sh + export COORDINATOR_DATA_DIRECTORY={{ coordinator_data_dir }}/gpseg-1 + export PGPORT={{ coordinator_port }} + 
export PGUSER={{ cloudberry_admin_user }} + export PGDATABASE={{ database_name }} + marker: "# {mark} CLOUDBERRY ENVIRONMENT" + + - name: Sync .bashrc to standby coordinator + shell: | + scp /home/{{ cloudberry_admin_user }}/.bashrc \ + {{ standby_hostname }}:/home/{{ cloudberry_admin_user }}/.bashrc + args: + executable: /bin/bash + + - name: Verify standby coordinator is synchronized + shell: | + source /usr/local/cloudberry-db/cloudberry-env.sh + source /home/{{ cloudberry_admin_user }}/.bashrc + gpstate -f + args: + executable: /bin/bash + register: gpstate_result + changed_when: false + failed_when: false + + - name: Initialize standby coordinator if not already active + shell: | + source /usr/local/cloudberry-db/cloudberry-env.sh + source /home/{{ cloudberry_admin_user }}/.bashrc + if ! gpstate -f 2>&1 | grep -q "Sync state: sync"; then + yes | gpinitstandby -s {{ standby_hostname }} + else + echo "Standby already synchronized, skipping gpinitstandby" + fi + args: + executable: /bin/bash + register: gpinitstandby_result + changed_when: "'skipping' not in gpinitstandby_result.stdout" + + - name: Display success message + debug: + msg: + - "==========================================" + - " Apache Cloudberry cluster initialized successfully!" 
+ - "==========================================" + - " Connect to the database:" + - " su - gpadmin" + - " psql -d {{ database_name }}" + - "==========================================" + + - name: Check postgres --gp-version + command: /usr/local/cloudberry-db/bin/postgres --gp-version + register: gp_version + changed_when: false + + - name: Check postgres --version + command: /usr/local/cloudberry-db/bin/postgres --version + register: pg_version + changed_when: false + + - name: Show version info + debug: + msg: + - "{{ gp_version.stdout }}" + - "{{ pg_version.stdout }}" From daa232d9938ef7e11a7e6ed358c89f75575b24af Mon Sep 17 00:00:00 2001 From: Dianjin Wang Date: Thu, 21 May 2026 18:42:29 +0800 Subject: [PATCH 2/7] Set only ratio-mode vm.dirty_* parameters on hosts with <=64GB RAM --- devops/deploy/ansible/site.yml | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/devops/deploy/ansible/site.yml b/devops/deploy/ansible/site.yml index cbf08100bf3..094e23bac80 100644 --- a/devops/deploy/ansible/site.yml +++ b/devops/deploy/ansible/site.yml @@ -165,9 +165,10 @@ loop: - { key: "vm.dirty_background_ratio", value: "3" } - { key: "vm.dirty_ratio", value: "10" } - - { key: "vm.dirty_background_bytes", value: "0" } - - { key: "vm.dirty_bytes", value: "0" } when: _ram_mb | int <= 65536 + # Per official docs: for systems with 64GB RAM or less, only set the ratio + # parameters. vm.dirty_background_bytes and vm.dirty_bytes should be omitted + # entirely, not set to 0.
- name: Create core dump directory file: From 0742f28092f72234ccd34c6117a760db44c9f10d Mon Sep 17 00:00:00 2001 From: Dianjin Wang Date: Fri, 22 May 2026 12:14:03 +0800 Subject: [PATCH 3/7] Add Ubuntu support --- devops/deploy/ansible/site.yml | 127 +++++++++++++++++++++++++++++---- 1 file changed, 114 insertions(+), 13 deletions(-) diff --git a/devops/deploy/ansible/site.yml b/devops/deploy/ansible/site.yml index 094e23bac80..1937733bbab 100644 --- a/devops/deploy/ansible/site.yml +++ b/devops/deploy/ansible/site.yml @@ -28,19 +28,54 @@ tasks: # --- SELinux --- - - name: Disable SELinux + # On RHEL/Rocky Linux, SELinux must be disabled. + # On Ubuntu, SELinux is not installed by default and this task is skipped. + # If SELinux is installed on Ubuntu, it will also be disabled. + - name: Disable SELinux (RHEL/Rocky Linux) selinux: state: disabled when: ansible_os_family == "RedHat" + - name: Disable SELinux if installed (Ubuntu) + shell: | + if dpkg -l selinux-basics &>/dev/null 2>&1; then + setenforce 0 2>/dev/null || true + sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config 2>/dev/null || true + echo "SELinux disabled" + else + echo "SELinux not installed, skipping" + fi + args: + executable: /bin/bash + when: ansible_os_family == "Debian" + register: selinux_ubuntu_result + changed_when: "'SELinux disabled' in selinux_ubuntu_result.stdout" + # --- Firewall --- - - name: Stop and disable firewalld + # RHEL/Rocky Linux: stop and disable firewalld + # Ubuntu: ufw is disabled by default; disable it if active + - name: Stop and disable firewalld (RHEL/Rocky Linux) systemd: name: firewalld state: stopped enabled: no + when: ansible_os_family == "RedHat" ignore_errors: yes + - name: Disable ufw if active (Ubuntu) + shell: | + if ufw status | grep -q "Status: active"; then + ufw disable + echo "ufw disabled" + else + echo "ufw already inactive, skipping" + fi + args: + executable: /bin/bash + when: ansible_os_family == "Debian" + register: ufw_result + 
changed_when: "'ufw disabled' in ufw_result.stdout" + # --- Hosts file --- - name: Set hostname hostname: @@ -211,16 +246,48 @@ command: "/sbin/blockdev --setra 16384 {{ data_disk }}" changed_when: false - - name: Set I/O scheduler via grubby + - name: Set I/O scheduler via grubby (RHEL/Rocky Linux) command: grubby --update-kernel=ALL --args="elevator=mq-deadline" ignore_errors: yes changed_when: false + when: ansible_os_family == "RedHat" + + - name: Set I/O scheduler via grub (Ubuntu) + shell: | + if grep -q "elevator=mq-deadline" /etc/default/grub; then + echo "elevator already set, skipping" + else + sed -i 's/GRUB_CMDLINE_LINUX="\(.*\)"/GRUB_CMDLINE_LINUX="\1 elevator=mq-deadline"/' /etc/default/grub + update-grub + echo "elevator set" + fi + args: + executable: /bin/bash + when: ansible_os_family == "Debian" + register: grub_elevator_result + changed_when: "'elevator set' in grub_elevator_result.stdout" # --- THP --- - - name: Disable Transparent Huge Pages via grubby + - name: Disable Transparent Huge Pages via grubby (RHEL/Rocky Linux) command: grubby --update-kernel=ALL --args="transparent_hugepage=never" ignore_errors: yes changed_when: false + when: ansible_os_family == "RedHat" + + - name: Disable Transparent Huge Pages via grub (Ubuntu) + shell: | + if grep -q "transparent_hugepage=never" /etc/default/grub; then + echo "THP already disabled, skipping" + else + sed -i 's/GRUB_CMDLINE_LINUX="\(.*\)"/GRUB_CMDLINE_LINUX="\1 transparent_hugepage=never"/' /etc/default/grub + update-grub + echo "THP disabled" + fi + args: + executable: /bin/bash + when: ansible_os_family == "Debian" + register: grub_thp_result + changed_when: "'THP disabled' in grub_thp_result.stdout" # --- IPC --- - name: Disable IPC object removal @@ -246,11 +313,20 @@ notify: restart sshd # --- Clock sync --- - - name: Enable and start chronyd + # RHEL/Rocky Linux uses chronyd; Ubuntu uses chrony (the systemd service name differs) + - name: Enable and start chronyd (RHEL/Rocky Linux) systemd: name:
chronyd state: started enabled: yes + when: ansible_os_family == "RedHat" + + - name: Enable and start chrony (Ubuntu) + systemd: + name: chrony + state: started + enabled: yes + when: ansible_os_family == "Debian" # --- gpadmin user --- - name: Create gpadmin group @@ -266,18 +342,35 @@ create_home: yes password: "{{ cloudberry_admin_password | password_hash('sha512') }}" - - name: Add gpadmin to wheel group + - name: Add gpadmin to wheel group (RHEL/Rocky Linux) user: name: "{{ cloudberry_admin_user }}" groups: wheel append: yes + when: ansible_os_family == "RedHat" - - name: Ensure wheel group has NOPASSWD sudo + - name: Ensure wheel group has NOPASSWD sudo (RHEL/Rocky Linux) lineinfile: path: /etc/sudoers regexp: "^%wheel" line: "%wheel ALL=(ALL) NOPASSWD: ALL" validate: "visudo -cf %s" + when: ansible_os_family == "RedHat" + + - name: Add gpadmin to sudo group (Ubuntu) + user: + name: "{{ cloudberry_admin_user }}" + groups: sudo + append: yes + when: ansible_os_family == "Debian" + + - name: Create sudoers drop-in file for gpadmin (Ubuntu) + copy: + content: "{{ cloudberry_admin_user }} ALL=(ALL) NOPASSWD: ALL\n" + dest: "/etc/sudoers.d/{{ cloudberry_admin_user }}" + mode: "0440" + validate: "visudo -cf %s" + when: ansible_os_family == "Debian" - name: Set data directory ownership file: @@ -376,12 +469,12 @@ dest: "/home/{{ cloudberry_admin_user }}/hostfile_gpinitsystem" - name: Scan and trust host keys for all cluster nodes - shell: | - ssh-keyscan -H {{ groups['cloudberry'] | join(' ') }} >> /home/{{ cloudberry_admin_user }}/.ssh/known_hosts - chown {{ cloudberry_admin_user }}:{{ cloudberry_admin_user }} /home/{{ cloudberry_admin_user }}/.ssh/known_hosts - chmod 600 /home/{{ cloudberry_admin_user }}/.ssh/known_hosts - args: - executable: /bin/bash + known_hosts: + name: "{{ item }}" + key: "{{ lookup('pipe', 'ssh-keyscan -H ' + item) }}" + state: present + path: "/home/{{ cloudberry_admin_user }}/.ssh/known_hosts" + loop: "{{ groups['cloudberry'] }}" - name: Run gpssh-exkeys
shell: | @@ -489,6 +582,13 @@ DATABASE_NAME={{ database_name }} dest: "/home/{{ cloudberry_admin_user }}/gpconfigs/gpinitsystem_config" + - name: Check if cluster is already initialized + stat: + path: "{{ coordinator_data_dir }}/gpseg-1/PG_VERSION" + register: pgversion_stat + become: yes + become_user: root + - name: Run gpinitsystem shell: | source /usr/local/cloudberry-db/cloudberry-env.sh @@ -499,6 +599,7 @@ executable: /bin/bash register: gpinitsystem_result failed_when: "'successfully created' not in gpinitsystem_result.stdout" + when: not pgversion_stat.stat.exists - name: Set environment variables in .bashrc blockinfile: From 9c6c9f84572c722d0b3debcfd10c370534bb9508 Mon Sep 17 00:00:00 2001 From: Dianjin Wang Date: Fri, 22 May 2026 14:33:12 +0800 Subject: [PATCH 4/7] Refactor the Ansible --- devops/deploy/ansible/README.md | 117 +++- devops/deploy/ansible/group_vars/all.yml | 5 + .../ansible/roles/common/handlers/main.yml | 27 + .../ansible/roles/common/tasks/main.yml | 369 ++++++++++ .../ansible/roles/initialize/tasks/main.yml | 128 ++++ .../roles/install_from_source/tasks/main.yml | 308 +++++++++ .../roles/install_package/tasks/main.yml | 44 ++ .../deploy/ansible/roles/ssh/tasks/main.yml | 172 +++++ devops/deploy/ansible/site-from-source.yml | 50 ++ devops/deploy/ansible/site.yml | 645 +----------------- 10 files changed, 1208 insertions(+), 657 deletions(-) create mode 100644 devops/deploy/ansible/roles/common/handlers/main.yml create mode 100644 devops/deploy/ansible/roles/common/tasks/main.yml create mode 100644 devops/deploy/ansible/roles/initialize/tasks/main.yml create mode 100644 devops/deploy/ansible/roles/install_from_source/tasks/main.yml create mode 100644 devops/deploy/ansible/roles/install_package/tasks/main.yml create mode 100644 devops/deploy/ansible/roles/ssh/tasks/main.yml create mode 100644 devops/deploy/ansible/site-from-source.yml diff --git a/devops/deploy/ansible/README.md b/devops/deploy/ansible/README.md index 
44423c5d75f..54afae5dbf9 100644 --- a/devops/deploy/ansible/README.md +++ b/devops/deploy/ansible/README.md @@ -17,22 +17,46 @@ under the License. --> -# Apache Cloudberry Deployment Via Ansible +# Apache Cloudberry Bare-Metal Deployment via Ansible -This directory contains an Ansible playbook for deploying Apache Cloudberry on physical or virtual machines via Ansible. +This directory contains Ansible playbooks for deploying Apache Cloudberry on physical or +virtual machines. Two installation methods are supported: + +- **RPM/DEB package** (`site.yml`): installs from a pre-built binary package +- **Source build** (`site-from-source.yml`): compiles from an official source tarball that you + download and verify beforehand ## Quick Start +### Install from RPM/DEB package + ```bash # 1. Edit inventory and variables -vi ansible/inventory/hosts # set hostnames and IPs -vi ansible/group_vars/all.yml # set password, disk, segments, etc. +vi inventory/hosts # set hostnames and IPs +vi group_vars/all.yml # set password, disk, segments, etc. # 2. Run the playbook -ansible-playbook ansible/site.yml -i ansible/inventory/hosts \ - -e package_path=./apache-cloudberry-db-incubating-2.1.0-1.el9.x86_64.rpm +ansible-playbook site.yml -i inventory/hosts \ + -e package_path=/path/to/apache-cloudberry-db-incubating-2.1.0-1.el9.x86_64.rpm ``` +### Install from source + +```bash +# 1. Edit inventory and variables (same as above) +vi inventory/hosts +vi group_vars/all.yml + +# 2. Run the playbook (no package_path needed) +ansible-playbook site-from-source.yml -i inventory/hosts \ + -e source_path=/path/to/apache-cloudberry-2.1.0-incubating-src.tar.gz +``` + +Download and verify the source tarball from +[Apache Cloudberry Releases](https://cloudberry.apache.org/releases) before running. +The playbook handles build dependency installation, Xerces-C compilation (RHEL/Rocky only), +and source compilation on each host.
+ ## Cluster Layout (default) | Host | Role | @@ -49,16 +73,15 @@ Each segment host runs 2 primary segments and 2 mirror segments (spread mirrorin - Ansible installed on the control machine (tested with ansible-core 2.14+) - Root SSH access from the control machine to all hosts -- All hosts have Rocky Linux 8/9 or compatible OS installed -- Apache Cloudberry RPM/DEB package downloaded to the control machine +- All hosts running Rocky Linux 8/9, Ubuntu 22.04, or compatible OS -Ansible 2.10+ requires the following collections to be installed separately: +Ansible 2.10+ requires the following collections: ```bash ansible-galaxy collection install ansible.posix community.general community.crypto ``` -To suppress the Python `crypt` module deprecation warning, install `passlib`: +To suppress the Python `crypt` module deprecation warning: ```bash pip3 install passlib @@ -68,31 +91,76 @@ pip3 install passlib ``` ansible/ -├── ansible.cfg # disable host key checking -├── site.yml # main playbook +├── site.yml # RPM/DEB installation entry point +├── site-from-source.yml # Source build entry point +├── ansible.cfg # Disables host key checking ├── inventory/ -│ └── hosts # hostnames and IPs -└── group_vars/ - └── all.yml # deployment variables +│ └── hosts # Hostnames and IPs +├── group_vars/ +│ └── all.yml # Deployment variables +└── roles/ + ├── common/ # OS configuration (shared by both methods) + │ ├── tasks/main.yml + │ └── handlers/main.yml + ├── install_package/ # RPM/DEB package installation + │ └── tasks/main.yml + ├── install_from_source/ # Source build and installation + │ └── tasks/main.yml + ├── ssh/ # Passwordless SSH + data directories + │ └── tasks/main.yml + └── initialize/ # Cluster initialization (gpinitsystem) + └── tasks/main.yml ``` -## What the Playbook Does +## What the Playbooks Do + +Both `site.yml` and `site-from-source.yml` share the same OS configuration, SSH setup, +and cluster initialization steps. Only the installation role differs. -1. 
Disable SELinux and firewall +### Shared steps (all hosts) + +1. Disable SELinux and the firewall (firewalld / ufw) 2. Configure hostnames and `/etc/hosts` -3. Set kernel parameters (`sysctl`) +3. Set kernel parameters (`sysctl`), calculated per host from RAM and swap 4. Set resource limits (`limits.conf`) 5. Configure XFS mount and disk I/O settings 6. Disable Transparent Huge Pages 7. Disable IPC object removal 8. Configure SSH thresholds -9. Synchronize system clocks (chronyd) +9. Synchronize system clocks (chronyd / chrony) 10. Create `gpadmin` user with sudo -11. Install Apache Cloudberry package on all hosts -12. Configure passwordless SSH for gpadmin (N-N) -13. Create data storage directories -14. Initialize the cluster with `gpinitsystem` -15. Set environment variables in `.bashrc` + +### RPM/DEB installation (`install_package` role) + +11. Copy package to each host and install via `dnf` / `apt` + +### Source build (`install_from_source` role) + +11. Install build dependencies (OS-specific) +12. Build and install Apache Xerces-C (RHEL/Rocky only; Ubuntu uses system package) +13. Copy source tarball to each host and extract +14. Compile and install using the project's build scripts + +### Shared steps (run from the coordinator) + +15. Configure N-N passwordless SSH for gpadmin via `gpssh-exkeys` +16. Create data storage directories on all nodes +17. Initialize the cluster with `gpinitsystem` (includes standby coordinator) +18.
Set environment variables in `.bashrc` + +## Key Variables (`group_vars/all.yml`) + +| Variable | Default | Description | +|----------|---------|-------------| +| `cloudberry_version` | `2.1.0` | Version used for source download URL | +| `cloudberry_admin_user` | `gpadmin` | Admin OS user | +| `cloudberry_admin_password` | `changeme` | Password for gpadmin | +| `data_disk` | `/dev/sdb` | Data disk device (run `lsblk` to verify) | +| `data_mount` | `/data` | Mount point for data disk | +| `segments_per_host` | `2` | Primary segment instances per host | +| `coordinator_port` | `5432` | Coordinator port | +| `database_name` | `warehouse` | Default database created at init | +| `xerces_version` | `3.3.0` | Xerces-C version (source build only) | ## After Deployment @@ -100,4 +168,5 @@ ansible/ su - gpadmin psql -d warehouse # connect to the database gpstate -s # check cluster status +gpstate -f # check standby coordinator status ``` diff --git a/devops/deploy/ansible/group_vars/all.yml b/devops/deploy/ansible/group_vars/all.yml index 188f2a9cc3a..0f4c51ba360 100644 --- a/devops/deploy/ansible/group_vars/all.yml +++ b/devops/deploy/ansible/group_vars/all.yml @@ -23,8 +23,13 @@ # See the "Calculate dynamic sysctl values" tasks in site.yml. # Cloudberry version +# Used for both RPM/DEB package naming and source build download URL. cloudberry_version: "2.1.0" +# Apache Xerces-C version (required for source build on RHEL/Rocky Linux) +# Ubuntu uses the system libxerces-c-dev package instead. +xerces_version: "3.3.0" + # Admin user cloudberry_admin_user: "gpadmin" cloudberry_admin_password: "changeme" diff --git a/devops/deploy/ansible/roles/common/handlers/main.yml b/devops/deploy/ansible/roles/common/handlers/main.yml new file mode 100644 index 00000000000..c9a72cb645a --- /dev/null +++ b/devops/deploy/ansible/roles/common/handlers/main.yml @@ -0,0 +1,27 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. 
See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +--- +- name: restart systemd-logind + service: + name: systemd-logind + state: restarted + +- name: restart sshd + service: + name: sshd + state: restarted diff --git a/devops/deploy/ansible/roles/common/tasks/main.yml b/devops/deploy/ansible/roles/common/tasks/main.yml new file mode 100644 index 00000000000..09f91e2247e --- /dev/null +++ b/devops/deploy/ansible/roles/common/tasks/main.yml @@ -0,0 +1,369 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +--- +# --- SELinux --- +# On RHEL/Rocky Linux, SELinux must be disabled. 
+# On Ubuntu, SELinux is not installed by default and this task is skipped. +# If SELinux is installed on Ubuntu, it will also be disabled. +- name: Disable SELinux (RHEL/Rocky Linux) + selinux: + state: disabled + when: ansible_os_family == "RedHat" + +- name: Disable SELinux if installed (Ubuntu) + shell: | + if dpkg -l selinux-basics &>/dev/null 2>&1; then + setenforce 0 2>/dev/null || true + sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config 2>/dev/null || true + echo "SELinux disabled" + else + echo "SELinux not installed, skipping" + fi + args: + executable: /bin/bash + when: ansible_os_family == "Debian" + register: selinux_ubuntu_result + changed_when: "'SELinux disabled' in selinux_ubuntu_result.stdout" + +# --- Firewall --- +# RHEL/Rocky Linux: stop and disable firewalld +# Ubuntu: ufw is disabled by default; disable it if active +- name: Stop and disable firewalld (RHEL/Rocky Linux) + systemd: + name: firewalld + state: stopped + enabled: no + when: ansible_os_family == "RedHat" + ignore_errors: yes + +- name: Disable ufw if active (Ubuntu) + shell: | + if ufw status | grep -q "Status: active"; then + ufw disable + echo "ufw disabled" + else + echo "ufw already inactive, skipping" + fi + args: + executable: /bin/bash + when: ansible_os_family == "Debian" + register: ufw_result + changed_when: "'ufw disabled' in ufw_result.stdout" + +# --- Hosts file --- +- name: Set hostname + hostname: + name: "{{ inventory_hostname }}" + +- name: Add cluster hosts to /etc/hosts + lineinfile: + path: /etc/hosts + line: "{{ hostvars[item].ansible_host }} {{ item }}" + state: present + loop: "{{ groups['cloudberry'] }}" + +# --- Kernel parameters --- +# Dynamically calculate memory-dependent sysctl values based on each host's +# actual physical memory and swap, following the official documentation formulas: +# kernel.shmall = _PHYS_PAGES / 2 +# kernel.shmmax = (_PHYS_PAGES / 2) * PAGE_SIZE +# vm.overcommit_ratio = (RAM - 0.026 * gp_vmem) / RAM * 100 +# 
vm.dirty_* uses bytes mode for >64GB RAM, ratio mode for <=64GB RAM +# vm.min_free_kbytes = MemTotal * 3% + +- name: Get system PAGE_SIZE + command: getconf PAGE_SIZE + register: _page_size_result + changed_when: false + +- name: Calculate dynamic sysctl values + set_fact: + _ram_mb: "{{ ansible_memtotal_mb }}" + _swap_mb: "{{ ansible_swaptotal_mb }}" + _page_size: "{{ _page_size_result.stdout | int }}" + +- name: Calculate kernel.shmall and kernel.shmmax + set_fact: + _shmall: "{{ ((_ram_mb | int * 1024 * 1024 / (_page_size | int)) / 2) | int }}" + _shmmax: "{{ ((_ram_mb | int * 1024 * 1024 / (_page_size | int)) / 2 * (_page_size | int)) | int }}" + +- name: Convert RAM and swap to GB + set_fact: + _ram_gb: "{{ (_ram_mb | int / 1024) | float }}" + _swap_gb: "{{ (_swap_mb | int / 1024) | float }}" + +- name: Calculate gp_vmem + set_fact: + _gp_vmem: >- + {{ ((_swap_gb | float + _ram_gb | float) - (7.5 + 0.05 * (_ram_gb | float))) + / ((_ram_gb | float >= 256) | ternary(1.17, 1.7)) }} + +- name: Calculate final overcommit_ratio + set_fact: + _overcommit_ratio: "{{ ((_ram_gb | float - 0.026 * (_gp_vmem | float)) / (_ram_gb | float) * 100) | int }}" + +- name: Calculate vm.min_free_kbytes (3% of MemTotal in kB) + set_fact: + _min_free_kbytes: "{{ (_ram_mb | int * 1024 * 0.03) | round | int }}" + +- name: Set sysctl parameters (fixed values) + sysctl: + name: "{{ item.key }}" + value: "{{ item.value }}" + sysctl_set: yes + reload: yes + loop: "{{ sysctl_fixed | dict2items }}" + vars: + sysctl_fixed: + kernel.shmmni: "4096" + vm.overcommit_memory: "2" + net.ipv4.ip_local_port_range: "10000 65535" + kernel.sem: "250 2048000 200 8192" + kernel.sysrq: "1" + kernel.core_uses_pid: "1" + kernel.msgmnb: "65536" + kernel.msgmax: "65536" + kernel.msgmni: "2048" + net.ipv4.tcp_syncookies: "1" + net.ipv4.conf.default.accept_source_route: "0" + net.ipv4.tcp_max_syn_backlog: "4096" + net.ipv4.conf.all.arp_filter: "1" + 
net.ipv4.ipfrag_high_thresh: "41943040" +
net.ipv4.ipfrag_low_thresh: "31457280" + net.ipv4.ipfrag_time: "60" + net.core.netdev_max_backlog: "10000" + net.core.rmem_max: "2097152" + net.core.wmem_max: "2097152" + vm.swappiness: "10" + vm.zone_reclaim_mode: "0" + vm.dirty_expire_centisecs: "500" + vm.dirty_writeback_centisecs: "100" + kernel.core_pattern: "/var/core/core.%h.%t" + +- name: Set sysctl parameters (memory-dependent) + sysctl: + name: "{{ item.key }}" + value: "{{ item.value }}" + sysctl_set: yes + reload: yes + loop: + - { key: "kernel.shmall", value: "{{ _shmall }}" } + - { key: "kernel.shmmax", value: "{{ _shmmax }}" } + - { key: "vm.overcommit_ratio", value: "{{ _overcommit_ratio }}" } + - { key: "vm.min_free_kbytes", value: "{{ _min_free_kbytes }}" } + +- name: Set sysctl dirty parameters for hosts with more than 64GB RAM (bytes mode) + sysctl: + name: "{{ item.key }}" + value: "{{ item.value }}" + sysctl_set: yes + reload: yes + loop: + - { key: "vm.dirty_background_ratio", value: "0" } + - { key: "vm.dirty_ratio", value: "0" } + - { key: "vm.dirty_background_bytes", value: "1610612736" } + - { key: "vm.dirty_bytes", value: "4294967296" } + when: _ram_mb | int > 65536 + +- name: Set sysctl dirty parameters for hosts with 64GB RAM or less (ratio mode) + sysctl: + name: "{{ item.key }}" + value: "{{ item.value }}" + sysctl_set: yes + reload: yes + loop: + - { key: "vm.dirty_background_ratio", value: "3" } + - { key: "vm.dirty_ratio", value: "10" } + when: _ram_mb | int <= 65536 + # Per official docs: for systems with 64GB RAM or less, only set the ratio + # parameters. vm.dirty_background_bytes and vm.dirty_bytes should be omitted + # entirely, not set to 0. 
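The memory-dependent tasks above implement the formulas listed in the role's comment block. The following minimal Python sketch of those formulas can be used to check offline what a given host should receive. The function name `cloudberry_sysctl` is hypothetical. The inputs are assumed to be `ansible_memtotal_mb` and `ansible_swaptotal_mb` (both in MiB) plus the output of `getconf PAGE_SIZE`. Because the inputs are in MiB, converting RAM to pages means multiplying by 1024 twice before dividing by the page size.

```python
"""Offline sketch of the per-host, memory-dependent sysctl values (assumed inputs)."""

BYTES_MODE_THRESHOLD_MB = 64 * 1024  # hosts above 64 GB use vm.dirty_*_bytes


def cloudberry_sysctl(ram_mb: int, swap_mb: int, page_size: int = 4096) -> dict:
    """Return the sysctl values a host with the given RAM/swap (in MiB) would get."""
    # kernel.shmall = _PHYS_PAGES / 2 ; kernel.shmmax = shmall * PAGE_SIZE
    phys_pages = ram_mb * 1024 * 1024 // page_size  # MiB -> bytes -> pages
    shmall = phys_pages // 2
    shmmax = shmall * page_size

    # gp_vmem = ((SWAP + RAM) - (7.5 + 0.05 * RAM)) / (1.17 if RAM >= 256 GB else 1.7)
    ram_gb = ram_mb / 1024
    swap_gb = swap_mb / 1024
    divisor = 1.17 if ram_gb >= 256 else 1.7
    gp_vmem = ((swap_gb + ram_gb) - (7.5 + 0.05 * ram_gb)) / divisor

    # vm.overcommit_ratio = (RAM - 0.026 * gp_vmem) / RAM * 100, truncated to int
    overcommit_ratio = int((ram_gb - 0.026 * gp_vmem) / ram_gb * 100)

    # vm.min_free_kbytes = 3% of MemTotal, expressed in kB
    min_free_kbytes = round(ram_mb * 1024 * 0.03)

    values = {
        "kernel.shmall": shmall,
        "kernel.shmmax": shmmax,
        "vm.overcommit_ratio": overcommit_ratio,
        "vm.min_free_kbytes": min_free_kbytes,
    }
    if ram_mb > BYTES_MODE_THRESHOLD_MB:
        values.update({
            "vm.dirty_background_ratio": 0,
            "vm.dirty_ratio": 0,
            "vm.dirty_background_bytes": 1610612736,  # 1.5 GiB
            "vm.dirty_bytes": 4294967296,  # 4 GiB
        })
    else:
        # Ratio mode: the *_bytes parameters are left unset rather than set to 0.
        values.update({"vm.dirty_background_ratio": 3, "vm.dirty_ratio": 10})
    return values


if __name__ == "__main__":
    for ram in (65536, 262144):  # a 64 GB host and a 256 GB host, no swap
        print(ram, cloudberry_sysctl(ram, swap_mb=0))
```

A host with exactly 64 GB (65536 MiB) falls into ratio mode, because the bytes-mode task only fires when `_ram_mb | int > 65536`.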
+ +- name: Create core dump directory + file: + path: /var/core + state: directory + mode: "1777" + +# --- Resource limits --- +- name: Set PAM limits + pam_limits: + domain: "*" + limit_type: "{{ item.type }}" + limit_item: "{{ item.item }}" + value: "{{ item.value }}" + loop: + - { type: soft, item: nofile, value: "524288" } + - { type: hard, item: nofile, value: "524288" } + - { type: soft, item: nproc, value: "131072" } + - { type: hard, item: nproc, value: "131072" } + - { type: soft, item: core, value: unlimited } + +# --- XFS mount --- +- name: Create XFS filesystem on data disk + filesystem: + fstype: xfs + dev: "{{ data_disk }}" + force: no + ignore_errors: yes + +- name: Mount data disk + mount: + path: "{{ data_mount }}" + src: "{{ data_disk }}" + fstype: xfs + opts: rw,nodev,noatime,inode64 + state: mounted + +# --- Disk I/O --- +- name: Set blockdev read-ahead + command: "/sbin/blockdev --setra 16384 {{ data_disk }}" + changed_when: false + +- name: Set I/O scheduler via grubby (RHEL/Rocky Linux) + command: grubby --update-kernel=ALL --args="elevator=mq-deadline" + ignore_errors: yes + changed_when: false + when: ansible_os_family == "RedHat" + +- name: Set I/O scheduler via grub (Ubuntu) + shell: | + if grep -q "elevator=mq-deadline" /etc/default/grub; then + echo "elevator already set, skipping" + else + sed -i 's/GRUB_CMDLINE_LINUX="\(.*\)"/GRUB_CMDLINE_LINUX="\1 elevator=mq-deadline"/' /etc/default/grub + update-grub + echo "elevator set" + fi + args: + executable: /bin/bash + when: ansible_os_family == "Debian" + register: grub_elevator_result + changed_when: "'elevator set' in grub_elevator_result.stdout" + +# --- THP --- +- name: Disable Transparent Huge Pages via grubby (RHEL/Rocky Linux) + command: grubby --update-kernel=ALL --args="transparent_hugepage=never" + ignore_errors: yes + changed_when: false + when: ansible_os_family == "RedHat" + +- name: Disable Transparent Huge Pages via grub (Ubuntu) + shell: | + if grep -q 
"transparent_hugepage=never" /etc/default/grub; then + echo "THP already disabled, skipping" + else + sed -i 's/GRUB_CMDLINE_LINUX="\(.*\)"/GRUB_CMDLINE_LINUX="\1 transparent_hugepage=never"/' /etc/default/grub + update-grub + echo "THP disabled" + fi + args: + executable: /bin/bash + when: ansible_os_family == "Debian" + register: grub_thp_result + changed_when: "'THP disabled' in grub_thp_result.stdout" + +# --- IPC --- +- name: Disable IPC object removal + lineinfile: + path: /etc/systemd/logind.conf + regexp: "^#?RemoveIPC=" + line: "RemoveIPC=no" + notify: restart systemd-logind + +# --- SSH threshold --- +- name: Set SSH MaxStartups + lineinfile: + path: /etc/ssh/sshd_config + regexp: "^#?MaxStartups" + line: "MaxStartups 10:30:200" + notify: restart sshd + +- name: Set SSH MaxSessions + lineinfile: + path: /etc/ssh/sshd_config + regexp: "^#?MaxSessions" + line: "MaxSessions 200" + notify: restart sshd + +# --- Clock sync --- +# RHEL/Rocky Linux uses chronyd; Ubuntu uses chrony (package name differs) +- name: Enable and start chronyd (RHEL/Rocky Linux) + systemd: + name: chronyd + state: started + enabled: yes + when: ansible_os_family == "RedHat" + +- name: Enable and start chrony (Ubuntu) + systemd: + name: chrony + state: started + enabled: yes + when: ansible_os_family == "Debian" + +# --- gpadmin user --- +- name: Create gpadmin group + group: + name: "{{ cloudberry_admin_user }}" + state: present + +- name: Create gpadmin user + user: + name: "{{ cloudberry_admin_user }}" + group: "{{ cloudberry_admin_user }}" + system: yes + create_home: yes + password: "{{ cloudberry_admin_password | password_hash('sha512') }}" + +- name: Add gpadmin to wheel group and enable NOPASSWD sudo (RHEL/Rocky Linux) + user: + name: "{{ cloudberry_admin_user }}" + groups: wheel + append: yes + when: ansible_os_family == "RedHat" + +- name: Ensure wheel group has NOPASSWD sudo (RHEL/Rocky Linux) + lineinfile: + path: /etc/sudoers + regexp: "^%wheel" + line: "%wheel ALL=(ALL) 
NOPASSWD: ALL" + validate: "visudo -cf %s" + when: ansible_os_family == "RedHat" + +- name: Add gpadmin to sudo group (Ubuntu) + user: + name: "{{ cloudberry_admin_user }}" + groups: sudo + append: yes + when: ansible_os_family == "Debian" + +- name: Create sudoers drop-in file for gpadmin (Ubuntu) + copy: + content: "{{ cloudberry_admin_user }} ALL=(ALL) NOPASSWD: ALL\n" + dest: "/etc/sudoers.d/{{ cloudberry_admin_user }}" + mode: "0440" + validate: "visudo -cf %s" + when: ansible_os_family == "Debian" + +- name: Set data directory ownership + file: + path: "{{ data_mount }}" + owner: "{{ cloudberry_admin_user }}" + group: "{{ cloudberry_admin_user }}" + recurse: yes diff --git a/devops/deploy/ansible/roles/initialize/tasks/main.yml b/devops/deploy/ansible/roles/initialize/tasks/main.yml new file mode 100644 index 00000000000..05698a12f24 --- /dev/null +++ b/devops/deploy/ansible/roles/initialize/tasks/main.yml @@ -0,0 +1,128 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements.
+ +--- +- name: Create gpconfigs directory + file: + path: "/home/{{ cloudberry_admin_user }}/gpconfigs" + state: directory + +- name: Create gpinitsystem config + copy: + content: | + SEG_PREFIX=gpseg + PORT_BASE={{ port_base }} + declare -a DATA_DIRECTORY=({% for i in range(segments_per_host) %}{{ primary_data_dir }} {% endfor %}) + COORDINATOR_HOSTNAME={{ coordinator_hostname }} + COORDINATOR_DIRECTORY={{ coordinator_data_dir }} + COORDINATOR_PORT={{ coordinator_port }} + TRUSTED_SHELL=ssh + CHECK_POINT_SEGMENTS=8 + ENCODING=UNICODE + MIRROR_PORT_BASE={{ mirror_port_base }} + declare -a MIRROR_DATA_DIRECTORY=({% for i in range(segments_per_host) %}{{ mirror_data_dir }} {% endfor %}) + DATABASE_NAME={{ database_name }} + dest: "/home/{{ cloudberry_admin_user }}/gpconfigs/gpinitsystem_config" + +- name: Check if cluster is already initialized + stat: + path: "{{ coordinator_data_dir }}/gpseg-1/PG_VERSION" + register: pgversion_stat + become: yes + become_user: root + +- name: Run gpinitsystem + shell: | + source /usr/local/cloudberry-db/cloudberry-env.sh + yes | gpinitsystem -c /home/{{ cloudberry_admin_user }}/gpconfigs/gpinitsystem_config \ + -h /home/{{ cloudberry_admin_user }}/hostfile_gpinitsystem \ + -s {{ standby_hostname }} --mirror-mode=spread + args: + executable: /bin/bash + register: gpinitsystem_result + failed_when: "'successfully created' not in gpinitsystem_result.stdout" + when: not pgversion_stat.stat.exists + +- name: Set environment variables in .bashrc + blockinfile: + path: "/home/{{ cloudberry_admin_user }}/.bashrc" + block: | + source /usr/local/cloudberry-db/cloudberry-env.sh + export COORDINATOR_DATA_DIRECTORY={{ coordinator_data_dir }}/gpseg-1 + export PGPORT={{ coordinator_port }} + export PGUSER={{ cloudberry_admin_user }} + export PGDATABASE={{ database_name }} + marker: "# {mark} CLOUDBERRY ENVIRONMENT" + +- name: Sync .bashrc to standby coordinator + shell: | + scp /home/{{ cloudberry_admin_user }}/.bashrc \ + {{ standby_hostname 
}}:/home/{{ cloudberry_admin_user }}/.bashrc + args: + executable: /bin/bash + +- name: Verify standby coordinator is synchronized + shell: | + source /usr/local/cloudberry-db/cloudberry-env.sh + source /home/{{ cloudberry_admin_user }}/.bashrc + gpstate -f + args: + executable: /bin/bash + register: gpstate_result + changed_when: false + failed_when: false + +- name: Initialize standby coordinator if not already active + shell: | + source /usr/local/cloudberry-db/cloudberry-env.sh + source /home/{{ cloudberry_admin_user }}/.bashrc + if ! gpstate -f 2>&1 | grep -q "Sync state: sync"; then + yes | gpinitstandby -s {{ standby_hostname }} + else + echo "Standby already synchronized, skipping gpinitstandby" + fi + args: + executable: /bin/bash + register: gpinitstandby_result + changed_when: "'skipping' not in gpinitstandby_result.stdout" + +- name: Display success message + debug: + msg: + - "==========================================" + - " Apache Cloudberry cluster initialized successfully!" + - "==========================================" + - " Connect to the database:" + - " su - gpadmin" + - " psql -d {{ database_name }}" + - "==========================================" + +- name: Check postgres --gp-version + command: /usr/local/cloudberry-db/bin/postgres --gp-version + register: gp_version + changed_when: false + +- name: Check postgres --version + command: /usr/local/cloudberry-db/bin/postgres --version + register: pg_version + changed_when: false + +- name: Show version info + debug: + msg: + - "{{ gp_version.stdout }}" + - "{{ pg_version.stdout }}" diff --git a/devops/deploy/ansible/roles/install_from_source/tasks/main.yml b/devops/deploy/ansible/roles/install_from_source/tasks/main.yml new file mode 100644 index 00000000000..04b74396553 --- /dev/null +++ b/devops/deploy/ansible/roles/install_from_source/tasks/main.yml @@ -0,0 +1,308 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. 
See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +# Usage: +# Download and verify the source tarball from https://cloudberry.apache.org/releases +# before running this playbook, then pass the local path via: +# -e source_path=/path/to/apache-cloudberry-2.1.0-incubating-src.tar.gz + +--- +# --- Install build dependencies --- + +- name: Install build dependencies (RHEL/Rocky Linux 8) + dnf: + name: + - apr-devel + - bison + - bzip2-devel + - cmake3 + - curl + - diffutils + - flex + - gcc + - gcc-c++ + - glibc-langpack-en + - glibc-locale-source + - iproute + - krb5-devel + - libcurl-devel + - libevent-devel + - libxml2-devel + - libuuid-devel + - libzstd-devel + - lz4-devel + - net-tools + - openldap-devel + - openssl-devel + - openssh-server + - pam-devel + - perl + - perl-ExtUtils-Embed + - perl-Test-Simple + - perl-Env + - python3-devel + - python3-pip + - readline-devel + - rsync + - wget + - which + - zlib-devel + state: present + when: + - ansible_os_family == "RedHat" + - ansible_distribution_major_version == "8" + +- name: Install extra build dependencies from devel repo (Rocky Linux 8) + dnf: + name: + - liburing-devel + - libuv-devel + - libyaml-devel + - perl-IPC-Run + - protobuf-devel + state: present + enablerepo: devel + when: + - ansible_os_family == "RedHat" + - ansible_distribution_major_version == "8" + +- name: 
Install build dependencies (RHEL/Rocky Linux 9+) + dnf: + name: + - apr-devel + - bison + - bzip2-devel + - cmake3 + - curl + - diffutils + - flex + - gcc + - gcc-c++ + - glibc-langpack-en + - glibc-locale-source + - iproute + - krb5-devel + - libcurl-devel + - libevent-devel + - libxml2-devel + - libuuid-devel + - libzstd-devel + - lz4-devel + - net-tools + - openldap-devel + - openssl-devel + - openssh-server + - pam-devel + - perl + - perl-ExtUtils-Embed + - perl-Test-Simple + - perl-Env + - python3-devel + - python3-pip + - readline-devel + - rsync + - wget + - which + - zlib-devel + state: present + when: + - ansible_os_family == "RedHat" + - ansible_distribution_major_version | int >= 9 + +- name: Install extra build dependencies from crb repo (Rocky Linux 9+) + dnf: + name: + - liburing-devel + - libuv-devel + - libyaml-devel + - perl-IPC-Run + - protobuf-devel + state: present + enablerepo: crb + when: + - ansible_os_family == "RedHat" + - ansible_distribution_major_version | int >= 9 + +- name: Install build dependencies (Ubuntu 22.04) + apt: + name: + - bison + - bzip2 + - cmake + - curl + - flex + - gcc + - g++ + - iproute2 + - iputils-ping + - language-pack-en + - locales + - libapr1-dev + - libbz2-dev + - libcurl4-gnutls-dev + - libevent-dev + - libkrb5-dev + - libipc-run-perl + - libldap2-dev + - libpam0g-dev + - libprotobuf-dev + - libreadline-dev + - libssl-dev + - liburing-dev + - libuv1-dev + - liblz4-dev + - libxerces-c-dev + - libxml2-dev + - libyaml-dev + - libzstd-dev + - libperl-dev + - make + - pkg-config + - protobuf-compiler + - python3-dev + - python3-pip + - python3-setuptools + - rsync + - wget + state: present + update_cache: yes + when: ansible_os_family == "Debian" + +# --- Install Apache Xerces-C (RHEL/Rocky Linux only) --- +# Ubuntu uses the system libxerces-c-dev package installed above. 
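The Xerces-C tasks that follow fetch the tarball and its `.sha256` file from dlcdn.apache.org and rely on `sha256sum -c` for verification. A rough Python equivalent of that check is sketched below for readers who want to verify a download by hand. `verify_checksum_file` is a hypothetical helper. It assumes the coreutils two-column format (`<hex digest>  <filename>`) and does not handle the BSD-style `SHA256 (file) = digest` form. The format is worth confirming against the published file, because `sha256sum -c` would not accept a file that holds only a bare digest.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large tarballs are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum_file(checksum_file: Path) -> bool:
    """Check each '<hex digest>  <filename>' line (assumed coreutils format).

    Filenames are resolved relative to the checksum file's directory, the same
    way `cd /tmp && sha256sum -c ...` resolves them in the playbook task.
    """
    checked = 0
    for raw in checksum_file.read_text().splitlines():
        line = raw.strip()
        if not line:
            continue
        expected, _, name = line.partition(" ")
        name = name.strip().lstrip("*")  # '*' marks binary mode in coreutils output
        if not name:
            raise ValueError(f"no filename on checksum line: {raw!r}")
        if sha256_of(checksum_file.parent / name) != expected.lower():
            return False
        checked += 1
    return checked > 0  # like sha256sum -c, treat "no checksum lines" as a failure


if __name__ == "__main__":
    # Self-contained demo with a throwaway file instead of a real download.
    with tempfile.TemporaryDirectory() as tmp:
        tarball = Path(tmp) / "demo.tar.gz"
        tarball.write_bytes(b"not a real tarball")
        sums = Path(tmp) / "demo.tar.gz.sha256"
        sums.write_text(f"{sha256_of(tarball)}  {tarball.name}\n")
        print(verify_checksum_file(sums))
```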
+ +- name: Check if Xerces-C is already installed (RHEL/Rocky Linux) + stat: + path: "/usr/local/xerces-c/lib/libxerces-c.so" + register: xerces_installed + when: ansible_os_family == "RedHat" + +- name: Download Xerces-C source (RHEL/Rocky Linux) + get_url: + url: "https://dlcdn.apache.org//xerces/c/3/sources/xerces-c-{{ xerces_version }}.tar.gz" + dest: "/tmp/xerces-c-{{ xerces_version }}.tar.gz" + mode: "0644" + when: + - ansible_os_family == "RedHat" + - not xerces_installed.stat.exists + +- name: Download Xerces-C SHA256 checksum (RHEL/Rocky Linux) + get_url: + url: "https://dlcdn.apache.org//xerces/c/3/sources/xerces-c-{{ xerces_version }}.tar.gz.sha256" + dest: "/tmp/xerces-c-{{ xerces_version }}.tar.gz.sha256" + mode: "0644" + when: + - ansible_os_family == "RedHat" + - not xerces_installed.stat.exists + +- name: Verify Xerces-C checksum (RHEL/Rocky Linux) + shell: | + cd /tmp + sha256sum -c xerces-c-{{ xerces_version }}.tar.gz.sha256 + args: + executable: /bin/bash + when: + - ansible_os_family == "RedHat" + - not xerces_installed.stat.exists + changed_when: false + +- name: Build and install Xerces-C (RHEL/Rocky Linux) + shell: | + cd /tmp + tar xf xerces-c-{{ xerces_version }}.tar.gz + cd xerces-c-{{ xerces_version }} + ./configure --prefix=/usr/local/xerces-c-{{ xerces_version }} + make -j$(nproc) + sudo make install + sudo ln -sf /usr/local/xerces-c-{{ xerces_version }} /usr/local/xerces-c + cd /tmp && rm -rf xerces-c-{{ xerces_version }} xerces-c-{{ xerces_version }}.tar.gz + args: + executable: /bin/bash + when: + - ansible_os_family == "RedHat" + - not xerces_installed.stat.exists + +# --- Copy and extract source tarball --- + +- name: Copy source tarball to host + copy: + src: "{{ source_path }}" + dest: "/tmp/{{ source_path | basename }}" + owner: "{{ cloudberry_admin_user }}" + group: "{{ cloudberry_admin_user }}" + mode: "0644" + +- name: Check if source directory already exists + stat: + path: "/home/{{ cloudberry_admin_user }}/cloudberry" + 
register: src_dir + +- name: Extract source tarball + unarchive: + src: "/tmp/{{ source_path | basename }}" + dest: "/home/{{ cloudberry_admin_user }}" + remote_src: yes + owner: "{{ cloudberry_admin_user }}" + group: "{{ cloudberry_admin_user }}" + when: not src_dir.stat.exists + +- name: Rename extracted directory to cloudberry + shell: | + extracted=$(tar -tzf /tmp/{{ source_path | basename }} | head -1 | cut -d/ -f1) + mv /home/{{ cloudberry_admin_user }}/${extracted} \ + /home/{{ cloudberry_admin_user }}/cloudberry + args: + executable: /bin/bash + when: not src_dir.stat.exists + +- name: Cleanup source tarball from /tmp + file: + path: "/tmp/{{ source_path | basename }}" + state: absent + +# --- Build and install --- + +- name: Create build-logs directory + file: + path: "/home/{{ cloudberry_admin_user }}/cloudberry/build-logs" + state: directory + owner: "{{ cloudberry_admin_user }}" + group: "{{ cloudberry_admin_user }}" + +- name: Configure Cloudberry + shell: | + export SRC_DIR=/home/{{ cloudberry_admin_user }}/cloudberry + export BUILD_DESTINATION=/usr/local/cloudberry-db + cd ${SRC_DIR} + ./devops/build/automation/cloudberry/scripts/configure-cloudberry.sh + args: + executable: /bin/bash + become_user: "{{ cloudberry_admin_user }}" + +- name: Build and install Cloudberry + shell: | + export SRC_DIR=/home/{{ cloudberry_admin_user }}/cloudberry + export BUILD_DESTINATION=/usr/local/cloudberry-db + cd ${SRC_DIR} + ./devops/build/automation/cloudberry/scripts/build-cloudberry.sh + args: + executable: /bin/bash + become_user: "{{ cloudberry_admin_user }}" + +- name: Set installation directory ownership + shell: chown -R {{ cloudberry_admin_user }}:{{ cloudberry_admin_user }} /usr/local/cloudberry* + changed_when: false diff --git a/devops/deploy/ansible/roles/install_package/tasks/main.yml b/devops/deploy/ansible/roles/install_package/tasks/main.yml new file mode 100644 index 00000000000..a4d1861d103 --- /dev/null +++ 
b/devops/deploy/ansible/roles/install_package/tasks/main.yml @@ -0,0 +1,44 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +--- +- name: Copy package to host + copy: + src: "{{ package_path }}" + dest: "/tmp/{{ package_path | basename }}" + +- name: Install package (RPM) + yum: + name: "/tmp/{{ package_path | basename }}" + state: present + disable_gpg_check: yes + when: ansible_os_family == "RedHat" + +- name: Install package (DEB) + apt: + deb: "/tmp/{{ package_path | basename }}" + state: present + when: ansible_os_family == "Debian" + +- name: Cleanup package file + file: + path: "/tmp/{{ package_path | basename }}" + state: absent + +- name: Set installation directory ownership + shell: chown -R {{ cloudberry_admin_user }}:{{ cloudberry_admin_user }} /usr/local/cloudberry* + changed_when: false diff --git a/devops/deploy/ansible/roles/ssh/tasks/main.yml b/devops/deploy/ansible/roles/ssh/tasks/main.yml new file mode 100644 index 00000000000..7d97cc67e8e --- /dev/null +++ b/devops/deploy/ansible/roles/ssh/tasks/main.yml @@ -0,0 +1,172 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. 
See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +--- +# --- SSH passwordless setup (runs on coordinator) --- + +- name: Create .ssh directory for gpadmin + file: + path: "/home/{{ cloudberry_admin_user }}/.ssh" + state: directory + owner: "{{ cloudberry_admin_user }}" + group: "{{ cloudberry_admin_user }}" + mode: "0700" + +- name: Generate SSH key for gpadmin + openssh_keypair: + path: "/home/{{ cloudberry_admin_user }}/.ssh/id_rsa" + type: rsa + size: 4096 + owner: "{{ cloudberry_admin_user }}" + group: "{{ cloudberry_admin_user }}" + +- name: Fetch gpadmin public key from coordinator + slurp: + src: "/home/{{ cloudberry_admin_user }}/.ssh/id_rsa.pub" + register: gpadmin_pubkey + +- name: Distribute SSH public key to all hosts + authorized_key: + user: "{{ cloudberry_admin_user }}" + key: "{{ gpadmin_pubkey.content | b64decode }}" + delegate_to: "{{ item }}" + loop: "{{ groups['cloudberry'] }}" + +- name: Create hostfile_exkeys + copy: + content: "{{ groups['cloudberry'] | join('\n') }}\n" + dest: "/home/{{ cloudberry_admin_user }}/hostfile_exkeys" + +- name: Create hostfile_gpinitsystem + copy: + content: "{{ groups['segments'] | join('\n') }}\n" + dest: "/home/{{ cloudberry_admin_user }}/hostfile_gpinitsystem" + +- name: Scan host keys from all cluster nodes + shell: ssh-keyscan {{ groups['cloudberry'] | join(' ') }} + register: 
scanned_host_keys + changed_when: false + args: + executable: /bin/bash + +- name: Trust scanned host keys for gpadmin + known_hosts: + name: "{{ item.split()[0] }}" + key: "{{ item }}" + state: present + path: "/home/{{ cloudberry_admin_user }}/.ssh/known_hosts" + loop: "{{ scanned_host_keys.stdout_lines }}" + when: item | length > 0 + +- name: Run gpssh-exkeys + shell: | + source /usr/local/cloudberry-db/cloudberry-env.sh + gpssh-exkeys -f /home/{{ cloudberry_admin_user }}/hostfile_exkeys + args: + executable: /bin/bash + +# --- Data directories on coordinator --- + +- name: Create data root directory as root + file: + path: "{{ coordinator_data_dir | dirname }}" + state: directory + owner: "{{ cloudberry_admin_user }}" + group: "{{ cloudberry_admin_user }}" + mode: "0755" + become: yes + become_user: root + +- name: Create coordinator data directory + file: + path: "{{ coordinator_data_dir }}" + state: directory + owner: "{{ cloudberry_admin_user }}" + group: "{{ cloudberry_admin_user }}" + +# --- Data directories on standby --- + +- name: Create data root directory on standby + file: + path: "{{ coordinator_data_dir | dirname }}" + state: directory + owner: "{{ cloudberry_admin_user }}" + group: "{{ cloudberry_admin_user }}" + mode: "0755" + delegate_to: "{{ item }}" + loop: "{{ groups['standby'] }}" + become: yes + become_user: root + +- name: Create coordinator data directory on standby + file: + path: "{{ coordinator_data_dir }}" + state: directory + owner: "{{ cloudberry_admin_user }}" + group: "{{ cloudberry_admin_user }}" + delegate_to: "{{ item }}" + loop: "{{ groups['standby'] }}" + become: yes + become_user: root + +- name: Create .ssh directory for gpadmin on standby + file: + path: "/home/{{ cloudberry_admin_user }}/.ssh" + state: directory + owner: "{{ cloudberry_admin_user }}" + group: "{{ cloudberry_admin_user }}" + mode: "0700" + delegate_to: "{{ item }}" + loop: "{{ groups['standby'] }}" + become: yes + become_user: root + +# --- Data 
directories on segments --- + +- name: Create data root directory on segments + file: + path: "{{ primary_data_dir | dirname }}" + state: directory + owner: "{{ cloudberry_admin_user }}" + group: "{{ cloudberry_admin_user }}" + mode: "0755" + delegate_to: "{{ item }}" + loop: "{{ groups['segments'] }}" + become: yes + become_user: root + +- name: Create primary data directory on segments + file: + path: "{{ primary_data_dir }}" + state: directory + owner: "{{ cloudberry_admin_user }}" + group: "{{ cloudberry_admin_user }}" + delegate_to: "{{ item }}" + loop: "{{ groups['segments'] }}" + become: yes + become_user: root + +- name: Create mirror data directory on segments + file: + path: "{{ mirror_data_dir }}" + state: directory + owner: "{{ cloudberry_admin_user }}" + group: "{{ cloudberry_admin_user }}" + delegate_to: "{{ item }}" + loop: "{{ groups['segments'] }}" + become: yes + become_user: root diff --git a/devops/deploy/ansible/site-from-source.yml b/devops/deploy/ansible/site-from-source.yml new file mode 100644 index 00000000000..38bac4fe544 --- /dev/null +++ b/devops/deploy/ansible/site-from-source.yml @@ -0,0 +1,50 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+# +# Apache Cloudberry - Bare Metal Deployment Playbook (Source Build) +# +# Usage: +# ansible-playbook site-from-source.yml -i inventory/hosts + +--- +- name: Configure all hosts + hosts: cloudberry + become: yes + roles: + - common + +- name: Build and install Apache Cloudberry from source + hosts: cloudberry + become: yes + roles: + - install_from_source + +- name: Configure SSH and create data directories + hosts: coordinator + become: yes + become_user: gpadmin + roles: + - ssh + +- name: Initialize Apache Cloudberry cluster + hosts: coordinator + become: yes + become_user: gpadmin + vars: + cloudberry_admin_user: "gpadmin" + roles: + - initialize diff --git a/devops/deploy/ansible/site.yml b/devops/deploy/ansible/site.yml index 1937733bbab..570e2844159 100644 --- a/devops/deploy/ansible/site.yml +++ b/devops/deploy/ansible/site.yml @@ -15,7 +15,7 @@ # specific language governing permissions and limitations # under the License. # -# Apache Cloudberry - Bare Metal Deployment Playbook +# Apache Cloudberry - Bare Metal Deployment Playbook (RPM/DEB) # # Usage: # ansible-playbook site.yml -i inventory/hosts \ @@ -25,648 +25,27 @@ - name: Configure all hosts hosts: cloudberry become: yes - tasks: + roles: + - common - # --- SELinux --- - # On RHEL/Rocky Linux, SELinux must be disabled. - # On Ubuntu, SELinux is not installed by default and this task is skipped. - # If SELinux is installed on Ubuntu, it will also be disabled. 
- - name: Disable SELinux (RHEL/Rocky Linux) - selinux: - state: disabled - when: ansible_os_family == "RedHat" - - - name: Disable SELinux if installed (Ubuntu) - shell: | - if dpkg -l selinux-basics &>/dev/null 2>&1; then - setenforce 0 2>/dev/null || true - sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config 2>/dev/null || true - echo "SELinux disabled" - else - echo "SELinux not installed, skipping" - fi - args: - executable: /bin/bash - when: ansible_os_family == "Debian" - register: selinux_ubuntu_result - changed_when: "'SELinux disabled' in selinux_ubuntu_result.stdout" - - # --- Firewall --- - # RHEL/Rocky Linux: stop and disable firewalld - # Ubuntu: ufw is disabled by default; disable it if active - - name: Stop and disable firewalld (RHEL/Rocky Linux) - systemd: - name: firewalld - state: stopped - enabled: no - when: ansible_os_family == "RedHat" - ignore_errors: yes - - - name: Disable ufw if active (Ubuntu) - shell: | - if ufw status | grep -q "Status: active"; then - ufw disable - echo "ufw disabled" - else - echo "ufw already inactive, skipping" - fi - args: - executable: /bin/bash - when: ansible_os_family == "Debian" - register: ufw_result - changed_when: "'ufw disabled' in ufw_result.stdout" - - # --- Hosts file --- - - name: Set hostname - hostname: - name: "{{ inventory_hostname }}" - - - name: Add cluster hosts to /etc/hosts - lineinfile: - path: /etc/hosts - line: "{{ hostvars[item].ansible_host }} {{ item }}" - state: present - loop: "{{ groups['cloudberry'] }}" - - # --- Kernel parameters --- - # Dynamically calculate memory-dependent sysctl values based on each host's - # actual physical memory and swap, following the official documentation formulas: - # kernel.shmall = _PHYS_PAGES / 2 - # kernel.shmmax = (_PHYS_PAGES / 2) * PAGE_SIZE - # vm.overcommit_ratio = (RAM - 0.026 * gp_vmem) / RAM * 100 - # vm.dirty_* uses bytes mode for >64GB RAM, ratio mode for <=64GB RAM - # vm.min_free_kbytes = MemTotal * 3% - - - name: Get system 
PAGE_SIZE - command: getconf PAGE_SIZE - register: _page_size_result - changed_when: false - - - name: Calculate dynamic sysctl values - set_fact: - _ram_mb: "{{ ansible_memtotal_mb }}" - _swap_mb: "{{ ansible_swaptotal_mb }}" - _page_size: "{{ _page_size_result.stdout | int }}" - - - name: Calculate kernel.shmall and kernel.shmmax - set_fact: - _shmall: "{{ ((_ram_mb | int * 1024 / (_page_size | int)) / 2) | int }}" - _shmmax: "{{ ((_ram_mb | int * 1024 / (_page_size | int)) / 2 * (_page_size | int)) | int }}" - - - name: Calculate vm.overcommit_ratio - set_fact: - _ram_gb: "{{ (_ram_mb | int / 1024) | float }}" - _swap_gb: "{{ (_swap_mb | int / 1024) | float }}" - - - name: Calculate gp_vmem and overcommit_ratio - set_fact: - _gp_vmem: >- - {{ ((_swap_gb | float + _ram_gb | float) - (7.5 + 0.05 * (_ram_gb | float))) - / ((_ram_gb | float >= 256) | ternary(1.17, 1.7)) }} - - - name: Calculate final overcommit_ratio - set_fact: - _overcommit_ratio: "{{ ((_ram_gb | float - 0.026 * (_gp_vmem | float)) / (_ram_gb | float) * 100) | int }}" - - - name: Calculate vm.min_free_kbytes (3% of MemTotal in kB) - set_fact: - _min_free_kbytes: "{{ (_ram_mb | int * 1024 * 0.03) | round | int }}" - - - name: Set sysctl parameters (fixed values) - sysctl: - name: "{{ item.key }}" - value: "{{ item.value }}" - sysctl_set: yes - reload: yes - loop: "{{ sysctl_fixed | dict2items }}" - vars: - sysctl_fixed: - kernel.shmmni: "4096" - vm.overcommit_memory: "2" - net.ipv4.ip_local_port_range: "10000 65535" - kernel.sem: "250 2048000 200 8192" - kernel.sysrq: "1" - kernel.core_uses_pid: "1" - kernel.msgmnb: "65536" - kernel.msgmax: "65536" - kernel.msgmni: "2048" - net.ipv4.tcp_syncookies: "1" - net.ipv4.conf.default.accept_source_route: "0" - net.ipv4.tcp_max_syn_backlog: "4096" - net.ipv4.conf.all.arp_filter: "1" - net.ipv4.ipfrag_high_thresh: "41943040" - net.ipv4.ipfrag_low_thresh: "31457280" - net.ipv4.ipfrag_time: "60" - net.core.netdev_max_backlog: "10000" - net.core.rmem_max: 
"2097152" - net.core.wmem_max: "2097152" - vm.swappiness: "10" - vm.zone_reclaim_mode: "0" - vm.dirty_expire_centisecs: "500" - vm.dirty_writeback_centisecs: "100" - kernel.core_pattern: "/var/core/core.%h.%t" - - - name: Set sysctl parameters (memory-dependent) - sysctl: - name: "{{ item.key }}" - value: "{{ item.value }}" - sysctl_set: yes - reload: yes - loop: - - { key: "kernel.shmall", value: "{{ _shmall }}" } - - { key: "kernel.shmmax", value: "{{ _shmmax }}" } - - { key: "vm.overcommit_ratio", value: "{{ _overcommit_ratio }}" } - - { key: "vm.min_free_kbytes", value: "{{ _min_free_kbytes }}" } - - - name: Set sysctl dirty parameters for hosts with more than 64GB RAM (bytes mode) - sysctl: - name: "{{ item.key }}" - value: "{{ item.value }}" - sysctl_set: yes - reload: yes - loop: - - { key: "vm.dirty_background_ratio", value: "0" } - - { key: "vm.dirty_ratio", value: "0" } - - { key: "vm.dirty_background_bytes", value: "1610612736" } - - { key: "vm.dirty_bytes", value: "4294967296" } - when: _ram_mb | int > 65536 - - - name: Set sysctl dirty parameters for hosts with 64GB RAM or less (ratio mode) - sysctl: - name: "{{ item.key }}" - value: "{{ item.value }}" - sysctl_set: yes - reload: yes - loop: - - { key: "vm.dirty_background_ratio", value: "3" } - - { key: "vm.dirty_ratio", value: "10" } - when: _ram_mb | int <= 65536 - # Per official docs: for systems with 64GB RAM or less, only set the ratio - # parameters. vm.dirty_background_bytes and vm.dirty_bytes should be omitted - # entirely, not set to 0. 
- - - name: Create core dump directory - file: - path: /var/core - state: directory - mode: "1777" - - # --- Resource limits --- - - name: Set PAM limits - pam_limits: - domain: "*" - limit_type: "{{ item.type }}" - limit_item: "{{ item.item }}" - value: "{{ item.value }}" - loop: - - { type: soft, item: nofile, value: "524288" } - - { type: hard, item: nofile, value: "524288" } - - { type: soft, item: nproc, value: "131072" } - - { type: hard, item: nproc, value: "131072" } - - { type: soft, item: core, value: unlimited } - - # --- XFS mount --- - - name: Create XFS filesystem on data disk - filesystem: - fstype: xfs - dev: "{{ data_disk }}" - force: no - ignore_errors: yes - - - name: Mount data disk - mount: - path: "{{ data_mount }}" - src: "{{ data_disk }}" - fstype: xfs - opts: rw,nodev,noatime,inode64 - state: mounted - - # --- Disk I/O --- - - name: Set blockdev read-ahead - command: "/sbin/blockdev --setra 16384 {{ data_disk }}" - changed_when: false - - - name: Set I/O scheduler via grubby (RHEL/Rocky Linux) - command: grubby --update-kernel=ALL --args="elevator=mq-deadline" - ignore_errors: yes - changed_when: false - when: ansible_os_family == "RedHat" - - - name: Set I/O scheduler via grub (Ubuntu) - shell: | - if grep -q "elevator=mq-deadline" /etc/default/grub; then - echo "elevator already set, skipping" - else - sed -i 's/GRUB_CMDLINE_LINUX="\(.*\)"/GRUB_CMDLINE_LINUX="\1 elevator=mq-deadline"/' /etc/default/grub - update-grub - echo "elevator set" - fi - args: - executable: /bin/bash - when: ansible_os_family == "Debian" - register: grub_elevator_result - changed_when: "'elevator set' in grub_elevator_result.stdout" - - # --- THP --- - - name: Disable Transparent Huge Pages via grubby (RHEL/Rocky Linux) - command: grubby --update-kernel=ALL --args="transparent_hugepage=never" - ignore_errors: yes - changed_when: false - when: ansible_os_family == "RedHat" - - - name: Disable Transparent Huge Pages via grub (Ubuntu) - shell: | - if grep -q 
"transparent_hugepage=never" /etc/default/grub; then - echo "THP already disabled, skipping" - else - sed -i 's/GRUB_CMDLINE_LINUX="\(.*\)"/GRUB_CMDLINE_LINUX="\1 transparent_hugepage=never"/' /etc/default/grub - update-grub - echo "THP disabled" - fi - args: - executable: /bin/bash - when: ansible_os_family == "Debian" - register: grub_thp_result - changed_when: "'THP disabled' in grub_thp_result.stdout" - - # --- IPC --- - - name: Disable IPC object removal - lineinfile: - path: /etc/systemd/logind.conf - regexp: "^#?RemoveIPC=" - line: "RemoveIPC=no" - notify: restart systemd-logind - - # --- SSH threshold --- - - name: Set SSH MaxStartups - lineinfile: - path: /etc/ssh/sshd_config - regexp: "^#?MaxStartups" - line: "MaxStartups 10:30:200" - notify: restart sshd - - - name: Set SSH MaxSessions - lineinfile: - path: /etc/ssh/sshd_config - regexp: "^#?MaxSessions" - line: "MaxSessions 200" - notify: restart sshd - - # --- Clock sync --- - # RHEL/Rocky Linux uses chronyd; Ubuntu uses chrony (package name differs) - - name: Enable and start chronyd (RHEL/Rocky Linux) - systemd: - name: chronyd - state: started - enabled: yes - when: ansible_os_family == "RedHat" - - - name: Enable and start chrony (Ubuntu) - systemd: - name: chrony - state: started - enabled: yes - when: ansible_os_family == "Debian" - - # --- gpadmin user --- - - name: Create gpadmin group - group: - name: "{{ cloudberry_admin_user }}" - state: present - - - name: Create gpadmin user - user: - name: "{{ cloudberry_admin_user }}" - group: "{{ cloudberry_admin_user }}" - system: yes - create_home: yes - password: "{{ cloudberry_admin_password | password_hash('sha512') }}" - - - name: Add gpadmin to wheel group and enable NOPASSWD sudo (RHEL/Rocky Linux) - user: - name: "{{ cloudberry_admin_user }}" - groups: wheel - append: yes - when: ansible_os_family == "RedHat" - - - name: Ensure wheel group has NOPASSWD sudo (RHEL/Rocky Linux) - lineinfile: - path: /etc/sudoers - regexp: "^%wheel" - line: 
"%wheel ALL=(ALL) NOPASSWD: ALL" - validate: "visudo -cf %s" - when: ansible_os_family == "RedHat" - - - name: Add gpadmin to sudo group (Ubuntu) - user: - name: "{{ cloudberry_admin_user }}" - groups: sudo - append: yes - when: ansible_os_family == "Debian" - - - name: Create sudoers drop-in file for gpadmin (Ubuntu) - copy: - content: "gpadmin ALL=(ALL) NOPASSWD: ALL\n" - dest: /etc/sudoers.d/gpadmin - mode: "0440" - validate: "visudo -cf %s" - when: ansible_os_family == "Debian" - - - name: Set data directory ownership - file: - path: "{{ data_mount }}" - owner: "{{ cloudberry_admin_user }}" - group: "{{ cloudberry_admin_user }}" - recurse: yes - - handlers: - - name: restart systemd-logind - service: - name: systemd-logind - state: restarted - - - name: restart sshd - service: - name: sshd - state: restarted - -# --- Install Apache Cloudberry --- - name: Install Apache Cloudberry package hosts: cloudberry become: yes - tasks: - - name: Copy package to host - copy: - src: "{{ package_path }}" - dest: "/tmp/{{ package_path | basename }}" - - - name: Install package (RPM) - yum: - name: "/tmp/{{ package_path | basename }}" - state: present - disable_gpg_check: yes - when: ansible_os_family == "RedHat" - - - name: Install package (DEB) - apt: - deb: "/tmp/{{ package_path | basename }}" - state: present - when: ansible_os_family == "Debian" + roles: + - install_package - - name: Cleanup package file - file: - path: "/tmp/{{ package_path | basename }}" - state: absent - - - name: Set installation directory ownership - shell: chown -R {{ cloudberry_admin_user }}:{{ cloudberry_admin_user }} /usr/local/cloudberry* - changed_when: false - -# --- Configure SSH and initialize (coordinator only) --- -- name: Configure passwordless SSH and initialize cluster +- name: Configure SSH and create data directories hosts: coordinator become: yes - become_user: "{{ cloudberry_admin_user }}" - vars: - cloudberry_admin_user: "gpadmin" - tasks: - - name: Create .ssh directory for 
gpadmin - file: - path: "/home/{{ cloudberry_admin_user }}/.ssh" - state: directory - owner: "{{ cloudberry_admin_user }}" - group: "{{ cloudberry_admin_user }}" - mode: "0700" - - - name: Generate SSH key for gpadmin - openssh_keypair: - path: "/home/{{ cloudberry_admin_user }}/.ssh/id_rsa" - type: rsa - size: 4096 - owner: "{{ cloudberry_admin_user }}" - group: "{{ cloudberry_admin_user }}" - - - name: Fetch gpadmin public key from coordinator - slurp: - src: "/home/{{ cloudberry_admin_user }}/.ssh/id_rsa.pub" - register: gpadmin_pubkey - - - name: Distribute SSH public key to all hosts - authorized_key: - user: "{{ cloudberry_admin_user }}" - key: "{{ gpadmin_pubkey.content | b64decode }}" - delegate_to: "{{ item }}" - loop: "{{ groups['cloudberry'] }}" - - - name: Create hostfile_exkeys - copy: - content: "{{ groups['cloudberry'] | join('\n') }}\n" - dest: "/home/{{ cloudberry_admin_user }}/hostfile_exkeys" - - - name: Create hostfile_gpinitsystem - copy: - content: "{{ groups['segments'] | join('\n') }}\n" - dest: "/home/{{ cloudberry_admin_user }}/hostfile_gpinitsystem" - - - name: Scan and trust host keys for all cluster nodes - known_hosts: - name: "{{ item }}" - key: "{{ lookup('pipe', 'ssh-keyscan -H ' + item) }}" - state: present - path: "/home/{{ cloudberry_admin_user }}/.ssh/known_hosts" - loop: "{{ groups['cloudberry'] }}" - - - name: Run gpssh-exkeys - shell: | - source /usr/local/cloudberry-db/cloudberry-env.sh - gpssh-exkeys -f /home/{{ cloudberry_admin_user }}/hostfile_exkeys - args: - executable: /bin/bash - - - name: Create data root directory as root - file: - path: "{{ coordinator_data_dir | dirname }}" - state: directory - owner: "{{ cloudberry_admin_user }}" - group: "{{ cloudberry_admin_user }}" - mode: "0755" - become: yes - become_user: root - - - name: Create coordinator data directory - file: - path: "{{ coordinator_data_dir }}" - state: directory - owner: "{{ cloudberry_admin_user }}" - group: "{{ cloudberry_admin_user }}" - -- name: 
Create data directories on standby - hosts: standby - become: yes - tasks: - - name: Create data root directory on standby - file: - path: "{{ coordinator_data_dir | dirname }}" - state: directory - owner: "{{ cloudberry_admin_user }}" - group: "{{ cloudberry_admin_user }}" - mode: "0755" - - - name: Create coordinator data directory on standby - file: - path: "{{ coordinator_data_dir }}" - state: directory - owner: "{{ cloudberry_admin_user }}" - group: "{{ cloudberry_admin_user }}" - - - name: Create .ssh directory for gpadmin on standby - file: - path: "/home/{{ cloudberry_admin_user }}/.ssh" - state: directory - owner: "{{ cloudberry_admin_user }}" - group: "{{ cloudberry_admin_user }}" - mode: "0700" - -- name: Create data directories on segments - hosts: segments - become: yes - tasks: - - name: Create data root directory on segments - file: - path: "{{ primary_data_dir | dirname }}" - state: directory - owner: "{{ cloudberry_admin_user }}" - group: "{{ cloudberry_admin_user }}" - mode: "0755" - - - name: Create primary data directory - file: - path: "{{ primary_data_dir }}" - state: directory - owner: "{{ cloudberry_admin_user }}" - group: "{{ cloudberry_admin_user }}" - - - name: Create mirror data directory - file: - path: "{{ mirror_data_dir }}" - state: directory - owner: "{{ cloudberry_admin_user }}" - group: "{{ cloudberry_admin_user }}" + become_user: gpadmin + roles: + - ssh - name: Initialize Apache Cloudberry cluster hosts: coordinator become: yes - become_user: "{{ cloudberry_admin_user }}" + become_user: gpadmin vars: cloudberry_admin_user: "gpadmin" - tasks: - - name: Create gpconfigs directory - file: - path: "/home/{{ cloudberry_admin_user }}/gpconfigs" - state: directory - - - name: Create gpinitsystem config - copy: - content: | - SEG_PREFIX=gpseg - PORT_BASE={{ port_base }} - declare -a DATA_DIRECTORY=({% for i in range(segments_per_host) %}{{ primary_data_dir }} {% endfor %}) - COORDINATOR_HOSTNAME={{ coordinator_hostname }} - 
COORDINATOR_DIRECTORY={{ coordinator_data_dir }} - COORDINATOR_PORT={{ coordinator_port }} - TRUSTED_SHELL=ssh - CHECK_POINT_SEGMENTS=8 - ENCODING=UNICODE - MIRROR_PORT_BASE={{ mirror_port_base }} - declare -a MIRROR_DATA_DIRECTORY=({% for i in range(segments_per_host) %}{{ mirror_data_dir }} {% endfor %}) - DATABASE_NAME={{ database_name }} - dest: "/home/{{ cloudberry_admin_user }}/gpconfigs/gpinitsystem_config" - - - name: Check if cluster is already initialized - stat: - path: "{{ coordinator_data_dir }}/gpseg-1/PG_VERSION" - register: pgversion_stat - become: yes - become_user: root - - - name: Run gpinitsystem - shell: | - source /usr/local/cloudberry-db/cloudberry-env.sh - yes | gpinitsystem -c /home/{{ cloudberry_admin_user }}/gpconfigs/gpinitsystem_config \ - -h /home/{{ cloudberry_admin_user }}/hostfile_gpinitsystem \ - -s {{ standby_hostname }} --mirror-mode=spread - args: - executable: /bin/bash - register: gpinitsystem_result - failed_when: "'successfully created' not in gpinitsystem_result.stdout" - when: not pgversion_stat.stat.exists - - - name: Set environment variables in .bashrc - blockinfile: - path: "/home/{{ cloudberry_admin_user }}/.bashrc" - block: | - source /usr/local/cloudberry-db/cloudberry-env.sh - export COORDINATOR_DATA_DIRECTORY={{ coordinator_data_dir }}/gpseg-1 - export PGPORT={{ coordinator_port }} - export PGUSER={{ cloudberry_admin_user }} - export PGDATABASE={{ database_name }} - marker: "# {mark} CLOUDBERRY ENVIRONMENT" - - - name: Sync .bashrc to standby coordinator - shell: | - scp /home/{{ cloudberry_admin_user }}/.bashrc \ - {{ standby_hostname }}:/home/{{ cloudberry_admin_user }}/.bashrc - args: - executable: /bin/bash - - - name: Verify standby coordinator is synchronized - shell: | - source /usr/local/cloudberry-db/cloudberry-env.sh - source /home/{{ cloudberry_admin_user }}/.bashrc - gpstate -f - args: - executable: /bin/bash - register: gpstate_result - changed_when: false - failed_when: false - - - name: Initialize 
standby coordinator if not already active - shell: | - source /usr/local/cloudberry-db/cloudberry-env.sh - source /home/{{ cloudberry_admin_user }}/.bashrc - if ! gpstate -f 2>&1 | grep -q "Sync state: sync"; then - yes | gpinitstandby -s {{ standby_hostname }} - else - echo "Standby already synchronized, skipping gpinitstandby" - fi - args: - executable: /bin/bash - register: gpinitstandby_result - changed_when: "'skipping' not in gpinitstandby_result.stdout" - - - name: Display success message - debug: - msg: - - "==========================================" - - " Apache Cloudberry cluster initialized successfully!" - - "==========================================" - - " Connect to the database:" - - " su - gpadmin" - - " psql -d {{ database_name }}" - - "==========================================" - - - name: Check postgres --gp-version - command: /usr/local/cloudberry-db/bin/postgres --gp-version - register: gp_version - changed_when: false - - - name: Check postgres --version - command: /usr/local/cloudberry-db/bin/postgres --version - register: pg_version - changed_when: false - - - name: Show version info - debug: - msg: - - "{{ gp_version.stdout }}" - - "{{ pg_version.stdout }}" + roles: + - initialize From 9aa1bd055b8893b093b72a10ae71c3be0ac3d652 Mon Sep 17 00:00:00 2001 From: Dianjin Wang Date: Fri, 22 May 2026 15:32:45 +0800 Subject: [PATCH 5/7] Add build from source support --- devops/deploy/ansible/README.md | 57 +++++++++++++++---- devops/deploy/ansible/group_vars/all.yml | 2 + .../roles/install_from_source/tasks/main.yml | 37 ++++-------- 3 files changed, 59 insertions(+), 37 deletions(-) diff --git a/devops/deploy/ansible/README.md b/devops/deploy/ansible/README.md index 54afae5dbf9..87b97be48a6 100644 --- a/devops/deploy/ansible/README.md +++ b/devops/deploy/ansible/README.md @@ -23,8 +23,8 @@ This directory contains Ansible playbooks for deploying Apache Cloudberry on phy virtual machines. 
Two installation methods are supported: - **RPM/DEB package** (`site.yml`) — installs from a pre-built binary package -- **Source build** (`site-from-source.yml`) — downloads the official source tarball, - verifies signatures, and compiles from source +- **Source build** (`site-from-source.yml`) — compiles and installs from a source tarball + you provide ## Quick Start @@ -47,15 +47,52 @@ ansible-playbook site.yml -i inventory/hosts \ vi inventory/hosts vi group_vars/all.yml -# 2. Run the playbook (no package_path needed) +# 2. Run the playbook ansible-playbook site-from-source.yml -i inventory/hosts \ - -e source_path=/path/to/apache-cloudberry-2.1.0-incubating-src.tar.gz + -e source_path=/path/to/apache-cloudberry-2.1.0-incubating-src.tar.gz \ + -e xerces_path=/path/to/xerces-c-3.3.0.tar.gz # RHEL/Rocky only +``` + +Download and verify the following before running: +- Cloudberry source: [Apache Cloudberry Releases](https://cloudberry.apache.org/releases) +- Xerces-C source (RHEL/Rocky only): [Apache Xerces-C Downloads](https://xerces.apache.org/xerces-c/download.cgi) + +Ubuntu uses the system `libxerces-c-dev` package, so `xerces_path` is not needed on Ubuntu. 
+ +#### Downloading Cloudberry source + +Download the source tarball and verify its integrity: + +```bash +# Download source tarball +curl -L -o apache-cloudberry-2.1.0-incubating-src.tar.gz \ + "https://www.apache.org/dyn/closer.lua/incubator/cloudberry/2.1.0-incubating/apache-cloudberry-2.1.0-incubating-src.tar.gz?action=download" + +# Download checksum and signature files +curl -O https://downloads.apache.org/incubator/cloudberry/2.1.0-incubating/apache-cloudberry-2.1.0-incubating-src.tar.gz.sha512 +curl -O https://downloads.apache.org/incubator/cloudberry/2.1.0-incubating/apache-cloudberry-2.1.0-incubating-src.tar.gz.asc + +# Verify SHA512 checksum +sha512sum -c apache-cloudberry-2.1.0-incubating-src.tar.gz.sha512 + +# Verify GPG signature (optional but recommended) +curl https://downloads.apache.org/incubator/cloudberry/KEYS | gpg --import +gpg --verify apache-cloudberry-2.1.0-incubating-src.tar.gz.asc \ + apache-cloudberry-2.1.0-incubating-src.tar.gz ``` -Download and verify the source tarball from -[Apache Cloudberry Releases](https://cloudberry.apache.org/releases) before running. -The playbook handles build dependency installation, Xerces-C compilation (RHEL/Rocky only), -and source compilation on each host. +#### Downloading Xerces-C (RHEL/Rocky Linux only) + +By default the playbook builds Xerces-C 3.3.0 (set by `xerces_version` in `group_vars/all.yml`). Download and verify the matching tarball: + +```bash +# Download source tarball +wget https://dlcdn.apache.org/xerces/c/3/sources/xerces-c-3.3.0.tar.gz + +# Verify SHA256 checksum +echo "$(curl -sL https://dlcdn.apache.org/xerces/c/3/sources/xerces-c-3.3.0.tar.gz.sha256)" \ + | sha256sum -c - +``` ## Cluster Layout (default) @@ -152,7 +189,7 @@ and cluster initialization steps. Only the installation role differs.
| Variable | Default | Description | |----------|---------|-------------| -| `cloudberry_version` | `2.1.0` | Version used for source download URL | +| `cloudberry_version` | `2.1.0` | Cloudberry version (informational) | | `cloudberry_admin_user` | `gpadmin` | Admin OS user | | `cloudberry_admin_password` | `changeme` | Password for gpadmin | | `data_disk` | `/dev/sdb` | Data disk device (run `lsblk` to verify) | @@ -160,7 +197,7 @@ and cluster initialization steps. Only the installation role differs. | `segments_per_host` | `2` | Primary segment instances per host | | `coordinator_port` | `5432` | Coordinator port | | `database_name` | `warehouse` | Default database created at init | -| `xerces_version` | `3.3.0` | Xerces-C version (source build only) | +| `xerces_version` | `3.3.0` | Xerces-C version (source build, RHEL/Rocky only) | ## After Deployment diff --git a/devops/deploy/ansible/group_vars/all.yml b/devops/deploy/ansible/group_vars/all.yml index 0f4c51ba360..8a806217622 100644 --- a/devops/deploy/ansible/group_vars/all.yml +++ b/devops/deploy/ansible/group_vars/all.yml @@ -28,6 +28,8 @@ cloudberry_version: "2.1.0" # Apache Xerces-C version (required for source build on RHEL/Rocky Linux) # Ubuntu uses the system libxerces-c-dev package instead. +# Download from https://xerces.apache.org/xerces-c/download.cgi and pass via: +# -e xerces_path=/path/to/xerces-c-3.3.0.tar.gz xerces_version: "3.3.0" # Admin user diff --git a/devops/deploy/ansible/roles/install_from_source/tasks/main.yml b/devops/deploy/ansible/roles/install_from_source/tasks/main.yml index 04b74396553..dba89f9730e 100644 --- a/devops/deploy/ansible/roles/install_from_source/tasks/main.yml +++ b/devops/deploy/ansible/roles/install_from_source/tasks/main.yml @@ -184,6 +184,8 @@ # --- Install Apache Xerces-C (RHEL/Rocky Linux only) --- # Ubuntu uses the system libxerces-c-dev package installed above. 
+# Download the Xerces-C source tarball from https://xerces.apache.org/xerces-c/download.cgi +# and pass the local path via: -e xerces_path=/path/to/xerces-c-3.3.0.tar.gz - name: Check if Xerces-C is already installed (RHEL/Rocky Linux) stat: @@ -191,45 +193,26 @@ register: xerces_installed when: ansible_os_family == "RedHat" -- name: Download Xerces-C source (RHEL/Rocky Linux) - get_url: - url: "https://dlcdn.apache.org//xerces/c/3/sources/xerces-c-{{ xerces_version }}.tar.gz" - dest: "/tmp/xerces-c-{{ xerces_version }}.tar.gz" - mode: "0644" - when: - - ansible_os_family == "RedHat" - - not xerces_installed.stat.exists - -- name: Download Xerces-C SHA256 checksum (RHEL/Rocky Linux) - get_url: - url: "https://dlcdn.apache.org//xerces/c/3/sources/xerces-c-{{ xerces_version }}.tar.gz.sha256" - dest: "/tmp/xerces-c-{{ xerces_version }}.tar.gz.sha256" +- name: Copy Xerces-C source tarball to host (RHEL/Rocky Linux) + copy: + src: "{{ xerces_path }}" + dest: "/tmp/{{ xerces_path | basename }}" mode: "0644" when: - ansible_os_family == "RedHat" - not xerces_installed.stat.exists -- name: Verify Xerces-C checksum (RHEL/Rocky Linux) - shell: | - cd /tmp - sha256sum -c xerces-c-{{ xerces_version }}.tar.gz.sha256 - args: - executable: /bin/bash - when: - - ansible_os_family == "RedHat" - - not xerces_installed.stat.exists - changed_when: false - - name: Build and install Xerces-C (RHEL/Rocky Linux) shell: | cd /tmp - tar xf xerces-c-{{ xerces_version }}.tar.gz - cd xerces-c-{{ xerces_version }} + tar xf {{ xerces_path | basename }} + extracted=$(tar -tzf {{ xerces_path | basename }} | head -1 | cut -d/ -f1) + cd ${extracted} ./configure --prefix=/usr/local/xerces-c-{{ xerces_version }} make -j$(nproc) sudo make install sudo ln -sf /usr/local/xerces-c-{{ xerces_version }} /usr/local/xerces-c - cd /tmp && rm -rf xerces-c-{{ xerces_version }} xerces-c-{{ xerces_version }}.tar.gz + cd /tmp && rm -rf ${extracted} {{ xerces_path | basename }} args: executable: /bin/bash when: 
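Patch 5/7 stops assuming the Xerces-C tarball unpacks into `xerces-c-{{ xerces_version }}`. Instead, the build task reads the top-level directory name from the archive listing (`tar -tzf ... | head -1 | cut -d/ -f1`). This lets the user-supplied `xerces_path` archive still build when its file name does not match its contents. Below is a minimal standalone sketch of that technique. It assumes a gzip-compressed archive with a single top-level directory, matching the task's `-z` flag. The `demo-src-1.0` tree and `renamed-upload.tar.gz` file are hypothetical stand-ins, not part of the playbook.

```shell
#!/usr/bin/env bash
# Sketch (not the playbook itself): derive a tarball's top-level directory
# from its listing instead of hard-coding "<name>-<version>".
# Assumes a gzip archive with one top-level directory; names are hypothetical.
set -eu

workdir=$(mktemp -d)
cd "${workdir}"

# Build a demo archive whose file name differs from the directory it contains.
mkdir -p demo-src-1.0/src
echo 'int main(void) { return 0; }' > demo-src-1.0/src/main.c
tar czf renamed-upload.tar.gz demo-src-1.0
rm -rf demo-src-1.0

tarball=renamed-upload.tar.gz
tar xf "${tarball}"

# First listing entry is "demo-src-1.0/"; cut keeps the leading path component.
extracted=$(tar -tzf "${tarball}" | head -n 1 | cut -d/ -f1)
echo "extracted=${extracted}"

# The real task would run ./configure && make here.
cd "${extracted}"
ls src

# Clean up, mirroring the task's final rm -rf of the tarball and build dir.
cd /tmp && rm -rf "${workdir}"
```

The `-z` flag ties the listing to gzip archives. GNU tar can usually auto-detect compression when listing a regular file, so dropping `-z` would be one way to accept other formats. The task as written does not do that.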
From 0bb222cb3e1631d3c8945f2fc2dc956fe2852814 Mon Sep 17 00:00:00 2001 From: Dianjin Wang Date: Fri, 22 May 2026 18:12:49 +0800 Subject: [PATCH 6/7] DevOps: install EPEL and the_silver_searcher for source builds --- .../roles/install_from_source/tasks/main.yml | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/devops/deploy/ansible/roles/install_from_source/tasks/main.yml b/devops/deploy/ansible/roles/install_from_source/tasks/main.yml index dba89f9730e..756b5fa3352 100644 --- a/devops/deploy/ansible/roles/install_from_source/tasks/main.yml +++ b/devops/deploy/ansible/roles/install_from_source/tasks/main.yml @@ -23,6 +23,14 @@ --- # --- Install build dependencies --- +- name: Install EPEL repository (RHEL/Rocky Linux 8) + dnf: + name: epel-release + state: present + when: + - ansible_os_family == "RedHat" + - ansible_distribution_major_version == "8" + - name: Install build dependencies (RHEL/Rocky Linux 8) dnf: name: @@ -58,6 +66,7 @@ - python3-pip - readline-devel - rsync + - the_silver_searcher - wget - which - zlib-devel @@ -80,6 +89,14 @@ - ansible_os_family == "RedHat" - ansible_distribution_major_version == "8" +- name: Install EPEL repository (RHEL/Rocky Linux 9+) + dnf: + name: epel-release + state: present + when: + - ansible_os_family == "RedHat" + - ansible_distribution_major_version | int >= 9 + - name: Install build dependencies (RHEL/Rocky Linux 9+) dnf: name: @@ -115,6 +132,7 @@ - python3-pip - readline-devel - rsync + - the_silver_searcher - wget - which - zlib-devel From 7e0ac63634aefae79c2e280f8445e602416cd815 Mon Sep 17 00:00:00 2001 From: Dianjin Wang Date: Fri, 22 May 2026 18:51:37 +0800 Subject: [PATCH 7/7] DevOps: wait for cluster readiness and build source with make --- .../ansible/roles/initialize/tasks/main.yml | 20 ++++++++++ .../roles/install_from_source/tasks/main.yml | 40 ++++++++++++++++--- 2 files changed, 54 insertions(+), 6 deletions(-) diff --git a/devops/deploy/ansible/roles/initialize/tasks/main.yml 
b/devops/deploy/ansible/roles/initialize/tasks/main.yml index 05698a12f24..570cd3e6cb3 100644 ---
a/devops/deploy/ansible/roles/initialize/tasks/main.yml +++ b/devops/deploy/ansible/roles/initialize/tasks/main.yml @@ -75,6 +75,25 @@ args: executable: /bin/bash +- name: Wait for cluster to be fully ready + shell: | + source /usr/local/cloudberry-db/cloudberry-env.sh + source /home/{{ cloudberry_admin_user }}/.bashrc + for i in $(seq 1 12); do + if gpstate -s 2>&1 | grep -q "All segments are running normally"; then + echo "Cluster is ready" + exit 0 + fi + echo "Waiting for cluster... attempt $i/12" + sleep 10 + done + echo "Cluster ready check timed out, proceeding anyway" + exit 0 + args: + executable: /bin/bash + changed_when: false + when: not pgversion_stat.stat.exists + - name: Verify standby coordinator is synchronized shell: | source /usr/local/cloudberry-db/cloudberry-env.sh @@ -99,6 +118,7 @@ executable: /bin/bash register: gpinitstandby_result changed_when: "'skipping' not in gpinitstandby_result.stdout" + when: not pgversion_stat.stat.exists - name: Display success message debug: diff --git a/devops/deploy/ansible/roles/install_from_source/tasks/main.yml b/devops/deploy/ansible/roles/install_from_source/tasks/main.yml index 756b5fa3352..700d9cde75e 100644 --- a/devops/deploy/ansible/roles/install_from_source/tasks/main.yml +++ b/devops/deploy/ansible/roles/install_from_source/tasks/main.yml @@ -294,15 +294,43 @@ executable: /bin/bash become_user: "{{ cloudberry_admin_user }}" -- name: Build and install Cloudberry - shell: | - export SRC_DIR=/home/{{ cloudberry_admin_user }}/cloudberry - export BUILD_DESTINATION=/usr/local/cloudberry-db - cd ${SRC_DIR} - ./devops/build/automation/cloudberry/scripts/build-cloudberry.sh +- name: Build Cloudberry main + shell: make -j$(nproc) --directory ${SRC_DIR} + args: + executable: /bin/bash + become_user: "{{ cloudberry_admin_user }}" + environment: + SRC_DIR: "/home/{{ cloudberry_admin_user }}/cloudberry" + BUILD_DESTINATION: "/usr/local/cloudberry-db" + LD_LIBRARY_PATH: "/usr/local/cloudberry-db/lib" + +- name: 
Build Cloudberry contrib + shell: make -j$(nproc) --directory ${SRC_DIR}/contrib + args: + executable: /bin/bash + become_user: "{{ cloudberry_admin_user }}" + environment: + SRC_DIR: "/home/{{ cloudberry_admin_user }}/cloudberry" + BUILD_DESTINATION: "/usr/local/cloudberry-db" + LD_LIBRARY_PATH: "/usr/local/cloudberry-db/lib" + +- name: Install Cloudberry main + shell: make install --directory ${SRC_DIR} + args: + executable: /bin/bash + become_user: "{{ cloudberry_admin_user }}" + environment: + SRC_DIR: "/home/{{ cloudberry_admin_user }}/cloudberry" + BUILD_DESTINATION: "/usr/local/cloudberry-db" + +- name: Install Cloudberry contrib + shell: make install --directory ${SRC_DIR}/contrib args: executable: /bin/bash become_user: "{{ cloudberry_admin_user }}" + environment: + SRC_DIR: "/home/{{ cloudberry_admin_user }}/cloudberry" + BUILD_DESTINATION: "/usr/local/cloudberry-db" - name: Set installation directory ownership shell: chown -R {{ cloudberry_admin_user }}:{{ cloudberry_admin_user }} /usr/local/cloudberry*