18 changes: 18 additions & 0 deletions deploy/Makefile
@@ -81,6 +81,18 @@ patch-nodes:
get-tnf-logs:
@./openshift-clusters/scripts/get-tnf-logs.sh

bm-init:
@./openshift-clusters/scripts/bm-init.sh

bm-fencing-agent:
@./openshift-clusters/scripts/deploy-cluster.sh --topology fencing --method agent

bm-arbiter-agent:
@./openshift-clusters/scripts/deploy-cluster.sh --topology arbiter --method agent

bm-ssh:
@./openshift-clusters/scripts/bm-ssh.sh

help:
@echo "Available commands:"
@echo ""
@@ -118,4 +130,10 @@ help:
@echo ""
@echo "Cluster Utilities:"
@echo " get-tnf-logs - Collect pacemaker and etcd logs from cluster nodes"
@echo ""
@echo "Bare Metal Deployment:"
@echo " bm-init - Initialize a bare metal host for cluster deployment (interactive)"
@echo " bm-fencing-agent - Deploy fencing cluster on bare metal (agent-based install)"
@echo " bm-arbiter-agent - Deploy arbiter cluster on bare metal (agent-based install)"
@echo " bm-ssh - SSH into the bare metal host"

66 changes: 66 additions & 0 deletions deploy/openshift-clusters/scripts/bm-init.sh
@@ -0,0 +1,66 @@
#!/bin/bash
#
# Initialize a bare metal host for two-node cluster deployment.
# Usage: bm-init.sh [user@host]
#

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
DEPLOY_DIR="$(cd "${SCRIPT_DIR}/../.." && pwd)"

set -o nounset
set -o errexit
set -o pipefail

INSTANCE_DATA_DIR="${DEPLOY_DIR}/aws-hypervisor/instance-data"
INVENTORY_FILE="${DEPLOY_DIR}/openshift-clusters/inventory.ini"

HOST_TARGET="${1:-}"

if [[ -z "${HOST_TARGET}" ]]; then
read -rp "Enter bare metal host (user@host or IP): " HOST_TARGET
fi

if [[ -z "${HOST_TARGET}" ]]; then
echo "Error: No host specified."
exit 1
fi

# If only an IP/hostname was given, default to root@
if [[ "${HOST_TARGET}" != *@* ]]; then
HOST_TARGET="root@${HOST_TARGET}"
fi

echo "Initializing bare metal host: ${HOST_TARGET}"

# Generate inventory file
echo "Generating inventory.ini..."
cat > "${INVENTORY_FILE}" <<EOF
[metal_machine]
${HOST_TARGET} ansible_ssh_extra_args='-o ServerAliveInterval=30 -o ServerAliveCountMax=120'

[metal_machine:vars]
ansible_become_password=""
EOF

# Run the init-host playbook
echo "Running init-host.yml playbook..."
cd "${DEPLOY_DIR}/openshift-clusters"

if ansible-playbook init-host.yml -i inventory.ini; then
echo ""
echo "Host initialization completed successfully!"

# Write the bare metal marker file
mkdir -p "${INSTANCE_DATA_DIR}"
echo "${HOST_TARGET}" > "${INSTANCE_DATA_DIR}/bare-metal-host"

Comment on lines +53 to +56

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clear stale AWS marker when persisting bare-metal state.

At Line 55, bare-metal-host is written, but an existing aws-instance-id is left intact. Since check_instance() prioritizes AWS markers, later scripts can branch to the wrong mode when both files exist.

Suggested patch
     # Write the bare metal marker file
     mkdir -p "${INSTANCE_DATA_DIR}"
+    rm -f "${INSTANCE_DATA_DIR}/aws-instance-id"
     echo "${HOST_TARGET}" > "${INSTANCE_DATA_DIR}/bare-metal-host"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/openshift-clusters/scripts/bm-init.sh` around lines 53 - 56, When
persisting bare-metal state in the block that writes
"${INSTANCE_DATA_DIR}/bare-metal-host" (using INSTANCE_DATA_DIR and
HOST_TARGET), ensure any stale AWS marker is removed so check_instance() does
not mis-detect AWS mode; explicitly remove the aws-instance-id file (e.g., rm -f
"${INSTANCE_DATA_DIR}/aws-instance-id" or equivalent) before or after writing
bare-metal-host so only the bare-metal marker remains.
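
The precedence problem can be reproduced outside the repository. The sketch below uses a condensed, POSIX-style copy of `check_instance()` from `common.sh` in this PR. The scratch directory, the instance ID, and the `192.0.2.10` address are made-up placeholders, not values from the project.

```shell
#!/usr/bin/env bash
# Condensed copy of check_instance() from common.sh in this PR.
check_instance() {
    data_dir="$1/aws-hypervisor/instance-data"
    if [ -f "${data_dir}/aws-instance-id" ]; then
        echo "aws"
    elif [ -f "${data_dir}/bare-metal-host" ]; then
        echo "baremetal"
    else
        echo "Error: No instance found." >&2
        return 1
    fi
}

# Hypothetical scratch tree standing in for the deploy directory.
demo_dir="$(mktemp -d)"
marker_dir="${demo_dir}/aws-hypervisor/instance-data"
mkdir -p "${marker_dir}"

# Simulate a leftover EC2 marker followed by a bm-init run without the fix.
echo "i-0123456789abcdef0" > "${marker_dir}/aws-instance-id"   # placeholder ID
echo "root@192.0.2.10" > "${marker_dir}/bare-metal-host"       # placeholder host
before_fix="$(check_instance "${demo_dir}")"

# Apply the suggested cleanup, then detect again.
rm -f "${marker_dir}/aws-instance-id"
after_fix="$(check_instance "${demo_dir}")"

echo "before: ${before_fix}, after: ${after_fix}"
rm -rf "${demo_dir}"
```

With both markers present the detection reports `aws`; once the stale marker is removed it reports `baremetal`, which is the behavior the added `rm -f` line is meant to guarantee.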

echo ""
echo "Next steps:"
echo " Deploy a cluster:"
echo " make bm-fencing-agent"
echo " make bm-arbiter-agent"
else
echo "Error: Host initialization failed!"
echo "Check the Ansible logs for more details."
exit 1
fi
22 changes: 22 additions & 0 deletions deploy/openshift-clusters/scripts/bm-ssh.sh
@@ -0,0 +1,22 @@
#!/bin/bash
#
# SSH to a bare metal host initialized with bm-init.sh
#

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
DEPLOY_DIR="$(cd "${SCRIPT_DIR}/../.." && pwd)"

set -o nounset
set -o errexit
set -o pipefail

MARKER_FILE="${DEPLOY_DIR}/aws-hypervisor/instance-data/bare-metal-host"

if [[ ! -f "${MARKER_FILE}" ]]; then
echo "Error: No bare metal host found. Run 'make bm-init' first."
exit 1
fi

HOST_TARGET=$(cat "${MARKER_FILE}")
echo "Connecting to bare metal host: ${HOST_TARGET}"
ssh "${HOST_TARGET}"
11 changes: 7 additions & 4 deletions deploy/openshift-clusters/scripts/clean.sh
@@ -1,17 +1,20 @@
 #!/bin/bash

 # Get the directory where this script is located
-SCRIPT_DIR=$(dirname "$0")
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 # Get the deploy directory (two levels up from scripts)
 DEPLOY_DIR="$(cd "${SCRIPT_DIR}/../.." && pwd)"

 set -o nounset
 set -o errexit
 set -o pipefail

-# Check if instance data exists
-if [[ ! -f "${DEPLOY_DIR}/aws-hypervisor/instance-data/aws-instance-id" ]]; then
-    echo "Error: No instance found. Please run 'make deploy' first."
+# Source shared helpers
+# shellcheck source=common.sh
+source "${SCRIPT_DIR}/common.sh"
+
+# Check if instance data exists (EC2 or bare metal)
+if ! check_instance "${DEPLOY_DIR}" >/dev/null; then
     exit 1
 fi

57 changes: 57 additions & 0 deletions deploy/openshift-clusters/scripts/common.sh
@@ -0,0 +1,57 @@
#!/bin/bash
#
# Shared helper functions for cluster management scripts.
# Supports both AWS EC2 and bare metal deployments.
#

# Detect the instance type (aws or baremetal) by checking for marker files.
# Prints the type to stdout and returns 0, or prints an error and returns 1.
check_instance() {
local deploy_dir="$1"

if [[ -f "${deploy_dir}/aws-hypervisor/instance-data/aws-instance-id" ]]; then
echo "aws"
return 0
elif [[ -f "${deploy_dir}/aws-hypervisor/instance-data/bare-metal-host" ]]; then
echo "baremetal"
return 0
fi

echo "Error: No instance found. Run 'make deploy' (EC2) or 'make bm-init' (bare metal) first." >&2
return 1
}

# Get a display identifier for the current instance (instance ID or hostname).
get_instance_display() {
local deploy_dir="$1"
local instance_type="$2"

case "${instance_type}" in
aws)
cat "${deploy_dir}/aws-hypervisor/instance-data/aws-instance-id"
;;
baremetal)
cat "${deploy_dir}/aws-hypervisor/instance-data/bare-metal-host"
;;
esac
}

# Get the SSH user and host for connecting to the hypervisor.
get_ssh_target() {
local deploy_dir="$1"
local instance_type="$2"

case "${instance_type}" in
aws)
local ssh_user host_ip
ssh_user=$(cat "${deploy_dir}/aws-hypervisor/instance-data/ssh_user")
host_ip=$(cat "${deploy_dir}/aws-hypervisor/instance-data/public_address")
echo "${ssh_user}@${host_ip}"
;;
baremetal)
# Read the first metal_machine host from the inventory
grep -A1 '^\[metal_machine\]' "${deploy_dir}/openshift-clusters/inventory.ini" \
| tail -1 | awk '{print $1}'
Comment on lines +51 to +54

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't derive the bare-metal SSH target from the inventory alias alone.

This branch returns the first field under [metal_machine], which is often just the Ansible host alias. If the inventory uses ansible_host= and/or ansible_user=, downstream commands will SSH to the wrong host or user.

💡 Suggested fix
         baremetal)
-            # Read the first metal_machine host from the inventory
-            grep -A1 '^\[metal_machine\]' "${deploy_dir}/openshift-clusters/inventory.ini" \
-                | tail -1 | awk '{print $1}'
+            awk '
+                /^\[metal_machine\]$/ { in_section=1; next }
+                /^\[/ { in_section=0 }
+                in_section && NF {
+                    host=$1; user=""; addr=""
+                    for (i=2; i<=NF; i++) {
+                        if ($i ~ /^ansible_user=/) { sub(/^ansible_user=/, "", $i); user=$i }
+                        if ($i ~ /^ansible_host=/) { sub(/^ansible_host=/, "", $i); addr=$i }
+                    }
+                    if (addr == "") addr=host
+                    print (user ? user "@" : "") addr
+                    exit
+                }
+            ' "${deploy_dir}/openshift-clusters/inventory.ini"
             ;;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/openshift-clusters/scripts/common.sh` around lines 51 - 54, The
baremetal) case currently returns the first whitespace field under
[metal_machine], which may be just an Ansible alias; update the logic to read
the full inventory line for the first host under [metal_machine] from
"${deploy_dir}/openshift-clusters/inventory.ini", parse any ansible_host and
ansible_user key=value pairs (falling back to the alias for host and to the
current user if ansible_user is absent), and output a proper SSH target in the
form user@host (or host if no user). Modify the baremetal) branch to detect and
prefer ansible_host and ansible_user when present, and only use the alias as a
fallback so downstream SSH commands connect to the correct host/user.
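
For comparison, the same parsing idea can be tried against a throwaway inventory. This is a sketch rather than the patch itself: `ssh_target_from_inventory` is a hypothetical helper name, the comment-line skip is an addition that the suggestion above does not contain, and the host alias and address are placeholders. It assumes one host per line with space-separated `key=value` pairs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the SSH target for the first host listed under
# [metal_machine], preferring ansible_host/ansible_user over the alias.
ssh_target_from_inventory() {
    awk '
        /^\[metal_machine\]$/ { in_section = 1; next }
        /^\[/                 { in_section = 0 }
        in_section && NF && $1 !~ /^[#;]/ {
            user = ""
            addr = $1                      # fall back to the alias
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^ansible_user=/) user = substr($i, length("ansible_user=") + 1)
                if ($i ~ /^ansible_host=/) addr = substr($i, length("ansible_host=") + 1)
            }
            target = addr
            if (user != "") target = user "@" addr
            print target
            exit                           # only the first host is used
        }
    ' "$1"
}

# Throwaway inventory with placeholder values.
inventory="$(mktemp)"
cat > "${inventory}" <<'EOF'
[metal_machine]
bm1 ansible_host=192.0.2.10 ansible_user=core

[metal_machine:vars]
ansible_become_password=""
EOF

ssh_target_from_inventory "${inventory}"
rm -f "${inventory}"
```

The sample inventory yields `core@192.0.2.10`. An entry in the form that `bm-init.sh` writes (`root@host` followed only by `ansible_ssh_extra_args`) has neither key, so the alias comes back unchanged.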

;;
esac
}
9 changes: 6 additions & 3 deletions deploy/openshift-clusters/scripts/deploy-cluster.sh
@@ -67,9 +67,12 @@ if [[ "${METHOD}" != "ipi" && "${METHOD}" != "agent" && "${METHOD}" != "kcli" ]]
     exit 1
 fi

-# Check if instance data exists
-if [[ ! -f "${DEPLOY_DIR}/aws-hypervisor/instance-data/aws-instance-id" ]]; then
-    echo "Error: No instance found. Please run 'make deploy' first."
+# Source shared helpers
+# shellcheck source=common.sh
+source "${SCRIPT_DIR}/common.sh"
+
+# Check if instance data exists (EC2 or bare metal)
+if ! check_instance "${DEPLOY_DIR}" >/dev/null; then
     exit 1
Comment on lines +74 to 76

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject non-agent methods on bare metal.

This new gate lets bare-metal runs through for every method, but this PR only adds bare-metal support for agent-based installs. --method ipi and --method kcli should fail here instead of reaching incompatible playbooks.

💡 Suggested fix
-# Check if instance data exists (EC2 or bare metal)
-if ! check_instance "${DEPLOY_DIR}" >/dev/null; then
-    exit 1
-fi
+# Check if instance data exists (EC2 or bare metal)
+INSTANCE_TYPE=$(check_instance "${DEPLOY_DIR}") || exit 1
+
+if [[ "${INSTANCE_TYPE}" == "baremetal" && "${METHOD}" != "agent" ]]; then
+    echo "Error: bare metal deployments only support --method agent"
+    exit 1
+fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/openshift-clusters/scripts/deploy-cluster.sh` around lines 74 - 76,
Detect when the deployment is bare-metal after calling check_instance
"${DEPLOY_DIR}" and, if so, verify the install method (e.g., the variable used
for CLI flag --method, such as METHOD or INSTALL_METHOD); if the method is not
the agent-based option, immediately exit with an error and message rejecting
non-agent methods (specifically block "--method ipi" and "--method kcli" for
bare-metal). Implement this gate in deploy-cluster.sh right after the
check_instance call so incompatible playbooks never run by consulting the
existing method variable and returning non-zero when a bare-metal + non-agent
combination is found.
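
The gate described above can be exercised in isolation. In this sketch, `validate_method_for_instance` is a hypothetical function name introduced for illustration; the PR's suggested fix performs the same check inline using `INSTANCE_TYPE` and `METHOD`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only when the instance type supports the method.
# Per the review comment, bare metal is assumed to support agent installs only.
validate_method_for_instance() {
    vm_instance_type="$1"
    vm_method="$2"
    if [ "${vm_instance_type}" = "baremetal" ] && [ "${vm_method}" != "agent" ]; then
        echo "Error: bare metal deployments only support --method agent (got: ${vm_method})" >&2
        return 1
    fi
    return 0
}

# Walk through the three methods that deploy-cluster.sh accepts.
for candidate in ipi agent kcli; do
    if validate_method_for_instance baremetal "${candidate}" 2>/dev/null; then
        echo "baremetal + ${candidate}: allowed"
    else
        echo "baremetal + ${candidate}: rejected"
    fi
done
```

Only the `agent` combination passes for bare metal, while AWS instances are unaffected because the check is skipped for any other instance type.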

fi

11 changes: 7 additions & 4 deletions deploy/openshift-clusters/scripts/full-clean.sh
@@ -1,17 +1,20 @@
 #!/bin/bash

 # Get the directory where this script is located
-SCRIPT_DIR=$(dirname "$0")
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 # Get the deploy directory (two levels up from scripts)
 DEPLOY_DIR="$(cd "${SCRIPT_DIR}/../.." && pwd)"

 set -o nounset
 set -o errexit
 set -o pipefail

-# Check if instance data exists
-if [[ ! -f "${DEPLOY_DIR}/aws-hypervisor/instance-data/aws-instance-id" ]]; then
-    echo "Error: No instance found. Please run 'make deploy' first."
+# Source shared helpers
+# shellcheck source=common.sh
+source "${SCRIPT_DIR}/common.sh"
+
+# Check if instance data exists (EC2 or bare metal)
+if ! check_instance "${DEPLOY_DIR}" >/dev/null; then
     exit 1
 fi

11 changes: 7 additions & 4 deletions deploy/openshift-clusters/scripts/patch-nodes.sh
@@ -1,17 +1,20 @@
 #!/bin/bash

 # Get the directory where this script is located
-SCRIPT_DIR=$(dirname "$0")
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 # Get the deploy directory (two levels up from scripts)
 DEPLOY_DIR="$(cd "${SCRIPT_DIR}/../.." && pwd)"

 set -o nounset
 set -o errexit
 set -o pipefail

-# Check if instance data exists
-if [[ ! -f "${DEPLOY_DIR}/aws-hypervisor/instance-data/aws-instance-id" ]]; then
-    echo "Error: No instance found. Please run 'make deploy' first."
+# Source shared helpers
+# shellcheck source=common.sh
+source "${SCRIPT_DIR}/common.sh"
+
+# Check if instance data exists (EC2 or bare metal)
+if ! check_instance "${DEPLOY_DIR}" >/dev/null; then
     exit 1
 fi

65 changes: 32 additions & 33 deletions deploy/openshift-clusters/scripts/redeploy-cluster.sh
@@ -4,14 +4,12 @@
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 DEPLOY_DIR="$(cd "${SCRIPT_DIR}/../../" && pwd)"

-# Source the instance.env file with absolute path
-# shellcheck source=/dev/null
-source "${DEPLOY_DIR}/aws-hypervisor/instance.env"
+# Source shared helpers
+# shellcheck source=common.sh
+source "${SCRIPT_DIR}/common.sh"

-# Resolve SHARED_DIR to absolute path if it's relative
-if [[ "${SHARED_DIR}" != /* ]]; then
-    export SHARED_DIR="${DEPLOY_DIR}/aws-hypervisor/${SHARED_DIR}"
-fi
+# Set SHARED_DIR (used by check_vm_infrastructure_change)
+export SHARED_DIR="${DEPLOY_DIR}/aws-hypervisor/instance-data"

 set -o nounset
 set -o errexit
@@ -26,7 +24,7 @@ check_vm_infrastructure_change() {
     local previous_installation_method=""
     local previous_status=""

-    echo "Checking VM infrastructure requirements for instance ${INSTANCE_ID}..."
+    echo "Checking VM infrastructure requirements for ${INSTANCE_DISPLAY}..."

     # Read previous state if exists
     if [[ -f "$state_file" ]]; then
@@ -49,7 +47,7 @@ check_vm_infrastructure_change() {
         export current_installation_method="IPI"
     fi

-    echo "Instance: ${INSTANCE_ID}"
+    echo "Instance: ${INSTANCE_DISPLAY}"
     echo "Previous cluster config: ${previous_topology:-none}/${previous_installation_method:-none} (status: ${previous_status:-unknown})"
     echo "Current cluster config: ${current_topology}/${current_installation_method}"

@@ -104,37 +102,38 @@ check_vm_infrastructure_change() {

 # Note: Cluster state is now managed by the Ansible playbook

-# Check if the instance exists and get its ID
-if [[ ! -f "${SHARED_DIR}/aws-instance-id" ]]; then
-    echo "Error: No instance found. Please run 'make deploy' first."
-    exit 1
-fi
+# Detect instance type
+INSTANCE_TYPE=$(check_instance "${DEPLOY_DIR}") || exit 1
+INSTANCE_DISPLAY=$(get_instance_display "${DEPLOY_DIR}" "${INSTANCE_TYPE}")

-INSTANCE_ID=$(cat "${SHARED_DIR}/aws-instance-id")
-echo "Redeploying OpenShift cluster on instance ${INSTANCE_ID}..."
+echo "Redeploying OpenShift cluster on ${INSTANCE_DISPLAY}..."

-# Check current instance state
-INSTANCE_STATE=$(aws --region "${REGION}" ec2 describe-instances --instance-ids "${INSTANCE_ID}" --query 'Reservations[0].Instances[0].State.Name' --output text --no-cli-pager)
+if [[ "${INSTANCE_TYPE}" == "aws" ]]; then
+    # shellcheck source=/dev/null
+    source "${DEPLOY_DIR}/aws-hypervisor/instance.env"

-if [[ "${INSTANCE_STATE}" != "running" ]]; then
-    echo "Error: Instance is not running (state: ${INSTANCE_STATE})"
-    echo "Cannot redeploy cluster on a stopped instance."
-    exit 1
-fi
+    INSTANCE_ID="${INSTANCE_DISPLAY}"
+    INSTANCE_STATE=$(aws --region "${REGION}" ec2 describe-instances --instance-ids "${INSTANCE_ID}" --query 'Reservations[0].Instances[0].State.Name' --output text --no-cli-pager)

-# Get the instance IP
-HOST_PUBLIC_IP=$(aws --region "${REGION}" ec2 describe-instances --instance-ids "${INSTANCE_ID}" --query 'Reservations[0].Instances[0].PublicIpAddress' --output text --no-cli-pager)
+    if [[ "${INSTANCE_STATE}" != "running" ]]; then
+        echo "Error: Instance is not running (state: ${INSTANCE_STATE})"
+        echo "Cannot redeploy cluster on a stopped instance."
+        exit 1
+    fi

-if [[ "${HOST_PUBLIC_IP}" == "null" || "${HOST_PUBLIC_IP}" == "" ]]; then
-    echo "Error: Could not determine instance public IP"
-    exit 1
-fi
+    HOST_PUBLIC_IP=$(aws --region "${REGION}" ec2 describe-instances --instance-ids "${INSTANCE_ID}" --query 'Reservations[0].Instances[0].PublicIpAddress' --output text --no-cli-pager)

-echo "Connecting to instance at ${HOST_PUBLIC_IP}..."
+    if [[ "${HOST_PUBLIC_IP}" == "null" || "${HOST_PUBLIC_IP}" == "" ]]; then
+        echo "Error: Could not determine instance public IP"
+        exit 1
+    fi

-# Update SSH config
-echo "Updating SSH config for aws-hypervisor..."
-(cd "${DEPLOY_DIR}/aws-hypervisor" && go run main.go -k aws-hypervisor -h "$HOST_PUBLIC_IP")
+    echo "Connecting to instance at ${HOST_PUBLIC_IP}..."
+
+    # Update SSH config
+    echo "Updating SSH config for aws-hypervisor..."
+    (cd "${DEPLOY_DIR}/aws-hypervisor" && go run main.go -k aws-hypervisor -h "$HOST_PUBLIC_IP")
+fi

 # Interactive mode selection
 echo ""