AGENT-580: Add agent gather bootstrap command#10666
Conversation
|
@zaneb: This pull request references AGENT-580 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a gather subcommand for agent bootstrap data collection, propagates a shared gather ID into remote archive naming, removes SSH-key lookup from rendezvous and cluster setup, and switches ignition to the bootstrap SSH key pair. ChangesAgent Gather and SSH Key Flow
Important Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (1 error, 1 warning)
✅ Passed checks (13 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cmd/openshift-install/agent/gather.go`:
- Around line 36-51: The Run function in newAgentGatherCmd uses logrus.Fatal
after runAgentGatherCmd fails, which exits immediately and bypasses the deferred
cleanup() from SetupFileHook. Change this command to use RunE (or otherwise
return the error) so the defer always executes, and let the caller handle
logging or exit behavior instead of calling Fatal inside the command.
In `@data/data/agent/files/usr/local/bin/agent-gather`:
- Line 146: The default GATHER_ID format in agent-gather now includes %Z, which
can introduce timezone symbols into generated filenames. Update the GATHER_ID
assignment to match the previous timestamp-only format used by
OUTPUT_FILE/bs_gather naming, and keep the change localized to the GATHER_ID
generation logic so manual/default invocations remain consistent.
In `@pkg/agent/gather.go`:
- Around line 18-47: The SSH gather flow in PullAgentGatherArchive can block
indefinitely because the client connection, command execution, and file transfer
have no deadline. Add a timeout to the rendezvous host path by threading a
context or explicit timeout into gatherssh.NewClient, gatherssh.Run, and
gatherssh.PullFileTo, and ensure the timeout is enforced for the
ssh.Dial/session/SFTP operations. Keep the change localized to
PullAgentGatherArchive and the gatherssh helpers it calls so all three stages
fail fast on stalled hosts or networks.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 121f2fea-1fbc-484c-bd86-ee97a73c4d44
📒 Files selected for processing (4)
cmd/openshift-install/agent.gocmd/openshift-install/agent/gather.godata/data/agent/files/usr/local/bin/agent-gatherpkg/agent/gather.go
| func PullAgentGatherArchive(rendezvousIP string, sshKeys []string, directory, gatherID string) (string, error) { | ||
| logrus.Info("Pulling agent-gather data from the rendezvous host") | ||
|
|
||
| address := net.JoinHostPort(rendezvousIP, strconv.Itoa(22)) | ||
| client, err := gatherssh.NewClient("core", address, sshKeys) | ||
| if err != nil { | ||
| return "", fmt.Errorf("failed to SSH to rendezvous host %s: %w", rendezvousIP, err) | ||
| } | ||
|
|
||
| // Run agent-gather with -i so it writes to a predictable path | ||
| cmd := fmt.Sprintf("sudo /usr/local/bin/agent-gather -i %s", gatherID) | ||
| if err := gatherssh.Run(client, cmd); err != nil { | ||
| return "", fmt.Errorf("failed to run agent-gather on rendezvous host: %w", err) | ||
| } | ||
|
|
||
| archiveName := fmt.Sprintf("agent-gather-%s.tar.xz", gatherID) | ||
| remoteFile := path.Join("/home/core", archiveName) | ||
| localFile := filepath.Join(directory, archiveName) | ||
| if err := gatherssh.PullFileTo(client, remoteFile, localFile); err != nil { | ||
| return "", fmt.Errorf("failed to pull agent-gather archive: %w", err) | ||
| } | ||
|
|
||
| absPath, err := filepath.Abs(localFile) | ||
| if err != nil { | ||
| return "", fmt.Errorf("failed to get absolute path: %w", err) | ||
| } | ||
|
|
||
| logrus.Info("Successfully pulled agent-gather data") | ||
| return absPath, nil | ||
| } |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy lift
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
ast-grep outline pkg/gather/ssh
rg -n -A5 'func NewClient|func Run|func PullFileTo' pkg/gather/sshRepository: openshift/installer
Length of output: 1335
🏁 Script executed:
#!/bin/bash
sed -n '1,220p' pkg/gather/ssh/ssh.goRepository: openshift/installer
Length of output: 4360
Add a timeout to the SSH gather path. ssh.Dial, sess.Run, and the SFTP download here have no deadline, so a stalled rendezvous host or network issue can block the CLI indefinitely.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/agent/gather.go` around lines 18 - 47, The SSH gather flow in
PullAgentGatherArchive can block indefinitely because the client connection,
command execution, and file transfer have no deadline. Add a timeout to the
rendezvous host path by threading a context or explicit timeout into
gatherssh.NewClient, gatherssh.Run, and gatherssh.PullFileTo, and ensure the
timeout is enforced for the ssh.Dial/session/SFTP operations. Keep the change
localized to PullAgentGatherArchive and the gatherssh helpers it calls so all
three stages fail fast on stalled hosts or networks.
Source: Path instructions
Add a -i flag to agent-gather that sets the gather ID, which determines the output filename. A timestamp-based ID is generated by default. This allows callers to know in advance where to find the output file, following the same pattern used by installer-gather.sh. Assisted-by: Claude Code
6afc374 to
557eeeb
Compare
| } | ||
|
|
||
| func runAgentGatherCmd(directory string) (string, error) { | ||
| rendezvousIP, _, err := agentpkg.FindRendezvouIPAndSSHKeyFromAssetStore(directory) |
There was a problem hiding this comment.
Note a blocking point, but for this particular command I think we should give also the possibility to the user to specify both the rendezvousIP and sshkey location, so that it could be executed from a location where the asset store was not present
There was a problem hiding this comment.
I'm not sure that would really buy anything over just using ssh. It's the same amount of config. The advantage of this command is that it Just Works (provided you run it from the installer directory).
| // Run agent-gather with -i so it writes to a predictable path | ||
| cmd := fmt.Sprintf("sudo /usr/local/bin/agent-gather -i %s", gatherID) | ||
| if err := gatherssh.Run(client, cmd); err != nil { | ||
| return "", fmt.Errorf("failed to run agent-gather on rendezvous host: %w", err) |
There was a problem hiding this comment.
note: potentially this could be a genuine error, ie the agent-gather was not present
There was a problem hiding this comment.
Yeah, this is true and it would be nice to be able to distinguish between the script not existing and another kind of failure. But the current API doesn't give us this kind of insight 🙁
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andfasano The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
6a4049a to
6fcf8ec
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
cmd/openshift-install/agent/gather.go (1)
45-49: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick winThread Cobra’s context instead of using
context.TODO().This command performs asset-store work and network gather setup; passing
cmd.Context()keeps cancellation/timeouts available instead of severing them at Line 64. As per path instructions, Go code should usecontext.Contextfor cancellation and timeouts.Proposed fix
- Run: func(_ *cobra.Command, _ []string) { + Run: func(cmd *cobra.Command, _ []string) { cleanup := command.SetupFileHook(command.RootOpts.Dir) defer cleanup() - bundlePath, err := runAgentGatherCmd(command.RootOpts.Dir) + bundlePath, err := runAgentGatherCmd(cmd.Context(), command.RootOpts.Dir)-func runAgentGatherCmd(directory string) (string, error) { - ctx := context.TODO() - +func runAgentGatherCmd(ctx context.Context, directory string) (string, error) {Also applies to: 63-64
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmd/openshift-install/agent/gather.go` around lines 45 - 49, The agent-gather command is severing cancellation/timeouts by using a TODO context instead of the Cobra command context. Update the Run path in the cobra command to thread cmd.Context() through into runAgentGatherCmd and onward to the asset-store/network gather setup helpers, so the existing context cancellation is preserved. Check the runAgentGatherCmd flow and any helpers it calls that currently accept or create a context, and replace context.TODO() with the passed-in context where those operations are started.Source: Path instructions
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cmd/openshift-install/agent/gather.go`:
- Around line 77-94: The temp bootstrap SSH key path is being stored in the
package-level agentGatherOpts.sshKeys even though the file is removed when
gatherSSHKeys returns, so later runs can inherit a stale path. Update
gatherSSHKeys to build and use a local sshKeys slice for this invocation, append
the temp file name there, and pass that local list onward instead of mutating
the shared agentGatherOpts state.
- Around line 40-103: Add focused unit tests around newAgentGatherCmd and
runAgentGatherCmd to cover the agent gather flow without relying on integration
tests. Mock assetstore.NewStore, agentpkg.FindRendezvousIPFromAssetStore,
store.Fetch, os.CreateTemp, and agentpkg.PullAgentGatherArchive to verify
rendezvous lookup, bootstrap SSH key fallback, temporary key cleanup, and that
agentGatherOpts.sshKeys is wired into the archive pull call. Keep the tests
centered on runAgentGatherCmd and the command Run path so the behavior is
covered even if line numbers shift.
In `@pkg/asset/agent/image/ignition.go`:
- Line 156: The ignition test coverage is missing an assertion for the bootstrap
SSH key alongside the existing authorized key checks. Update the assertions in
ignition_test.go around the ignition generation path to verify that
core.SSHAuthorizedKeys contains BootstrapSSHKeyPair.Public() and still includes
infraEnv.Spec.SSHAuthorizedKey, using the relevant ignition setup helpers and
the SSHAuthorizedKeys field to locate the check.
---
Nitpick comments:
In `@cmd/openshift-install/agent/gather.go`:
- Around line 45-49: The agent-gather command is severing cancellation/timeouts
by using a TODO context instead of the Cobra command context. Update the Run
path in the cobra command to thread cmd.Context() through into runAgentGatherCmd
and onward to the asset-store/network gather setup helpers, so the existing
context cancellation is preserved. Check the runAgentGatherCmd flow and any
helpers it calls that currently accept or create a context, and replace
context.TODO() with the passed-in context where those operations are started.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 2331c786-c1b4-4b7f-b3a7-06700f14bcc0
📒 Files selected for processing (7)
cmd/openshift-install/agent.gocmd/openshift-install/agent/gather.gocmd/openshift-install/agent/waitfor.gopkg/agent/cluster.gopkg/agent/gather.gopkg/agent/rest.gopkg/asset/agent/image/ignition.go
✅ Files skipped from review due to trivial changes (1)
- cmd/openshift-install/agent.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/agent/gather.go
| func newAgentGatherCmd() *cobra.Command { | ||
| cmd := &cobra.Command{ | ||
| Use: "bootstrap", | ||
| Short: "Gather debugging data from the rendezvous host", | ||
| Args: cobra.ExactArgs(0), | ||
| Run: func(_ *cobra.Command, _ []string) { | ||
| cleanup := command.SetupFileHook(command.RootOpts.Dir) | ||
| defer cleanup() | ||
|
|
||
| bundlePath, err := runAgentGatherCmd(command.RootOpts.Dir) | ||
| if err != nil { | ||
| logrus.Fatal(err) | ||
| } | ||
| logrus.Infof("Agent gather logs captured here %q", bundlePath) | ||
| }, | ||
| } | ||
|
|
||
| cmd.PersistentFlags().StringArrayVar(&agentGatherOpts.sshKeys, "key", []string{}, | ||
| "Path to SSH private keys that should be used for authentication. "+ | ||
| "If no key was provided, SSH private keys from user's environment will be used") | ||
| return cmd | ||
| } | ||
|
|
||
| func runAgentGatherCmd(directory string) (string, error) { | ||
| ctx := context.TODO() | ||
|
|
||
| store, err := assetstore.NewStore(directory) | ||
| if err != nil { | ||
| return "", fmt.Errorf("failed to create asset store: %w", err) | ||
| } | ||
|
|
||
| rendezvousIP, err := agentpkg.FindRendezvousIPFromAssetStore(store) | ||
| if err != nil { | ||
| return "", fmt.Errorf("failed to determine rendezvous host: %w", err) | ||
| } | ||
| logrus.Infof("Rendezvous host IP: %s", rendezvousIP) | ||
|
|
||
| // add the bootstrap SSH key pair to the sshKeys list automatically | ||
| bootstrapSSHKeyPair := &tls.BootstrapSSHKeyPair{} | ||
| if err := store.Fetch(ctx, bootstrapSSHKeyPair); err != nil { | ||
| logrus.Debugf("Failed to fetch bootstrap SSH key pair: %v", err) | ||
| } else { | ||
| tmpfile, err := os.CreateTemp("", "bootstrap-ssh") | ||
| if err != nil { | ||
| return "", err | ||
| } | ||
| defer os.Remove(tmpfile.Name()) | ||
| if _, err := tmpfile.Write(bootstrapSSHKeyPair.Private()); err != nil { | ||
| return "", err | ||
| } | ||
| if err := tmpfile.Close(); err != nil { | ||
| return "", err | ||
| } | ||
| agentGatherOpts.sshKeys = append(agentGatherOpts.sshKeys, tmpfile.Name()) | ||
| } | ||
|
|
||
| gatherID := time.Now().Format("20060102150405") | ||
|
|
||
| bundlePath, err := agentpkg.PullAgentGatherArchive(rendezvousIP, agentGatherOpts.sshKeys, directory, gatherID) | ||
| if err != nil { | ||
| return "", fmt.Errorf("failed to gather data from rendezvous host: %w", err) | ||
| } | ||
|
|
||
| return bundlePath, nil |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify tests cover the new gather command/helper wiring without assuming file layout.
rg -n --glob '*_test.go' 'NewGatherCmd|newAgentGatherCmd|runAgentGatherCmd|PullAgentGatherArchive'Repository: openshift/installer
Length of output: 157
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Map the relevant command package and locate related tests.
git ls-files 'cmd/openshift-install/agent/*' 'cmd/openshift-install/*_test.go' | sort
echo '--- outline gather.go ---'
ast-grep outline cmd/openshift-install/agent/gather.go --view expanded || true
echo '--- search for agent gather tests and related helpers ---'
rg -n --glob 'cmd/openshift-install/**/*_test.go' \
'gather|Rendezvous|BootstrapSSHKeyPair|PullAgentGatherArchive|newAgentGatherCmd|runAgentGatherCmd|agentGatherOpts'
echo '--- search for command setup tests elsewhere ---'
rg -n --glob '**/*_test.go' \
'SetupFileHook|PullAgentGatherArchive|FindRendezvousIPFromAssetStore|BootstrapSSHKeyPair'Repository: openshift/installer
Length of output: 757
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Enumerate unit-test files in cmd/openshift-install and look for command-path coverage.
git ls-files 'cmd/openshift-install/**/*_test.go' | sort
echo '--- command-related test coverage ---'
rg -n --glob 'cmd/openshift-install/**/*_test.go' \
'New.*Cmd|new.*Cmd|run.*Cmd|gather|waitfor|agent' \
cmd/openshift-installRepository: openshift/installer
Length of output: 472
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the existing cmd/openshift-install tests to see whether they are unit tests
# and whether they exercise the new gather command path.
ast-grep outline cmd/openshift-install/integration_test.go --view expanded || true
echo '---'
ast-grep outline cmd/openshift-install/internal_integration_test.go --view expanded || true
echo '--- file headers ---'
sed -n '1,140p' cmd/openshift-install/integration_test.go
echo '---'
sed -n '1,180p' cmd/openshift-install/internal_integration_test.goRepository: openshift/installer
Length of output: 6415
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check the test file kinds and whether they import testing-only integration deps.
for f in cmd/openshift-install/integration_test.go cmd/openshift-install/internal_integration_test.go; do
echo "--- $f ---"
sed -n '1,120p' "$f"
doneRepository: openshift/installer
Length of output: 4164
Add unit tests for the agent gather command path cmd/openshift-install/agent/gather.go is only covered by integration tests today; add focused unit coverage for rendezvous lookup, bootstrap-key fallback, temp-key cleanup, and PullAgentGatherArchive wiring.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cmd/openshift-install/agent/gather.go` around lines 40 - 103, Add focused
unit tests around newAgentGatherCmd and runAgentGatherCmd to cover the agent
gather flow without relying on integration tests. Mock assetstore.NewStore,
agentpkg.FindRendezvousIPFromAssetStore, store.Fetch, os.CreateTemp, and
agentpkg.PullAgentGatherArchive to verify rendezvous lookup, bootstrap SSH key
fallback, temporary key cleanup, and that agentGatherOpts.sshKeys is wired into
the archive pull call. Keep the tests centered on runAgentGatherCmd and the
command Run path so the behavior is covered even if line numbers shift.
Source: Coding guidelines
| // add the bootstrap SSH key pair to the sshKeys list automatically | ||
| bootstrapSSHKeyPair := &tls.BootstrapSSHKeyPair{} | ||
| if err := store.Fetch(ctx, bootstrapSSHKeyPair); err != nil { | ||
| logrus.Debugf("Failed to fetch bootstrap SSH key pair: %v", err) | ||
| } else { | ||
| tmpfile, err := os.CreateTemp("", "bootstrap-ssh") | ||
| if err != nil { | ||
| return "", err | ||
| } | ||
| defer os.Remove(tmpfile.Name()) | ||
| if _, err := tmpfile.Write(bootstrapSSHKeyPair.Private()); err != nil { | ||
| return "", err | ||
| } | ||
| if err := tmpfile.Close(); err != nil { | ||
| return "", err | ||
| } | ||
| agentGatherOpts.sshKeys = append(agentGatherOpts.sshKeys, tmpfile.Name()) | ||
| } |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win
Avoid persisting a deleted temp-key path in global options.
Line 93 appends the temp key path to package-level agentGatherOpts.sshKeys, but Line 86 removes that file when the function returns. A later in-process invocation or test can inherit a stale key path. Use a local copy for this run.
Proposed fix
+ sshKeys := append([]string{}, agentGatherOpts.sshKeys...)
+
// add the bootstrap SSH key pair to the sshKeys list automatically
bootstrapSSHKeyPair := &tls.BootstrapSSHKeyPair{}
if err := store.Fetch(ctx, bootstrapSSHKeyPair); err != nil {
logrus.Debugf("Failed to fetch bootstrap SSH key pair: %v", err)
@@
- agentGatherOpts.sshKeys = append(agentGatherOpts.sshKeys, tmpfile.Name())
+ sshKeys = append(sshKeys, tmpfile.Name())
}
gatherID := time.Now().Format("20060102150405")
- bundlePath, err := agentpkg.PullAgentGatherArchive(rendezvousIP, agentGatherOpts.sshKeys, directory, gatherID)
+ bundlePath, err := agentpkg.PullAgentGatherArchive(rendezvousIP, sshKeys, directory, gatherID)Also applies to: 98-98
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cmd/openshift-install/agent/gather.go` around lines 77 - 94, The temp
bootstrap SSH key path is being stored in the package-level
agentGatherOpts.sshKeys even though the file is removed when gatherSSHKeys
returns, so later runs can inherit a stale path. Update gatherSSHKeys to build
and use a local sshKeys slice for this invocation, append the temp file name
there, and pass that local list onward instead of mutating the shared
agentGatherOpts state.
The SSH key loaded from InstallConfig and returned from FindRendezvousIPAndSSHKeyFromAssetStore was stored in NodeZeroRestClient.NodeSSHKey but never read by any code since 82c7aa3. Remove the dead code to reduce confusion. Assisted-by: Claude Code
6fcf8ec to
e23e547
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/agent/rest.go (1)
66-76: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winReturn the underlying load error instead of a generic failure.
Line 75 collapses the actionable load failures into a generic error after only logging details at debug. For gather/wait failures, callers lose the asset name and root cause.
Proposed fix
if agentConfigError != nil { - logrus.Debug(errors.Wrapf(agentConfigError, "failed to load %s", agentConfigAsset.Name())) + return "", errors.Wrapf(agentConfigError, "failed to load %s", agentConfigAsset.Name()) } if manifestError != nil { - logrus.Debug(errors.Wrapf(manifestError, "failed to load %s", agentManifestsAsset.Name())) + return "", errors.Wrapf(manifestError, "failed to load %s", agentManifestsAsset.Name()) } if agentHostsError != nil { - logrus.Debug(errors.Wrapf(agentHostsError, "failed to load %s", agentHostsAsset.Name())) - } - if agentConfigError != nil || manifestError != nil || agentHostsError != nil { - return "", errors.New("failed to load AgentConfig, NMStateConfig, or AgentHosts") + return "", errors.Wrapf(agentHostsError, "failed to load %s", agentHostsAsset.Name()) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/agent/rest.go` around lines 66 - 76, The load path in rest.go is swallowing the actionable failure by returning a generic error after only debug logging the specific causes. Update the error handling in the Agent config loading flow to return the underlying wrapped load error instead of the broad "failed to load AgentConfig, NMStateConfig, or AgentHosts" message, using the existing agentConfigError, manifestError, and agentHostsError values from the load block so callers can see the asset name and root cause.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@pkg/agent/rest.go`:
- Around line 66-76: The load path in rest.go is swallowing the actionable
failure by returning a generic error after only debug logging the specific
causes. Update the error handling in the Agent config loading flow to return the
underlying wrapped load error instead of the broad "failed to load AgentConfig,
NMStateConfig, or AgentHosts" message, using the existing agentConfigError,
manifestError, and agentHostsError values from the load block so callers can see
the asset name and root cause.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: d91ca3ae-df66-40af-b82f-22ef82fa5171
📒 Files selected for processing (8)
cmd/openshift-install/agent.gocmd/openshift-install/agent/gather.gocmd/openshift-install/agent/waitfor.gopkg/agent/cluster.gopkg/agent/gather.gopkg/agent/rest.gopkg/asset/agent/image/ignition.gopkg/nodejoiner/monitoraddnodes.go
🚧 Files skipped from review as they are similar to previous changes (6)
- cmd/openshift-install/agent.go
- cmd/openshift-install/agent/waitfor.go
- pkg/asset/agent/image/ignition.go
- pkg/agent/gather.go
- pkg/agent/cluster.go
- cmd/openshift-install/agent/gather.go
Add the generated bootstrap SSH public key to the core user's SSHAuthorizedKeys in the agent ignition, alongside the user-provided key from the InfraEnv spec. This enables the installer to SSH to the rendezvous host using the bootstrap key without requiring the user to provide a key. This key does not persist into the cluster, it is only active while the agent ISO is booted. Assisted-by: Claude Code
Add "openshift-install agent gather bootstrap" command to collect debugging data from the rendezvous host during agent-based installations. The command determines the rendezvous IP from the asset store, SSHs to the host, runs agent-gather, and pulls the resulting archive locally. As with "openshift-install gather bootstrap", the bootstrap SSH key pair is loaded automatically from the asset store, additional keys can be specified with the --key flag, and keys from the user's SSH agent or ~/.ssh/ are always included as well. Assisted-by: Claude Code
e23e547 to
f175ad3
Compare
|
@zaneb: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Add "openshift-install agent gather bootstrap" command to collect
debugging data from the rendezvous host during agent-based
installations. The command determines the rendezvous IP from the
asset store, SSHs to the host, runs agent-gather, and pulls the
resulting archive locally.
Accepts an optional --key flag to specify SSH private keys for
authentication. If no key is provided, keys from the user's
environment are used.
Summary by CodeRabbit
openshift-install agent gathercommand with abootstrapoption to collect agent diagnostics from the rendezvous host.-i IDoption to use a consistent gather identifier for output naming.