OCPBUGS-86643: Skip control-plane label migration on External topolog…#6205
OCPBUGS-86643: Skip control-plane label migration on External topolog…#6205harsh-thakare wants to merge 1 commit into
Conversation
…y clusters HostedCluster (HyperShift) worker nodes must not receive the node-role.kubernetes.io/control-plane label during upgrade. External topology clusters have no in-cluster control plane; MCO master reconciliation should not run label migration there. https://issues.redhat.com/browse/OCPBUGS-86643 Signed-off-by: Harshal Thakare <hthakare@redhat.com>
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@harsh-thakare: This pull request references Jira Issue OCPBUGS-86643, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Walkthrough
ChangesExternalTopology guard in
OCB off-cluster + on-cluster layering coexistence e2e test
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 13 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (13 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Warning |
|
Hi @harsh-thakare. Thanks for your PR. I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with Regular contributors should join the org to skip this step. Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: harsh-thakare The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
pkg/controller/node/node_controller_test.go (1)
1639-1644: ⚡ Quick winHandle both node patch and update actions in the assertion
Line 1639 only checks
core.PatchAction; if node mutation is emitted asupdate, this test can fail even when behavior is correct. Consider accepting both verbs and validating label presence from either payload/object.Suggested test adjustment
patched := false for _, action := range filterInformerActions(f.kubeclient.Actions()) { - patchAction, ok := action.(core.PatchAction) - if !ok || action.GetResource().Resource != "nodes" { - continue - } - if strings.Contains(string(patchAction.GetPatch()), ControlPlaneLabel) { - patched = true - break - } + if action.GetResource().Resource != "nodes" { + continue + } + switch a := action.(type) { + case core.PatchAction: + if strings.Contains(string(a.GetPatch()), ControlPlaneLabel) { + patched = true + } + case core.UpdateAction: + if n, ok := a.GetObject().(*corev1.Node); ok { + _, patched = n.Labels[ControlPlaneLabel] + } + } + if patched { + break + } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/controller/node/node_controller_test.go` around lines 1639 - 1644, The test assertion at line 1639 only checks for core.PatchAction when verifying node mutations, but if the controller emits an update action instead of a patch action, the test fails incorrectly. Modify the type assertion to accept both core.PatchAction and core.UpdateAction (or whichever update action type is used). For PatchAction, continue checking the patch content via GetPatch(), and for UpdateAction, check the ControlPlaneLabel in the object field being updated. Set patched to true when the label is found in either action type's relevant payload.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/extended-priv/mco_ocb.go`:
- Line 281: The deferred call to DisableOCL(mosc) on line 281 is ignoring the
error return value that the function provides. Replace the simple defer
statement with a deferred anonymous function that explicitly captures and
handles the error returned by DisableOCL(mosc) by either logging it, storing it
for assertion, or using an error handling pattern consistent with the test's
cleanup strategy, ensuring that any cleanup failures are properly tracked rather
than silently dropped.
---
Nitpick comments:
In `@pkg/controller/node/node_controller_test.go`:
- Around line 1639-1644: The test assertion at line 1639 only checks for
core.PatchAction when verifying node mutations, but if the controller emits an
update action instead of a patch action, the test fails incorrectly. Modify the
type assertion to accept both core.PatchAction and core.UpdateAction (or
whichever update action type is used). For PatchAction, continue checking the
patch content via GetPatch(), and for UpdateAction, check the ControlPlaneLabel
in the object field being updated. Set patched to true when the label is found
in either action type's relevant payload.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 27ce11ea-e4d5-472b-8348-57644378bbb6
📒 Files selected for processing (3)
pkg/controller/node/node_controller.gopkg/controller/node/node_controller_test.gotest/extended-priv/mco_ocb.go
| exutil.By("Enable on-cluster layering (OCL) with a containerFile") | ||
| mosc, err := CreateMachineOSConfigUsingExternalOrInternalRegistry(oc.AsAdmin(), MachineConfigNamespace, mcp.GetName(), mcp.GetName(), containerFiles) | ||
| o.Expect(err).NotTo(o.HaveOccurred(), "Error creating MachineOSConfig for %s", mcp.GetName()) | ||
| defer DisableOCL(mosc) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify the ignored error return and the function signature
rg -n 'defer\s+DisableOCL\(' test/extended-priv/mco_ocb.go
rg -n '^func\s+DisableOCL\(.+\)\s+error' test/extended-priv/mco_ocb.goRepository: openshift/machine-config-operator
Length of output: 243
🏁 Script executed:
# Get context on which lines are new in the PR by checking the ranges
# The review mentions lines 238-337 as new test code
# Check if lines 176, 198 are before the new block and 382 is after
sed -n '170,390p' test/extended-priv/mco_ocb.go | cat -n | grep -E '(176|198|281|382):|defer.*DisableOCL'Repository: openshift/machine-config-operator
Length of output: 204
🏁 Script executed:
# Also check how errors are handled with other deferred calls in the file to understand the pattern
rg -B2 -A2 'defer.*\(' test/extended-priv/mco_ocb.go | head -60Repository: openshift/machine-config-operator
Length of output: 3051
Handle deferred DisableOCL errors instead of dropping them.
Line 281 defers DisableOCL(mosc) but ignores its error return. If cleanup fails, the spec can leak OCL state and destabilize later tests. Assert this deferred cleanup result explicitly.
Proposed fix
- defer DisableOCL(mosc)
+ defer func() {
+ o.Expect(DisableOCL(mosc)).To(o.Succeed(), "failed to disable OCL for pool %s", mcp.GetName())
+ }()Go guideline: **/*.go — "Never ignore error returns".
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| defer DisableOCL(mosc) | |
| defer func() { | |
| o.Expect(DisableOCL(mosc)).To(o.Succeed(), "failed to disable OCL for pool %s", mcp.GetName()) | |
| }() |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/extended-priv/mco_ocb.go` at line 281, The deferred call to
DisableOCL(mosc) on line 281 is ignoring the error return value that the
function provides. Replace the simple defer statement with a deferred anonymous
function that explicitly captures and handles the error returned by
DisableOCL(mosc) by either logging it, storing it for assertion, or using an
error handling pattern consistent with the test's cleanup strategy, ensuring
that any cleanup failures are properly tracked rather than silently dropped.
Source: Coding guidelines
There was a problem hiding this comment.
The change in test/extended-priv/mco_ocb.go appears
unrelated to this PR. Could you please let me know the reason behind this file?
Jira
https://issues.redhat.com/browse/OCPBUGS-86643
Problem
After upgrading HyperShift HostedClusters from 4.20.x to 4.21.x, dedicated
worker nodes incorrectly receive
node-role.kubernetes.io/control-plane.Hub compact nodes that are real control-plane nodes are expected to carry
both
masterandcontrol-planelabels; guest HostedCluster worker nodesmust not.
Customer environment:
NodePoolobjects exist on the HUB (clustersnamespace), not inside guest clusterscontrol-planelabel observed on existing HostedCluster worker nodesRoot cause
MCO
reconcileMaster()callsupdateMasterNodeControlPlaneLabel()(added forOCPBUGS-58180) to migrate legacy
master→control-planelabels. This runswithout excluding External topology clusters (HyperShift guest clusters),
where there is no in-cluster control plane.
Fix
Skip
reconcileMaster()whenControllerConfig.spec.infra.status.controlPlaneTopology == External.Files changed
pkg/controller/node/node_controller.go— guard inreconcileMaster()pkg/controller/node/node_controller_test.go— unit testsLocal testing
Environment: macOS, branch ocpbugs-86643-skip-cp-label-external-topology, commit 73db7395
Summary by CodeRabbit
Bug Fixes
Tests