OCPBUGS-86719: Use zero-downtime rollout strategy for console pods by asadawar · Pull Request #1168 · openshift/console-operator

asadawar · 2026-05-29T10:54:36Z

Summary

- Change the console deployment rollout strategy from maxSurge=3, maxUnavailable=1 to maxSurge=1, maxUnavailable=0 on 3+ node topologies (HighlyAvailable, External+HA), ensuring no old pod is terminated until its replacement passes readiness checks
- On 2-node topologies (DualReplica, HighlyAvailableArbiter), keep maxUnavailable=1 with maxSurge reduced from 3 to 1 to avoid rollout deadlock caused by required pod anti-affinity
- Add test coverage for DualReplica and HighlyAvailableArbiter topology strategies

Why this approach

Three approaches were considered:

1. maxUnavailable=0 for all HA topologies (rejected)
On DualReplica (2 masters, 2 replicas) and HighlyAvailableArbiter (2 full masters + 1 arbiter) clusters, the console deployment uses RequiredDuringSchedulingIgnoredDuringExecution pod anti-affinity on kubernetes.io/hostname. When every eligible node already runs a console pod, the scheduler cannot place a surge pod. With maxUnavailable=0, no old pod can be terminated either, causing a rollout deadlock that stalls until ProgressDeadlineExceeded (10 minutes). This approach was rejected because it would break recently added DualReplica support (PR #1151, merged 2026-05-07).

2. Keep maxUnavailable=1 for all topologies, only reduce maxSurge (rejected)
Reducing maxSurge from 3 to 1 aligns with other operators (CMO monitoring-plugin uses maxUnavailable=1 with default maxSurge) but does not fix the reported bug. With maxUnavailable=1, Kubernetes is still allowed to terminate one old pod before its replacement is ready, causing the console flap. This approach was rejected because it does not address the root cause.

3. Topology-aware strategy (chosen)
Use maxUnavailable=0 on topologies where a free node is available for the surge pod (HighlyAvailable with 3+ masters, External+HA with multiple workers), and maxUnavailable=1 on constrained topologies (DualReplica, HighlyAvailableArbiter) where rollout deadlock is possible. This fixes the bug for the most common topology while preserving correct behavior on constrained clusters.

For the HighlyAvailableArbiter case, the conservative choice (maxUnavailable=1) was made because arbiter nodes may have taints or resource constraints that prevent scheduling console pods, effectively making it a 2-node topology for console scheduling. Maintainers familiar with arbiter node scheduling can adjust this if arbiter nodes are known to be eligible.
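
The topology-aware choice described above can be sketched in a few lines of Go. This is a minimal illustration, not the operator's actual code: the `rollingUpdate` struct and the `strategyFor` function are hypothetical names, topologies are passed as plain strings rather than the openshift/api types, and single-node handling is not modeled.

```go
package main

import "fmt"

// rollingUpdate is a hypothetical stand-in for the rolling update
// parameters of a Deployment, using absolute pod counts.
type rollingUpdate struct {
	MaxSurge       int
	MaxUnavailable int
}

// strategyFor picks rollout parameters from the control-plane topology,
// following the rule stated in the PR description: constrained 2-node
// topologies keep maxUnavailable=1, everything else uses maxUnavailable=0.
func strategyFor(topology string) rollingUpdate {
	switch topology {
	case "DualReplica", "HighlyAvailableArbiter":
		// Required anti-affinity may leave no free node for a surge pod,
		// so one old pod must be allowed to go away first.
		return rollingUpdate{MaxSurge: 1, MaxUnavailable: 1}
	default:
		// A free node is expected, so the surge pod can become Ready
		// before any old pod is terminated.
		return rollingUpdate{MaxSurge: 1, MaxUnavailable: 0}
	}
}

func main() {
	for _, topology := range []string{"HighlyAvailable", "External", "DualReplica", "HighlyAvailableArbiter"} {
		s := strategyFor(topology)
		fmt.Printf("%-24s maxSurge=%d maxUnavailable=%d\n", topology, s.MaxSurge, s.MaxUnavailable)
	}
}
```

Keeping the constrained topologies in an explicit `case` means any topology added later falls into the zero-downtime default, which is only safe if that topology has a spare schedulable node.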

Root cause

The withStrategy function in pkg/console/subresource/deployment/deployment.go:184 set maxSurge=3, maxUnavailable=1 for all HA topologies. These values were introduced in PR #1107 (OCPBUGS-74872) as part of a refactor that moved deployment construction from bindata to Go code, without specific rationale for the strategy values.

With maxUnavailable=1 and 2 replicas, the Kubernetes deployment controller is allowed to terminate one old pod immediately when a rollout starts, even before any new pod is ready. This creates a window (approximately 10-15 seconds based on observed pod startup times) where only one pod serves traffic. During this window:

- The terminating pod is removed from Service endpoints (even though the preStop hook keeps the process alive for 25 seconds)
- New connections are routed only to the single remaining pod
- WebSocket connections to the terminated pod are dropped, causing visible console "flapping"
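
The arithmetic behind this window can be sketched as follows. The `rolloutBounds` helper is a hypothetical name used only for illustration. It assumes maxSurge and maxUnavailable are absolute numbers (not percentages) and applies the documented Deployment semantics: at least replicas minus maxUnavailable pods stay available, and at most replicas plus maxSurge pods exist.

```go
package main

import "fmt"

// rolloutBounds returns the limits that apply during a rolling update
// when maxSurge and maxUnavailable are absolute pod counts.
func rolloutBounds(replicas, maxSurge, maxUnavailable int) (minAvailable, maxTotal int) {
	return replicas - maxUnavailable, replicas + maxSurge
}

func main() {
	const replicas = 2 // the console deployment runs 2 replicas on HA topologies

	settings := []struct {
		name                     string
		maxSurge, maxUnavailable int
	}{
		{"before fix", 3, 1},
		{"after fix", 1, 0},
	}
	for _, s := range settings {
		minAvailable, maxTotal := rolloutBounds(replicas, s.maxSurge, s.maxUnavailable)
		fmt.Printf("%s: at least %d available, at most %d pods in total\n",
			s.name, minAvailable, maxTotal)
	}
}
```

With the old values the lower bound is a single pod, which is what permits the early termination described above. With maxUnavailable=0 the lower bound equals the replica count.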

Cluster verification

Verified on a live OCP 4.22.0-rc.4 vSphere IPI cluster:

Cluster topology:

$ oc get infrastructure cluster -o jsonpath='{.status.controlPlaneTopology}'
HighlyAvailable

$ oc get nodes -l node-role.kubernetes.io/master= -o name
node/master-0
node/master-1
node/master-2

Current strategy (before fix):

$ oc get deployment console -n openshift-console -o jsonpath='{.spec.strategy}'
{"rollingUpdate":{"maxSurge":3,"maxUnavailable":1},"type":"RollingUpdate"}

Pod distribution (2 pods on 2 of 3 masters, 3rd master free for surge):

$ oc get pods -n openshift-console -o wide
NAME                        READY   STATUS    NODE
console-7dfb9f987d-4rbcd    1/1     Running   master-1
console-7dfb9f987d-zc7n4    1/1     Running   master-0

With the fix applied (maxSurge=1, maxUnavailable=0), the rollout behavior would be:

1. New pod created on master-2 (the free node). Total: 3 pods, 2 available.
2. New pod passes readiness check. Total: 3 pods, 3 available.
3. One old pod terminated. Total: 2 pods, 2 available.
4. Second new pod created on the freed node. Total: 3 pods, 2 available.
5. Second new pod passes readiness check. Old pod terminated. Total: 2 pods, 2 available.

At no point does availability drop below 2 (full capacity).
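
The sequence above can be checked with a small simulation. This is a simplified sketch and not the Deployment controller's real algorithm. It assumes one console pod per node (standing in for the required anti-affinity), treats an unschedulable surge pod as simply not created, and lets every new pod become Ready one step after it is created. The `simulate` and `lowest` functions are hypothetical names.

```go
package main

import "fmt"

// lowest returns the smallest of its arguments, clamped at zero.
func lowest(vals ...int) int {
	m := vals[0]
	for _, v := range vals[1:] {
		if v < m {
			m = v
		}
	}
	if m < 0 {
		return 0
	}
	return m
}

// simulate runs a simplified rolling update and reports the lowest number
// of available pods observed and whether the rollout completed.
func simulate(replicas, nodes, maxSurge, maxUnavailable int) (minAvailable int, done bool) {
	old, newReady, newPending := replicas, 0, 0
	minAvailable = replicas
	for step := 0; step < 20; step++ {
		// Pods created in the previous step pass readiness now.
		newReady += newPending
		newPending = 0
		if old == 0 && newReady == replicas {
			return minAvailable, true
		}
		total := old + newReady
		// Scale up: limited by the surge budget, the pods still needed,
		// and the number of nodes without a console pod.
		newPending = lowest(replicas+maxSurge-total, replicas-newReady, nodes-total)
		// Scale down: remove old pods only while availability stays at or
		// above replicas - maxUnavailable.
		old -= lowest(old, old+newReady-(replicas-maxUnavailable))
		if available := old + newReady; available < minAvailable {
			minAvailable = available
		}
	}
	return minAvailable, false // no progress possible: rollout deadlock
}

func main() {
	cases := []struct {
		name                            string
		nodes, maxSurge, maxUnavailable int
	}{
		{"3 nodes, old strategy (3/1)", 3, 3, 1},
		{"3 nodes, new strategy (1/0)", 3, 1, 0},
		{"2 nodes, maxUnavailable=0 (1/0)", 2, 1, 0},
		{"2 nodes, constrained strategy (1/1)", 2, 1, 1},
	}
	for _, c := range cases {
		minAvailable, done := simulate(2, c.nodes, c.maxSurge, c.maxUnavailable)
		fmt.Printf("%-38s min available=%d completed=%v\n", c.name, minAvailable, done)
	}
}
```

Under these assumptions, the 3-node case with maxSurge=1, maxUnavailable=0 never drops below 2 available pods, the old 3/1 values dip to 1, and the 2-node case with maxUnavailable=0 never completes, which matches the deadlock described under the rejected first approach.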

Test plan

- Unit tests pass (make test-unit): all deployment strategy tests updated and passing
- Added new test cases for DualReplica and HighlyAvailableArbiter topologies
- gofmt and govet clean (make check)
- Verified current cluster topology and strategy on live OCP 4.22 cluster

OWNERS

/cc @spadgett @jhadvig @TheRealJon

Bug: https://issues.redhat.com/browse/OCPBUGS-86719

Summary by CodeRabbit

Bug Fixes
- Console deployment uses a topology-aware rolling update strategy to reduce disruption during rollouts.
Tests
- Updated unit tests to cover rolling update behavior across additional topology modes.
Chores / CI
- Extended end-to-end test timeout in the CI test runner to reduce spurious timeouts.

openshift-ci-robot · 2026-05-29T10:54:43Z

@asadawar: This pull request references Jira Issue OCPBUGS-86719, which is invalid:

expected the bug to target the "5.0.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

coderabbitai · 2026-05-29T10:54:56Z

Walkthrough

The PR makes console Deployment rolling update parameters topology-aware: DualReplica and HighlyAvailableArbiter use MaxSurge=1/MaxUnavailable=1; other topologies use MaxSurge=1/MaxUnavailable=0. Tests are updated and expanded for topology cases, and e2e test timeout increased to 40m.

Changes

Topology-aware rolling update strategy

Layer / File(s)	Summary
Topology-aware rolling update strategy implementation `pkg/console/subresource/deployment/deployment.go`	`withStrategy` now sets `Strategy.RollingUpdate` using control-plane topology: `MaxSurge=1`/`MaxUnavailable=1` for `DualReplica` and `HighlyAvailableArbiter`, and `MaxSurge=1`/`MaxUnavailable=0` for other topologies.
Strategy test fixtures and topology-specific test cases `pkg/console/subresource/deployment/deployment_test.go`	Tests add fixtures for `DualReplicaTopologyMode` and `HighlyAvailableArbiterMode`, introduce `zeroDowntimeStrategy` (`1/0`) and a constrained HA strategy (`1/1`), update highly available expectations to use zero-downtime where appropriate, and add cases asserting DualReplica/Arbiter behavior.
Default deployment and downloads test expectations `pkg/console/subresource/deployment/deployment_test.go`	Default, trusted-CA, external topology, and downloads deployment test expectations updated to `maxSurge=1` / `maxUnavailable=0`.
E2E runner timeout `test-e2e.sh`	Increase `go test` timeout from 30m to 40m in the OpenShift CI path and align non-OpenShift branch invocation/echo formatting.

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 13 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The description provides a comprehensive summary, detailed rationale with three approaches evaluated, root cause analysis, verification on a live cluster, and test plan; however, it does not follow the required template structure with sections like 'Analysis/Root cause', 'Solution description', 'Test setup', 'Test cases', and 'Browser conformance'.	Restructure the description to follow the required template: add 'Analysis/Root cause', 'Solution description', 'Test setup', 'Test cases', 'Browser conformance' sections, and include the 'Reviewers and assignees' section with appropriate team tags.
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (13 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically references the Jira issue (OCPBUGS-86719) and accurately summarizes the main change: implementing a zero-downtime rollout strategy for console pods.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	The project uses standard Go testing (not Ginkgo), and all test names in the PR changes are stable and deterministic with no dynamic values like pod names, timestamps, UUIDs, or node names.
Test Structure And Quality	✅ Passed	PR adds standard Go unit tests with table-driven patterns consistent with codebase, maintaining single responsibility and proper assertions via deep.Equal structures.
Microshift Test Compatibility	✅ Passed	No new Ginkgo e2e tests were added in this PR. Changes are limited to production code (deployment strategy) and unit tests (using Go testing package), plus a test runner script timeout update.
Single Node Openshift (Sno) Test Compatibility	✅ Passed	PR adds TestDeploymentsReplicas, a standard Go test (not Ginkgo) that handles SNO via topology-aware replica logic and does not make multi-node assumptions.
Topology-Aware Scheduling Compatibility	✅ Passed	PR checks ControlPlaneTopology before applying constraints; correctly uses maxUnavailable=0 for 3+ nodes and maxUnavailable=1 for 2-node topologies.
Ote Binary Stdout Contract	✅ Passed	PR modifies deployment code, unit tests, and test wrapper script only—none introduce process-level stdout writes violating OTE Binary Stdout Contract; no process-level code in modified files.
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	No new Ginkgo e2e tests are added in this PR. Changes are limited to deployment strategy code, standard Go unit tests, and e2e runner script timeout adjustment. Check is not applicable.
No-Weak-Crypto	✅ Passed	The PR changes are focused on Kubernetes deployment rolling update strategies and test timeouts. No cryptographic operations, weak crypto, or secret comparisons are present in the modified files.
Container-Privileges	✅ Passed	PR introduces no privileged container configs, host access, or privilege escalation. Only modifies RollingUpdate strategy; existing deployments enforce non-root, no-escalation security contexts.
No-Sensitive-Data-In-Logs	✅ Passed	No sensitive data exposed in logs. All logging statements use operational metrics; test-e2e.sh uses single quotes preventing KUBECONFIG credential expansion.


openshift-ci · 2026-05-29T10:54:58Z

Hi @asadawar. Thanks for your PR.

I'm waiting for an openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

jhadvig · 2026-06-04T07:19:57Z

/ok-to-test

jhadvig

@asadawar thank you for the fix 👍
/lgtm
/approve
/cherry-pick release-4.22

openshift-ci · 2026-06-04T07:25:17Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: asadawar, jhadvig

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [jhadvig]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

asadawar · 2026-06-04T10:52:47Z

/test e2e-aws-operator

openshift-ci · 2026-06-04T15:55:00Z

New changes are detected. LGTM label has been removed.

The rollout strategy change (maxUnavailable: 0) makes each deployment rollout take a bit longer since the new pod must be Ready before the old pod is terminated. Across the full test suite this adds enough time to push past the 30m limit, so the e2e test timeout is bumped from 30m to 40m. Assisted-by: Claude Code

asadawar · 2026-06-04T16:02:14Z

The e2e timeout was hitting 30m because the rollout strategy change (maxUnavailable: 0) makes each deployment rollout take a bit longer. The new pod has to be fully Ready before the old one gets terminated, which is the whole point of this PR, but it adds up across the full test suite. Bumped the test timeout from 30m to 40m to give enough headroom.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test-e2e.sh`:
- Line 20: The echo line currently uses single quotes so ${KUBECONFIG} is not
expanded; update the echo in test-e2e.sh (the echo
'KUBERNETES_CONFIG=${KUBECONFIG} go test -timeout 40m -v ./test/e2e/') to use
double quotes so the shell expands ${KUBECONFIG} (i.e., change the quoting
around the echo argument to allow variable substitution).


ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 880576a7-c8ff-473a-b39a-de403c2d7608

📥 Commits

Reviewing files that changed from the base of the PR and between 77df00f and 33be859.

📒 Files selected for processing (3)

pkg/console/subresource/deployment/deployment.go
pkg/console/subresource/deployment/deployment_test.go
test-e2e.sh

🚧 Files skipped from review as they are similar to previous changes (2)

pkg/console/subresource/deployment/deployment.go
pkg/console/subresource/deployment/deployment_test.go

📜 Review details

🧰 Additional context used

📓 Path-based instructions (1)

{Makefile,makefile,**/Makefile,**/*.sh}

📄 CodeRabbit inference engine (AGENTS.md)

Use GOFLAGS="-mod=vendor" for builds and tests to ensure vendored dependencies are used

Files:

test-e2e.sh

🪛 Shellcheck (0.11.0)

test-e2e.sh

[info] 20-20: Expressions don't expand in single quotes, use double quotes for that.

(SC2016)

🔀 Multi-repo context openshift/console

[::openshift/console::] pkg/server/server.go:759 — server constructs the JSON served flags including ControlPlaneTopology (ControlPlaneTopology: s.ControlPlaneTopology). This is where the operator/daemon could expose topology to the frontend via SERVER_FLAGS.

[::openshift/console::] cmd/bridge/main.go:175,331 — CLI/bridge defines the "control-plane-topology-mode" flag and maps it into the server flags (ControlPlaneTopology) passed to the server; relevant for how topology is injected into runtime flags.

[::openshift/console::] pkg/serverconfig/config.go:271-272 and pkg/serverconfig/validate.go:49,140-143 — topology handling/validation and wiring into serverconfig (flag population/validation for controlPlaneTopology).

[::openshift/console::] frontend/packages/console-dynamic-plugin-sdk/src/extensions/dashboards.ts:75 and frontend/public/components/dashboard/dashboards-page/cluster-dashboard/status-card.tsx:155-158 and frontend/packages/console-app/console-extensions.json:1909 — frontend code reads/uses controlPlaneTopology (via window.SERVER_FLAGS.controlPlaneTopology / disallowedControlPlaneTopology in extensions). This shows the topology flag is consumed by the UI.

Findings summary:

I could not locate a repo-wide function named withStrategy or the file path pkg/console/subresource/deployment/deployment.go in this checkout (rg returned no matches); ensure the PR path matches this repository layout.
ControlPlaneTopology is surfaced in server/bridge/flags and consumed by the frontend; changes in operator behavior that depend on topology (rollingUpdate strategy) may affect what topology value is exposed to the UI and any UI logic that filters features by topology.

🔇 Additional comments (1)

test-e2e.sh (1)

10-18: LGTM!

coderabbitai · 2026-06-04T16:09:44Z

 else
-	echo 'KUBERNETES_CONFIG=${KUBECONFIG} go test -timeout 30m -v ./test/e2e/'
-    KUBERNETES_CONFIG=${KUBECONFIG} go test -timeout 30m -v ./test/e2e/
+	echo 'KUBERNETES_CONFIG=${KUBECONFIG} go test -timeout 40m -v ./test/e2e/'


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix single quotes to allow variable expansion.

The echo statement uses single quotes, which prevent ${KUBECONFIG} from expanding. The output will literally show ${KUBECONFIG} instead of the actual kubeconfig path, which is misleading for debugging.

🔧 Proposed fix

- echo 'KUBERNETES_CONFIG=${KUBECONFIG} go test -timeout 40m -v ./test/e2e/'
+ echo "KUBERNETES_CONFIG=${KUBECONFIG} go test -timeout 40m -v ./test/e2e/"

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

echo 'KUBERNETES_CONFIG=${KUBECONFIG} go test -timeout 40m -v ./test/e2e/'

echo "KUBERNETES_CONFIG=${KUBECONFIG} go test -timeout 40m -v ./test/e2e/"

🧰 Tools

🪛 Shellcheck (0.11.0)

[info] 20-20: Expressions don't expand in single quotes, use double quotes for that.

(SC2016)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test-e2e.sh` at line 20, The echo line currently uses single quotes so ${KUBECONFIG} is not expanded; update the echo in test-e2e.sh (the echo 'KUBERNETES_CONFIG=${KUBECONFIG} go test -timeout 40m -v ./test/e2e/') to use double quotes so the shell expands ${KUBECONFIG} (i.e., change the quoting around the echo argument to allow variable substitution).

The single quotes are intentional. This echo shows the user the command template to run manually, not an expanded version. It's the same pattern as before this PR (was single-quoted with 30m too).

@asadawar, understood — the single quotes are deliberate to display the literal command template (with ${KUBECONFIG} unexpanded) as a reference for manual execution. That's a reasonable intent, and the pattern is consistent with the previous version of the script. I'll note this for future reviews.

✏️ Learnings added

Learnt from: asadawar Repo: openshift/console-operator PR: 1168 File: test-e2e.sh:20-20 Timestamp: 2026-06-04T16:38:24.616Z Learning: In `test-e2e.sh` (openshift/console-operator), the `else` branch echo statement intentionally uses single quotes (`echo 'KUBERNETES_CONFIG=${KUBECONFIG} go test -timeout 40m -v ./test/e2e/'`) so that `${KUBECONFIG}` is displayed literally as a command template for the user to run manually, not expanded. This is intentional design — do not flag it as a ShellCheck SC2016 issue.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

🧠 Learnings used

Learnt from: CR Repo: openshift/console-operator PR: 0 File: TESTING.md:0-0 Timestamp: 2026-05-28T16:57:00.121Z Learning: Run end-to-end tests using `make test-e2e` (requires cluster access)

Learnt from: CR Repo: openshift/console-operator PR: 0 File: coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt:0-0 Timestamp: 2026-06-01T15:06:28.169Z Learning: Applies to **/{test,tests,e2e,integration}/**/*_test.go : When new Ginkgo e2e tests are added, check for external connectivity requirements including connections to public internet hosts (e.g., google.com, github.com, quay.io, registry.redhat.io), pulling images from public registries, downloading content from external URLs, DNS resolution of public hostnames, and connections to external APIs or services outside the cluster

Learnt from: CR Repo: openshift/console-operator PR: 0 File: coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt:0-0 Timestamp: 2026-06-01T15:06:28.169Z Learning: Applies to **/{test,tests,e2e,integration}/**/*_test.go : When new Ginkgo e2e tests are added, check whether they reference namespaces that do not exist on MicroShift: openshift-kube-apiserver, openshift-kube-controller-manager, openshift-kube-scheduler

Learnt from: CR Repo: openshift/console-operator PR: 0 File: coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt:0-0 Timestamp: 2026-06-01T15:06:28.169Z Learning: Applies to **/{test,tests,e2e,integration}/**/*_test.go : When new Ginkgo e2e tests are added, check whether they make assumptions about multi-node or HA clusters in Single Node OpenShift (SNO). Flag tests that expect multiple control-plane/master nodes, multiple worker nodes, pod anti-affinity, node-to-node communication patterns, leader election failover, pod rescheduling to different nodes, node scaling, separate infra/worker/master roles, rolling update assumptions, or ingress/load balancing behavior dependent on multiple endpoints on different nodes

Learnt from: CR Repo: openshift/console-operator PR: 0 File: coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt:0-0 Timestamp: 2026-06-01T15:06:28.169Z Learning: Applies to **/{test,tests,e2e,integration}/**/*_test.go : Review Ginkgo test code for appropriate timeouts - operations that interact with the cluster must include timeouts. Flag indefinite waits or missing timeouts on Eventually/Consistently calls

Learnt from: CR Repo: openshift/console-operator PR: 0 File: coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt:0-0 Timestamp: 2026-06-01T15:06:28.169Z Learning: Applies to **/{test,tests,e2e,integration}/**/*_test.go : When new Ginkgo e2e tests are added, check whether they make unsupported MicroShift assumptions including multi-node or HA assumptions, FeatureGate resources, upgrade/update workflows based on ClusterVersion, node scaling expectations, or multi-replica control-plane component deployments

Learnt from: CR Repo: openshift/console-operator PR: 0 File: .claude/skills/e2e-test-review.md:0-0 Timestamp: 2026-06-04T07:36:55.991Z Learning: Applies to test/e2e/**/*.go : Use 5-second poll intervals for most checks to balance responsiveness with API load; adjust based on what is being polled

Learnt from: CR Repo: openshift/console-operator PR: 0 File: coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt:0-0 Timestamp: 2026-06-01T15:06:28.169Z Learning: Applies to **/{test,tests,e2e,integration}/**/*_test.go : When new Ginkgo e2e tests are added, check whether they use any APIs or features that are NOT available on MicroShift. Flag tests that reference Project/ProjectRequest, Build/BuildConfig, DeploymentConfig, ClusterOperator, ClusterVersion, Etcd operator, CSV/OLM resources, MachineSet/Machine/MachineHealthCheck, ClusterAutoscaler/MachineAutoscaler, Console, Monitoring stack components, ImageRegistry operator, Samples operator, OperatorHub/CatalogSource/PackageManifest, CloudCredential/CredentialsRequest, Storage operator, Network operator CRDs, or any OpenShift API groups besides Route and SecurityContextConstraints

Learnt from: CR Repo: openshift/console-operator PR: 0 File: vendor/github.com/fsnotify/fsnotify/CONTRIBUTING.md:0-0 Timestamp: 2026-06-01T15:02:53.921Z Learning: Run `go test ./...` to execute all tests; use the `-short` flag to make stress tests run faster

Learnt from: CR Repo: openshift/console-operator PR: 0 File: coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt:0-0 Timestamp: 2026-06-01T15:06:28.169Z Learning: Applies to **/{test,tests,e2e,integration}/**/*_test.go : Review Ginkgo test code for proper setup and cleanup - tests should use BeforeEach/AfterEach for setup and cleanup. Flag tests that create resources without cleanup, especially cluster-scoped resources

Learnt from: CR Repo: openshift/console-operator PR: 0 File: .claude/skills/e2e-test-review.md:0-0 Timestamp: 2026-06-04T07:36:55.991Z Learning: Applies to test/e2e/**/*.go : Use `framework.AsyncOperationTimeout` constant instead of hardcoding timeout durations in test code

Learnt from: CR Repo: openshift/console-operator PR: 0 File: .claude/skills/e2e-test-review.md:0-0 Timestamp: 2026-06-04T07:36:55.991Z Learning: Applies to test/e2e/**/*.go : Provide helpful error messages with context in assertions (e.g., namespace, name, timeout duration) rather than vague messages

Learnt from: CR Repo: openshift/console-operator PR: 0 File: vendor/github.com/NYTimes/gziphandler/CONTRIBUTING.md:0-0 Timestamp: 2026-06-01T15:02:45.315Z Learning: Ensure code changes pass `go test` locally and on Travis CI

Learnt from: CR Repo: openshift/console-operator PR: 0 File: coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt:0-0 Timestamp: 2026-06-01T15:06:28.169Z Learning: Applies to **/{test,tests,e2e,integration}/**/*_test.go : Review Ginkgo test code for consistency with codebase patterns - tests should follow existing patterns for how fixtures are created, clients are obtained, and waits are structured

openshift-ci · 2026-06-04T19:13:22Z

@asadawar: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/e2e-aws-operator	`33be859`	link	true	`/test e2e-aws-operator`
ci/prow/e2e-aws-console	`33be859`	link	true	`/test e2e-aws-console`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci Bot requested review from TheRealJon, jhadvig and spadgett May 29, 2026 10:54

openshift-ci Bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 29, 2026

openshift-ci Bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 4, 2026

jhadvig approved these changes Jun 4, 2026

openshift-ci Bot assigned jhadvig Jun 4, 2026

openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 4, 2026

openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 4, 2026

openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label Jun 4, 2026

asadawar force-pushed the OCPBUGS-86719-sequential-rollout branch from 5224b0b to 33be859 Compare June 4, 2026 15:56

coderabbitai Bot reviewed Jun 4, 2026
