[release-4.21] OCPBUGS-XXXXX: Backport noOLM Gateway API test coverage and upgrade tests #31232
gcs278 wants to merge 6 commits into
Skipping CI for Draft Pull Request. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: gcs278 The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
6e20b57 to
e14c7a9
Compare
|
Upgrades are actually not working for OLM (this only affects 4.20->4.21 right now). https://redhat.atlassian.net/browse/OCPBUGS-86778 Let's test this via: |
|
@gcs278: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command |
|
/payload-job periodic-ci-openshift-release-main-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-upgrade |
|
@gcs278: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/71de0a60-5b7e-11f1-9d30-2d7158c6abc3-0 |
|
@gcs278: This pull request references NE-2286 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target either version "4.21." or "openshift-4.21.", but it targets "openshift-4.22" instead.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
bc7da27 to
a9db8cd
Compare
a9db8cd to
39715ca
Compare
|
/test all |
|
/test ? |
|
Origin 4.21 jobs don't run automatically: /test e2e-aws-csi |
|
Testing is a bit tricky for this one. First, we need to prove that we don't break CI with NO-OLM disabled. The pre-submits will mostly prove this already, but there are more upgrade tests that should be run: OLM to OLM: |
|
@gcs278: trigger 8 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/ba79f340-603a-11f1-95a8-8af10d374131-0 |
|
Next, we should validate with openshift/cluster-ingress-operator#1442, which shows that the OLM to noOLM migration is working. For this, we need to vendor the backport, remove the feature gate (FG) annotation from the RBAC manifests, and then enable the FG by default so that NO-OLM runs by default: OLM to noOLM: Then I will run some /testwith commands for presubmits with NO-OLM to test the non-upgrade NO-OLM test jobs. |
|
@gcs278: given command is invalid: at least one of the commands given is only supported on a one-command-per-comment basis, please separate out commands as multiple comments |
|
/testwith openshift/origin/release-4.21/e2e-gcp-ovn-upgrade openshift/cluster-ingress-operator#1462 openshift/api#2865 |
|
github not reachable |
Add upgrade test validating Gateway API migration from OLM-based Istio to CIO-managed Sail Library during 4.21 to 4.22 upgrades.

Setup creates Gateway/HTTPRoute with OLM provisioning and tests connectivity. Test validates migration: Gateway remains programmed, Istiod running, Istio CRDs stay OLM-managed, GatewayClass has CIO finalizer, Istio CR deleted, subscription persists. Teardown cleans up all resources.

Cherry-picked from: cf1f826 openshift#30897
…ip logic

The Gateway API upgrade test was calling g.Skip() from Setup(), which runs inside a goroutine managed by the disruption framework. Since g.Skip() panics and Ginkgo can only recover panics inside leaf nodes, this caused unrecoverable panics on IPv6/dual-stack, OKD, and unsupported platform clusters.

Implement the upgrades.Skippable interface with a Skip() method that the disruption framework calls before Setup, avoiding the goroutine panic. Refactor checkPlatformSupportAndGetCapabilities into shouldSkipGatewayAPITests (safe outside Ginkgo nodes) and getPlatformCapabilities (returns LB/DNS support).

https://redhat.atlassian.net/browse/OCPBUGS-83267

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cherry-picked from: 8ef51c3 openshift#31000
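The skip-before-Setup pattern described in this commit can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the commit: the `skippable` and `upgradeTest` interfaces, `gatewayTest`, and `runAll` are hypothetical stand-ins defined locally, and the real upgrades.Skippable signature and the disruption framework's runner may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// upgradeTest is a hypothetical stand-in for a disruption-framework test.
type upgradeTest interface {
	Name() string
	Setup()
}

// skippable is a hypothetical stand-in for the upgrades.Skippable interface
// named in the commit message; the real signature may differ.
type skippable interface {
	Skip() bool
}

// gatewayTest is an illustrative test that is unsupported on some clusters.
type gatewayTest struct {
	supported bool
	setupRan  bool
}

func (t *gatewayTest) Name() string { return "gateway-api-upgrade" }
func (t *gatewayTest) Setup()       { t.setupRan = true }

// Skip reports whether the test should be skipped. It returns a value
// instead of panicking, so it is safe to call outside a Ginkgo leaf node.
func (t *gatewayTest) Skip() bool { return !t.supported }

// runAll asks each test whether to skip BEFORE starting its Setup goroutine,
// and returns the names of the tests it skipped.
func runAll(tests []upgradeTest) []string {
	var skipped []string
	var wg sync.WaitGroup
	for _, t := range tests {
		if s, ok := t.(skippable); ok && s.Skip() {
			skipped = append(skipped, t.Name())
			continue
		}
		wg.Add(1)
		go func(t upgradeTest) {
			defer wg.Done()
			t.Setup()
		}(t)
	}
	wg.Wait()
	return skipped
}

func main() {
	unsupported := &gatewayTest{supported: false}
	supported := &gatewayTest{supported: true}
	skipped := runAll([]upgradeTest{unsupported, supported})
	fmt.Println(skipped, unsupported.setupRan, supported.setupRan)
}
```

The point of the design is that the skip decision is an ordinary boolean taken on the caller's goroutine, so nothing needs to recover a panic raised from inside a worker goroutine.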
The Gateway API controller tests tracked Gateways in a shared in-memory gateways slice, deleting them during AfterEach cleanup. However, openshift-tests distributes tests across separate parallel worker processes. The annotation-based checkAllTestsDone coordination works correctly because annotations are stored on the cluster-scoped GatewayClass, but the gateways slice is not shared across processes. The process that runs the final AfterEach cleanup has an empty gateways slice, so it deletes the GatewayClass and istiod but never deletes the Gateways created by other processes. This leaves gateway deployments orphaned on the cluster.

As a secondary issue, even when gateways were deleted, the GatewayClass and istiod were removed without waiting for the gateway proxy deployments to be fully cleaned up by GC. Since the deployments have an owner reference to the Gateway (not a finalizer), the cascade deletion is asynchronous, creating a race where gateway pods lose their control plane and crash-loop.

Fix both issues by cleaning up gateways at the individual test level using defer deleteGateway, which deletes the Gateway and waits for its proxy deployment to be removed by GC. Add deleteGateway and waitForGatewayDeploymentDeletion helpers shared by both the controller tests and the upgrade test Teardown. Cleanup errors now hard fail to surface leftover resources immediately rather than causing confusing downstream test failures.

https://redhat.atlassian.net/browse/OCPBUGS-83281

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Grant Spence <gspence@redhat.com>
Co-Authored-By: Ishmam Amin <iamin@redhat.com>
Cherry-picked from: 3f8a12d openshift#31023
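The delete-then-wait cleanup described above can be illustrated with a small stdlib-only sketch. It assumes a hypothetical in-memory `fakeCluster` in place of the API server, and the helper bodies here only borrow the names deleteGateway and waitForGatewayDeploymentDeletion from the commit message; the actual helpers, their signatures, and their timeouts are not shown in this PR page.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// fakeCluster is a hypothetical in-memory stand-in for the API server.
type fakeCluster struct {
	mu          sync.Mutex
	gateways    map[string]bool
	deployments map[string]bool // keyed by the owning Gateway's name
}

func newFakeCluster() *fakeCluster {
	return &fakeCluster{gateways: map[string]bool{}, deployments: map[string]bool{}}
}

func (c *fakeCluster) createGateway(name string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.gateways[name] = true
	c.deployments[name] = true
}

// removeGateway deletes the Gateway at once; the owned deployment is removed
// later, imitating asynchronous garbage collection via owner references.
func (c *fakeCluster) removeGateway(name string, gcDelay time.Duration) {
	c.mu.Lock()
	delete(c.gateways, name)
	c.mu.Unlock()
	time.AfterFunc(gcDelay, func() {
		c.mu.Lock()
		defer c.mu.Unlock()
		delete(c.deployments, name)
	})
}

func (c *fakeCluster) deploymentExists(name string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.deployments[name]
}

// waitForGatewayDeploymentDeletion polls until the deployment is gone or ctx expires.
func waitForGatewayDeploymentDeletion(ctx context.Context, c *fakeCluster, name string) error {
	ticker := time.NewTicker(5 * time.Millisecond)
	defer ticker.Stop()
	for {
		if !c.deploymentExists(name) {
			return nil
		}
		select {
		case <-ctx.Done():
			return errors.New("timed out waiting for deployment of gateway " + name)
		case <-ticker.C:
		}
	}
}

// deleteGateway deletes the Gateway and blocks until its deployment is gone.
func deleteGateway(ctx context.Context, c *fakeCluster, name string) error {
	c.removeGateway(name, 20*time.Millisecond)
	return waitForGatewayDeploymentDeletion(ctx, c, name)
}

// runCleanupDemo creates one Gateway, cleans it up, and reports whether the
// deployment is still present afterwards.
func runCleanupDemo() (bool, error) {
	c := newFakeCluster()
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	c.createGateway("test-gateway")
	// In a real test this call would be deferred right after creation.
	err := deleteGateway(ctx, c, "test-gateway")
	return c.deploymentExists("test-gateway"), err
}

func main() {
	stillPresent, err := runCleanupDemo()
	fmt.Println("deployment still present:", stillPresent, "error:", err)
}
```

Because each test cleans up only the Gateway it created, no cross-process bookkeeping is needed, and returning the timeout error lets the caller fail hard when a deployment is left behind.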
Cherry-picked from: e29073f openshift#31023
Cherry-picked from: ca41c36 openshift#31023
Add retry logic to markTestDone to handle optimistic locking conflicts when updating GatewayClass annotations. The CIO actively manages the GatewayClass (updating conditions, status, finalizers) which can cause 409 Conflict errors when tests try to update annotations. Using RetryOnConflict ensures the test automatically retries with the latest resourceVersion when concurrent updates occur.

Fixes flake: Operation cannot be fulfilled on gatewayclasses.gateway.networking.k8s.io "openshift-default": the object has been modified; please apply your changes to the latest version and try again

https://redhat.atlassian.net/browse/OCPBUGS-81751

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cherry-picked from: 8e4e43a openshift#30964
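The retry-on-conflict idea can be shown with a stdlib-only sketch. Everything here is an assumption for illustration: `fakeStore`, `gatewayClass`, and the local `retryOnConflict` helper are hypothetical stand-ins that imitate optimistic locking, not client-go's retry package, and this `markTestDone` is not the one from the commit. A production retry helper would normally also back off between attempts, which this sketch omits.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict is a stand-in for the API server's 409 Conflict response.
var errConflict = errors.New("conflict: the object has been modified")

// gatewayClass is a hypothetical, minimal model of the object under test.
type gatewayClass struct {
	resourceVersion int
	annotations     map[string]string
}

// fakeStore imitates optimistic locking: an update is rejected when the
// caller's resourceVersion is stale.
type fakeStore struct {
	obj gatewayClass
	// interferences is how many upcoming updates will race with a
	// simulated concurrent writer (for example, an operator updating status).
	interferences int
}

// get returns a copy of the stored object, as a read from the API would.
func (s *fakeStore) get() gatewayClass {
	ann := map[string]string{}
	for k, v := range s.obj.annotations {
		ann[k] = v
	}
	return gatewayClass{resourceVersion: s.obj.resourceVersion, annotations: ann}
}

func (s *fakeStore) update(o gatewayClass) error {
	if s.interferences > 0 {
		// Simulate another writer bumping the version between get and update.
		s.interferences--
		s.obj.resourceVersion++
	}
	if o.resourceVersion != s.obj.resourceVersion {
		return errConflict
	}
	o.resourceVersion++
	s.obj = o
	return nil
}

// retryOnConflict re-runs fn while it returns errConflict, up to attempts times.
func retryOnConflict(attempts int, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		err = fn()
		if !errors.Is(err, errConflict) {
			return err
		}
	}
	return err
}

// markTestDone re-reads the object on every attempt so that each update
// carries the latest resourceVersion.
func markTestDone(s *fakeStore, testName string) error {
	return retryOnConflict(5, func() error {
		gc := s.get()
		gc.annotations[testName] = "done"
		return s.update(gc)
	})
}

func main() {
	s := &fakeStore{obj: gatewayClass{annotations: map[string]string{}}, interferences: 2}
	err := markTestDone(s, "example-test")
	fmt.Println(err, s.obj.annotations["example-test"], s.obj.resourceVersion)
}
```

The key detail is that the read happens inside the retried function; retrying only the write with a stale copy would conflict again on every attempt.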
39715ca to
cbef80f
Compare
|
I forgot I had a test commit in this PR; it has been removed. |
|
/payload-abort |
|
OLM to OLM: |
|
@gcs278: trigger 8 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/b5bbf0d0-604c-11f1-8aeb-16c9b905159d-0 |
|
Origin 4.21 jobs don't run automatically: /test e2e-aws-csi |
|
/testwith openshift/origin/release-4.21/e2e-gcp-ovn openshift/cluster-ingress-operator#1462 openshift/api#2865 |
|
I can't run multiple /payload-job-with-prs commands in one comment, so here's the OLM to NO-OLM migration test for a minor (y-stream) update: /payload-job-with-prs periodic-ci-openshift-release-main-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade openshift/cluster-ingress-operator#1462 openshift/api#2865 |
|
@gcs278: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/d6b82790-604c-11f1-8cfa-10e8e0b0bde5-0 |
|
/testwith openshift/origin/release-4.21/e2e-gcp-ovn-upgrade openshift/cluster-ingress-operator#1462 openshift/api#2865 |
|
/payload-job-with-prs periodic-ci-openshift-release-main-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-upgrade openshift/cluster-ingress-operator#1462 openshift/api#2865 |
|
@gcs278: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/9d4f79a0-6069-11f1-8253-78c971bcc8c1-0 |
|
/payload-job-with-prs periodic-ci-openshift-release-main-ci-4.21-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade openshift/cluster-ingress-operator#1462 openshift/api#2865 |
|
@gcs278: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/c9301480-6069-11f1-94ef-707e0daebf2d-0 |
|
Some odd infra failures in the "payload-jobs-with-prs" upgrade tests: Not related to this PR. I'll have to figure out why it's trying to pull 4.16 base images. Otherwise, everything is looking pretty good. The upgrade /testwith shows the migration succeeding within a 4.21 z-stream: |
|
/testwith openshift/origin/release-4.21/e2e-gcp-ovn-upgrade openshift/cluster-ingress-operator#1462 openshift/api#2865 |
|
/retest |
|
@gcs278: No Jira issue is referenced in the title of this pull request. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@gcs278: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary
Backport of Gateway API noOLM (Sail Library) test coverage and upgrade tests to release-4.21, as part of the Sail Library backport (NE-2286). This provides full test coverage for the GatewayAPIWithoutOLM feature gate, including OLM-to-Sail-Library migration upgrade testing, test flake fixes, and parallel worker cleanup fixes.

Depends on #31139
Cherry-picked PRs
Test plan
go build ./test/extended/router/ ./test/e2e/upgrade/ compiles

🤖 Generated with Claude Code