Skip to content

🐛 OCPBUGS-77829: fix(Boxcutter): Fix ClusterExtension spec changes ignored while revision is rolling out#2566

Open
camilamacedo86 wants to merge 1 commit into
operator-framework:mainfrom
camilamacedo86:fix-bug-analyse
Open

🐛 OCPBUGS-77829: fix(Boxcutter): Fix ClusterExtension spec changes ignored while revision is rolling out#2566
camilamacedo86 wants to merge 1 commit into
operator-framework:mainfrom
camilamacedo86:fix-bug-analyse

Conversation

@camilamacedo86
Copy link
Copy Markdown
Contributor

@camilamacedo86 camilamacedo86 commented Mar 16, 2026

What's wrong

When a revision is rolling out, the controller skips the resolver — even if the CE consumer changes the spec. The CE consumer's only option to recover is manually deleting revisions.

What this PR does

Fixes Boxcutter rollout behavior so ClusterExtension spec updates (during an in-progress/stuck rollout) are no longer ignored, by persisting and comparing a catalog-spec fingerprint per revision.

Changes:

  • Add a SHA-256 “source digest” annotation on each Boxcutter revision and plumb it through revision state reading.
  • Reuse a rolling-out revision only when its stored digest matches the current ClusterExtension catalog spec; otherwise re-resolve.
  • Add E2E + controller-unit coverage for “spec changes during rollout triggers re-resolution” and digest normalization behavior.

Backward compatibility

Revisions created before this PR have no fingerprint. The controller detects this and falls back to the old behavior until a new revision is created.

RFC: https://docs.google.com/document/d/1qNjdal3zaz7QTGHPAnPRO1aRnaum5iup0rRyiFar8CM/edit?tab=t.0#heading=h.x3tfh25grvnv
Motivated by: https://redhat.atlassian.net/browse/OCPBUGS-77829

PS: :jira-blocker: by thread: https://redhat-internal.slack.com/archives/C06KP34REFJ/p1774251840822449?thread_ts=1773998183.972329&cid=C06KP34REFJ

Copilot AI review requested due to automatic review settings March 16, 2026 10:29
@netlify
Copy link
Copy Markdown

netlify Bot commented Mar 16, 2026

Deploy Preview for olmv1 ready!

Name Link
🔨 Latest commit b988ace
🔍 Latest deploy log https://app.netlify.com/projects/olmv1/deploys/6a15be2e9fd86e0008ca73ad
😎 Deploy Preview https://deploy-preview-2566--olmv1.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.
🤖 Make changes Run an agent on this branch

To edit notification comments on pull requests, go to your Netlify project configuration.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes operator-controller (Boxcutter runtime) behavior where a ClusterExtension spec change could be ignored while a prior ClusterExtensionRevision is still rolling out, by re-resolving from the catalog when the rolling-out revision no longer matches the current spec.

Changes:

  • Update ResolveBundle to reuse the rolling-out revision only when it still matches the current package/version constraints; otherwise re-resolve from the catalog.
  • Add rollingOutRevisionMatchesSpec helper to compare rolling-out revision metadata against spec version constraints.
  • Add unit and E2E coverage for “spec change while rolling out” behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
internal/operator-controller/controllers/clusterextension_reconcile_steps.go Implements spec-vs-rolling-out compatibility check and re-resolve behavior.
internal/operator-controller/controllers/clusterextension_controller_test.go Updates an existing Boxcutter test fixture and adds 2 new tests for re-resolve vs reuse behavior.
test/e2e/features/update.feature Adds E2E scenario verifying version change during a stuck rollout triggers re-resolution.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment thread internal/operator-controller/controllers/clusterextension_reconcile_steps.go Outdated
Comment thread internal/operator-controller/controllers/clusterextension_reconcile_steps.go Outdated
Comment thread internal/operator-controller/controllers/clusterextension_controller_test.go Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 16, 2026

Codecov Report

❌ Patch coverage is 84.80000% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.25%. Comparing base (afa2e7a) to head (b988ace).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
...er/controllers/clusterextension_reconcile_steps.go 82.56% 14 Missing and 5 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2566      +/-   ##
==========================================
+ Coverage   68.12%   68.25%   +0.12%     
==========================================
  Files         145      145              
  Lines       10698    10801     +103     
==========================================
+ Hits         7288     7372      +84     
- Misses       2880     2894      +14     
- Partials      530      535       +5     
Flag Coverage Δ
e2e 36.85% <12.00%> (-0.32%) ⬇️
experimental-e2e 52.76% <64.80%> (+0.07%) ⬆️
unit 54.13% <78.40%> (+0.32%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

@pedjak pedjak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

IMHO, the added e2e test could be made faster.

Comment thread test/e2e/features/update.feature Outdated
@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Mar 16, 2026
Copilot AI review requested due to automatic review settings March 16, 2026 17:31
@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label Mar 16, 2026
@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Mar 16, 2026

New changes are detected. LGTM label has been removed.

@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Mar 16, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from pedjak. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

This comment was marked as outdated.

@camilamacedo86 camilamacedo86 changed the title 🐛 fix(Boxcutter): Fix ClusterExtension spec changes ignored while revision is rolling out WIP 🐛 fix(Boxcutter): Fix ClusterExtension spec changes ignored while revision is rolling out Mar 16, 2026
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 16, 2026
@camilamacedo86 camilamacedo86 force-pushed the fix-bug-analyse branch 2 times, most recently from 392ae57 to d952dcf Compare March 17, 2026 14:42
@camilamacedo86 camilamacedo86 changed the title WIP 🐛 fix(Boxcutter): Fix ClusterExtension spec changes ignored while revision is rolling out 🐛 fix(Boxcutter): Fix ClusterExtension spec changes ignored while revision is rolling out Mar 17, 2026
@camilamacedo86 camilamacedo86 requested a review from Copilot March 17, 2026 14:45
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 17, 2026

This comment was marked as outdated.

@camilamacedo86 camilamacedo86 force-pushed the fix-bug-analyse branch 2 times, most recently from 259bebc to 9c7214a Compare March 17, 2026 15:06
Copilot AI review requested due to automatic review settings March 17, 2026 15:06

This comment was marked as outdated.

This comment was marked as outdated.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes Boxcutter’s rollout behavior so ClusterExtension catalog-spec changes made during an in-progress (or stuck) rollout are detected and trigger re-resolution, rather than continuing to reuse a stale rolling-out revision.

Changes:

  • Persist a SHA-256 “catalog source spec digest” on each Boxcutter revision and read it back into revision state.
  • Reuse an existing rolling-out revision only when its stored digest matches the current ClusterExtension catalog spec; otherwise re-run resolution.
  • Add controller unit tests + an E2E scenario covering “spec change during stuck rollout triggers re-resolution” and digest normalization behavior.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
test/e2e/features/update.feature Adds an E2E scenario ensuring version changes during a stuck rollout cause a new resolution and successful rollout.
internal/operator-controller/labels/labels.go Introduces a new annotation key constant for storing the source digest on revisions.
internal/operator-controller/controllers/suite_test.go Adds a test helper to compute digests consistent with CRD defaults in envtest.
internal/operator-controller/controllers/clusterextension_reconcile_steps.go Implements digest computation/normalization and rolling-out revision reuse vs re-resolve logic; improves version constraint matching when deciding fallback behavior.
internal/operator-controller/controllers/clusterextension_controller_test.go Adds extensive unit coverage for re-resolution behavior during rollout and digest normalization.
internal/operator-controller/controllers/clusterextension_controller.go Extends RevisionMetadata to include SourceDigest.
internal/operator-controller/controllers/boxcutter_reconcile_steps.go Plumbs digest from/to ClusterObjectSet annotations (write on apply, read in revision state getter).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread internal/operator-controller/labels/labels.go Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes Boxcutter rollout behavior so ClusterExtension spec changes made during an in-progress/stuck rollout are no longer ignored. It does this by persisting a catalog-spec fingerprint per revision and only reusing a rolling-out revision when its fingerprint matches the current spec.

Changes:

  • Add a SHA-256 “source digest” (olm.operatorframework.io/source-digest) annotation to Boxcutter revisions and plumb it through revision state reading.
  • Reuse a rolling-out revision only when its stored digest matches the current ClusterExtension.spec.source.catalog inputs; otherwise re-resolve.
  • Add unit + E2E coverage for “spec changes during rollout triggers re-resolution” and digest normalization.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

Show a summary per file
File Description
test/e2e/features/update.feature Adds an E2E scenario validating that changing version during a stuck rollout triggers a new resolution.
internal/operator-controller/labels/labels.go Introduces the SourceDigestKey annotation constant for revision fingerprinting.
internal/operator-controller/controllers/suite_test.go Adds a test helper to compute digests consistent with CRD defaulting (for envtest expectations).
internal/operator-controller/controllers/clusterextension_reconcile_steps.go Implements digest computation/normalization and rolling-out revision reuse vs re-resolve decision logic; improves version constraint handling in resolution error fallback.
internal/operator-controller/controllers/clusterextension_controller_test.go Adds unit coverage for re-resolve/reuse behavior during rollout and digest normalization behavior.
internal/operator-controller/controllers/clusterextension_controller.go Extends RevisionMetadata with SourceDigest to carry the fingerprint through controller logic.
internal/operator-controller/controllers/boxcutter_reconcile_steps.go Reads/writes the source digest annotation on ClusterObjectSet revisions.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot AI review requested due to automatic review settings April 30, 2026 14:41

This comment was marked as outdated.

@camilamacedo86 camilamacedo86 changed the title 🐛 OCPBUGS-77829: fix(Boxcutter): Fix ClusterExtension spec changes ignored while revision is rolling out WIP 🐛 OCPBUGS-77829: fix(Boxcutter): Fix ClusterExtension spec changes ignored while revision is rolling out Apr 30, 2026
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 30, 2026
Copilot AI review requested due to automatic review settings April 30, 2026 16:37
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes Boxcutter rollout behavior so ClusterExtension catalog-spec changes made during an in-progress rollout are detected and trigger re-resolution instead of continuing to reuse a stale rolling-out revision.

Changes:

  • Add per-revision “source digest” annotation (olm.operatorframework.io/source-digest) and plumb it through revision state reading.
  • Reuse a rolling-out revision only when its stored digest matches the current ClusterExtension.spec.source.catalog fingerprint; otherwise re-resolve.
  • Add controller-unit tests plus an E2E scenario covering “spec changes during stuck rollout triggers re-resolution” and digest normalization cases.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
test/e2e/features/update.feature Adds an E2E scenario asserting spec version changes during a stuck rollout cause a new resolution and successful install.
internal/operator-controller/labels/labels.go Introduces the SourceDigestKey annotation constant for persisted revision fingerprints.
internal/operator-controller/controllers/suite_test.go Adds a test helper to compute digests in a way consistent with CRD defaulting.
internal/operator-controller/controllers/clusterextension_reconcile_steps.go Implements digest computation + rolling-out revision reuse vs re-resolve decision logic; improves version constraint matching.
internal/operator-controller/controllers/clusterextension_controller_test.go Adds extensive unit coverage for digest-based re-resolution and selector/channel normalization.
internal/operator-controller/controllers/clusterextension_controller.go Extends RevisionMetadata with SourceDigest to carry the persisted fingerprint.
internal/operator-controller/controllers/boxcutter_reconcile_steps.go Reads/writes the source digest annotation when listing revisions and creating new revisions.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Comment thread internal/operator-controller/labels/labels.go Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@tmshort
Copy link
Copy Markdown
Contributor

tmshort commented May 26, 2026

Hmm... the tests passes in the experimental-e2e, but some other issue caused a failure; re-running.

@tmshort
Copy link
Copy Markdown
Contributor

tmshort commented May 26, 2026

Hmm... the tests passes in the experimental-e2e, but some other issue caused a failure; re-running.

And still failing?

@camilamacedo86
Copy link
Copy Markdown
Contributor Author

The failure does not seems related: https://github.com/operator-framework/operator-controller/actions/runs/26450862021/job/77877287902?pr=2566

But I am looking on that

Implements catalog spec digest tracking to detect spec changes during stuck
rollouts, enabling recovery by updating the ClusterExtension spec.

**Problem**: When a ClusterExtension rollout gets stuck (e.g., bad image causing
probe failures), updating the spec to a different version doesn't trigger
re-resolution. The controller reuses the stuck revision indefinitely.

**Solution**: Track a digest of resolution-relevant spec fields
(packageName, version, channels, selector, upgradeConstraintPolicy) in
ClusterObjectSet annotations. When spec changes during rollout, detect
digest mismatch and trigger re-resolution to create a new revision.

Generated-by: Claude
@camilamacedo86
Copy link
Copy Markdown
Contributor Author

@pedjak @tmshort now it is passing in the e2e tests please fell free to give a look.
I think it is good to get a review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants