perf(scripts): speed up RHDH operator install using install-rhdh-catalog-source.sh (~30 min → ~2 min) #2870

Merged
openshift-merge-bot[bot] merged 6 commits into redhat-developer:main from subhashkhileri:perf/parallel-iib-bundle-processing
May 28, 2026
@subhashkhileri subhashkhileri commented May 20, 2026

JIRA: https://redhat.atlassian.net/browse/RHIDP-13676

Summary

  • Skips slow skopeo inspect (~42s/bundle) — attempts the copy directly; failed copies (~3s) are faster than successful inspects, so images that don't exist on quay are handled with minimal overhead
  • Parallel bundle processing with MAX_PARALLEL=10 (env-overridable)
  • Parallel opm render + registry setup — these two phases are independent, so they now run concurrently; bundle processing waits until both are done
  • Atomic render.yaml updates — each worker writes a per-bundle .sed file (keyed by digest, no locking needed); all replacements are applied in a single sed -f pass after all workers complete
  • --ignore-not-found replaces the check-then-delete secret pattern
  • Force OLM re-index by deleting the CatalogSource before recreating it — without this, OLM skips re-indexing when the tag is unchanged but the image digest changed (rebuilt IIB)
  • Fail loudly if any bundle fails to process (was: silent error log + continue)
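
The per-bundle `.sed` approach from the summary can be sketched as follows. This is a minimal illustration and not the script's actual code: the image references, the `write_replacement` helper, and the temporary directory layout are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# Sketch only: file names, image refs, and the worker body are hypothetical.
set -euo pipefail

workdir="$(mktemp -d)"
mkdir -p "$workdir/sed"

# Stand-in for the rendered catalog, referencing two bundle images.
cat > "$workdir/render.yaml" <<'EOF'
image: registry.example/bundle@sha256:aaa
image: registry.example/bundle@sha256:bbb
EOF

# Each worker writes only its own file, keyed by digest, so no two workers
# ever touch the same path and no locking is needed.
write_replacement() {
  local digest="$1" old_ref="$2" new_ref="$3"
  printf 's|%s|%s|g\n' "$old_ref" "$new_ref" > "$workdir/sed/$digest.sed"
}

write_replacement aaa "registry.example/bundle@sha256:aaa" "mirror.example/bundle:aaa" &
write_replacement bbb "registry.example/bundle@sha256:bbb" "mirror.example/bundle:bbb" &
wait

# Merge the per-bundle scripts and rewrite render.yaml exactly once,
# after every worker has finished.
cat "$workdir"/sed/*.sed > "$workdir/all.sed"
sed -f "$workdir/all.sed" "$workdir/render.yaml" > "$workdir/render.yaml.new"
mv "$workdir/render.yaml.new" "$workdir/render.yaml"
```

Because `render.yaml` is only read and replaced in the final step, workers never race on it, regardless of how many run at once.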

Assisted-by: Claude Code

- Skips slow `skopeo inspect` (~42s/bundle) — attempts the copy directly
  instead; failed copies (~3s) are faster than successful inspects
- Processes bundles in parallel up to MAX_PARALLEL (default 10), with a
  portable `kill -0` throttle loop that prunes finished PIDs each iteration
- Collects per-worker sed files and applies them in one pass after all
  bundles complete, avoiding concurrent writes to render.yaml
- Runs `opm render` and cluster registry setup in parallel since they
  are independent; waits before the bundle-processing phase begins
- Replaces check-then-delete secret pattern with --ignore-not-found
- Deletes existing CatalogSource before recreating to force OLM re-index
  when the tag is unchanged but the digest has changed (rebuilt IIB)
- Fails loudly if any bundle fails to process (was: silent error log)

Assisted-by: Claude Code
Co-Authored-By: Claude Code <noreply@anthropic.com>
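
A throttle loop of the kind the commit message describes might look like the sketch below. It is an assumption-laden illustration: `process_one` is a hypothetical stand-in for the real bundle work, the default of 3 is chosen only to keep the demo short, and the loop assumes MAX_PARALLEL has already been validated as a positive integer.

```shell
#!/usr/bin/env bash
# Sketch only: process_one is a hypothetical stand-in for real bundle work.
set -euo pipefail

MAX_PARALLEL="${MAX_PARALLEL:-3}"   # assumed to be a validated positive integer
outdir="$(mktemp -d)"

process_one() {
  sleep 0.2
  : > "$outdir/$1.done"
}

pids=()
for item in a b c d e f g; do
  # Throttle: while at the limit, drop PIDs that are no longer running.
  while [ "${#pids[@]}" -ge "$MAX_PARALLEL" ]; do
    alive=()
    for pid in "${pids[@]}"; do
      # kill -0 delivers no signal; it only tests whether the PID exists.
      if kill -0 "$pid" 2>/dev/null; then
        alive+=("$pid")
      fi
    done
    # The ${arr[@]+...} form avoids an unbound-variable error on an
    # empty array in older bash versions when set -u is active.
    pids=(${alive[@]+"${alive[@]}"})
    if [ "${#pids[@]}" -ge "$MAX_PARALLEL" ]; then
      sleep 0.1
    fi
  done
  process_one "$item" &
  pids+=("$!")
done
wait
```

Polling with `kill -0` avoids `wait -n`, which is only available in newer bash releases; that is one plausible reading of "portable" here.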
@subhashkhileri subhashkhileri requested a review from a team as a code owner May 20, 2026 08:11
@openshift-ci openshift-ci Bot requested review from gazarenkov and zdrapela May 20, 2026 08:11
@subhashkhileri subhashkhileri changed the title perf(scripts): parallelize IIB bundle processing (~27 min → ~5 min) perf(scripts): parallelize IIB bundle processing (~27 min → ~2 min) May 20, 2026
@subhashkhileri subhashkhileri changed the title perf(scripts): parallelize IIB bundle processing (~27 min → ~2 min) perf(scripts): parallelize IIB bundle processing (~30 min → ~2 min) May 20, 2026
rhdh-qodo-merge Bot commented May 20, 2026

Review Summary by Qodo

(Agentic_describe updated until commit 49ef790)

Parallelize IIB bundle processing for ~13x speedup

✨ Enhancement 🐞 Bug fix

Walkthroughs

Description
• Parallelize bundle processing with configurable MAX_PARALLEL (default 10)
  - Skips slow skopeo inspect (~42s/bundle); attempts copy directly instead
  - Processes bundles concurrently with portable kill -0 throttle loop
  - Collects per-worker sed files; applies replacements in single pass
• Run opm render and registry setup concurrently (independent phases)
• Force OLM re-index by deleting CatalogSource before recreation
• Improve error handling and debugging
  - Add set -euo pipefail to process_bundle function
  - Preserve skopeo stderr to per-bundle files for troubleshooting
  - Fail loudly if any bundle fails (was silent error log)
  - Validate MAX_PARALLEL is positive integer
• Simplify secret deletion with --ignore-not-found flag
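
Validating MAX_PARALLEL as a positive integer could be done along these lines. The helper name `is_positive_int` and the error message are hypothetical; the actual check in the script may differ.

```shell
#!/usr/bin/env bash
# Sketch only: the helper name is hypothetical.
# Returns 0 when the argument is a positive integer, 1 otherwise.
is_positive_int() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
  esac
  [ "$1" -gt 0 ]
}

MAX_PARALLEL="${MAX_PARALLEL:-10}"
if ! is_positive_int "$MAX_PARALLEL"; then
  echo "MAX_PARALLEL must be a positive integer, got '$MAX_PARALLEL'" >&2
  exit 1
fi
```

Rejecting bad values up front matters for a throttle loop, since a limit of 0 or a non-numeric string would otherwise surface as a confusing failure much later.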
Diagram
flowchart LR
  A["Extract bundle images"] --> B["Parallel bundle processing<br/>MAX_PARALLEL=10"]
  A --> C["opm render"]
  A --> D["Registry setup"]
  B --> E["Collect sed files"]
  C --> F["Wait for completion"]
  D --> F
  E --> G["Apply replacements<br/>single sed pass"]
  F --> G
  G --> H["Delete CatalogSource"]
  H --> I["Recreate CatalogSource<br/>force OLM re-index"]

File Changes

1. .rhdh/scripts/install-rhdh-catalog-source.sh Enhancement, bug fix, error handling +121/-48

Parallelize bundle processing and improve error handling

• Add MAX_PARALLEL environment variable with validation (default 10)
• Extract bundle processing logic into new process_bundle() function with set -euo pipefail
• Refactor update_refs_in_iib_bundles() to spawn parallel workers with portable throttle loop
• Collect per-bundle sed commands and apply atomically in single pass after all workers complete
• Run render_iib and registry setup concurrently; wait for both before bundle processing
• Replace check-then-delete secret pattern with --ignore-not-found flag
• Delete existing CatalogSource before recreation to force OLM re-indexing on digest changes
• Preserve skopeo stderr to per-bundle .copy.err files for debugging
• Add error handling to fail loudly if any bundle worker fails
• Update trap to kill background processes on EXIT/INT/TERM
• Remove unused variable declarations and simplify comments
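
Running two independent phases concurrently and waiting for both before continuing can be sketched as below. `render_phase` and `registry_phase` are hypothetical stand-ins, not the script's real functions.

```shell
#!/usr/bin/env bash
# Sketch only: both functions are hypothetical stand-ins for the real phases.
set -euo pipefail

logdir="$(mktemp -d)"

render_phase()   { sleep 0.3; echo "render done"   > "$logdir/render.log"; }
registry_phase() { sleep 0.3; echo "registry done" > "$logdir/registry.log"; }

render_phase &
render_pid=$!
registry_phase &
registry_pid=$!

# wait <pid> returns that job's exit status, so a failure in either phase
# is noticed here instead of being lost.
failed=0
wait "$render_pid"   || failed=1
wait "$registry_pid" || failed=1

if [ "$failed" -ne 0 ]; then
  echo "a setup phase failed; not starting bundle processing" >&2
  exit 1
fi
echo "both phases finished; bundle processing can start"
```

Waiting on each PID separately, rather than using a bare `wait`, is what makes the exit status of each phase available to the caller.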

@rhdh-qodo-merge rhdh-qodo-merge Bot added the enhancement and Other labels May 20, 2026
@subhashkhileri subhashkhileri changed the title perf(scripts): parallelize IIB bundle processing (~30 min → ~2 min) erf(scripts): speed up RHDH operator cluster install (~30 min → ~2 min) May 20, 2026
…opeo stderr

Background subshells don't inherit set -e from the parent, so
intermediate failures (umoci, skopeo push) went undetected and the
worker would write a .sed entry for a broken bundle. Also redirect
speculative copy stderr to a per-bundle file instead of /dev/null
so auth failures, timeouts, and disk errors are debuggable.

Assisted-by: Claude Code
Co-Authored-By: Claude Code <noreply@anthropic.com>
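
The behaviour this commit aims for, where a worker stops at the first failing step, keeps its stderr, and never emits a `.sed` entry for a broken bundle, can be illustrated with the sketch below. The worker body, bundle names, and the failing `ls` call are hypothetical; only the `.sed` and `.copy.err` naming follows the PR description.

```shell
#!/usr/bin/env bash
# Sketch only: worker steps and bundle names are hypothetical.
outdir="$(mktemp -d)"

process_bundle() (
  # Enable strict mode inside the worker itself rather than relying on
  # whatever the calling shell happened to set.
  set -euo pipefail
  name="$1"
  # A failing intermediate step; stderr is kept per bundle, not discarded.
  if [ "$name" = "broken" ]; then
    ls "$outdir/does-not-exist" 2> "$outdir/$name.copy.err"
  fi
  # Only reached when every step above succeeded.
  echo "s|old-$name|new-$name|g" > "$outdir/$name.sed"
)

process_bundle good &
good_pid=$!
process_bundle broken &
broken_pid=$!

# Count failures so the caller can fail loudly instead of continuing.
failures=0
wait "$good_pid"   || failures=$((failures + 1))
wait "$broken_pid" || failures=$((failures + 1))
echo "failed bundles: $failures"
```

One caveat with this pattern: if the worker is invoked as part of an `if` condition or an `&&`/`||` list, bash ignores `set -e` inside it, so launching it with `&` and checking `wait` afterwards is the safer shape.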
@subhashkhileri subhashkhileri changed the title erf(scripts): speed up RHDH operator cluster install (~30 min → ~2 min) perf(scripts): speed up RHDH operator cluster install (~30 min → ~2 min) May 20, 2026
@subhashkhileri subhashkhileri changed the title perf(scripts): speed up RHDH operator cluster install (~30 min → ~2 min) perf(scripts): speed up RHDH operator install using install-rhdh-catalog-source.sh (~30 min → ~2 min) May 20, 2026
Assisted-by: Claude Code
Co-Authored-By: Claude Code <noreply@anthropic.com>
rhdh-qodo-merge Bot commented May 25, 2026

Persistent review updated to latest commit 49ef790

@rhdh-qodo-merge rhdh-qodo-merge Bot added Bug fix and removed Other labels May 25, 2026
…cess group

kill 0 sends SIGTERM to the entire process group including the parent
shell/CI harness, causing segfaults on normal exit. Use jobs -p to
target only this script's background jobs. Split INT/TERM from EXIT
to avoid re-entrant cleanup.

Assisted-by: Claude Code
Co-Authored-By: Claude Code <noreply@anthropic.com>
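
A cleanup trap that targets only the script's own background jobs might be structured as in this sketch. The `cleanup` function, the exit codes, and the way INT/TERM are separated from EXIT are assumptions about one reasonable layout, not a copy of the merged code.

```shell
#!/usr/bin/env bash
# Sketch only: the function name and the background job are hypothetical.

cleanup() {
  # jobs -p lists only this shell's background jobs, unlike `kill 0`,
  # which signals every process in the current process group.
  local pids
  pids="$(jobs -p)"
  if [ -n "$pids" ]; then
    # Word splitting of the PID list is intentional here.
    kill $pids 2>/dev/null || true
  fi
}

# EXIT only cleans up. INT and TERM clear the EXIT trap first, so cleanup
# does not run a second time when the handler itself calls exit.
trap cleanup EXIT
trap 'trap - EXIT; cleanup; exit 130' INT
trap 'trap - EXIT; cleanup; exit 143' TERM

# Demonstration: start a long-running job, then clean it up explicitly.
sleep 30 &
bg_pid=$!
cleanup
wait "$bg_pid" 2>/dev/null || true
```

Since `jobs -p` is scoped to the current shell, the parent shell or CI harness that launched the script is never signalled.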
@rm3l rm3l left a comment

Thanks @subhashkhileri !!

rm3l commented May 28, 2026

/override "PR Publish"
/override "PR Validate"

unrelated

openshift-ci Bot commented May 28, 2026

@rm3l: Overrode contexts on behalf of rm3l: PR Publish, PR Validate

@openshift-merge-bot openshift-merge-bot Bot merged commit c417781 into redhat-developer:main May 28, 2026
11 of 13 checks passed
rm3l commented May 28, 2026

/cherry-pick release-1.10

Just as a reminder to backport to 1.10 once 1.10.0 is out.

@openshift-cherrypick-robot

@rm3l: new pull request created: #2913

Labels

Bug fix, enhancement, lgtm

3 participants