Add ARM64 (aarch64) multi-architecture Docker build support #393

Open
t0mdavid-m wants to merge 14 commits into main from claude/gallant-mendel-K2jDd

Member

@t0mdavid-m t0mdavid-m commented May 25, 2026

Summary

This PR adds native ARM64 (aarch64) Docker image builds alongside existing AMD64 builds, enabling the streamlit-template to run on ARM-based systems (Apple Silicon, AWS Graviton, Raspberry Pi, etc.). The changes implement a multi-architecture build pipeline that produces per-arch images and stitches them into a unified multi-arch manifest for transparent architecture selection on pull.

Key Changes

  • New Dockerfiles for ARM64:

    • Dockerfile.arm — Full build with OpenMS, TOPP tools, and pyOpenMS (aarch64 variant)
    • Dockerfile_simple.arm — Lightweight pyOpenMS-only build (aarch64 variant)
    • Both swap the Miniforge installer from x86_64 to aarch64; only Dockerfile.arm also copies THIRDPARTY/Linux/aarch64 dependencies, and only when that directory exists
  • CI/CD Pipeline Restructuring (.github/workflows/build-and-test.yml):

    • Renamed build job to build-amd64 for clarity
    • Added new build-arm64 job that runs on native ARM64 runner (ubuntu-24.04-arm)
    • Per-arch tags now include -amd64 / -arm64 suffixes (e.g., main-full-amd64, main-full-arm64)
    • New create-manifest job stitches per-arch tags into multi-arch manifests under the original tag scheme (e.g., main-full, latest)
    • ARM64 build includes disk space cleanup and smoke test (health endpoint validation)
    • Test jobs (test-apptainer, test-nginx, test-traefik) remain AMD64-only for now
  • Manifest Strategy:

    • Per-arch images tagged with architecture suffix for traceability
    • Multi-arch manifests reuse pre-ARM64 tag scheme so existing consumers (k8s overlays, docker-compose, docker pull callers) continue working transparently
    • Docker automatically selects the correct architecture on pull
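
The tag scheme and stitching step described above can be sketched in shell. This is a minimal sketch under stated assumptions, not the workflow's actual step: the helper name `stitch_manifest` and the `DRY_RUN` switch are hypothetical, and `docker buildx imagetools create` is used here as one way to assemble a multi-arch manifest (the workflow may use `docker manifest create` instead).

```shell
#!/bin/sh
set -eu

# Hypothetical helper: merge the per-arch tags <ref>-<variant>-amd64 and
# <ref>-<variant>-arm64 into the unsuffixed tag <ref>-<variant>.
stitch_manifest() {
  image=$1 ref=$2 variant=$3
  target="${image}:${ref}-${variant}"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Default: only print the command so the tag scheme can be inspected.
    echo "docker buildx imagetools create -t ${target} ${target}-amd64 ${target}-arm64"
  else
    # Needs a registry login and both per-arch tags already pushed.
    docker buildx imagetools create -t "$target" "${target}-amd64" "${target}-arm64"
  fi
}

stitch_manifest ghcr.io/openms/streamlit-template main full
```

Run as-is, the script only prints the command for `main-full`; setting `DRY_RUN=0` would execute it against the registry.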

Implementation Details

  • ARM64 runner uses ubuntu-24.04-arm (native, no QEMU emulation)
  • Miniforge installer URL swapped to Miniforge3-Linux-aarch64.sh in ARM Dockerfiles
  • THIRDPARTY/Linux/aarch64 copy guarded with conditional check (only copies if directory exists)
  • ARM64 build includes provenance: false to avoid attestation overhead on native runner
  • Smoke test on ARM64 validates entrypoint and Streamlit health endpoint post-push
  • Cache keys include -amd64 / -arm64 suffix to prevent cross-arch cache pollution
  • PR builds validate Dockerfile syntax and compilation but don't push per-arch tags (no manifest to create)
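
The guarded THIRDPARTY copy listed above could look roughly like the following. This is a sketch, assuming the guard treats an empty directory the same as a missing one; the helper name `copy_thirdparty` and the destination path in the usage note are hypothetical and not taken from Dockerfile.arm.

```shell
#!/bin/sh
set -eu

# Hypothetical helper mirroring the guarded copy: copy the aarch64
# third-party tools only when the checkout actually ships them.
copy_thirdparty() {
  src=$1 dest=$2
  if [ -d "$src" ] && [ -n "$(ls -A "$src")" ]; then
    mkdir -p "$dest"
    cp -r "$src"/. "$dest"/   # copy the directory contents, dotfiles included
    echo "copied"
  else
    echo "skipped"            # missing or empty directory: nothing to copy
  fi
}
```

Inside a `RUN` step this might be called as `copy_thirdparty /OpenMS/THIRDPARTY/Linux/aarch64 /thirdparty`, where `/thirdparty` is an assumed target directory.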

Backward Compatibility

Existing image references (:main-full, :latest, :v1.0.0-full) continue to work unchanged — they now resolve to multi-arch manifests that auto-select the correct architecture. No breaking changes to consumers.

https://claude.ai/code/session_01G3xxGELawVjFLSLy9NEg3R

Summary by CodeRabbit

  • New Features

    • Added ARM64 image variants and corresponding multi-architecture manifests.
    • Added ARM64-ready container images for the app (build + runtime) including service defaults, mount points, scheduled cleanup, and optional release asset download.
  • Chores

    • CI split into separate amd64/arm64 build jobs; artifact naming and test matrices updated.
  • Improvements

    • Expanded integration tests across architectures and added cluster-state dumps on failure; Apptainer test remains amd64 and frees disk before loading artifacts.

Mirror FLASHApp's split-build / manifest-merge approach so both
linux/amd64 and linux/arm64 are published for the full and simple
variants. Existing `<ref>-full` / `<ref>-simple` / `latest` tags
become multi-arch manifests — k8s overlays, docker-compose users,
and direct `docker pull` callers transparently get the right arch.

Dockerfile.arm (delta from Dockerfile):
 - aarch64 miniforge installer
 - conditional THIRDPARTY/Linux/aarch64 copy (some OpenMS releases
   ship an empty/missing aarch64 dir)
 - pruned thirdparty PATH to tools that actually have ARM builds:
   LuciPHOr2, MSGFPlus, ThermoRawFileParser, Comet, Percolator, Sage

Dockerfile_simple.arm (delta from Dockerfile_simple):
 - aarch64 miniforge installer only — pyOpenMS ships aarch64 wheels
   on PyPI, so `pip install -r requirements.txt` works as-is

The shared docker/entrypoint.sh is reused as-is on ARM: its
apptainer/read-only-root handling is arch-neutral and worth keeping.
Base stays ubuntu:22.04 (Redis 6.0 predates the ARM64-COW-BUG
warning, so no `--ignore-warnings` flag needed).

Workflow changes (build-and-test.yml):
 - `build` renamed `build-amd64`; per-arch tags carry `-amd64`.
 - New `build-arm64` job runs on `ubuntu-24.04-arm`, builds the
   `.arm` Dockerfiles for both variants, ends with a pull-back +
   /_stcore/health probe on push events.
 - New `create-manifest` job stitches `<ref>-<variant>-amd64` +
   `<ref>-<variant>-arm64` into multi-arch `<ref>-<variant>` and
   `latest` manifests.
 - test-apptainer / test-nginx / test-traefik / publish-apptainer
   keep consuming the amd64 artifact only. SIF publishing stays
   amd64-only this iteration.
 - PRs build both arches (registry cache keeps warm runs cheap) but
   don't push; manifest creation also skipped on PRs.

Branch-protection note: the `build` required check is renamed to
`build-amd64`. Admins should update protected-branch rules and add
`build-arm64` / `create-manifest` if those should also be required.

coderabbitai Bot commented May 25, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

The PR splits CI into architecture-specific build jobs (amd64/arm64), assembles multi-arch manifests, updates integration tests to run per-arch matrices and download arch-tagged artifacts, and adds two ARM64 Dockerfiles (full OpenMS build and a simpler runtime).

Changes

Multi-arch CI/CD and ARM64 Docker builds

| Layer / File(s) | Summary |
|---|---|
| CI jobs: split build, per-arch tags, test wiring (.github/workflows/build-and-test.yml) | Replaces monolithic build with build-amd64 and build-arm64, emits -amd64/-arm64 tags and uploads arch-specific artifacts, adds create-manifest to assemble non-suffixed multi-arch manifests, makes test-apptainer amd64-only, expands test-nginx/test-traefik matrices to {variant,arch,runner}, updates artifact download names, adds disk-freeing steps before loads, and adds failure-state dump steps for nginx/traefik tests. |
| Dockerfile.arm: full OpenMS ARM64 multi-stage build (Dockerfile.arm) | Adds a multi-stage ARM64 build: declare build args, install build deps and Miniforge, clone OpenMS with submodules, populate THIRDPARTY, compile OpenMS/TOPP and pyopenms wheels, install remaining Python requirements, package /openms, and create a runtime stage installing Redis/nginx, copying the Streamlit app, configuring cron, patching settings, and optionally downloading a release artifact. |
| Dockerfile_simple.arm: simplified ARM64 runtime build (Dockerfile_simple.arm) | Adds a simpler ARM64 Dockerfile that prepares a Miniforge/mamba Python 3.10 env, installs Python deps, copies the app, configures mount targets and cron, patches settings.json, runs a build-time analytics hook, optionally downloads OpenMS-App.zip when GITHUB_TOKEN is provided, and sets the entrypoint and exposed port. |

Sequence Diagram

sequenceDiagram
  participant GH as GitHub Actions
  participant build_amd as build-amd64
  participant build_arm as build-arm64
  participant registry as Container Registry
  participant manifest as create-manifest
  participant tests as Test Jobs

  GH->>build_amd: trigger build-amd64 (linux/amd64)
  GH->>build_arm: trigger build-arm64 (linux/arm64)
  build_amd->>registry: push image tagged -amd64 + upload amd64 artifact
  build_arm->>registry: push image tagged -arm64 + upload arm64 artifact
  build_amd-->>manifest: completion + tag refs
  build_arm-->>manifest: completion + tag refs
  manifest->>registry: docker manifest create --amend (-amd64, -arm64)
  manifest->>registry: push consolidated manifest (non-suffixed)
  manifest-->>tests: manifests and arch artifacts ready
  tests->>tests: run apptainer/nginx/traefik per-arch matrices (where applicable)

Poem

🐰 I hopped through CI with tags in tow,

amd64 and arm64 in a row,
Manifests stitched both near and far,
Two Dockerfiles—one full, one spar,
Now images dance where runners go.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding ARM64 multi-architecture Docker build support. It is concise, specific, and directly reflects the primary focus of the changeset. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (8)
.github/workflows/build-and-test.yml (1)

150-150: 💤 Low value

Consider adding persist-credentials: false to checkout step.

The checkout action persists git credentials by default, which could be accessed by subsequent steps. While the same pattern exists in other jobs, adding persist-credentials: false is a security hardening best practice, especially on ARM runners which may have different security characteristics.

      - uses: actions/checkout@v4
+       with:
+         persist-credentials: false
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/build-and-test.yml at line 150, The checkout step using
actions/checkout@v4 is persisting git credentials by default; update the
checkout step (the actions/checkout@v4 invocation) to include
persist-credentials: false under its with: block so credentials are not left for
subsequent steps—ensure the existing checkout step (actions/checkout@v4) is
modified to add the persist-credentials: false setting.
Dockerfile_simple.arm (4)

73-75: 💤 Low value

Remove duplicate WORKDIR statement.

Line 75 duplicates the WORKDIR /app already set on line 73.

♻️ Proposed fix
 # create workdir and copy over all streamlit related files/folders
 WORKDIR /app
 # note: specifying folder with slash as suffix and repeating the folder name seems important to preserve directory structure
-WORKDIR /app
 COPY assets/ /app/assets
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Dockerfile_simple.arm` around lines 73 - 75, Remove the redundant Dockerfile
WORKDIR directive: there are two identical statements "WORKDIR /app" (the
duplicate on the later line). Keep a single WORKDIR /app and delete the repeated
one to avoid unnecessary duplication and potential confusion during image
builds; search for the "WORKDIR /app" entries and remove the second occurrence.

23-25: ⚡ Quick win

Combine apt-get update with apt-get install.

Same issue as in Dockerfile.arm - the separate apt-get update can lead to stale package index issues when the layer is cached.

♻️ Proposed fix
-RUN apt-get -y update
-# note: streamlit in docker needs libgtk2.0-dev (see https://yugdamor.medium.com/importerror-libgthread-2-0-so-0-cannot-open-shared-object-file-no-such-file-or-directory-895b94a7827b)
-RUN apt-get install -y --no-install-recommends --no-install-suggests wget ca-certificates libgtk2.0-dev curl jq cron nginx
+# note: streamlit in docker needs libgtk2.0-dev (see https://yugdamor.medium.com/importerror-libgthread-2-0-so-0-cannot-open-shared-object-file-no-such-file-or-directory-895b94a7827b)
+RUN apt-get -y update && \
+    apt-get install -y --no-install-recommends --no-install-suggests wget ca-certificates libgtk2.0-dev curl jq cron nginx
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Dockerfile_simple.arm` around lines 23 - 25, The Dockerfile currently runs
`RUN apt-get -y update` and then a separate `RUN apt-get install -y
--no-install-recommends ...` which can produce stale index issues when layers
are cached; combine them into a single `RUN` that does `apt-get update &&
apt-get install -y --no-install-recommends --no-install-suggests wget
ca-certificates libgtk2.0-dev curl jq cron nginx` (and optionally `rm -rf
/var/lib/apt/lists/*` afterwards) so both commands run in one layer and the
package index is fresh.

123-124: 💤 Low value

Remove ineffective SHELL directive.

This SHELL directive has no effect since there are no subsequent RUN commands after it. The EXPOSE and ENTRYPOINT instructions don't use the shell.

♻️ Proposed fix
     fi
 
-# make sure that mamba environment is used
-SHELL ["mamba", "run", "-n", "streamlit-env", "/bin/bash", "-c"]
-
 EXPOSE $PORT
 ENTRYPOINT ["/app/entrypoint.sh"]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Dockerfile_simple.arm` around lines 123 - 124, The SHELL ["mamba", "run",
"-n", "streamlit-env", "/bin/bash", "-c"] directive is ineffective because there
are no subsequent RUN instructions that would use it; remove this SHELL line
from the Dockerfile (or alternatively move it above any RUN commands if you
intend mamba to be used for build steps). Ensure EXPOSE and ENTRYPOINT remain
unchanged (they don't use the SHELL), and confirm there are no other RUN blocks
that rely on "mamba" or the "streamlit-env" environment before deleting or
moving the SHELL directive.

11-12: ⚡ Quick win

Remove unused ARGs.

OPENMS_REPO and OPENMS_BRANCH are declared but never used in this simplified Dockerfile since it installs pyOpenMS from pip rather than building from source. These appear to be copy-paste artifacts from Dockerfile.arm.

♻️ Proposed fix
 FROM ubuntu:22.04 AS stage1
-ARG OPENMS_REPO=https://github.com/OpenMS/OpenMS.git
-ARG OPENMS_BRANCH=develop
 ARG PORT=8501
 # Streamlit app GitHub user name (to download artifact from).
 ARG GITHUB_USER=OpenMS
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Dockerfile_simple.arm` around lines 11 - 12, Remove the unused build
arguments OPENMS_REPO and OPENMS_BRANCH from Dockerfile_simple.arm: locate the
ARG declarations for OPENMS_REPO and OPENMS_BRANCH and delete them (they are
artifacts from Dockerfile.arm and not used because pyOpenMS is installed from
pip); ensure no other references to OPENMS_REPO or OPENMS_BRANCH remain in the
simplified Dockerfile or accompanying build scripts, and run a quick build to
confirm no missing build-arg references.
Dockerfile.arm (3)

1-6: 💤 Low value

Minor documentation issues.

  • Line 1: Typo "thidparty" should be "thirdparty"
  • Line 6: References streamlitappsimple but the build command on line 4 uses streamlitapp - these should be consistent
📝 Proposed fix
-# This Dockerfile builds OpenMS, the TOPP tools, pyOpenMS and thidparty tools.
+# This Dockerfile builds OpenMS, the TOPP tools, pyOpenMS and thirdparty tools.
 # It also adds a basic streamlit server that serves a pyOpenMS-based app.
 # hints:
 # build image and give it a name (here: streamlitapp) with: docker build -f Dockerfile.arm --no-cache -t streamlitapp:latest-arm64 --build-arg GITHUB_TOKEN=<your-github-token> . 2>&1 | tee build.log
 # check if image was build: docker image ls
-# run container: docker run -p 8501:8501 streamlitappsimple:latest
+# run container: docker run -p 8501:8501 streamlitapp:latest-arm64
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Dockerfile.arm` around lines 1 - 6, Fix the typos and inconsistent image name
in the Dockerfile comments: change the misspelled word "thidparty" to
"thirdparty" and make the container name consistent by using the same image name
in the run example as in the build example (use "streamlitapp" instead of
"streamlitappsimple" or vice versa), updating the comment lines inside
Dockerfile.arm so both build and run commands reference the same image tag.

68-68: 💤 Low value

Remove no-op cd command.

The && cd /OpenMS at the end of this RUN instruction has no effect since WORKDIR /OpenMS is set immediately after on line 71.

♻️ Proposed fix
-RUN git clone --recursive --depth=1 -b ${OPENMS_BRANCH} --single-branch ${OPENMS_REPO} && cd /OpenMS
+RUN git clone --recursive --depth=1 -b ${OPENMS_BRANCH} --single-branch ${OPENMS_REPO}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Dockerfile.arm` at line 68, The RUN line currently ends with a no-op "&& cd
/OpenMS" because WORKDIR /OpenMS is set immediately afterwards; remove the
redundant "&& cd /OpenMS" from the RUN command (the rest of the git clone
options like --recursive --depth=1 -b ${OPENMS_BRANCH} --single-branch
${OPENMS_REPO} should remain unchanged) so the RUN only performs the clone and
leaves directory handling to WORKDIR.

22-23: ⚡ Quick win

Combine apt-get update with apt-get install to prevent stale index issues.

When apt-get update runs alone, Docker caches that layer. If the base image or package sources change, the cached update layer may have a stale index while install fetches packages, potentially causing package-not-found errors.

♻️ Proposed fix
-RUN apt-get -y update
-RUN apt-get install -y --no-install-recommends --no-install-suggests g++ autoconf automake patch libtool make git gpg wget ca-certificates curl jq libgtk2.0-dev openjdk-8-jdk cron
+RUN apt-get -y update && \
+    apt-get install -y --no-install-recommends --no-install-suggests g++ autoconf automake patch libtool make git gpg wget ca-certificates curl jq libgtk2.0-dev openjdk-8-jdk cron
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Dockerfile.arm` around lines 22 - 23, Combine the two apt-get layers into a
single RUN so the package index and install occur atomically: replace the
separate "RUN apt-get -y update" and "RUN apt-get install -y
--no-install-recommends ..." with a single "RUN apt-get update && apt-get
install -y --no-install-recommends --no-install-suggests g++ autoconf automake
patch libtool make git gpg wget ca-certificates curl jq libgtk2.0-dev
openjdk-8-jdk cron && rm -rf /var/lib/apt/lists/*" to avoid stale index issues
and trim apt lists; update the Dockerfile lines containing those RUN commands
accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7cf4b60e-ddd9-429c-a5fd-4fabe6a9091e

📥 Commits

Reviewing files that changed from the base of the PR and between 6ca8e97 and b0e682e.

📒 Files selected for processing (3)
  • .github/workflows/build-and-test.yml
  • Dockerfile.arm
  • Dockerfile_simple.arm

claude added 13 commits May 25, 2026 19:43
Previously the apptainer/nginx/traefik integration tests only ran
against the amd64 artifact, so the arm64 image was validated solely
by its build succeeding plus a post-push /_stcore/health probe. Now
all three integration matrices fan out over arch=[amd64, arm64] with
a matrix-driven runs-on, exercising the read-only-root apptainer
contract and both kind-based ingress paths on a native ARM runner
too.

Changes:
 - `build-amd64` artifact renamed from
   `openms-streamlit-<variant>-image` to
   `openms-streamlit-<variant>-amd64-image` for symmetry.
 - `build-arm64` now also loads the built image (`load: true`), retags it to
   the kind-friendly `openms-streamlit:test`, saves it as a tar, and
   uploads it as `openms-streamlit-<variant>-arm64-image`. The
   post-push pull-back smoke test is removed — the new apptainer/
   nginx/traefik runs subsume it and avoid the slow GHCR pull.
 - `test-apptainer`, `test-nginx`, `test-traefik` matrices switched
   from `variant: [full, simple]` to an `include:` list with
   {variant, arch, runner} tuples; `runs-on: ${{ matrix.runner }}`
   selects `ubuntu-latest` for amd64 and `ubuntu-24.04-arm` for arm64.
   Artifact download names get `${{ matrix.arch }}` interpolated.
 - SIF upload at the tail of `test-apptainer` gated on
   `matrix.arch == 'amd64'`: arm64 still runs the full apptainer
   contract end-to-end, but only amd64 produces the SIF that
   `publish-apptainer` ships to GHCR (HPC SIF consumers are amd64).

Note on `publish-apptainer`: it stays on `needs: test-apptainer`,
which now waits for the arm64 matrix entries too — meaning an arm64
apptainer regression will block amd64 SIF publishing. Conservative
on purpose; happy to decouple via separate jobs if that turns out to
be too strict in practice.

The ARM build of `make -j4 TOPP` failed at the link step with

    /usr/bin/ld: /root/miniforge3/lib/libyaml-cpp.so.0.8: undefined
    reference to `std::ios_base_library_init()@GLIBCXX_3.4.32'

The conda-forge libyaml-cpp package for aarch64 is built against
GLIBCXX_3.4.32 (gcc 13+), but Ubuntu 22.04's system g++ ships with an
older libstdc++. Running cmake inside the mamba shell lets it discover
/root/miniforge3/lib first, so the conda-forge yaml-cpp gets linked
into every TOPP binary and breaks. amd64 happens to work because the
conda-forge amd64 yaml-cpp build is older.

Fix mirrors FLASHApp's Dockerfile.arm: configure OpenMS in two cmake
passes — pass 1 under plain `/bin/bash` with
`-DCMAKE_IGNORE_PREFIX_PATH=/root/miniforge3` so cmake resolves C++
deps from the system tree (libyaml-cpp from contrib, boost from apt,
etc.); pass 2 under `mamba run` with `-DPYOPENMS=ON` so the Python
bindings still find conda-forge Python / Cython / NumPy. The
IGNORE_PREFIX_PATH flag is repeated on pass 2 to keep the cached C++
link command unchanged.

Only Dockerfile.arm changes; Dockerfile (amd64) keeps its single-pass
cmake to avoid disturbing the working x86 path.

The two-pass cmake split from 1d73b67 runs pass 1 under
`SHELL ["/bin/bash", "-c"]`, but the only cmake on the image is the
one from `mamba install cmake` at /root/miniforge3/envs/streamlit-env/bin/cmake
— not on plain bash's PATH. Result: exit 127 (command not found) the
moment pass 1 invokes cmake.

FLASHApp.arm sidesteps this by installing cmake via apt; do the same
here (just append `cmake` to the existing apt-get install line). The
mamba cmake install stays, so pass 2 under the mamba shell continues
to use the conda-forge cmake exactly as it did before. Ubuntu 22.04
ships cmake 3.22, comfortably above OpenMS 3.5's 3.15 floor.

The previous fix (install cmake via apt) didn't actually help: OpenMS
3.5's CMakeLists.txt requires cmake >= 3.24, and Ubuntu 22.04's apt
cmake is 3.22.1, which fails configure with

    CMake Error at src/openms/extern/CMakeLists.txt:11 (cmake_minimum_required):
      CMake 3.24 or higher is required. You are running version 3.22.1

That's exactly why the existing x86 Dockerfile installs cmake via
mamba (the conda-forge build is 3.30+). FLASHApp.arm escapes this by
using ubuntu:24.04 (apt cmake 3.28); we stay on 22.04 to minimize
churn vs. the working x86 Dockerfile.

Fix: in pass 1, call the mamba-env cmake by its full path
`/root/miniforge3/envs/streamlit-env/bin/cmake`. The plain-bash SHELL
is still in effect, so cmake doesn't pick up any conda-forge
environment side effects, and CMAKE_IGNORE_PREFIX_PATH keeps it from
auto-discovering miniforge libraries during find_package. The cmake
binary itself runs against miniforge's libstdc++, but that's a
runtime detail of cmake — it doesn't leak into the configured
project's link command.

The apt cmake addition from f11bc99 is now redundant but harmless;
leaving it in place to keep this diff focused.
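
The two-pass configure with the full-path cmake described above can be sketched as follows. This is an illustration under stated assumptions: the helper names are hypothetical, `/OpenMS` is assumed as the source checkout, the remaining OpenMS configure flags are omitted, and the functions only print the commands rather than running them. In the Dockerfile, pass 1 would run under the plain `/bin/bash` SHELL and pass 2 under the mamba SHELL.

```shell
#!/bin/sh
set -eu

MINIFORGE=/root/miniforge3
# Full path, because the mamba-env cmake is not on plain bash's PATH.
CMAKE="$MINIFORGE/envs/streamlit-env/bin/cmake"
SRC=/OpenMS            # assumed source checkout
BUILD=/openms-build    # build directory named in the error log

# Pass 1: plain shell; find_package must not look inside miniforge.
configure_pass1() {
  echo "$CMAKE -S $SRC -B $BUILD -DCMAKE_IGNORE_PREFIX_PATH=$MINIFORGE"
}

# Pass 2: inside the mamba env so the Python bindings find Python,
# Cython and NumPy; the ignore flag is repeated so the cached C++
# link command stays unchanged.
configure_pass2() {
  echo "mamba run -n streamlit-env cmake -S $SRC -B $BUILD -DCMAKE_IGNORE_PREFIX_PATH=$MINIFORGE -DPYOPENMS=ON"
}

configure_pass1
configure_pass2
```

Printing the commands keeps the sketch runnable outside the image; the point is only which cmake binary and which flags each pass uses.
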
Two failures in the previous run (test-traefik full, test-nginx
simple) ended with the runner reporting "No space left on device"
while flushing its diagnostic log. ubuntu-latest starts with ~14 GB
free; downloading the full image artifact (5-8 GB), loading it into
docker (decompressed, larger), pulling kind's node image, then
loading the OCI tar into the kind cluster easily exceeds that budget.

Mirror the cleanup already used by `build-arm64`: drop the runner's
preinstalled dotnet / android SDK / ghc / hostedtoolcache to recover
~30 GB. Same step now runs at the top of test-apptainer, test-nginx,
and test-traefik on both amd64 (ubuntu-latest) and arm64
(ubuntu-24.04-arm) matrix entries — the arm runner is at least as
tight as amd64.

After the two-pass cmake configure landed in 5185c3e, the next
attempt got past `make -j4 TOPP` (the link error is fixed) but
failed fast in `make -j4 pyopenms` with:

    CMake Error: Not a file: /openms-build/CMakeFiles/VerifyGlobs.cmake
    CMake Error: Error processing file: /openms-build/CMakeFiles/VerifyGlobs.cmake
    make: *** [Makefile:11553: cmake_check_build_system] Error 1

`VerifyGlobs.cmake` is generated by cmake for `file(GLOB CONFIGURE_DEPENDS ...)`
targets and is consulted by `cmake_check_build_system` at the top of
every subsequent `make` invocation. The intermediate cleanup line

    RUN rm -rf src doc CMakeFiles

deleted it, which is fine on the x86 single-pass build (different
cmake codepath when PYOPENMS=ON is set in the initial configure, no
VerifyGlobs.cmake generated) but breaks the ARM two-pass build.

Stop deleting CMakeFiles/ between `make TOPP` and `make pyopenms`.
We still drop `src/` and `doc/` for disk savings; keeping CMakeFiles
costs only a few hundred MB on the intermediate layer.

eWaterCycle/setup-apptainer@v2 installs apptainer from the upstream
.deb asset on the GitHub release. Upstream apptainer only publishes
amd64 .debs (verified: every v1.3.x release lists only
`apptainer_<ver>_amd64.deb`, no _arm64 / _aarch64 variant). On the
ubuntu-24.04-arm runner the action's `apt-get install ./apptainer_*.deb`
fails with sudo exit code 100 because the package can't be resolved.

Building apptainer from source on the ARM runner would add ~15 minutes
and a maintenance surface (Go toolchain, suid configuration) for
limited value — HPC SIF consumers remain amd64. Revert test-apptainer
to amd64-only and document why. test-nginx and test-traefik still
exercise the ARM image via kind, which gives us functional ARM
coverage at the docker-runtime level even without apptainer.

Side cleanups now that arm64 is gone from this matrix:
- artifact name back to a literal `*-amd64-image` (no matrix.arch)
- SIF upload gate drops the `matrix.arch == 'amd64'` check

With hostedtoolcache removed by the disk cleanup, the kind/kubectl/helm
setup actions fail with "Cache directory
'/opt/hostedtoolcache' does not exist". Drop just dotnet/android/ghc
(~34 GB) and leave the tool cache in place.
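
A cleanup step along these lines might look like the sketch below. The directory paths are assumptions about where hosted runners keep these SDKs (the commit message names only dotnet, android, and ghc), and the helper name `free_disk` and its optional root-prefix argument are hypothetical, added so the function can be tried against a scratch directory.

```shell
#!/bin/sh
set -eu

# Hypothetical cleanup helper. With no argument it targets the real
# filesystem root and must run as root on the runner.
free_disk() {
  root=${1:-}
  for dir in usr/share/dotnet usr/local/lib/android opt/ghc; do
    rm -rf "${root}/${dir}"
  done
  # /opt/hostedtoolcache is deliberately left in place: the
  # kind/kubectl/helm setup actions expect it to exist.
}
```

Calling `free_disk "$scratch"` on a scratch tree is a cheap way to check the path list before pointing it at a runner.
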
curl exit-22 doesn't tell us whether the pod, service, or ingress is
the broken link. Dump pods/logs/ingress/controller logs on failure so
the next run surfaces the actual cause.
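
A failure-state dump of this kind can be sketched as below. The exact commands in the workflow are not shown in this PR text, so the list is an assumption; the helper name `dump_cluster_state` and the `KUBECTL` override are hypothetical. Controller logs are left out because they depend on the controller's namespace and labels, which are not given here.

```shell
#!/bin/sh
set -eu

# Hypothetical diagnostics helper for an "if: failure()" step.
dump_cluster_state() {
  run=${KUBECTL:-kubectl}
  for args in \
    "get pods -A -o wide" \
    "describe pods -A" \
    "get ingress -A" \
    "get events -A --sort-by=.lastTimestamp"
  do
    echo "== kubectl $args"
    # Word splitting of $args is intended; one failing command must
    # not hide the output of the others.
    $run $args || true
  done
}
```

Setting `KUBECTL=echo` prints the argument lists instead of contacting a cluster, which is handy for reviewing the step.
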
`docker load` + `kind load docker-image` keeps the image in both host
docker AND each kind node's containerd. With a 5-8 GB image and two
kind nodes that's ~25 GB of duplicated storage, which trips the
"no space left on device" error in kind's ctr import.

Switch to `kind load image-archive` so the tar streams directly into
each node, and rm the tar after to reclaim /tmp.
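
The archive-based load can be sketched as follows. The helper name `load_into_kind` and the `KIND` override are hypothetical; the cluster name defaults to `kind`, which is kind's own default, and the real workflow's cluster name and tar path are not shown here.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: stream the image tar straight into the kind
# nodes, then delete it to reclaim disk space.
load_into_kind() {
  archive=$1 cluster=${2:-kind}
  ${KIND:-kind} load image-archive "$archive" --name "$cluster"
  rm -f "$archive"
}
```

Because the image never passes through the host's docker daemon, only the per-node containerd copies remain on disk.
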
503s in test-nginx/test-traefik traced to two issues:

1. The prod overlay maps openms-streamlit ->
   ghcr.io/openms/streamlit-template:main-full, but the build job was
   re-tagging the local image as openms-streamlit:test. Rendered
   manifests pointed at the registry name; kind only had :test loaded;
   pods stayed ErrImagePull. Retag as :main-full so kind has exactly
   the ref the manifests use.

2. Three of the four pod specs declare imagePullPolicy: Always; the
   existing sed only rewrote IfNotPresent. With Always and no registry
   creds in kind, pods loop on ImagePullBackOff. Extend the sed to
   catch both.
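
The extended rewrite from item 2 can be sketched with a single `sed` expression. The replacement value `Never` is an assumption (the commit message does not state what the policy is rewritten to), and the helper name `force_local_images` is hypothetical.

```shell
#!/bin/sh
set -eu

# Hypothetical filter: rewrite both pull policies so pods use the image
# already loaded into the kind nodes instead of contacting a registry.
force_local_images() {
  sed -E 's/(imagePullPolicy:[[:space:]]*)(Always|IfNotPresent)/\1Never/'
}

printf 'imagePullPolicy: Always\nimagePullPolicy: IfNotPresent\n' | force_local_images
```

The filter reads rendered manifests on stdin, so it can sit between the manifest rendering step and `kubectl apply -f -`.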