Update to Ubuntu 26.04 and ROOT 6.38.04#188

Open
tvami wants to merge 8 commits into main from ubuntu_26.04

@tvami
Member

@tvami tvami commented May 14, 2026

I am adding a new package to the container, here are the details.

What new packages does this PR add to the development image?

  • Update to Ubuntu 26.04 and
  • ROOT 6.38.04

Check List

  • I successfully built the container using docker
  • I was able to build ldmx-sw using this new container build

Yes with LDMX-Software/ldmx-sw#2047

  • I was able to test run a small simulation and reconstruction inside this container
  • I was able to successfully use the new packages.

I can compile and run ldmx-sw

@tomeichlersmith
Member

Just as a heads up, when I was experimenting with bumping the ROOT version, the image was failing to push to DockerHub: https://github.com/LDMX-Software/dev-build-context/actions/runs/25692264293

This error happens at the end of the build and is fairly opaque:

#38 exporting cache to registry
... enumeration of cache layer digests being pushed
#38 ERROR: error writing layer blob: failed to copy: unexpected status from PUT request to https://registry-1.docker.io/v2/ldmx/dev/blobs/uploads/6dfe44a1-d8e5-4101-96b0-60ecb367ce2e?_state=YWmJR8ZplwmAgIsMakD5kyA9Din_n16W7g8aDlwfGqp7Ik5hbWUiOiJsZG14L2RldiIsIlVVSUQiOiI2ZGZlNDRhMS1kOGU1LTQxMDEtOTZiMC02MGVjYjM2N2NlMmUiLCJPZmZzZXQiOjAsIlN0YXJ0ZWRBdCI6IjIwMjYtMDUtMTJUMTk6NTI6MjkuMzU4NzQzMzg2WiIsIkV4cGlyZXNBdCI6IjIwMjYtMDUtMTJUMjA6MjI6MjkuNDM2NjE3MzY5WiIsIlVzZXJuYW1lIjoi***IiwiVXNlclVVSUQiOiJiY2I0ODZmOS04YmFhLTRiNTItOWQwMi1hZGJmZjg0MjFmYzIiLCJTZXNzaW9uSUQiOiJkY2tyX2p0aV96c3l5QTlmZkpHYjFZdkU0eDRSRnBVY3dvYWM9In0%3D&digest=sha256%3A0d116fb00beae894036892d63cedd3769966bc245456841abee862cc9d7fb8c7: 400 Bad request

I fear that it has to do with the caching since it fails in the build step of trying to push the cache. From a little searching, this looks like the layer being pushed is too large. Either we need to drop all caching (which works) or maybe we can get away with dropping the mode=max parameter? I am uncertain. Caching is not necessary, it just helps speed up re-builds when we merge a PR.

Another solution is to look back at customizing the CI workflow to cache locally on the runner. This makes sense for us since we just have the single VM acting as the runners doing the building. Right now, we treat it like two runners and have GitHub distribute the load, but we could treat it like one runner and do the parallel-building within the action itself.

Comment thread Dockerfile Outdated
@tomeichlersmith
Member

My concerns were confirmed. The amd64 run failed to push its cache to DockerHub like what happened on my branch. https://github.com/LDMX-Software/dev-build-context/actions/runs/25886123565/job/76078111961

I got around this issue by removing the cache-to line in ci.yaml, but that means every CI run will build the image from scratch (taking about two days).

The largest layer in the current cache on DockerHub (so the largest layer that was successfully pushed) is below 1GB.

tom@zuko:~/code/ldmx/dev-build-context$ docker buildx imagetools inspect ldmx/dev:build-cache-amd64 --raw | jq '.layers | map(.size) | max' | numfmt --to=si
726M

I think it makes sense that they would prevent individual layers from exceeding 1GB, so that is probably the issue. I don't know which layer is exceeding the size limit, but I suspect it's the ROOT one since ROOT is the one changing. I don't know how ROOT would increase the layer size by ~275MB though, if that is the issue.
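
As a rough sketch of how the per-layer sizes could be examined beyond the single maximum reported above, the following Python ranks every layer in a manifest and flags those above a threshold. It assumes the manifest has the same shape the jq expression relies on (a top-level `layers` list whose entries carry a `size`). The 1GB threshold is only the guess made in this thread, not a documented DockerHub limit, and the sample manifest is made up for illustration.

```python
import json

# Assumption: the thread only guesses that DockerHub rejects layers above ~1GB;
# this threshold is illustrative, not a documented limit.
ASSUMED_LIMIT_BYTES = 1_000_000_000


def layers_by_size(manifest):
    """Return (size, digest) pairs for every layer, largest first."""
    layers = manifest.get("layers", [])
    pairs = [(layer["size"], layer.get("digest", "<no digest>")) for layer in layers]
    return sorted(pairs, reverse=True)


def oversized_layers(manifest, limit=ASSUMED_LIMIT_BYTES):
    """Return only the layers whose size exceeds `limit` bytes."""
    return [(size, digest) for size, digest in layers_by_size(manifest) if size > limit]


# Made-up manifest with the same shape the jq expression relies on.
sample = json.loads("""
{"layers": [
  {"digest": "sha256:aaa", "size": 726000000},
  {"digest": "sha256:bbb", "size": 1200000000},
  {"digest": "sha256:ccc", "size": 52000000}
]}
""")

for size, digest in layers_by_size(sample):
    flag = "over assumed limit" if size > ASSUMED_LIMIT_BYTES else "ok"
    print(f"{size / 1e6:8.0f} MB  {digest}  {flag}")
```

For a real check, the output of `docker buildx imagetools inspect ldmx/dev:build-cache-amd64 --raw` could be parsed with `json.load(sys.stdin)` in place of the sample. This only sees layers that were pushed successfully, so a layer rejected by the registry would not appear in it.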

@tvami
Member Author

tvami commented May 18, 2026

OK, so I guess we have to remove the cache-to?

@tomeichlersmith
Member

Yea :/ pending some other solution I'm unaware of

@tvami
Member Author

tvami commented May 18, 2026

cache-from too, right?

@tvami
Member Author

tvami commented May 19, 2026

Our old GEANT does not work with gcc15 within ldmx-sw (I think we have seen this issue with the LTO build too, so it's not completely new), so I now compile GEANT with gcc13 in 7e2efee.

I spent some time trying to make the gcc15-based GEANT work, but at this point I have given up, especially since we plan to update GEANT anyway.

@tvami tvami marked this pull request as ready for review May 19, 2026 01:41
@tvami tvami requested a review from tomeichlersmith May 19, 2026 01:54
Comment thread Dockerfile
Comment thread Dockerfile Outdated
Comment thread Dockerfile Outdated
@tvami tvami requested a review from tomeichlersmith May 19, 2026 16:05
@tvami
Member Author

tvami commented May 19, 2026

So I should wait for https://github.com/LDMX-Software/dev-build-context/actions/runs/26109399080 to finish before merging, right?

@tomeichlersmith
Member

Yes, I would - then we will have a build at ldmx/dev:ubuntu_26.04 for easier testing across machines and at least see which versions of ldmx-sw are supported.

@tvami
Member Author

tvami commented May 19, 2026

> Yes, I would

Sounds good

> least see which versions of ldmx-sw are supported.

I expect a lot to fail w/o LDMX-Software/ldmx-sw#2047

@tomeichlersmith
Member

Yea, which motivates updating the legacy interop patches https://ldmx-software.github.io/dev-build-context/interop.html and maybe updating the oldest ldmx-sw version that we commit to compiling with the image.

@tvami
Member Author

tvami commented May 20, 2026

@tomeichlersmith
Member

tomeichlersmith commented May 20, 2026

Nope, it takes ~6hr to build x86 and ~48hr to build ARM (since it's emulating ARM on an x86 machine). After both build successfully, the images are merged into one "image manifest" so ARM and x86 users can pull from the same tag. At that point, it runs the ldmx-sw tests.

The specific job you linked just reads the ci/ldmx-sw-to-test.json file and creates GitHub jobs from it. Those "Test X.Y.Z" jobs will spawn after the build+merge completes.

Edit: You can see the final job tree as an example here: https://github.com/LDMX-Software/dev-build-context/actions/runs/23150843005
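
To illustrate the "read a JSON file, create one job per entry" step described above, here is a minimal Python sketch of a common GitHub Actions pattern: a step writes a `matrix=<json>` line to `$GITHUB_OUTPUT`, and a later job expands it with `fromJSON`. The file format (a flat list of version strings), the `ldmx_sw_version` key, and the placeholder versions are all assumptions made for illustration. The actual contents of ci/ldmx-sw-to-test.json and the repository's workflow may differ.

```python
import json


def build_matrix(versions):
    """Turn a list of version strings into a GitHub Actions matrix mapping."""
    # "ldmx_sw_version" is a hypothetical key name, not taken from the repository.
    return {"include": [{"name": f"Test {v}", "ldmx_sw_version": v} for v in versions]}


def matrix_output_line(versions):
    """Format the matrix as a `key=value` line suitable for $GITHUB_OUTPUT."""
    return "matrix=" + json.dumps(build_matrix(versions), separators=(",", ":"))


# Placeholder file contents; the real ci/ldmx-sw-to-test.json may look different.
example_versions = json.loads('["1.0.0", "2.0.0"]')
print(matrix_output_line(example_versions))
```

In a workflow, the printed line would be appended to the file named by `$GITHUB_OUTPUT`, and the test job would declare its matrix from that output so that one "Test X.Y.Z" job is spawned per entry.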
