
CI: run test-integration[-rootless] directly on hosts and Lima guests#5035

Open
AkihiroSuda wants to merge 8 commits into
containerd:main from
AkihiroSuda:ci-no-docker

Conversation


@AkihiroSuda AkihiroSuda commented Jul 2, 2026

Member

Fix #3858.

This is notably useful for running SELinux tests on an actual rootfs of AlmaLinux 8.

Anyway, we can no longer run an Ubuntu container on AlmaLinux 8, as Ubuntu 26.04 dropped support for cgroup v1 hosts.


Note: used Claude

@AkihiroSuda AkihiroSuda force-pushed the ci-no-docker branch 2 times, most recently from 0570954 to c1b7b0e Compare July 2, 2026 15:50
@AkihiroSuda AkihiroSuda added the area/ci e.g., CI failure label Jul 2, 2026
@AkihiroSuda AkihiroSuda force-pushed the ci-no-docker branch 5 times, most recently from dd07ab4 to ba61c06 Compare July 3, 2026 18:54
@AkihiroSuda AkihiroSuda added this to the v2.4.0 milestone Jul 3, 2026
@AkihiroSuda AkihiroSuda force-pushed the ci-no-docker branch 2 times, most recently from 7cb542e to 0ffb46f Compare July 3, 2026 21:32
@AkihiroSuda AkihiroSuda linked an issue Jul 3, 2026 that may be closed by this pull request

@AkihiroSuda AkihiroSuda force-pushed the ci-no-docker branch 2 times, most recently from bec11a1 to 01b7287 Compare July 4, 2026 15:01
…ainerd v2

Rootful containerd creates /run/nri on the host (NRI is enabled by
default since containerd v2.0). RootlessKit's copy-up of /run turns
that directory into a symlink pointing back at the root-owned original,
inside which the rootless containerd cannot bind its own NRI socket:

  failed to create socket "/var/run/nri/nri.sock":
  listen unix /var/run/nri/nri.sock: bind: permission denied

Remove the symlink in the child namespace, like the existing
/run/containerd one, so that a fresh, writable directory gets created
on the copy-up tmpfs instead.

Assisted-by: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
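
The symlink removal described above can be sketched roughly as follows. This is a minimal illustration, not the actual script from this PR: `remove_copied_up_symlinks` is a hypothetical helper name, and it assumes it is called inside the RootlessKit child namespace, where the copied-up entries live on a writable tmpfs.

```sh
#!/bin/sh
# Hypothetical helper: drop copied-up symlinks so that fresh, writable
# directories can be created in their place later.
remove_copied_up_symlinks() {
	for p in "$@"; do
		# Only symlinks are removed; real directories and missing paths
		# are left untouched, and the symlink target is never followed.
		if [ -L "$p" ]; then
			rm -f "$p"
		fi
	done
}

# Example (not executed here), inside the child namespace:
#   remove_copied_up_symlinks /run/containerd /run/nri
```

Removing only the symlink leaves the root-owned original on the host intact; the rootless daemon then creates its own directory at the same path.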
gotestsum was invoked with the "testname" format, which it silently
upgrades to "github-actions" when GITHUB_ACTIONS=true (with no option
to disable the upgrade), printing the output of every test: the logs
were too large to be rendered by the GitHub Actions web UI.

Use the "pkgname-and-test-fails" format by default: it prints a single
line per package, plus the output of the failing tests (which is also
repeated in the "=== Failed" summary at the end of the run), and is
not subject to the "github-actions" upgrade.

Set GOTESTSUM_FORMAT to override the format.

Assisted-by: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
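
The format selection described above could look something like this. It is a sketch under assumptions: `gotestsum_format` is a hypothetical helper, and the package path in the commented invocation is illustrative only.

```sh
#!/bin/sh
# Hypothetical helper: GOTESTSUM_FORMAT wins when set and non-empty,
# otherwise the quieter per-package format is used.
gotestsum_format() {
	printf '%s\n' "${GOTESTSUM_FORMAT:-pkgname-and-test-fails}"
}

# Example invocation (not executed here):
#   gotestsum --format "$(gotestsum_format)" -- ./cmd/nerdctl/...
```

Running with `GOTESTSUM_FORMAT=testname` would restore the per-test output for local debugging.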
TestRunSelinux, TestRunSelinuxWithSecurityOpt, and
TestRunSelinuxWithVolumeLabel had never actually run in the CI, as the
containerized test environment did not enable SELinux. Now that the
integration tests run directly on Enterprise Linux Lima guests, they
run for the first time, revealing that:

- the image argument was missing from the `nerdctl run` command lines
  ("sleep" was interpreted as the image name, failing to pull)
- the volume directory was created at "/", which is not writable in
  rootless mode: create it under the test temporary directory instead
- a volume relabeled with :Z gets container_file_t, which does not
  contain the expected "container_t" substring; `ls -Z` also needs -d
  to print the label of the (empty) directory itself

Assisted-by: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
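
The label mismatch in the last bullet can be illustrated with a small sketch. `selinux_type` is a hypothetical helper, and the sample label (including its MCS categories) is made up for illustration; it only assumes the usual `user:role:type:level` layout of an SELinux context.

```sh
#!/bin/sh
# Hypothetical helper: print the type field (third colon-separated field)
# of an SELinux context such as "system_u:object_r:container_file_t:s0:c1,c2".
selinux_type() {
	printf '%s\n' "$1" | cut -d: -f3
}

# Illustrative label of a directory relabeled with :Z.
label="system_u:object_r:container_file_t:s0:c1,c2"

# The type is container_file_t, which is a different string from container_t
# and does not contain it as a substring.
selinux_type "$label"

# On an SELinux host, the label of the directory itself (not of its
# contents) would be printed with (not executed here):
#   ls -dZ /path/to/volume
```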
The test asserted that the image tag appears on stderr, which only
happens when BuildKit exports the build result directly to the
containerd image store ("naming to docker.io/library/..."). When
nerdctl determines that the image is not sharable between the BuildKit
worker and the client (e.g., on EL 8 rootless, where the BuildKit worker
uses the fuse-overlayfs snapshotter while the client uses the default
overlayfs), it falls back to loading a tarball, and the tag is printed
on stdout ("Loaded image: ...") instead.

Assert the actual outcome instead: the image built from the Dockerfile
fed on stdin exists and runs its CMD.

Assisted-by: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
TestHostNetworkDnsPreserved: GitHub Actions hosts run systemd-resolved,
so /etc/resolv.conf only contains the 127.0.0.53 stub. For host-network
containers, nerdctl deliberately provides the resolv.conf that
systemd-resolved generates with the actual upstream nameservers (see
pkg/resolvconf.Path()), while Docker keeps the stub. Capture the
nameservers accordingly in the test setup.

This could not happen while the tests ran inside a Docker container
with a Docker-generated resolv.conf.

Assisted-by: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
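
The nameserver capture described above can be sketched as follows. This is an approximation of the idea, not the logic of pkg/resolvconf.Path(): `resolv_conf_path` and `list_nameservers` are hypothetical helpers, and the sketch assumes systemd-resolved writes its upstream list to /run/systemd/resolve/resolv.conf.

```sh
#!/bin/sh
# Hypothetical helper: print the resolv.conf whose nameservers a
# host-network container is expected to see.
# $1: regular resolv.conf (default: /etc/resolv.conf)
# $2: file generated by systemd-resolved
#     (default: /run/systemd/resolve/resolv.conf)
resolv_conf_path() {
	etc="${1:-/etc/resolv.conf}"
	alt="${2:-/run/systemd/resolve/resolv.conf}"
	ns=$(awk '$1 == "nameserver" { print $2 }' "$etc" 2>/dev/null)
	# Use the generated file only when the stub is the sole nameserver.
	if [ "$ns" = "127.0.0.53" ] && [ -r "$alt" ]; then
		printf '%s\n' "$alt"
	else
		printf '%s\n' "$etc"
	fi
}

# Hypothetical helper: print one nameserver address per line.
list_nameservers() {
	awk '$1 == "nameserver" { print $2 }' "$1"
}

# Example (not executed here):
#   list_nameservers "$(resolv_conf_path)"
```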
The integration tests are no longer wrapped inside a Docker container.
Docker is still used to build the test dependencies (the new
`out-test-integration-artifacts` Dockerfile stage), which are then
installed on the host with hack/provisioning/linux/test-integration-env.sh,
and the tests now run with `go test` via hack/test-integration.sh:

- job-test-in-container.yml is removed; its matrix is migrated to
  job-test-in-host.yml
- job-test-in-lima.yml no longer nests Docker inside the guest VM
  (issue 3858); the artifacts are built on the host and installed
  in the guest, which is started in plain mode
- the test-integration* Dockerfile stages are removed, along with the
  unused build-minimal stage (the demo stage is retained)
- hack/test-integration-rootless.sh (moved from Dockerfile.d) runs the
  tests inside the systemd user session of the unprivileged user, which
  is available without a login, as the provisioning script enables
  lingering for that user

Assisted-by: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
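
As a rough illustration of the last bullet, a process started outside a login can reach a lingering user's systemd session through that user's runtime directory. The sketch below is an assumption about how this could be wired, not the content of hack/test-integration-rootless.sh: `user_session_env` is a hypothetical helper and `rootless` is a hypothetical user name.

```sh
#!/bin/sh
# Hypothetical helper: print the environment assignments that point a
# process at the systemd user session of the given numeric UID.
user_session_env() {
	uid="$1"
	printf 'XDG_RUNTIME_DIR=/run/user/%s\n' "$uid"
	printf 'DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/%s/bus\n' "$uid"
}

# Example (not executed here):
#   loginctl enable-linger rootless
#   sudo -u rootless env $(user_session_env "$(id -u rootless)") \
#     systemctl --user is-system-running
```

With lingering enabled, the user manager and /run/user/UID exist from boot, so no interactive login is needed before the tests start.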
The command passed to `docker run` is executed by the generated
docker-entrypoint.service, which had no ordering dependency on the
containerd, buildkit, and stargz-snapshotter units, so
`docker run -t --rm --privileged ghcr.io/containerd/nerdctl nerdctl run ...`
was racy: it failed with "cannot access containerd socket" whenever
the command won the race against containerd. Add the ordering through
a unit drop-in.

Assisted-by: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
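
A unit drop-in of the kind described above might be generated like this. It is a sketch only: `write_ordering_dropin` and the file name `10-ordering.conf` are hypothetical, and the three `.service` names are assumed from the unit names mentioned in the commit message.

```sh
#!/bin/sh
# Hypothetical helper: write a drop-in that orders docker-entrypoint.service
# after the daemons it talks to.
# $1: systemd unit directory (for example /etc/systemd/system)
write_ordering_dropin() {
	dropin_dir="$1/docker-entrypoint.service.d"
	mkdir -p "$dropin_dir"
	# After= only affects start ordering; it does not pull the units in.
	cat >"$dropin_dir/10-ordering.conf" <<'EOF'
[Unit]
After=containerd.service buildkit.service stargz-snapshotter.service
EOF
}

# Example (not executed here):
#   write_ordering_dropin /etc/systemd/system
```

A drop-in leaves the generated docker-entrypoint.service itself untouched, so the ordering survives regeneration of that unit.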
Build the image for the host platform, load it into Docker, and check
that `nerdctl run` works inside it, before the multi-platform image is
built and (except on pull requests) published to ghcr.io.

Assisted-by: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
@AkihiroSuda AkihiroSuda marked this pull request as ready for review July 4, 2026 19:21
@AkihiroSuda

Member Author

cc @apostasie

@AkihiroSuda AkihiroSuda requested a review from a team July 4, 2026 19:22

Labels

area/ci e.g., CI failure

Development

Successfully merging this pull request may close these issues.

[CI]: Discussion: should we drop testing inside a container?
