rootlessutil: remove dead -r/ from nsenter args#4837
rootlessutil: remove dead -r/ from nsenter args#4837MayCXC wants to merge 1 commit intocontainerd:mainfrom
Conversation
utafrali
left a comment
There was a problem hiding this comment.
The fix is correct: -r/ was sitting at argv[0] and was consumed by nsenter as its own program name, never parsed as a flag, so its removal is a no-op today and a safety improvement if argv ordering is ever corrected. Setting args[0] = arg0 now properly follows Unix convention. The only minor gap is the loss of the busybox nsenter compatibility comment without any replacement explanation.
| args := []string{ | ||
| "-r/", // root dir (busybox nsenter wants this to be explicitly specified), | ||
| } | ||
| args := []string{arg0} |
There was a problem hiding this comment.
The removed comment mentioned busybox nsenter compatibility with -r/. Now that the flag is intentionally absent, a short note here explaining why it is omitted would help future readers avoid re-adding it. For example:
// Note: -r/ (root dir) is intentionally omitted. In rootless mode, chrooting to
// the host root before setns would anchor the process to host paths that are
// inaccessible inside the user namespace, breaking overlay mounts.
args := []string{arg0}There was a problem hiding this comment.
Added in fc77158 — comment explains why -r/ is intentionally absent:
// -r/ (root dir) is intentionally omitted. nsenter would open the host
// root fd before setns, then chroot to it after entering the mount
// namespace, anchoring the process to host paths. In rootless mode,
// host dirs owned by real uid 0 (e.g. /var/lib/containerd) are
// inaccessible inside the user namespace and overlay mounts would
// fail with EACCES.
args := []string{arg0}Used "opens root fd before setns, chroots to it after" to match what nsenter actually does (the fd is captured in the host mount namespace, then the chroot happens after setns(CLONE_NEWNS) has entered the rootless mount namespace, which is why the anchor is wrong in rootless mode). Mentioned EACCES directly since that's the failure mode a future reader would see in strace if they re-added the flag.
The -r/ flag was placed at args[0], which becomes argv[0] (the program name) when passed to syscall.Exec. nsenter never parsed it as a flag, so it has been a no-op since it was added. If -r/ were moved to a proper argv position, it would break rootless container creation. nsenter opens the root fd before setns, so chroot anchors path resolution to the host root. In rootless mode, the host /var/lib/containerd is owned by real root (unmapped in the user namespace), causing overlay lowerdir resolution to fail with EACCES during WithAdditionalGIDs. Remove -r/ entirely rather than fixing its position. Signed-off-by: Aaron <aaron@omniband.ca>
f7fc48a to
fc77158
Compare
Summary
The
-r/flag inParentMain's nsenter args has been a no-op since it was added. It was placed atargs[0], which becomesargv[0](the program name) when passed tosyscall.Exec(arg0, args, env). nsenter consumes it as its own name and never parses it as a flag.This is harmless today, but if someone corrects the argv ordering (e.g. prepending
arg0to the slice),-r/would start working and break rootless container creation:/) beforesetnsfchdir(root_fd)+chroot(\".\")anchors the process to the host root/var/lib/containerdis owned by real root (uid 0), which is unmapped in the user namespace (appears asnobody/65534)EACCESbecause the process cannot traverse the 0700 host directoryThis also fixes
argv[0]to bearg0(the nsenter binary path), matching the standard convention.Verification
Strace comparison before and after, running
nerdctl createin rootless mode:Before (current code,
-r/accidentally in argv[0]):If -r/ were at argv[1] (the latent bug):
After (this PR,
-r/removed,arg0as argv[0]):