Commit ae7e901

fix(sandbox): strip " (deleted)" suffix from unlinked /proc/<pid>/exe paths (#844)
* fix(sandbox): strip " (deleted)" suffix from unlinked /proc/<pid>/exe paths

When a running binary is unlinked from its filesystem path (the common case is a `docker cp` hot-swap of `/opt/openshell/bin/openshell-sandbox` during the `cluster-deploy-fast` dev upgrade workflow), the Linux kernel appends the literal string ` (deleted)` to the `/proc/<pid>/exe` readlink target. The tainted `PathBuf` then flows into `collect_ancestor_binaries` and on into `BinaryIdentityCache::verify_or_cache`, which tries to `std::fs::metadata` the path. `stat()` fails with `ENOENT` because the literal suffix isn't a real filesystem path, and the CONNECT proxy denies every outbound request with:

    ancestor integrity check failed for \
    /opt/openshell/bin/openshell-sandbox (deleted): \
    Failed to stat ...: No such file or directory (os error 2)

Reproduced in production 2026-04-15: a cluster-deploy-fast-style hot-swap of the supervisor binary caused every pod whose PID 1 held the now-deleted inode to deny ALL outbound CONNECTs (slack.com, registry.npmjs.org, 169.254.169.254, etc.), breaking Slack REST delivery, npm installs, and IMDS probes simultaneously. Existing pre-hot-swap TCP tunnels (e.g. Slack Socket Mode WSS) kept working because they never re-evaluate the proxy.

Strip the suffix in `binary_path()` so downstream callers see the clean, grep-friendly path. This aligns the cache key and log messages with the original on-disk location.

Note: stripping the suffix does NOT by itself make the identity cache tolerant of a legitimate binary replacement. `verify_or_cache` will now `stat` and hash whatever currently lives at the stripped path, which is the NEW binary, and surface a clearer `Binary integrity violation` error.

Fully unblocking the cluster-deploy-fast hot-swap workflow needs a follow-up that either (a) reads running-binary content from `/proc/<pid>/exe` directly via `File::open` (procfs resolves this to the live in-memory executable even when the original inode has been unlinked), or (b) keys the identity cache by exec dev+inode instead of path. Happy to send that as a separate PR once the approach is decided. Filing this narrow fix first because it stands on its own: it fixes a concrete misleading error and unblocks the obvious next step.

Added `binary_path_strips_deleted_suffix` test that copies `/bin/sleep` to a temp path, spawns a child from it, unlinks the temp binary, verifies the raw readlink contains the ` (deleted)` suffix, then asserts the public API returns the stripped path.

Signed-off-by: mjamiv <michael.commack@gmail.com>

* fix(sandbox): narrow the " (deleted)" suffix strip and exercise it end-to-end

Address review feedback from @johntmyers on the initial version of this PR:

procfs::binary_path
- Only strip the kernel's " (deleted)" suffix when stat() on the raw readlink target reports NotFound. A live executable whose basename literally ends with " (deleted)" is now returned unchanged instead of being silently truncated, which matters because identity.rs hashes whatever this function returns as a trust anchor.
- Operate on raw bytes via OsStrExt, so filenames that are not valid UTF-8 still get exactly one suffix stripped. The previous strip_suffix-on-&str path skipped non-UTF-8 entirely and fell through to returning the tainted path.
- Expand the doc comment to describe both guardrails.

procfs tests
- binary_path_preserves_live_deleted_basename: copy /bin/sleep to a live file literally named "sleepy (deleted)", spawn it, and assert that the returned path still ends with " (deleted)".
- binary_path_strips_suffix_for_non_utf8_filename: exec a binary whose basename contains a 0xFF byte, unlink it, and assert that binary_path returns the stripped non-UTF-8 path. Writes the bytes with OpenOptions + sync_all + explicit drop so the write fd is fully released before exec() to avoid ETXTBSY under concurrent tests.

proxy: extract resolve_process_identity helper
- Pull the peer-resolution + TOFU verify + ancestor walk + cmdline collection block out of evaluate_opa_tcp into resolve_process_identity.
- Introduce IdentityError which carries the deny reason along with whatever partial identity data was resolved before the failure so evaluate_opa_tcp can thread that into ConnectDecision unchanged.
- evaluate_opa_tcp now calls the helper and proceeds straight to the OPA evaluate step; the surface visible to OPA and OCSF is unchanged.

proxy: end-to-end regression test for the hot-swap contract
- resolve_process_identity_surfaces_binary_integrity_violation_on_hot_swap stands up a real TcpListener, copies /bin/bash to a temp path, primes BinaryIdentityCache with that binary, spawns bash with a /dev/tcp one-liner that opens a real connection to the listener, and captures the peer's ephemeral port from accept().
- Simulates `docker cp` correctly: unlink the running binary (which persists via the child's exec mapping) and create a fresh file with different bytes at the same path. Writing in place is rejected by the kernel with ETXTBSY, so the old single-inode approach did not actually model the production failure mode.
- Asserts the error returned by resolve_process_identity contains "Binary integrity violation" (from BinaryIdentityCache) and does NOT contain "Failed to stat" or "(deleted)" (the pre-PR-#844 failure mode). The binary field on the error is populated and is free of the tainted suffix.
- Skips cleanly if /bin/bash is not installed. Child process is always reaped before the assertion block so a failure does not leak a sleeping process.

---------

Signed-off-by: mjamiv <michael.commack@gmail.com>
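The commit message describes simulating `docker cp` as "unlink, then create a fresh file at the same path". The sketch below illustrates that sequence on an ordinary file. `hot_swap` is a hypothetical helper name, not code from this commit, and it assumes a Unix filesystem.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

/// Hypothetical helper (not from the commit): replace the file at `path`
/// by unlinking it and creating a new file with `new_bytes` at the same
/// path. Anything that still holds the old inode (an open fd here, an
/// exec mapping in the real scenario) keeps seeing the old content.
fn hot_swap(path: &Path, new_bytes: &[u8]) -> io::Result<()> {
    // Unlink first: the directory entry goes away, the old inode lives on
    // for as long as something references it.
    fs::remove_file(path)?;
    // Create a brand-new file (and therefore a new inode) at the same path.
    fs::write(path, new_bytes)?;
    // Mark it executable, as a replacement binary would be.
    fs::set_permissions(path, fs::Permissions::from_mode(0o755))?;
    Ok(())
}
```

After `hot_swap`, a path-based `stat` or hash sees the new file, which is why the commit expects a `Binary integrity violation` rather than a successful verification.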
1 parent 5c3015a commit ae7e901
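The two guardrails described in the commit message (strip only when the raw readlink target is missing, and compare raw bytes) can be separated from the filesystem calls. The following is a minimal sketch under that assumption. `strip_kernel_deleted_suffix` and its `raw_target_missing` parameter are hypothetical names used for illustration, and the commit itself performs the `stat()` inline in `binary_path`.

```rust
use std::ffi::OsString;
use std::os::unix::ffi::{OsStrExt, OsStringExt};
use std::path::{Path, PathBuf};

const DELETED_SUFFIX: &[u8] = b" (deleted)";

/// Hypothetical pure helper (not from the commit): given the raw readlink
/// target and whether `stat()` on it reported NotFound, return the path
/// with at most one trailing " (deleted)" removed.
fn strip_kernel_deleted_suffix(raw: &Path, raw_target_missing: bool) -> PathBuf {
    // Work on bytes so non-UTF-8 filenames are handled like any other.
    let bytes = raw.as_os_str().as_bytes();
    if raw_target_missing && bytes.ends_with(DELETED_SUFFIX) {
        let stripped = bytes[..bytes.len() - DELETED_SUFFIX.len()].to_vec();
        return PathBuf::from(OsString::from_vec(stripped));
    }
    // Live file, or no suffix: return the target unchanged.
    raw.to_path_buf()
}
```

Passing the `stat()` result in as a boolean keeps the decision logic testable without spawning and unlinking a real process, at the cost of trusting the caller to compute it correctly.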

2 files changed: +479 -47 lines changed

crates/openshell-sandbox/src/procfs.rs

Lines changed: 211 additions & 4 deletions
@@ -19,17 +19,63 @@ use tracing::debug;
 /// `/proc/{pid}/cmdline` because `argv[0]` is trivially spoofable by any
 /// process and must not be used as a trusted identity source.
 ///
-/// If this fails, ensure the proxy process has permission to read
-/// `/proc/<pid>/exe` (e.g. same user, or `CAP_SYS_PTRACE`).
+/// ### Unlinked binaries (`(deleted)` suffix)
+///
+/// When a running binary is unlinked from its filesystem path — the common
+/// case is a `docker cp` hot-swap of `/opt/openshell/bin/openshell-sandbox`
+/// during a `cluster-deploy-fast` dev upgrade — the kernel appends the
+/// literal string `" (deleted)"` to the `/proc/<pid>/exe` readlink target.
+/// The raw tainted path (e.g. `"/opt/openshell/bin/openshell-sandbox (deleted)"`)
+/// is not a real filesystem path: any downstream `stat()` fails with `ENOENT`.
+///
+/// We strip the suffix so callers see a clean, grep-friendly path suitable
+/// for cache keys and log messages. The strip is guarded: we only strip when
+/// `stat()` on the raw readlink target reports `NotFound`, so a live executable
+/// whose basename literally ends with `" (deleted)"` is returned unchanged.
+/// The comparison is done on raw bytes via `OsStrExt`, so filenames that are
+/// not valid UTF-8 are still handled correctly. Exactly one kernel-added
+/// suffix is stripped.
+///
+/// This does NOT claim the file at the stripped path is the same binary that
+/// the process is executing — the on-disk inode may now be arbitrary. Callers
+/// that need to verify the running binary's *contents* (for integrity
+/// checking) should read the magic `/proc/<pid>/exe` symlink directly via
+/// `File::open`, which procfs resolves to the live in-memory executable even
+/// when the original inode has been unlinked.
+///
+/// If the readlink itself fails, ensure the proxy process has permission
+/// to read `/proc/<pid>/exe` (e.g. same user, or `CAP_SYS_PTRACE`).
 #[cfg(target_os = "linux")]
 pub fn binary_path(pid: i32) -> Result<PathBuf> {
-    std::fs::read_link(format!("/proc/{pid}/exe")).map_err(|e| {
+    use std::ffi::OsString;
+    use std::io::ErrorKind;
+    use std::os::unix::ffi::{OsStrExt, OsStringExt};
+
+    const DELETED_SUFFIX: &[u8] = b" (deleted)";
+
+    let link = format!("/proc/{pid}/exe");
+    let target = std::fs::read_link(&link).map_err(|e| {
         miette::miette!(
             "Failed to read /proc/{pid}/exe: {e}. \
             Cannot determine binary identity — denying request. \
             Hint: the proxy may need CAP_SYS_PTRACE or to run as the same user."
         )
-    })
+    })?;
+
+    // Only strip when the raw readlink target cannot be stat'd and its bytes
+    // end with the kernel-added suffix. This preserves live executables whose
+    // basename legitimately ends with " (deleted)" and handles non-UTF-8
+    // filenames correctly.
+    let raw_target_missing =
+        matches!(std::fs::metadata(&target), Err(err) if err.kind() == ErrorKind::NotFound);
+
+    let bytes = target.as_os_str().as_bytes();
+    if raw_target_missing && bytes.ends_with(DELETED_SUFFIX) {
+        let stripped = bytes[..bytes.len() - DELETED_SUFFIX.len()].to_vec();
+        return Ok(PathBuf::from(OsString::from_vec(stripped)));
+    }
+
+    Ok(target)
 }
 
 /// Resolve the binary path of the TCP peer inside a sandbox network namespace.
@@ -391,6 +437,167 @@ mod tests {
         assert!(path.exists());
     }
 
+    /// Verify that an unlinked binary's path is returned without the
+    /// kernel's " (deleted)" suffix. This is the common case during a
+    /// `docker cp` hot-swap of the supervisor binary — before this strip,
+    /// callers that `stat()` the returned path get `ENOENT` and the
+    /// ancestor integrity check in the CONNECT proxy denies every request.
+    #[cfg(target_os = "linux")]
+    #[test]
+    fn binary_path_strips_deleted_suffix() {
+        use std::os::unix::fs::PermissionsExt;
+
+        // Copy /bin/sleep to a temp path we control so we can unlink it.
+        let tmp = tempfile::TempDir::new().unwrap();
+        let exe_path = tmp.path().join("deleted-sleep");
+        std::fs::copy("/bin/sleep", &exe_path).unwrap();
+        std::fs::set_permissions(&exe_path, std::fs::Permissions::from_mode(0o755)).unwrap();
+
+        // Spawn a child from the temp binary, then unlink it while the
+        // child is still running. The child keeps the exec mapping via
+        // `/proc/<pid>/exe`, but readlink will now return the tainted
+        // "<path> (deleted)" string.
+        let mut child = std::process::Command::new(&exe_path)
+            .arg("5")
+            .spawn()
+            .unwrap();
+        let pid: i32 = child.id().cast_signed();
+        std::fs::remove_file(&exe_path).unwrap();
+
+        // Sanity check: the raw readlink should contain " (deleted)".
+        let raw = std::fs::read_link(format!("/proc/{pid}/exe"))
+            .unwrap()
+            .to_string_lossy()
+            .into_owned();
+        assert!(
+            raw.ends_with(" (deleted)"),
+            "kernel should append ' (deleted)' to unlinked exe readlink; got {raw:?}"
+        );
+
+        // The public API should return the stripped path, not the tainted one.
+        let resolved = binary_path(pid).expect("binary_path should succeed for deleted binary");
+        assert_eq!(
+            resolved, exe_path,
+            "binary_path should strip the ' (deleted)' suffix"
+        );
+        let resolved_str = resolved.to_string_lossy();
+        assert!(
+            !resolved_str.contains("(deleted)"),
+            "stripped path must not contain '(deleted)'; got {resolved_str:?}"
+        );
+
+        let _ = child.kill();
+        let _ = child.wait();
+    }
+
+    /// A live executable whose basename literally ends with `" (deleted)"`
+    /// must be returned unchanged — we only strip when `stat()` reports
+    /// the raw readlink target missing. This guards against the trusted
+    /// identity source misattributing a running binary to a truncated
+    /// sibling path.
+    #[cfg(target_os = "linux")]
+    #[test]
+    fn binary_path_preserves_live_deleted_basename() {
+        use std::os::unix::fs::PermissionsExt;
+
+        let tmp = tempfile::TempDir::new().unwrap();
+        // Basename literally ends with " (deleted)" while the file is still
+        // on disk — a pathological but legal filename.
+        let exe_path = tmp.path().join("sleepy (deleted)");
+        std::fs::copy("/bin/sleep", &exe_path).unwrap();
+        std::fs::set_permissions(&exe_path, std::fs::Permissions::from_mode(0o755)).unwrap();
+
+        let mut child = std::process::Command::new(&exe_path)
+            .arg("5")
+            .spawn()
+            .unwrap();
+        let pid: i32 = child.id().cast_signed();
+
+        // File is still linked — binary_path must return the path unchanged,
+        // suffix and all.
+        let resolved = binary_path(pid).expect("binary_path should succeed for live binary");
+        assert_eq!(
+            resolved, exe_path,
+            "binary_path must NOT strip ' (deleted)' from a live executable's basename"
+        );
+        assert!(
+            resolved.to_string_lossy().ends_with(" (deleted)"),
+            "stripped path unexpectedly trimmed a real filename: {resolved:?}"
+        );
+
+        let _ = child.kill();
+        let _ = child.wait();
+    }
+
+    /// An unlinked executable whose filename contains non-UTF-8 bytes must
+    /// still strip exactly one kernel-added `" (deleted)"` suffix. We operate
+    /// on raw bytes via `OsStrExt`, so invalid UTF-8 is not a reason to skip
+    /// the strip and return a path that downstream `stat()` calls will reject.
+    #[cfg(target_os = "linux")]
+    #[test]
+    fn binary_path_strips_suffix_for_non_utf8_filename() {
+        use std::ffi::OsString;
+        use std::io::Write;
+        use std::os::unix::ffi::{OsStrExt, OsStringExt};
+        use std::os::unix::fs::{OpenOptionsExt, PermissionsExt};
+
+        let tmp = tempfile::TempDir::new().unwrap();
+        // 0xFF is not valid UTF-8. Build the filename on raw bytes.
+        let mut raw_name: Vec<u8> = b"badname-".to_vec();
+        raw_name.push(0xFF);
+        raw_name.extend_from_slice(b".bin");
+        let exe_path = tmp.path().join(OsString::from_vec(raw_name));
+
+        // Write bytes explicitly (instead of `std::fs::copy`) with an
+        // explicit `sync_all()` + scope drop so the write fd is fully closed
+        // before we `exec()` the file. Otherwise concurrent tests can race
+        // the kernel into returning ETXTBSY on spawn.
+        let bytes = std::fs::read("/bin/sleep").expect("read /bin/sleep");
+        {
+            let mut f = std::fs::OpenOptions::new()
+                .write(true)
+                .create_new(true)
+                .mode(0o755)
+                .open(&exe_path)
+                .expect("create non-UTF-8 target file");
+            f.write_all(&bytes).expect("write bytes");
+            f.sync_all().expect("sync_all before exec");
+        }
+
+        let mut child = std::process::Command::new(&exe_path)
+            .arg("5")
+            .spawn()
+            .unwrap();
+        let pid: i32 = child.id().cast_signed();
+        std::fs::remove_file(&exe_path).unwrap();
+
+        // Sanity: raw readlink ends with " (deleted)" and is not valid UTF-8.
+        let raw = std::fs::read_link(format!("/proc/{pid}/exe")).unwrap();
+        let raw_bytes = raw.as_os_str().as_bytes();
+        assert!(
+            raw_bytes.ends_with(b" (deleted)"),
+            "kernel should append ' (deleted)' to unlinked exe readlink"
+        );
+        assert!(
+            std::str::from_utf8(raw_bytes).is_err(),
+            "test precondition: raw readlink must contain non-UTF-8 bytes"
+        );
+
+        let resolved =
+            binary_path(pid).expect("binary_path should succeed for non-UTF-8 unlinked path");
+        assert_eq!(
+            resolved, exe_path,
+            "binary_path must strip exactly one ' (deleted)' suffix for non-UTF-8 paths"
+        );
+        assert!(
+            !resolved.as_os_str().as_bytes().ends_with(b" (deleted)"),
+            "stripped path must not end with ' (deleted)'"
+        );
+
+        let _ = child.kill();
+        let _ = child.wait();
+    }
+
     #[cfg(target_os = "linux")]
     #[test]
     fn collect_descendants_includes_self() {
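The doc comment on `binary_path` points callers that need the running binary's contents at `/proc/<pid>/exe` itself, and the commit message floats keying the identity cache by device and inode. Neither follow-up is part of this diff. The sketch below only illustrates what those two reads might look like on Linux. `ExeInode`, `exe_inode`, and `read_running_exe` are hypothetical names, and a real integrity check would feed the bytes to a cryptographic hash rather than return them.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::os::unix::fs::MetadataExt;

/// Hypothetical cache key (not from the commit): device and inode of the
/// file a process is executing, independent of its current path.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ExeInode {
    dev: u64,
    ino: u64,
}

/// Linux-only sketch: `metadata` follows the magic `/proc/<pid>/exe` link,
/// so the result describes the executing file even after it is unlinked.
fn exe_inode(pid: u32) -> io::Result<ExeInode> {
    let meta = std::fs::metadata(format!("/proc/{pid}/exe"))?;
    Ok(ExeInode { dev: meta.dev(), ino: meta.ino() })
}

/// Linux-only sketch: read the bytes of the binary the process is actually
/// running by opening the magic link rather than the on-disk path.
fn read_running_exe(pid: u32) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    File::open(format!("/proc/{pid}/exe"))?.read_to_end(&mut buf)?;
    Ok(buf)
}
```

Both calls need the same permissions as the readlink in `binary_path` (same user, or `CAP_SYS_PTRACE`). Which of the two approaches fits the identity cache is left open by the commit message.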
