seccomp: cap dirent/BPF size, seal injected memfds (#13) #25

Open: congwang-mk wants to merge 1 commit into main from sec-13-hardening
@congwang-mk (Contributor)

Summary

Three defensive hardenings cherry-picked from the audit patch attached to #13.

  • build_dirent64 now rejects names longer than 255 bytes (NAME_MAX). Without a bound, a sufficiently long name would overflow d_reclen (u16) and produce a malformed record. It now returns Option<Vec<u8>>; callers in procfs and cow/dispatch propagate the rejection.
  • assemble_filter now returns Result and rejects programs over BPF_MAXINSNS (4096). The kernel rejects them via seccomp(2) anyway; surfacing the error here gives a clearer message and guards the u8 jump-offset arithmetic against silent truncation.
  • Injected memfds (synthesised /proc/* and /dev/{u,}random) are now created with MFD_ALLOW_SEALING | MFD_CLOEXEC and sealed (F_SEAL_SEAL | F_SEAL_WRITE | F_SEAL_GROW | F_SEAL_SHRINK) after population. The child receives an fd to the same RW description, so without seals it could overwrite the synthesised content.
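
As a rough illustration of the first item, a bounded dirent builder could look like the sketch below. It is not the code in this PR: the parameter list, the constant names, and the native-endian serialisation are assumptions made for the example; only the function name, the 255-byte limit, and the Option<Vec<u8>> return type come from the description above.

```rust
/// Linux NAME_MAX: longest file name component, excluding the NUL terminator.
const NAME_MAX: usize = 255;
/// Offset of d_name in linux_dirent64:
/// d_ino (8) + d_off (8) + d_reclen (2) + d_type (1).
const DIRENT64_HEADER: usize = 19;

/// Hypothetical sketch of a bounded linux_dirent64 builder.
/// The real signature in sandlock-core may differ.
fn build_dirent64(ino: u64, off: i64, d_type: u8, name: &[u8]) -> Option<Vec<u8>> {
    // Reject oversize names up front so callers can propagate the failure.
    if name.len() > NAME_MAX {
        return None;
    }
    // Header + name + NUL terminator, rounded up to an 8-byte boundary.
    let reclen = (DIRENT64_HEADER + name.len() + 1 + 7) & !7;
    // With the name bounded, reclen is at most 280 and always fits in u16.
    debug_assert!(reclen <= u16::MAX as usize);
    let reclen16 = reclen as u16;

    let mut buf = Vec::with_capacity(reclen);
    buf.extend_from_slice(&ino.to_ne_bytes());
    buf.extend_from_slice(&off.to_ne_bytes());
    buf.extend_from_slice(&reclen16.to_ne_bytes());
    buf.push(d_type);
    buf.extend_from_slice(name);
    // Zero-fill the NUL terminator and the alignment padding.
    buf.resize(reclen, 0);
    Some(buf)
}
```

With this shape, a caller building a directory listing can skip or fail on a `None` instead of emitting a record whose length field does not match its contents.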

What is not in this PR

The bundled patch in #13 also proposed changes I declined:

  • Atomic conversions of proc_count / mem_used / disk_used: those fields already live inside Arc<Mutex<…>> and notifications serialise through one supervisor task, so the claimed "races" can't actually occur. The proposed atomic version is a separate load → check → store sequence rather than a single atomic update, so it would be less defensive than the current code if the mutex were later removed.
  • Adding fork(2) (NR 57) to notif_syscalls — this would break the COW-fork escape hatch deliberately documented in crates/sandlock-core/src/fork.rs:1-43. The hardcoded 57u32 in the proposal is also non-portable.
  • AF_UNIX → EAFNOSUPPORT in connect_on_behalf: breaks socketpair, $XDG_RUNTIME_DIR sockets, etc.; also doesn't actually close the abstract-socket bypass (which goes through socket(), not connect()). A path-based filter on the sun_path would be the correct fix.
  • DispatchTable default-deny — fragile; the BPF filter is the gatekeeper, and forcing EPERM on missing handlers makes it easy to silently break a new notif syscall before its handler is wired.
  • Default-deny IPC syscalls — behavior change that should be policy-driven.
  • getrandom fallback removal — only fires when write_child_mem fails (rare); the rationale ("breaks determinism") doesn't hold in the failure path.
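
To make the first declined item concrete, the sketch below shows why a check-then-increment done under one lock cannot overshoot a limit. It is illustrative only: the Usage struct, try_reserve_proc, and admitted_under_contention are hypothetical names, and it uses std::sync::Mutex with threads rather than the supervisor's tokio Mutex and single task.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for the supervisor's resource counters.
struct Usage {
    proc_count: u32,
}

/// Check and increment under a single lock acquisition, so no other
/// caller can slip in between the comparison and the update.
fn try_reserve_proc(usage: &Mutex<Usage>, limit: u32) -> bool {
    let mut u = usage.lock().unwrap();
    if u.proc_count >= limit {
        return false;
    }
    u.proc_count += 1;
    true
}

/// Hammer the counter from several threads and return how many
/// reservations were admitted in total.
fn admitted_under_contention(limit: u32, threads: usize, attempts: usize) -> u32 {
    let usage = Arc::new(Mutex::new(Usage { proc_count: 0 }));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let usage = Arc::clone(&usage);
            thread::spawn(move || {
                (0..attempts)
                    .filter(|_| try_reserve_proc(&usage, limit))
                    .count() as u32
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

A plain atomic load followed by a separate store gives no such guarantee on its own: two callers can both observe a value below the limit and both store an incremented value. An atomic rewrite would need a single read-modify-write such as fetch_update to match the behaviour above.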

Test plan

  • cargo build clean
  • cargo test -p sandlock-core --lib — 219 passed
  • cargo test -p sandlock-core -- --test-threads=2 — 183 passed (parallel runs occasionally hit "Cannot fork" / temp-file flakes due to local resource pressure, not this diff)
  • New unit tests added: test_build_dirent64_rejects_oversize_name, test_oversized_filter_is_rejected

Three defensive hardenings from the audit in #13:

* `build_dirent64` now rejects names longer than 255 bytes (NAME_MAX).
  Without a bound, a long enough name would push `d_reclen` past u16
  and produce a malformed record. Callers in procfs and cowfs propagate
  the rejection.

* `assemble_filter` now returns `Result` and rejects programs that
  exceed BPF_MAXINSNS (4096). The kernel rejects them anyway via
  `seccomp(2)`; surfacing the error here gives a clearer message and
  guards the u8 jump-offset arithmetic against silent truncation.

* The memfds injected for synthesised /proc files and /dev/{u,}random
  are created with MFD_ALLOW_SEALING | MFD_CLOEXEC and sealed against
  write/grow/shrink/seal after population. The child receives an fd to
  the same RW description, so without sealing it could overwrite the
  synthesised content.
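
For the BPF size cap and the u8 jump offsets mentioned above, a minimal sketch of the two checks might look as follows. The SockFilter struct mirrors the kernel's sock_filter layout, but the FilterError type, the jump_offset helper, and the exact assemble_filter signature are assumptions for illustration, not the code in this commit.

```rust
/// Kernel limit on the number of instructions in a classic BPF program.
const BPF_MAXINSNS: usize = 4096;

/// Mirrors the kernel's `struct sock_filter`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct SockFilter {
    code: u16,
    jt: u8,
    jf: u8,
    k: u32,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum FilterError {
    TooLong { len: usize },
    BackwardJump,
    JumpTooFar { distance: usize },
}

/// Reject oversize programs before they reach seccomp(2).
fn assemble_filter(insns: Vec<SockFilter>) -> Result<Vec<SockFilter>, FilterError> {
    if insns.len() > BPF_MAXINSNS {
        return Err(FilterError::TooLong { len: insns.len() });
    }
    Ok(insns)
}

/// Compute a conditional-jump offset from instruction `from` to `to`.
/// Offsets are relative to the instruction after the jump and must fit
/// in the u8 jt/jf fields, so fail instead of truncating.
fn jump_offset(from: usize, to: usize) -> Result<u8, FilterError> {
    let distance = to.checked_sub(from + 1).ok_or(FilterError::BackwardJump)?;
    if distance > u8::MAX as usize {
        return Err(FilterError::JumpTooFar { distance });
    }
    Ok(distance as u8)
}
```

Returning an error from both checks keeps a bad program from being silently mis-assembled: an `as u8` cast on an unchecked distance would wrap and send the jump to the wrong instruction.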

Other fixes proposed in the patch attached to #13 are not applied here:
they either misdiagnose serialised access through the supervisor's
tokio Mutex as a race, break intentional behaviour (raw fork(57) is
the COW-fork escape hatch documented in fork.rs; AF_UNIX is needed for
socketpair and runtime sockets), or change defaults in ways that need
to be policy-driven rather than unconditional.

Signed-off-by: Cong Wang <cwang@multikernel.io>