feat: use node-local overlayfs rootfs cache to eliminate per-restore untar (#228) #283

Open
chenggui53 wants to merge 1 commit into agent-substrate:main from chenggui53:worktree-overlayfs-rootfs-cache

@chenggui53

Summary

Implements #228: cache one extracted, read-only rootfs per immutable image digest on each node, and materialize each actor's bundle as a thin overlayfs mount instead of re-untarring the whole image on every restore.

Problem

Every Restore call in atelet fully reconstructs the rootfs by pulling and untarring the OCI image — even when the same image digest has already been extracted on the same node many times before. Observed cost:

  • prepareOCIDirectory (untar rootfs): ~15–20s
  • runsc restore (checkpoint restore): ~268ms

Rootfs extraction accounts for roughly 98–99% of resume latency.

Solution

Change the scaling behavior from "every restore pays extraction cost" to "first restore of a digest on a node pays extraction; later restores pay only an overlay mount."

On first use of an image digest on a node:

  • Pull + extract the flattened rootfs once into a node-local, read-only cache directory keyed by the image digest: /var/lib/ateom-gvisor/rootfs-cache/<sha256>/lower/

On every restore using the same digest:

  • Instead of RemoveAll + untar, set up an overlayfs mount for the actor bundle's rootfs:
    • lowerdir = the cached, read-only extracted rootfs (shared, never mutated)
    • upperdir + workdir = per-actor, actor-private writable layers
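
The mount described above can be sketched with the standard library's Linux `syscall.Mount`. This is an illustration under stated assumptions, not the contents of `overlay.go`: the function names and example paths are hypothetical, the mount call needs root (CAP_SYS_ADMIN), and overlayfs expects `upperdir` and `workdir` to sit on the same filesystem.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// overlayOptions builds the data string passed to the overlayfs mount.
// Paths containing ',' or ':' would need escaping; that is not handled here.
func overlayOptions(lower, upper, work string) string {
	return fmt.Sprintf("lowerdir=%s,upperdir=%s,workdir=%s", lower, upper, work)
}

// mountOverlay (hypothetical) creates the per-actor directories and mounts
// the overlay on target. Linux only; requires CAP_SYS_ADMIN.
func mountOverlay(lower, upper, work, target string) error {
	for _, dir := range []string{upper, work, target} {
		if err := os.MkdirAll(dir, 0o755); err != nil {
			return err
		}
	}
	return syscall.Mount("overlay", target, "overlay", 0, overlayOptions(lower, upper, work))
}

// unmountOverlay (hypothetical) lazily detaches the mount so the actor
// directory can be removed afterwards.
func unmountOverlay(target string) error {
	return syscall.Unmount(target, syscall.MNT_DETACH)
}

func main() {
	// Example paths are placeholders; only the option string is printed,
	// so this runs without privileges.
	fmt.Println(overlayOptions("/cache/lower", "/actor/upper", "/actor/work"))
}
```

Because the lower directory is only ever read through the overlay, all actor writes land in the private upper directory and the cached rootfs can be shared by any number of actors.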

Changes

| File | Operation | Description |
| --- | --- | --- |
| internal/ateompath/ateompath.go | Modified | Added RootfsCacheDir and RootfsCacheLowerDir() |
| cmd/atelet/internal/rootfscache/rootfscache.go | New | Core cache module: EnsureRootfs, Untar, ValidateTarName, LRU eviction, concurrent dedup |
| cmd/atelet/internal/rootfscache/rootfscache_test.go | New | Unit tests: cache miss/hit, concurrent safety, partial cleanup, eviction, digest validation |
| cmd/atelet/overlay.go | New | overlayfs mount/unmount helpers + isOverlayfsAvailable() |
| cmd/atelet/oci.go | Modified | prepareOCIDirectory integrates overlayfs path with untar fallback; extractDigestFromRef; unmountActorRootfs |
| cmd/atelet/main.go | Modified | Creates rootfs cache, wires into AteomHerder, resetActorDirs adds unmount before cleanup |
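
Since the cached lower directory is shared by every actor on the node, entry-name validation during untar matters more than before. The following is a rough sketch of a lexical path-traversal check; `safeJoin` is a hypothetical stand-in and not the PR's `ValidateTarName`, and rejecting absolute names outright is an assumption. It does not cover symlink or hardlink escapes, which need separate checks at extraction time.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// safeJoin is a hypothetical helper. It resolves a tar entry name under root
// and rejects names that are empty, absolute, or that climb out of root.
func safeJoin(root, name string) (string, error) {
	if name == "" || filepath.IsAbs(name) {
		return "", fmt.Errorf("invalid tar entry name %q", name)
	}
	target := filepath.Join(root, name) // Join also cleans ".." segments
	rel, err := filepath.Rel(root, target)
	if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("tar entry %q escapes %q", name, root)
	}
	return target, nil
}

func main() {
	// "/cache/lower" is a placeholder extraction root.
	for _, name := range []string{"usr/bin/sh", "../../etc/passwd", "/etc/passwd"} {
		p, err := safeJoin("/cache/lower", name)
		fmt.Println(p, err)
	}
}
```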

Key Design Decisions

  1. Fallback safety: tag-only refs (no digest) automatically fall back to the existing untar path
  2. Concurrent safety: per-digest inflightEntry dedup — N goroutines requesting the same digest only trigger 1 untar
  3. Crash safety: .ready sentinel file; loadIndex auto-cleans partial entries from previous crashes
  4. Eviction: LRU by .last_access timestamp, async trigger, 20GB default cap
  5. Unmount cleanup: resetActorDirs does MNT_DETACH unmount on overlayfs rootfs before RemoveAll

Expected Impact

| Scenario | Before | After |
| --- | --- | --- |
| First restore (same node + digest) | ~15-20s | ~15-20s (populates cache) |
| Subsequent restore (cache hit) | ~15-20s | <1s (overlayfs mount) |
| Checkpoint restore | ~268ms | ~268ms (unchanged) |
| Total resume latency (cache hit) | ~15-20s | <1.3s |

Testing

  • ✅ All unit tests pass (go test ./cmd/atelet/...)
  • ✅ Rootfscache tests: cache miss, cache hit, concurrent misses, partial entry cleanup, LRU eviction, digest validation
  • ✅ Existing oci_test.go tests continue to pass (untar, path traversal, symlink escape, hardlink escape)
  • ✅ Built and deployed to kind cluster — atelet running, rootfs-cache directory initialized, overlayfs kernel module available

Open Questions (for follow-up)

  • Eviction policy tuning (size cap, reference counting)
  • Tag-based images without digest — should we resolve digest via HEAD request?
  • Observability: cache hit/miss rate metrics (counters already added, need Prometheus dashboard)

@google-cla

google-cla Bot commented Jun 22, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

Implements issue agent-substrate#228: cache extracted rootfs per image digest on each
node, and materialize per-actor bundles as overlayfs mounts instead of
re-untarring on every restore.

Changes:
- New rootfscache package (cmd/atelet/internal/rootfscache)
- New overlay.go with mount/unmount helpers
- Modified prepareOCIDirectory for overlayfs integration
- Updated resetActorDirs to unmount before cleanup
- Added rootfs cache paths to ateompath
- Unit tests for cache hit/miss, concurrent access, eviction

@chenggui53 force-pushed the worktree-overlayfs-rootfs-cache branch from 2b2efc3 to 970a309 on June 22, 2026 07:40