[v1.14] Expose memory mapping & dirty pages; Make memfile dump optional by bchalios · Pull Request #8 · e2b-dev/firecracker

bchalios · 2026-02-05T19:16:44Z

WIP

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

This functionality cannot be added in rust-vmm.

Add a few APIs to get information about guest memory:

* An endpoint for guest memory mappings (guest physical to host virtual).
* An endpoint for resident and empty pages.
* An endpoint for dirty pages.

Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

There are cases where a user might want to snapshot the memory of a VM externally. In these cases, we can ask Firecracker to skip serializing the memory file to disk when we create a snapshot. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

Implement API /memory/mappings which returns the memory mappings of guest physical to host virtual memory. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

Implement API /memory which returns two bitmaps: resident and empty. `resident` tracks whether a guest page is in the resident set and `empty` tracks whether it's actually all 0s. Both bitmaps are structured as vectors of u64, so their length is: total_number_of_pages.div_ceil(64). Pages are ordered in the order of pages as reported by /memory/mappings. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

Implement API /memory/dirty which returns a bitmap tracking dirty guest memory. The bitmap is structured as a vector of u64, so its length is: total_number_of_pages.div_ceil(64). Pages are ordered in the order of pages as reported by /memory/mappings. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
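The bitmap layout described above can be sketched as follows. This is a minimal illustration, not the PR's code: the helper names are hypothetical, and the bit order within each u64 (bit `i % 64` of word `i / 64`, least significant bit first) is an assumption that the commit messages do not state.

```rust
/// Number of u64 words needed to hold one bit per guest page.
fn bitmap_len(total_pages: usize) -> usize {
    total_pages.div_ceil(64)
}

/// Sets the bit for `page_index`.
/// Assumption: page `i` lives in word `i / 64`, bit `i % 64` (LSB first).
fn set_page_bit(bitmap: &mut [u64], page_index: usize) {
    bitmap[page_index / 64] |= 1u64 << (page_index % 64);
}

/// Returns true if the bit for `page_index` is set (same assumed layout).
fn page_bit(bitmap: &[u64], page_index: usize) -> bool {
    (bitmap[page_index / 64] >> (page_index % 64)) & 1 == 1
}

fn main() {
    // A 1 GiB guest backed by 4 KiB pages has 262144 pages.
    let total_pages = (1usize << 30) / 4096;
    let mut dirty = vec![0u64; bitmap_len(total_pages)];
    set_page_bit(&mut dirty, 65);
    println!(
        "words = {}, page 65 dirty = {}",
        dirty.len(),
        page_bit(&dirty, 65)
    );
}
```

A consumer would walk the regions returned by /memory/mappings in order and keep a running page index to translate a bit position back to a guest physical address.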

UFFD provides an API to enable write protection for memory ranges tracked by a userfault file descriptor. Detailed information can be found here: https://docs.kernel.org/admin-guide/mm/userfaultfd.html. To use the feature, users need to register the memory region with UFFDIO_REGISTER_MODE_WP. Then, users need to explicitly enable write protection for sub-ranges of the registered region.

Writes to pages within write-protected memory ranges can be handled in one of two ways. In synchronous mode, a write to a protected page causes the kernel to send a write-protection event over the userfaultfd. In asynchronous mode, the kernel automatically handles writes to protected pages by clearing the write-protection bit. Userspace can later observe the write-protection bit by looking at the corresponding entry of /proc/<pid>/pagemap.

This commit unconditionally enables write protection for guest memory using the asynchronous mode.

NOTE: asynchronous write protection requires (host) kernel version 6.7 or later.

Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
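To make the pagemap lookup concrete, here is a minimal sketch of how a userspace reader could locate and interpret an entry. It relies on the kernel's documented pagemap format (one 64-bit entry per virtual page, bit 63 for "present", bit 57 for "uffd-wp write-protected"). The function and constant names are hypothetical and are not taken from this PR.

```rust
/// Each pagemap entry is a 64-bit value.
const PAGEMAP_ENTRY_SIZE: u64 = 8;
/// Bit 63: page is present in RAM.
const PM_PRESENT: u64 = 1 << 63;
/// Bit 57: PTE is write-protected through userfaultfd.
const PM_UFFD_WP: u64 = 1 << 57;

/// Byte offset into /proc/<pid>/pagemap of the entry describing `host_vaddr`.
fn pagemap_offset(host_vaddr: u64, page_size: u64) -> u64 {
    (host_vaddr / page_size) * PAGEMAP_ENTRY_SIZE
}

/// Decodes the 8 bytes read at that offset (native endianness).
fn decode_entry(bytes: [u8; 8]) -> u64 {
    u64::from_ne_bytes(bytes)
}

/// With asynchronous write protection armed, a page counts as written when
/// it is present and the kernel has cleared its write-protection bit.
fn is_dirty(entry: u64) -> bool {
    (entry & PM_PRESENT) != 0 && (entry & PM_UFFD_WP) == 0
}

fn main() {
    let protected = decode_entry((PM_PRESENT | PM_UFFD_WP).to_ne_bytes());
    let written = decode_entry(PM_PRESENT.to_ne_bytes());
    println!("offset of page at 0x2000 = {}", pagemap_offset(0x2000, 4096));
    println!("protected page dirty = {}", is_dirty(protected));
    println!("written page dirty = {}", is_dirty(written));
}
```

Note that this predicate is only meaningful if write protection was armed on the range beforehand. On a range that was never write-protected, bit 57 is clear for ordinary resident pages, so every present page would look dirty.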

This is an optional test on the Firecracker side and most of the time it's ignored (when valid dependency changes happen). Having this fail blocks our fc-versions releases. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

TODO Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

Add descriptions for MicrovmState from previous Firecracker versions. Moreover, add methods to translate a snapshot file from previous versions into the current one. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

Now that we have logic for translating snapshot formats, we can allow the /snapshot/load API to parse v1.10 and v1.12 snapshots. We change the logic that parses the snapshot file to first read the version from the file and then (if needed) translate it to the expected v1.14 version. Currently older versions supported are v1.10 and v1.12. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

The changes we made to support older snapshot formats did not compile on ARM systems. Fix the compilation issues. The issues were mainly bad re-exports. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

cursor · 2026-04-14T15:51:10Z

PR Summary

High Risk
Touches guest memory exposure APIs, snapshot/UFFD restore (write-protect, memfd FD passing), dependency fork, and block I/O paths that use fallocate—errors could affect data integrity or compatibility.

Overview
Adds GET /memory, /memory/mappings, and /memory/dirty so operators can inspect guest RAM (resident/empty bitmaps, host mapping metadata, and dirty pages via mincore + /proc/pagemap) without dumping a memfile.

Snapshots: mem_file_path on create is now optional—VM state can be snapshotted while memory is captured externally. UFFD restore gains use_memfd (validated only for Uffd backends) and sends an optional memfd over the UDS handshake. Restore loads snapshot format 8.0 natively and upgrades 6.0 / 4.0 via new v1_10 / v1_12 persist shims; device save order is fixed so virtio interrupts aren’t lost. GICv3 ITS restore is skipped when missing from older snapshots.

Virtio block: writable drives always advertise DISCARD and WRITE_ZEROES (fallocate / io_uring), with EOPNOTSUPP caching and seccomp fallocate / x86 pread64. Balloon advertises HINT_WAIT_ON_ACK with free-page hinting. UFFD switches to a forked userfaultfd-rs with MISSING + WRITE_PROTECT (and hugetlb WP where applicable). Removes the CI workflow that blocked Cargo.lock changes.

Also documents block TRIM/write-zeroes and balloon hint-ACK behavior; tightens virtio prepare_save ordering; makes several persist structs pub for migration.

^{Reviewed by Cursor Bugbot for commit 431f1fc. Bugbot is set up for automated code reviews on this repo. Configure here.}

Add docs/api_requests/block-write-zeroes.md describing:

- automatic advertisement on writable devices
- UNMAP=0 → FALLOC_FL_ZERO_RANGE (zeros in place)
- UNMAP=1 → FALLOC_FL_PUNCH_HOLE (zeros + deallocate)
- host filesystem requirements
- EOPNOTSUPP fallback (silent VIRTIO_BLK_S_UNSUPP, shared cache)
- known limitations

Remove the "write_zeroes is not supported" line from block-discard.md now that the feature is implemented.

Signed-off-by: Nikita Kalyazin <nikita.kalyazin@e2b.dev>
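The UNMAP mapping listed above could be expressed roughly as follows. This is a sketch under assumptions: the constants mirror the values in Linux's `linux/falloc.h`, `write_zeroes_mode` is a hypothetical helper, and combining FALLOC_FL_ZERO_RANGE with FALLOC_FL_KEEP_SIZE is a guess about the PR's behaviour. (FALLOC_FL_PUNCH_HOLE, by contrast, must always be paired with FALLOC_FL_KEEP_SIZE per the fallocate(2) man page.)

```rust
// Values as defined in linux/falloc.h.
const FALLOC_FL_KEEP_SIZE: i32 = 0x01;
const FALLOC_FL_PUNCH_HOLE: i32 = 0x02;
const FALLOC_FL_ZERO_RANGE: i32 = 0x10;

/// Picks the fallocate mode for a VIRTIO_BLK_T_WRITE_ZEROES request.
/// `unmap` is the UNMAP flag carried by the request segment.
fn write_zeroes_mode(unmap: bool) -> i32 {
    if unmap {
        // Zero the range and give the blocks back to the host filesystem.
        FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    } else {
        // Zero the range but keep the blocks allocated.
        // Assumption: KEEP_SIZE is set so the backing file never grows.
        FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE
    }
}

fn main() {
    println!("UNMAP=0 -> mode {:#x}", write_zeroes_mode(false));
    println!("UNMAP=1 -> mode {:#x}", write_zeroes_mode(true));
}
```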

cla-bot · 2026-05-12T08:54:04Z

We require contributors to sign our Contributor License Agreement, and we don't have @ilstam, @ShadowCurse, @JackThomson2, @Manciukic, @zulinx86 on file. You can sign our CLA at https://e2b.dev/docs/cla . Once you've signed, post a comment here that says '@cla-bot check'

cursor · 2026-05-12T08:57:41Z

+                    let is_eopnotsupp = matches!(
+                        &cqe_result,
+                        Err(e) if e.raw_os_error() == Some(-libc::EOPNOTSUPP)
+                    );


Async CQE EOPNOTSUPP check uses wrong errno sign

High Severity

The async completion path checks e.raw_os_error() == Some(-libc::EOPNOTSUPP) (negative), but the Cqe wrapper almost certainly converts the negative CQE result to a positive errno when constructing io::Error (the standard convention for from_raw_os_error). The sync path in BlockIoError::is_eopnotsupp() correctly checks for positive libc::EOPNOTSUPP. If the sign is wrong, discard/write-zeroes EOPNOTSUPP caching never triggers for the async io_uring engine, and every unsupported request hits the generic error path instead.

^{Reviewed by Cursor Bugbot for commit ab2399e. Configure here.}
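The sign convention at issue can be illustrated with the standard library alone. io_uring reports a failure as `-errno` in the completion's result field, while `std::io::Error::from_raw_os_error` expects, and `raw_os_error()` returns, the positive errno. The sketch below uses hypothetical helper names and hard-codes 95, the value of EOPNOTSUPP on Linux x86_64 and aarch64, to avoid depending on the libc crate. It does not claim to reproduce Firecracker's Cqe wrapper.

```rust
use std::io;

/// EOPNOTSUPP on Linux x86_64 and aarch64 (normally `libc::EOPNOTSUPP`).
const EOPNOTSUPP: i32 = 95;

/// Converts a raw completion result into an io::Result.
/// Negative values are `-errno`, so the sign is flipped before wrapping.
fn cqe_result(res: i32) -> io::Result<u32> {
    if res < 0 {
        Err(io::Error::from_raw_os_error(-res))
    } else {
        Ok(res as u32)
    }
}

/// Matches the positive errno, which is what `raw_os_error()` yields.
fn is_eopnotsupp(result: &io::Result<u32>) -> bool {
    matches!(result, Err(e) if e.raw_os_error() == Some(EOPNOTSUPP))
}

fn main() {
    let unsupported = cqe_result(-EOPNOTSUPP);
    println!("positive check matches: {}", is_eopnotsupp(&unsupported));
    // Comparing against the negated constant can never match here.
    let negative_check = matches!(
        &unsupported,
        Err(e) if e.raw_os_error() == Some(-EOPNOTSUPP)
    );
    println!("negative check matches: {}", negative_check);
}
```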

Whenever free-page hinting is enabled, also advertise the new VIRTIO_BALLOON_F_HINT_WAIT_ON_ACK feature bit (6). When negotiated, the guest driver waits for the device to signal-used each hint buffer before pushing the just-hinted page onto vb->free_page_list, closing a stale-hint data-loss race where the shrinker could recycle a page back to the buddy allocator before discard_range completed on the host. Guests without kernel support for bit 6 simply do not negotiate it (the driver self-clears the bit if VIRTIO_BALLOON_F_FREE_PAGE_HINT is not also negotiated), so this is forward-compatible with stock guests. No host-side protocol change is required: process_free_page_hinting_queue already calls signal_used_queue once per drain, which serves as the ACK the guest waits on. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Nikita Kalyazin <nikita.kalyazin@e2b.dev>

Adds a guest-side check that the negotiated balloon features in /sys/bus/virtio/devices/virtioN/features include bit 3 (FREE_PAGE_HINT) and bit 6 (HINT_WAIT_ON_ACK) when free_page_hinting is enabled.

The test is gated on a new dedicated marker, requires_patched_kernel, which is registered in tests/pytest.ini and added to the default -m exclusion filter so the test is auto-skipped by every CI run (regular and nightly). To run it, replace the 6.1 artifact vmlinux with a build that carries Jack Thomson's wait-on-ACK patch and invoke:

    tools/devtool -y test -- -m requires_patched_kernel \
        tests/integration_tests/functional/test_balloon_wait_on_ack.py

If the kernel is not patched, the bit-6 assertion fails with a clear "did you replace the kernel?" message.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Nikita Kalyazin <nikita.kalyazin@e2b.dev>

Add a subsection under free_page_hinting describing the behaviour of VIRTIO_BALLOON_F_HINT_WAIT_ON_ACK: always advertised alongside FPH, self-cleared by guests without the supporting kernel patch, no separate config knob, and a note on the per-buffer round-trip cost on supported guests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Nikita Kalyazin <nikita.kalyazin@e2b.dev>

cla-bot · 2026-05-12T09:02:53Z

We require contributors to sign our Contributor License Agreement, and we don't have @ilstam, @ShadowCurse, @JackThomson2, @Manciukic, @zulinx86 on file. You can sign our CLA at https://e2b.dev/docs/cla . Once you've signed, post a comment here that says '@cla-bot check'

djeebus · 2026-05-12T21:29:01Z

@cla-bot check

cla-bot · 2026-05-12T21:29:05Z

We require contributors to sign our Contributor License Agreement, and we don't have @ilstam, @ShadowCurse, @JackThomson2, @Manciukic, @zulinx86 on file. You can sign our CLA at https://e2b.dev/docs/cla . Once you've signed, post a comment here that says '@cla-bot check'

cla-bot · 2026-05-12T21:29:07Z

The cla-bot has been summoned, and re-checked this pull request!

When saving the state of a microVM with one or more block devices backed by the async IO engine, we need to take a few extra steps before serializing the state to disk, because we must make sure that there are no pending io_uring requests that the kernel has not handled yet. For devices that need this, we call a prepare_save() hook before serializing the device state. If there are indeed pending requests, once we handle them we need to let the guest know by adding the corresponding VirtIO descriptors to the used ring. Moreover, since we use notification suppression, this might or might not require us to send an interrupt to the guest.

Now, when we save the state of a VirtIO device, we save the device-specific state **and** the transport (MMIO or PCI) state along with it. There were a few issues with how we were doing the serialization:

1. We were saving the transport state before running the prepare_save() hook. The transport state includes information such as the `interrupt_status` in MMIO or `MSI-X config` in PCI. prepare_save() in the case of async IO might change this state, so running it after saving the transport state essentially loses information.
2. We were saving the device states after saving the KVM state. This is problematic because, if prepare_save() sends an interrupt to the guest, we don't save that "pending interrupt" bit of information in the snapshot.

These two issues made microVMs with block devices backed by async IO freeze in some cases after snapshot resume, since the guest is stuck in the kernel waiting for a notification from the device emulation which never arrives. Currently, this is only a problem for virtio-block with the async IO engine. The only other device using the prepare_save() hook is virtio-net, but it neither modifies any VirtIO state nor sends interrupts.

Fix this by ensuring the correct ordering of operations during the snapshot phase.
Signed-off-by: Babis Chalios <bchalios@amazon.es> (cherry picked from commit 67ba7a2) Signed-off-by: Nikita Kalyazin <nikita.kalyazin@e2b.dev>

cla-bot · 2026-05-18T09:40:28Z

We require contributors to sign our Contributor License Agreement, and we don't have @ilstam, @ShadowCurse, @JackThomson2, @Manciukic, @zulinx86 on file. You can sign our CLA at https://e2b.dev/docs/cla . Once you've signed, post a comment here that says '@cla-bot check'

PR #8 picked up the upstream ordering fix as 639196c (cherry-pick of 67ba7a2), which closes:

- Bug 1 (Vmm::save_state KVM-state-before-device-state)
- Bug 2 (MMIO transport_state captured before prepare_save)
- Bug 3 (PCI transport_state captured before prepare_save)

Remove those sections entirely. Findings 4-10 keep their numbers unchanged so external references stay stable.

Re-pin all source links from f0a35a1 to 639196c (the new HEAD). Refresh line numbers for the items that shifted (block kick 219-228 -> 212-222, net kick 1062-1071 -> 1042-1052). Update cross-references that previously read "Bug 1" / "Bugs 1-3" to refer to the upstream-fixed ordering bugs instead.

Per-branch backport table simplified to two columns (ordering fix vs vsock companion); PR #8 row shows the ordering fix applied and the vsock companion still missing. The vsock companion 48a5ae3 is still not on PR #8, so Bug 9 remains open. Findings 4-10 and P2-1..P2-8 are unchanged in substance.

Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

Previous commit (cd3fe9a) changed the signature of ArchVm::get_dirty_bitmap() to get a page_size argument, but corresponding integration test was not updated to match this change. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

GuestRegionMmapExt::discard_range() is used to deallocate guest memory that we no longer use, for example during balloon inflation or free page reporting/hinting. There is an implicit requirement that the range we are discarding is aligned (both starting address and length) to the page size used to back the guest memory. If the caller does not respect this alignment, we can end up with undefined behaviour. For example, if we use huge pages to back memory but receive from the guest regions to discard that are only 4K-aligned, we might end up removing memory that we are not meant to. This currently doesn't happen, but the requirement is not explicitly encoded in the type system. Add a check for these requirements and return an error when they are not met. This way, we can't shoot ourselves in the foot in the future. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
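A check of this kind might look like the sketch below. The error type and function name are hypothetical, and the real discard_range() signature may differ. The only point illustrated is that both the offset and the length must be multiples of the backing page size before anything is discarded.

```rust
#[derive(Debug, PartialEq)]
enum DiscardError {
    /// Offset or length is not a multiple of the backing page size.
    Unaligned {
        offset: usize,
        len: usize,
        page_size: usize,
    },
}

/// Validates a discard request against the page size backing guest memory.
fn check_discard_alignment(
    offset: usize,
    len: usize,
    page_size: usize,
) -> Result<(), DiscardError> {
    // Page sizes are powers of two (4 KiB, 2 MiB, ...).
    debug_assert!(page_size.is_power_of_two());
    if offset % page_size != 0 || len % page_size != 0 {
        return Err(DiscardError::Unaligned {
            offset,
            len,
            page_size,
        });
    }
    Ok(())
}

fn main() {
    const HUGE_PAGE: usize = 2 << 20; // 2 MiB
    // A 4 KiB-granular hint against hugepage-backed memory is rejected.
    println!("{:?}", check_discard_alignment(4096, 4096, HUGE_PAGE));
    // A hugepage-aligned range is accepted.
    println!("{:?}", check_discard_alignment(HUGE_PAGE, 2 * HUGE_PAGE, HUGE_PAGE));
}
```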

Use fallocate(PUNCH_HOLE|KEEP_SIZE) for MAP_SHARED file-backed guest memory so memfd-backed balloon hinting/reporting clears the shared backing instead of only dropping PTEs. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

cursor · 2026-05-21T13:54:42Z

+            {
+                "syscall": "fallocate",
+                "comment": "Used by the block device for VIRTIO_BLK_F_DISCARD (FALLOC_FL_PUNCH_HOLE)"
+            },


Missing pread64 syscall in aarch64 seccomp filter

High Severity

The pread64 syscall is added to the x86_64 seccomp filter but not to the aarch64 filter. The new PagemapReader (used by get_dirty_memory) reads from /proc/self/pagemap at specific offsets, which requires pread64 on both architectures. On aarch64, calling this new API endpoint will trigger a seccomp violation and kill the VMM process.

Additional Locations (1)

resources/seccomp/x86_64-unknown-linux-musl.json#L33-L36

^{Reviewed by Cursor Bugbot for commit bd85e43. Configure here.}

io_uring_enter() might return EINTR when called with IORING_ENTER_GETEVENTS. Make the submit() call a bit more robust by retrying when we observe this error. Retry 3 times. This is a semi-arbitrary choice. The assumption is that if an interrupt arrives, a subsequent call to the system call should most likely succeed. If we keep receiving interrupts, something is more severely broken, so propagate the error to the caller. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
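The retry policy can be sketched generically. `retry_on_eintr` is a hypothetical helper, not the PR's submit() implementation. It also assumes that "retry 3 times" means one initial attempt plus up to three retries, which the commit message does not spell out.

```rust
use std::io;

/// Assumption: up to three retries after the first attempt.
const MAX_RETRIES: usize = 3;

/// Runs `op`, retrying only when it fails with EINTR
/// (surfaced by std as `ErrorKind::Interrupted`).
fn retry_on_eintr<T>(mut op: impl FnMut() -> io::Result<T>) -> io::Result<T> {
    let mut retries = 0;
    loop {
        match op() {
            Err(e) if e.kind() == io::ErrorKind::Interrupted && retries < MAX_RETRIES => {
                retries += 1;
            }
            // Success, a different error, or EINTR after the retry budget
            // is exhausted: hand the result back to the caller.
            other => return other,
        }
    }
}

fn main() {
    let mut attempts = 0;
    // Simulated submit: interrupted twice, then 8 entries are submitted.
    let submitted = retry_on_eintr(|| {
        attempts += 1;
        if attempts <= 2 {
            Err(io::Error::from(io::ErrorKind::Interrupted))
        } else {
            Ok(8u32)
        }
    });
    println!("result = {:?}", submitted);
}
```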

If prepare_save() fails to drain the io_uring queues (when used) and sync the host filesystem we might end up with a corrupted disk snapshot. Currently, Firecracker ignores that, only emitting an error message. Be more strict and expect no errors, so that we can have a better post-mortem analysis of what happened. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

Signed-off-by: Nikita Kalyazin <nikita.kalyazin@e2b.dev>

Replace fallocate(PUNCH_HOLE) with madvise(MADV_REMOVE) for the memfd-backed (MAP_SHARED) memory discard path. The critical difference is that madvise(MADV_REMOVE) calls userfaultfd_remove() on the VMA before issuing the fallocate, which delivers a UFFD_EVENT_REMOVE to any userfaultfd registered on that VMA. fallocate(PUNCH_HOLE) called directly on the file descriptor does not go through this path and produces no uffd event. Without the event, a uffd handler cannot learn that the pages have been freed and may serve stale data on subsequent faults in the discarded range. Signed-off-by: Nikita Kalyazin <nikita.kalyazin@e2b.dev>

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 5 total unresolved issues (including 3 from previous reviews).


^{Reviewed by Cursor Bugbot for commit 431f1fc. Configure here.}

cursor · 2026-05-26T15:06:14Z

+        let entry = PagemapEntry::from_bytes(entry_bytes);
+
+        // Page must be present and the write_protected bit cleared (indicating it was written to)
+        Ok(entry.is_present() && !entry.is_write_protected())


Dirty API marks all resident pages

Medium Severity

GET /memory/dirty treats a page as dirty when pagemap shows it present and the UFFD write-protected bit is clear. Without UFFD write protection, that bit is typically unset for normal RAM, so every resident page is reported dirty instead of only pages written since the last snapshot.

^{Reviewed by Cursor Bugbot for commit 431f1fc. Configure here.}

cursor · 2026-05-26T15:06:14Z

+        self.disk
+            .file_engine
+            .drain_and_flush(discard)
+            .expect("virtio-block: failed to drain ops and flush block data");


Snapshot flush failure panics VMM

Medium Severity

During prepare_save, virtio-block now calls drain_and_flush with .expect(...). Any drain or flush error aborts the whole Firecracker process instead of returning a snapshot error, so a transient I/O failure while creating a snapshot becomes a hard crash.

^{Reviewed by Cursor Bugbot for commit 431f1fc. Configure here.}
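For comparison, the alternative this review comment asks for, surfacing the failure as a snapshot error instead of panicking, could look roughly like this. Every type and function here is hypothetical and only mirrors the names in the quoted diff. The PR's commit message explains that the strict `.expect(...)` was chosen deliberately to aid post-mortem analysis, so this is a sketch of the trade-off and not a recommended change.

```rust
use std::io;

/// Hypothetical error surfaced to the snapshot-create API caller.
#[derive(Debug)]
enum SnapshotError {
    DrainAndFlush(io::Error),
}

/// Hypothetical stand-in for the block device's file engine.
trait FileEngine {
    fn drain_and_flush(&mut self, discard: bool) -> io::Result<()>;
}

/// Propagates drain/flush failures instead of aborting the process.
fn prepare_save(engine: &mut dyn FileEngine, discard: bool) -> Result<(), SnapshotError> {
    engine
        .drain_and_flush(discard)
        .map_err(SnapshotError::DrainAndFlush)
}

struct HealthyEngine;
impl FileEngine for HealthyEngine {
    fn drain_and_flush(&mut self, _discard: bool) -> io::Result<()> {
        Ok(())
    }
}

struct FailingEngine;
impl FileEngine for FailingEngine {
    fn drain_and_flush(&mut self, _discard: bool) -> io::Result<()> {
        Err(io::Error::new(io::ErrorKind::Other, "simulated flush failure"))
    }
}

fn main() {
    println!("{:?}", prepare_save(&mut HealthyEngine, false));
    match prepare_save(&mut FailingEngine, false) {
        Ok(()) => println!("snapshot can proceed"),
        Err(e) => println!("snapshot aborted, VMM keeps running: {e:?}"),
    }
}
```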

bchalios marked this pull request as draft February 5, 2026 19:16

ValentaTomas added the wontfix This will not be worked on label Feb 5, 2026

ValentaTomas mentioned this pull request Feb 5, 2026

Move memfile dirty tracking to Firecracker e2b-dev/infra#1858

Closed

bchalios force-pushed the firecracker-v1.14-direct-mem branch from 88151c3 to 89538bc Compare February 5, 2026 20:20

bchalios force-pushed the firecracker-v1.14-direct-mem branch from a041675 to 61fdd9d Compare February 12, 2026 23:10

bchalios mentioned this pull request Feb 13, 2026

Build firecracker v1.14 e2b-dev/fc-versions#4

Merged

ValentaTomas reviewed Feb 13, 2026


Comment thread src/vmm/src/utils/pagemap.rs

bchalios force-pushed the firecracker-v1.14-direct-mem branch from d91bdb1 to 61fdd9d Compare February 13, 2026 23:58

ValentaTomas requested review from ValentaTomas and removed request for ValentaTomas March 12, 2026 19:56

ValentaTomas self-assigned this Mar 12, 2026

ValentaTomas self-requested a review March 12, 2026 19:57

ValentaTomas assigned ValentaTomas and unassigned ValentaTomas Mar 12, 2026

ValentaTomas removed their request for review March 13, 2026 00:03

bchalios force-pushed the firecracker-v1.14-direct-mem branch from f01905f to af9c995 Compare March 23, 2026 15:33

ValentaTomas removed their assignment Apr 8, 2026

bchalios added 11 commits April 14, 2026 17:46

api: implement API for getting guest memory mappings

160c3af

Implement API /memory/mappings which returns the memory mappings of guest physical to host virtual memory. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

ci: remove dependency changes test

fff6fd9

This is an optional test on the Firecracker side and most of the time it's ignored (when valid dependency changes happen). Having this fail blocks our fc-versions releases. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

feat: make network device snapshots backwards compatible

a284adf

TODO Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

snapshot: add state types for previous versions

7a2ef60

Add descriptions for MicrovmState from previous Firecracker versions. Moreover, add methods to translate a snapshot file from previous versions into the current one. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

fix: compilation in aarch64

458ca91

The changes we made to support older snapshot formats did not compile on ARM systems. Fix the compilation issues. The issues were mainly bad re-exports. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

bchalios force-pushed the firecracker-v1.14-direct-mem branch from 76f16f0 to 458ca91 Compare April 14, 2026 15:51

cursor Bot reviewed May 12, 2026


kalyazin and others added 3 commits May 12, 2026 10:02

djeebus added the cla-signed label May 12, 2026

cla-bot Bot removed the cla-signed label May 12, 2026

ValentaTomas mentioned this pull request May 12, 2026

validate: ignore cla-bot status when gating release builds e2b-dev/fc-versions#21

Merged

3 tasks

kalyazin force-pushed the firecracker-v1.14-direct-mem branch from c3d2d61 to 639196c Compare May 18, 2026 09:44

cla-bot Bot added the cla-signed label May 18, 2026

cursor Bot reviewed May 18, 2026


Comment thread src/vmm/src/rpc_interface.rs

bchalios added a commit that referenced this pull request May 19, 2026

Add analysis for bug #7,#8,#9

ae5a224

Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

bchalios and others added 3 commits May 21, 2026 14:52

fix(tests): fix compilation in integration test

1de011d

Previous commit (cd3fe9a) changed the signature of ArchVm::get_dirty_bitmap() to get a page_size argument, but corresponding integration test was not updated to match this change. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

fix(memory): punch holes for shared discard ranges

bd85e43

Use fallocate(PUNCH_HOLE|KEEP_SIZE) for MAP_SHARED file-backed guest memory so memfd-backed balloon hinting/reporting clears the shared backing instead of only dropping PTEs. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

cursor Bot reviewed May 21, 2026


bchalios added 2 commits May 26, 2026 17:01

cursor Bot reviewed May 26, 2026


Comment thread src/vmm/src/lib.rs

Comment thread resources/seccomp/x86_64-unknown-linux-musl.json

kalyazin added 2 commits May 26, 2026 16:04

fix(machine_config): remove unused MachineConfigUpdate import in tests

3a19026

Signed-off-by: Nikita Kalyazin <nikita.kalyazin@e2b.dev>

cursor Bot reviewed May 26, 2026



4 participants
