Skip to content

Remove mailbox for now#268

Merged
sjmiller609 merged 4 commits into
mainfrom
revert-260-hypeship/network-handoff-v2
Jun 3, 2026
Merged

Remove mailbox for now#268
sjmiller609 merged 4 commits into
mainfrom
revert-260-hypeship/network-handoff-v2

Conversation

@sjmiller609
Copy link
Copy Markdown
Collaborator

@sjmiller609 sjmiller609 commented Jun 2, 2026

Reverts #260


Note

Medium Risk
Changes networked fork/restore behavior on Firecracker by dropping the fast mailbox/ack path; restores depend on vsock guest-agent reconfigure, which affects timing and failure modes for forked standby restores.

Overview
Reverts the guest-initiated resume network mailbox flow (PR #260): the host no longer patches Firecracker snapshot memory, waits for guest UDP acks, or arms mailbox env on create/start.

Restore and fork networking after standby/running resume now rely only on the existing host-initiated path—reconfigureGuestNetwork via guest-agent gRPC (with shell ip exec fallback)—when a fresh allocation is applied for forked standby restores.

Removed code spans lib/mailbox, instance handoff helpers (prepareResumeNetworkHandoff, mailbox patching/UDP waiter), guest-agent VMGenID/mailbox watcher, related tests/docs, and a ForkSnapshot API unit test. Linux test env no longer passes HYPEMAN_TEST_NETWORK_TMPDIR; parallel test network locks/leases use the system temp dir only.

Reviewed by Cursor Bugbot for commit 71309cb. Bugbot is set up for automated code reviews on this repo. Configure here.

@sjmiller609 sjmiller609 requested a review from rgarcia June 2, 2026 22:15
@firetiger-agent
Copy link
Copy Markdown

Created a monitoring plan for this PR.

What this PR does: Removes the guest-initiated resume-network mailbox — a fast-path where hypeman pre-patched VM snapshot memory with new network identity before restore. Network reconfiguration after standby→running transitions now uses only the shell-exec path (already deployed as the fallback).

Intended effect:

  • Spawn success rate: baseline 99.9–100% (310K–754K successes/hr active hours); confirmed if rate holds at ≥99.9%
  • POST /browsers 500 errors: baseline 0–6/hr during clean active hours (10K–14K req/hr); confirmed if no sustained increase above ~50/hr post-deploy

Risks:

  • Restore failures (exec path less reliable than mailbox)kernel_invocation_spawn_total{success=false}, alert if >1,000/hr sustained for 2+ hours (clean baseline: 0–240/hr; infra-noise ceiling ~774/hr)
  • Restore latency increase — POST /browsers p95 latency, alert if sustained above ~12s (baseline 8–10s p95)
  • Fork readiness regression — spawn failures or "wait for forked guest agent readiness" ERROR logs; alert on any new occurrence
  • failed to configure guest network after restore ERROR logs — any occurrence post-deploy is actionable

Status updates will be posted automatically on this PR as monitoring progresses.

View monitor

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 58ce89f. Configure here.

Comment thread lib/instances/manager.go Outdated
@sjmiller609 sjmiller609 changed the title Revert "Add mailbox resume network handoff" Remove mailbox for now Jun 2, 2026
@sjmiller609
Copy link
Copy Markdown
Collaborator Author

CI server firewall was messed up dropping guest -> host communication, which is why CI was failing:

TASK [nftables_firewall : Deploy nftables ruleset] ********************************************************************************************************************************
--- before: /etc/nftables.conf
+++ after: /Users/steven/.ansible/tmp/ansible-local-67508yu0zar6_/tmpbq_ycng0/nftables.conf.j2
@@ -20,6 +20,9 @@
...
+
+        # Trusted host-local interfaces
+        iifname "hm*" accept

@sjmiller609 sjmiller609 merged commit 43d1210 into main Jun 3, 2026
17 of 20 checks passed
@sjmiller609 sjmiller609 deleted the revert-260-hypeship/network-handoff-v2 branch June 3, 2026 14:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants