chore: fix test flakes #247
Merged
fac58c9 to f17ed89
f17ed89 to 773ad49
Monitoring Plan: Vsock nil-guard, connection eviction, restart-policy fix
What this PR does: Fixes three latent runtime bugs in the hypervisor communication layer: it prevents a potential nil-pointer panic when the vsock dialer is unavailable, ensures stale gRPC connections are evicted after a failed non-retrying exec, and corrects a race in the restart-policy controller that could prematurely clear the restart-attempt counter while a restart was still in flight.
Intended effect:
Risks:
Status updates will be posted automatically on this PR as monitoring progresses.
hiroTamada
approved these changes
May 30, 2026
abc499d to b62f483
b62f483 to 4a5cc98
4 re-runs worked
Summary
Validation
Deft notes
Note
Medium Risk
Changes guest connection pooling, fork restore readiness, and restart-policy reset timing in production paths; scope is defensive and test-focused but affects instance lifecycle behavior.
Overview
This PR hardens test and CI reliability around guest connectivity, fork lifecycles, restart policy, and embedded Cloud Hypervisor binaries.
Guest exec now drops pooled gRPC connections after retryable no-wait failures, matching the wait-and-retry path so stale vsock connections are not reused.
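The eviction behavior described above can be sketched roughly as follows. This is an illustration only, not the PR's code: `pool`, `conn`, `execNoWait`, and `errUnavailable` are hypothetical names, and a plain struct stands in for a real gRPC connection over vsock.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errUnavailable stands in for a retryable transport failure (hypothetical).
var errUnavailable = errors.New("guest agent unavailable")

// conn is a stand-in for a pooled gRPC connection over vsock.
type conn struct{ closed bool }

func (c *conn) Close() { c.closed = true }

// pool caches one connection per instance ID.
type pool struct {
	mu    sync.Mutex
	conns map[string]*conn
}

func newPool() *pool { return &pool{conns: make(map[string]*conn)} }

// get returns the cached connection, creating one if none is cached.
func (p *pool) get(id string) *conn {
	p.mu.Lock()
	defer p.mu.Unlock()
	c, ok := p.conns[id]
	if !ok {
		c = &conn{}
		p.conns[id] = c
	}
	return c
}

// evict closes and forgets the cached connection for id, if any.
func (p *pool) evict(id string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if c, ok := p.conns[id]; ok {
		c.Close()
		delete(p.conns, id)
	}
}

// execNoWait runs call once without retrying. On a retryable failure it
// still evicts the connection, so the caller's next attempt gets a fresh one.
func (p *pool) execNoWait(id string, call func(*conn) error) error {
	err := call(p.get(id))
	if errors.Is(err, errUnavailable) {
		p.evict(id)
	}
	return err
}

func main() {
	p := newPool()
	stale := p.get("vm-1")
	err := p.execNoWait("vm-1", func(*conn) error { return errUnavailable })
	fmt.Println(err, stale.closed, p.get("vm-1") != stale)
}
```

The point of the sketch is that eviction is tied to the error class rather than to whether the caller intends to retry, so a later exec never picks up the connection that just failed.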
Builds treat a nil vsock dialer as an explicit error while waiting for the builder agent. Running forks wait for the restored source guest agent even when networking is disabled, and fork readiness checks no longer skip network-disabled instances. Warm fork chain tests use a 90s readiness budget.
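A minimal sketch of the nil-dialer guard, under assumed names: `dialer`, `waitForAgent`, `errNoDialer`, and `flakyDialer` are hypothetical and do not come from the repository. The idea is to fail fast with a descriptive error before the wait loop instead of panicking on the first dial.

```go
package main

import (
	"errors"
	"fmt"
)

// dialer is a hypothetical stand-in for the vsock dialer interface.
type dialer interface {
	Dial() error
}

// errNoDialer is returned instead of panicking when no dialer is configured.
var errNoDialer = errors.New("vsock dialer is nil")

// waitForAgent tries to reach the builder agent up to attempts times.
// It checks for a nil interface value up front; a typed nil pointer stored
// in the interface would not be caught by this check.
func waitForAgent(d dialer, attempts int) error {
	if d == nil {
		return errNoDialer
	}
	lastErr := errors.New("no attempts made")
	for i := 0; i < attempts; i++ {
		if lastErr = d.Dial(); lastErr == nil {
			return nil
		}
	}
	return fmt.Errorf("agent not ready after %d attempts: %w", attempts, lastErr)
}

// flakyDialer fails a fixed number of times before succeeding.
type flakyDialer struct{ failures int }

func (f *flakyDialer) Dial() error {
	if f.failures > 0 {
		f.failures--
		return errors.New("connection refused")
	}
	return nil
}

func main() {
	fmt.Println(waitForAgent(nil, 3))
	fmt.Println(waitForAgent(&flakyDialer{failures: 2}, 3))
	fmt.Println(waitForAgent(&flakyDialer{failures: 5}, 3))
}
```

Returning a sentinel error lets callers and tests distinguish a missing dialer from an agent that is merely slow to come up.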
Restart policy only resets attempt counters after stability measured from the later of instance start and the latest restart attempt, avoiding races with in-flight health-check restarts.
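The reset rule can be illustrated with a small sketch. The function name, the `at` helper, and the 60-second `stabilityWindow` are assumptions made for the example, not values taken from the PR.

```go
package main

import (
	"fmt"
	"time"
)

// stabilityWindow is a hypothetical value chosen for illustration.
const stabilityWindow = 60 * time.Second

// at builds a timestamp a given number of seconds after the Unix epoch
// (helper for the example only).
func at(sec int64) time.Time { return time.Unix(sec, 0) }

// shouldResetAttempts reports whether the restart-attempt counter may be
// cleared. Stability is measured from the later of the instance start and
// the most recent restart attempt, so an in-flight restart keeps the
// counter intact.
func shouldResetAttempts(startedAt, lastRestartAttempt, now time.Time, stableFor time.Duration) bool {
	baseline := startedAt
	if lastRestartAttempt.After(baseline) {
		baseline = lastRestartAttempt
	}
	return now.Sub(baseline) >= stableFor
}

func main() {
	// 70s after start, but only 20s after the latest restart attempt.
	fmt.Println(shouldResetAttempts(at(0), at(50), at(70), stabilityWindow)) // false
	// 110s after start and 60s after the latest restart attempt.
	fmt.Println(shouldResetAttempts(at(0), at(50), at(110), stabilityWindow)) // true
}
```

Measuring only from instance start would have returned true in the first case, which is the kind of premature reset the change avoids.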
Firecracker fork disk assertions use workspace disk utilization totals instead of host-wide free-space deltas. Basic e2e tests treat nginx startup log waits as diagnostics; ingress HTTP probes remain the behavior check.
`ensure-ch-binaries` verifies cached Cloud Hypervisor binaries with `file` and refreshes wrong-arch or corrupt caches preserved across CI runs.
Reviewed by Cursor Bugbot for commit 4a5cc98. Bugbot is set up for automated code reviews on this repo.