Summary
In hibernation/wake.go, when reconcileWake observes that the VM has already come back (container Running + VMID set), it immediately drops the snapshot tag and marks the CR Active, without clearing the vm.cocoonstack.io/hibernate annotation on the pod.
// hibernation/wake.go
if vmClonedAndRunning(pod) {
    r.Epoch.DeleteManifest(ctx, vmName, meta.HibernateSnapshotTag)
    return ctrl.Result{}, r.setPhase(ctx, hib, cocoonv1.CocoonHibernationPhaseActive, vmName)
}
// this clear is skipped on the fast-path above
if meta.ReadHibernateState(pod) {
    commonk8s.PatchHibernateState(ctx, r.Client, pod, false)
}
Scenario
- A pod is already running with a valid VMID, but still carries hibernate=true (e.g. residue from a prior failed hibernate, or a CR created against an already-awake pod).
- The user creates or updates the CR with Desire=Wake. The first reconcile hits the fast-path and returns Active; the hibernate=true annotation is left in place.
- The user flips Desire=Hibernate. reconcileHibernate calls PatchHibernateState(pod, true), which is a no-op because the annotation already matches (see cocoon-common/k8s/utils.go:27, and the sketch after this list).
- The reconciler immediately probes the registry for the snapshot tag. If a stale tag happens to be present, the CR gets marked Hibernated without vk-cocoon ever taking a new snapshot for this cycle.
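For context on why the PatchHibernateState call above is a no-op: the helper evidently short-circuits when the annotation already holds the requested value. The real implementation is at cocoon-common/k8s/utils.go:27; the sketch below is only an assumed shape of that guard, and the HibernateAnnotation constant and patch construction are illustrative rather than copied from the source (only the annotation key comes from the summary above).

// cocoon-common/k8s/utils.go — assumed shape of the guard, not the actual source
import (
    "context"
    "strconv"

    corev1 "k8s.io/api/core/v1"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// HibernateAnnotation is assumed to be the key named in the summary.
const HibernateAnnotation = "vm.cocoonstack.io/hibernate"

func PatchHibernateState(ctx context.Context, c client.Client, pod *corev1.Pod, desired bool) error {
    // If the annotation already matches the requested state, nothing is patched.
    // This is the short-circuit that makes the Desire=Hibernate step above a no-op.
    if (pod.Annotations[HibernateAnnotation] == "true") == desired {
        return nil
    }
    updated := pod.DeepCopy()
    if updated.Annotations == nil {
        updated.Annotations = map[string]string{}
    }
    updated.Annotations[HibernateAnnotation] = strconv.FormatBool(desired)
    return c.Patch(ctx, updated, client.MergeFrom(pod))
}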
Impact
A subsequent wake would clone from a stale (or nonexistent) snapshot, resulting in data divergence or a stuck Waking phase.
Notes
- This is pre-existing behavior (predates 82a9bc3); it was not introduced by the recent VMID-gate hardening.
- Raised during a /code review of HEAD~3..HEAD; deferred as out of scope for that review.
Possible fixes
- Always call PatchHibernateState(pod, false) before returning Active on the fast-path (see the sketch after this list).
- Or: move the ReadHibernateState/PatchHibernateState block above the fast-path, so the annotation is cleared unconditionally during any wake reconcile.
- Either fix needs a small unit test covering the "hibernate annotation residue on an already-live pod" case.
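A minimal sketch of the first option, rearranging the snippet quoted in the summary. The ordering relative to DeleteManifest and the lack of error handling on the patch call simply mirror the existing code; treat this as a sketch, not the final change.

// hibernation/wake.go — sketch of fix option 1
if vmClonedAndRunning(pod) {
    // Clear the hibernate annotation before declaring the CR Active, so a later
    // Desire=Hibernate actually re-patches the pod and forces a fresh snapshot
    // instead of short-circuiting on stale state.
    if meta.ReadHibernateState(pod) {
        commonk8s.PatchHibernateState(ctx, r.Client, pod, false)
    }
    r.Epoch.DeleteManifest(ctx, vmName, meta.HibernateSnapshotTag)
    return ctrl.Result{}, r.setPhase(ctx, hib, cocoonv1.CocoonHibernationPhaseActive, vmName)
}

The second option would instead hoist the existing ReadHibernateState/PatchHibernateState block above the fast-path check, leaving the fast-path itself untouched; either way, the unit test should assert that the annotation reads false once the CR reports Active.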