Implement SXM with snapshots for SMAPIv3#7137
Conversation
When reverting a snapshot, XAPI clones the snapshot disk via VDI.clone to create a new active VDI. The vdi_clone_impl in xapi-storage-script called Volume_client.clone but forgot to write the vdi-type key into the new volume's key store. vdi_create_impl does write this key, but vdi_clone_impl never did. After the revert, the next SR.scan reads back the volume metadata and gets ty = "" for the new active VDI because the key is missing. This empty type ends up in the XAPI database and breaks any logic that checks VDI type, including the snapshot migration flow. The fix adds the missing update_keys step to vdi_clone_impl. Signed-off-by: Lunfan Zhang[Lunfan.Zhang] <Lunfan.Zhang@cloud.com>
this failure is in qcow-stream-tool, not touched by this PR — likely an opam version mismatch from the upstream qcow2 work. |
Yes, this is because xs-opam was updated and the code in this branch is incompatible |
355e8ad to
da929c8
Compare
Sync master to feature, pass the check now. |
| @@ -414,7 +426,9 @@ module Mux = struct | |||
| set_snapshot_of __context ~dbg ~sr ~vdi:local_snapshot | |||
There was a problem hiding this comment.
Would these set_* do the same things?
There was a problem hiding this comment.
These set_ here will update the different VDI fields of VDI in XAPI db, so they are doing the similar thing but for different VDI fields
There was a problem hiding this comment.
The data is the same but to be written to different places: XAPI and Storage.
Will the copy in Storage be the main copy always as I see the SR.scan will try to fix XAPI copy (in XAPI DB) by the result of reading copy from Storage? If so, would the Storage be always written firstly?
| set_is_a_snapshot __context ~dbg ~sr ~vdi:local_snapshot | ||
| ~is_a_snapshot:true | ||
| ~is_a_snapshot:true ; | ||
| update_backend_snapshot_metadata ~dbg:(Debug_info.to_string _di) |
There was a problem hiding this comment.
The _di is used here. By contention, it should not start with underscore _.
There was a problem hiding this comment.
Good catch, it should renamed _di -> di in update_snapshot_info_dest since it's now consumed by update_backend_snapshot_metadata
…ation When migrating from SMAPIv1 to SMAPIv3, the snapshot relationship fields (snapshot_of, snapshot_time, is_a_snapshot) were only updated in the XAPI database, not on the destination backend storage. A subsequent SR.scan then read the stale custom keys back from the volume metadata and overwrote the XAPI database, which removed the relation between the snapshots and their leaf VDI (observed on GFS2 SRs). This adds a new VDI.set_snapshot_metadata SMAPIv2 operation that writes these fields into the volume key store. It is declared in storage_interface, defaulted in storage_skeleton, implemented in xapi-storage-script (SMAPIv3), and wired through storage_mux. SMAPIv1 implements it as a no-op since its snapshot metadata is managed by the XAPI database only. SR.update_snapshot_info_dest now calls it best-effort after updating the database (errors are logged and swallowed, the XAPI DB remains authoritative) so the backend metadata stays consistent across an SR.scan. Signed-off-by: Lunfan Zhang[Lunfan.Zhang] <Lunfan.Zhang@cloud.com>
During SXM mirroring the DATA.stat polling loop repeatedly calls VDI.stat, which issues a Volume.stat RPC to the storage script on every poll. This adds unnecessary load and latency while waiting for a mirror to complete. Introduce a lightweight per-(SR, VDI) cache in DATAImpl: [mirror] populates it at mirror start, [stat] serves from it on a cache hit instead of calling Volume.stat, and the entry is evicted when the mirror completes or fails. The cache needs no mutex because the module runs in a cooperative Lwt async context. Signed-off-by: Lunfan Zhang[Lunfan.Zhang] <Lunfan.Zhang@cloud.com>
|
Is there a way to test the patch ? Do you have some smapiv3 plugins that support migration? |
Of course, there is SMAPIv3 plugin to support the migration, and which has been verified. |
Do you have a link to download it? |
No link, It`s an internal repo. but the document of SMAPIv3 is opensource, you can access it from XAPI project. |
We have some SMAPIv3 plugins (such as https://github.com/xcp-ng/xcp-ng-xapi-storage), but AFAIK migration is not supported. I read the SMAPIv3 documentation, but verifying the functionality would be easier with a real plugin that supports migration. |
…->SMAPIv3 SXM
V1 receive_finalize_common takes a "dummy" VDI snapshot of the dest leaf
(storage_smapiv1_migrate.ml:657) so that VDI.compose can later merge the
mirrored leaf onto its parent. On SMAPIv1 SRs the dummy must be destroyed
explicitly after compose; on SMAPIv3 SRs Volume.compose silently consumes
the dummy backend volume but leaves the XAPI DB row behind, so the
subsequent VDI.destroy raises Volume_does_not_exist (currently swallowed
by log_and_ignore_exn) and the orphan DB row persists. Any subsequent
SXM v3 migration then aborts in assert_smapiv3_snapshot_topology_migratable
because the orphan looks like a hidden vdi-snapshot of the active disk.
Catch the V3-specific Backend_error_with_backtrace("SR_BACKEND_FAILURE",
_ :: "Volume_does_not_exist" :: uuid :: _) pattern (with uuid match
against r.dummy_vdi) and trigger SR.scan via the public XenAPI to
reconcile the orphan DB row. All other failures preserve the historical
V1 best-effort tolerate-and-log behaviour, so SMAPIv1 destinations are
unaffected.
reconcile_sr_after_v3_destroy is placed in storage_migrate_helper so the
in-flight SMAPIv3-native migration engine can reuse it.
Signed-off-by: Lunfan Zhang[Lunfan.Zhang] <Lunfan.Zhang@cloud.com>
Implement the new SXM v3 storage-layer migration for SMAPIv3 SRs, enabling a VDI to be mirrored together with its snapshot chain, including reverted/nested snapshot structures. storage_smapiv3_migrate gains snapshot-tree discovery (get_snapshot_tree), the DFS walk that mirrors/clones each node while keeping the storage chain advancing on the same leaf, NBD proxy export, and the send_start orchestration that drives the per-node mirroring. storage_migrate_helper.State is extended with a snapshot_relation type and set/get/remove_snapshot_mappings so the source->destination snapshot pairings recorded during mirroring can be retrieved later to restore snapshot metadata on the destination. Signed-off-by: Lunfan Zhang[Lunfan.Zhang] <Lunfan.Zhang@cloud.com>
Drive the new SMAPIv3 snapshot-aware mirroring from the VM migration flow. `vdi_copy_fun` now retrieves the snapshot relations recorded during send_start (via Storage_migrate_helper.State.get_snapshot_mappings), builds a mirror record for each mirrored snapshot, and feed the leaf mirror record together with the per-snapshot mirror records into the post-mirror continuation as a single list; the mappings are cleared afterwards. migrate_send' excludes SMAPIv3 snapshots from the set of copyable snapshots because they are already mirrored during send_start. Signed-off-by: Lunfan Zhang[Lunfan.Zhang] <Lunfan.Zhang@cloud.com>
da929c8 to
ad50d8b
Compare
| >>>= (fun sr -> | ||
| clone ~dbg ~sr | ||
| ~vdi:(Storage_interface.Vdi.string_of vdi_info.Storage_interface.vdi) | ||
| >>>= update_keys ~dbg ~sr ~key:_vdi_type_key |
There was a problem hiding this comment.
From https://xapi-project.github.io/xapi-storage/#volume-method-clone, "Note the name and description are copied but any extra metadata associated by [set] is not copied."
Is this key included in the metadata? If so, it seems to me that this should be done in revert code logic as clone is only a step of revert.
| let rpc = of_sr sr | ||
| end)) in | ||
| C.VDI.set_snapshot_metadata dbg sr snapshot leaf snapshot_time true | ||
| with e -> |
There was a problem hiding this comment.
Is it really proper to swallow the error? It's local operation. The failure would suggest severe error. And it would be worse given the backend metadata is the main copy.
| VDI.stat ~dbg ~sr ~vdi:temporary | ||
| VDI.stat ~dbg ~sr ~vdi:temporary >>>= fun temp_response -> | ||
| Base.Hashtbl.set vdi_stat_cache ~key:cache_key ~data:temp_response ; | ||
| return temp_response |
There was a problem hiding this comment.
Could it be:
VDI.stat ~dbg ~sr ~vdi >>>= fun response ->
( match
List.assoc_opt _clone_on_boot_key response.Xapi_storage.Control.keys
with
| None ->
return response
| Some temporary ->
VDI.stat ~dbg ~sr ~vdi:temporary
)
>>>= fun response ->
Base.Hashtbl.set vdi_stat_cache ~key:cache_key ~data:response ;
choose_datapath response >>>= fun (rpc, _datapath, uri) ->
?
| VDI.stat ~dbg ~sr ~vdi >>>= fun response -> | ||
| ( match | ||
| List.assoc_opt _clone_on_boot_key response.Xapi_storage.Control.keys | ||
| with | ||
| | None -> | ||
| return response | ||
| | Some temporary -> | ||
| VDI.stat ~dbg ~sr ~vdi:temporary |
There was a problem hiding this comment.
This duplicates the snippet in mirror. Could they share a function?
The new SXM v3 storage-layer engine drives the migration from the source side. For each disk being migrated, it discovers the snapshot tree from the source VM, then DFS-walks the tree and mirrors each node over an NBD proxy onto the destination. Inactive (reverted) branches are mirrored onto cloned working VDIs so they don't affect the active leaf, while the active path keeps advancing on the same destination leaf — so when the walk bottoms out, the destination leaf is correctly positioned for the live-leaf mirror that follows. The source→destination snapshot pairings are recorded so the toolstack can reconstruct the snapshot relationships on the destination once the mirror completes.
The result: a VM with snapshots on a SMAPIv3 SR can now be live-migrated cleanly, with its full snapshot tree (including reverted branches) reproduced on the destination.
Design refer: https://github.com/xapi-project/xen-api/blob/feature/sxm-v3/doc/content/xapi/storage/sxm/sxm-v3-with-snapshot.md