Skip to content

Implement SXM with snapshots for SMAPIv3#7137

Open
LunfanZhang wants to merge 6 commits into
xapi-project:feature/sxm-v3from
LunfanZhang:private/luzhan/sxm-v3
Open

Implement SXM with snapshots for SMAPIv3#7137
LunfanZhang wants to merge 6 commits into
xapi-project:feature/sxm-v3from
LunfanZhang:private/luzhan/sxm-v3

Conversation

@LunfanZhang

Copy link
Copy Markdown
Collaborator

The new SXM v3 storage-layer engine drives the migration from the source side. For each disk being migrated, it discovers the snapshot tree from the source VM, then DFS-walks the tree and mirrors each node over an NBD proxy onto the destination. Inactive (reverted) branches are mirrored onto cloned working VDIs so they don't affect the active leaf, while the active path keeps advancing on the same destination leaf — so when the walk bottoms out, the destination leaf is correctly positioned for the live-leaf mirror that follows. The source→destination snapshot pairings are recorded so the toolstack can reconstruct the snapshot relationships on the destination once the mirror completes.

The result: a VM with snapshots on a SMAPIv3 SR can now be live-migrated cleanly, with its full snapshot tree (including reverted branches) reproduced on the destination.

Design refer: https://github.com/xapi-project/xen-api/blob/feature/sxm-v3/doc/content/xapi/storage/sxm/sxm-v3-with-snapshot.md

  • Migrate with SMAPIv3 → SMAPIv1 and SMAPIv1 → SMAPIv3 round-trip with multiple snapshots + mid-way revert verified
  • Stress test with a complex snapshot tree (multiple reverted branches plus an active leaf); full tree reproduced on the destination, no source-side leaks

When reverting a snapshot, XAPI clones the snapshot disk via VDI.clone
to create a new active VDI. The vdi_clone_impl in xapi-storage-script
called Volume_client.clone but forgot to write the vdi-type key into
the new volume's key store. vdi_create_impl does write this key, but
vdi_clone_impl never did.

After the revert, the next SR.scan reads back the volume metadata and
gets ty = "" for the new active VDI because the key is missing. This
empty type ends up in the XAPI database and breaks any logic that
checks VDI type, including the snapshot migration flow.

The fix adds the missing update_keys step to vdi_clone_impl.

Signed-off-by: Lunfan Zhang[Lunfan.Zhang] <Lunfan.Zhang@cloud.com>
@LunfanZhang

LunfanZhang commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator Author

File "ocaml/qcow-stream-tool/qcow_stream_tool.ml", line 17, characters 8-24:
17 | data_cluster_map
^^^^^^^^^^^^^^^^
Error: This expression has type Qcow_mapping.t
but an expression was expected of type 'a Qcow_types.Cluster.Map.t

this failure is in qcow-stream-tool, not touched by this PR — likely an opam version mismatch from the upstream qcow2 work.
maybe a rebase to master is required.

@psafont

psafont commented Jun 22, 2026

Copy link
Copy Markdown
Member

maybe a rebase to master is required.

Yes, this is because xs-opam was updated and the code in this branch is incompatible

@LunfanZhang LunfanZhang force-pushed the private/luzhan/sxm-v3 branch from 355e8ad to da929c8 Compare June 23, 2026 06:24
@LunfanZhang

Copy link
Copy Markdown
Collaborator Author

maybe a rebase to master is required.

Yes, this is because xs-opam was updated and the code in this branch is incompatible

Sync master to feature, pass the check now.

@LunfanZhang LunfanZhang changed the title Enable SMAPIv3 storage migration with snapshots Implement SXM with snapshots for SMAPIv3 Jun 23, 2026
Comment thread ocaml/xapi/storage_mux.ml
@@ -414,7 +426,9 @@ module Mux = struct
set_snapshot_of __context ~dbg ~sr ~vdi:local_snapshot

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would these set_* do the same things?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These set_ here will update the different VDI fields of VDI in XAPI db, so they are doing the similar thing but for different VDI fields

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The data is the same but to be written to different places: XAPI and Storage.
Will the copy in Storage be the main copy always as I see the SR.scan will try to fix XAPI copy (in XAPI DB) by the result of reading copy from Storage? If so, would the Storage be always written firstly?

Comment thread ocaml/xapi/storage_mux.ml Outdated
set_is_a_snapshot __context ~dbg ~sr ~vdi:local_snapshot
~is_a_snapshot:true
~is_a_snapshot:true ;
update_backend_snapshot_metadata ~dbg:(Debug_info.to_string _di)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _di is used here. By contention, it should not start with underscore _.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, it should renamed _di -> di in update_snapshot_info_dest since it's now consumed by update_backend_snapshot_metadata

…ation

When migrating from SMAPIv1 to SMAPIv3, the snapshot relationship
fields (snapshot_of, snapshot_time, is_a_snapshot) were only updated
in the XAPI database, not on the destination backend storage. A
subsequent SR.scan then read the stale custom keys back from the
volume metadata and overwrote the XAPI database, which removed the
relation between the snapshots and their leaf VDI (observed on GFS2
SRs).

This adds a new VDI.set_snapshot_metadata SMAPIv2 operation that
writes these fields into the volume key store. It is declared in
storage_interface, defaulted in storage_skeleton, implemented in
xapi-storage-script (SMAPIv3), and wired through storage_mux.
SMAPIv1 implements it as a no-op since its snapshot metadata is
managed by the XAPI database only. SR.update_snapshot_info_dest now
calls it best-effort after updating the database (errors are logged
and swallowed, the XAPI DB remains authoritative) so the backend
metadata stays consistent across an SR.scan.

Signed-off-by: Lunfan Zhang[Lunfan.Zhang] <Lunfan.Zhang@cloud.com>
During SXM mirroring the DATA.stat polling loop repeatedly calls
VDI.stat, which issues a Volume.stat RPC to the storage script on
every poll. This adds unnecessary load and latency while waiting for
a mirror to complete.

Introduce a lightweight per-(SR, VDI) cache in DATAImpl: [mirror]
populates it at mirror start, [stat] serves from it on a cache hit
instead of calling Volume.stat, and the entry is evicted when the
mirror completes or fails. The cache needs no mutex because the
module runs in a cooperative Lwt async context.

Signed-off-by: Lunfan Zhang[Lunfan.Zhang] <Lunfan.Zhang@cloud.com>
@gthvn1

gthvn1 commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Is there a way to test the patch ? Do you have some smapiv3 plugins that support migration?

@LunfanZhang

Copy link
Copy Markdown
Collaborator Author

Is there a way to test the patch ? Do you have some smapiv3 plugins that support migration?

Of course, there is SMAPIv3 plugin to support the migration, and which has been verified.

@gthvn1

gthvn1 commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Is there a way to test the patch ? Do you have some smapiv3 plugins that support migration?

Of course, there is SMAPIv3 plugin to support the migration, and which has been verified.

Do you have a link to download it?

@LunfanZhang

LunfanZhang commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator Author

Is there a way to test the patch ? Do you have some smapiv3 plugins that support migration?

Of course, there is SMAPIv3 plugin to support the migration, and which has been verified.

Do you have a link to download it?

No link, It`s an internal repo. but the document of SMAPIv3 is opensource, you can access it from XAPI project.

@gthvn1

gthvn1 commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

No link, It`s an internal repo. but the document of SMAPIv3 is opensource, you can access it from XAPI project.

We have some SMAPIv3 plugins (such as https://github.com/xcp-ng/xcp-ng-xapi-storage), but AFAIK migration is not supported. I read the SMAPIv3 documentation, but verifying the functionality would be easier with a real plugin that supports migration.

…->SMAPIv3 SXM

V1 receive_finalize_common takes a "dummy" VDI snapshot of the dest leaf
(storage_smapiv1_migrate.ml:657) so that VDI.compose can later merge the
mirrored leaf onto its parent. On SMAPIv1 SRs the dummy must be destroyed
explicitly after compose; on SMAPIv3 SRs Volume.compose silently consumes
the dummy backend volume but leaves the XAPI DB row behind, so the
subsequent VDI.destroy raises Volume_does_not_exist (currently swallowed
by log_and_ignore_exn) and the orphan DB row persists. Any subsequent
SXM v3 migration then aborts in assert_smapiv3_snapshot_topology_migratable
because the orphan looks like a hidden vdi-snapshot of the active disk.

Catch the V3-specific Backend_error_with_backtrace("SR_BACKEND_FAILURE",
_ :: "Volume_does_not_exist" :: uuid :: _) pattern (with uuid match
against r.dummy_vdi) and trigger SR.scan via the public XenAPI to
reconcile the orphan DB row. All other failures preserve the historical
V1 best-effort tolerate-and-log behaviour, so SMAPIv1 destinations are
unaffected.

reconcile_sr_after_v3_destroy is placed in storage_migrate_helper so the
in-flight SMAPIv3-native migration engine can reuse it.

Signed-off-by: Lunfan Zhang[Lunfan.Zhang] <Lunfan.Zhang@cloud.com>
Implement the new SXM v3 storage-layer migration for SMAPIv3 SRs,
enabling a VDI to be mirrored together with its snapshot chain,
including reverted/nested snapshot structures.

storage_smapiv3_migrate gains snapshot-tree discovery
(get_snapshot_tree), the DFS walk that mirrors/clones each node while
keeping the storage chain advancing on the same leaf, NBD proxy
export, and the send_start orchestration that drives the per-node
mirroring. storage_migrate_helper.State is extended with a
snapshot_relation type and set/get/remove_snapshot_mappings so the
source->destination snapshot pairings recorded during mirroring can
be retrieved later to restore snapshot metadata on the destination.

Signed-off-by: Lunfan Zhang[Lunfan.Zhang] <Lunfan.Zhang@cloud.com>
Drive the new SMAPIv3 snapshot-aware mirroring from the VM migration
flow.

`vdi_copy_fun` now retrieves the snapshot relations recorded during
send_start (via Storage_migrate_helper.State.get_snapshot_mappings),
builds a mirror record for each mirrored snapshot, and feed the
leaf mirror record together with the per-snapshot mirror records
into the post-mirror continuation as a single list; the mappings
are cleared afterwards. migrate_send' excludes SMAPIv3 snapshots
from the set of copyable snapshots because they are already
mirrored during send_start.

Signed-off-by: Lunfan Zhang[Lunfan.Zhang] <Lunfan.Zhang@cloud.com>
@LunfanZhang LunfanZhang force-pushed the private/luzhan/sxm-v3 branch from da929c8 to ad50d8b Compare June 24, 2026 09:53
>>>= (fun sr ->
clone ~dbg ~sr
~vdi:(Storage_interface.Vdi.string_of vdi_info.Storage_interface.vdi)
>>>= update_keys ~dbg ~sr ~key:_vdi_type_key

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From https://xapi-project.github.io/xapi-storage/#volume-method-clone, "Note the name and description are copied but any extra metadata associated by [set] is not copied."
Is this key included in the metadata? If so, it seems to me that this should be done in revert code logic as clone is only a step of revert.

Comment thread ocaml/xapi/storage_mux.ml
let rpc = of_sr sr
end)) in
C.VDI.set_snapshot_metadata dbg sr snapshot leaf snapshot_time true
with e ->

@minglumlu minglumlu Jun 29, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really proper to swallow the error? It's local operation. The failure would suggest severe error. And it would be worse given the backend metadata is the main copy.

VDI.stat ~dbg ~sr ~vdi:temporary
VDI.stat ~dbg ~sr ~vdi:temporary >>>= fun temp_response ->
Base.Hashtbl.set vdi_stat_cache ~key:cache_key ~data:temp_response ;
return temp_response

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it be:

VDI.stat ~dbg ~sr ~vdi >>>= fun response ->
( match
    List.assoc_opt _clone_on_boot_key response.Xapi_storage.Control.keys
  with
  | None ->
      return response
  | Some temporary ->
      VDI.stat ~dbg ~sr ~vdi:temporary
  )
>>>= fun response ->
Base.Hashtbl.set vdi_stat_cache ~key:cache_key ~data:response ;
choose_datapath response >>>= fun (rpc, _datapath, uri) ->

?

Comment on lines -1828 to -1835
VDI.stat ~dbg ~sr ~vdi >>>= fun response ->
( match
List.assoc_opt _clone_on_boot_key response.Xapi_storage.Control.keys
with
| None ->
return response
| Some temporary ->
VDI.stat ~dbg ~sr ~vdi:temporary

@minglumlu minglumlu Jun 29, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This duplicates the snippet in mirror. Could they share a function?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants