From 1057099a86346f6d3307bda8c6d6dae519a5a666 Mon Sep 17 00:00:00 2001
From: Komh
Date: Fri, 24 Apr 2026 11:19:20 +0000
Subject: [PATCH] [virtualization] "VM Migration Disk Transfer Is Serialized —
 Parallel Disk Copy Is a Tracked RFE"
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
---
 ...zed_Parallel_Disk_Copy_Is_a_Tracked_RFE.md | 86 +++++++++++++++++++
 1 file changed, 86 insertions(+)
 create mode 100644 docs/en/solutions/VM_Migration_Disk_Transfer_Is_Serialized_Parallel_Disk_Copy_Is_a_Tracked_RFE.md

diff --git a/docs/en/solutions/VM_Migration_Disk_Transfer_Is_Serialized_Parallel_Disk_Copy_Is_a_Tracked_RFE.md b/docs/en/solutions/VM_Migration_Disk_Transfer_Is_Serialized_Parallel_Disk_Copy_Is_a_Tracked_RFE.md
new file mode 100644
index 00000000..01a3bb4d
--- /dev/null
+++ b/docs/en/solutions/VM_Migration_Disk_Transfer_Is_Serialized_Parallel_Disk_Copy_Is_a_Tracked_RFE.md
@@ -0,0 +1,86 @@
+---
+kind:
+  - Information
+products:
+  - Alauda Container Platform
+ProductsVersion:
+  - 4.1.0,4.2.x
+---
+## Overview
+
+When migrating a multi-disk VM from VMware (or another source hypervisor) into ACP Virtualization through the migration toolkit, operators observe that the disk transfers happen **sequentially**, one disk at a time. For a VM with several large disks, this makes the total migration time the sum of the per-disk copy times rather than the maximum of them.
+
+The question is whether the toolkit supports parallel disk copy to shorten that wall-clock time.
+
+## Current Behaviour
+
+The migration toolkit's disk-transfer orchestration invokes the underlying v2v converter once per disk, in sequence. Specifically:
+
+1. The toolkit reads the source VM's disk list.
+2. For each disk, the toolkit starts a v2v transfer, waits for it to complete, then moves to the next.
+3. The VM's migration cannot complete until every disk has transferred.
+
+A VM with four 500 GiB disks therefore migrates in roughly four times the per-disk transfer duration, not in roughly the duration of its largest disk, which is what a parallel copy would allow.
+
+The underlying `virt-v2v` converter has gained support for parallel disk copy in recent versions. The toolkit's orchestration layer, however, does not yet drive that parallelism; the integration work to pipeline multiple disks through v2v concurrently is a separate effort and is still in development. The ability to parallelise is therefore latent in the stack but not yet exposed through the toolkit's Plan / Migration flow.
+
+This limitation is tracked as a Request for Enhancement (RFE) against the migration toolkit. Future toolkit releases will expose a `spec.diskTransferConcurrency` (or similarly named) field on the `Plan` or `Migration` CR; the v2v engine's existing parallel path will be activated from there.
+
+## What to Do Today
+
+Until parallel disk copy is exposed, the practical options are:
+
+### 1. Parallelise at the VM level, not the disk level
+
+Migrate multiple VMs **concurrently** rather than expecting a single VM to finish faster. The toolkit already runs multiple VM migrations in parallel (governed by the Plan's worker concurrency and the cluster's available CDI importer pods). A batch of many small VMs therefore finishes in less wall-clock time than the same total data would take through a single VM's serialised disk transfers.
+
+Practical tuning:
+
+- Set the Plan's concurrency to a level that the available network bandwidth and CDI importer capacity can sustain; a sketch for checking the concurrency you actually achieve follows this list.
+- Group VMs in batches such that the sum of their largest disks fits under the cluster's transfer bandwidth budget; prioritise batches that parallelise well.
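+
+To see how much VM-level parallelism a plan is actually achieving, you can count the VMs that currently have a disk in the `Copying` phase, using the same Plan status fields the Diagnostic Steps section below queries (the `MIG_NS` and `PLAN` variables are set there). A minimal sketch:
+
+```bash
+# Count VMs with at least one disk currently in the `Copying` phase,
+# i.e. the VM-level concurrency achieved right now.
+kubectl -n "$MIG_NS" get plan "$PLAN" -o json | \
+  jq '[.status.migration.vms[]
+       | select([.disks[]?.phase] | index("Copying"))]
+      | length'
+```
+
+If this number stays well below the Plan's configured concurrency, the bottleneck is usually elsewhere (importer pod scheduling, network), and raising the concurrency setting alone will not help.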
+
+### 2. Start with cold migration for large multi-disk VMs
+
+A cold migration (the VM is stopped at the source, the disks are transferred in full, and the VM is restarted at the destination) also runs disks serially, but each disk's transfer is faster: there is no warm-migration overhead (no snapshot chains, no dirty-page tracking). For very large VMs, the absolute wall-clock time of a cold migration with serial transfers can still beat the warm-migration alternative.
+
+### 3. Pre-seed the destination through storage-level replication
+
+If the source and destination share a storage platform that supports block-level replication (e.g. the same storage array replicating between the source VMware datastore and the destination ACP storage class), replicate the disks out-of-band before the migration runs. The toolkit's migration then mostly serves as the metadata swap; the disk transfer is a no-op because the blocks are already in place.
+
+This is storage-platform-specific and not available on every backend. Where it is available, it can cut migration time for the largest VMs by an order of magnitude.
+
+### 4. Wait for the RFE
+
+For non-urgent migrations, wait for the toolkit release that exposes parallel disk copy. The RFE is tracked internally; check the platform's release notes when new operator versions become available. Once the feature ships, a single configuration field on the Plan will unlock the existing parallelism in `virt-v2v`.
+
+## What Not to Do
+
+- **Do not manually split a multi-disk VM into single-disk VMs before migration.** The split changes the VM's identity (PCI layout, disk order), breaks in-guest references (`/etc/fstab`, drive letters on Windows), and often requires guest-side reconfiguration on the destination.
+- **Do not edit the toolkit's pod templates to force concurrent v2v invocations.** The orchestration layer serialises disk transfers for correctness (one disk must complete before the next starts); running v2v twice against the same source VM can corrupt the snapshot chain the toolkit relies on.
+
+## Diagnostic Steps
+
+Confirm that disks are indeed transferring serially in your case:
+
+```bash
+# Namespace of the migration plan, and the Plan name (fill these in).
+MIG_NS=
+PLAN=
+
+# List the VMs in the plan and their per-disk progress.
+kubectl -n "$MIG_NS" get plan "$PLAN" -o json | \
+  jq '.status.migration.vms[] | {name, phase, progress: (.disks // [])}'
+```
+
+The `disks` array records per-disk state. At any moment, at most one disk per VM is `Copying`; the others are either `Pending` (not yet started) or `Completed`.
+
+Track per-disk start and completion times:
+
+```bash
+# Extract each disk's phase plus its start and finish timestamps.
+kubectl -n "$MIG_NS" get plan "$PLAN" -o json | \
+  jq '.status.migration.vms[].disks[] |
+      {name, phase, startedAt, finishedAt}'
+```
+
+Measure the gaps between one disk finishing and the next starting; they should be small (a few seconds). A much longer gap indicates something beyond the serial-by-design behaviour (a transfer pod that failed to schedule, a stuck importer pod, etc.) and is worth investigating separately.
+
+For total migration time estimation: sum the per-disk transfer durations and add a small orchestration overhead. That total is the wall-clock time to plan for until the parallel-copy feature ships. Budgeting migrations around that number lets operations teams stagger batches across maintenance windows appropriately.
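+
+To put a number on that estimate for a migration that has already run (or is in progress), the completed per-disk durations can be summed directly from the Plan status. A minimal sketch, assuming the `startedAt`/`finishedAt` fields shown above are RFC 3339 timestamps without fractional seconds (which is what jq's `fromdateiso8601` parses):
+
+```bash
+# Sum the durations of all completed disk copies in the plan, in seconds.
+# Disks still pending or copying (null timestamps) are skipped.
+kubectl -n "$MIG_NS" get plan "$PLAN" -o json | \
+  jq '[.status.migration.vms[].disks[]?
+       | select(.startedAt and .finishedAt)
+       | ((.finishedAt | fromdateiso8601) - (.startedAt | fromdateiso8601))]
+      | {completedDisks: length, totalCopySeconds: (add // 0)}'
+```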