CP-313264: introduce faster migration datapath using kernel TLS (kTLS) instead of stunnel #7154
mg12 wants to merge 2 commits into
stunnel is an external process that provides secure TLS transport for the
xenguest VM-migrate stream. It runs as a separate process from xenguest, so
the dom0 kernel has to pipe the data between the two processes, wasting CPU
cycles and memory bandwidth. The result is a slower VM-migrate experience,
and consequently a slower host-evacuate experience for the user.
This change removes this inefficient data pipe between stunnel and xenguest
by adding an option to replace stunnel's TLS with kernel TLS (kTLS). When
this ktls option is enabled, the kernel transparently uses TLS to securely
transport the data produced by xenguest, without the need to modify
xenguest and without the need to run stunnel on the sending host.
This new ktls option uses a small C helper that:
(a) uses the same OpenSSL library as stunnel to perform the TLS handshake
and installs the negotiated symmetric key into the kernel to enable kTLS, and
(b) hands the kTLS-enabled socket fd back to xenopsd over SCM_RIGHTS,
which stunnel can't do.
xenopsd then treats the fd as an ordinary TCP socket; the kernel encrypts
on the way out, so the stunnel pipe and the extra data copies disappear
from the xenguest sender side. For now, the receiver side still uses stunnel
on the destination host; it can be updated to use kTLS as well in the
future.
Measurements indicate significant improvements:
* host-evacuate time: ~1.50x faster (ratio of stunnel time to kTLS time),
depending on the load of the guests (the higher the load, the larger the gain,
as more guest data pages are being transmitted; see details in the design page).
* dom0 CPU usage: ~20% less (see details in the design page).
Activation uses a new xenopsd.conf flag so the existing path remains the
default and the two paths can be A/B compared on the same build:
migration-tls = "ktls" # use the helper
migration-tls = "stunnel" # explicit default
migration-tls = "" # currently defaults to original "stunnel"
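The three settings above can be modelled as a small variant type. This is a hypothetical sketch: `migration_tls` and `migration_tls_of_string` are illustrative names, not identifiers from the PR, and rejecting unrecognised values is an assumption, since the PR only specifies `"ktls"`, `"stunnel"` and `""`.

```ocaml
(* Hypothetical sketch: [migration_tls] and [migration_tls_of_string] are
   illustrative names, not identifiers from the PR. *)
type migration_tls = Stunnel | Ktls

(* Map the xenopsd.conf value to a datapath. "" currently selects the original
   stunnel path, as described above. Rejecting any other value is an
   assumption of this sketch; the PR does not say how unknown values are
   handled. *)
let migration_tls_of_string = function
  | "" | "stunnel" -> Ok Stunnel
  | "ktls" -> Ok Ktls
  | other -> Error (Printf.sprintf "unknown migration-tls value %S" other)
```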
If the helper fails for any reason before the fd is handed to the
migration (binary missing, TLS handshake error, tls.ko not loaded, cipher
rejected by the kernel, SCM_RIGHTS message lost, helper timeout)
then xenopsd logs a single warn line:
migration-tls=ktls connect failed for <host>:<port> (<reason>);
falling back to stunnel for this connection
and transparently falls back to the original Open_uri.with_open_uri so
the migration still proceeds using stunnel as before for the sender.
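The fallback policy above can be sketched as a higher-order function. This is a minimal illustration under stated assumptions: `with_migration_conn`, `connect_ktls` and `with_stunnel` are hypothetical stand-ins (for "run the helper and receive the kTLS socket over SCM_RIGHTS" and for the original `Open_uri.with_open_uri` path, respectively), and modelling helper failures as a `result` is a choice of this sketch, not necessarily of the PR.

```ocaml
(* Hypothetical sketch of the sender-side fallback policy. [connect_ktls]
   stands in for "run the helper and receive the kTLS socket over
   SCM_RIGHTS"; [with_stunnel] stands in for the original
   Open_uri.with_open_uri path. Neither signature is taken from the PR. *)
let with_migration_conn ~warn ~connect_ktls ~with_stunnel ~host ~port f =
  match connect_ktls host port with
  | Ok fd ->
      (* kTLS socket obtained: use it like an ordinary TCP socket. *)
      f fd
  | Error reason ->
      (* Any failure before the fd is handed over: log once, then use stunnel. *)
      warn
        (Printf.sprintf
           "migration-tls=ktls connect failed for %s:%d (%s); falling back to \
            stunnel for this connection"
           host port reason);
      with_stunnel host port f
```

The fallback only covers failures before the fd is obtained: once `f` is running on a kTLS socket, an exception raised by `f` propagates instead of being retried over stunnel, which matches the "before the fd is handed to the migration" condition.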
Signed-off-by: Marcus Granado <marcus.granado@citrix.com>
andyhhp left a comment
The headline improvement here speaks for itself, and the proposed approach seems like an obvious way forward.
How applicable is this to other streams? Migrate might be the bulkiest data, but it's not the only bulk data bouncing around userspace pipes. Storage migration also comes to mind, and RRDs too.
socket with `setsockopt(SOL_TLS, ...)`, the kernel encrypts/decrypts every
subsequent `read`/`write` transparently, using AES-NI, producing the same
byte stream stunnel produces today.
What visibility/debuggability do we have on the kernel's choice of algorithm?
This approach gets more valuable when we ensure that dom0 can see and use all hardware accelerations. That is, we probably want to be more proactive at checking details like this (and fixing if necessary) as part of supporting new CPUs.
Conclusion: design C (kTLS) seems superior, as it reaches the goal with fewer
changes, is toolstack-only (no xen-devel upstream loop, no future xenguest
security maintenance), is a small focused tool rather than a change to the
highly complex xenguest, leaves the stunnel option in place, and opens the way
to kTLS hardware offload later.
I agree that option C is the obvious choice. Option A is going to encounter firm resistance upstream.
stream is byte-identical to stunnel's and is accepted by the unchanged
destination stunnel.

* same handshake: TLS 1.2, cipher `ECDHE-RSA-AES256-GCM-SHA384`, ECDHE group
Perhaps not relevant to this PR, but why are we wasting time/effort with SHA384?
It's the SHA512 computation (with different initial values) with the output truncated to 384 bits at the end, so it is strictly worse (security-wise) without recovering any performance.
| guest load | stunnel | kTLS | speedup (stunnel/kTLS) |
|---|---|---|---|
| idle | 185s | 163s | 1.13x |
| medium (windows apps) | 249s | 171s | 1.46x |
| high (synthetic page thrasher) | 1341s | 835s | 1.60x |
This has times, but no information about the OS or VM size. I presume Win11, but the VM size matters greatly for measurements like this.
* kTLS hardware-offload NICs could move the bulk encryption off the CPU and make
the transfer faster still.
Can you expand on this some more? What should we be looking out for?
     and is never more permissive. *)
    ["--no-verify"]
  | Some cfg ->
    ["--ca"; cfg.Stunnel.cert_bundle_path]
Should this be able to use peer (pinned) certificates as well as CA ones? This is probably going to be useful for cross-pool migrations.
let close_ignore fd =
  Xapi_stdext_pervasives.Pervasiveext.ignore_exn (fun () -> Unix.close fd)

let finally = Xapi_stdext_pervasives.Pervasiveext.finally
This could be reframed to avoid indentation on use:
let protect ~finally protected = Xapi_stdext_pervasives.Pervasiveext.finally protected finally
let ( let@ ) f x = f x
then on use:
let@ () = protect ~finally:(fun () -> close_ignore sock_xenopsd) in
let received_fd =
  try ...

stream is byte-identical to stunnel's and is accepted by the unchanged
destination stunnel.

* same handshake: TLS 1.2, cipher `ECDHE-RSA-AES256-GCM-SHA384`, ECDHE group
Do the allowed ciphers still come from the same single source of truth?
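For reference, the `protect`/`let@` reframing suggested earlier in the review can be tried with only the OCaml standard library. In this minimal sketch, `Fun.protect` (OCaml 4.08+) stands in for `Xapi_stdext_pervasives.Pervasiveext.finally`, and `use_resource`, `close` and `work` are hypothetical names for illustration, not code from the PR.

```ocaml
(* Sketch of the suggested idiom using only the standard library:
   Fun.protect (OCaml >= 4.08) stands in for
   Xapi_stdext_pervasives.Pervasiveext.finally. *)
let protect ~finally body = Fun.protect ~finally body

(* [let@ x = e in rest] desugars to [( let@ ) e (fun x -> rest)], so the
   cleanup wrapper no longer adds a nesting level at the call site. *)
let ( let@ ) f x = f x

(* Hypothetical example: [close] plays the role of
   [fun () -> close_ignore sock_xenopsd], [work] the code that follows it. *)
let use_resource ~close ~work =
  let@ () = protect ~finally:close in
  work ()
```

`Fun.protect` runs `finally` and then re-raises any exception from the body, so the cleanup happens on both the success and the failure path.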
Activation uses a new xenopsd.conf flag, so the existing stunnel datapath remains the default.
Tested with: