xapi: report a clear error when host evacuation is blocked by unprotected VMs #7145

Open
olivierlambert wants to merge 1 commit into xapi-project:master from olivierlambert:fix/ha-evacuate-clear-error

@olivierlambert

Why

When HA is enabled on a pool, host.evacuate (and host.restart, which calls
it) refuses to plan the evacuation of any VM that is not HA-protected, that is,
whose ha_restart_priority is not restart. This is by design: the HA planner
only accounts for protected VMs, so unprotected ones are left out of the plan.

The problem is the error reported in that situation. compute_evacuation_plan_no_wlb
marked every unprotected VM with HOST_NOT_ENOUGH_FREE_MEMORY, and host.evacuate
then raised it. That error is misleading in two ways:

  • It blames free memory, so operators investigate RAM on the destination hosts
    (which is usually plentiful) instead of the real cause.
  • It was raised with a single parameter (the VM reference), while
    HOST_NOT_ENOUGH_FREE_MEMORY is documented as taking [needed; available].
    Clients such as Xen Orchestra therefore render the available memory as
    <unknown>.

This has confused users for years. See #4323 and the forum reports linked from
it: evacuation fails with HOST_NOT_ENOUGH_FREE_MEMORY even when the destination
has tens of GB free, purely because a resident VM is not HA-protected.
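
To illustrate the second point above, here is a minimal sketch of how a client that expects [needed; available] ends up showing a placeholder when it only receives a VM reference. The function `render_not_enough_memory` and the sample values are hypothetical; this is not Xen Orchestra's actual code, only a model of the mismatch.

```ocaml
(* Hypothetical client-side formatter (an assumption for illustration).
   It expects HOST_NOT_ENOUGH_FREE_MEMORY to carry [needed; available]. *)
let render_not_enough_memory (params : string list) : string =
  (* Return the n-th parameter, or a placeholder when it is missing. *)
  let nth_or_unknown n =
    match List.nth_opt params n with Some v -> v | None -> "<unknown>"
  in
  Printf.sprintf "needed: %s, available: %s" (nth_or_unknown 0)
    (nth_or_unknown 1)

let () =
  (* Documented shape: two parameters. *)
  print_endline (render_not_enough_memory ["4294967296"; "1073741824"]) ;
  (* Shape raised by the old evacuation path: a single VM reference.
     The VM reference lands in the "needed" slot and "available" is missing. *)
  print_endline (render_not_enough_memory ["OpaqueRef:vm-1"])
```

With a single parameter, the VM reference is read as the needed amount and the available amount has nothing to display, which matches the symptom described above.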

How

Introduce a dedicated error, HOST_EVACUATE_VM_NOT_HA_PROTECTED, carrying the VM
reference, and raise it instead for unprotected VMs. The message states the real
cause and the ways to resolve it (set the VM's ha_restart_priority to
restart, shut down or suspend the VM, or disable HA before evacuating the
host).

Behaviour is otherwise unchanged: the evacuation still fails for these VMs; it is
just reported correctly.
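
The change can be pictured with a simplified, self-contained model. This is only a sketch under stated assumptions: apart from the error code HOST_EVACUATE_VM_NOT_HA_PROTECTED, every name here (`vm`, `plan_entry`, `plan_entry_for`, `check_plan`, the local `Server_error` exception) is hypothetical and does not reproduce the actual code in xapi_host.ml.

```ocaml
(* Simplified model; all names except the error code are hypothetical. *)
exception Server_error of string * string list

let host_evacuate_vm_not_ha_protected = "HOST_EVACUATE_VM_NOT_HA_PROTECTED"

type vm = {vm_ref: string; ha_restart_priority: string}

type plan_entry =
  | Migrate of string (* destination host reference *)
  | Blocked of string * string list (* error code and its parameters *)

(* A VM is HA-protected when its ha_restart_priority is "restart". *)
let is_protected vm = vm.ha_restart_priority = "restart"

(* With HA enabled, an unprotected VM gets the dedicated error carrying its
   reference. Placement of the remaining VMs is stubbed out in this model. *)
let plan_entry_for ~ha_enabled ~destination vm =
  if ha_enabled && not (is_protected vm) then
    Blocked (host_evacuate_vm_not_ha_protected, [vm.vm_ref])
  else
    Migrate destination

(* In this model, the first blocked entry in the plan aborts the evacuation. *)
let check_plan plan =
  List.iter
    (fun (_, entry) ->
      match entry with
      | Blocked (code, params) -> raise (Server_error (code, params))
      | Migrate _ -> ())
    plan

let () =
  let vms =
    [ {vm_ref= "OpaqueRef:vm-a"; ha_restart_priority= "restart"}
    ; {vm_ref= "OpaqueRef:vm-b"; ha_restart_priority= ""} ]
  in
  let plan =
    List.map
      (fun vm ->
        ( vm.vm_ref
        , plan_entry_for ~ha_enabled:true ~destination:"OpaqueRef:host-2" vm ))
      vms
  in
  try check_plan plan
  with Server_error (code, params) ->
    Printf.printf "%s [%s]\n" code (String.concat "; " params)
```

The point of the sketch is that the set of blocked VMs is the same as before; only the error code attached to them differs, and its single parameter now matches what the error is declared to carry.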

Files touched:

  • ocaml/xapi-consts/api_errors.ml: declare the error.
  • ocaml/idl/datamodel_errors.ml: register it with a description.
  • ocaml/xapi/xapi_host.ml: raise it for unprotected VMs in the evacuation plan.

Testing

Built and tested on CI (build and test, unit tests, format, CodeChecker and
ShellCheck all green on a fork branch). The change is a pure diagnostic
improvement with no change to which VMs can be evacuated, so no new automated
test is included.

Closes #4323

…cted VMs

When HA is enabled on the pool, host.evacuate (and host.restart, which
calls it) refuses to plan the evacuation of any VM that is not HA-protected,
that is, whose ha_restart_priority is not "restart". This is intentional:
the HA planner only accounts for protected VMs, so unprotected ones are
excluded from the evacuation plan.

The problem is the error reported in that case. compute_evacuation_plan_no_wlb
marked every unprotected VM with HOST_NOT_ENOUGH_FREE_MEMORY, and
host.evacuate then raised it. That error is misleading in two ways:

 * It blames free memory, so operators look at RAM on the destination hosts
   (which is usually plentiful) instead of the real cause.
 * It was raised with a single parameter (the VM reference) while
   HOST_NOT_ENOUGH_FREE_MEMORY is documented as taking [needed; available],
   so clients such as Xen Orchestra render the available memory as "<unknown>".

This has confused users for years (see issue xapi-project#4323 and the forum reports
linked from it): evacuation fails with HOST_NOT_ENOUGH_FREE_MEMORY even when
the destination has tens of GB free, purely because a VM is not HA-protected.

Introduce a dedicated error, HOST_EVACUATE_VM_NOT_HA_PROTECTED, carrying the
VM reference, and raise it instead for unprotected VMs. The message states
the real cause and the ways to resolve it (protect the VM, shut it down or
suspend it, or disable HA before evacuating). Behaviour is otherwise
unchanged: the evacuation still fails for these VMs; it is just reported
correctly.

Closes xapi-project#4323

Signed-off-by: Olivier Lambert <olivier.lambert@vates.tech>

@minglumlu left a comment

Member

It's nice to get rid of the long-standing, confusing error message.

@gthvn1 left a comment

Contributor

LGTM. The only thing I don't understand is the reference to host.restart in the commit message. I don't see such an API in XAPI. Is it something higher level? I would consider removing that reference.

Comment on lines +766 to +767
"The host cannot be evacuated because HA is enabled on the pool and the \
VM is not HA-protected (its ha_restart_priority is not set to \

Member

Suggested change
- "The host cannot be evacuated because HA is enabled on the pool and the \
- VM is not HA-protected (its ha_restart_priority is not set to \
+ "The host cannot be evacuated because HA is enabled on the pool and a VM \
+ running on it is not HA-protected (its ha_restart_priority is not set to \

Development

Successfully merging this pull request may close these issues.

Mysterious failure in HA with HOST_NOT_ENOUGH_FREE_MEMORY

4 participants