
fix(verifier,vmm): resolve OVMF variant from metadata.json before parsing image name#693

Draft
Leechael wants to merge 1 commit into Dstack-TEE:master from Leechael:fix/ovmf-variant-derive-from-version

@Leechael
Collaborator

Summary

When vm_config.ovmf_variant is absent (any deployment persisted by VMM ≤ 0.5.11-pre), the verifier currently jumps straight to parsing the image directory name via extract_version_from_image_name. That parser only accepts dstack-X.Y.Z[.SUFFIX], so it returns None for the naming convention that meta-dstack releases actually use, dstack-X.Y.Z-<HEXHASH>: rsplit('-') lands on the hash segment, not the version. The default OvmfVariant::Pre202505 then mismatches the Stable202505 firmware shipped by 0.5.10+ images, producing an RTMR0 mismatch during KMS onboarding.
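To make the failure mode concrete, here is a minimal Rust sketch of a tail-based parser. version_from_name_tail is a hypothetical stand-in, not the real extract_version_from_image_name; it only assumes the behaviour described above (take the rsplit('-') tail, require X.Y.Z, ignore any further .SUFFIX).

```rust
/// Hypothetical re-creation of a name parser that only accepts
/// `dstack-X.Y.Z[.SUFFIX]`. NOT the real `extract_version_from_image_name`;
/// it exists only to illustrate why `-<HEXHASH>` names fail.
fn version_from_name_tail(image_name: &str) -> Option<(u32, u32, u32)> {
    // Take the segment after the last '-' (the rsplit('-') tail).
    let tail = image_name.rsplit('-').next()?;
    let mut parts = tail.split('.');
    let major: u32 = parts.next()?.parse().ok()?;
    let minor: u32 = parts.next()?.parse().ok()?;
    let patch: u32 = parts.next()?.parse().ok()?;
    // Anything after X.Y.Z is treated as an optional suffix and ignored.
    Some((major, minor, patch))
}

fn main() {
    // Plain name: the tail is the version itself, so parsing succeeds.
    assert_eq!(version_from_name_tail("dstack-0.5.10"), Some((0, 5, 10)));
    // Real release naming: the tail is the git hash, so parsing fails.
    assert_eq!(version_from_name_tail("dstack-0.5.10-4c9bd024"), None);
}
```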

This PR closes the gap by reading the version straight from metadata.json, which both the verifier (ensure_image_downloaded) and the VMM (Image::load) already load:

  • vmm/src/app.rs: in make_vm_config, derive ovmf_variant from image.info.version when metadata.json does not declare it explicitly. The resulting vm_config becomes an authoritative source of truth for new deployments.
  • verifier/src/verification.rs: thread an image_ovmf_variant (resolved from metadata's explicit field or its version string) through ImagePaths into compute_measurement_details, inserting it as a middle priority between vm_config.ovmf_variant and the legacy image-name fallback. The image-name fallback remains for very old metadata.json files that predate the version field.
  • MEASUREMENT_CACHE_VERSION: bumped 2 → 3 so entries written with the old resolution order get invalidated.
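The resulting resolution order can be sketched as an Option chain. The enum, the function name resolve_variant, and the closure-based image-name fallback are illustrative assumptions, not the code in verifier/src/verification.rs; the sketch only encodes the priority described above.

```rust
/// Hypothetical stand-in for the two RTMR0 layouts discussed in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OvmfVariant {
    Pre202505,
    Stable202505,
}

/// Sketch of the priority order: vm_config, then metadata.json, then the
/// legacy image-name parser, then the pre-existing default.
fn resolve_variant(
    vm_config_variant: Option<OvmfVariant>,
    image_ovmf_variant: Option<OvmfVariant>,
    from_image_name: impl FnOnce() -> Option<OvmfVariant>,
) -> OvmfVariant {
    vm_config_variant
        .or(image_ovmf_variant)
        // Only evaluated when both structured sources are None.
        .or_else(from_image_name)
        .unwrap_or(OvmfVariant::Pre202505)
}

fn main() {
    // Old deployment: no vm_config field, name parser fails, metadata resolves.
    let v = resolve_variant(None, Some(OvmfVariant::Stable202505), || None);
    assert_eq!(v, OvmfVariant::Stable202505);
    // An explicit vm_config value always wins.
    let v = resolve_variant(
        Some(OvmfVariant::Pre202505),
        Some(OvmfVariant::Stable202505),
        || None,
    );
    assert_eq!(v, OvmfVariant::Pre202505);
    // Nothing known: the default applies (the pre-PR failure mode).
    assert_eq!(resolve_variant(None, None, || None), OvmfVariant::Pre202505);
}
```

Taking the image-name fallback as a closure keeps it lazy, so the legacy parser only runs when both structured sources are None.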

The extract_version_from_image_name parser is intentionally left as-is — it is no longer on the hot path for any image whose metadata.json carries version.

Repro

Onboarding a fresh KMS node:

  • VMM v0.5.10-28-g53e217cae7 (no ovmf_variant field in make_vm_config)
  • OS image: dstack-0.5.10-4c9bd024/ (real release naming)
  • metadata.json.version = "0.5.10", no ovmf_variant field
  • KMS verifying the request: dstacktee/dstack-kms:0.5.11
Failed to onboard: Request failed with status=400 Bad Request, error={
  "error": "Failed to verify os image hash: Failed to verify os image hash: MRs do not match: RTMR0 mismatch: expected=68102e7b...875c, actual=d357b91a...875c"
}

In the error, expected is the verifier's locally computed RTMR0 using Pre202505 (wrong), and actual is the value the firmware extended into RTMR0 using the Stable202505 event layout. With this PR, the verifier resolves the variant from metadata.json.version = "0.5.10" → Stable202505, which matches the quote.
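The metadata.json derivation for the repro case can be sketched as follows. ImageMetadata, parse_xyz, variant_from_version and image_ovmf_variant are hypothetical helpers, and the cutover at exactly 0.5.10 is an assumption drawn from the "0.5.10+ images" wording above; the real mapping may treat pre-release versions differently.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OvmfVariant {
    Pre202505,
    Stable202505,
}

/// Hypothetical subset of metadata.json relevant to variant resolution.
struct ImageMetadata {
    version: Option<String>,
    ovmf_variant: Option<OvmfVariant>,
}

/// Parse the leading X.Y.Z of a version string, ignoring any suffix
/// separated by '.' or '-'.
fn parse_xyz(version: &str) -> Option<(u32, u32, u32)> {
    let mut parts = version.split(|c: char| c == '.' || c == '-');
    let major: u32 = parts.next()?.parse().ok()?;
    let minor: u32 = parts.next()?.parse().ok()?;
    let patch: u32 = parts.next()?.parse().ok()?;
    Some((major, minor, patch))
}

/// Assumed cutover: images from 0.5.10 onward ship edk2-stable202505.
fn variant_from_version(version: &str) -> Option<OvmfVariant> {
    let v = parse_xyz(version)?;
    // Tuples compare lexicographically: (0, 5, 9) < (0, 5, 10) < (0, 6, 0).
    Some(if v >= (0, 5, 10) {
        OvmfVariant::Stable202505
    } else {
        OvmfVariant::Pre202505
    })
}

/// Explicit metadata field first, then the version-derived value.
fn image_ovmf_variant(meta: &ImageMetadata) -> Option<OvmfVariant> {
    meta.ovmf_variant
        .or_else(|| meta.version.as_deref().and_then(variant_from_version))
}

fn main() {
    // The repro case: version "0.5.10", no explicit field.
    let meta = ImageMetadata { version: Some("0.5.10".to_string()), ovmf_variant: None };
    assert_eq!(image_ovmf_variant(&meta), Some(OvmfVariant::Stable202505));
    // Very old metadata without `version`: left to the image-name fallback.
    let meta = ImageMetadata { version: None, ovmf_variant: None };
    assert_eq!(image_ovmf_variant(&meta), None);
}
```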

Why three layers instead of just fixing the parser

The first two layers make vm_config.ovmf_variant and metadata.json the sources of truth. The image-name parser stays as a last-resort fallback, but it no longer needs to handle the -<HEXHASH> suffix, because metadata.json answers the same question more reliably and is always present alongside the image being measured. Extending the parser would re-encode information that is already available in structured form.

Test plan

  • CI builds dstack-kms and dstack-vmm with dstack-mr added as a vmm dep
  • Re-run KMS onboard against a CVM running OS dstack-0.5.10-<hash> with this build of KMS → RTMR0 matches
  • Existing dstack-mr / dstack-types unit tests still pass (no behavioural change to the parser)
  • Confirm measurement cache entries get re-computed on first hit after the version bump (cache key includes vm_config; new entries written with v3, old v2 entries ignored)
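A minimal sketch of why bumping the cache version forces recomputation. It assumes the version participates in the cache key (cache_key, lookup and the string key format are hypothetical); the real cache could instead store the version inside each entry and reject mismatches on read, which has the same effect.

```rust
use std::collections::HashMap;

// Bumped from 2 to 3 in this PR.
const MEASUREMENT_CACHE_VERSION: u32 = 3;

// Hypothetical key scheme: prefixing the version changes every key on a bump.
fn cache_key(version: u32, vm_config_json: &str) -> String {
    format!("v{version}:{vm_config_json}")
}

fn lookup<'a>(cache: &'a HashMap<String, String>, vm_config_json: &str) -> Option<&'a String> {
    cache.get(&cache_key(MEASUREMENT_CACHE_VERSION, vm_config_json))
}

fn main() {
    let vm_config = r#"{"image":"dstack-0.5.10-4c9bd024"}"#;
    let mut cache = HashMap::new();
    // Entry written by the old code (v2), possibly with the wrong variant.
    cache.insert(cache_key(2, vm_config), "rtmr0-from-pre202505".to_string());
    // A v3 lookup misses it, so the measurement is recomputed on first hit.
    assert!(lookup(&cache, vm_config).is_none());
    cache.insert(
        cache_key(MEASUREMENT_CACHE_VERSION, vm_config),
        "rtmr0-recomputed".to_string(),
    );
    assert_eq!(
        lookup(&cache, vm_config).map(String::as_str),
        Some("rtmr0-recomputed")
    );
}
```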

Out of scope (intentionally)

  • Fixing extract_version_from_image_name to recognise -<HEXHASH>. Skipped — metadata.json supersedes it on the resolution chain.
  • Backfilling ovmf_variant into the released dstack-0.5.10 metadata.json. Future meta-dstack releases can declare it explicitly; existing images are covered by the version derivation path.

Note

I could not run cargo check locally (macOS Xcode license blocked all native build scripts on this machine), so the build verification is delegated to CI. prek hooks pass.


PR Dstack-TEE#678 added the OvmfVariant dispatch so verifiers can pick between the
pre-edk2-stable202505 (13-event) and stable202505 (17-event) RTMR[0] layouts,
but the resolution chain had a gap: when `vm_config.ovmf_variant` is `None`
(any deployment persisted by VMM <= 0.5.11-pre), the verifier jumped straight
to parsing `vm_config.image` with `extract_version_from_image_name`, which
returns `None` for the real meta-dstack release naming convention
`dstack-X.Y.Z-<HEXHASH>` (the `rsplit('-')` tail is the git-hash segment, not
the version). The default fallback `OvmfVariant::Pre202505` then mismatches
the actual stable202505 firmware shipped by 0.5.10+ images, producing the
RTMR0 mismatch seen during KMS onboarding against a prod CVM running a
dstack-0.5.10-<hash> image.

Close the gap by reading the version straight from metadata.json, which the
verifier already loads inside `ensure_image_downloaded` and the VMM already
loads inside `Image::load`:

- vmm: in `make_vm_config`, derive `ovmf_variant` from `image.info.version`
  when metadata.json does not declare it explicitly. The resulting vm_config
  is the explicit source of truth for new deployments.
- verifier: thread an `image_ovmf_variant` (resolved from metadata.json's
  explicit field or its `version` string) through `ImagePaths` and into
  `compute_measurement_details`, inserting it as a middle priority between
  `vm_config.ovmf_variant` and the image-name fallback. The image-name
  fallback remains in place for legacy metadata.json files that predate the
  `version` field.
- Bump MEASUREMENT_CACHE_VERSION to 3 so entries written with the old
  resolution order (which may have cached the wrong variant) get ignored.

The image-name parser (`extract_version_from_image_name`) is left as-is and
keeps its `dstack-X.Y.Z[.SUFFIX]` shape requirement — it's no longer on the
hot path for any image whose metadata.json carries `version`.