fix(verifier,vmm): resolve OVMF variant from metadata.json before parsing image name #693
Draft
Leechael wants to merge 1 commit into
…sing image name

PR Dstack-TEE#678 added the OvmfVariant dispatch so verifiers can pick between the pre-edk2-stable202505 (13-event) and stable202505 (17-event) RTMR[0] layouts, but the resolution chain had a gap: when `vm_config.ovmf_variant` is `None` (any deployment persisted by VMM <= 0.5.11-pre), the verifier jumped straight to parsing `vm_config.image` with `extract_version_from_image_name`, which returns `None` for the real meta-dstack release naming convention `dstack-X.Y.Z-<HEXHASH>` (the `rsplit('-')` tail is the git-hash segment, not the version). The default fallback `OvmfVariant::Pre202505` then mismatches the actual stable202505 firmware shipped by 0.5.10+ images, producing the RTMR0 mismatch seen during KMS onboarding against a prod CVM running a dstack-0.5.10-<hash> image.

Close the gap by reading the version straight from metadata.json, which the verifier already loads inside `ensure_image_downloaded` and the VMM already loads inside `Image::load`:

- vmm: in `make_vm_config`, derive `ovmf_variant` from `image.info.version` when metadata.json does not declare it explicitly. The resulting vm_config is the explicit source of truth for new deployments.
- verifier: thread an `image_ovmf_variant` (resolved from metadata.json's explicit field or its `version` string) through `ImagePaths` and into `compute_measurement_details`, inserting it as a middle priority between `vm_config.ovmf_variant` and the image-name fallback. The image-name fallback remains in place for legacy metadata.json files that predate the `version` field.
- Bump MEASUREMENT_CACHE_VERSION to 3 so entries written with the old resolution order (which may have cached the wrong variant) get ignored.

The image-name parser (`extract_version_from_image_name`) is left as-is and keeps its `dstack-X.Y.Z[.SUFFIX]` shape requirement; it is no longer on the hot path for any image whose metadata.json carries `version`.
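The resolution order described in the commit message can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: the function names `variant_from_version`, `image_ovmf_variant`, and `resolve_ovmf_variant` and their signatures are hypothetical, and the 0.5.10 cutoff is an assumption inferred from the phrase "stable202505 firmware shipped by 0.5.10+ images".

```rust
/// RTMR[0] event layouts named in the PR text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OvmfVariant {
    /// Pre-edk2-stable202505 layout (13 events).
    Pre202505,
    /// edk2-stable202505 layout (17 events).
    Stable202505,
}

/// Hypothetical helper: map an "X.Y.Z[...]" version string to a variant.
/// ASSUMPTION: versions >= 0.5.10 ship the stable202505 firmware.
pub fn variant_from_version(version: &str) -> Option<OvmfVariant> {
    let mut parts = version.trim_start_matches('v').splitn(3, '.');
    let major: u32 = parts.next()?.parse().ok()?;
    let minor: u32 = parts.next()?.parse().ok()?;
    // Keep only the leading digits of the third component ("10.1" -> 10).
    let digits: String = parts
        .next()?
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    let patch: u32 = digits.parse().ok()?;
    Some(if (major, minor, patch) >= (0, 5, 10) {
        OvmfVariant::Stable202505
    } else {
        OvmfVariant::Pre202505
    })
}

/// Hypothetical: variant resolved from metadata.json, preferring the
/// explicit field and falling back to the `version` string.
pub fn image_ovmf_variant(
    explicit: Option<OvmfVariant>,
    version: Option<&str>,
) -> Option<OvmfVariant> {
    explicit.or_else(|| version.and_then(variant_from_version))
}

/// Hypothetical: the three-layer priority chain plus the default.
/// 1. vm_config.ovmf_variant  2. metadata.json  3. image-name fallback.
pub fn resolve_ovmf_variant(
    vm_config_variant: Option<OvmfVariant>,
    metadata_variant: Option<OvmfVariant>,
    image_name_variant: Option<OvmfVariant>,
) -> OvmfVariant {
    vm_config_variant
        .or(metadata_variant)
        .or(image_name_variant)
        .unwrap_or(OvmfVariant::Pre202505)
}
```

Under these assumptions, a deployment with no `vm_config.ovmf_variant` and a metadata.json carrying `version = "0.5.10"` resolves to `Stable202505` at the second layer, so the image-name fallback is never consulted.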
Summary
When `vm_config.ovmf_variant` is absent (any deployment persisted by VMM ≤ 0.5.11-pre), the verifier currently jumps straight to parsing the image directory name via `extract_version_from_image_name`. That parser only accepts `dstack-X.Y.Z[.SUFFIX]`, so it returns `None` for the real meta-dstack release convention `dstack-X.Y.Z-<HEXHASH>` (`rsplit` on `-` lands on the hash segment, not the version). The default `OvmfVariant::Pre202505` then mismatches the `Stable202505` firmware shipped by 0.5.10+ images, producing an RTMR0 mismatch during KMS onboarding.

This PR closes the gap by reading the version straight from `metadata.json`, which both the verifier (`ensure_image_downloaded`) and the VMM (`Image::load`) already load:

- vmm: in `make_vm_config`, derive `ovmf_variant` from `image.info.version` when `metadata.json` does not declare it explicitly. The resulting `vm_config` becomes an authoritative source of truth for new deployments.
- verifier: thread an `image_ovmf_variant` (resolved from metadata's explicit field or its `version` string) through `ImagePaths` into `compute_measurement_details`, inserting it as a middle priority between `vm_config.ovmf_variant` and the legacy image-name fallback. The image-name fallback remains for very old metadata.json files that predate the `version` field.

The `extract_version_from_image_name` parser is intentionally left as-is; it is no longer on the hot path for any image whose metadata.json carries `version`.

Repro
Onboarding a fresh KMS node:

- `v0.5.10-28-g53e217cae7` (no `ovmf_variant` field in `make_vm_config`)
- `dstack-0.5.10-4c9bd024/` (real release naming)
- `metadata.json.version = "0.5.10"`, no `ovmf_variant` field
- `dstacktee/dstack-kms:0.5.11`

In the resulting RTMR0 mismatch, `expected` is the verifier's locally computed value using `Pre202505` (wrong), and `actual` is the value the firmware actually extended into RTMR0 using the `Stable202505` event layout. With this PR, the verifier resolves the variant from `metadata.json.version = "0.5.10"` → `Stable202505`, matching the quote.

Why three layers instead of just fixing the parser
Two layers are about making `vm_config.ovmf_variant` and `metadata.json` the source of truth. The image-name parser remains as a last-resort fallback, but it no longer needs to handle the `-<HEXHASH>` suffix because metadata.json answers the question more reliably and is always present alongside the image being measured. Extending the parser would re-encode information that is already structured elsewhere.

Test plan
- `dstack-mr` added as a vmm dep
- `dstack-0.5.10-<hash>` with this build of KMS → RTMR0 matches

Out of scope (intentionally)
- Extending `extract_version_from_image_name` to recognise `-<HEXHASH>`. Skipped: metadata.json supersedes it on the resolution chain.
- Adding `ovmf_variant` into the released `dstack-0.5.10` metadata.json. Future meta-dstack releases can declare it explicitly; existing images are covered by the `version` derivation path.

Note
I could not run `cargo check` locally (macOS Xcode license blocked all native build scripts on this machine), so the build verification is delegated to CI. `prek` hooks pass.
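For reviewers who want to see the image-name failure mode from the Summary in isolation, here is a simplified stand-in parser. It is not the real `extract_version_from_image_name`; the name `version_from_image_name_sketch` and its return type are hypothetical, and it only mimics the described behaviour (take the segment after the last `-`, then require an `X.Y.Z[.SUFFIX]` shape).

```rust
/// Simplified stand-in for the image-name parser (hypothetical, for
/// illustration only). Returns the numeric (major, minor, patch) triple
/// when the segment after the last '-' looks like X.Y.Z[.SUFFIX].
pub fn version_from_image_name_sketch(name: &str) -> Option<(u32, u32, u32)> {
    // `rsplit('-')` yields segments from the right, so the first item is
    // the tail: "0.5.10" for "dstack-0.5.10", but the git hash for
    // "dstack-0.5.10-4c9bd024".
    let tail = name.trim_end_matches('/').rsplit('-').next()?;
    let mut parts = tail.split('.');
    let major: u32 = parts.next()?.parse().ok()?;
    let minor: u32 = parts.next()?.parse().ok()?;
    let patch: u32 = parts.next()?.parse().ok()?;
    // Any further dot-separated components (the optional SUFFIX) are ignored.
    Some((major, minor, patch))
}
```

With this stand-in, `dstack-0.5.10` parses to `(0, 5, 10)`, while `dstack-0.5.10-4c9bd024` yields `None` because the hash segment is not a dotted version. That `None` is what previously sent the verifier to the `Pre202505` default.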