Chore/remove cdi#2394
Draft
Isteb4k wants to merge 65 commits into
Draft
Conversation
# Conflicts: # build/components/versions.yml
…f config The `importPackageImages` template emits entries that belong to the `import:` section (they reference other images, not git sources). They were being appended directly after the `git:` block, producing config where each `- image:` / `before:` entry was parsed as an extra field of the git mount, causing werf to reject it with `unknown fields: image, before`. Co-authored-by: Cursor <cursoragent@cursor.com>
…ixture After the CDI-as-operator removal, the disk importer is provided by the standalone `virtual-disk-importer` image. Its camelCase name `virtualDiskImporter` is referenced from `templates/virtualization-controller/_helpers.tpl` via `helm_lib_module_image`, so the kubeconform fixture must define a digest for it. Without it, `helm template` fails with `Image virtualization.virtualDiskImporter has no digest`. Co-authored-by: Cursor <cursoragent@cursor.com>
…DI cleanup The CDI removal commit added a `replace` directive in images/virtualization-artifact/go.mod that pointed at a vendored copy of the upstream API package under images/cdi-artifact/containerized-data-importer/staging/..., but the follow-up cleanup removed that entire `staging/` tree. As a result `go mod download` (and every downstream `go build`/`go test`) failed with: open .../staging/.../containerized-data-importer-api/go.mod: no such file or directory. Drop the dangling local replace so we resolve kubevirt.io/containerized-data-importer-api v1.63.1 from the Go module proxy, and restore the `fmt` import in cmd/virtualization-controller that was accidentally dropped alongside the temporary debug print removed by the same commit (the file still uses fmt.Sprintf). Co-authored-by: Cursor <cursoragent@cursor.com>
The changes that removed CDI and refactored VirtualDisk data sources to
the step-based pattern left a number of issues that golangci-lint v2.11
catches. Fix them in one pass:
- Drop unused helpers in pkg/controller/vd/internal/source/sources.go
(setPhaseCondition*, getNodePlacement, retryPeriod,
setQuotaExceededPhaseCondition, isStorageClassWFFC, the unused
SupplementsCleaner / CleanUpSupplements / Cleaner). The step pattern
handles these cases internally now.
- Drop the unused DiskService.isImmediateBindingMode helper.
- Auto-fix the gci/gofumpt/QF1008 issues in service/errors.go,
storageprofile/storageprofile_controller.go, and the new VirtualDisk
source tests.
- Replace the deprecated reconcile.Result.Requeue with RequeueAfter in
Create{Importer,Uploader}Step, WaitForPVCImportStep, and the matching
*_test.go assertions.
- Simplify VirtualImage reconcilePVCImportFromDVCR (its bool-result was
always true) and inline the call sites in vi http/registry/upload.
- Cleanup gocritic findings in vd internal watchers (unlambda Pod
watcher map func, singleCaseSwitch -> if in PVC watcher).
- test/e2e: drop the dangling local replace of
containerized-data-importer-api (the staging tree was removed) so
typecheck succeeds, and lowercase a few error strings in observer
predicates flagged by ST1005.
Co-authored-by: Cursor <cursoragent@cursor.com>
…ub to drop CVEs Trivy reports five github.com/docker/docker CVEs against the cdi-importer binary (CVE-2026-34040, CVE-2026-33997, CVE-2026-41567, CVE-2026-42306, CVE-2026-41568). docker/docker is only pulled into CDI transitively via github.com/containers/image/v5/manifest, which uses exactly one symbol — github.com/docker/docker/api/types/versions — for manifest version comparison. Mirror the workaround already in place in deckhouse/3p-containerized-data-importer: keep a minimal staging/src/github.com/docker/docker stub that only provides api/types/versions and an empty registry package, and replace the upstream module with it. go mod tidy correspondingly prunes the otel/grpc/genproto/containerd transitive dependencies that were only brought in by the full docker/docker module. Co-authored-by: Cursor <cursoragent@cursor.com>
The mocks regenerated in the previous chat were produced with a different moq version that aliased k8s.io/api/storage/v1 as storagev1. moq v0.5.3 (pinned in Taskfile.init.yaml) leaves the import unaliased and uses *v1.StorageClass throughout, which is what the "check auto-generated files are up-to-date" CI step expects. Run `task controller:dev:gogenerate` and commit the resulting diff. Co-authored-by: Cursor <cursoragent@cursor.com>
Move the vendored CDI importer code from images/cdi-artifact/containerized-data-importer/ one level up to images/cdi-artifact/ to remove the redundant nested directory. The former unpack-bundle.sh and .gitignore tailored for the wrapper layout are no longer needed. Werf and mount-point paths are updated to point at the flattened tree. Co-authored-by: Cursor <cursoragent@cursor.com>
cdi-importer now emits kubevirt_cdi_import_progress_total covering
both internal phases: TransferScratch (download from DVCR to the
scratch PVC) maps to 0-49% and qemu-img Convert (scratch to target
PVC) maps to 50-100%.
The virtualization-controller picks up both registry_progress and
kubevirt_cdi_import_progress_total metrics and applies a per-source
scale when computing vd/vi progress:
- ObjectRef CVI/VI imports (cdi-importer is the sole pipeline)
surface raw 0-100%.
- HTTP, Registry and Upload imports (DVCR-fed) surface cdi-importer
progress as 50-100%, after the dvcr-importer/uploader phase has
already filled 0-50%.
WaitForPVCImportStep now requeues every 2 seconds while an import is
in progress so disk/image status reflects up-to-date percentages.
Co-authored-by: Cursor <cursoragent@cursor.com>
After the importer pod name was shortened to "d8v-<prefix>-importer-<uid>"
both the dvcr-importer Pod (started by importer_service.go for HTTP,
Registry and Upload sources) and the cdi-importer Pod (started by
disk_import_service.go to copy from DVCR to the target PVC) ended up
sharing the same name via sup.ImporterPod().
The collision caused EnsureSupplementPVCImport to fetch the already
Completed dvcr-importer Pod, treat it as the cdi-importer Pod, observe
phase Succeeded and call cleanupPVCImport on it. cleanupPVCImport
deletes the pod by name, but the dvcr-importer Pod still carries the
vi-/vd-protection finalizer, so it was stuck in Terminating and the
controller looped on "Waiting for supplements to be terminated".
Introduce a separate name template ("d8v-<prefix>-pvc-importer-<uid>")
and a new Generator.PVCImporterPod() helper. Switch the disk-import
service to use it so the two phases never share a Pod name.
Co-authored-by: Cursor <cursoragent@cursor.com>
DiskService.CleanUpSupplements built a bare target PVC (no annotations) and passed it to cleanupPVCImport. cleanupPVCImport read the pod name from target.Annotations[AnnPVCImportPod] and fell back to target.Name when the annotation was empty - i.e. it tried to delete a pod sharing the target PVC's name. The actual cdi-importer pod is named d8v-<prefix>-pvc-importer-<uid>, so the delete was a silent no-op and the pod survived VD cleanup. The orphan pod kept its volume mount on the target PVC, which prevented kubernetes.io/pvc-protection from finalising the PVC and in turn blocked the vd-cleanup finalizer, leaving deleted VirtualDisks stuck in Terminating forever. Thread the supplements.Generator through cleanupPVCImport and use sup.PVCImporterPod() as the fallback pod name so cleanup targets the correct pod whether or not the target PVC carries the AnnPVCImportPod annotation. Co-authored-by: Cursor <cursoragent@cursor.com>
…rvice.Protect
The OwnerReference and the {vd,vi}-protection finalizer that disk import PVCs
need to participate in the controller-driven cleanup must be present from the
moment the PVC is created. Previously the finalizer was added later via
DiskService.Protect, opening a window where a PVC could be deleted before the
controller had stamped its protection on it.
Move the finalizer next to the OwnerReference inside StartPVCImport,
StartSupplementPVCImport and makePVCCloneTarget so every disk import target PVC
carries them at creation. Drop DiskService.Protect entirely along with all of
its callers in the VD ready step and the VI source handlers (http, registry,
upload, object_ref*, sources). Regenerate the VD source mocks and update the
VD source unit tests so they stop wiring the now-removed ProtectFunc.
Co-authored-by: Cursor <cursoragent@cursor.com>
The dvcr-importer and uploader pods are already created with the resource-quota-overrides.deckhouse.io/ignore=true label so they don't get charged against namespace quotas. Apply the same label, at creation time, to the scratch PVC and the cdi-importer pod that DiskService spawns to populate target PVCs. Co-authored-by: Cursor <cursoragent@cursor.com>
Introduce PersistentVolumeClaimService as the single entry point for filling a target PVC. Its Import method picks the strategy (smart clone via VolumeSnapshot, CSI clone via dataSource, or host-assisted via cdi-importer pod) and provisions every helper resource (scratch PVC, cdi-importer pod, DVCR auth/CA copies, VolumeSnapshot, etc.) with ownerRef, finalizer, and the resource-quota-overrides.deckhouse.io/ignore label set at creation. DiskService is slimmed down: StartPVCImport/EnsurePVCImport and clone helpers are removed and replaced by disk.PersistentVolumeClaim().Import. The VirtualDisk reconciler step builds the target PVC descriptor and hands it off to the service; data sources (HTTP, Registry, Upload, ObjectRef CVI/VI) and the VirtualImage source plumbing are updated accordingly. Tests and mocks are regenerated to match the new contract. Co-authored-by: Cursor <cursoragent@cursor.com>
After the CDI removal, the kubevirt-operator ServiceAccount lost access to the cdi.internal.virtualization.deckhouse.io group, but the embedded KubeVirt install strategy still generates the virt-api and virt-controller ClusterRoles with CDI rules (cdi.kubevirt.io, rewritten to cdi.internal.virtualization.deckhouse.io). The API server's privilege-escalation guard therefore blocked virt-operator from creating those ClusterRoles, leaving the KubeVirt CR stuck in Deploying/Degraded and virt-api/virt-controller/virt-handler never deployed. Grant the operator ClusterRole '*'/'*' on cdi.internal.virtualization.deckhouse.io (consistent with the other *.internal.virtualization.deckhouse.io grants) so the KubeVirt control plane can be deployed again. Co-authored-by: Cursor <cursoragent@cursor.com>
Controller: - monitoring: raise the import-pod metrics scrape timeout so the controller reads the live progress metric instead of only learning the final value when the pod completes (progress jumping 0%->50%->100%); - uploader step: drop the redundant IsUploadStarted scrape that could bail out before reporting progress. E2E: - add the HaveTimelyProgress observer invariant for VirtualDisk and VirtualImage: while provisioning, progress must update at least every 10s, otherwise the test fails; - boot VirtualMachines from CDROM/ISO VirtualImages (and fall back to a VirtualDisk for non-CDROM images) to avoid the non-CDROM first-device admission webhook rejection. Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
- percent: render 0% and 100% without a fraction and everything else with one decimal; emit "50.0%" at the DVCR->PVC hand-off instead of "50%" so the reported progress stops flip-flopping between "50%" and "50.0%". - e2e: HaveTimelyProgress now requires progress to advance at least every 10s, allowing only the 0%/50% stage boundaries to pause (up to a minute), and rejects stage jumps (0%->>=50%, 50%->100%) so progress must stream intermediate percentages instead of only 0%/50%/100%. Co-authored-by: Cursor <cursoragent@cursor.com>
Controller: - pin the import prime PVC (and importer pod) to the consuming VM's node so a WaitForFirstConsumer node-local volume is provisioned where the VM runs, fixing the hotplug pod's "PV node affinity doesn't match node" failure - always expose a progress percentage while a VirtualDisk/VirtualImage is Provisioning (default 0%), and force 0% for WaitForUserUpload - surface a target-PVC quota rejection on the Ready condition (Pending/ QuotaExceeded) instead of returning a reconciler error E2E: - model block-device tests around a WaitForFirstConsumer storage class for the scenario's main resources and an Immediate one for dependencies - split the combined precheck into wffc, immediate and same-CSI-driver prechecks - replace BeReadyOrWaitForFirstConsumer with a per-disk BeWaitForFirstConsumer (WFFC) / BeReady (Immediate) wait - split VirtualImageCreation DVCR/PVC specs and share dependencies via BeforeAll - assert StorageProfile deletion and add a HaveProgressWhileProvisioning invariant Co-authored-by: Cursor <cursoragent@cursor.com>
…eck rename Controller: - report import progress monotonically: an importer pod restart resets its kubevirt_cdi_import_progress_total metric to 0, which made progress jump backwards (e.g. 46.4% -> 0%); GetProgress now never returns a value lower than the one already published - surface a "namespace is terminating" create rejection (Project/namespace cleanup) on the VirtualDisk/VirtualImage Ready condition instead of failing the reconcile; add common.ErrNamespaceTerminating and unit tests E2E: - wire HaveTimelyProgress into the Upload VirtualDisk spec so all creation scenarios verify intermediate progress values - rename precheck labels to kebab-case abbreviations (wffc-sc-precheck, immediate-sc-precheck, same-csi-sc-precheck, different-csi-sc-precheck, default-sc-precheck, precreated-cvi-precheck, post-cleanup-precheck, vmclass-precheck) Co-authored-by: Cursor <cursoragent@cursor.com>
…, bounder quota exclude * normalizeProgress on VI/VD/CVI clears Status.Progress for ""/Pending so a source path that optimistically sets "0%" before deciding to advance the phase no longer surfaces as an intermediate "Pending 0%" status; VD's WaitForFirstConsumer keeps its in-flight progress (it is an intra-Provisioning pause, not a reset). * monitoring.GetProgressReportFromURL caps scrape attempts at 1.5s and stops silently swallowing timeouts, so a busy importer /metrics endpoint cannot freeze Status.Progress for the full 5s scrape budget. * bounder Pod gets the resource-quota-overrides.deckhouse.io/ignore label, matching the other importer/uploader pods, so the WFFC host-binder is no longer billed against the project quota. Co-authored-by: Cursor <cursoragent@cursor.com>
…gress predicates * Observer interface exposes InvariantViolated() and Stopped() channels; WaitFor now rechecks Err() after a satisfying event so an Always/Never violation that latched on the same event takes precedence over a successful predicate match. * StartObserver wrappers (vd, vi, cvi, vdsnapshot, vm) launch a watcher goroutine that calls Ginkgo's Fail with defer GinkgoRecover() the moment the first invariant fires, instead of letting the breach surface 30s+ later in DeferCleanup while WaitFor blocks on an unrelated condition. * HaveNoProgressBeforeProvisioning (VD, VI) flags any non-empty Status.Progress while the resource is still in ""/Pending, catching the "Pending 0%" intermediate state. * HaveTimelyProgress now also evaluates the stage-jump rule on entry to Ready, so a controller leap of 0% -> 100% across the Provisioning -> Ready boundary is no longer hidden by the predicate stopping its tracking one event too early. * Wire the new no-progress invariant into createVirtualImageAndWait / uploadVirtualImageAndWait and into startVirtualDisk / the VD upload spec. Co-authored-by: Cursor <cursoragent@cursor.com>
…al size A VirtualImage imported to a PVC never streamed intermediate progress: the controller hardcoded 0%/50% during provisioning and jumped straight to 100%, violating the e2e progress invariants. Refresh Status.Progress from the pvc-importer pod metric while the import is in flight (mirroring the VD controller), scaling into the 50..100 slice for the DVCR-sourced path. The VI-on-PVC -> DVCR path also under-reported the produced image size: it copied the source VI's nominal size instead of the importer's measured virtual size. A raw block-device source exposes the full, possibly over-allocated device size, so the converted image is larger than the nominal size and any downstream PVC sized from it failed with "virtual image size ... is larger than the reported available storage". Report the actual size via GetSize(pod) (as the VD path already does) and align image-derived required PVC sizes up to the block boundary, matching CDI's GetRequiredSpace. Co-authored-by: Cursor <cursoragent@cursor.com>
Validate blockdevice progress against source-specific coverage rules so regressions, invalid phases, and stale updates fail immediately. Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Keep progress empty in Pending states and stream importer progress through VD/VI handlers so e2e observer expectations match controller behavior. Co-authored-by: Cursor <cursoragent@cursor.com>
Keep VI provisioning progress below 100% until Ready and update e2e progress expectations to match the source/storage contract. Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Create explicit PVC target creation paths for VD/VI flows and move population work into a dedicated PVC populator controller. Co-authored-by: Cursor <cursoragent@cursor.com>
Keep the PVC population changes lint-clean by applying gofumpt formatting and removing an unused e2e helper option. Co-authored-by: Cursor <cursoragent@cursor.com>
Use watch-based observers for populator PVC and Pod waits so failed states abort immediately, and let standalone PVC population own its helper resources directly. Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Use a separate source and target importer for host-assigned PVC clones so SDS replicated PVC-to-PVC copies avoid the broken snapshot restore path while keeping cleanup and progress reporting stable. Co-authored-by: Cursor <cursoragent@cursor.com>
Move the SDS Replicated host-assisted override into StorageProfile status, allow controller metrics scrape on pvc-importer pods, and start host-assigned source/target importers in parallel via an NBD Service endpoint. Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Replace StorageClass annotations with WFFC_STORAGE_CLASS and IMMEDIATE_STORAGE_CLASS, auto-detecting sibling classes from the default StorageClass when the env vars are unset. Co-authored-by: Cursor <cursoragent@cursor.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Why do we need it, and what problem does it solve?
What is the expected result?
Checklist
Changelog entries