
STOR-2798: add e2e tests for GCP PD CSI Volume Snapshot type images #31050

Open

radeore wants to merge 1 commit into openshift:main from radeore:gcp-imagesnapshot

Conversation

@radeore (Contributor) commented Apr 21, 2026

Feature: https://redhat.atlassian.net/browse/STOR-2793

Summary:

Adds extended tests under test/extended/storage/gcp_csi.go for GCP PD CSI image snapshots (VolumeSnapshotClass with snapshot-type: images): operator VolumeSnapshotClass checks, filesystem/block restore flows using a test-created class where needed, and multi–point-in-time snapshots when the operator class exists.

Test results:

passed: (2m26s) 2026-04-21T21:21:18 "[sig-storage][Feature:VolumeSnapshotDataSource][Driver:pd.csi.storage.gke.io] GCP PD CSI image volumesnapshot tests should support multiple point-in-time image snapshots with independent restores using same source PVC [Suite:openshift/conformance/parallel]"

passed: (1m14s) 2026-04-21T21:20:07 "[sig-storage][Feature:VolumeSnapshotDataSource][Driver:pd.csi.storage.gke.io] GCP PD CSI image volumesnapshot tests should create an image snapshot from RWO filesystem PVC and restore data [Suite:openshift/conformance/parallel]"

passed: (1m6s) 2026-04-21T21:19:59 "[sig-storage][Feature:VolumeSnapshotDataSource][Driver:pd.csi.storage.gke.io] GCP PD CSI image volumesnapshot tests should create an image snapshot from RWO block PVC and restore data [Suite:openshift/conformance/parallel]"

passed: (2.2s) 2026-04-21T21:18:55 "[sig-storage][Feature:VolumeSnapshotDataSource][Driver:pd.csi.storage.gke.io] GCP PD CSI image volumesnapshot tests should expose a VolumeSnapshotClass for snapshot-type images that is not the default [Suite:openshift/conformance/parallel]"

Summary by CodeRabbit

  • Tests
    • Improved storage test helpers to make PVC/pod/snapshot setup more flexible and reusable.
    • Added end-to-end tests for GCP PD CSI image snapshots: verifies snapshot creation, readiness, restore from snapshots, multiple point-in-time snapshots, and data-consistency checks for block and filesystem workloads.

@openshift-merge-bot (Contributor)

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will use /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: automatic mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 21, 2026
@openshift-ci-robot commented Apr 21, 2026

@radeore: This pull request references STOR-2798 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target either version "5.0." or "openshift-5.0.", but it targets "openshift-4.22" instead.

Details

In response to this:

Feature: https://redhat.atlassian.net/browse/STOR-2793

Summary:

Adds extended tests under test/extended/storage/gcp_csi.go for GCP PD CSI image snapshots (VolumeSnapshotClass with snapshot-type: images): operator VolumeSnapshotClass checks, filesystem/block restore flows using a test-created class where needed, and multi–point-in-time snapshots when the operator class exists.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot requested review from gnufied and tsmetana April 21, 2026 21:34
@coderabbitai Bot commented Apr 21, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 7d8c56ff-c94a-4801-a29a-78d116a5f076

📥 Commits

Reviewing files that changed from the base of the PR and between fb81744 and 5d3683a.

📒 Files selected for processing (2)
  • test/extended/storage/driver_configuration.go
  • test/extended/storage/gcp_csi.go
✅ Files skipped from review due to trivial changes (1)
  • test/extended/storage/gcp_csi.go

Walkthrough

Added a new GCP PD CSI image VolumeSnapshot E2E test file and refactored storage test helpers to accept pod specs, variadic PVC mutators, and optional snapshot class names for conditional snapshot YAML generation.

Changes

• test/extended/storage/driver_configuration.go (helper function refactor): Refactored pod creation to use *v1.Pod via new createTestPodFromSpec; createTestPVC now accepts variadic mutators ...func(*v1.PersistentVolumeClaim); createSnapshot accepts optional volumeSnapshotClassName ...string and conditionally includes spec.volumeSnapshotClassName in generated YAML. Pay attention to PVC mutator application and conditional YAML concatenation.

• test/extended/storage/gcp_csi.go (new GCP CSI test suite): Added comprehensive Ginkgo E2E tests for GCP PD CSI image snapshots: snapshot class checks, RWO filesystem and block snapshot/restore flows, multi-point-in-time snapshots, and helper functions for snapshot class lifecycle, pod/PVC creation, snapshot readiness polling, content verification, and cleanup. Large new file: review test flow, waiting/polling logic, and resource teardown.

Sequence Diagram

sequenceDiagram
    participant Test as Test Framework
    participant K8s as Kubernetes API
    participant CSI as CSI Driver (pd.csi.storage.gke.io)
    participant Storage as GCP PD
    participant Pod as Reader/Writer Pod

    Test->>K8s: Create PVC (RWO)
    K8s->>CSI: Provision Volume
    CSI->>Storage: Create Disk
    Storage-->>CSI: Disk Ready
    CSI-->>K8s: PVC Bound

    Test->>Pod: Create Writer Pod
    Pod->>Pod: Write test data
    Pod-->>Test: Writer complete

    Test->>K8s: Create VolumeSnapshot (snapshot-type: images)
    K8s->>CSI: Request Snapshot
    CSI->>Storage: Create Image Snapshot
    Storage-->>CSI: Snapshot Created
    CSI-->>K8s: Snapshot readyToUse

    Test->>K8s: Create PVC from VolumeSnapshot
    K8s->>CSI: Provision Volume from Snapshot
    CSI->>Storage: Restore Disk from Snapshot
    Storage-->>CSI: Restored
    CSI-->>K8s: Restored PVC Bound

    Test->>Pod: Create Reader Pod
    Pod->>Pod: Validate data
    Pod-->>Test: Validation complete

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (2 warnings)

• Docstring Coverage (⚠️ Warning): Docstring coverage is 9.09%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

• Microshift Test Compatibility (⚠️ Warning): Test uses configv1.ClusterVersionCapabilityStorage from the config.openshift.io/v1 API, which is unavailable on MicroShift and lacks explicit compatibility protection. Resolution: add the [apigroup:config.openshift.io] tag to the Describe statement to enable automatic test filtering by MicroShift CI jobs.
✅ Passed checks (10 passed)

• Description Check: skipped - CodeRabbit’s high-level summary is enabled.
• Title Check: the title clearly and concisely describes the primary change (adding end-to-end tests for GCP PD CSI Volume Snapshot image functionality), which aligns with the main purpose of the changeset.
• Linked Issues Check: skipped because no linked issues were found for this pull request.
• Out of Scope Changes Check: skipped because no linked issues were found for this pull request.
• Stable And Deterministic Test Names: all Ginkgo test titles in the new test file are stable and deterministic, containing only static descriptive strings without dynamic information.
• Test Structure And Quality: the gcp_csi.go test file demonstrates good Ginkgo test quality: single responsibilities, proper setup/cleanup, explicit timeouts, meaningful assertions, and established repository patterns.
• Single Node OpenShift (SNO) Test Compatibility: GCP CSI tests use ReadWriteOnce storage and run pods sequentially on the same node without multi-node assumptions.
• Topology-Aware Scheduling Compatibility: test files contain no topology-aware scheduling constraints such as affinity rules, node selectors, or topology spread constraints affecting OpenShift deployments.
• OTE Binary Stdout Contract: the new gcp_csi.go file contains no process-level stdout writes; all potentially stdout-producing code occurs only within test blocks.
• IPv6 And Disconnected Network Test Compatibility: the test file contains no IPv4-specific assumptions, hardcoded addresses, external connectivity requirements, or IP family dependencies that would prevent execution in IPv6-only disconnected environments.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (8)
test/extended/storage/driver_configuration.go (1)

462-479: Variadic volumeSnapshotClassName semantics are permissive — consider documenting / narrowing.

Only volumeSnapshotClassName[0] is ever consulted, so passing two or more names silently ignores the rest. Since this is an internal helper, that's acceptable, but the current signature invites a caller to assume list semantics. Either tighten the doc comment to state "at most one is honored" or narrow to a single optional *string / separate helper.

Optional refactor
-// createSnapshot creates a VolumeSnapshot for pvcName. Optional volumeSnapshotClassName selects
-// a non-default VolumeSnapshotClass when the final argument is passed
+// createSnapshot creates a VolumeSnapshot for pvcName. At most one volumeSnapshotClassName
+// may be supplied; when non-empty, it selects a non-default VolumeSnapshotClass. Additional
+// values are ignored.
 func createSnapshot(oc *exutil.CLI, namespace string, snapshotName string, pvcName string, volumeSnapshotClassName ...string) error {

Also worth confirming: if an embedded %s (namespace/name/class) ever contains YAML-breaking characters (e.g. : or leading whitespace), the hand-rolled YAML would silently produce an invalid manifest. Not a concern today because all callers use sanitized identifiers, but a comment or quoting pass would make the helper more defensive.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/storage/driver_configuration.go` around lines 462 - 479, The
createSnapshot helper currently accepts variadic volumeSnapshotClassName but
only uses volumeSnapshotClassName[0], which can mislead callers; either update
the doc comment on createSnapshot to state explicitly that at most one
volumeSnapshotClassName is honored, or change the signature to accept a single
optional value (e.g., a *string) so callers cannot pass multiple names. While
editing, also make the YAML assembly in createSnapshot more defensive: ensure
the injected values (snapshotName, namespace, pvcName, volumeSnapshotClassName)
are safely quoted or validated to avoid YAML-breaking characters, or add a short
comment noting callers must supply sanitized identifiers.
test/extended/storage/gcp_csi.go (7)

53-56: dc hoisted to outer scope but assigned only inside individual Its.

dc is declared at the Describe scope but reassigned per-It (lines 71, 118, 187) and the multi-snapshot It never uses it at all. Since there is no shared state between Its and BeforeEach does not initialise it, this outer-scope var is misleading — each test could just declare its own local dc. Minor cleanup.

Suggested refactor
 	var (
 		oc = exutil.NewCLIWithPodSecurityLevel("gcp-snapshot-images", admissionapi.LevelPrivileged)
-		dc dynamic.Interface
 	)

…and declare dc := oc.AdminDynamicClient() locally inside each It that needs it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/storage/gcp_csi.go` around lines 53 - 56, The outer-scope
variable dc is misleading because it’s declared at Describe scope but only
assigned inside individual Its (and one It never uses it); remove the hoisted
declaration and instead declare a local dynamic client inside each test that
needs it by using dc := oc.AdminDynamicClient() within the respective It blocks
(the Its that currently reassign dc), leaving the multi-snapshot It unchanged
since it does not use dc.

139-142: Defer-based pod cleanup captures ctx at defer-registration time.

defer oc.AdminKubeClient().CoreV1().Pods(ns).Delete(ctx, writerPod.Name, metav1.DeleteOptions{}) evaluates ctx now but runs the Delete at function return. If the test's SpecContext is already cancelled by the time cleanup runs (e.g. on test timeout), the pod is leaked until the namespace GC runs. Same pattern appears at lines 171, 212, 238, 322. Consider using a fresh background context for deletion, or switch cleanup to g.DeferCleanup:

Suggested pattern
-	defer oc.AdminKubeClient().CoreV1().Pods(ns).Delete(ctx, writerPod.Name, metav1.DeleteOptions{})
+	g.DeferCleanup(func(cleanupCtx g.SpecContext) {
+		_ = oc.AdminKubeClient().CoreV1().Pods(ns).Delete(cleanupCtx, writerPod.Name, metav1.DeleteOptions{})
+	})

Non-blocking — namespace teardown will eventually reap the pods — but worth considering for cleaner cleanup semantics.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/storage/gcp_csi.go` around lines 139 - 142, The deferred pod
deletion captures the current test ctx which may be cancelled by the time the
defer runs (causing leaked pods); update the defer calls that call
oc.AdminKubeClient().CoreV1().Pods(ns).Delete(ctx, writerPod.Name,
metav1.DeleteOptions{}) (and the similar deferred deletes at the other
locations) to use a fresh background context (e.g., context.Background()) when
calling Delete or register cleanup with the test harness (e.g., g.DeferCleanup)
instead of defer so deletion runs with a valid context; search for writerPod /
createTestPodFromSpec and replace the defer Delete invocations accordingly.
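
The context-capture pitfall this comment describes can be reproduced with the standard library alone. In the sketch below, deletePod and runTest are hypothetical stand-ins (not the PR's code): a deferred cleanup inherits the test context that may already be cancelled when the defer runs, while a fresh context.Background() keeps cleanup working.

```go
package main

import (
	"context"
	"fmt"
)

// deletePod is a hypothetical stand-in for the Pods(ns).Delete call in the
// tests; it fails if the context it was handed is already done.
func deletePod(ctx context.Context, name string) error {
	if err := ctx.Err(); err != nil {
		return fmt.Errorf("delete %s skipped: %w", name, err)
	}
	return nil
}

// runTest simulates a spec whose context is cancelled (e.g. on test timeout)
// before the deferred cleanup runs.
func runTest() (deferredErr, freshErr error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer func() {
		// Runs after cancel(): the captured ctx is already dead here.
		deferredErr = deletePod(ctx, "writer-pod")
		// A fresh background context keeps the cleanup effective.
		freshErr = deletePod(context.Background(), "writer-pod")
	}()
	cancel()
	return
}

func main() {
	deferredErr, freshErr := runTest()
	fmt.Println("deferred ctx failed:", deferredErr != nil)
	fmt.Println("fresh ctx failed:", freshErr != nil)
}
```

Ginkgo's DeferCleanup achieves the same effect by handing the cleanup callback its own SpecContext instead of reusing the test body's.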

73-78: Redundant error handling around the operator VSC Get.

e2e.Failf panics via Ginkgo, so execution stops there; the subsequent o.Expect(err).NotTo(o.HaveOccurred()) is only reached on the non-NotFound error path — at which point the earlier branch did nothing. The explicit NotFound branch adds a clearer message but the flow is a little tangled. Either drop the NotFound branch (the Expect covers it) or use g.Skip / Failf with a single exit path.

Minor simplification
-	imagesVSC, err := dc.Resource(gcpSnapshotClassGVR).Get(ctx, gcpPDImagesVolumeSnapshotClass, metav1.GetOptions{})
-	if apierrors.IsNotFound(err) {
-		e2e.Failf("CSI driver generated VolumeSnapshotClass %q must exist", gcpPDImagesVolumeSnapshotClass)
-	}
-	o.Expect(err).NotTo(o.HaveOccurred())
+	imagesVSC, err := dc.Resource(gcpSnapshotClassGVR).Get(ctx, gcpPDImagesVolumeSnapshotClass, metav1.GetOptions{})
+	o.Expect(err).NotTo(o.HaveOccurred(), "CSI-driver-generated VolumeSnapshotClass %q must exist", gcpPDImagesVolumeSnapshotClass)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/storage/gcp_csi.go` around lines 73 - 78, The NotFound branch
around the operator VolumeSnapshotClass lookup creates redundant control flow:
locate the dc.Resource(gcpSnapshotClassGVR).Get call that assigns imagesVSC, err
and remove the explicit apierrors.IsNotFound(err) { e2e.Failf(...) } branch so
error handling is unified; instead rely on the existing
o.Expect(err).NotTo(o.HaveOccurred()) (or replace it with a single g.Fail/g.Skip
plus return if you want an early exit) and keep g.By and the Get invocation
intact, ensuring any failure message includes gcpPDImagesVolumeSnapshotClass for
clarity.

554-574: Snapshot / snapshot-content deletion waits use a non-image timeout.

The file comment on gcpSnapshotPollTimeout (line 39) explicitly calls out that "image snapshots can take longer to reach readyToUse". The deletion side shares the same characteristic (image backing artifacts can take a while to tear down) but these waiters use e2e.SnapshotDeleteTimeout, which may be too short for image-type snapshots and could produce flaky failures on slow GCE projects. Consider using gcpSnapshotPollTimeout (or a dedicated gcpSnapshotDeleteTimeout) here too.

Also: o.Expect(err).NotTo(o.HaveOccurred()) inside Eventually aborts the entire test on any transient API error rather than letting the poll retry. Prefer return false, err-style handling via wait.PollUntilContextTimeout, or at least swallow and retry transient errors.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/storage/gcp_csi.go` around lines 554 - 574, The deletion
waiters waitUntilVolumeSnapshotDeleted and waitUntilSnapshotContentDeleted
currently use e2e.SnapshotDeleteTimeout and call
o.Expect(err).NotTo(o.HaveOccurred()) inside the Eventually, which aborts on
transient API errors and may time out too quickly for image-backed snapshots;
update both functions to use the longer gcpSnapshotPollTimeout (or introduce
gcpSnapshotDeleteTimeout) instead of e2e.SnapshotDeleteTimeout and change the
polling closure to treat apierrors.IsNotFound(err) as success, return false on
transient errors (do not call o.Expect inside the closure) so the poll can
retry, and only fail the test after the Eventually times out.

70-114: Consider making "images class is not default" assertion scope match the claim.

Line 113 checks that defaultForGCPDriver (default classes for the GCP PD driver) does not contain the images class — this is correct. But line 112 requires some default to exist for the driver. If an environment temporarily has no default VSC (e.g., during an upgrade window), this test fails in a way that is not really about the images class itself. If the intent is only to verify "the images class is not default", the first assertion could be softened to o.Expect(defaultForGCPDriver).NotTo(o.ContainElement(gcpPDImagesVolumeSnapshotClass)) alone. Up to you — flagging for consideration, not blocking.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/storage/gcp_csi.go` around lines 70 - 114, The test currently
asserts both that there is at least one default VolumeSnapshotClass for the GCP
PD driver (variable defaultForGCPDriver) and that gcpPDImagesVolumeSnapshotClass
is not among them; relax the scope to only verify the images class is not the
default by removing or weakening the assertion that defaultForGCPDriver is
non-empty (remove the o.Expect(defaultForGCPDriver).NotTo(o.BeEmpty(...)) check
and keep only
o.Expect(defaultForGCPDriver).NotTo(o.ContainElement(gcpPDImagesVolumeSnapshotClass)));
this keeps the intent of ensuring the images class is not marked default while
avoiding false failures when no default VSC exists for the driver.

116-245: Inconsistent cleanup between filesystem and block tests.

The filesystem test (lines 179-182) explicitly deletes the snapshot and waits for VolumeSnapshotContent removal, exercising the Delete deletion policy end-to-end. The block test ends at reader-pod success (line 244) and leaves snapshot/content teardown entirely to the deferred deleteVolumeSnapshot, which does not wait for VolumeSnapshotContent GC. If the intent is deliberate (to keep the block test focused), a short comment would help future readers. If not, mirroring the FS flow would give equal coverage for block snapshots.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/storage/gcp_csi.go` around lines 116 - 245, The block test
omits explicit snapshot/content deletion and GC waiting—mirror the filesystem
test by resolving the bound content name with snapshotBoundContentName(ctx, dc,
snapName), then explicitly delete the VolumeSnapshot using
oc.AsAdmin().Run("delete").Args("volumesnapshot", snapName, "-n", ns).Execute()
and call waitUntilVolumeSnapshotDeleted(ctx, dc, ns, snapName) and
waitUntilSnapshotContentDeleted(ctx, dc, contentName); place this after
waitPodSucceeded in the block test to provide the same end-to-end deletion
verification as in the filesystem test.

492-502: Fetch failed-pod logs as admin and include container name.

On failure, oc.Run("logs").Args(name).Output() runs as the test user (not admin) and without a -c <container> argument. For the block pods the container is blk, writer/reader are write/read; a multi-container pod or RBAC-restricted project will either produce ambiguous selection or an empty log. Using admin and being explicit makes triage of CI failures much easier.

Suggested tweak
-		if p.Status.Phase == v1.PodFailed {
-			logs, _ := oc.Run("logs").Args(name).Output()
-			e2e.Failf("pod %s/%s failed: reason=%q message=%q logs=%s", ns, name, p.Status.Reason, p.Status.Message, logs)
-		}
+		if p.Status.Phase == v1.PodFailed {
+			logs, _ := oc.AsAdmin().Run("logs").Args(name, "-n", ns, "--all-containers=true").Output()
+			e2e.Failf("pod %s/%s failed: reason=%q message=%q logs=%s", ns, name, p.Status.Reason, p.Status.Message, logs)
+		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/storage/gcp_csi.go` around lines 492 - 502, When handling a
failed pod in waitPodSucceeded, fetch logs as admin and specify the container
name: replace the oc.Run("logs").Args(name).Output() call with an admin logs
call (use oc.AsAdmin().Run("logs")) and pass a "-c <container>" arg; determine
<container> by inspecting the pod's spec (p.Spec.Containers) and prefer known
names ("blk" for block pods, "write"/"read" for writer/reader pods) and
otherwise fall back to p.Spec.Containers[0].Name so you always supply a concrete
container to the admin logs command before including logs in the e2e.Failf
message.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/extended/storage/gcp_csi.go`:
- Around line 504-513: The waitUntilPodRemoved helper currently swallows all
non-NotFound errors by returning false, nil; update the
oc.AdminKubeClient().CoreV1().Pods(ns).Get(...) error handling so that if err is
IsNotFound you return true, nil, if err != nil you return false, err (so the
wait aborts and surfaces the API error), and otherwise return false, nil when
the Pod still exists; keep the surrounding wait.PollUntilContextTimeout call
intact.
- Around line 247-307: This test relies on the operator-installed snapshot class
gcpPDImagesVolumeSnapshotClass, causing flakiness if that VSC isn't present;
either modify this It block to create a per-test VSC using the same helper used
by other tests (createImagesVolumeSnapshotClass) and use that returned class
name when calling createSnapshot, or add an explicit precondition at the start
of the It that queries for gcpPDImagesVolumeSnapshotClass and calls g.Skip or
o.Expect/Failf with a clear message if it is absent; update references to
createSnapshot and waitForSnapshotReady to use the chosen VSC variable.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: b0602646-efa8-4b70-8534-ab96794e4725

📥 Commits

Reviewing files that changed from the base of the PR and between 72fbfa9 and fb81744.

📒 Files selected for processing (2)
  • test/extended/storage/driver_configuration.go
  • test/extended/storage/gcp_csi.go

@openshift-merge-bot (Contributor)

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@radeore radeore force-pushed the gcp-imagesnapshot branch from fb81744 to 5d3683a on April 22, 2026 15:54
@openshift-merge-bot openshift-merge-bot Bot added the ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review label Apr 22, 2026
@openshift-ci Bot (Contributor) commented Apr 22, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: coderabbitai[bot], radeore
Once this PR has been reviewed and has the lgtm label, please assign gnufied for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot
Contributor

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@radeore
Contributor Author

radeore commented Apr 22, 2026

/retest-required

@openshift-trt

openshift-trt Bot commented Apr 23, 2026

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New Test Risks for sha: 5d3683a

Job: pull-ci-openshift-origin-main-e2e-gcp-ovn

  • Medium - "[sig-storage][Feature:VolumeSnapshotDataSource][Driver: pd.csi.storage.gke.io] GCP PD CSI image volumesnapshot tests should create an image snapshot from RWO block PVC and restore data [Suite:openshift/conformance/parallel]" is a new test, and was only seen in one job.
  • Medium - "[sig-storage][Feature:VolumeSnapshotDataSource][Driver: pd.csi.storage.gke.io] GCP PD CSI image volumesnapshot tests should create an image snapshot from RWO filesystem PVC and restore data [Suite:openshift/conformance/parallel]" is a new test, and was only seen in one job.
  • Medium - "[sig-storage][Feature:VolumeSnapshotDataSource][Driver: pd.csi.storage.gke.io] GCP PD CSI image volumesnapshot tests should expose a VolumeSnapshotClass for snapshot-type images that is not the default [Suite:openshift/conformance/parallel]" is a new test, and was only seen in one job.
  • Medium - "[sig-storage][Feature:VolumeSnapshotDataSource][Driver: pd.csi.storage.gke.io] GCP PD CSI image volumesnapshot tests should support multiple point-in-time image snapshots with independent restores using same source PVC [Suite:openshift/conformance/parallel]" is a new test, and was only seen in one job.

New tests seen in this PR at sha: 5d3683a

  • "[sig-storage][Feature:VolumeSnapshotDataSource][Driver: pd.csi.storage.gke.io] GCP PD CSI image volumesnapshot tests should create an image snapshot from RWO block PVC and restore data [Suite:openshift/conformance/parallel]" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[sig-storage][Feature:VolumeSnapshotDataSource][Driver: pd.csi.storage.gke.io] GCP PD CSI image volumesnapshot tests should create an image snapshot from RWO filesystem PVC and restore data [Suite:openshift/conformance/parallel]" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[sig-storage][Feature:VolumeSnapshotDataSource][Driver: pd.csi.storage.gke.io] GCP PD CSI image volumesnapshot tests should expose a VolumeSnapshotClass for snapshot-type images that is not the default [Suite:openshift/conformance/parallel]" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[sig-storage][Feature:VolumeSnapshotDataSource][Driver: pd.csi.storage.gke.io] GCP PD CSI image volumesnapshot tests should support multiple point-in-time image snapshots with independent restores using same source PVC [Suite:openshift/conformance/parallel]" [Total: 2, Pass: 2, Fail: 0, Flake: 0]

@radeore
Contributor Author

radeore commented Apr 23, 2026

/retest-required

@openshift-ci
Contributor

openshift-ci Bot commented Apr 23, 2026

@radeore: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Contributor

@jsafrane jsafrane left a comment


This code re-implements a lot of tests that are already upstream. IMO it would be better to reuse the existing tests, even though it's not that easy to set them up via the test manifests.

dc = oc.AdminDynamicClient()

g.By(fmt.Sprintf("Checking if the operator VolumeSnapshotClass %q exists", gcpPDImagesVolumeSnapshotClass))
imagesVSC, err := dc.Resource(gcpSnapshotClassGVR).Get(ctx, gcpPDImagesVolumeSnapshotClass, metav1.GetOptions{})
Contributor


This checks one very specific VolumeSnapshotClass. Why don't we check the other snapshot classes? How is this one special?

Contributor Author


This driver-generated VolumeSnapshotClass has the parameter snapshot-type: images; it is introduced by this new feature.
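For context, such a class might look roughly like the following sketch. The metadata name and deletionPolicy are illustrative placeholders; only the driver name and the snapshot-type: images parameter come from this thread:

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-gce-pd-image-snapshot-class   # illustrative name
driver: pd.csi.storage.gke.io
deletionPolicy: Delete                    # illustrative; Retain is also valid
parameters:
  snapshot-type: images                   # request a Compute Engine image instead of a disk snapshot
```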

o.Expect(defaultForGCPDriver).NotTo(o.ContainElement(gcpPDImagesVolumeSnapshotClass))
})
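For reference, the default snapshot class is conventionally marked with the external-snapshotter annotation snapshot.storage.kubernetes.io/is-default-class, so the assertion above can be read as: the images class must not appear among classes for which a check like this returns true (the helper name is ours):

```go
package main

import "fmt"

// isDefaultSnapshotClass reports whether a VolumeSnapshotClass carries the
// external-snapshotter default-class annotation set to "true".
func isDefaultSnapshotClass(annotations map[string]string) bool {
	return annotations["snapshot.storage.kubernetes.io/is-default-class"] == "true"
}

func main() {
	fmt.Println(isDefaultSnapshotClass(map[string]string{
		"snapshot.storage.kubernetes.io/is-default-class": "true",
	}))
}
```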

g.It("should create an image snapshot from RWO filesystem PVC and restore data", func(ctx g.SpecContext) {
Contributor


There is a test for this upstream.

IMO it would be better to use the image snapshot class in e2e tests, i.e. add it to the gcp-pd-csi-driver-operator test/e2e and add a new manifest + CI job. Or re-use the CI job that uses hyperdisk-manifest.yaml to test with the image snapshot class, and let manifest.yaml test with the default snapshot class?

waitPodSucceeded(ctx, oc, ns, readerPod.Name, e2e.PodStartTimeout)
})

g.It("should support multiple point-in-time image snapshots with independent restores using same source PVC", func(ctx g.SpecContext) {
Contributor

@jsafrane jsafrane Apr 24, 2026


There is a slightly different test upstream, that creates X PVCs + Pods and creates Y snapshots of them.

Try this in the manifest.yaml / hyperdisk-manifest.yaml (and actually all our CSI driver manifests):

DriverInfo:
  ...
  VolumeSnapshotStressTestOptions:
    NumPods: 2
    NumSnapshots: 5
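Slotted into an external-driver manifest, the suggested options would sit roughly like this. Everything except VolumeSnapshotStressTestOptions is an illustrative placeholder loosely following the upstream external storage testdriver format:

```yaml
StorageClass:
  FromName: true
SnapshotClass:
  FromFile: image-snapshotclass.yaml   # illustrative: point at the snapshot-type: images class
DriverInfo:
  Name: pd.csi.storage.gke.io
  SupportedSizeRange:
    Min: 5Gi
  Capabilities:
    persistence: true
    block: true
    snapshotDataSource: true
  VolumeSnapshotStressTestOptions:
    NumPods: 2
    NumSnapshots: 5
```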


Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review


3 participants