fix(network-volume): re-resolve stale cached volume id by name by deanq · Pull Request #348 · runpod/flash

deanq · 2026-06-29T12:09:33Z

Summary

NetworkVolume(name=...) caches its resolved volume id locally (.flash/resources.pkl, .runpod/resources.pkl). If the volume was later deleted on Runpod out of band, the stale id still read as "deployed" and flowed straight into GraphQL provisioning. Provisioning then hard-failed with the error 'Network volume "<id>" not found' (a 503 in live mode) instead of re-resolving by name as a fresh checkout does.

Fixes SLS-337.

Root cause

NetworkVolume.is_deployed() only checked id is not None. Both early-return sites — ResourceManager.get_or_deploy_resource and NetworkVolume._do_deploy — trusted that result, so a stale cached id was never validated against the live API.
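The pre-fix control flow can be modeled roughly as follows. This is a hypothetical, simplified sketch and not the actual `runpod_flash` code. `CachedVolume`, `is_deployed_old`, `provision`, and `get_or_deploy_old` are illustrative names, and the `live_ids` set stands in for the live Runpod API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedVolume:
    """Hypothetical stand-in for a NetworkVolume unpickled from resources.pkl."""

    name: Optional[str]
    id: Optional[str]

    def is_deployed_old(self) -> bool:
        # Pre-fix check: any cached id counts as deployed, with no live lookup.
        return self.id is not None


def provision(volume_id: str, live_ids: set) -> str:
    # Hypothetical stand-in for GraphQL provisioning, which looks the id up live.
    if volume_id not in live_ids:
        raise LookupError(f'Network volume "{volume_id}" not found')
    return volume_id


def get_or_deploy_old(volume: CachedVolume, live_ids: set) -> str:
    # Early-return site: trusts is_deployed_old() and passes the cached id on.
    if volume.is_deployed_old():
        return provision(volume.id, live_ids)
    # Only reached when there is no cached id (the fresh-checkout path).
    return "resolve-by-name"
```

With a fresh checkout (`id=None`), the same volume takes the resolve-by-name branch. That is why only stale caches hit the "not found" failure.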

Changes

is_deployed() now validates the cached id against the live API (list_network_volumes), mirroring ServerlessResource.is_deployed(). A stale id reads as not-deployed, so both early-return sites fall back to resolve-by-name + create-if-missing, matching first-run behavior. Transient API errors return False, deferring to the idempotent resolve-by-name path.
_do_deploy() clears the stale id before recreating (so it isn't sent in the create payload) and raises a clear error for an unrecoverable id-only volume (no name to re-resolve by).
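The revised flow could look something like the minimal sketch below. Several parts are assumptions rather than the PR's actual code. `FakeVolumeAPI` is a hypothetical in-memory stand-in for the Runpod API, and the signatures and dict shapes of its `list_network_volumes`/`create_network_volume` methods are assumed. `NetworkVolumeSketch` is a simplified stand-in for `NetworkVolume`. Both methods are written as coroutines on the assumption, suggested by the async unit tests, that the real ones are async.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


class FakeVolumeAPI:
    """Hypothetical in-memory stand-in for the Runpod network-volume API."""

    def __init__(self, volumes=None):
        self.volumes = list(volumes or [])  # each entry: {"id": ..., "name": ...}
        self.create_payloads = []  # records every create request
        self.list_calls = 0

    async def list_network_volumes(self):
        self.list_calls += 1
        return list(self.volumes)

    async def create_network_volume(self, payload):
        self.create_payloads.append(dict(payload))
        vol = {"id": f"vol-new-{len(self.create_payloads)}", "name": payload["name"]}
        self.volumes.append(vol)
        return vol


@dataclass
class NetworkVolumeSketch:
    """Hypothetical, simplified stand-in for NetworkVolume."""

    api: FakeVolumeAPI
    name: Optional[str] = None
    id: Optional[str] = None

    async def is_deployed(self) -> bool:
        if self.id is None:
            return False  # nothing cached: no API call needed
        try:
            volumes = await self.api.list_network_volumes()
        except Exception:
            return False  # transient error: fall back to resolve-by-name
        return any(v["id"] == self.id for v in volumes)

    async def _do_deploy(self) -> str:
        if await self.is_deployed():
            return self.id  # cached id is still live: reuse it
        if self.name is None:
            # id-only volume whose id is stale: nothing to re-resolve by
            raise RuntimeError(
                f"Network volume {self.id!r} not found and has no name to re-resolve by"
            )
        self.id = None  # clear the stale id so it is not sent in the create payload
        for v in await self.api.list_network_volumes():
            if v["name"] == self.name:
                self.id = v["id"]  # resolve by name, like a fresh checkout
                return self.id
        payload = {k: val for k, val in (("name", self.name), ("id", self.id)) if val is not None}
        created = await self.api.create_network_volume(payload)
        self.id = created["id"]
        return self.id


async def demo() -> None:
    api = FakeVolumeAPI([{"id": "vol-live", "name": "data"}])
    stale = NetworkVolumeSketch(api, name="data", id="vol-gone")
    print(await stale._do_deploy())  # prints vol-live


asyncio.run(demo())
```

Clearing `self.id` before building the payload is what keeps the stale id out of the create request. A name-less volume has no such fallback, so it fails loudly instead.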

Test plan

6 new unit tests in tests/unit/resources/test_network_volume.py (TDD, written failing-first): stale-id reads as not-deployed, valid-id reads as deployed, no-id short-circuits without an API call, _do_deploy recreates by name when the cached id is stale (asserts the create payload drops the stale id), reuses on a valid id, and raises for a stale id-only volume.
make quality-check: ruff format + lint clean, full suite 2568 passed under xdist, coverage 86% (threshold 65%).
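The `is_deployed()` cases from this test plan might look roughly like the `unittest` sketch below. It uses `unittest.mock.AsyncMock` so that the no-id case can assert the API was never awaited. The free function `is_deployed(volume_id, api)` and the `{"id": ..., "name": ...}` volume shape are hypothetical simplifications. The real tests in tests/unit/resources/test_network_volume.py target `NetworkVolume` directly and may use a different framework.

```python
import unittest
from typing import Optional
from unittest.mock import AsyncMock


async def is_deployed(volume_id: Optional[str], api) -> bool:
    """Hypothetical free-function version of the revalidating check."""
    if volume_id is None:
        return False  # short-circuit: no id, no API call
    try:
        volumes = await api.list_network_volumes()
    except Exception:
        return False  # transient error reads as not-deployed
    return any(v["id"] == volume_id for v in volumes)


class IsDeployedTests(unittest.IsolatedAsyncioTestCase):
    async def test_stale_id_reads_as_not_deployed(self):
        api = AsyncMock()
        api.list_network_volumes.return_value = [{"id": "vol-live", "name": "data"}]
        self.assertFalse(await is_deployed("vol-gone", api))

    async def test_valid_id_reads_as_deployed(self):
        api = AsyncMock()
        api.list_network_volumes.return_value = [{"id": "vol-live", "name": "data"}]
        self.assertTrue(await is_deployed("vol-live", api))

    async def test_no_id_short_circuits_without_api_call(self):
        api = AsyncMock()
        self.assertFalse(await is_deployed(None, api))
        api.list_network_volumes.assert_not_awaited()

    async def test_transient_error_reads_as_not_deployed(self):
        api = AsyncMock()
        api.list_network_volumes.side_effect = ConnectionError("API unavailable")
        self.assertFalse(await is_deployed("vol-live", api))
```

Saved as a module, this runs with `python -m unittest`.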

A cached NetworkVolume id (.flash/resources.pkl, .runpod/resources.pkl) read as deployed even after the volume was deleted on Runpod, so the stale id flowed into GraphQL provisioning and hard-failed with "Network volume \"<id>\" not found". is_deployed() now validates the cached id against the live API, mirroring ServerlessResource.is_deployed(). A stale id reads as not-deployed, so both early-return sites (ResourceManager and _do_deploy) fall back to resolve-by-name and create-if-missing, matching fresh-checkout behavior. _do_deploy() clears the stale id before recreating so it is not sent in the create payload, and raises a clear error for an unrecoverable id-only volume. Fixes SLS-337

capy-ai · 2026-06-29T12:09:43Z

Capy auto-review is paused for this organization because the usage-cycle auto-review limit has been reached. Increase the limit or turn it off in billing settings to resume automatic reviews.

promptless · 2026-06-29T12:13:43Z

Promptless prepared a documentation update related to this change.

Triggered by flash PR #348 (fix: re-resolve stale cached volume id by name, SLS-337)

This fix introduces a user-visible behavior difference between referencing a network volume by name (which auto-recovers when the volume is deleted out of band) and referencing it by id. To cover this, the suggestion adds a subsection to the storage docs explaining how Flash resolves volumes by name or ID, how cached IDs are revalidated, and the new auto-recovery behavior.

Review: Document how Flash resolves network volumes by name or ID

Copilot

Pull request overview

This PR fixes a failure mode where NetworkVolume could treat a stale locally-cached volume ID as “deployed” even after the volume was deleted out-of-band on Runpod, causing downstream provisioning to hard-fail instead of re-resolving by name (SLS-337).

Changes:

Updated NetworkVolume.is_deployed() to validate the cached id against the live Runpod API (list_network_volumes) and treat missing/stale IDs as not deployed.
Updated _do_deploy() to (a) clear a stale id before creating a new volume by name, and (b) raise a clear error for id-only volumes that can’t be re-resolved.
Added unit tests covering stale/valid/no-id is_deployed() behavior and _do_deploy() recreation/reuse/error paths.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
`src/runpod_flash/core/resources/network_volume.py`	Validates cached volume IDs against the live API, clears stale IDs before create, and errors for unrecoverable id-only volumes.
`tests/unit/resources/test_network_volume.py`	Adds targeted async unit tests for the new stale-ID validation and deploy behavior.


…ched-volume-id-hard-fails-instead-of

deanq requested a review from Copilot June 29, 2026 12:24

Copilot started reviewing on behalf of deanq June 29, 2026 12:24

deanq requested review from KAJdev, jhcipar and runpod-Henrik June 29, 2026 12:25

Copilot AI reviewed Jun 29, 2026

jhcipar approved these changes Jun 29, 2026

Merge branch 'main' into deanquinanola/sls-337-networkvolume-stale-ca…

0a7e5a7

…ched-volume-id-hard-fails-instead-of

deanq merged commit efb3cc1 into main Jun 29, 2026
8 checks passed

deanq deleted the deanquinanola/sls-337-networkvolume-stale-cached-volume-id-hard-fails-instead-of branch June 29, 2026 18:20

runpod-release-please-bot mentioned this pull request Jun 29, 2026

chore: release 1.18.0 #349

Open


deanq merged 2 commits into main from deanquinanola/sls-337-networkvolume-stale-cached-volume-id-hard-fails-instead-of


3 participants
