fix(atelet): stream sandbox asset downloads instead of buffering in memory #281

Open
Davanum Srinivas (dims) wants to merge 1 commit into agent-substrate:main from dims:fix/atelet-stream-asset-download

Conversation

@dims

Collaborator

Previously, fetchAsset retrieved each asset through FetchFromGCS, which reads the entire object into a byte slice before hashing and writing it. Peak memory therefore scaled with the asset size. This was acceptable for the roughly 50 MB runsc binary, but once we move to micro-VMs, the runtime's kernel and root filesystem images will range from hundreds of megabytes to several gigabytes. Buffering them would exhaust the memory of atelet, which is shared across all actors on a node.

The download now streams directly to the temporary file, computing the SHA-256 digest in the same pass via io.MultiWriter and bounding the transfer with an io.LimitReader. Peak memory is now the size of the io.Copy buffer, independent of the asset size. The LimitReader also provides a secondary safeguard for disk usage against a misconfigured or malicious URL that returns an unbounded stream. The limit is 8 GiB and is declared as a variable so that tests may lower it.

The digest is verified only after the copy completes. A size or hash failure therefore leaves the data at the temporary path, which is subsequently removed; it is never renamed to the content-addressed cache path, so a failed download cannot corrupt the cache.

This change also introduces ategcs.Open, a streaming reader, along with the accompanying TestFetchAssetStreaming.

Fixes #<issue_number_goes_here>

It's a good idea to open an issue first for discussion.

  • Tests pass
  • Appropriate changes to documentation are included in the PR

