Object#upload_stream retains unbounded memory when the source outpaces the upload #3393

@npadgett

Version

aws-sdk-s3 1.225.0 (Ruby 3.4)

Problem

Object#upload_stream holds memory proportional to the whole object when the block writes faster than parts upload: a fast local source feeding a slower S3 sink. That defeats the purpose of a streaming upload.

Cause

MultipartStreamUploader#upload_with_executor reads each part into a StringIO (read_to_part_body) and submits it through DefaultExecutor#post, which appends to an unbounded Queue and returns immediately without blocking. The internal IO.pipe applies backpressure between the writer and the reader, but the reader immediately offloads each part into that unbounded queue, so nothing limits read-ahead to the ~max_threads parts that are actually uploading. Unsent 5 MB StringIOs accumulate, up to roughly the object size minus the bytes already uploaded.
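The mechanism can be reproduced without the SDK. The following is a minimal simulation, not the SDK's code: `unbounded_backlog` and its parameters are hypothetical names, the 1 KB strings stand in for 5 MB part bodies, and the `release` queue stands in for a slow upload. It only assumes what is described above, namely that the producer pushes to an unbounded Queue and never waits.

```ruby
# Simulation only: hypothetical stand-ins for the reader and the executor.
def unbounded_backlog(total_parts:, workers:, part_size:)
  queue   = Queue.new  # unbounded: push never blocks the producer
  started = Queue.new  # a worker signals here once it holds a part
  release = Queue.new  # a worker waits here, simulating a slow upload

  threads = Array.new(workers) do
    Thread.new do
      while queue.pop    # nil is the shutdown sentinel
        started << true
        release.pop      # the "upload" finishes only when released
      end
    end
  end

  # The reader side: every part is buffered and handed off immediately.
  total_parts.times { queue << ("x" * part_size) }

  # Wait until every worker is busy with exactly one part, then measure.
  workers.times { started.pop }
  backlog_parts = queue.size
  backlog_bytes = backlog_parts * part_size

  # Let the uploads finish and stop the workers.
  total_parts.times { release << true }
  workers.times { queue << nil }
  threads.each(&:join)

  [backlog_parts, backlog_bytes]
end

p unbounded_backlog(total_parts: 50, workers: 2, part_size: 1024)
# => [48, 49152]
```

With 2 workers busy, the other 48 parts sit in memory: the backlog scales with the object, not with the worker count.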

tempfile: true masks it (parts spill to disk), but the default is in-memory and unbounded.

Impact

Streaming a multi-GB on-disk file spikes RSS by GBs. Concretely, Active Storage's S3Service archives large files via upload_stream { |out| IO.copy_stream(file, out) }; a 12 GB file drove ~3.6 GB of RSS and OOM'd a 4 GB worker. upload_file (lazy FilePart, reads from disk on demand) is unaffected under the same conditions; that divergence is the tell.

Suggested fix

Bound read-ahead to the worker count: a SizedQueue, or have post block while all workers are busy. (DefaultExecutor arrived with the #3302 executor refactor; #1824 previously tuned upload_stream memory.)
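A rough sketch of the SizedQueue variant, to show the shape of the fix rather than a patch: `BoundedExecutor`, `max_queue` and `shutdown` here are hypothetical and are not the SDK's DefaultExecutor interface. Error handling and graceful kill are omitted.

```ruby
# Hypothetical executor whose post blocks once the queue is full.
class BoundedExecutor
  def initialize(max_threads:, max_queue: max_threads)
    @queue = SizedQueue.new(max_queue)  # bounded: push blocks when full
    @threads = Array.new(max_threads) do
      Thread.new do
        # nil is the shutdown sentinel
        while (job = @queue.pop)
          job.call
        end
      end
    end
  end

  # Blocks the caller (the part reader) while max_queue jobs are waiting,
  # so the reader cannot run ahead of the workers.
  def post(&block)
    @queue.push(block)
    true
  end

  # Queued jobs run first (FIFO), then each worker receives one sentinel.
  def shutdown
    @threads.size.times { @queue.push(nil) }
    @threads.each(&:join)
    true
  end
end
```

With this shape, at most max_threads parts are uploading, max_queue are queued, and one more is held by the blocked reader, so buffered parts are bounded by max_threads + max_queue + 1 regardless of object size. A max_queue of 1 keeps read-ahead closest to the worker count.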
