Enable efficient many-small-concurrent GetObject via object_size_hint and sub-8MiB read part sizes (S3CrtClient) #3839

@Tommo56700

Describe the feature

For workloads that issue many small GetObject requests concurrently, the S3CrtClient currently forces an oversized memory footprint per request and offers no way to tell the CRT how big an object actually is. Two related changes would remove both limitations:

  1. Allow partSize (or a per-request part size) below the current 8 MiB floor for the read path.
  2. Expose object_size_hint on GetObjectRequest (and related read requests) so it is passed through to aws_s3_meta_request_options.object_size_hint.

Current behavior

1. The SDK clamps partSize to an 8 MiB minimum for all operations, including reads.
In S3CrtClient::init (generated/src/aws-cpp-sdk-s3-crt/source/S3CrtClient.cpp, ~L375):

static const size_t DEFAULT_PART_SIZE = 8 * 1024 * 1024;  // 8MB
s3CrtConfig.part_size = config.partSize < DEFAULT_PART_SIZE ? DEFAULT_PART_SIZE : config.partSize;

Any partSize below 8 MiB is silently raised to 8 MiB before it ever reaches the CRT.
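
To make the effect concrete, here is a minimal standalone sketch of the same clamp expression. EffectivePartSize is a hypothetical helper name used only for illustration; it is not an SDK function, and the real logic lives inline in S3CrtClient::init.

```cpp
#include <cstddef>

// Same constant and ternary as the snippet quoted from S3CrtClient::init.
constexpr std::size_t DEFAULT_PART_SIZE = 8 * 1024 * 1024;  // 8 MiB

// Hypothetical helper (not part of the SDK): returns the part size that
// would actually reach the CRT for a given configured partSize.
constexpr std::size_t EffectivePartSize(std::size_t requested) {
    return requested < DEFAULT_PART_SIZE ? DEFAULT_PART_SIZE : requested;
}

// A 1 MiB request is raised to 8 MiB; a 16 MiB request is left alone.
static_assert(EffectivePartSize(1 * 1024 * 1024) == 8 * 1024 * 1024,
              "sub-8MiB values are raised to the floor");
static_assert(EffectivePartSize(16 * 1024 * 1024) == 16 * 1024 * 1024,
              "values at or above the floor pass through");
```

Nothing in the expression depends on the operation type, which is why the floor applies to reads as well as writes.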

2. aws-c-s3 imposes no minimum part size on reads — the 5 MiB minimum is upload-only.
In aws_s3_client_make_meta_request (aws-c-s3/source/s3_client.c), the AWS_S3_META_REQUEST_TYPE_GET_OBJECT branch passes part_size straight through, while the minimum (g_s3_min_upload_part_size, ~L1399) is enforced only in the PUT_OBJECT branch. Constants (aws-c-s3/source/s3_util.c):

const size_t   g_s3_min_upload_part_size    = MB_TO_BYTES(5);  // upload-only
const uint64_t g_default_part_size_fallback = MB_TO_BYTES(8);  // dynamic default

So the SDK's 8 MiB read floor is a wrapper-imposed limit with no corresponding constraint in the C library.
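
The layering described above can be modelled roughly as follows. This is a simplified sketch, not the aws-c-s3 code: MetaRequestType, CrtPartSize, and SdkThenCrt are hypothetical names, and it assumes that "enforcing" the upload minimum means raising the value to 5 MiB.

```cpp
#include <cstdint>

// Hypothetical stand-in for the two meta request types discussed here.
enum class MetaRequestType { GetObject, PutObject };

constexpr std::uint64_t kMiB = 1024 * 1024;
constexpr std::uint64_t kMinUploadPartSize = 5 * kMiB;  // models g_s3_min_upload_part_size
constexpr std::uint64_t kSdkFloor = 8 * kMiB;           // models the SDK wrapper clamp

// Model of the CRT side: only uploads have a minimum (assumed to be a clamp).
constexpr std::uint64_t CrtPartSize(MetaRequestType type, std::uint64_t part_size) {
    return (type == MetaRequestType::PutObject && part_size < kMinUploadPartSize)
               ? kMinUploadPartSize
               : part_size;
}

// Model of today's end-to-end behavior: the SDK clamp runs first, for every type.
constexpr std::uint64_t SdkThenCrt(MetaRequestType type, std::uint64_t requested) {
    return CrtPartSize(type, requested < kSdkFloor ? kSdkFloor : requested);
}
```

Under this model a 1 MiB read part size would be accepted by the CRT as-is, but the SDK raises it to 8 MiB before the CRT ever sees it.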

3. No per-request CRT options are surfaced.
The GetObject paths build options with AWS_ZERO_STRUCT(options) and only set endpoint, callbacks, shutdown_callback, type, signing_config, and message. object_size_hint, per-request part_size, and multipart_upload_threshold are left zeroed — even though aws_s3_meta_request_options exposes them (aws-c-s3/include/aws/s3/s3_client.h: object_size_hint ~L1009, per-request part_size ~L896).

Use Case

Our workload reads a large number of small objects (well under 8 MiB) in parallel. With S3CrtClient this is memory-bound rather than network-bound:

  • Each auto-ranged-get meta request's first (size-discovery) request reserves a full part_size buffer from the CRT buffer pool's primary area, regardless of the real object size. With the SDK's effective 8 MiB floor, every in-flight small download pins ~8 MiB.
  • This caps achievable concurrency for a given memoryLimitBytes and wastes pool memory on objects that may be a few KiB each.
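
As a back-of-the-envelope illustration of that cap, the sketch below divides a memory budget by the per-request reservation. It is an idealized model: MaxInFlight is a hypothetical helper, the 1 GiB budget and 64 KiB comparison figure are made-up example values, and it ignores pool overhead and any portion of memoryLimitBytes that is not available to the primary area.

```cpp
#include <cstdint>

constexpr std::uint64_t kKiB = 1024;
constexpr std::uint64_t kMiB = 1024 * kKiB;
constexpr std::uint64_t kGiB = 1024 * kMiB;

// Hypothetical helper: upper bound on concurrent downloads if each one
// reserves bytes_per_request out of memory_limit_bytes.
constexpr std::uint64_t MaxInFlight(std::uint64_t memory_limit_bytes,
                                    std::uint64_t bytes_per_request) {
    return bytes_per_request == 0 ? 0 : memory_limit_bytes / bytes_per_request;
}

// With an 8 MiB reservation per request, a 1 GiB budget allows 128 in flight.
static_assert(MaxInFlight(1 * kGiB, 8 * kMiB) == 128, "8 MiB per request");
// If each request reserved only 64 KiB, the same budget would allow 16384.
static_assert(MaxInFlight(1 * kGiB, 64 * kKiB) == 16384, "64 KiB per request");
```

The two results differ by a factor of 128, which is the ratio between the 8 MiB floor and the example 64 KiB reservation.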

The underlying CRT already supports the right primitives; the SDK simply does not surface them.

Proposed Solution

  1. Surface object_size_hint on GetObjectRequest (and other read meta requests), wiring it to options.object_size_hint in S3CrtClient.

  2. Relax the read-path part-size floor. Either stop clamping for GET_OBJECT, or apply the 8 MiB floor only where the CRT actually requires it (the PUT_OBJECT path already enforces its own minimum safely).

  3. (Nice to have) Expose per-request part_size (aws_s3_meta_request_options.part_size) so part sizing can be tuned per download without a client-wide setting.

  4. Fix the partSize doc comment in S3CrtClientConfiguration.h (~L87), which is inaccurate:

    "defaults to 8MB, if user set it to be less than 5MB, CRT will set it to 5MB."

    The real floor is 8 MiB (not 5 MiB), it is applied by the SDK wrapper (not the CRT), and it currently applies to both reads and writes. (Example of the inaccuracy: setting partSize = 6 MiB — legal per the docs — silently yields 8 MiB.)
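
One possible shape for items 1 and 3 is sketched below. Everything here is illustrative: MetaRequestOptionsSketch is a stand-in that mirrors only the two fields discussed (it is not the real aws_s3_meta_request_options), ReadTuningSketch and ApplyReadTuning are hypothetical names, and the sketch assumes the hint is an optional pointer-style field and that a zero part_size means "use the client-level value".

```cpp
#include <cstdint>

// Stand-in for the relevant subset of the CRT options struct (assumed layout).
struct MetaRequestOptionsSketch {
    const std::uint64_t* object_size_hint = nullptr;  // assumed: null means "no hint"
    std::uint64_t part_size = 0;                      // assumed: 0 means "client default"
};

// Hypothetical per-request settings that a GetObjectRequest could carry.
struct ReadTuningSketch {
    bool hasObjectSizeHint = false;
    std::uint64_t objectSizeHint = 0;
    std::uint64_t partSize = 0;  // 0 = not set
};

// Copies the per-request settings into the options the SDK hands to the CRT.
// The hint is passed by pointer, so `tuning` must outlive `options`.
inline void ApplyReadTuning(const ReadTuningSketch& tuning,
                            MetaRequestOptionsSketch& options) {
    options.object_size_hint =
        tuning.hasObjectSizeHint ? &tuning.objectSizeHint : nullptr;
    options.part_size = tuning.partSize;
}
```

Leaving both fields unset reproduces today's zeroed options, so a change along these lines would be opt-in for existing callers.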

Other Information

Enabler

awslabs/aws-c-s3#639 ("Use object_size_hint to size the discovery buffer for small, non-ranged GetObject") makes the CRT right-size the size-discovery buffer from object_size_hint instead of reserving a full part_size. Once that lands, passing object_size_hint from the SDK directly reduces per-request memory for small downloads — which is the main win for this workload.

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

Labels: feature-request, needs-triage
