feat: Initial support for HTTP range requests#481
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as spam.
This comment was marked as spam.
This comment was marked as spam.
This comment was marked as spam.
This comment was marked as spam.
This comment was marked as spam.
Thread `Option<ByteRange>` through all `get_object` paths and return `ContentRange` in every response so the HTTP handler can choose between 200, 206, and 416 with correct headers. Backends that store objects remotely (GCS, S3) forward the Range header and parse the response. Local-filesystem uses seek-based partial reads. In-memory and Bigtable ignore the range and always return the full payload with 200, which is RFC-compliant.
- Parse Range header unit case-insensitively per RFC 9110 (e.g. `Bytes=0-4` now works instead of being ignored) - Multi-range requests fall back to full 200 response instead of returning 400, matching RFC semantics for unsupported features
- Only set explicit Content-Length when content_range.total > 0, avoiding a Content-Length: 0 regression when metadata.size is unknown. - Use response Content-Length (not metadata.size) in 206 fallback paths for GCS and S3 so Content-Length matches the body if Content-Range is missing from a 206 response.
Reduce test boilerplate by consolidating 20 unit tests into 7 with identical coverage. Remove redundant integration test already covered by unit tests. Use HeaderValue::from_static, delegate to Display impl, and pass ByteRange directly instead of an intermediate HeaderMap.
Tests added to main after this branch diverged still used the old get_object signature (1 arg, 2-tuple return). Update them to pass None for the range parameter and destructure the 3-tuple.
85de81c to
d12a378
Compare
…r S3 partial size bytes=-0 (last 0 bytes) is semantically meaningless and should be rejected with RangeError::Invalid rather than parsed as Last(0). For S3 206 responses, Content-Length reflects the partial body length, not the full object size. Read metadata.size from the Content-Range total field instead so the metadata accurately represents the full object. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
93e1429 to
8d40bb4
Compare
| match total { | ||
| Some(total) => return Err(Error::RangeNotSatisfiable { total }), | ||
| None => { | ||
| return Err(Error::Generic { |
There was a problem hiding this comment.
Seems like the GCS emulator was buggy and not returning the header for 406 responses.
Fixed it here getsentry/google-cloud-storage-testbench#3 and upstreaming it here googleapis/storage-testbench#802.
There was a problem hiding this comment.
The S3 backend doesn't really work right now, we need #442, that's why I didn't add tests.
2b78ff4 to
0e21c59
Compare
matt-codecov
left a comment
There was a problem hiding this comment.
now that i'm seeing it, i don't love how inlining range reads in our regular GET operations looks, but i also don't love duplicating all the GET code and just handling this one header and two response codes differently.
also, this is elaborated more in another comment, but it would be nice to add Accept-Ranges: bytes headers only for objects that live in backends that support ranges. but i don't have a clear idea of how to do that cleanly today
i thought about actionable feedback for about an hour and didn't come up with anything so i'm accepting to unblock, but please give them some thought on your own. re-request review if you think you have good solutions
| // We are free to decide whether we want to fall back to the whole object or error. | ||
| // Per RFC 9110: | ||
| // > A server that supports range requests MAY ignore or reject a Range header | ||
| // field that contains an invalid ranges-specifier [...] |
There was a problem hiding this comment.
to me, this RFC excerpt sounds a bit narrower in what it actually allows. if we advertise support for range reads and a client sends us a well-formed range specifier, it doesn't permit us to decide not to honor that range. if we want full spec compliance, i think we have to impl multi-range support and also build range reads ourselves atop backends that don't support it natively
a workable middle ground would be to only add Accept-Ranges: bytes to responses for objects stored in backends that support it. clients can HEAD req a specific object to learn in advance whether range reads are supported for it. any scenario i can think of can do what it needs to do that way:
- a client that wants to download a huge file can check whether to download in parallel chunks (if range-reads are supported) or in a single request (otherwise)
- a client that wants to resume an interrupted download can check whether to append (if range-reads are supported) or replace (otherwise) the incomplete download with the retry response.
- a client that wants to download a specific file from within an archive (based on some out-of-band index query) can blindly attempt a range-read and just deal with it locally if the range wasn't respected
- if they want multiple files within the archive, they may want to check with
HEADreq and DL in parallel, or maybe they will try multi-range and can similarly just deal with it locally if the range wasn't respected. idk
- if they want multiple files within the archive, they may want to check with
unfortunately i'm not sure if there is a great way to do that. it's breaking abstraction to give the service impl a way to manipulate HTTP headers. an ugly kludge would be to add a range_read bool to Metadata or to return it on the side
if we did the backend refactor idea i wrote about, it might be cleaner. StorageService would absorb TieredBackend and could expose a BackendDescriptor or something that can carry this information. idk
There was a problem hiding this comment.
The RFC says (source: [https://datatracker.ietf.org/doc/html/rfc9110#section-14.2-3])
A server MAY ignore the Range header field. However, origin servers and intermediate caches ought to support byte ranges when possible, since they support efficient recovery from partially failed transfers and partial retrieval of large representations.
and also (source: [https://datatracker.ietf.org/doc/html/rfc9110#section-14.3-3]):
For example, a server that supports byte-range requests (Section 14.1.2) can send the field
Accept-Ranges: bytesto indicate that it supports byte range requests for that target resource, thereby encouraging its use by the client for future partial requests on the same request path. Range units are defined in Section 14.1.A client MAY generate range requests regardless of having received an Accept-Ranges field. The information only provides advice for the sake of improving performance and reducing unnecessary network transfers.
Conversely, a client MUST NOT assume that receiving an Accept-Ranges field means that future range requests will return partial responses. The content might change, the server might only support range requests at certain times or under certain conditions, or a different intermediary might process the next request.
From this wording, it seems pretty clear to me that we're free to ignore or honor the header on a per-request basis (the server might only support range requests at certain times or under certain conditions), and even if the HEAD request returns Accept-Ranges, we don't necessarily need to honor it when we get a GET for the same object.
I agree that it would be nice to avoid advertising Accept-Ranges for backends that don't support it, however in practice this is not a big deal as the only backend that doesn't support it is Bigtable, for which objects are constrained to be pretty small anyways, so I wouldn't worry about it right now.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 8f63a33. Configure here.

Adds standard HTTP range requests (RFC) support for single-object GET operations.
Spec vs implementation:
Range: bytes=0-1023. The spec allows for Multipart ranges too, i.e. fetching multiple ranges in a single request, which we don't support. We can add it in the future if needed.Looking for feedback on whether we think we need this now or not.
Implementation details:
get_objectpaths now takeOption<ByteRange>and returnContentRangealongside the payload. This seems to be the way to implement this with the least code duplication. Otherwise we would have to introduce aget_object_range, implement it for every backend and wrappers of the backends, etc.objectstore-servicereserializes theRangeheader and forwards it to the backend. On the response path, we proxy back thecontent-rangeprovided by the backend. TODO:Close FS-363