feat: add IndexTransform library for composable, lazy coordinate mappings #3906
d-v-b wants to merge 3 commits into zarr-developers:main from …
Add a new `src/zarr/core/transforms/` package implementing TensorStore-inspired index transforms. The core idea: every indexing operation (slicing, fancy indexing, etc.) produces a coordinate mapping from user space to storage space. These mappings compose lazily: no I/O happens until they are explicitly resolved.

Key types:
- `IndexDomain`: rectangular region in N-dimensional integer space
- `ConstantMap`, `DimensionMap`, `ArrayMap`: three representations of a set of storage coordinates (singleton, arithmetic progression, explicit enumeration)
- `IndexTransform`: pairs an input domain with output maps (one per storage dim)
- `compose(outer, inner)`: chain two transforms

Key operations on IndexTransform:
- `__getitem__`, `.oindex[]`, `.vindex[]`: indexing produces new transforms
- `.intersect(domain)`: restrict to coordinates within a region (chunk resolution)
- `.translate(shift)`: shift coordinates (make chunk-local)

The transform library is standalone with no dependency on Array.

Includes a comprehensive test suite (143 tests covering all types, operations, composition, chunk resolution, and edge cases).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
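Chained slices stay cheap because each strided slice is an affine map, and composing two affine maps yields another affine map. The 1-D sketch below assumes a map of the form `storage = offset + stride * i` for `i` in `[0, size)`; `StridedMap1D`, `slice_to_map`, and `compose_1d` are hypothetical names for illustration, not the PR's `compose(outer, inner)`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StridedMap1D:
    """Hypothetical 1-D map: input i in [0, size) -> offset + stride * i."""

    offset: int
    stride: int
    size: int

    def coords(self) -> list[int]:
        return [self.offset + self.stride * i for i in range(self.size)]


def slice_to_map(s: slice, length: int) -> StridedMap1D:
    # slice.indices resolves None and negative values against the dimension length.
    start, stop, step = s.indices(length)
    return StridedMap1D(offset=start, stride=step, size=len(range(start, stop, step)))


def compose_1d(view: StridedMap1D, base: StridedMap1D) -> StridedMap1D:
    """Map user coords through `view` (user -> base input), then `base` (-> storage).

    base(view(i)) = base.offset + base.stride * (view.offset + view.stride * i)
    is affine in i again, so the result is a single StridedMap1D.
    """
    return StridedMap1D(
        offset=base.offset + base.stride * view.offset,
        stride=base.stride * view.stride,
        size=view.size,
    )


base = slice_to_map(slice(10, 50, 2), 100)       # a[10:50:2]
view = slice_to_map(slice(5, 15, 3), base.size)  # a[10:50:2][5:15:3]
combined = compose_1d(view, base)
print(combined)           # StridedMap1D(offset=20, stride=6, size=4)
print(combined.coords())  # [20, 26, 32, 38]
```

However many slices are chained, the result is still three integers, so nothing proportional to the selection size has to be materialized until the coordinates are actually needed.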
Codecov Report
❌ Patch coverage is …

| Coverage diff | main | #3906 | +/- |
|---|---:|---:|---:|
| Coverage | 93.07% | 92.58% | -0.50% |
| Files | 85 | 92 | +7 |
| Lines | 11228 | 12326 | +1098 |
| Hits | 10451 | 11412 | +961 |
| Misses | 777 | 914 | +137 |
@d-v-b I'm new to zarr-python indexing; does … My use case is mainly if I have an array of shape …
Add TypedDict definitions and conversion functions for serializing
IndexDomain, OutputIndexMap, and IndexTransform to/from JSON.
The JSON format follows TensorStore's conventions for interoperability:
- IndexDomain: input_inclusive_min, input_exclusive_max, input_labels
- OutputIndexMap: offset + optional stride/input_dimension/index_array
- IndexTransform: domain fields + output array
TypedDicts: IndexDomainJSON, OutputIndexMapJSON, IndexTransformJSON
Functions: index_domain_to_json, index_domain_from_json,
index_transform_to_json, index_transform_from_json
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
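For orientation, here is what a transform might look like in the TensorStore-style layout this commit describes. The dictionary models `a[3, 10:50:2]` on a 2-D array, and `apply_transform_json` is a hypothetical helper, not one of the functions listed above. It treats a map with only `offset` as constant, assumes `stride` defaults to 1 when absent, and deliberately leaves `index_array` maps unimplemented.

```python
import json

# TensorStore-style transform for a[3, 10:50:2] on a 2-D array:
# one input dimension of size 20; storage dim 0 is pinned to 3,
# storage dim 1 is 10 + 2 * x.
transform_json = {
    "input_inclusive_min": [0],
    "input_exclusive_max": [20],
    "input_labels": ["x"],
    "output": [
        {"offset": 3},  # constant map: only an offset
        {"offset": 10, "stride": 2, "input_dimension": 0},  # single-dimension map
    ],
}


def apply_transform_json(t: dict, point: tuple[int, ...]) -> tuple[int, ...]:
    """Hypothetical helper: map one input point to storage coordinates."""
    lo, hi = t["input_inclusive_min"], t["input_exclusive_max"]
    if len(point) != len(lo) or not all(a <= p < b for p, a, b in zip(point, lo, hi)):
        raise IndexError(f"{point} is outside the input domain")
    out = []
    for m in t["output"]:
        if "index_array" in m:
            raise NotImplementedError("index_array maps are omitted from this sketch")
        if "input_dimension" in m:
            # Assumption: a missing stride means 1.
            out.append(m["offset"] + m.get("stride", 1) * point[m["input_dimension"]])
        else:
            out.append(m["offset"])
    return tuple(out)


# Plain dicts of ints/strings/lists survive a JSON round trip unchanged.
restored = json.loads(json.dumps(transform_json))
print(apply_transform_json(restored, (0,)))   # (3, 10)
print(apply_transform_json(restored, (19,)))  # (3, 48)
```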
on this branch:

>>> arr = create_array(store={}, shape=(100,100), dtype="uint8")
>>> arr.z[0]
<Array memory://136980013828096 shape=(100,) dtype=uint8 domain={ 0, [0, 100) }>
>>> arr.z[0].shape
(100,)

the …
What is … I have … and I want to select the arrays for band 1 (second dim), but for all the times. Would …
yeah it should!
Merging this PR will degrade performance by 17.94%
My summary:
With dask maintenance on the decline, it's more important than ever that we give zarr-python users a dask-free way to do something very intuitive: index large zarr arrays without turning the whole thing into a numpy array first. This was discussed at length in #1603.
This PR, done with Claude, makes regular indexing go through a lazy indexing layer. The lazy indexing layer is based on abstractions defined in TensorStore. The basic idea is to explicitly model indexing an array as a transformation from some input coordinates to output coordinates, and to bind such a representation to our `Array` classes. Regular indexing via `.__getitem__` is still immediate, but arrays have a new `.z` attribute that exposes the lazy indexing layer:

…

Goals here:

- … `main` are disparate ad-hoc copies of stuff from zarr-python 2.x. We can do better.

Non-goals:

…
Claude's summary.
Add a new `src/zarr/core/transforms/` package implementing TensorStore-inspired index transforms. The core idea: every indexing operation (slicing, fancy indexing, etc.) produces a coordinate mapping from user space to storage space. These mappings compose lazily: no I/O happens until they are explicitly resolved.
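The "compose lazily, no I/O until resolved" idea can be shown with a toy 1-D view. This is a hypothetical illustration, not the PR's `.z` implementation: `CountingStore` stands in for storage and counts element reads, and `LazyView` keeps only a Python `range` of selected storage coordinates. Slicing a `range` yields another `range`, so chained slices collapse into a single mapping and nothing is read until `resolve()` is called.

```python
from dataclasses import dataclass


@dataclass
class CountingStore:
    """Stand-in for storage that counts how many elements are read."""

    data: list[int]
    reads: int = 0

    def read(self, coord: int) -> int:
        self.reads += 1
        return self.data[coord]


@dataclass(frozen=True)
class LazyView:
    """Hypothetical lazy 1-D view: holds a coordinate mapping, never data."""

    store: CountingStore
    coords: range  # storage coordinates selected so far

    def __getitem__(self, key: slice) -> "LazyView":
        # Slicing a range returns a new range: the two selections are composed
        # into one arithmetic progression and no element is read.
        return LazyView(self.store, self.coords[key])

    def resolve(self) -> list[int]:
        # The only place that touches storage.
        return [self.store.read(c) for c in self.coords]


store = CountingStore([10 * x for x in range(100)])
view = LazyView(store, range(100))[10:50:2][5:15:3]
print(store.reads)     # 0
print(view.coords)     # range(20, 40, 6)
print(view.resolve())  # [200, 260, 320, 380]
print(store.reads)     # 4
```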
Key types:

- `IndexDomain`: rectangular region in N-dimensional integer space
- `ConstantMap`, `DimensionMap`, `ArrayMap`: three representations of a set of storage coordinates (singleton, arithmetic progression, explicit enumeration)
- `IndexTransform`: pairs an input domain with output maps (one per storage dim)
- `compose(outer, inner)`: chain two transforms

Key operations on `IndexTransform`:

- `__getitem__`, `.oindex[]`, `.vindex[]`: indexing produces new transforms
- `.intersect(domain)`: restrict to coordinates within a region (chunk resolution)
- `.translate(shift)`: shift coordinates (make chunk-local)

The transform library is standalone with no dependency on `Array`.
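To make the three output-map kinds concrete, here is a small pure-Python sketch. `Box`, `Const`, `Strided`, `Enumerated`, and `Transform` are hypothetical stand-ins for `IndexDomain`, `ConstantMap`, `DimensionMap`, `ArrayMap`, and `IndexTransform`, not the classes added in this PR. For brevity it assumes every input domain starts at 0 and that each enumerated map is driven by a single input dimension. It also shows one way the `.oindex` / `.vindex` distinction can be expressed with these pieces: orthogonal index arrays each get their own input dimension, while vectorized index arrays share one.

```python
from dataclasses import dataclass
from itertools import product
from typing import Union

Point = tuple[int, ...]


@dataclass(frozen=True)
class Box:
    """Stand-in for IndexDomain: input shape, every dimension starting at 0."""

    shape: tuple[int, ...]

    def points(self) -> list[Point]:
        return list(product(*(range(n) for n in self.shape)))


@dataclass(frozen=True)
class Const:
    """Singleton: the same storage coordinate for every input point."""

    offset: int

    def __call__(self, p: Point) -> int:
        return self.offset


@dataclass(frozen=True)
class Strided:
    """Arithmetic progression driven by one input dimension."""

    offset: int
    stride: int
    input_dimension: int

    def __call__(self, p: Point) -> int:
        return self.offset + self.stride * p[self.input_dimension]


@dataclass(frozen=True)
class Enumerated:
    """Explicit list of storage coordinates, looked up by one input dimension."""

    values: tuple[int, ...]
    input_dimension: int

    def __call__(self, p: Point) -> int:
        return self.values[p[self.input_dimension]]


OutputMap = Union[Const, Strided, Enumerated]


@dataclass(frozen=True)
class Transform:
    """Stand-in for IndexTransform: an input domain plus one map per storage dim."""

    domain: Box
    output: tuple[OutputMap, ...]

    def __call__(self, p: Point) -> Point:
        return tuple(m(p) for m in self.output)

    def storage_points(self) -> list[Point]:
        return [self(p) for p in self.domain.points()]


# a[2, 1:7:3] on a 2-D array: storage dim 0 is pinned, dim 1 follows input dim 0.
basic = Transform(Box((2,)), (Const(2), Strided(1, 3, 0)))

rows, cols = (0, 2), (1, 3)
# Orthogonal (oindex-style): each index array gets its own input dimension.
outer = Transform(Box((2, 2)), (Enumerated(rows, 0), Enumerated(cols, 1)))
# Vectorized (vindex-style): both index arrays share input dimension 0.
pointwise = Transform(Box((2,)), (Enumerated(rows, 0), Enumerated(cols, 0)))

print(basic.storage_points())      # [(2, 1), (2, 4)]
print(outer.storage_points())      # [(0, 1), (0, 3), (2, 1), (2, 3)]
print(pointwise.storage_points())  # [(0, 1), (2, 3)]
```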
Includes a comprehensive test suite (143 tests covering all types, operations, composition, chunk resolution, and edge cases).
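Chunk resolution with `.intersect` and `.translate` can be illustrated in one dimension. The sketch below is a simplification under stated assumptions, not the PR's algorithm: it handles only a single strided map with positive stride on a regular chunk grid, and `StridedMap1D`, `ceil_div`, and `chunk_pieces` are hypothetical names. For each chunk the selection touches, it intersects the map with the chunk's region (keeping the input indices whose storage coordinate falls inside the chunk) and then translates the result by the chunk origin so the coordinates become chunk-local.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class StridedMap1D:
    """Hypothetical 1-D map: input i in [0, size) -> offset + stride * i."""

    offset: int
    stride: int  # assumed > 0 in this sketch
    size: int

    def coords(self) -> list[int]:
        return [self.offset + self.stride * i for i in range(self.size)]


def ceil_div(a: int, b: int) -> int:
    # Ceiling division for b > 0, built from floor division.
    return -(-a // b)


def chunk_pieces(m: StridedMap1D, chunk_size: int) -> Iterator[tuple[int, range, StridedMap1D]]:
    """Yield (chunk index, input indices, chunk-local map) for every chunk touched."""
    if m.size == 0:
        return
    first = m.offset
    last = m.offset + m.stride * (m.size - 1)
    for k in range(first // chunk_size, last // chunk_size + 1):
        lo, hi = k * chunk_size, (k + 1) * chunk_size
        # "intersect": input indices i with lo <= offset + stride * i < hi.
        i_start = max(0, ceil_div(lo - m.offset, m.stride))
        i_stop = min(m.size, ceil_div(hi - m.offset, m.stride))
        if i_start >= i_stop:
            continue  # a large stride can jump over a chunk entirely
        # "translate": shift storage coordinates by -lo so they are chunk-local.
        local = StridedMap1D(m.offset + m.stride * i_start - lo, m.stride, i_stop - i_start)
        yield k, range(i_start, i_stop), local


selection = StridedMap1D(offset=3, stride=4, size=6)  # storage coords 3, 7, 11, 15, 19, 23
for k, inputs, local in chunk_pieces(selection, chunk_size=5):
    print(k, list(inputs), local.coords())
# 0 [0] [3]
# 1 [1] [2]
# 2 [2] [1]
# 3 [3, 4] [0, 4]
# 4 [5] [3]
```

Each yielded piece says which chunk to fetch, which positions of the user's result it fills, and which elements of that chunk to read.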
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>