Draft
54 commits
5987137
Add initial tests for remote storage workflows with UPath
SamirMoustafa Feb 28, 2026
865eb76
io: add dask.array.to_zarr compat for ome_zarr kwargs
SamirMoustafa Mar 2, 2026
2134386
io: add remote storage helpers in _utils
SamirMoustafa Mar 2, 2026
eee34d8
core: support UPath for SpatialData.path and write()
SamirMoustafa Mar 2, 2026
40af327
io: use resolved store and remote parquet in points, raster, shapes, …
SamirMoustafa Mar 2, 2026
540631c
ci: add test deps and Dockerfile for storage emulators (S3, Azure, GCS)
SamirMoustafa Mar 2, 2026
532af5a
test: move remote storage tests under tests/io/remote_storage and add…
SamirMoustafa Mar 2, 2026
c22b8bf
fix: update Dask internal keys for zarr compatibility
SamirMoustafa Mar 2, 2026
0c07169
test: refine subset and table validation in spatial data tests
SamirMoustafa Mar 2, 2026
f21bb52
feat: move Dockerfile for storage emulators to facilitate testing
SamirMoustafa Mar 2, 2026
072566a
ci: enhance GitHub Actions workflow to support storage emulators on L…
SamirMoustafa Mar 2, 2026
ee6e4dc
fix: handle RuntimeError in fsspec async session closure
SamirMoustafa Mar 2, 2026
9019e6a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 2, 2026
42c3133
refactor: add type hints to functions in _dask_zarr_compat, _utils, a…
SamirMoustafa Mar 2, 2026
70ababe
chore: remove pytest-timeout from test dependencies in pyproject.toml
SamirMoustafa Mar 4, 2026
cae2319
test: add unit tests for remote storage store resolution and credenti…
SamirMoustafa Mar 4, 2026
857327b
Merge main into cloud-storage-support
SamirMoustafa Apr 14, 2026
fe6bf24
chore(ci): fix GCS emulator tests (gcsfs, sync upload, multi-arch)
SamirMoustafa Apr 15, 2026
3cb2c93
refactor: remove deprecated dask array compatibility layer
SamirMoustafa Apr 15, 2026
6cf359a
Improve path handling in FsspecStore and update read_parquet options
SamirMoustafa Apr 15, 2026
df7be9a
Add fsspec integration by adding support for cloud object store proto…
SamirMoustafa Apr 15, 2026
a0bcc65
Enhance path handling for hierarchical URIs in SpatialData and relate…
SamirMoustafa Apr 15, 2026
f1cc651
Ensure existing Zarr stores are returned unchanged in _resolve_zarr_s…
SamirMoustafa Apr 15, 2026
55ba3d0
remove unused fsspec async handling code and update related test docu…
SamirMoustafa Apr 15, 2026
0e2e424
Updating the path setter to accept strings and normalize them to Path…
SamirMoustafa Apr 15, 2026
ce20830
write method safeguards for local and remote paths in SpatialData.
SamirMoustafa Apr 15, 2026
fbc3040
Support for UPath in data reading functions and improve error handlin…
SamirMoustafa Apr 15, 2026
175fbea
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 15, 2026
3beed0e
Refactor full_sdata fixture for consistency in remote I/O tests.
SamirMoustafa Apr 15, 2026
a7c51c2
rollback the unneeded changes for test cases within the core
SamirMoustafa Apr 15, 2026
6443422
rollback the unneeded changes for test cases within the query
SamirMoustafa Apr 15, 2026
be23021
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 15, 2026
738e611
init
selmanozleyen Apr 15, 2026
2d1caee
Merge branch 'cloud-storage-support' into feat/zarr-store-class
selmanozleyen Apr 15, 2026
53c45ee
Adding a dedicated job for remote storage tests, updating coverage up…
SamirMoustafa Apr 16, 2026
be7501a
Merge remote-tracking branch 'samirmoustafa/cloud-storage-support' in…
selmanozleyen Apr 16, 2026
341b8fa
add arrow filesystem
selmanozleyen Apr 16, 2026
389d8ec
no provider specific stuff
selmanozleyen Apr 16, 2026
b587971
error when overwrite=False and add remote storage
selmanozleyen Apr 17, 2026
db4f286
add tests for abstractions and fix bugs
selmanozleyen Apr 17, 2026
5bffb4d
give readonly stores
selmanozleyen Apr 17, 2026
4be931b
test clarity
selmanozleyen Apr 17, 2026
b800d05
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 17, 2026
cb1d0d5
restore pointer
selmanozleyen Apr 17, 2026
e6fba59
mypy plus notebook pointer
selmanozleyen Apr 17, 2026
2c4a579
refactor helpers
selmanozleyen Apr 17, 2026
43269f6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 17, 2026
c706bbf
Revert "refactor helpers"
selmanozleyen Apr 17, 2026
318c0bd
too long line
selmanozleyen Apr 17, 2026
b14b2a8
if _cms is not None and isinstance(path.store, _cms): is always none …
selmanozleyen Apr 17, 2026
958dd1e
add clear comments
selmanozleyen Apr 17, 2026
d8ad5bb
fix: restore get_dask_backing_files for post-unpinning dask task-spec
selmanozleyen Apr 17, 2026
909b0f3
add github links
selmanozleyen Apr 17, 2026
12f2489
ome-zarr needs to be consolidation aware
selmanozleyen Apr 17, 2026
191 changes: 122 additions & 69 deletions src/spatialdata/_core/spatialdata.py

Large diffs are not rendered by default.

221 changes: 170 additions & 51 deletions src/spatialdata/_io/_utils.py

Large diffs are not rendered by default.

44 changes: 33 additions & 11 deletions src/spatialdata/_io/io_points.py
@@ -6,13 +6,16 @@
 from dask.dataframe import DataFrame as DaskDataFrame
 from dask.dataframe import read_parquet
 from ome_zarr.format import Format
+from upath import UPath

 from spatialdata._io._utils import (
     _get_transformations_from_ngff_dict,
+    _resolve_zarr_store,
     _write_metadata,
     overwrite_coordinate_transformations_non_raster,
 )
 from spatialdata._io.format import CurrentPointsFormat, PointsFormats, _parse_version
+from spatialdata._store import ZarrStore, make_zarr_store, make_zarr_store_from_group, open_zarr_for_read
 from spatialdata.models import get_axes_names
 from spatialdata.transformations._utils import (
     _get_transformations,
@@ -21,21 +24,38 @@


 def _read_points(
-    store: str | Path,
+    store: str | Path | UPath | ZarrStore,
 ) -> DaskDataFrame:
-    """Read points from a zarr store."""
-    f = zarr.open(store, mode="r")
+    """Read points from a zarr store (path, hierarchical URI string, or remote ``UPath``)."""
+    zarr_store = store if isinstance(store, ZarrStore) else make_zarr_store(store)
+    resolved_store = _resolve_zarr_store(zarr_store.path)
+    f = open_zarr_for_read(resolved_store, as_group=False)

     version = _parse_version(f, expect_attrs_key=True)
     assert version is not None
     points_format = PointsFormats[version]

-    store_root = f.store_path.store.root
-    path = store_root / f.path / "points.parquet"
-    # cache on remote file needed for parquet reader to work
-    # TODO: allow reading in the metadata without caching all the data
-    points = read_parquet("simplecache::" + str(path) if str(path).startswith("http") else path)
+    parquet_store = zarr_store.child("points.parquet")
+    # Passing filesystem= to read_parquet makes pyarrow convert dictionary columns into pandas
+    # categoricals eagerly per partition and marks them known=True with an empty category list.
+    # This happens for ANY pyarrow filesystem (both LocalFileSystem and PyFileSystem(FSSpecHandler(.))
+    # return the same broken categorical), so it is a property of the filesystem= handoff itself,
+    # not of local-vs-remote. Left as is, it would make write_points' cat.as_known() a no-op and
+    # the next to_parquet(filesystem=.) would fail with a per-partition schema mismatch
+    # (dictionary<values=null> vs dictionary<values=string>). We demote the categoricals back to
+    # "unknown" right here so that write_points recomputes categories consistently across partitions.
+    # TODO: allow reading in the metadata without materializing the data.
+    points = read_parquet(
+        parquet_store.arrow_path(),
+        filesystem=parquet_store.arrow_filesystem(),
+    )
     assert isinstance(points, DaskDataFrame)
+    for column_name in points.columns:
+        c = points[column_name]
+        if c.dtype == "category" and c.cat.known:
+            points[column_name] = c.cat.as_unknown()
+    if points.index.name == "__null_dask_index__":
+        points = points.rename_axis(None)

     transformations = _get_transformations_from_ngff_dict(f.attrs.asdict()["coordinateTransformations"])
     _set_transformations(points, transformations)
@@ -68,8 +88,7 @@ def write_points(
     axes = get_axes_names(points)
     transformations = _get_transformations(points)

-    store_root = group.store_path.store.root
-    path = store_root / group.path / "points.parquet"
+    parquet_store = make_zarr_store_from_group(group).child("points.parquet")

     # The following code iterates through all columns in the 'points' DataFrame. If the column's datatype is
     # 'category', it checks whether the categories of this column are known. If not, it explicitly converts the
@@ -84,7 +103,10 @@

     points_without_transform = points.copy()
     del points_without_transform.attrs["transform"]
-    points_without_transform.to_parquet(path)
+    points_without_transform.to_parquet(
+        parquet_store.arrow_path(),
+        filesystem=parquet_store.arrow_filesystem(),
+    )

     attrs = element_format.attrs_to_dict(points.attrs)
     attrs["version"] = element_format.spatialdata_format_version
10 changes: 7 additions & 3 deletions src/spatialdata/_io/io_raster.py
@@ -16,17 +16,20 @@
 from ome_zarr.writer import write_labels as write_labels_ngff
 from ome_zarr.writer import write_multiscale as write_multiscale_ngff
 from ome_zarr.writer import write_multiscale_labels as write_multiscale_labels_ngff
+from upath import UPath
 from xarray import DataArray, DataTree

 from spatialdata._io._utils import (
     _get_transformations_from_ngff_dict,
+    _resolve_zarr_store,
     overwrite_coordinate_transformations_raster,
 )
 from spatialdata._io.format import (
     CurrentRasterFormat,
     RasterFormatType,
     get_ome_zarr_format,
 )
+from spatialdata._store import ZarrStore, make_zarr_store
 from spatialdata._utils import get_pyramid_levels
 from spatialdata.models._utils import get_channel_names
 from spatialdata.models.models import ATTRS_KEY
@@ -160,13 +163,14 @@ def _prepare_storage_options(


 def _read_multiscale(
-    store: str | Path, raster_type: Literal["image", "labels"], reader_format: Format
+    store: str | Path | UPath | ZarrStore, raster_type: Literal["image", "labels"], reader_format: Format
 ) -> DataArray | DataTree:
-    assert isinstance(store, str | Path)
     assert raster_type in ["image", "labels"]
+    zarr_store = store if isinstance(store, ZarrStore) else make_zarr_store(store)
+    resolved_store = _resolve_zarr_store(zarr_store.path)

     nodes: list[Node] = []
-    image_loc = ZarrLocation(store, fmt=reader_format)
+    image_loc = ZarrLocation(resolved_store, fmt=reader_format)
     if exists := image_loc.exists():
         image_reader = Reader(image_loc)()
         image_nodes = list(image_reader)
23 changes: 14 additions & 9 deletions src/spatialdata/_io/io_shapes.py
@@ -9,9 +9,11 @@
 from natsort import natsorted
 from ome_zarr.format import Format
 from shapely import from_ragged_array, to_ragged_array
+from upath import UPath

 from spatialdata._io._utils import (
     _get_transformations_from_ngff_dict,
+    _resolve_zarr_store,
     _write_metadata,
     overwrite_coordinate_transformations_non_raster,
 )
@@ -23,6 +25,7 @@
     ShapesFormatV03,
     _parse_version,
 )
+from spatialdata._store import ZarrStore, make_zarr_store, make_zarr_store_from_group, open_zarr_for_read
 from spatialdata.models import ShapesModel, get_axes_names
 from spatialdata.transformations._utils import (
     _get_transformations,
@@ -31,10 +34,12 @@


 def _read_shapes(
-    store: str | Path,
+    store: str | Path | UPath | ZarrStore,
 ) -> GeoDataFrame:
-    """Read shapes from a zarr store."""
-    f = zarr.open(store, mode="r")
+    """Read shapes from a zarr store (path, hierarchical URI string, or remote ``UPath``)."""
+    zarr_store = store if isinstance(store, ZarrStore) else make_zarr_store(store)
+    resolved_store = _resolve_zarr_store(zarr_store.path)
+    f = open_zarr_for_read(resolved_store, as_group=False)
     version = _parse_version(f, expect_attrs_key=True)
     assert version is not None
     shape_format = ShapesFormats[version]
@@ -54,9 +59,9 @@ def _read_shapes(
         geometry = from_ragged_array(typ, coords, offsets)
         geo_df = GeoDataFrame({"geometry": geometry}, index=index)
     elif isinstance(shape_format, ShapesFormatV02 | ShapesFormatV03):
-        store_root = f.store_path.store.root
-        path = Path(store_root) / f.path / "shapes.parquet"
-        geo_df = read_parquet(path)
+        parquet_store = zarr_store.child("shapes.parquet")
+        with parquet_store.arrow_filesystem().open_input_file(parquet_store.arrow_path()) as src:
+            geo_df = read_parquet(src)
     else:
         raise ValueError(
             f"Unsupported shapes format {shape_format} from version {version}. Please update the spatialdata library."
@@ -169,13 +174,13 @@ def _write_shapes_v02_v03(
     """
     from spatialdata.models._utils import TRANSFORM_KEY

-    store_root = group.store_path.store.root
-    path = store_root / group.path / "shapes.parquet"
+    parquet_store = make_zarr_store_from_group(group).child("shapes.parquet")

     # Temporarily remove transformations from attrs to avoid serialization issues
     transforms = shapes.attrs[TRANSFORM_KEY]
     del shapes.attrs[TRANSFORM_KEY]
-    shapes.to_parquet(path, geometry_encoding=geometry_encoding)
+    with parquet_store.arrow_filesystem().open_output_stream(parquet_store.arrow_path()) as sink:
+        shapes.to_parquet(sink, geometry_encoding=geometry_encoding)
     shapes.attrs[TRANSFORM_KEY] = transforms

     attrs = element_format.attrs_to_dict(shapes.attrs)
11 changes: 8 additions & 3 deletions src/spatialdata/_io/io_table.py
@@ -8,21 +8,26 @@
 from anndata import read_zarr as read_anndata_zarr
 from anndata._io.specs import write_elem as write_adata
 from ome_zarr.format import Format
+from upath import UPath

+from spatialdata._io._utils import _resolve_zarr_store
 from spatialdata._io.format import (
     CurrentTablesFormat,
     TablesFormats,
     TablesFormatV01,
     TablesFormatV02,
     _parse_version,
 )
+from spatialdata._store import ZarrStore, make_zarr_store, open_zarr_for_read
 from spatialdata.models import TableModel, get_table_keys


-def _read_table(store: str | Path) -> AnnData:
-    table = read_anndata_zarr(str(store))
+def _read_table(store: str | Path | UPath | ZarrStore) -> AnnData:
+    zarr_store = store if isinstance(store, ZarrStore) else make_zarr_store(store)
+    resolved_store = _resolve_zarr_store(zarr_store.path)
+    table = read_anndata_zarr(resolved_store)

-    f = zarr.open(store, mode="r")
+    f = open_zarr_for_read(resolved_store, as_group=False)
     version = _parse_version(f, expect_attrs_key=False)
     assert version is not None
     table_format = TablesFormats[version]