ObjectStore.list_dir("g") raises ValueError when an object keyed "g/" exists #4032

@blinkybool

Zarr version

3.2.1

Numcodecs version

v0.16.5

Python Version

3.14

Operating System

Mac

Installation

uv

Description

I hit this while using zarr.storage.ObjectStore to walk an S3/R2 bucket looking for zarr stores, where the bucket also contains directory-marker keys. Walking the hierarchy with list_dir raises at any directory that has a marker.

Concretely, list_dir("g") raises whenever an object keyed "g/" exists (passing "g/" behaves the same, since the argument is rstrip("/")-ed). obstore lists with prefix "g/", and the marker keyed "g/" comes back as an object whose path obstore reports as "g". That entry is then passed through _transform_list_dir / _relativize_path (zarr/storage/_obstore.py, zarr/storage/_utils.py):

async def _transform_list_dir(
    list_result_coroutine: Coroutine[Any, Any, ListResult[Sequence[ObjectMeta]]], prefix: str
) -> AsyncGenerator[str, None]:
    list_result = await list_result_coroutine
    prefix = prefix.rstrip("/")
    for path in chain(
        list_result["common_prefixes"], map(itemgetter("path"), list_result["objects"])
    ):
        yield _relativize_path(path=path, prefix=prefix)


def _relativize_path(*, path: str, prefix: str) -> str:
    if prefix == "":
        return path
    else:
        _prefix = f"{prefix}/"
        if not path.startswith(_prefix):
            raise ValueError(f"The first component of {path} does not start with {prefix}.")
        return path.removeprefix(_prefix)

With path="g" and prefix="g", "g".startswith("g/") is false, so it raises instead of skipping the entry. (For prefix == "" it returns early, so only non-root listings are affected.)
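The failure can be shown without any network access. The sketch below copies _relativize_path from the excerpt above and feeds it a hand-built listing. The listing shape (common prefixes without a trailing slash, the marker reported as path "g") is an assumption based on the behaviour described here, and relativize_skipping_marker is a hypothetical helper illustrating one possible way to handle the marker, not zarr's actual code or an agreed fix.

```python
from itertools import chain
from operator import itemgetter
from typing import Any, Iterator


def _relativize_path(*, path: str, prefix: str) -> str:
    # Copied from the excerpt above (zarr/storage/_utils.py).
    if prefix == "":
        return path
    _prefix = f"{prefix}/"
    if not path.startswith(_prefix):
        raise ValueError(f"The first component of {path} does not start with {prefix}.")
    return path.removeprefix(_prefix)


def relativize_skipping_marker(list_result: dict[str, Any], prefix: str) -> Iterator[str]:
    # Hypothetical helper (not part of zarr): same loop as _transform_list_dir,
    # but an entry whose path equals the prefix itself is treated as a
    # directory marker and skipped instead of being relativized.
    prefix = prefix.rstrip("/")
    paths = chain(list_result["common_prefixes"], map(itemgetter("path"), list_result["objects"]))
    for path in paths:
        if prefix and path == prefix:
            continue
        yield _relativize_path(path=path, prefix=prefix)


# Assumed listing for prefix "g/": the marker keyed "g/" shows up as path "g".
listing = {
    "common_prefixes": ["g/sub"],
    "objects": [{"path": "g"}, {"path": "g/zarr.json"}],
}

# Current behaviour: the marker entry alone is enough to raise.
try:
    _relativize_path(path="g", prefix="g")
    current_error = None
except ValueError as exc:
    current_error = str(exc)

names = list(relativize_skipping_marker(listing, "g"))
print(current_error)
print(names)
```

Skipping is only one option. Whether the marker should be dropped silently or surfaced some other way is a design decision for the maintainers.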

Reproduced on Python 3.14.4, zarr 3.2.1, obstore 0.10.0. The key cannot be written through obstore, which strips the trailing slash on put, so the repro creates it with boto3. LocalStore and MemoryStore cannot reproduce this: a filesystem cannot hold both a file "g" and a directory "g/", and obstore's MemoryStore strips the trailing slash on put.

Steps to reproduce

# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "zarr==3.2.1",
#     "obstore==0.10.0",
#     "boto3==1.43.22",
#     "moto[server]==5.2.1",
# ]
# ///
import asyncio, boto3, obstore.store
from moto.server import ThreadedMotoServer
from zarr.storage import ObjectStore

server = ThreadedMotoServer(port=0)
server.start()
host, port = server.get_host_and_port()
endpoint = f"http://{host}:{port}"

s3 = boto3.client("s3", endpoint_url=endpoint, region_name="us-east-1",
                  aws_access_key_id="x", aws_secret_access_key="x")
s3.create_bucket(Bucket="bucket")
s3.put_object(Bucket="bucket", Key="g/", Body=b"")  # directory-placeholder object

store = ObjectStore(obstore.store.S3Store(
    bucket="bucket", endpoint=endpoint, region="us-east-1",
    access_key_id="x", secret_access_key="x",
    client_options={"allow_http": True}, virtual_hosted_style_request=False,
))


async def main():
    # ValueError: The first component of g does not start with g.
    return [name async for name in store.list_dir("g")]


print(asyncio.run(main()))

Additional output

Starting a new Thread with MotoServer running on 0.0.0.0:0...
127.0.0.1 - - [04/Jun/2026 13:34:36] "PUT /bucket HTTP/1.1" 200 -
127.0.0.1 - - [04/Jun/2026 13:34:36] "PUT /bucket/g/ HTTP/1.1" 200 -
127.0.0.1 - - [04/Jun/2026 13:34:36] "GET /bucket?delimiter=/&list-type=2&prefix=g/ HTTP/1.1" 200 -
Traceback (most recent call last):
  File "/tmp/repro.py", line 36, in <module>
    print(asyncio.run(main()))
          ~~~~~~~~~~~^^^^^^^^
  File "/Users/billy/.local/share/uv/python/cpython-3.14.4-macos-aarch64-none/lib/python3.14/asyncio/runners.py", line 204, in run
    return runner.run(main)
           ~~~~~~~~~~^^^^^^
  File "/Users/billy/.local/share/uv/python/cpython-3.14.4-macos-aarch64-none/lib/python3.14/asyncio/runners.py", line 127, in run
    return self._loop.run_until_complete(task)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
  File "/Users/billy/.local/share/uv/python/cpython-3.14.4-macos-aarch64-none/lib/python3.14/asyncio/base_events.py", line 719, in run_until_complete
    return future.result()
           ~~~~~~~~~~~~~^^
  File "/tmp/repro.py", line 33, in main
    return [name async for name in store.list_dir("g")]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/billy/.cache/uv/environments-v2/repro-f019c5f27b4e421e/lib/python3.14/site-packages/zarr/storage/_obstore.py", line 270, in _transform_list_dir
    yield _relativize_path(path=path, prefix=prefix)
          ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/billy/.cache/uv/environments-v2/repro-f019c5f27b4e421e/lib/python3.14/site-packages/zarr/storage/_utils.py", line 272, in _relativize_path
    raise ValueError(f"The first component of {path} does not start with {prefix}.")
ValueError: The first component of g does not start with g.

Labels: bug (Potential issues with the zarr-python library)
