
feat/memory store registry #3679

Open
d-v-b wants to merge 28 commits into zarr-developers:main from d-v-b:feat/memory-store-registry


@d-v-b
Contributor

@d-v-b d-v-b commented Jan 29, 2026

This PR adds a new managed memory store class (ManagedMemoryStore) that allows requesting in-memory stores with URL syntax like "memory://my-store/". That's the upside.

The downside is that users can't use their own mutable mappings with this store, because we can't make weak references to generic mutable mappings, and so tracking user-defined mutable mappings would keep them from being garbage collected. Instead, instances of ManagedMemoryStore are constructed with name and path parameters.

A note about pickling: ManagedMemoryStore can be pickled, but if it's unpickled in a separate process from where it was created, an exception is raised. Otherwise, writes in a multiprocessing context would appear successful when in fact they were writing to totally separate stores. I don't think there are any threading issues to worry about, but someone who knows more than me should check that.
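A minimal sketch of the pickling guard, assuming the store records `os.getpid()` at construction and checks it again in `__setstate__`. `PidBoundStore` is a hypothetical name; a real implementation would presumably also re-attach to the registry entry for `name` once the check passes.

```python
import os
import pickle


class PidBoundStore:
    """Hypothetical memory store that refuses to be unpickled in another process."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._pid = os.getpid()  # the process whose memory holds the data

    def __getstate__(self) -> dict:
        return {"name": self.name, "pid": self._pid}

    def __setstate__(self, state: dict) -> None:
        if state["pid"] != os.getpid():
            # Fail loudly instead of silently writing to an unrelated, empty store.
            raise RuntimeError(
                f"Memory store {state['name']!r} was created in process {state['pid']} "
                f"and cannot be used from process {os.getpid()}."
            )
        self.name = state["name"]
        self._pid = state["pid"]


# Round-tripping within the same process succeeds.
restored = pickle.loads(pickle.dumps(PidBoundStore("foo")))
assert restored.name == "foo"
```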

Basic usage:

import zarr
zarr.create_array("memory://foo", shape=(10,), dtype="int8")
# <Array memory://foo shape=(10,) dtype=int8>

closes #2906

d-v-b added 4 commits January 29, 2026 17:19
…ctionaries to manage memory-based zarr storage.

Instances of `ManagedMemoryStore` have a URL representation based on the `id` of the backing dict, e.g. `memory://<id>/`.
This means the same memory-backed store can be accessed without passing an explicit store reference.
Ensure that `ManagedMemoryStore` names do not contain the separator character "/"
Ensure that `ManagedMemoryStore` instances are tied to the PID of the creating process
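A minimal sketch of why names must not contain "/": if the name is taken to be the first component after `memory://`, any "/" in it would be read as the start of the path inside the store. The helper names below are hypothetical and rely only on the standard library's `urllib.parse.urlparse`.

```python
from urllib.parse import urlparse


def validate_store_name(name: str) -> str:
    """Hypothetical check: a store name must be a single, non-empty URL component."""
    if not name:
        raise ValueError("store name must not be empty")
    if "/" in name:
        raise ValueError(f"store name {name!r} must not contain '/'")
    return name


def parse_memory_url(url: str) -> tuple[str, str]:
    """Hypothetical parser: split memory://<name>/<path> into (name, path)."""
    parsed = urlparse(url)
    if parsed.scheme != "memory":
        raise ValueError(f"expected a memory:// URL, got {url!r}")
    # urlparse puts the first component in `netloc` and the remainder in `path`.
    return validate_store_name(parsed.netloc), parsed.path.strip("/")


print(parse_memory_url("memory://foo/group/array"))  # ('foo', 'group/array')
```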
@github-actions github-actions bot added the "needs release notes" label Jan 29, 2026
@d-v-b d-v-b marked this pull request as ready for review January 29, 2026 20:54
@github-actions github-actions bot removed the "needs release notes" label Jan 29, 2026
@d-v-b d-v-b requested a review from a team March 13, 2026 08:44
@maxrjones
Member

Do you still show the memory address after these changes @d-v-b? Like David and Deepak, I have found that behavior useful (xref #2906 (comment)).

@d-v-b
Contributor Author

d-v-b commented Mar 26, 2026

> Do you still show the memory address after these changes @d-v-b? Like David and Deepak, I have found that behavior useful (xref #2906 (comment)).

In this PR I don't think there's any use for displaying the memory address, because ManagedMemoryStore instances are instead uniquely identified by a URL of the form memory://<name>, where <name> is unique. If you have a unique, human-readable identifier, what's the value of the memory address?

For reference, here's the current behavior in this PR:

>>> ManagedMemoryStore()
ManagedMemoryStore('memory://0')
>>> ManagedMemoryStore()
ManagedMemoryStore('memory://1')
>>> ManagedMemoryStore()
ManagedMemoryStore('memory://2')

@d-v-b
Contributor Author

d-v-b commented Mar 26, 2026

Using an automatically-chosen name is opt-in. You can always specify your own name if you want:

>>> ManagedMemoryStore(name="foo")
ManagedMemoryStore('memory://foo')
>>> ManagedMemoryStore(name=str(id({})))
ManagedMemoryStore('memory://4396301440')

Member

@maxrjones maxrjones left a comment


Thanks for putting this together @d-v-b. The memory:// URL syntax is a nice UX improvement and the weakref-based lifecycle management is a clean design. I have one bug, a couple of design questions, and some nits.

Bug

file:// URLs create a broken LocalStore

In make_store line 380:

elif parsed.scheme == "file" or not parsed.scheme:
    return await make_store(Path(store_like), mode=mode, storage_options=storage_options)

Path("file:///tmp/data") produces file:/tmp/data, so LocalStore ends up with root file://file:/tmp/data. This should use the parsed path:

return await make_store(Path(parsed.path), mode=mode, storage_options=storage_options)

Verified:

>>> from zarr.storage._common import make_store
>>> store = await make_store("file:///tmp/zarr-test", mode="w")
>>> str(store)
'file://file:/tmp/zarr-test'
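The same failure can be reproduced with the standard library alone. `PurePosixPath` is used here so the result is identical on every OS; `make_store` itself is not involved.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

url = "file:///tmp/data"

# Treating the whole URL as a path collapses "//" and keeps "file:" as a component.
naive = PurePosixPath(url)
print(naive)  # file:/tmp/data

# Parsing the URL first yields just the filesystem path, as the suggested fix does.
fixed = PurePosixPath(urlparse(url).path)
print(fixed)  # /tmp/data
```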

Design questions

Thread safety of _ManagedStoreDictRegistry

get_or_create has a TOCTOU gap between self._registry.get(name) and self._registry[name] = store_dict — two threads creating a store with the same name could produce two separate backing dicts with one silently overwriting the other's registry entry. The _counter increment for auto-generated names has a similar race. The GIL makes the window narrow in CPython, but a threading.Lock around get_or_create would close it properly.

Duplicated memory:// routing in make_store and make_store_path

Both functions independently parse and handle memory:// URLs. The make_store_path version combines the URL path with the explicit path parameter, while make_store only uses the URL path — is that difference intentional? If so, a comment explaining why would help. If not, could make_store_path fall through to make_store for memory URLs like it does for other URL types?
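One possible way to remove the duplication, sketched under the assumption that `make_store_path`'s behavior (URL path joined with the explicit `path`) is the intended one: a single helper parses the URL once and both entry points call it. `resolve_memory_target` is hypothetical and not part of zarr.

```python
from urllib.parse import urlparse


def resolve_memory_target(url: str, path: str = "") -> tuple[str, str]:
    """Hypothetical shared helper: return (store name, path inside the store)."""
    parsed = urlparse(url)  # parsed once, reused for both name and path
    if parsed.scheme != "memory":
        raise ValueError(f"not a memory:// URL: {url!r}")
    # Join the URL's path and the explicit path, dropping empty segments.
    parts = [p.strip("/") for p in (parsed.path, path) if p.strip("/")]
    return parsed.netloc, "/".join(parts)


print(resolve_memory_target("memory://foo/a", path="b"))  # ('foo', 'a/b')
print(resolve_memory_target("memory://foo"))              # ('foo', '')
```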

Nits

  • from_url error message for non-memory schemes is misleading — ManagedMemoryStore.from_url("file:///tmp/test") says "The store may have been garbage collected" when the real issue is a wrong scheme. Maybe check parsed.scheme != "memory" and raise a different message for that case vs. a missing registry entry.

  • parse_store_url is called twice in make_store — once at line 337 (inside storage_options validation) and again at line 373 (routing). The first result could be reused.

  • TestManagedMemoryStore.set/get fixtures write directly to _store_dict[key] without path prefixing, so the StoreTests base class tests don't exercise the path prefix logic. The dedicated path tests cover this, but flagging in case it matters for the base suite's coverage.
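A sketch of the suggested split between the two failure cases in `from_url`, using a plain dict as a stand-in for the registry. `from_memory_url` and the exception types chosen here are assumptions, not the PR's API.

```python
from urllib.parse import urlparse


def from_memory_url(url: str, registry: dict) -> dict:
    """Hypothetical lookup that distinguishes a wrong scheme from a missing store."""
    parsed = urlparse(url)
    if parsed.scheme != "memory":
        # Wrong kind of URL: garbage collection has nothing to do with it.
        raise ValueError(
            f"expected a URL with the 'memory' scheme, got {parsed.scheme!r} ({url!r})"
        )
    name = parsed.netloc
    if name not in registry:
        raise LookupError(
            f"no memory store named {name!r} is registered; "
            "the store may have been garbage collected"
        )
    return registry[name]
```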

@d-v-b
Contributor Author

d-v-b commented Apr 15, 2026

Thanks, Max! I will fix the bugs. In general, no duplicated functionality was intended; the needless Store / StorePath split just makes this duplication very easy to fall into...

@codecov

codecov bot commented Apr 16, 2026

Codecov Report

❌ Patch coverage is 98.63014% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.20%. Comparing base (ab88363) to head (7aae63b).

Files with missing lines         Patch %   Lines
src/zarr/storage/_common.py      92.85%    1 Missing ⚠️
src/zarr/storage/_memory.py      99.01%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3679      +/-   ##
==========================================
+ Coverage   93.10%   93.20%   +0.09%     
==========================================
  Files          85       85              
  Lines       11216    11325     +109     
==========================================
+ Hits        10443    10555     +112     
+ Misses        773      770       -3     
Files with missing lines         Coverage Δ
src/zarr/storage/__init__.py     95.00% <100.00%> (ø)
src/zarr/storage/_fsspec.py      91.32% <100.00%> (ø)
src/zarr/storage/_utils.py       96.25% <100.00%> (+0.93%) ⬆️
src/zarr/testing/strategies.py   92.88% <100.00%> (-0.03%) ⬇️
src/zarr/storage/_common.py      86.47% <92.85%> (+0.95%) ⬆️
src/zarr/storage/_memory.py      96.74% <99.01%> (+2.26%) ⬆️

d-v-b added 4 commits April 16, 2026 09:45
Also, change the `ExpectFail` helper class to require the `msg` parameter be set. Allowing a default
value of `None` will make all `pytest.raises(case.exception, match=case.msg)` checks pass, which is not
the intended behavior.
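A minimal sketch of this change, assuming `ExpectFail` is a dataclass with `exception` and `msg` fields (the field names come from the `pytest.raises(case.exception, match=case.msg)` call; everything else is assumed). Because `msg` has no default, forgetting it becomes a `TypeError` at construction time. The `matches` helper is hypothetical; it mirrors how `pytest.raises(..., match=...)` applies `re.search` to the exception's string form, without depending on pytest.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpectFail:
    """Hypothetical test-case record for an expected failure."""

    exception: type
    msg: str  # required: a regex the error message must match


def matches(case: ExpectFail, err: BaseException) -> bool:
    # Like pytest.raises(case.exception, match=case.msg): type check plus re.search.
    return isinstance(err, case.exception) and re.search(case.msg, str(err)) is not None


case = ExpectFail(ValueError, msg="must not contain '/'")
print(matches(case, ValueError("store name 'a/b' must not contain '/'")))  # True

try:
    ExpectFail(ValueError)  # msg omitted
except TypeError:
    print("msg is required")
```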

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

deterministic memorystore names

2 participants