refactor: define codec and data type classes upstream in a subpackage #3875
d-v-b wants to merge 3 commits into zarr-developers:main from
Conversation
@zarr-developers/python-core-devs does anyone object to the basic proposal here: to upstream our basic codec + data type APIs? I think the current situation is untenable so I'd like to see it fixed. This PR is one approach, but I'm open to alternatives.
    from zarr_interfaces.data_type.v1 import ZDType
    ```

    Interfaces are versioned under a `v1` namespace to support future evolution
I wonder if the versioning will create confusion, because it is another version apart from the zarr package and the zarr data format versions.
I hope it's not confusing! the goal here is to allow zarr-python to gracefully evolve things like the codec API. Since different codec APIs would not interact, we could define the current ABC-based API under v1, and a newer protocol-based API under v2. I think only codec and data type developers would need to know about this, and I would count on that crowd being able to know what the versions mean.
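To make the versioning idea concrete, here is a minimal sketch of how an ABC-based API and a protocol-based API could sit side by side without interacting. Everything in it is an assumption for illustration: the `v1` and `v2` classes merely stand in for versioned modules, and `Codec`, `ReverseV1`, and `ReverseV2` are hypothetical names, not the classes defined in this PR.

```python
from abc import ABC, abstractmethod
from typing import Protocol, runtime_checkable


class v1:
    """Stand-in for a hypothetical `codec.v1` module (ABC-based API)."""

    class Codec(ABC):
        # Implementers must subclass this base class explicitly.
        @abstractmethod
        def encode(self, data: bytes) -> bytes: ...

        @abstractmethod
        def decode(self, data: bytes) -> bytes: ...


class v2:
    """Stand-in for a hypothetical `codec.v2` module (protocol-based API)."""

    @runtime_checkable
    class Codec(Protocol):
        # Any object with matching methods conforms; no subclassing needed.
        def encode(self, data: bytes) -> bytes: ...

        def decode(self, data: bytes) -> bytes: ...


class ReverseV1(v1.Codec):
    """Toy codec written against the v1 API by subclassing the ABC."""

    def encode(self, data: bytes) -> bytes:
        return data[::-1]

    def decode(self, data: bytes) -> bytes:
        return data[::-1]


class ReverseV2:
    """Toy codec written against the v2 API; it has no base class at all."""

    def encode(self, data: bytes) -> bytes:
        return data[::-1]

    def decode(self, data: bytes) -> bytes:
        return data[::-1]


# Each implementation is checked against the API version it targets.
v1_ok = isinstance(ReverseV1(), v1.Codec)
v2_ok = isinstance(ReverseV2(), v2.Codec)
# A v2-style codec is not a v1 codec, because it never subclassed the ABC.
v2_is_v1 = isinstance(ReverseV2(), v1.Codec)
```

In this sketch a codec author picks one namespace and only needs to know that version; a consumer can support both by checking against each `Codec` type separately.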
Since there have been no objections here, I am going to move forward with this PR.
I'm not sure you've really addressed Tom's concerns from #3867 (comment). I've restated them in #3867 (comment).
@maxrjones the primary goal of this change is to allow us to gracefully evolve our codec API by using a package structure that more accurately models the real dependency relationships. Nobody has objected to that. Being able to easily import externally-defined codecs is just a nice side-effect of this refactoring, but this direction is still valuable even if we define all our codecs internally.
These seem orthogonal. Sorry that I need to step back from this discussion, but I also wanted to at least voice skepticism before you continue to invest time. I won't block the approach if you find a different approver, but am not convinced.
They are not orthogonal at all. A circular dependency can look OK until you have to change one or the other of the pair. Then the problems emerge. This is exactly what we experienced with 3.x and numcodecs. For context, the codec API for zarr-python 2.x was defined in a separate package (numcodecs). zarrs defines the codec API, and many other APIs, in separate packages. zarrita.js defines the ndarray and storage APIs in separate packages. It's actually normal and OK to do this!
Yes, I am familiar with monorepos. I find it to be a matter of preference. I manage co-development in virtualizarr and virtual tiff just fine. Sometimes it's annoying to manage release timing, most times it's nice to have independent development. I'm willing to follow what others prefer here.
I would also like to hear some more voices in the conversation. I am sensitive to this situation because:
these two directions are in tension as long as we use the current (ahistorical) arrangement of defining the codec API inside zarr-python. Can someone provide an alternative proposal for how we can evolve our codec API while also depending on externally-defined codecs that depend on zarr-python?
Projects that want to implement their own codecs or data types have to import base classes from `zarr-python`. This means `zarr-python` can practically never depend on any externally-defined codecs or data types without creating a circular dependency (unacceptable). See #3867.

To remedy this situation, this PR defines our codec and data type ABCs in a separate package called `zarr-interfaces`. `zarr-interfaces` is a sub-package in this repo. The interfaces in `zarr-interfaces` are in versioned namespaces, which makes evolution of these APIs straightforward. Projects that want to implement a zarr-compatible codec or data type should depend on `zarr-interfaces` instead of depending on `zarr-python` itself. This will allow `zarr-python` to optionally depend on externally-defined codecs and data types.

I'm opening this as a draft because I'm not sure about quite a few things, and I would appreciate feedback on the basic direction.
TODO:
- `docs/user-guide/*.md`
- `changes/`
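The dependency arrangement described in this PR can be sketched in a single file. This is only an illustration under stated assumptions: the three numbered sections stand in for three separate packages, and `Codec`, `XorCodec`, `hypothetical_ext_codecs`, and `load_optional_codec` are invented names. It is not how zarr-python actually discovers or registers codecs.

```python
import importlib
import sys
import types
from abc import ABC, abstractmethod


# (1) Would live in the interfaces package: no dependency on zarr-python.
class Codec(ABC):
    @abstractmethod
    def encode(self, data: bytes) -> bytes: ...

    @abstractmethod
    def decode(self, data: bytes) -> bytes: ...


# (2) Would live in an external codec package: depends only on (1).
class XorCodec(Codec):
    def __init__(self, key: int = 0x5A) -> None:
        self.key = key

    def encode(self, data: bytes) -> bytes:
        return bytes(b ^ self.key for b in data)

    def decode(self, data: bytes) -> bytes:
        # XOR with the same key is its own inverse.
        return self.encode(data)


# Simulate the external package being installed, so the sketch is runnable.
_fake_module = types.ModuleType("hypothetical_ext_codecs")
_fake_module.XorCodec = XorCodec
sys.modules["hypothetical_ext_codecs"] = _fake_module


# (3) Would live in zarr-python: depends on (1), and only optionally on (2).
def load_optional_codec(module_name: str, class_name: str):
    """Return the codec class if the optional package is present, else None."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    candidate = getattr(module, class_name, None)
    if isinstance(candidate, type) and issubclass(candidate, Codec):
        return candidate
    return None


found = load_optional_codec("hypothetical_ext_codecs", "XorCodec")
missing = load_optional_codec("hypothetical_missing_pkg_xyz", "XorCodec")
```

Because both (2) and (3) import only from (1), neither needs the other at install time, which is the property that removes the circular dependency.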