perf(load): skip Variable.load dispatch for numpy data#11355
Open
FBumann wants to merge 2 commits into
Open
Conversation
Variable.load() and Variable.load_async() always end with ``self._data = to_duck_array(self._data)`` which, for an in-memory ``numpy.ndarray``, walks the dispatch chain only to return ``self._data`` unchanged. The whole call is pure overhead in that case — the same no-op pattern that ``IndexVariable.load`` already short-circuits. Add an ``isinstance(self._data, np.ndarray)`` guard at the top of both methods. Behavior is unchanged on chunked, ExplicitlyIndexed, or non-numpy duck-array inputs. Measured on ``isel(...).load()`` of synthetic scalar-var datasets against upstream/main (best of 5, GC off): 400 scalar vars: 0.524 ms -> 0.324 ms ~1.62x 2000 scalar vars: 2.484 ms -> 1.490 ms ~1.67x Speedup scales with the number of variables (1.44x at 50 vars -> 1.67x at 2000 vars) and is flat across per-variable data size (~1.56x from size=0 to size=10,000), confirming the saving is pure dispatch overhead removal. Refs pydata#11352. Co-authored-by: Claude <noreply@anthropic.com>
The previous `isinstance(self._data, np.ndarray)` short-circuit incorrectly
returned `self` (skipping the load) for ndarray subclasses with a `chunks`
attribute — test fakes like DummyChunkedArray, or any third-party chunked
array implementation that subclasses ndarray.
Narrow to `isinstance + not hasattr("chunks")` so plain ndarrays and
non-chunked subclasses (MaskedArray, np.matrix) still skip the
to_duck_array dispatch, while subclasses that advertise chunks fall
through to the full path.
Co-authored-by: Claude <noreply@anthropic.com>
Author
|
After some more testing i think this is a pretty marginal improvement. Happy to take a "Not planned" |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Variable.load()andVariable.load_async()always end withself._data = to_duck_array(self._data, ...). For an in-memorynumpy.ndarray, that dispatch walksis_chunked_array, theExplicitlyIndexed | ImplicitToExplicitIndexingAdapterisinstance check, andis_duck_arrayonly to returnself._dataunchanged. The whole call is pure overhead — the same no-op thatIndexVariable.load(xarray/core/variable.py:2781) already short-circuits withreturn self.This PR adds an
isinstance(self._data, np.ndarray)guard at the top of bothVariable.loadandVariable.load_async. Behavior is unchanged on chunked,ExplicitlyIndexed, and non-numpy duck-array inputs.Where this fires
Narrower than #11354 — this fast-path only fires when
Variable.load()itself runs. Every caller ofVariable.load()gets a per-variable saving on numpy-backed data:Top-level
.load()/.compute():ds.load(),da.load(),ds.compute(),da.compute(),xr.load_dataset(...),xr.load_dataarray(...),xr.load_datatree(...),da.persist()Dataset.loadfinishes its dict comprehension with[v.load() for k, v in self.variables.items() if k not in chunked_data](xarray/core/dataset.py:577) — every non-chunked variable goes throughVariable.load().DataTree.load(xarray/core/datatree.py:2482) walks every node's variables and callsv.load()on each.Backend writers:
xarray/backends/writers.py:768— the zarr writer materializes zero-size variables withv.load()(a workaround for dask<2023.12.1). On numpy-backed empty/scalar dims, this is the onlyVariable.load()call in the write path.Concat with
compat="...":xarray/structure/concat.py:489— when xarray must decide whether two variables are equal across input datasets, it callsdatasets[0].variables[k].load()(and.compute()on the others). On numpy-backed inputs theload()is pure dispatch overhead.Not affected:
da.values,da.to_numpy(),da.to_dataframe(),da.to_pandas(),da.plot(), repr — these go throughto_duck_arraydirectly withoutVariable.load, so they benefit from perf(load): short-circuit is_chunked_array for numpy arrays #11354 but not from this PR.AbstractDataStore.load(used inxarray/conventions.py:578) — same name, different method; not touched here.Benchmark numbers
isel(...).load()on synthetic scalar-var datasets, againstmain, best of 5, GC off:Scaling check across per-variable data size (200 vars fixed): flat ~1.56× speedup from
size=0tosize=10,000, confirming the saving is pure dispatch overhead — not work-per-element.Note on overlap with #11354
#11354 makes
is_chunked_array(numpy)near-free. This PR skips the entireto_duck_arraybody for numpyVariable._data, which makes theis_chunked_arraycall inside it dead code on that path. The two PRs are still complementary, not redundant:Dataset.loadcallsis_chunked_array(v._data)in its dict comprehension (xarray/core/dataset.py:563) before dispatching toVariable.load— that call is only sped up by perf(load): short-circuit is_chunked_array for numpy arrays #11354.Variable.loaditself is what this PR removes.If both land, the per-variable saving on in-memory datasets compounds.
Checklist
ExplicitlyIndexed/ImplicitToExplicitIndexingAdapteradapters are notnp.ndarrayinstances and still take theto_duck_arraypath; dask arrays likewise hit the existing chunked-compute branch.pytest xarray/tests/test_variable.py xarray/tests/test_dataset.py— 1051 passed, 76 skipped, 9 xfailed, 4 xpasseddoc/whats-new.rstentry under Internal ChangesAI Disclosure
Tools: Claude (Claude Code)
[This is Claude Code on behalf of Felix Bumann]