perf(load): short-circuit is_chunked_array for numpy arrays #11354

Open
FBumann wants to merge 3 commits into pydata:main from FBumann:perf/load-chunked-check-overhead

@FBumann FBumann commented May 25, 2026

Description

Dataset.load() / DataArray.load() walk every variable and call is_chunked_array(...) on each — first inside the dict comprehension in Dataset.load (xarray/core/dataset.py:563), then again per variable via Variable.load -> to_duck_array. is_chunked_array itself called is_duck_array twice (once directly, once via is_duck_dask_array).

For the dominant case — a numpy.ndarray — none of that work is needed. This PR:

  1. Adds an np.ndarray fast path returning False immediately.
  2. Consolidates to a single is_duck_array(x) check feeding both the dask-collection branch (now via is_dask_collection directly) and the hasattr(x, "chunks") fallback.

Behavior is unchanged on non-numpy inputs: any duck-array with a dask graph or a chunks attribute is still reported as chunked.
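
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: `is_dask_collection` here is a hypothetical stand-in (the real check comes from dask), `FakeChunked` is an invented duck array, and the fast path uses the narrowed `isinstance` plus `not hasattr(x, "chunks")` form described in the later commit message.

```python
from typing import Any

import numpy as np


def is_duck_array(value: Any) -> bool:
    """Stand-in mirroring the duck-array check quoted in the review below."""
    if isinstance(value, np.ndarray):
        return True
    return (
        hasattr(value, "ndim")
        and hasattr(value, "shape")
        and hasattr(value, "dtype")
        and (
            (hasattr(value, "__array_function__") and hasattr(value, "__array_ufunc__"))
            or hasattr(value, "__array_namespace__")
        )
    )


def is_dask_collection(value: Any) -> bool:
    """Hypothetical stand-in: anything exposing __dask_graph__ counts as a dask collection."""
    return hasattr(value, "__dask_graph__")


def is_chunked_array(x: Any) -> bool:
    # Fast path: a numpy array that does not advertise chunks is never chunked.
    if isinstance(x, np.ndarray) and not hasattr(x, "chunks"):
        return False
    # Single duck-array check feeding both remaining branches.
    if not is_duck_array(x):
        return False
    return is_dask_collection(x) or hasattr(x, "chunks")


class FakeChunked:
    """Invented duck array that advertises a chunks attribute."""

    ndim = 1
    shape = (4,)
    dtype = np.dtype("float64")
    chunks = ((2, 2),)

    def __array_namespace__(self):
        return np


print(is_chunked_array(np.arange(3)))
print(is_chunked_array(FakeChunked()))
```

In this sketch a plain `np.ndarray` returns on the first line of the function, while the invented `FakeChunked` object still goes through the full check and is reported as chunked.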

Where this fires

is_chunked_array is called from ~25 places across xarray — every site that branches "dask vs. numpy". On numpy-backed data, all of them now skip the dispatch chain. Notable categories:

  • Materialization: ds.load(), da.load(), compute(), xr.load_dataset/dataarray/datatree, persist(), .values, .to_numpy(), .to_dataframe(), .to_pandas(), plotting, repr previews.
  • CF encode/decode — every variable read or written: coding/times.py:1037,1249, coding/strings.py:182,221, coding/common.py:104.
  • Numerical paths: apply_ufunc (computation/apply_ufunc.py:739,763,875), corr/cov/polyval/polyfit, interp, interpolate_na, reductions.
  • Structural ops: Dataset.chunk(), interp, vectorized indexer dispatch (isel/sel), Variable._shuffle, contains_only_chunked_or_numpy.
  • Groupby internals: groupers.py:121,242,427, groupby.py:351,557,679.
  • Backends: backends/common.py:400 (ArrayWriter.add).

Not affected: arithmetic on lazy/dask objects (stays lazy); arithmetic on numpy-backed objects (already pure numpy, never reached is_chunked_array).

Benchmark numbers

asv_bench/benchmarks/indexing.py::Indexing.time_indexing_basic_ds_large (added in #9003 for this exact concern), best of 5×50 iterations, GC off:

|         | per call |
|---------|----------|
| main    | 0.524 ms |
| this PR | 0.335 ms |
| speedup | ~1.56×   |
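
A "best of 5×50, GC off" measurement like the one above could be reproduced with something along these lines. This is a sketch under assumptions: the PR does not say which harness was used, and `best_per_call` and `speedup` are hypothetical helper names. The standard-library `timeit` module disables garbage collection while timing by default.

```python
import timeit


def best_per_call(func, repeat: int = 5, number: int = 50) -> float:
    """Best per-call time in seconds over `repeat` runs of `number` calls each."""
    # timeit turns garbage collection off during each timed run by default.
    totals = timeit.repeat(func, repeat=repeat, number=number)
    return min(totals) / number


def speedup(base_ms: float, branch_ms: float) -> float:
    """Ratio of the baseline per-call time to the branch per-call time."""
    return base_ms / branch_ms


# Placeholder workload; the benchmark in the PR times isel(...).load() instead.
seconds = best_per_call(lambda: sum(range(1000)))
print(f"{seconds * 1e3:.4f} ms per call")

# Ratio of the two per-call times reported in the table.
print(f"{speedup(0.524, 0.335):.2f}x")
```

Taking the minimum over the repeats is one common way to reduce noise from other processes; the per-call times in the table give 0.524 / 0.335 ≈ 1.56.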

Scaling check on synthetic isel().load() workloads:

  • Speedup scales with variable count — 1.20× at 50 vars → 1.29× at 2000 vars — as expected for a per-variable saving.
  • Speedup is flat across per-var size (~1.24× from size=0 to size=10,000), confirming the saving is pure dispatch overhead, not work-per-element.

Profiler attribution previously showed ~25% of load() wall time inside the is_chunked_array dispatch chain; that portion is now near-free.

Checklist

  • Tests covering chunked paths preserved — any duck-array with a chunks attribute or a dask graph is still reported as chunked
  • pytest xarray/tests/test_variable.py — 544 passed, 69 skipped, 8 xfailed, 3 xpassed
  • doc/whats-new.rst entry under Internal Changes

Follow-up

A complementary PR — skipping the entire Variable.load dispatch on numpy data — will follow.

AI Disclosure

  • This PR contains AI-generated content.
    • I have tested any AI-generated content in my PR.
    • I take responsibility for any AI-generated content in my PR.

Tools: Claude (Claude Code)


[This is Claude Code on behalf of Felix Bumann]

FBumann and others added 2 commits May 23, 2026 19:21
For datasets with many variables, Dataset.load() called is_chunked_array
once per variable in its dict comprehension, then again per variable via
Variable.load() -> to_duck_array(). The function itself called
is_duck_array twice (once directly, once via is_duck_dask_array).

Add a numpy fast-path and consolidate the duck-array check to one call.
For non-numpy inputs the behavior is unchanged: any duck-array with a
dask graph or a `chunks` attribute is still reported as chunked.

Measured on isel(...).load() of a 400-scalar-var dataset
(asv_bench/benchmarks/indexing.py::Indexing.time_indexing_basic_ds_large):

    base:   0.524 ms / call   (best of 5x50, GC off)
    branch: 0.335 ms / call   ~1.56x

Profile attribution previously showed ~25% of the load wall time inside
the is_chunked_array dispatch chain; that portion is now near-free.

Closes #2 on the fork.

Co-authored-by: Claude <noreply@anthropic.com>
@github-actions github-actions Bot added the topic-NamedArray Lightweight version of Variable label May 25, 2026
The previous `isinstance(x, np.ndarray)` short-circuit incorrectly
returned False for ndarray subclasses with a `chunks` attribute (e.g.
DummyChunkedArray in test_parallelcompat.py, or any third-party chunked
array implementation that subclasses ndarray), breaking chunked-array
detection on those types.

Narrow the fast path to `isinstance + not hasattr("chunks")` so plain
ndarrays and non-chunked subclasses (MaskedArray, np.matrix) still skip
the duck-array dispatch, while subclasses that advertise chunks fall
through to the full check.

Co-authored-by: Claude <noreply@anthropic.com>
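
The regression described in this commit message can be illustrated with a small sketch. `ChunkedNdarray` is an invented subclass used only for illustration (it is not the DummyChunkedArray from the test suite), and the two helper functions are hypothetical names for the first-commit and narrowed conditions.

```python
import numpy as np


class ChunkedNdarray(np.ndarray):
    """Invented ndarray subclass that advertises a chunks attribute."""

    @property
    def chunks(self):
        return (self.shape,)


def naive_fast_path(x) -> bool:
    """First-commit condition: short-circuit for any ndarray instance."""
    return isinstance(x, np.ndarray)


def narrowed_fast_path(x) -> bool:
    """Narrowed condition: short-circuit only when no chunks attribute is present."""
    return isinstance(x, np.ndarray) and not hasattr(x, "chunks")


plain = np.zeros(4)
masked = np.ma.masked_array([1, 2, 3])
chunked = np.zeros(4).view(ChunkedNdarray)

for name, arr in [("plain", plain), ("masked", masked), ("chunked", chunked)]:
    print(name, naive_fast_path(arr), narrowed_fast_path(arr))
```

With the naive condition the chunk-advertising subclass would be short-circuited as "not chunked"; with the narrowed condition it falls through to the full check, while plain and masked arrays still take the fast path.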
Contributor

@dcherian dcherian left a comment

Great change. Thanks!

Contributor

@Illviljan Illviljan left a comment

I think this can be done with is_duck_array; it already has the numpy short-circuit.
Triggering isinstance twice for eager (and lazy) arrays seems wasteful too.

    def is_duck_array(value: Any) -> TypeGuard[duckarray[Any, Any]]:
        # TODO: replace is_duck_array with runtime checks via _arrayfunction_or_api protocol on
        # python 3.12 and higher (see https://github.com/pydata/xarray/issues/8696#issuecomment-1924588981)
        if isinstance(value, np.ndarray):
            return True
        return (
            hasattr(value, "ndim")
            and hasattr(value, "shape")
            and hasattr(value, "dtype")
            and (
                (hasattr(value, "__array_function__") and hasattr(value, "__array_ufunc__"))
                or hasattr(value, "__array_namespace__")
            )
        )
