Skip to content

proximity/allocation/direction: consistent output dtype and .name across backends#2728

Open
brendancol wants to merge 3 commits into
mainfrom
deep-sweep-metadata-proximity-2026-05-29-01
Open

proximity/allocation/direction: consistent output dtype and .name across backends#2728
brendancol wants to merge 3 commits into
mainfrom
deep-sweep-metadata-proximity-2026-05-29-01

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #2723

proximity, allocation, and direction returned backend-dependent output metadata. This makes all four backends agree.

  • dtype: the bounded dask+numpy path declared float64 output meta via da.map_overlap(..., meta=np.array(())), even though the chunk function returns float32 and every other backend/path returns float32. Set the meta dtype to float32.
  • .name: dask backends leaked an internal dask op name (_trim-<hash>, _kdtree_chunk_fn-<hash>, asarray-<hash>) into the result .name, while numpy/cupy return None. Reset .name to None after building the result so all backends match. (xarray ignores a name=None kwarg when the data is a named dask array, so the reset has to happen after construction.)

attrs, coords, and dims were already preserved and stay that way.

Verified end-to-end on all four backends, both bounded and unbounded max_distance, for all three functions.

Test plan:

  • New parametrized regression test asserts declared dtype float32 and .name is None across all four backends, both bounded and unbounded, for proximity/allocation/direction.
  • Full test_proximity.py suite passes (93 tests).

)

The bounded dask+numpy path declared float64 output meta while the chunk
function returns float32, disagreeing with numpy, cupy, dask+cupy, and the
unbounded KDTree path. Set the map_overlap meta dtype to float32.

Dask backends also leaked an internal dask op name (e.g. _trim-<hash>) into
the result .name, where numpy/cupy return None. Reset .name to None in
proximity/allocation/direction so all four backends agree.

Add a parametrized regression test asserting declared dtype float32 and
.name None for all three functions, both bounded and unbounded, on every
backend.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 29, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: consistent output dtype and .name across backends

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

None.

Nits (optional improvements)

  • test_proximity.py:299 - the new test asserts result.attrs == test_raster.attrs, but the test_raster fixture has empty attrs, so that line only checks empty == empty. It still guards against attrs being dropped, and the existing checks already cover attr preservation with the same fixture. Fine as-is; noting it only.

What looks good

  • The dtype fix hits the exact root cause. The bounded dask+numpy map_overlap declared float64 meta while the chunk function returns float32. meta=np.array((), dtype=np.float32) lines the declared dtype up with the real data and with the other three backends.
  • The .name = None reset is the right call. xarray ignores a name=None kwarg when the data is a named dask array, so resetting after construction is the only thing that clears the leaked dask op name. The inline comment says why.
  • Changes are surgical: three identical two-line resets plus one meta dtype, nothing else touched.
  • The regression test covers the bounded path (max_distance=10) and the unbounded/KDTree path (np.inf), across all four backends and all three functions. It checks the declared (pre-compute) dtype rather than the post-compute dtype, which is what general_output_checks already covers and what let this bug through.

Checklist

  • Output dtype consistent across backends (float32)
  • .name consistent across backends (None)
  • attrs / coords / dims preserved (unchanged by this PR)
  • Bounded and unbounded dask paths both covered by tests
  • No premature materialization or unnecessary copies
  • No benchmark needed (metadata-only fix)
  • No README/docstring change needed (no API change)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

proximity/allocation/direction: output dtype and .name differ across backends

1 participant