diff --git a/.claude/sweep-metadata-state.csv b/.claude/sweep-metadata-state.csv index 8e41af0ad..60c7328df 100644 --- a/.claude/sweep-metadata-state.csv +++ b/.claude/sweep-metadata-state.csv @@ -4,4 +4,5 @@ polygonize,2026-05-19,2149,MEDIUM,1,"Audited 2026-05-19 (agent-ad1070530d37a4fdf rasterize,2026-05-27,2504,HIGH,4,"rasterize() drops like.attrs, rebuilds like.coords via linspace (not bit-identical), and never emits _FillValue/nodatavals even when fill is non-NaN. Cat 1 HIGH: chained pipelines like slope(rasterize(gdf, like=elevation)) silently lose crs/res/transform. Cat 2 MEDIUM: linspace round-trip from re-derived bounds breaks xr.align with like. Cat 4 MEDIUM: rasterize(..., fill=-9999, dtype=int32) emits no _FillValue. All 4 backends share the same final return so the fix is one place. Fixed in deep-sweep-metadata-rasterize-2026-05-17-01 (worktree agent-ab7a9aee97c1e4cdf): _extract_grid_from_like now returns coords/attrs; rasterize() reuses like.coords directly when grid matches, copies like.attrs, and emits _FillValue + nodatavals when fill is not NaN. 9 new tests in TestMetadataPropagation cover attrs propagation, bit-identical coord reuse, fill-value emission, isolation from template attrs, and parity across numpy/cupy/dask+numpy/dask+cupy backends. Full test suite (193 passing) clean. | Re-audited 2026-05-21 (agent-a645dc07f847ae8ae worktree, branch deep-sweep-metadata-rasterize-2026-05-21). 4-backend (numpy/cupy/dask+numpy/dask+cupy) metadata parity reverified: all 4 backends route through the same final xr.DataArray constructor in rasterize(); crs / spatial_ref non-dim coord / coords / dims agree across backends. NEW HIGH finding #2251 (Cat 1): when rasterize(geoms, like=template, bounds=..., width=..., height=..., resolution=...) overrides the grid relative to like, the inherited attrs['transform'] and attrs['res'] from like are propagated unchanged so they describe the template's grid, not the actual output. get_dataarray_resolution() prefers attrs['res'] over calc_res from coords, so downstream slope/aspect/proximity see the wrong cellsize. Same class as #1407 sky_view_factor bug. Fix in rasterize(): out_attrs.pop('res') / out_attrs.pop('transform') when like_attrs is present but reuse_like_coords is False (output grid != template grid). Preserves crs / nodata triplet / spatial_ref handling. 9 new tests in TestLikeStaleGridAttrs2251 cover bounds override, width/height override, resolution override, matching width/height preserves attrs, get_dataarray_resolution consistency, and parity across all 4 backends. Full rasterize test suite (224 passed, 2 skipped) clean. | Re-audited 2026-05-27 (agent-ae44e871ba3e6bc50 worktree, branch deep-sweep-metadata-rasterize-2026-05-27). 4-backend (numpy/cupy/dask+numpy/dask+cupy) metadata parity reverified end-to-end with explicit cupy and dask+cupy live runs on the CUDA host. attrs / coords / dims / non-dim coords (spatial_ref) all agree across backends; the existing TestMetadataPropagation and TestLikeStaleGridAttrs2251 suites still pass cleanly. NEW HIGH finding #2504 (Cat 4): rasterize(..., dtype=) with the default fill=np.nan silently coerced NaN to a platform-specific sentinel (INT_MIN on x86, 0 on Apple Silicon, 0 for unsigned dtypes) and emitted no _FillValue / nodata / nodatavals attr to mark unwritten pixels. Downstream consumers (geotiff writer, rioxarray masks) had no sentinel to key off and treated unwritten cells as legitimate burns -- a metadata propagation failure equivalent in shape to #1407. Fix in rasterize() before any host/device allocation: detect NaN fill against an integer final_dtype via np.issubdtype + float(fill) + np.isnan and raise ValueError with a pointer to fill=0/fill=-9999 or a floating dtype. Same guard fires on all 4 backends because it runs before backend dispatch. 18 new tests in test_rasterize_nan_int_fill_2504.py cover every signed/unsigned int width, the like= branch, all 4 backends, explicit-vs-default NaN, numpy-typed NaN, and the unaffected float-dtype path. The previous TestIntegerDtypeNanFill test (which had pinned the silent cast as observed behaviour on 2026-05-17) was rewritten to pin the raise. Full rasterize test suite (476 passed, 2 skipped) clean." reproject,2026-05-10,1572;1573,HIGH,1;3;4,geoid_height_raster dropped input attrs and used dims[-2:] for 3D inputs (#1572). reproject/merge ignored nodatavals (rasterio convention) when rioxarray absent (#1573). Fixed in same branch. resample,2026-05-27,2542,MEDIUM,2;4;5,"Audited 2026-05-27 (agent-a8135a6a246ecb93c worktree, branch deep-sweep-metadata-resample-2026-05-27). Cat 2 MEDIUM + Cat 4 MEDIUM + Cat 5 MEDIUM all rolled into issue #2542. (a) 2D non-identity path dropped scalar non-dim coords like rioxarrays spatial_ref and squeezed time/band selectors; identity path (scale==1.0, agg.copy()) and 3D path (per-band xr.concat) preserved them, so the bug was path-inconsistent (Cat 5). (b) _resolve_nodata reads attrs[nodata] as a fallback sentinel but the output post-processing only refreshed _FillValue and nodatavals, leaving attrs[nodata]=-9999 alongside data that was now NaN. Fix in resample(): refresh attrs[nodata] to NaN whenever the input had it, and carry across zero-dim non-dim coords on the 2D non-identity path. 7 new tests in TestMetadataPropagation cover nodata-attr refresh, spatial_ref/scalar coord carry, identity-vs-downsample coord parity, and the explicit choice to drop spatially-shaped extra coords. 4-backend (numpy/cupy/dask+numpy/dask+cupy) parity verified for spatial_ref carry; nodata-attr refresh verified on numpy/cupy/dask+numpy (dask+cupy non-NaN nodata masking hits a pre-existing xarray xr.where + cupy.astype quirk unrelated to this audit). Full resample test suite (175 passed) clean." +viewshed,2026-05-29,2743,MEDIUM,4;5,output .name differed across backends (None/viewshed/dask-token) and dtype float32 on GPU vs float64 on CPU; added name= param and forced float64 on all backends; attrs/coords/dims already preserved zonal,2026-05-29,2611,MEDIUM,5,"Audited 2026-05-29 (agent-ae8d8b65cc3a5c40a worktree, branch deep-sweep-metadata-zonal-2026-05-29). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live. 5 DataArray-returning functions checked end-to-end: apply, regions, hypsometric_integral, trim, crop. attrs (res/crs/transform/nodatavals), dims, and coords preserved correctly on all 4 backends for every function; trim/crop slice coords with no half-pixel drift. stats() and crosstab() return DataFrames by design so Cat 1-3 DataArray checks N/A. NEW MEDIUM finding #2611 (Cat 5): apply() never set output .name, so numpy/cupy returned None while dask+numpy/dask+cupy inherited a non-deterministic internal dask task name (e.g. _chunk_fn-). regions/hypsometric_integral/trim/crop all set deterministic names; apply was the outlier. Fix in PR #2611/#2622: add name param (default None) and assign result.name after DataArray construction (setting name= at construction does not override the dask graph name). New parametrized test test_apply_name_consistent_across_backends covers default-None and explicit-name on all 4 backends. Full zonal suite 213 passed. No other CRITICAL/HIGH/MEDIUM findings; no LOW findings to document." diff --git a/xrspatial/gpu_rtx/viewshed.py b/xrspatial/gpu_rtx/viewshed.py index d4c38fb7d..dc1093596 100644 --- a/xrspatial/gpu_rtx/viewshed.py +++ b/xrspatial/gpu_rtx/viewshed.py @@ -2,7 +2,7 @@ # that the required dependent libraries are installed. import math -from typing import Union +from typing import Optional, Union import cupy import numba as nb @@ -187,6 +187,7 @@ def _viewshed_rt( observer_elev: float, target_elev: float, scale: float, + name: Optional[str] = 'viewshed', ) -> xr.DataArray: H, W = raster.shape @@ -257,9 +258,12 @@ def _viewshed_rt( else: visgrid = d_visgrid + # Emit float64 to match the CPU backends (the RT kernel works in float32) + visgrid = visgrid.astype(np.float64) + view = xr.DataArray( visgrid, - name="viewshed", + name=name, coords=raster.coords, dims=raster.dims, attrs=raster.attrs) @@ -273,6 +277,7 @@ def viewshed_gpu( y: Union[int, float], observer_elev: float, target_elev: float, + name: Optional[str] = 'viewshed', ) -> xr.DataArray: if not isinstance(raster.data, cupy.ndarray): raise TypeError("raster.data must be a cupy array") @@ -283,4 +288,5 @@ def viewshed_gpu( optix = RTX() scale = create_triangulation(raster, optix) - return _viewshed_rt(raster, optix, x, y, observer_elev, target_elev, scale) + return _viewshed_rt(raster, optix, x, y, observer_elev, target_elev, + scale, name) diff --git a/xrspatial/tests/test_viewshed.py b/xrspatial/tests/test_viewshed.py index c821a90e5..9534de55d 100644 --- a/xrspatial/tests/test_viewshed.py +++ b/xrspatial/tests/test_viewshed.py @@ -459,3 +459,73 @@ def test_viewshed_cpu_memory_guard_passes_with_max_distance(): v = viewshed(raster, x=50.0, y=50.0, observer_elev=5, max_distance=3.0) assert v.values[50, 50] == 180.0 + + +def _make_raster(backend): + """Build a small viewshed input raster for the given backend.""" + arr = np.array([ + [0, 0, 1, 0, 0], + [1, 3, 0, 0, 0], + [10, 2, 5, 2, -1], + [11, 1, 2, 9, 0]], dtype=np.float64) + xs = np.linspace(1, 5, 5) + ys = np.linspace(1, 4, 4) + attrs = {'res': (1.0, 1.0), 'crs': 'EPSG:4326'} + + if backend == "numpy": + data = arr + elif backend == "dask+numpy": + data = da.from_array(arr, chunks=(2, 3)) + elif backend == "cupy": + import cupy as cp + data = cp.asarray(arr) + elif backend == "dask+cupy": + import cupy as cp + data = da.from_array(cp.asarray(arr), chunks=(2, 3)) + else: + raise ValueError(backend) + + return xa.DataArray(data, dims=['y', 'x'], + coords={'y': ys, 'x': xs}, + attrs=attrs, name='elevation') + + +_METADATA_BACKENDS = [ + "numpy", + "dask+numpy", + pytest.param("cupy", marks=pytest.mark.skipif( + not has_rtx(), reason="requires rtxpy for the GPU viewshed path")), + pytest.param("dask+cupy", marks=pytest.mark.skipif( + not has_rtx(), reason="requires rtxpy for the GPU viewshed path")), +] + + +@pytest.mark.parametrize("backend", _METADATA_BACKENDS) +@pytest.mark.parametrize("max_distance", [None, 3.0]) +def test_viewshed_output_name_and_dtype_consistent(backend, max_distance): + """Output .name and dtype must not depend on the backend (#2743). + + The default name is 'viewshed' on every backend, and the output dtype + is float64 everywhere (the GPU path internally works in float32). + """ + raster = _make_raster(backend) + result = viewshed(raster, x=3, y=2, observer_elev=1, + max_distance=max_distance) + + assert result.name == "viewshed" + assert result.dtype == np.float64 + assert result.dims == raster.dims + # attrs/coords still pass through + assert result.attrs == raster.attrs + np.testing.assert_allclose(result.coords['x'].data, + raster.coords['x'].data) + np.testing.assert_allclose(result.coords['y'].data, + raster.coords['y'].data) + + +@pytest.mark.parametrize("backend", _METADATA_BACKENDS) +def test_viewshed_custom_name(backend): + """A user-supplied name is honoured on every backend.""" + raster = _make_raster(backend) + result = viewshed(raster, x=3, y=2, observer_elev=1, name="my_vs") + assert result.name == "my_vs" diff --git a/xrspatial/viewshed.py b/xrspatial/viewshed.py index b5d210192..edf6a2fa3 100644 --- a/xrspatial/viewshed.py +++ b/xrspatial/viewshed.py @@ -2,7 +2,7 @@ from math import atan, atan2, fabs from math import pi as PI from math import sqrt -from typing import Union +from typing import Optional, Union import numpy as np import xarray @@ -1510,6 +1510,7 @@ def _viewshed_cpu( y: Union[int, float], observer_elev: float = OBS_ELEV, target_elev: float = TARGET_ELEV, + name: Optional[str] = 'viewshed', ) -> xarray.DataArray: height, width = raster.shape @@ -1598,6 +1599,7 @@ def _viewshed_cpu( visibility_grid) visibility = xarray.DataArray(viewshed_img, + name=name, coords=raster.coords, attrs=raster.attrs, dims=raster.dims) @@ -1609,7 +1611,8 @@ def viewshed(raster: xarray.DataArray, y: Union[int, float], observer_elev: float = OBS_ELEV, target_elev: float = TARGET_ELEV, - max_distance: float = None) -> xarray.DataArray: + max_distance: float = None, + name: Optional[str] = 'viewshed') -> xarray.DataArray: """ Calculate viewshed of a raster (the visible cells in the raster) for the given viewpoint (observer) location. @@ -1634,6 +1637,9 @@ def viewshed(raster: xarray.DataArray, evaluated. When set and the raster is dask-backed, only the chunks within the distance window are loaded — this is the most efficient way to run viewshed on very large dask rasters. + name : str, default='viewshed' + Name of the output DataArray. Set on every backend so the + result name does not depend on which backend ran. Returns ------- @@ -1709,26 +1715,28 @@ def viewshed(raster: xarray.DataArray, # --- max_distance: extract spatial window for any backend --- if max_distance is not None: return _viewshed_windowed(raster, x, y, observer_elev, target_elev, - max_distance) + max_distance, name) if isinstance(raster.data, np.ndarray): - return _viewshed_cpu(raster, x, y, observer_elev, target_elev) + return _viewshed_cpu(raster, x, y, observer_elev, target_elev, name) elif has_cuda_and_cupy() and is_cupy_array(raster.data): if has_rtx(): # Run on gpu from .gpu_rtx.viewshed import viewshed_gpu - return viewshed_gpu(raster, x, y, observer_elev, target_elev) + return viewshed_gpu(raster, x, y, observer_elev, target_elev, name) else: # Convert to numpy and run on cpu import cupy as cp raster.data = cp.asnumpy(raster.data) - return _viewshed_cpu(raster, x, y, observer_elev, target_elev) + return _viewshed_cpu(raster, x, y, observer_elev, target_elev, + name) elif has_dask_array(): import dask.array as da if isinstance(raster.data, da.Array): - return _viewshed_dask(raster, x, y, observer_elev, target_elev) + return _viewshed_dask(raster, x, y, observer_elev, target_elev, + name) raise TypeError(f"Unsupported raster array type: {type(raster.data)}") @@ -2049,7 +2057,7 @@ def _viewshed_distance_sweep(dask_data, H, W, obs_r, obs_c, def _viewshed_windowed(raster, x, y, observer_elev, target_elev, - max_distance): + max_distance, name='viewshed'): """Run viewshed on a spatial window around the observer. Works for any backend: numpy, cupy, dask+numpy, dask+cupy. The window @@ -2146,11 +2154,11 @@ def _viewshed_windowed(raster, x, y, observer_elev, target_elev, full_vis = np.full((height, width), INVISIBLE, dtype=np.float64) full_vis[r_lo:r_hi, c_lo:c_hi] = local_vals - return xarray.DataArray(full_vis, coords=raster.coords, + return xarray.DataArray(full_vis, name=name, coords=raster.coords, dims=raster.dims, attrs=raster.attrs) -def _viewshed_dask(raster, x, y, observer_elev, target_elev): +def _viewshed_dask(raster, x, y, observer_elev, target_elev, name='viewshed'): """Dask-backed viewshed (no max_distance — handled by caller). Two-tier strategy: @@ -2202,8 +2210,10 @@ def _viewshed_dask(raster, x, y, observer_elev, target_elev): observer_elev, target_elev) result_np = result.data if isinstance(result.data, np.ndarray) \ else result.data.get() + # GPU path returns float32; emit float64 to match the CPU backends + result_np = result_np.astype(np.float64, copy=False) vis_da = da.from_array(result_np, chunks=raster.data.chunks) - return xarray.DataArray(vis_da, coords=raster.coords, + return xarray.DataArray(vis_da, name=name, coords=raster.coords, dims=raster.dims, attrs=raster.attrs) # --- Tier C: out-of-core distance sweep (CPU only) --- @@ -2241,5 +2251,5 @@ def _viewshed_dask(raster, x, y, observer_elev, target_elev): ) vis_da = da.from_array(visibility, chunks=raster.data.chunks) - return xarray.DataArray(vis_da, coords=raster.coords, + return xarray.DataArray(vis_da, name=name, coords=raster.coords, dims=raster.dims, attrs=raster.attrs)