[DOCS] Draft release post for SedonaDB 0.4.0 by paleolimbot · Pull Request #3064 · apache/sedona

paleolimbot · 2026-06-19T18:08:25Z

Did you read the Contributor Guide?

Yes, I have read the Contributor Rules and Contributor Development Guide

Is this PR related to a ticket?

No:
- this is a documentation update. The PR name follows the format [DOCS] my subject

What changes were proposed in this PR?

Added a (draft) post for the forthcoming SedonaDB release candidate

How was this patch tested?

Just docs

Did this PR include necessary documentation updates?

Yes, I have updated the documentation.

paleolimbot · 2026-06-19T18:09:49Z

+In addition to data frame operators, we increasingly realized that our hard-won library of 170+ spatial functions was difficult to explore and use (despite improved [SQL reference documentation](https://sedona.apache.org/sedonadb/latest/reference/sql/)!). Following the pattern of [Pandas-style datatype-specific accessors](https://pandas.pydata.org/docs/reference/series.html#accessors), you can now write expressions as chains with inline documentation helping you as you go.
+
+
+```python
+countries.select(
+    countries.name, geometry=countries.geometry.geo.centroid().geo.buffer(0.1)
+).limit(4)
+```


For this we could also have a screen cap video of what this looks like to type (where the functions and their parameters are shown as you go)
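
For readers unfamiliar with the accessor pattern, here is a minimal sketch of how a type-specific namespace makes chains like the one above possible. The `Expr` and `GeoAccessor` classes are hypothetical stand-ins that only build SQL text; they are not SedonaDB's actual classes, and the mapping to `ST_Centroid` / `ST_Buffer` is assumed for illustration.

```python
class Expr:
    """Hypothetical stand-in for a column expression; it only records SQL text."""

    def __init__(self, sql: str):
        self.sql = sql

    @property
    def geo(self) -> "GeoAccessor":
        # Geometry-specific methods live in their own namespace, not on Expr,
        # so an editor can list only the functions that apply to geometries.
        return GeoAccessor(self)


class GeoAccessor:
    """Hypothetical geometry namespace reached through Expr.geo."""

    def __init__(self, expr: Expr):
        self._expr = expr

    def centroid(self) -> Expr:
        """Return the centroid of each geometry."""
        return Expr(f"ST_Centroid({self._expr.sql})")

    def buffer(self, distance: float) -> Expr:
        """Buffer each geometry by `distance`."""
        return Expr(f"ST_Buffer({self._expr.sql}, {distance})")


# Each method returns a new Expr, so calls can be chained.
chained = Expr("geometry").geo.centroid().geo.buffer(0.1)
print(chained.sql)  # ST_Buffer(ST_Centroid(geometry), 0.1)
```

Because every accessor method carries its own docstring and signature, an editor can show them as the chain is typed, which is what the proposed screen capture would demonstrate.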

paleolimbot · 2026-06-19T18:10:32Z

+## Packaging for conda-forge
+
+We're excited to announce that sedonadb is now available on conda-forge! Users of the conda ecosystem can now install SedonaDB with:
+
+```shell
+conda install -c conda-forge sedonadb
+```
+
+Thank you to [p-vdp](https://github.com/p-vdp) for driving this work!


@p-vdp Did I get this right / is there anything else that should go in this section?

paleolimbot · 2026-06-19T18:13:37Z

+## GPU-Accelerated Spatial Join
+
+```bash
+docker run -it --rm --gpus all -p 8888:8888 apache/sedona:sedonadb-latest
+```


@pwrliang @jiayuasu Would either of you like to fill in this section? I can give it a try based on the docs but I think you both have a better handle on one good motivating example for this feature.

pwrliang · 2026-06-19 (edited)

Hi, here are my release notes:

SedonaDB now introduces hardware acceleration via an integrated GPU Spatial Join Library. This feature significantly boosts the performance of compute-intensive spatial joins by offloading highly parallel filtering and refinement operations to the GPU.

Key Capabilities & Enhancements

Ray Tracing (RT) Core Acceleration: Repurposes dedicated GPU RT cores to accelerate the bounding-box filtering stage of spatial queries and Point-in-Polygon (PIP) tests. This delivers massive performance gains on complex spatial joins (e.g., intersects, contains). The evaluation of PIP queries is heavily optimized to exploit RT cores, while other geometric operations run on CUDA cores.

GPU-Optimized Storage Layout: Unlike conventional GPU databases that load entire datasets into device memory, SedonaDB only loads geometries in Well-Known Binary (WKB) format to the GPU during query execution. This allows large queries to run efficiently even with limited device memory. The WKB data is subsequently converted into a GPU-friendly format, maximizing memory throughput and enabling parallel random access directly on the device.

CPU Fallback: Currently, only a subset of spatial predicates is supported. When executing an unsupported spatial join, the engine automatically falls back to the CPU implementation.
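
The filter and refinement stages mentioned above can be sketched on the CPU in plain Python. This is only an illustration of the two-stage idea (a cheap bounding-box filter followed by an exact point-in-polygon test); it is not SedonaDB's GPU implementation, and all function names are hypothetical.

```python
def bbox(polygon):
    """Axis-aligned bounding box (xmin, ymin, xmax, ymax) of a vertex ring."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def bbox_contains(box, point):
    """Filter stage: cheap test that may produce false positives."""
    xmin, ymin, xmax, ymax = box
    return xmin <= point[0] <= xmax and ymin <= point[1] <= ymax


def point_in_polygon(point, polygon):
    """Refinement stage: exact test by counting crossings of a rightward ray."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def contains_join(polygons, points):
    """Return (polygon_index, point_index) pairs where the polygon contains the point."""
    boxes = [bbox(p) for p in polygons]
    matches = []
    for pi, polygon in enumerate(polygons):
        for qi, point in enumerate(points):
            if not bbox_contains(boxes[pi], point):
                continue  # rejected by the filter, no exact test needed
            if point_in_polygon(point, polygon):
                matches.append((pi, qi))
    return matches


triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
points = [(1.0, 1.0), (3.0, 3.0), (5.0, 5.0)]
print(contains_join([triangle], points))  # [(0, 0)]
```

The point (3, 3) passes the bounding-box filter but fails refinement, and (5, 5) is rejected by the filter alone. Both stages are independent per candidate pair, which is why they parallelize well.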

Prerequisites & Deployment

By default, the GPU feature is disabled and is not included in the standard published Python packages.

Hardware Requirements:

An NVIDIA GPU with a compute capability of 7.5 or higher. A GPU without RT cores (e.g., A100, H100) should also work.

Quick Start with Docker

We provide an official Docker image to easily try this feature with a single command:

```bash
docker run -it --rm --gpus all -p 8888:8888 apache/sedona:sedonadb-latest
```

⚠️ Note: This pre-built image supports GPU models with compute capabilities 7.5, 8.6, and 8.9.

For other GPU models, we encourage users to build the image from source to avoid time-consuming Just-In-Time (JIT) compilation:

```bash
docker build -f docker/sedonadb-gpu.dockerfile --build-arg CMAKE_CUDA_ARCHITECTURES="<your GPU compute capability>" -t sedonadb-gpu .
```
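
The decision between the pre-built image and a source build can be written down as a small helper. This is a sketch based only on the numbers stated above (minimum 7.5; pre-built support for 7.5, 8.6, and 8.9). The function names are hypothetical, and the assumption that the build argument follows CMake's usual `CUDA_ARCHITECTURES` convention (digits without the dot, e.g. `80` for 8.0) should be checked against the Dockerfile.

```python
# Compute capabilities covered by the pre-built image, per the note above.
PREBUILT_CAPABILITIES = {(7, 5), (8, 6), (8, 9)}
MINIMUM_CAPABILITY = (7, 5)


def parse_capability(text: str):
    """Turn a string such as '8.6' into the tuple (8, 6)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def deployment_advice(capability: str) -> str:
    """Classify a GPU by compute capability."""
    cc = parse_capability(capability)
    if cc < MINIMUM_CAPABILITY:
        return "unsupported"
    if cc in PREBUILT_CAPABILITIES:
        return "prebuilt image"
    return "build from source"


def cmake_architecture(capability: str) -> str:
    """Assumed CMake-style value: '8.0' becomes '80'."""
    major, minor = parse_capability(capability)
    return f"{major}{minor}"


for cc in ["7.0", "8.0", "8.6", "9.0"]:
    print(cc, deployment_advice(cc), cmake_architecture(cc))
```

For example, an A100 (compute capability 8.0) or H100 (9.0) falls into the "build from source" case.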

Usage

Launching the container provides a JupyterLab instance. From there, you can connect to SedonaDB and enable GPU acceleration using the following configuration:

```python
import sedonadb

ctx = sedonadb.connect()

# Enable the GPU feature
ctx.sql("SET gpu.enable = true")

# Increase the batch size to feed sufficient data to the GPU
ctx.sql("SET datafusion.execution.batch_size = 100000")
```

paleolimbot · 2026-06-19T18:16:33Z

+## Raster Infrastructure
+
+While we're not ready to announce that SedonaDB supports raster data, SedonaDB contributors dedicated significant time laying the foundation for first-class raster and ND-array data support, drawing the best from [Sedona Spark's Raster SQL](https://sedona.apache.org/latest/api/sql/Raster-Functions/), [PostGIS Raster Support](https://postgis.net/docs/RT_reference.html), [GDAL](https://gdal.org/), and [Zarr](https://zarr.dev/) with vectorized execution and SedonaDB's ground-up spatial support. We look forward to building this feature in earnest with the community over the next few months!
+
+Thank you to [Kontinuation](https://github.com/Kontinuation) and [james-willis](https://github.com/james-willis) for designing and driving this functionality!


@james-willis @Kontinuation @jiayuasu I made a passing effort at this paragraph but I'm happy to put whatever here if any of you have suggestions!

james-willis · 2026-06-19 (edited)

Revised draft: it now reads a real, public ERA5 rainfall Zarr pyramid (EPSG:3857, anonymous access; the cube is chunked by year and spatially), uses the .rst DataFrame accessor, and shows real output. Verified end-to-end against current main.

N-Dimensional Rasters and Zarr

Geospatial raster data is increasingly a datacube: climate reanalyses, satellite time series, and model outputs all stack extra axes — time, year, band — on top of the spatial grid. In 0.4.0, SedonaDB's raster type goes natively N-dimensional, and the new sedonadb-zarr extension reads Zarr groups (v2 or v3) — local or in cloud object storage — straight into a queryable raster column.

Point SedonaDB at a Zarr datacube and explore its shape without reading a single pixel:

```python
import sedona.db
import sedonadb_zarr

sd = sedona.db.connect()
sd.register(sedonadb_zarr.ZarrExtension())

# A public ERA5 rainfall pyramid (Zarr, anonymous). Reading + inspecting
# dimensions is a metadata-only round-trip: no pixel bytes are fetched.
url = "https://weathermapdata.rdrn.me/era5_2015_2020_l5.zarr/2"
spec = sedonadb_zarr.Zarr().with_options({"arrays": ["rain_ok"]})
cube = sd.read(url, format=spec)

cube.select(
    cube.raster.rst.num_dimensions().alias("ndim"),
    cube.raster.rst.dim_names().alias("dims"),
    cube.raster.rst.shape().alias("shape"),
    cube.raster.rst.srid().alias("srid"),
).show(1)
```

```
┌───────┬──────────────┬───────────────┬────────┐
│ ndim  ┆ dims         ┆ shape         ┆ srid   │
│ int32 ┆ list         ┆ list          ┆ uint32 │
╞═══════╪══════════════╪═══════════════╪════════╡
│ 3     ┆ [year, y, x] ┆ [1, 128, 128] ┆ 3857   │
└───────┴──────────────┴───────────────┴────────┘
```

sedonadb-zarr emits one row per Zarr chunk, so the storage layout is the data layout. This cube splits each year into a 4×4 grid of spatial tiles and stores one year per chunk, so it loads as 16 × 6 = 96 rows (cube.count() confirms this), each holding a single year of one spatial tile. Inspecting dimensions touches only the group schema, so no pixel bytes are read.
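
The row count follows from the usual chunk-grid arithmetic, sketched below. The full array shape is an assumption inferred from the description (6 years, and 4 × 128 = 512 pixels per spatial axis); only the chunk shape `[1, 128, 128]` appears in the output above.

```python
from math import ceil, prod


def chunk_grid(shape, chunks):
    """Number of chunks along each axis for a regularly chunked array."""
    return [ceil(size / chunk) for size, chunk in zip(shape, chunks)]


# Assumed array shape: 6 years (2015-2020) of 512 x 512 pixels.
shape = {"year": 6, "y": 512, "x": 512}
# Chunk shape as reported by rst.shape() for one row.
chunks = {"year": 1, "y": 128, "x": 128}

grid = chunk_grid(shape.values(), chunks.values())
print(grid)        # [6, 4, 4]
print(prod(grid))  # 96
```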

RS_Slice collapses a named axis. Slicing the year dimension hands back a 2-D [y, x] rainfall field:

```python
sliced = cube.select(plane=cube.raster.rst.slice("year", 0))
sliced.select(
    dims=sliced.plane.rst.dim_names(),
    shape=sliced.plane.rst.shape(),
).show(1)
```

```
┌────────┬────────────┐
│ dims   ┆ shape      │
╞════════╪════════════╡
│ [y, x] ┆ [128, 128] │
└────────┴────────────┘
```
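
The effect of collapsing a named axis on the dimension metadata can be mimicked in a few lines of plain Python. This is a sketch of the bookkeeping only; `slice_metadata` is a hypothetical helper, not part of SedonaDB.

```python
def slice_metadata(dims, shape, axis_name):
    """Drop one named axis from parallel lists of dimension names and sizes."""
    if axis_name not in dims:
        raise ValueError(f"unknown dimension: {axis_name!r}")
    keep = [i for i, name in enumerate(dims) if name != axis_name]
    return [dims[i] for i in keep], [shape[i] for i in keep]


dims, shape = slice_metadata(["year", "y", "x"], [1, 128, 128], "year")
print(dims, shape)  # ['y', 'x'] [128, 128]
```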

RS_Slice needs pixels, so SedonaDB resolves each row's Zarr chunk on demand — you never call a loader yourself.

Note: pixel-reading operations like RS_Slice fetch a chunk's bytes on demand, and do so eagerly when the operator runs. We're separately making slice and other "crop" operators lazy — a lightweight view over the chunk, so bytes aren't retrieved until their values are consumed (#813).
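
To make the eager versus lazy distinction concrete, here is a minimal sketch of a lazy slice view that defers the chunk fetch until values are requested. `ChunkStore` and `LazySlice` are hypothetical names used for illustration; the actual design is whatever lands under #813.

```python
class ChunkStore:
    """Hypothetical stand-in for remote chunk storage; counts fetches."""

    def __init__(self, chunks):
        self._chunks = chunks
        self.fetches = 0

    def fetch(self, key):
        self.fetches += 1
        return self._chunks[key]


class LazySlice:
    """A view that remembers which plane to take but fetches nothing yet."""

    def __init__(self, store, key, index):
        self._store = store
        self._key = key
        self._index = index
        self._plane = None

    def values(self):
        # Bytes are retrieved on first use and cached afterwards.
        if self._plane is None:
            self._plane = self._store.fetch(self._key)[self._index]
        return self._plane


# One chunk with shape [1, 2, 2]: a single "year" of a 2 x 2 grid.
store = ChunkStore({"0.0.0": [[[1.0, 2.0], [3.0, 4.0]]]})
view = LazySlice(store, "0.0.0", 0)

print(store.fetches)  # 0: creating the view fetched nothing
print(view.values())  # [[1.0, 2.0], [3.0, 4.0]]
print(store.fetches)  # 1: the chunk was fetched once, on demand
```

An eager slice would correspond to calling `store.fetch` inside `__init__`, which is closer to the current behavior described in the note.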

And because each chunk is a georeferenced row, you can see a Zarr's layout on a map without decoding a pixel. RS_Envelope turns each chunk into its footprint; reproject to lon/lat and the 4×4 chunk grid draws straight onto a map:

```python
from lonboard import viz  # in a notebook

f = sd.funcs
chunks = cube.select(geom=f.st_transform(cube.raster.rst.envelope(), "EPSG:4326"))

# Draw outlines only, so the basemap shows through the chunk grid.
viz(
    chunks,
    polygon_kwargs=dict(
        filled=False,
        stroked=True,
        get_line_color=[236, 64, 160],
        line_width_min_pixels=2,
    ),
)
```

(Figure to attach: the chunk grid drawn as a 4×4 lattice over a world basemap.)
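
As a rough check on what the reprojected footprints should look like, the sketch below computes a 4×4 grid of chunk envelopes in EPSG:3857 and converts one of them to lon/lat with the standard spherical Web Mercator inverse. It assumes the cube covers the full Web Mercator world square, which is not stated above; the helper names are hypothetical and no SedonaDB calls are involved.

```python
import math

EARTH_RADIUS = 6378137.0             # sphere radius used by EPSG:3857, in metres
HALF_WORLD = math.pi * EARTH_RADIUS  # half the width of the Web Mercator square


def mercator_to_lonlat(x, y):
    """Inverse spherical Web Mercator: metres to degrees."""
    lon = math.degrees(x / EARTH_RADIUS)
    lat = math.degrees(2.0 * math.atan(math.exp(y / EARTH_RADIUS)) - math.pi / 2.0)
    return lon, lat


def chunk_envelopes(n):
    """Envelopes (xmin, ymin, xmax, ymax) of an n x n grid, top-left first."""
    size = 2.0 * HALF_WORLD / n
    boxes = []
    for row in range(n):
        for col in range(n):
            xmin = -HALF_WORLD + col * size
            ymax = HALF_WORLD - row * size
            boxes.append((xmin, ymax - size, xmin + size, ymax))
    return boxes


boxes = chunk_envelopes(4)
xmin, ymin, xmax, ymax = boxes[0]  # top-left chunk
west, south = mercator_to_lonlat(xmin, ymin)
east, north = mercator_to_lonlat(xmax, ymax)
print(len(boxes))
print(round(west, 2), round(south, 2), round(east, 2), round(north, 2))
```

Under this assumption the rows of the lattice are not equally tall in degrees of latitude, because Web Mercator stretches toward the poles.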

For the full walkthrough — load a cube, inspect its dimensions, slice a plane, map the chunks, and hand a plane to NumPy — see Working with Zarr and NDArray data in SedonaDB.

We're excited about what shipped here — and we're just getting started. There's more user-facing functionality for N-dimensional rasters and Zarr on the way, and we'd love your input on where it goes next. If you're working with datacubes or cloud-native raster data, open an issue and tell us what you're building and what you need.

@paleolimbot — revised this draft above. It now reads a real public ERA5 rainfall Zarr pyramid (EPSG:3857) instead of the placeholder cube, uses the .rst DataFrame accessor, and includes real output. The two pieces it needed both landed in sedona-db — CF/rioxarray CRS (#985) and the zlib codec (#987) — so it runs end-to-end against current main.

draft release post

eeae889

more

fff66a9

jbampton added the docs label Jun 20, 2026

paleolimbot wants to merge 2 commits into apache:master from paleolimbot:sedonadb-0-3-post
