[DOCS] Draft release post for SedonaDB 0.4.0#3064

Draft
paleolimbot wants to merge 2 commits into apache:master from paleolimbot:sedonadb-0-3-post

Conversation

@paleolimbot

Member

Did you read the Contributor Guide?

Is this PR related to a ticket?

  • No:
    • this is a documentation update. The PR name follows the format [DOCS] my subject

What changes were proposed in this PR?

Added a (draft) post for the forthcoming SedonaDB release candidate

How was this patch tested?

Just docs

Did this PR include necessary documentation updates?

  • Yes, I have updated the documentation.

Comment on lines +111 to +118
In addition to data frame operators, we increasingly realized that our hard-won library of 170+ spatial functions was difficult to explore and use (despite improved [SQL reference documentation](https://sedona.apache.org/sedonadb/latest/reference/sql/)!). Following the pattern of [Pandas-style datatype-specific accessors](https://pandas.pydata.org/docs/reference/series.html#accessors), you can now write expressions as chains with inline documentation helping you as you go.


```python
countries.select(
    countries.name, geometry=countries.geometry.geo.centroid().geo.buffer(0.1)
).limit(4)
```
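
The accessor pattern mentioned above can be illustrated with a minimal, self-contained sketch. This is not SedonaDB's implementation: the `Expr` and `GeoAccessor` classes are hypothetical, and the assumption that `.geo.centroid()` and `.geo.buffer()` correspond to the SQL functions `ST_Centroid` and `ST_Buffer` is only for illustration. The point is that a `geo` property returning a namespace object keeps the spatial methods grouped (and documented via docstrings) while each call still returns a chainable expression.

```python
class Expr:
    """Hypothetical column expression that records the SQL it stands for."""

    def __init__(self, sql: str) -> None:
        self.sql = sql

    @property
    def geo(self) -> "GeoAccessor":
        # The accessor is a namespace: spatial methods live here rather
        # than crowding the top-level Expr class.
        return GeoAccessor(self)


class GeoAccessor:
    """Hypothetical namespace of spatial methods for an expression."""

    def __init__(self, expr: Expr) -> None:
        self._expr = expr

    def centroid(self) -> Expr:
        """Return an expression for the centroid of the geometry."""
        return Expr(f"ST_Centroid({self._expr.sql})")

    def buffer(self, distance: float) -> Expr:
        """Return an expression for the geometry buffered by `distance`."""
        return Expr(f"ST_Buffer({self._expr.sql}, {distance})")


# Each accessor method returns an Expr, so calls chain left to right.
chained = Expr("geometry").geo.centroid().geo.buffer(0.1)
```

Because every method carries a docstring, an editor or notebook can show the available functions and their parameters while the chain is being typed.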

Member Author

For this we could also have a screen cap video of what this looks like to type (where the functions and their parameters are shown as you go)

Comment on lines +45 to +53
## Packaging for conda-forge

We're excited to announce that sedonadb is now available on conda-forge! Users of the conda ecosystem can now install SedonaDB with:

```shell
conda install -c conda-forge sedonadb
```

Thank you to [p-vdp](https://github.com/p-vdp) for driving this work!

Member Author

@p-vdp Did I get this right / is there anything else that should go in this section?

Comment on lines +318 to +322
## GPU-Accelerated Spatial Join

```bash
docker run -it --rm --gpus all -p 8888:8888 apache/sedona:sedonadb-latest
```

Member Author

@pwrliang @jiayuasu Would either of you like to fill in this section? I can give it a try based on the docs but I think you both have a better handle on one good motivating example for this feature.

@pwrliang pwrliang Jun 19, 2026

Hi, here are my release notes:

SedonaDB now introduces hardware acceleration via an integrated GPU Spatial Join Library. This feature significantly boosts the performance of compute-intensive spatial joins by offloading highly parallel filtering and refinement operations to the GPU.

Key Capabilities & Enhancements

  • Ray Tracing (RT) Core Acceleration: Repurposes dedicated GPU RT cores to accelerate the bounding-box filtering stage of spatial queries and Point-in-Polygon (PIP) tests. This delivers massive performance gains on complex spatial joins (e.g., intersects, contains). The evaluation of PIP queries is heavily optimized to exploit RT cores, while other geometric operations run on CUDA cores.
  • GPU-Optimized Storage Layout: Unlike conventional GPU databases that load entire datasets into device memory, SedonaDB only loads geometries in Well-Known Binary (WKB) format to the GPU during query execution. This allows large queries to run efficiently even with limited device memory. The WKB data is subsequently converted into a GPU-friendly format, maximizing memory throughput and enabling parallel random access directly on the device.
  • CPU Fallback: Currently, only a subset of spatial predicates is supported. When a spatial join uses an unsupported predicate, the engine automatically falls back to the CPU implementation.
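
The filter and refinement stages named in the list above can be sketched on the CPU in plain Python. This is only a conceptual illustration under simple assumptions (a nested-loop filter and a ray-casting point-in-polygon test); it is not the GPU library's code, and all function names are hypothetical. The cheap bounding-box test produces candidate pairs, and the exact test then discards false positives.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # ring of (x, y) vertices, implicitly closed
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def bbox(poly: Polygon) -> Box:
    """Axis-aligned bounding box of a polygon ring."""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    return min(xs), min(ys), max(xs), max(ys)


def in_bbox(pt: Point, box: Box) -> bool:
    """Filter stage: cheap test that may return false positives."""
    x, y = pt
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Refinement stage: exact test by counting crossings of a +x ray."""
    x, y = pt
    inside = False
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_refine_join(points: List[Point], polygons: List[Polygon]):
    """Return (candidate pairs, exact matches) as (point_idx, polygon_idx)."""
    boxes = [bbox(p) for p in polygons]
    candidates = [
        (i, j)
        for i, pt in enumerate(points)
        for j, box in enumerate(boxes)
        if in_bbox(pt, box)
    ]
    matches = [
        (i, j) for i, j in candidates if point_in_polygon(points[i], polygons[j])
    ]
    return candidates, matches
```

For a triangle with vertices (0, 0), (4, 0), (0, 4), the point (3, 3) passes the bounding-box filter but is rejected by refinement, which is exactly the kind of false positive the second stage exists to remove.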

Prerequisites & Deployment

By default, the GPU feature is disabled and is not included in the standard published Python packages.

Hardware Requirements:

  • An NVIDIA GPU with a compute capability of $\ge$ 7.5. A GPU without RT cores (e.g., A100, H100) should also work.

Quick Start with Docker

We provide an official Docker image to easily try this feature with a single command:

```bash
docker run -it --rm --gpus all -p 8888:8888 apache/sedona:sedonadb-latest
```

⚠️ Note: This pre-built image supports GPU models with compute capabilities 7.5, 8.6, and 8.9.

For other GPU models, we encourage users to build the image from source to avoid time-consuming Just-In-Time (JIT) compilation:

```bash
docker build -f docker/sedonadb-gpu.dockerfile --build-arg CMAKE_CUDA_ARCHITECTURES="<your GPU compute capability>" -t sedonadb-gpu .
```
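
The choice between the pre-built image and a source build can be written down as a small decision helper. This is a hypothetical sketch, not part of SedonaDB: the capability values come from the notes above, the function names are invented, and the dot-free architecture string (for example `80` for compute capability 8.0) follows CMake's usual `CUDA_ARCHITECTURES` convention, which the Dockerfile is only assumed to accept.

```python
from typing import Tuple

# Values taken from the release notes above.
MIN_CAPABILITY = (7, 5)
PREBUILT_CAPABILITIES = {(7, 5), (8, 6), (8, 9)}


def parse_capability(text: str) -> Tuple[int, int]:
    """Parse a compute capability such as '8.6' into (8, 6)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def deployment_advice(capability: str) -> str:
    """Suggest how to deploy for a given compute capability (hypothetical)."""
    cap = parse_capability(capability)
    if cap < MIN_CAPABILITY:
        return "unsupported"
    if cap in PREBUILT_CAPABILITIES:
        return "prebuilt"
    # Other supported GPUs: build from source to avoid JIT compilation.
    return "build-from-source"


def cmake_arch(capability: str) -> str:
    """Dot-free form used by CMake's CUDA_ARCHITECTURES (assumed format)."""
    major, minor = parse_capability(capability)
    return f"{major}{minor}"
```

For example, an A100 (compute capability 8.0) maps to `build-from-source` with the architecture string `80`.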

Usage

Launching the container provides a JupyterLab instance. From there, you can connect to SedonaDB and enable GPU acceleration using the following configuration:

```python
import sedonadb

ctx = sedonadb.connect()

# Enable the GPU feature
ctx.sql("SET gpu.enable = true")

# Increase the batch size to feed sufficient data to the GPU
ctx.sql("SET datafusion.execution.batch_size = 100000")
```

Comment thread docs/blog/posts/intro-sedonadb-0-4.md
Comment thread docs/blog/posts/intro-sedonadb-0-4.md Outdated
Comment on lines +366 to +370
## Raster Infrastructure

While we're not ready to announce that SedonaDB supports raster data, SedonaDB contributors dedicated significant time laying the foundation for first-class raster and ND-array data support, drawing the best from [Sedona Spark's Raster SQL](https://sedona.apache.org/latest/api/sql/Raster-Functions/), [PostGIS Raster Support](https://postgis.net/docs/RT_reference.html), [GDAL](https://gdal.org/), and [Zarr](https://zarr.dev/) with vectorized execution and SedonaDB's ground-up spatial support. We look forward to building this feature in earnest with the community over the next few months!

Thank you to [Kontinuation](https://github.com/Kontinuation) and [james-willis](https://github.com/james-willis) for designing and driving this functionality!

Member Author

@james-willis @Kontinuation @jiayuasu I made a passing effort at this paragraph, but I'm happy to put whatever you'd prefer here if any of you have suggestions!

@james-willis james-willis Jun 19, 2026

Collaborator

Revised draft — now reads a real, public ERA5 rainfall Zarr pyramid (EPSG:3857, anonymous; the cube is chunked by year and spatially), uses the .rst DataFrame accessor, and shows real output. Verified end-to-end against current main.

N-Dimensional Rasters and Zarr

Geospatial raster data is increasingly a datacube: climate reanalyses, satellite time series, and model outputs all stack extra axes — time, year, band — on top of the spatial grid. In 0.4.0, SedonaDB's raster type goes natively N-dimensional, and the new sedonadb-zarr extension reads Zarr groups (v2 or v3) — local or in cloud object storage — straight into a queryable raster column.

Point SedonaDB at a Zarr datacube and explore its shape without reading a single pixel:

```python
import sedona.db
import sedonadb_zarr

sd = sedona.db.connect()
sd.register(sedonadb_zarr.ZarrExtension())

# A public ERA5 rainfall pyramid (Zarr, anonymous). Reading + inspecting
# dimensions is a metadata-only round-trip — no pixel bytes fetched.
url = "https://weathermapdata.rdrn.me/era5_2015_2020_l5.zarr/2"
spec = sedonadb_zarr.Zarr().with_options({"arrays": ["rain_ok"]})
cube = sd.read(url, format=spec)

cube.select(
    cube.raster.rst.num_dimensions().alias("ndim"),
    cube.raster.rst.dim_names().alias("dims"),
    cube.raster.rst.shape().alias("shape"),
    cube.raster.rst.srid().alias("srid"),
).show(1)
```

```
┌───────┬──────────────┬───────────────┬────────┐
│  ndim ┆     dims     ┆     shape     ┆  srid  │
│ int32 ┆     list     ┆      list     ┆ uint32 │
╞═══════╪══════════════╪═══════════════╪════════╡
│     3 ┆ [year, y, x] ┆ [1, 128, 128] ┆   3857 │
└───────┴──────────────┴───────────────┴────────┘
```

sedonadb-zarr emits one row per Zarr chunk, so the storage layout is the data layout. This cube splits each year into a 4×4 grid of spatial tiles and stores one year per chunk, so it loads as 16 × 6 = 96 rows (cube.count() confirms this), each holding a single year of one spatial tile. Inspecting dimensions touches only the group schema, so no pixel bytes are fetched.
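
The row count follows directly from the chunk grid, which a few lines of plain Python can reproduce. This is an illustrative sketch that does not use sedonadb-zarr. The full array shape of 6 × 512 × 512 is an assumption inferred from the six years (2015 to 2020) and the 4×4 grid of 128 × 128 tiles described above, and the helper names are hypothetical.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple


def chunk_grid(shape: Sequence[int], chunks: Sequence[int]) -> Tuple[int, ...]:
    """Number of chunks along each axis (the last chunk may be partial)."""
    return tuple(math.ceil(s / c) for s, c in zip(shape, chunks))


def chunk_indices(shape: Sequence[int], chunks: Sequence[int]) -> List[Tuple[int, ...]]:
    """Every chunk index; one row per entry under a one-row-per-chunk layout."""
    return list(product(*(range(n) for n in chunk_grid(shape, chunks))))


# Assumed full shape (year, y, x) and the chunk shape shown in the output above.
SHAPE = (6, 512, 512)
CHUNKS = (1, 128, 128)

rows = chunk_indices(SHAPE, CHUNKS)
```

Here `chunk_grid(SHAPE, CHUNKS)` is `(6, 4, 4)`, so `rows` has 6 × 4 × 4 = 96 entries, matching the row count quoted above.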

RS_Slice collapses a named axis. Slicing the year dimension hands back a 2-D [y, x] rainfall field:

```python
sliced = cube.select(plane=cube.raster.rst.slice("year", 0))
sliced.select(
    dims=sliced.plane.rst.dim_names(),
    shape=sliced.plane.rst.shape(),
).show(1)
```

```
┌────────┬────────────┐
│  dims  ┆    shape   │
╞════════╪════════════╡
│ [y, x] ┆ [128, 128] │
└────────┴────────────┘
```

RS_Slice needs pixels, so SedonaDB resolves each row's Zarr chunk on demand — you never call a loader yourself.

Note: pixel-reading operations like RS_Slice fetch a chunk's bytes on demand, and do so eagerly when the operator runs. We're separately making slice and other "crop" operators lazy — a lightweight view over the chunk, so bytes aren't retrieved until their values are consumed (#813).
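
The difference between an eager slice and a lazy view can be sketched in plain Python. This is a conceptual illustration only, not the design tracked in #813: the `LazySlice` class, the `take` helper, and the `fetch_chunk` callback are all hypothetical. The view answers `dims` and `shape` from metadata alone and calls the fetch function only when values are requested.

```python
from typing import Any, Callable, List, Sequence


def take(nested: Any, axis: int, index: int) -> Any:
    """Select `index` along `axis` of a nested list, dropping that axis."""
    if axis == 0:
        return nested[index]
    return [take(sub, axis - 1, index) for sub in nested]


class LazySlice:
    """Hypothetical view over one chunk with a named axis fixed at an index."""

    def __init__(
        self,
        dims: Sequence[str],
        shape: Sequence[int],
        axis: str,
        index: int,
        fetch_chunk: Callable[[], Any],
    ) -> None:
        if axis not in dims:
            raise ValueError(f"unknown axis: {axis!r}")
        pos = list(dims).index(axis)
        if not 0 <= index < shape[pos]:
            raise IndexError(f"index {index} out of range for axis {axis!r}")
        self._pos = pos
        self._index = index
        self._fetch_chunk = fetch_chunk
        self._cache = None
        # Metadata of the result is known without reading any pixel bytes.
        self.dims: List[str] = [d for d in dims if d != axis]
        self.shape: List[int] = [n for i, n in enumerate(shape) if i != pos]

    def values(self) -> Any:
        """Fetch the chunk on first use, then serve the cached slice."""
        if self._cache is None:
            self._cache = take(self._fetch_chunk(), self._pos, self._index)
        return self._cache
```

With a view built over dims `["year", "y", "x"]`, reading `dims` or `shape` triggers no fetch; the first `values()` call fetches once and later calls reuse the cached result.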

And because each chunk is a georeferenced row, you can see a Zarr's layout on a map without decoding a pixel. RS_Envelope turns each chunk into its footprint; reproject to lon/lat and the 4×4 chunk grid draws straight onto a map:

```python
from lonboard import viz   # in a notebook

f = sd.funcs
chunks = cube.select(geom=f.st_transform(cube.raster.rst.envelope(), "EPSG:4326"))

# Draw outlines only, so the basemap shows through the chunk grid.
viz(
    chunks,
    polygon_kwargs=dict(
        filled=False, stroked=True,
        get_line_color=[236, 64, 160], line_width_min_pixels=2,
    ),
)
```

(Figure to attach: the chunk grid drawn as a 4×4 lattice over a world basemap.)
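
The geometry behind the footprint step can be reproduced without any spatial library. The sketch below is illustrative and does not show how RS_Envelope or ST_Transform are implemented. It assumes a north-up chunk described by an upper-left origin and a square pixel size, uses the standard spherical inverse Web Mercator formula for EPSG:3857, and, for the worked example only, assumes the 4×4 grid covers the full Web Mercator square. All function names are hypothetical.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6378137.0  # sphere radius used by EPSG:3857
WORLD_HALF_M = math.pi * EARTH_RADIUS_M  # half-width of the Web Mercator square

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def chunk_envelope(origin_x: float, origin_y: float, pixel_size: float,
                   width: int, height: int) -> Box:
    """Footprint of a north-up chunk whose origin is its upper-left corner."""
    return (
        origin_x,
        origin_y - pixel_size * height,
        origin_x + pixel_size * width,
        origin_y,
    )


def mercator_to_lonlat(x: float, y: float) -> Tuple[float, float]:
    """Inverse spherical Web Mercator: metres to degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2.0 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2.0)
    return lon, lat


def envelope_to_lonlat(env: Box) -> Box:
    """Reproject an axis-aligned EPSG:3857 box to (west, south, east, north)."""
    xmin, ymin, xmax, ymax = env
    west, south = mercator_to_lonlat(xmin, ymin)
    east, north = mercator_to_lonlat(xmax, ymax)
    return west, south, east, north


# Top-left tile of an assumed world-covering 4x4 grid of 128x128-pixel chunks.
tile_size = 2.0 * WORLD_HALF_M / 4
top_left = chunk_envelope(-WORLD_HALF_M, WORLD_HALF_M, tile_size / 128, 128, 128)
west, south, east, north = envelope_to_lonlat(top_left)
```

Transforming only the two corners is sufficient here because Web Mercator maps meridians and parallels to straight, axis-aligned lines, so an axis-aligned box stays axis-aligned in lon/lat.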

For the full walkthrough — load a cube, inspect its dimensions, slice a plane, map the chunks, and hand a plane to NumPy — see Working with Zarr and NDArray data in SedonaDB.

We're excited about what shipped here — and we're just getting started. There's more user-facing functionality for N-dimensional rasters and Zarr on the way, and we'd love your input on where it goes next. If you're working with datacubes or cloud-native raster data, open an issue and tell us what you're building and what you need.

Collaborator

@paleolimbot — revised this draft above. It now reads a real public ERA5 rainfall Zarr pyramid (EPSG:3857) instead of the placeholder cube, uses the .rst DataFrame accessor, and includes real output. The two pieces it needed both landed in sedona-db — CF/rioxarray CRS (#985) and the zlib codec (#987) — so it runs end-to-end against current main.

@jbampton jbampton added the docs label Jun 20, 2026

4 participants