Integration: BerlinMOD benchmark branch — 907/907 green on the b183b12 MEOS + th3index/h3 surface by estebanzimanyi · Pull Request #16 · MobilityDB/MobilitySpark

estebanzimanyi · 2026-05-22T12:18:18Z

What this is

The accumulated BerlinMOD benchmark integration branch — it consumes the
currently-open feature PRs so the full Spark UDF surface can be exercised and
benchmarked together against a single MEOS pin. It is an integration/visibility
branch; the individual PRs below remain the unit of review.

Consumes (open): #5, #7, #8, #9, #10, #11, #12, #13, #14, #15.

State

Builds and passes the full suite — 907 tests, 0 failures, 0 errors — against
a JMEOS-1.4.jar regenerated from the estebanzimanyi/MobilityDB
fix/meos-pg-symbol-collision-plus-h3 MEOS API (th3index + static-geometry
h3_geo surface), with libmeos built H3=ON.

The h3 prefilter UDFs (geo_to_h3index_set, ever_eq_anyof_h3indexset_th3index)
are bound directly to libmeos via H3IndexJnrBindings; the rest go through the
regenerated functions facade.

Reconciliation carried here

The regenerated jar corrects several MEOS C-type resolutions the prior jar got
wrong: H3Index/int64→long, bool→boolean (including subdir-header
functions), and an 8-byte TimestampTz * out-param that was under-allocated to
4 bytes — the cause of the stbox/tbox tmin/tmax and timestampN errors and
of the inflated native-heap reading in the leak test. UDF call sites were
reconciled to the corrected signatures (interpType enum, direct h3 binding,
acontains_geo_tgeo/tspatial_transform_pipeline renames, a no-exit MEOS error
handler, and out-param/array marshalling). One test was corrected: tnumber_trend
on a step-interpolated tint correctly returns null.

…trators for UDT and UDF.

Period implementation

…for demo.

Satria/poc

… UDTs using Meos Datatypes.

Meos datatype

…tion Timestampset implementation

Pairs with the BerlinMOD-side change (MobilityDB-BerlinMOD #24): berlinmod_portability_export() now writes a 4-column trips.csv with trip_h3. setup/generate_data.sh in MobilitySpark previously overrode the trips.csv with a 3-column version (hex-EWKB trip but no trip_h3) because the loaders couldn't consume WKT. Update the override to also include trip_h3 (th3index hex-WKB at resolution 7) so the generated CSV matches the schema expected by the load_mbdb.sql / load_mduck.sql loaders. Loaders still fall back to recomputing trip_h3 when it's absent (legacy 3-column CSV), so the change is forward-compatible.

Expand Th3IndexUDFs from the 10-UDF BerlinMOD-relevant subset to the full public h3 API surface — every extern function declared in meos/include/meos_h3.h — th3index temporal type (66 fns) meos/include/h3/h3index.h — static H3Index scalar (10 fns) meos/include/h3/h3index_sets.h — h3indexset (Set of cells) (9 fns) + 1 composed helper (geomToH3Cell, single POINT → H3Index) now has a registered MobilitySpark UDF. Audited via diff between meos_h3.h `extern`-declared symbols and the symbols referenced from Th3IndexUDFs.java — the diff is empty. ## Section-by-section coverage - Static h3index ops (parse/format/compare/hash) — 12 UDFs - h3indexset ops (gridDisk / gridRing / pathCells / cellToChildren / compactCells / uncompactCells / originToDirectedEdges / cellToVertexes / getIcosahedronFaces) — 9 UDFs - th3index I/O + constructors (text inputs + scalar/array makers) — 6 UDFs - Accessors (start/end/value_n/values/value_at_timestamp) — 5 UDFs - MEOS-level conversions to/from tbigint — 2 UDFs - Ever/always cell-side comparisons — 8 UDFs - Ever/always trip×trip comparisons — 4 UDFs - Temporal teq/tne (3 directions × 2 ops) — 6 UDFs - Inspection (resolution / base cell / valid cell / class III / pentagon) — 5 UDFs - Hierarchy (parent / center child / child pos × variants) — 6 UDFs - Lat/Lng conversion (tgeo↔th3index, cell_to_boundary, geomToH3Cell) — 6 UDFs - Directed edges (neighbor cells / cells_to_directed_edge / valid / origin / destination / boundary) — 6 UDFs - Vertices (cell_to_vertex / vertex_to_latlng / valid_vertex) — 3 UDFs - Grid traversal (grid_distance / cell_to_local_ij / local_ij_to_cell) — 3 UDFs - Metrics (cell_area / edge_length / great_circle_distance) — 3 UDFs Total: 86 registered UDFs. ## Implementation notes - Common patterns extracted to private helpers: parseTs (Spark Timestamp / String → MEOS OffsetDateTime), tempHex / setHex (Pointer → hex-WKB + free), evCmp (8 ever/always cell-side variants), ttCmp (4 ever/always trip×trip variants), tempUnary (16 unary Temporal* → Temporal* ops). Keeps individual UDF bodies to ~3-5 lines each despite the volume. - Set return types serialise as hex-WKB STRING via set_as_hexwkb / set_from_hexwkb — same round-trip pattern as Temporal hex-WKB. - Output-pointer wrappers (th3index_value_n, th3index_value_at_timestamptz) honour feedback_jnr_allocated_buffer_nofree.md — the JNR-allocated output Pointer has a Cleaner attached and is NOT MeosMemory.free'd. - Array constructors (th3indexseq_make, th3indexseqset_make) accept Spark ArrayType inputs marshalled to native arrays / Pointer[] respectively. - th3index_values returns long[] mapped to Spark ArrayType(LongType). ## Why 100% Per ecosystem policy (project_jmeos14_multiplatform.md, the 'cross-platform uniformity' policy): MobilitySpark must expose every public MEOS symbol so portable SQL works regardless of which platform the user invokes a function on. The previous 10-UDF subset only covered BerlinMOD's prefilter path; full parity makes any future portable SQL file using th3index (hierarchy / directed-edge / vertex / grid-traversal / metrics-side queries) work on Spark out of the box. ## Dependencies — same chain as the rest of PR MobilityDB#9 - MobilityDB PRs #807 / #866 / #893 — th3index in master - JMEOS regen exposes the 86 referenced symbols - MobilityDuck th3index port (parallel session) Source-complete; CI builds will green once the binding catches up.

…eom→cells API Consumer side of MobilityDB PR #938 (static-geometry → H3 cell set public API). Two new UDFs + the portable Q2 update so the polygon cross-join (Q2: eIntersects(t.trip, region polygon)) gains the same spatial prefilter Q4 / Q5 / Q6 / Q10 already have. ## What lands - Th3IndexUDFs.geoToH3IndexSet(geomWkt, resolution) → STRING (hex-WKB h3indexset). Handles every WKT geometry type — POINT, LINESTRING, POLYGON, MULTI*, GEOMETRYCOLLECTION — via the new geo_to_h3index_set MEOS kernel. - Th3IndexUDFs.everIntersectsH3IndexSetTh3Index(cellSetHex, th3idx) → BOOLEAN. Returns TRUE iff the trip's th3index sequence ever lies in any cell of the candidate set. Wraps the new ever_eq_anyof_h3indexset_th3index predicate. - berlinmod/q02.sql adopts the prefilter directly: JOIN QueryRegions r ON everIntersectsH3IndexSet_Th3Index(geoToH3IndexSet(r.geom, 7), t.trip_h3) AND eIntersects(t.trip, r.geom) Every backend executes the same expression. Soundness comment in the file header explains the prefilter property — a trip can only intersect the region if its th3index path ever passes through a cell that covers part of the region. ## Coverage matrix update Q2 eIntersects(trip, region polygon) — yes (this commit) Q4 eIntersects(trip, point geom) — yes (existing) Q5 nearestApproachDistance(trip, trip) — yes (existing) Q6 eDwithin(trip, trip, 10.0) — yes (existing) Q10 tDwithin(trip, trip, 3.0) — yes (existing) All five BerlinMOD cross-join queries now have the cross-platform th3index prefilter applied directly in the portable SQL. ## Dependencies — feedback_issued_pr_treat_as_landed.md Stacks on **MobilityDB PR #938** (static-geom→cells public API), which itself stacks on the th3index branch (#807 / #866 / #893). JMEOS regen will pick up the two new geo_to_h3index_set / ever_eq_anyof_h3indexset_th3index symbols once those land.

Bind two new MEOS exports from PR #1007 via the MeosNative supplementary JNR-FFI interface: mindistance_tgeo_tgeo(temporal, temporal, threshold) and tgeoarr_tgeoarr_mindist(arr1, count1, arr2, count2). The threshold parameter is exposed but unused at this layer; the canonical Spark form relies on the kernel's outer STBox prune to absorb far-apart pairs. Add two scalar UDFs in DistanceUDFs: minDistanceTgeoGeo(trip, geomWkt) reuses the NAD kernel since NAD reduces to spatial-min when one argument has no time dimension minDistanceTgeoTgeo(trip1, trip2) calls mindistance_tgeo_tgeo Both are also registered under the bare name minDistance with the (tgeo, tgeo) overload as the default, matching the MobilityDB SQL surface used by the canonical BerlinMOD Q5. Update berlinmod/q05.sql to use minDistance instead of nearestApproachDistance. The BerlinMOD spec asks for the minimum spatial distance between the places the vehicles have been, which is the spatial-min, not the time-synchronous NAD. Until PR #1007 the spatial-min form was not portable across the three backends; now it is, so the canonical Q5 adopts it.

…f-udaf-scaffold # Conflicts: # README.md

close() runs before spark.stop() in the standard try-with-resources benchmark/usage pattern, so meos_finalize() tears down MEOS global and per-thread TLS state while Spark executor threads are still alive; their subsequent teardown then double-frees the already-finalized MEOS TLS, aborting the JVM with double free or corruption (fasttop) during shutdown. The OS reclaims native MEOS memory at JVM exit, so the explicit finalize is unnecessary and unsafe in the Spark and surefire lifecycles; it belongs only in a standalone main that owns the whole JVM with no live MEOS-using threads at exit.

expandSpace and geoTimeStbox serialised the STBox with stbox_as_hexwkb(box, (byte) 0, ...). WKB variant 0 omits the SRID, so bboxOverlaps re-parsing it via stbox_from_hexwkb gets SRID 0; overlaps_tspatial_stbox then compares an SRID-3812 trip against an SRID-0 box, returns false for every pair, and Q10's WHERE ... AND bboxOverlaps(t2.trip, expandSpace(t1.trip, 3)) silently drops all matches (0 rows instead of the expected count). Serialise with WKB_EXTENDED (0x04) so the SRID round-trips; Q10 then returns the correct rows, matching MobilityDB's native && operator.

CI vendors $GITHUB_WORKSPACE/lib/libmeos.so for the unit tests (.github/workflows/maven.yml + pom surefire -Djava.library.path). The committed binary was a stale MEOS build predating the ensure_linear_interp guard in tnumber_trend, so tnumber_trend on a step-interpolated tint returned a computed trend instead of NULL, deterministically failing MathUDFsExtTest.tnumberTrend_tint_step_returns_null (expected null, got a tfloat hex-WKB). The test and the AnalyticsUDFs.tnumberTrend wrapper are correct against current MEOS: verified that the current libmeos returns NULL for that exact input while the stale one returns non-null. Replace lib/libmeos.so with a current MEOS 1.4 build that carries the guard.

This reverts commit ca9676d.

…lityDB State present coverage only (858/858 active addressable temporal+geo, 100%) with the scope partition and deferred families shared with MobilityDuck; drop dated-milestone and changelog narrative. parity-status.md regenerated from scripts/parity-audit.py against current MobilityDB master.

…C #920) Register all 29 portable bare-name operator UDFs from the single source of truth — MobilityDB/MEOS-API meta/portable-aliases.json (RFC #920, discussion MobilityDB#861, native in MobilityDB#1075) — vendored read-only at meta/portable-aliases.json. PortableOperatorAliasUDFs reuses each operator's own existing backing field verbatim (equivalence by construction) at the MEOS superclass entrypoint (*_temporal_temporal, *_tspatial_tspatial, t*_temporal_temporal, tdistance_tgeo_tgeo, nad_tgeo_*), so all six type families — temporal, geo, cbuffer, npoint, pose, rgeo — are covered by construction; left/right/overleft/overright select between the existing tnumber and tspatial backings. The type-qualified spellings they supersede 1:1 (temporalBefore, tnumberLeft, teqTemporal, ...) are dropped: the bare name is the portable contract. scripts/portable_parity.py gates the dialect with the same prefix logic as MobilityDB/MEOS-API portable_parity.py: 29/29 backed, 0 unbacked, all six families; wired into Maven CI. parity-audit.py no longer defers cbuffer/npoint/pose/rgeo (DEFERRED_FAMILIES empty by invariant); docs/parity-status.md and docs/parity-100.md regenerated/reframed so the four families are in scope and never excluded from any headline (1353/1462 = 92.5% active addressable; the remaining typed per-family surface is tracked as a gap with a plan, not a stopping point).

…amilies JMEOS 1.4 ships no cbuffer/npoint/pose/rgeo symbols, so the four sibling temporal families had no typed UDF surface. Bind them via the sanctioned MeosNative raw-FFI interface against the real MEOS C API (lowercase meos.h symbols, every one verified exported by lib/libmeos.so with nm -D), reusing each function's own symbol — no reimplementation. - CbufferUDFs / NpointUDFs / PoseUDFs / RgeoUDFs: constructors, accessors, I/O (hex-WKB/WKB/EWKT/text/MFJSON), cbuffer spatial relationships, tnpoint route-id ops, trgeometry→tpose, Seq/SeqSet/ SeqSetGaps — reusing generic functions.* helpers where they already exist, MeosNative for the family-specific symbols. Wired into MobilitySparkSession before the portable dialect. - parity-audit.py: the five GiST/GIN opclass-support index sections (tcbuffer/tnpoint/tpose/trgeometry indexes + tnpoint_gin) classified OUT_OF_SCOPE — the same PG-only index-access-method class already excluded for temporal/geo *_gist/*_spgist; uniform, labelled, not a family exclusion. DEFERRED_FAMILIES stays empty. Active addressable parity 92.5% -> 99.6% (1571/1577), all six families counted in the headline. The only remainder is rgeo/133_trgeo_vclip (6 functions): the v_clip user surface lives in the mobilitydb PG extension, not libmeos.so — lib/libmeos.so exports only low-level LWPOLY/Pose kernels with no GSERIALIZED/Temporal entry point reachable from the hex-WKB convention. Recorded as a documented MEOS-library ABI gap with a concrete upstream fix (export v_clip_trgeo_geo / v_clip_trgeo_trgeo GSERIALIZED/Temporal wrappers), intentionally not stubbed. Portable bare-name gate unchanged: 29/29, 0 unbacked.

…00-percent

…' into preview/100-percent

… into preview/100-percent

… preview/100-percent

Adds two consolidated improvements to the 3-platform BerlinMOD benchmark: (1) Three-tier index framework ============================== Each query can now be benchmarked under three configurations, isolating the contribution of different acceleration mechanisms: | Tier | What it measures | Platforms | |---|---|---| | 1 — baseline | th3index columnar prefilter only | PG, Duck, Spark | | 2 — native-only | Engine-native spatial index in isolation | PG, Duck | | 3 — combined | Best-of-platform, production-realistic | PG, Duck | Spark is excluded from Tiers 2 / 3: Spark SQL has no native spatial index, so forcing it to compete without th3index would measure lack-of-feature. Tier 1 stays the only honest Spark measurement. `bench_mbdb.sh` and `bench_mduck.sh` gain a `--tier {1,2,3}` flag. The runner toggles indexes after loading and writes the tier into the results JSON so report.py can pivot by configuration. For MobilityDuck, Tiers 2/3 use the new `TRTREE` multi-entry index delivered by MobilityDuck PRs #143 (multi-entry index) + #144 (constant-geometry pushdown). Run the bundle with the user's MobilityDuck install pinned to those PR branches (or use the preview/100-percent release). (2) NxN mitigations on Spark (Q5, Q6, Q10, Q16) ============================================== The four Trips × Trips queries don't fit Spark's no-spatial-index model naturally — the row-by-row `everEqTh3IndexTh3Index(...)` prefilter inside a non-equi join becomes O(N²) on the full Trips × Trips space. Two complementary mitigations are wired: (a) Spark `/*+ BROADCAST(...) */` hints in the portable q05/q06/q10/q16 SQL. Spark recognises the hint and pins the listed small/filtered tables to every executor; PostgreSQL and DuckDB treat the `/*+ ... */` block as an ordinary comment, so the SQL stays byte-identical and portable. (b) Spark-optimised query variants `q05_spark.sql`, `q06_spark.sql`, `q10_spark.sql`, `q16_spark.sql`. These explode each trip's th3index into one row per H3 cell via `explode(th3IndexValues(trip_h3))` and run the spatial prefilter as an **equi-join on the cell column** — which Spark accelerates natively (sort-merge / shuffle-hash). The expensive `tDwithin` / `minDistance` / `eDwithin` then runs on the much smaller deduplicated candidate set. The MobilitySpark runner (`bench_mspark.sh` → `BerlinMODBench.java`) auto-prefers `<query>_spark.sql` over `<query>.sql` when present and tags the result line with `[spark]` for clarity. PG and DuckDB runners always use the portable `<query>.sql`. Variants are semantically equivalent to their portable counterparts — same input, same output, same scientific question — just expressed in the form Spark's engine can accelerate. At default scale (~10K trips, ~50 cells per trip): * portable q10.sql on Spark : ~10⁸ candidate pairs row-by-row * q10_spark.sql via UNNEST : ~5×10⁵ candidate pairs equi-joined (200× reduction before tDwithin) Files ===== * `berlinmod/README.md` — adds "Three-tier index framework" and "NxN mitigations on Spark" sections + per-query tier-applicability table * `berlinmod/bench/bench_mbdb.sh` + `bench_mduck.sh` — `--tier {1,2,3}` flag + tier-specific index DROP/CREATE * `berlinmod/bench/bench_mspark.sh` — doc-only: explicit Tier 1, points at the Spark variants * `berlinmod/q05.sql` / `q06.sql` / `q10.sql` / `q16.sql` — add `/*+ BROADCAST(...) */` hints (no-op comment on PG/Duck) * `berlinmod/q05_spark.sql` / `q06_spark.sql` / `q10_spark.sql` / `q16_spark.sql` — new Spark-optimised variants * `src/main/java/.../BerlinMODBench.java` — prefer `_spark.sql` when present + emit `tier: 1` in results JSON Refs: MobilityDuck#143, MobilityDuck#144 (TRTREE multi-entry + pushdown).

…ayer) Introduces `org.mobilitydb.spark.util.TimeUtil` as the single named boundary-conversion layer between Spark / Java native timestamp types (java.sql.Timestamp, java.time.OffsetDateTime, java.time.Instant) and MEOS canonical `TimestampTz` (microseconds since 2000-01-01 UTC). Per the ecosystem's closed-algebra-boundary rule: every binding owns ONE thin layer that converts platform-native types to/from MEOS canonical, and every call-site goes through that layer. Inlining the magic constant 946684800 (seconds between Unix epoch and PG epoch) at individual call-sites scatters the conversion logic and risks drift if one site is updated and another is missed. Before this PR: 13 separate files contained inline `946684800L * 1000L` (or `946684800L * 1000000L`) expressions, with three of them re-declaring private static finals like `PG_UNIX_OFFSET_MS` locally — making the magic number ambient across the codebase. After this PR: every callsite uses `TimeUtil.PG_UNIX_EPOCH_OFFSET_{S,MS,US}` or the named helpers `TimeUtil.toMeosTimestamp(...)` / `TimeUtil.fromMeosTimestamp(...)`. One place defines the constant, every site references it. Files touched: * src/main/java/org/mobilitydb/spark/util/TimeUtil.java (new, 85 lines) * 13 callsite files in spark/cbuffer/, spark/geo/, spark/npoint/, spark/pose/, spark/temporal/ — mechanical substitution of the inlined magic numbers / per-file constants with TimeUtil refs; added `import org.mobilitydb.spark.util.TimeUtil;` where used. Companion: `MobilityDuck/src/include/time_util.hpp` plays the same role on the DuckDB side; `GoMEOS/datetime_utils.go` on the Go side; PyMEOS-CFFI / MEOS.NET / JMEOS route through `pg_timestamptz_in` which is functionally equivalent. This brings MobilitySpark into alignment with the rest of the ecosystem. No behaviour change — every substitution is value-preserving and type-preserving (long → long; same numeric value of 946684800000). Note on this branch's pre-existing build state: the `preview/100-percent` base contains a small set of pre-existing JMEOS API-drift errors (`temporal_lower_inc` / `tpoint_transform_pipeline` etc.) introduced by recent JMEOS signature uniformization that haven't been propagated to MobilitySpark yet. Those errors are independent of this refactor — they exist in files unaffected by this PR's diff. Confirm by running `git diff main` on any error-emitting file; this PR's diff is purely the symbol substitution above.

…ure-tracked Adds doc/spark-version.md as the canonical position statement on Spark-version targeting: - MobilitySpark targets Apache Spark 3.5.x (LTS through 2026). - Spark 4.0 (May 2025 GA) is early-adoption, not production-default. - The MEOS-C-library-version axis is orthogonal and handled inside JMEOS (one JMEOS jar per MEOS version; MobilitySpark pins one JMEOS jar). - Five trigger signals enumerated for when to commit to Spark 4 support; commit when two or more fire. - The future Spark 4 effort, if/when needed, will mirror MobilityDuck's multi-DuckDB-version foundation (Maven profile per Spark version, CI matrix, per-version jar artefacts). README updated to bump the Spark requirement from the stale 3.4.0 line to 3.5.x with a pointer to the new doc. The position is asymmetric to MobilityDuck's multi-DuckDB-version work: DuckDB v1.4/v1.5 are both production with split adopter base today, whereas Spark 3.5 is the production line and 4.0 is the early-adoption line.

… integration/berlinmod-bench # Conflicts: # README.md # berlinmod/bench/bench_mbdb.sh # berlinmod/bench/bench_mduck.sh # berlinmod/bench/bench_mspark.sh # docs/parity-100.md # docs/parity-status.md # libs/JMEOS-1.4.jar # scripts/parity-audit.py # src/main/java/org/mobilitydb/spark/MobilitySparkSession.java # src/main/java/org/mobilitydb/spark/demo/BerlinMODUDFs.java # src/main/java/org/mobilitydb/spark/geo/AlwaysSpatialRelsUDFs.java # src/main/java/org/mobilitydb/spark/geo/DistanceUDFs.java # src/main/java/org/mobilitydb/spark/geo/GeoAffineUDFs.java # src/main/java/org/mobilitydb/spark/geo/GeoAnalyticsUDFs.java # src/main/java/org/mobilitydb/spark/geo/STBoxUDFs.java # src/main/java/org/mobilitydb/spark/geo/TPointSTBoxOpsUDFs.java # src/main/java/org/mobilitydb/spark/temporal/AccessorAliasUDFs.java # src/main/java/org/mobilitydb/spark/temporal/AggregateUDAFs.java # src/main/java/org/mobilitydb/spark/temporal/BucketUDFs.java # src/main/java/org/mobilitydb/spark/temporal/IOAliasUDFs.java # src/main/java/org/mobilitydb/spark/temporal/MoreAccessorUDFs.java # src/main/java/org/mobilitydb/spark/temporal/RestrictionUDFs.java # src/main/java/org/mobilitydb/spark/temporal/SimilarityUDFs.java # src/main/java/org/mobilitydb/spark/temporal/SpanAccessorUDFs.java # src/main/java/org/mobilitydb/spark/temporal/SpanAlgebraUDFs.java # src/main/java/org/mobilitydb/spark/temporal/SubtypeConstructorUDFs.java # src/main/java/org/mobilitydb/spark/temporal/TBoxUDFs.java # src/main/java/org/mobilitydb/spark/temporal/TTextUDFs.java # src/main/java/org/mobilitydb/spark/temporal/TemporalBoxOpsUDFs.java # src/main/java/org/mobilitydb/spark/temporal/TemporalCompUDFs.java # src/main/java/org/mobilitydb/spark/temporal/TileUDFs.java # src/main/java/org/mobilitydb/spark/temporal/TransformUDFs.java # src/test/java/org/mobilitydb/spark/temporal/AggregateUDAFsTest.java # src/test/java/org/mobilitydb/spark/temporal/MathUDFsExtTest.java # src/test/java/org/mobilitydb/spark/temporal/TBoxOpsUDFsTest.java

expandSpace and geoTimeStbox serialised the STBox with stbox_as_hexwkb(box, (byte) 0, ...). WKB variant 0 omits the SRID, so bboxOverlaps re-parsing it via stbox_from_hexwkb gets SRID 0; overlaps_tspatial_stbox then compares an SRID-3812 trip against an SRID-0 box, returns false for every pair, and Q10's WHERE ... AND bboxOverlaps(t2.trip, expandSpace(t1.trip, 3)) silently drops all matches (0 rows instead of the expected count). Serialise with WKB_EXTENDED (0x04) so the SRID round-trips; Q10 then returns the correct rows, matching MobilityDB's native && operator.

…ation/berlinmod-bench # Conflicts: # .github/workflows/maven.yml # pom.xml

The accumulated integration branch carries PR MobilityDB#11's Th3IndexUDFs full API (86 UDFs), 8 of which call JMEOS methods (always_eq_th3index_h3index etc.) that don't exist in either bundled JMEOS jar (1.4 or 1.5). The JMEOS regen needs to happen against MEOS-with-th3index (the parallel JMEOS-session task) before this branch will fully compile. This note unblocks the benchmark-running parallel session — they can either swap the JMEOS jar to a th3index-aware regen, or comment out the 8 call sites (Th3IndexUDFs.java:491-498, 527-531) which disables only the Spark Tier-1 h3-cell prefilter; Trips × Trips queries Q5/Q6/Q10/Q16 still run via portable SQL + BROADCAST hints.

Build the accumulated integration branch against a JMEOS-1.4.jar regenerated from the b183b12 MEOS API (th3index + h3_geo surface) and reconcile the UDF call sites to the corrected signatures. MEOS C-type fixes carried by the regenerated jar: - H3Index / h3index -> long (was int) - int64 -> long (was int) - bool -> boolean (incl. subdir-header functions) - TimestampTz *out -> 8-byte buffer (was a 4-byte under-allocation that overflowed on every stbox/tbox tmin/tmax and timestampN call) UDF reconciliation: - interpType: pass the int enum (3 = LINEAR) / interpToInt(), not a string - h3 prefilter (geo_to_h3index_set, ever_eq_anyof_h3indexset_th3index) bound directly via H3IndexJnrBindings, not functions.* - acontains_geo_tgeo (was acontains_geo_tpoint); tspatial_transform_pipeline (was tpoint_transform_pipeline); acovers_{geo_tgeo,tgeo_geo} now exposed - no-exit MEOS error handler via meos_initialize_error_handler with a kept-alive callback (MEOS default exit()s the JVM on error) - out-param/array marshalling: temporal_value_at_timestamptz (folded return), th3index_values, th3indexseq[set]_make, tpointinst_make Test correction: tnumberTrend on a step-interpolated tint correctly returns null (tnumber_trend requires linear interpolation). JMEOS jar -> libs/JMEOS-1.4.jar; pom points at it.

bench_mbdb.sh and bench_mduck.sh ran each query with stderr discarded and `|| true`, then recorded the elapsed time unconditionally. A query that errored therefore produced a fast phantom timing, and report.md looked complete even when queries had silently failed. Run each query with ON_ERROR_STOP (mbdb) / a checked exit status (mduck), print FAILED with the first error line, and exclude failed queries from the results JSON so they show as missing rather than fast.

- trip_h3 called tgeompoint_to_th3index(), which is the internal MEOS C name, not a SQL function. The SQL converter is h3_latlng_to_cell, and its tgeompoint overload requires lon/lat, so reproject with transform(..., 4326) (Trip is stored in EPSG:3857). - Add the geomWKT text column selected by q11/q12/q15; the loader previously created only geom, so those queries errored on a missing column.

The portable query files referenced functions this MobilityDB build exposes under different names, so the queries errored at run time: - asHexWKB(tgeompoint) -> asHexEWKB (no plain-WKB output for tgeompoint in this build) - everEq{H3Index,Th3Index}Th3Index -> ever_eq - geomToH3Cell(geom, r) -> geoToH3Cell(ST_Transform(geom,4326), r) (geoToH3Cell needs lon/lat; query points are EPSG:3857) - minDistance(tgeompoint, tgeompoint) -> ST_Distance(trajectory(t1), trajectory(t2)) (spatial minimum ignoring time, distinct from the time-synchronous nad) With these all 18 queries run on MobilityDB. The cross-platform aim is to expose these as portable aliases in MEOS so the shared queries can revert to the single portable dialect.

Committed snapshot of the honest MobilityDB-only run (all 18 queries verified to execute under ON_ERROR_STOP; MobilityDB 1.4.0 on PG18, 1620 trips, median of 3 runs). results/ is normally gitignored — this is a deliberate reference snapshot, drop it if you prefer #913 only. Note: report.py's static q05 label still reads "nearestApproachDistance"; the SQL uses ST_Distance(trajectory,...) (spatial-min).

query_regions.csv holds lon/lat (the exporter emits ST_Transform(geom, 4326)), but load_mbdb.sql parsed it with ST_GeomFromText(geom, 3857) — stamping degrees as metres, ~480 km from the trips. Every region query (q02/q13/q14/q16) therefore silently returned 0 rows (masked, before the harness fix, as phantom fast timings). - load_mbdb.sql: parse QueryRegions as 4326, then reproject to 3857. - q02.sql: geoToH3IndexSet needs lon/lat, so transform the (now 3857) region back to 4326 for the th3 prefilter, as q04 already does for points. Soundness verified against the exhaustive predicate (no stale oracle needed): q02 with the th3 prefilter = 139 rows = q02 with eIntersects alone; q04 likewise 43 = 43.

q02/q13/q14/q16 now execute real work. q16 still returns 0 rows by construction (the 10x10x10 subset has co-occurring pairs but none are always-disjoint), but it is now timed honestly rather than phantom-fast.

Luis Alfredo Leon Villapun and others added 30 commits August 7, 2023 12:05

PeriodUDT simple implementation.

e17f8ca

Added comments and definitions for PeriodUDT class.

822ae6d

Deleted JMEOS jars from git tracking, added to gitignore. Added Regis…

88d7b1b

…trators for UDT and UDF.

Partially implemented UDFs for PeriodUDT class.

69b8234

Add tgeompointinst UDF

5975da9

Add PeriodSet

97cd188

Update pom

747c016

Modify main

62a51c9

Add period set UDT

f5de6a6

Finished UDFs and UDF registrator.

6639c3f

Finished UDFs and UDF registrator.

2fdc6cd

Added sample tests and testing utility.

0e192f9

Started working on TimestampSet UDTs

8e699f2

Merge pull request #2 from satriabw/period-implementation

3cf33a6

Period implementation

Add PeriodSet registrator for UDT and UDF

3b1c81c

Add the rest of PointUDF implemenation

c36010b

Implement Period Set UDF

3fa4056

Implemented basic version of TimestampSetUDT. Also modified examples …

c6240cb

…for demo.

Implement PeriodSet

c9b4e45

Skip the test for now

c33e269

Merge pull request #3 from satriabw/satria/poc

a3b2309

Satria/poc

Implemented changes on Period using Binaries.

0505997

Implemented new structure with MeosDatatype as parent class for Spark…

24b5271

… UDTs using Meos Datatypes.

Reformated Factory to minimize reduncancies

f290586

Merge pull request #4 from satriabw/meos-datatype

45d193f

Meos datatype

Merge branch 'develop' into timestampset-implementation

de5ae39

Merge pull request MobilityDB#6 from satriabw/timestampset-implementa…

ca441df

…tion Timestampset implementation

Add implementation for ais dataset

78180d4

Tidy up implementation

91afa4c

Finish AISDataExample implementation

7724506

estebanzimanyi added 30 commits May 14, 2026 14:37

Merge remote-tracking branch 'upstream/main' into feat/mindistance-ud…

e7101af

…f-udaf-scaffold # Conflicts: # README.md

Revert "Refresh stale committed lib/libmeos.so to current MEOS 1.4"

0f89aed

This reverts commit ca9676d.

Merge remote-tracking branch 'fork/doc/reviewer-guide' into preview/1…

c3d66aa

…00-percent

Merge remote-tracking branch 'fork/feat/mindistance-udf-udaf-scaffold…

a30bb44

…' into preview/100-percent

Merge remote-tracking branch 'fork/feat/portable-operator-bare-names'…

b24dfb7

… into preview/100-percent

Merge remote-tracking branch 'fork/feat/sibling-families-parity' into…

4d0607d

… preview/100-percent

Merge remote-tracking branch 'fork/fix/license-main-java' into integr…

7721c25

…ation/berlinmod-bench # Conflicts: # .github/workflows/maven.yml # pom.xml

Refresh MobilityDB benchmark report after region SRID fix

d6ebaa5

q02/q13/q14/q16 now execute real work. q16 still returns 0 rows by construction (the 10x10x10 subset has co-occurring pairs but none are always-disjoint), but it is now timed honestly rather than phantom-fast.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Integration: BerlinMOD benchmark branch — 907/907 green on the b183b12 MEOS + th3index/h3 surface#16

Integration: BerlinMOD benchmark branch — 907/907 green on the b183b12 MEOS + th3index/h3 surface#16
estebanzimanyi wants to merge 173 commits into
MobilityDB:mainfrom
estebanzimanyi:integration/berlinmod-bench

estebanzimanyi commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

estebanzimanyi commented May 22, 2026

What this is

State

Reconciliation carried here

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants