From b0fe78e4c84821fdfd0768f45e8e0df683f5df16 Mon Sep 17 00:00:00 2001
From: Esteban Zimanyi
Date: Tue, 19 May 2026 15:16:01 +0200
Subject: [PATCH] docs: RFC for canonical dispatch metadata (path to 100% codegen)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

geo (TGeomPoint/TGeogPoint) and temporal (TFloat/TInt/TBool/TText) are the
2 of 6 families not switched to generated mixins. Verified root cause:
their regular-method dispatch encodes editorial decisions absent from
meos-idl.json (geo: STBox-via-stbox_to_geo, Point/Geometry split, runtime
geodetic flag, GeoSet->generic; temporal: Int->Float coercion via super(),
scalar-by-value casts, generic *_temporal_temporal self).

A PyMEOS-local FAMILY_MODEL extension was rejected as the transcription
anti-pattern (relocates editorial logic into per-binding config; no
equivalence by construction; diverges across bindings).

This RFC proposes the sound, ecosystem-correct solution: an oo...dispatch
annotation in MEOS-API's meta/meos-meta.json (merged into meos-idl.json
like the existing shape annotations GoMEOS already consumes), so every
binding's faithful generator derives geo/temporal from the SAME canonical
catalog facts -- equivalence by construction restored at the catalog
level, A/B suite as the unchanged acceptance gate, ecosystem-wide 100%.
The §3 dispatch tables are extracted verbatim from PyMEOS's hand-written
oracle and are complete for geo/temporal.

Cross-repo handoff: MEOS-API (parallel-owned) lands the metadata; PyMEOS
then consumes it and generates the final 2 families. Until then
geo/temporal correctly stay hand-written (fully functional -- no
API-parity gap; only the codegen-uniformity goal is outstanding).

Stacked on #93.
---
 tools/oo_codegen/RFC-dispatch-metadata.md | 315 ++++++++++++++++++++++
 1 file changed, 315 insertions(+)
 create mode 100644 tools/oo_codegen/RFC-dispatch-metadata.md

diff --git a/tools/oo_codegen/RFC-dispatch-metadata.md b/tools/oo_codegen/RFC-dispatch-metadata.md
new file mode 100644
index 00000000..15b9bdad
--- /dev/null
+++ b/tools/oo_codegen/RFC-dispatch-metadata.md
@@ -0,0 +1,315 @@
+# RFC — canonical *dispatch* metadata for 100 % faithful OO codegen
+
+**Status:** proposed (PyMEOS-authored cross-repo handoff to MobilityDB/MEOS-API)
+**Scope:** the path to 100 % of the OO codegen-switch — covering the two
+families (`geo`, `temporal`) that cannot be faithfully generated today.
+
+## 1. Problem
+
+The `meos-idl.json`-driven faithful generator (`tools/oo_codegen/codegen.py`)
+switched **4 of 6** temporal-type families to generated mixins, each proven
+equivalent to the hand-written code (TCbuffer #90, TPose #91, TNpoint #92,
+TRgeometry #93). It works because those families' regular-method dispatch is
+**mechanically derivable from the catalog**: the C-name structure
+`__` (e.g. `econtains_tcbuffer_cbuffer`) plus a
+small per-family token model is sufficient to reproduce the hand-written
+`isinstance` ladder exactly (equivalence by construction).
+
+`geo` (TGeomPoint/TGeogPoint) and `temporal` (TFloat/TInt/TBool/TText) **cannot**
+be generated this way.
+Their hand-written regular methods encode *editorial
+dispatch decisions that are absent from `meos-idl.json`* and therefore not
+derivable from it:
+
+| Family | Editorial dispatch not in the catalog |
+|---|---|
+| geo | `STBox` → `tdistance_tgeo_geo(self._inner, stbox_to_geo(other._inner))` (no `*_tgeo_stbox` in the catalog — the STBox routing is a Python-side convenience); `shp.Point` vs `shpb.BaseGeometry` split → *different* backings (`tpoint_at_value` vs `tpoint_at_geom`); the geometry transform is **runtime-self-dependent**: `geo_to_gserialized(other, isinstance(self, TGeogPoint))`; `GeoSet` → generic `temporal_at_values`; regular families split across the `TPoint` ABC and its subclasses |
+| temporal | `IntSet`/`IntSpan`/`IntSpanSet` → `super().at(other.to_floatset())` (Python-side type coercion, no `tfloat_at_intset` in the catalog); scalars passed **by value** with a per-member cast (`tfloat_at_value(self._inner, float(other))`); self-type uses the **generic** `*_temporal_temporal` (no typed `*_tfloat_tfloat`) |
+
+A PyMEOS-local generator extension that hard-codes this dispatch into
+`FAMILY_MODEL` was considered and **rejected**: it would merely relocate the
+editorial Python logic into per-binding config — *no* equivalence by
+construction (the catalog is no longer the source of truth), equal bug risk,
+more config than the code it replaces, and it would diverge from every other
+binding's generator. That is the transcription anti-pattern.
+
+## 2. Root cause
+
+`meos-idl.json` is a *signature* catalog. The geo/temporal dispatch is
+*editorial* (type coercions, conversions, runtime flags, generic fallbacks).
+The MEOS-API RFC (MobilityDB issue #836 / discussion #920) already designed
+the catalog to carry **editorial/shape decisions** in the
+`meta/meos-meta.json` enrichment layer, merged into `meos-idl.json` so that
+**every binding's codegen consumes the same canonical facts**
+(GoMEOS already consumes `shape.*` annotations this way).
+The gap is simply
+that the *dispatch* facts geo/temporal need have no schema yet.
+
+## 3. Proposal — a `dispatch` annotation in `meta/meos-meta.json`
+
+Add a per-OO-member `dispatch` block (merged into `meos-idl.json` like the
+existing `shape`) describing the **ordered argument→backing table** the
+hand-written method encodes. It is declarative, catalog-owned, and
+binding-agnostic; each binding's existing faithful generator consumes it the
+same way it already consumes the `__` token model —
+restoring equivalence by construction at the *catalog* level.
+
+```jsonc
+// meta/meos-meta.json (excerpt — keyed by OO family + member)
+"oo": {
+  "geo": {
+    "at": {
+      "dispatch": [
+        { "py": "Point",       "fn": "tpoint_at_value",
+          "argTransform": "geoToGserialized", "geodeticFromSelf": true },
+        { "py": "BaseGeometry","fn": "tpoint_at_geom",
+          "argTransform": "geoToGserialized", "geodeticFromSelf": true },
+        { "py": "GeoSet",      "fn": "temporal_at_values" },
+        { "py": "STBox",       "fn": "tgeo_at_stbox", "extraArgs": ["true"] }
+      ],
+      "fallback": "super", "result": "temporal"
+    },
+    "distance": {
+      "dispatch": [
+        { "py": "BaseGeometry","fn": "tdistance_tgeo_geo",
+          "argTransform": "geoToGserialized", "geodeticFromSelf": true },
+        { "py": "STBox",       "fn": "tdistance_tgeo_geo",
+          "argTransform": "stboxToGeo" },
+        { "py": "TPoint",      "fn": "tdistance_tgeo_tgeo" }
+      ],
+      "fallback": "raise", "result": "temporal"
+    }
+  },
+  "temporal": {
+    "at": {
+      "dispatch": [
+        { "py": "scalar",  "fn": "_at_value", "argTransform": "scalarCast" },
+        { "py": "IntSet",  "coerce": "to_floatset",  "via": "super" },
+        { "py": "IntSpan", "coerce": "to_floatspan", "via": "super" }
+      ],
+      "fallback": "super", "result": "temporal"
+    },
+    "always_equal": {
+      "dispatch": [
+        { "py": "scalar", "fn": "always_eq__", "argTransform": "scalarValue" },
+        { "py": "self",   "fn": "always_eq_temporal_temporal" }
+      ],
+      "fallback": "raise", "result": "bool_gt0"
+    }
+  }
+}
+```
+
+`argTransform` values are a **closed, named
+vocabulary** (`geoToGserialized`,
+`stboxToGeo`, `scalarCast`, `scalarValue`, `innerPtr`, …) — every binding maps
+each name to its own idiom (PyMEOS: `geo_to_gserialized($o, )`,
+`stbox_to_geo($o._inner)`, `float($o)`, `$o`, `$o._inner`). `geodeticFromSelf`
+is the only runtime-self primitive (PyMEOS → `isinstance(self, TGeogPoint)`).
+The vocabulary is finite because the editorial decisions are finite and
+already enumerated above.
+
+## 4. Why this is sound (and Path A is not)
+
+- **Single source of truth.** The dispatch becomes a *catalog fact*, authored
+  once in MEOS-API, consumed identically by Go/NET/PyMEOS/… — the RFC's whole
+  premise. Equivalence by construction is restored: the generator emits from
+  canonical metadata, not per-binding guesses.
+- **Acceptance unchanged.** Each binding still proves it via its existing A/B
+  suite test (behavioural equivalence) — the same gate used for the 4 shipped
+  families.
+- **Ecosystem-wide 100 %**, not PyMEOS-local: every binding's geo/temporal
+  surface becomes generated from the same metadata.
+
+## 5. Ownership & sequencing
+
+- **MEOS-API (parallel-owned):** add the `oo...dispatch`
+  schema to `meta/meos-meta.json` + the `argTransform` vocabulary doc; emit it
+  into `meos-idl.json`. (This RFC is the handoff brief; the dispatch tables in
+  §3 are extracted verbatim from PyMEOS's hand-written oracle and are complete
+  for geo/temporal — see `tools/oo_codegen/codegen.py` `FAMILY_MODEL` for the
+  4 already-derived families as the precedent.)
+- **PyMEOS (this repo, follow-up):** teach `codegen.py` to consume
+  `oo.*.dispatch` (map the `argTransform` vocabulary to PyMEOS idioms; add
+  `geodeticFromSelf`, `coerce/via:super`, `scalar` primitives), then generate
+  + A/B-prove geo & temporal mixins → **codegen 100 % (6/6)**. The local
+  `FAMILY_MODEL` for the 4 derived families converges onto the same consumed
+  metadata over time.
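
To make the PyMEOS follow-up concrete, here is a minimal sketch of how a consumer might turn one `dispatch` block into the `isinstance` ladder it describes. It is not the code in `codegen.py`: the names `ARG_TRANSFORMS`, `render_arg` and `emit_method` are hypothetical, the default of `innerPtr` for entries without an `argTransform` is an assumption, and so is the `Temporal._factory(result)` wrapper used for `"result": "temporal"`. The sketch ignores `coerce`, `extraArgs` and `scalar` entries.

```python
# Hypothetical sketch of a dispatch-table consumer (not the real codegen.py).
# The idioms on the right are the PyMEOS mappings listed in section 3.
ARG_TRANSFORMS = {
    "geoToGserialized": "geo_to_gserialized({o}, {geodetic})",
    "stboxToGeo": "stbox_to_geo({o}._inner)",
    "scalarCast": "float({o})",
    "scalarValue": "{o}",
    "innerPtr": "{o}._inner",
}


def render_arg(entry, name="other"):
    """Render the Python expression passed as the second C argument."""
    # Assumption: an entry without argTransform passes the inner pointer.
    template = ARG_TRANSFORMS[entry.get("argTransform", "innerPtr")]
    # geodeticFromSelf is the only runtime-self primitive.
    geodetic = (
        "isinstance(self, TGeogPoint)" if entry.get("geodeticFromSelf") else "False"
    )
    return template.format(o=name, geodetic=geodetic)


def emit_method(member, block):
    """Emit Python source for one member, keeping the table order."""
    lines = [f"def {member}(self, other):"]
    for index, entry in enumerate(block["dispatch"]):
        keyword = "if" if index == 0 else "elif"
        lines.append(f"    {keyword} isinstance(other, {entry['py']}):")
        lines.append(
            f"        result = {entry['fn']}(self._inner, {render_arg(entry)})"
        )
    lines.append("    else:")
    if block["fallback"] == "super":
        lines.append(f"        return super().{member}(other)")
    else:
        lines.append("        raise TypeError(f'unsupported operand: {type(other)!r}')")
    # Assumption: "result": "temporal" wraps the C result in a Temporal object.
    lines.append("    return Temporal._factory(result)")
    return "\n".join(lines)


# The geo `distance` block from section 3, written as a Python dict.
DISTANCE = {
    "dispatch": [
        {"py": "BaseGeometry", "fn": "tdistance_tgeo_geo",
         "argTransform": "geoToGserialized", "geodeticFromSelf": True},
        {"py": "STBox", "fn": "tdistance_tgeo_geo", "argTransform": "stboxToGeo"},
        {"py": "TPoint", "fn": "tdistance_tgeo_tgeo"},
    ],
    "fallback": "raise",
    "result": "temporal",
}

source = emit_method("distance", DISTANCE)
print(source)
```

Because the ladder is emitted in table order, the ordering of entries in the catalog is itself part of the contract: placing `BaseGeometry` before `Point` in the `at` block, for example, would change which branch a subclass instance reaches.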
+
+Until the metadata lands, geo/temporal **correctly stay hand-written** — they
+are fully functional (no API-parity gap); only the *codegen-uniformity* goal
+is outstanding, and this RFC is its sound resolution.
+
+## 6. Unification — one systemic root cause
+
+Driving the MEOS-1.4 bump (PyMEOS #81, Wave 2 of the integration train)
+surfaced the same root cause and consolidates the whole problem space:
+
+- `bump/meos-1.4` collects cleanly but is **299 failed / 4193 passed**
+  against a composed Wave-0 MEOS + freshly-regenerated PyMEOS-CFFI.
+- **~96 of those are a single class**: the regenerated `pymeos_cffi`
+  wrappers do bare `_ffi.cast('T *', None)` / `None.encode('utf-8')` for
+  optional params (e.g. `temporal_as_mfjson.srs`, `stbox_make.s`).
+- **PyMEOS-CFFI's codegen is already correct** — it emits
+  `… if x is not None else _ffi.NULL` *iff* the param is in
+  `shape.nullable` of `meos-idl.json` (`build_pymeos_functions.py`). The
+  failures exist purely because the Wave-1 `meos-idl.json` **lacks the
+  `shape.nullable` enrichment** for those params.
+
+So the geo/temporal `oo.dispatch` gap (this RFC), the original
+`NotImplementedError` stub→real gap, **and** the ~96-failure MEOS-1.4
+nullable regression are *the same defect*: **`meta/meos-meta.json`
+enrichment is incomplete**. They are one problem, not three.
+
+**This RFC is therefore generalised:** the canonical-metadata completion it
+proposes must also cover **`shape.nullable`** (and the `shape.*` editorial
+annotations GoMEOS already consumes). Consumers are already written for it —
+every binding's codegen reads `shape.nullable` today; it is simply empty. A
+blanket "treat every pointer/string param as nullable" shim is rejected for
+the same reason as the Path-A transcription hack: it would silently `NULL`
+params that should raise, masking genuine argument errors — not
+equivalence-preserving.
+The sound fix is enriching the canonical catalog so
+the already-deployed, already-correct codegen does the right thing
+everywhere at once.
+
+**Net:** completing `meta/meos-meta.json` (`oo.dispatch` + `shape.nullable`
++ `shape.*`) is *the* single highest lever for ecosystem-wide 100 % parity —
+it closes geo/temporal codegen, the stub→real surface, and ≈⅓ of the
+MEOS-1.4 bump together. This is the one cross-repo handoff to MEOS-API that
+matters most.
+
+## 7. Verbatim extended dispatch SoT (D1-extension — transcribe, do not derive)
+
+§3 above carried only the **4 illustrative** members. This section is the
+**complete, fully-resolved, verbatim** editorial-member dispatch, transcribed
+1:1 from the hand-written oracle (`pymeos/main/{tpoint,tfloat,tint,tbool,
+ttext}.py` on `feat/extended-temporal-types`). No placeholders, no `/`
+— every `fn`/type/cast is resolved. The MEOS-API session transcribes this
+**verbatim** into `meta/object-model.json#/dispatch` (geo single-block;
+`temporal.{tfloat,tint,tbool,ttext}.` per-concrete, the adopted
+contract); §6's "do not re-derive" applies — a prose recipe is *not* a
+substitute for these tables (it already produced 5 verified errors, below).
+
+**Schema additions to the closed vocabulary:** `scalarType` (the exact
+`isinstance` test for a `py:"scalar"` entry, e.g. `"float"`, `"int|float"`,
+`"int"`, `"bool"`, `"str"`); `argTransform` gains `scalarValue` (pass `$o`
+as-is), `scalarCast` (cast `$o` to the block's concrete base — `float()` for
+`tfloat`, `int()` for `tint`), `textsetMake` (`textset_make($o)`); `py:"list[str]"`
+(= `isinstance(other,list) and isinstance(other[0],str)`). `coerce`+`via:"super"`
+unchanged.
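
The scalar primitives can be read as plain Python predicates and casts. The sketch below shows that reading at runtime rather than as generated source; the helper names (`matches_scalar_type`, `matches_list_of_str`, `transform_scalar`) and the two lookup tables are hypothetical, and only the `scalarType` strings, the `list[str]` test and the `float()`/`int()` casts come from the paragraph above.

```python
# Hypothetical runtime reading of the scalar primitives described above.
SCALAR_TYPES = {"float": float, "int": int, "bool": bool, "str": str}

# scalarCast casts to the block's concrete base: float() for tfloat, int() for tint.
SCALAR_CASTS = {"tfloat": float, "tint": int}


def matches_scalar_type(value, scalar_type):
    """Evaluate the isinstance test named by scalarType, e.g. "int|float"."""
    types = tuple(SCALAR_TYPES[name] for name in scalar_type.split("|"))
    return isinstance(value, types)


def matches_list_of_str(value):
    """The py:"list[str]" test, written exactly as quoted in the schema text."""
    return isinstance(value, list) and isinstance(value[0], str)


def transform_scalar(value, arg_transform, base):
    """Apply scalarValue (pass through) or scalarCast (cast to the base type)."""
    if arg_transform == "scalarValue":
        return value
    if arg_transform == "scalarCast":
        return SCALAR_CASTS[base](value)
    raise ValueError(f"not a scalar transform: {arg_transform}")


print(matches_scalar_type(3, "int|float"))        # an int passes "int|float"
print(matches_scalar_type(3, "float"))            # but not "float"
print(transform_scalar(3, "scalarCast", "tfloat"))
```

Two Python details follow from taking the tests literally: `bool` is a subclass of `int`, so a `bool` argument satisfies `scalarType: "int"`, and the `list[str]` test indexes `other[0]`, so an empty list raises `IndexError` instead of falling through. A generator that mirrors the oracle would reproduce both behaviours unchanged.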
+
+**Recipe-vs-oracle errors this SoT corrects (why verbatim is mandatory):**
+(1) temporal has **no** editorial `distance` member (TFloat/TInt expose none);
+(2) `geo.nearest_approach_distance` STBox uses the *typed* `nad_tgeo_stbox`
+(`innerPtr`), **not** `stboxToGeo`; (3) TFloat always/ever-compare `scalarType`
+is `"float"`, but its `temporal_equal`/`at` is `"int|float"` — differs per
+member; (4) TInt coerces **Float→Int** (`to_intset`…), the opposite direction
+to TFloat, and its `temporal_equal` is `teq_tint_int($o)` with **no cast**
+(`scalarValue`), unlike TFloat's `float()`-cast; (5) TBool exposes only
+`temporal_equal/not_equal/at/minus` (no `always_/ever_`); TText `at/minus`
+has a `list[str]→temporal_at_values(textset_make(...))` branch.
+
+```jsonc
+// dispatch.geo (ADD to the 2 already in D1: at, distance)
+"minus": { "fallback":"super", "result":"temporal", "dispatch":[
+  {"py":"Point",       "fn":"tpoint_minus_value","argTransform":"geoToGserialized","geodeticFromSelf":true},
+  {"py":"BaseGeometry","fn":"tpoint_minus_geom", "argTransform":"geoToGserialized","geodeticFromSelf":true},
+  {"py":"GeoSet",      "fn":"temporal_minus_values"},
+  {"py":"STBox",       "fn":"tgeo_minus_stbox", "extraArgs":["true"]} ]},
+"nearest_approach_distance": { "fallback":"raise", "result":"scalar", "dispatch":[
+  {"py":"BaseGeometry","fn":"nad_tgeo_geo","argTransform":"geoToGserialized","geodeticFromSelf":true},
+  {"py":"STBox",       "fn":"nad_tgeo_stbox"},
+  {"py":"TPoint",      "fn":"nad_tgeo_tgeo"} ]}
+
+// dispatch.temporal.tfloat (self entry => generic *_temporal_temporal)
+"always_equal":    {"fallback":"raise","result":"bool_gt0","dispatch":[
+  {"py":"scalar","scalarType":"float","fn":"always_eq_tfloat_float","argTransform":"scalarValue"},
+  {"py":"self","fn":"always_eq_temporal_temporal"}]},
+"always_not_equal":{"fallback":"raise","result":"bool_gt0","dispatch":[
+  {"py":"scalar","scalarType":"float","fn":"always_ne_tfloat_float","argTransform":"scalarValue"},
+  {"py":"self","fn":"always_ne_temporal_temporal"}]},
+"ever_equal":      {"fallback":"raise","result":"bool_gt0","dispatch":[
+  {"py":"scalar","scalarType":"float","fn":"ever_eq_tfloat_float","argTransform":"scalarValue"},
+  {"py":"self","fn":"ever_eq_temporal_temporal"}]},
+"ever_not_equal":  {"fallback":"raise","result":"bool_gt0","dispatch":[
+  {"py":"scalar","scalarType":"float","fn":"ever_ne_tfloat_float","argTransform":"scalarValue"},
+  {"py":"self","fn":"ever_ne_temporal_temporal"}]},
+"temporal_equal":  {"fallback":"super","result":"temporal","dispatch":[
+  {"py":"scalar","scalarType":"int|float","fn":"teq_tfloat_float","argTransform":"scalarCast"}]},
+"temporal_not_equal":{"fallback":"super","result":"temporal","dispatch":[
+  {"py":"scalar","scalarType":"int|float","fn":"tne_tfloat_float","argTransform":"scalarCast"}]},
+"at":   {"fallback":"super","result":"temporal","dispatch":[
+  {"py":"scalar","scalarType":"int|float","fn":"tfloat_at_value","argTransform":"scalarCast"},
+  {"py":"IntSet",    "coerce":"to_floatset",    "via":"super"},
+  {"py":"IntSpan",   "coerce":"to_floatspan",   "via":"super"},
+  {"py":"IntSpanSet","coerce":"to_floatspanset","via":"super"}]},
+"minus":{"fallback":"super","result":"temporal","dispatch":[
+  {"py":"scalar","scalarType":"int|float","fn":"tfloat_minus_value","argTransform":"scalarCast"},
+  {"py":"IntSet",    "coerce":"to_floatset",    "via":"super"},
+  {"py":"IntSpan",   "coerce":"to_floatspan",   "via":"super"},
+  {"py":"IntSpanSet","coerce":"to_floatspanset","via":"super"}]}
+
+// dispatch.temporal.tint
+"always_equal":    {"fallback":"raise","result":"bool_gt0","dispatch":[
+  {"py":"scalar","scalarType":"int","fn":"always_eq_tint_int","argTransform":"scalarValue"},
+  {"py":"self","fn":"always_eq_temporal_temporal"}]},
+"always_not_equal":{"fallback":"raise","result":"bool_gt0","dispatch":[
+  {"py":"scalar","scalarType":"int","fn":"always_ne_tint_int","argTransform":"scalarValue"},
+  {"py":"self","fn":"always_ne_temporal_temporal"}]},
+"ever_equal":
+  {"fallback":"raise","result":"bool_gt0","dispatch":[
+  {"py":"scalar","scalarType":"int","fn":"ever_eq_tint_int","argTransform":"scalarValue"},
+  {"py":"self","fn":"ever_eq_temporal_temporal"}]},
+"ever_not_equal":  {"fallback":"raise","result":"bool_gt0","dispatch":[
+  {"py":"scalar","scalarType":"int","fn":"ever_ne_tint_int","argTransform":"scalarValue"},
+  {"py":"self","fn":"ever_ne_temporal_temporal"}]},
+"temporal_equal":  {"fallback":"super","result":"temporal","dispatch":[
+  {"py":"scalar","scalarType":"int","fn":"teq_tint_int","argTransform":"scalarValue"}]},
+"temporal_not_equal":{"fallback":"super","result":"temporal","dispatch":[
+  {"py":"scalar","scalarType":"int","fn":"tne_tint_int","argTransform":"scalarValue"}]},
+"at":   {"fallback":"super","result":"temporal","dispatch":[
+  {"py":"scalar","scalarType":"int|float","fn":"tint_at_value","argTransform":"scalarCast"},
+  {"py":"FloatSet",    "coerce":"to_intset",    "via":"super"},
+  {"py":"FloatSpan",   "coerce":"to_intspan",   "via":"super"},
+  {"py":"FloatSpanSet","coerce":"to_intspanset","via":"super"}]},
+"minus":{"fallback":"super","result":"temporal","dispatch":[
+  {"py":"scalar","scalarType":"int|float","fn":"tint_minus_value","argTransform":"scalarCast"},
+  {"py":"FloatSet",    "coerce":"to_intset",    "via":"super"},
+  {"py":"FloatSpan",   "coerce":"to_intspan",   "via":"super"},
+  {"py":"FloatSpanSet","coerce":"to_intspanset","via":"super"}]}
+
+// dispatch.temporal.tbool (ONLY these; no always_/ever_ editorial)
+"temporal_equal":  {"fallback":"super","result":"temporal","dispatch":[
+  {"py":"scalar","scalarType":"bool","fn":"teq_tbool_bool","argTransform":"scalarValue"}]},
+"temporal_not_equal":{"fallback":"super","result":"temporal","dispatch":[
+  {"py":"scalar","scalarType":"bool","fn":"tne_tbool_bool","argTransform":"scalarValue"}]},
+"at":   {"fallback":"super","result":"temporal","dispatch":[
+  {"py":"scalar","scalarType":"bool","fn":"tbool_at_value","argTransform":"scalarValue"}]},
+"minus":{"fallback":"super","result":"temporal","dispatch":[
+  {"py":"scalar","scalarType":"bool","fn":"tbool_minus_value","argTransform":"scalarValue"}]}
+
+// dispatch.temporal.ttext
+"always_equal":    {"fallback":"raise","result":"bool_gt0","dispatch":[
+  {"py":"scalar","scalarType":"str","fn":"always_eq_ttext_text","argTransform":"scalarValue"},
+  {"py":"self","fn":"always_eq_temporal_temporal"}]},
+"always_not_equal":{"fallback":"raise","result":"bool_gt0","dispatch":[
+  {"py":"scalar","scalarType":"str","fn":"always_ne_ttext_text","argTransform":"scalarValue"},
+  {"py":"self","fn":"always_ne_temporal_temporal"}]},
+"ever_equal":      {"fallback":"raise","result":"bool_gt0","dispatch":[
+  {"py":"scalar","scalarType":"str","fn":"ever_eq_ttext_text","argTransform":"scalarValue"},
+  {"py":"self","fn":"ever_eq_temporal_temporal"}]},
+"ever_not_equal":  {"fallback":"raise","result":"bool_gt0","dispatch":[
+  {"py":"scalar","scalarType":"str","fn":"ever_ne_ttext_text","argTransform":"scalarValue"},
+  {"py":"self","fn":"ever_ne_temporal_temporal"}]},
+"temporal_equal":  {"fallback":"super","result":"temporal","dispatch":[
+  {"py":"scalar","scalarType":"str","fn":"teq_ttext_text","argTransform":"scalarValue"}]},
+"temporal_not_equal":{"fallback":"super","result":"temporal","dispatch":[
+  {"py":"scalar","scalarType":"str","fn":"tne_ttext_text","argTransform":"scalarValue"}]},
+"at":   {"fallback":"super","result":"temporal","dispatch":[
+  {"py":"scalar","scalarType":"str","fn":"ttext_at_value","argTransform":"scalarValue"},
+  {"py":"list[str]","fn":"temporal_at_values","argTransform":"textsetMake"}]},
+"minus":{"fallback":"super","result":"temporal","dispatch":[
+  {"py":"scalar","scalarType":"str","fn":"ttext_minus_value","argTransform":"scalarValue"},
+  {"py":"list[str]","fn":"temporal_minus_values","argTransform":"textsetMake"}]}
+```
+
+Every `fn`, `scalarType`, `coerce`, `argTransform`, `fallback`, `result`
+above is copied from the verbatim hand-written method bodies
+(extracted by
+AST, not summarised). This is the SoT for the D1 extension; the consumer
+side (PyMEOS #95) is being extended to the same vocabulary in lock-step.
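
Since the vocabulary is meant to be closed, a transcription of these tables could be checked mechanically before a binding consumes it. The following is a minimal sketch of such a check, not an existing MEOS-API or PyMEOS tool: `validate_member` is a hypothetical name, and the allowed sets contain only the values that appear in this RFC (the real `argTransform` list is elided with `…` in §3, so it may be longer).

```python
# Hypothetical validator for one member block of the section 7 tables.
# Assumption: the allowed sets are limited to the values named in this RFC.
ARG_TRANSFORMS = {
    "geoToGserialized", "stboxToGeo", "scalarCast",
    "scalarValue", "innerPtr", "textsetMake",
}
FALLBACKS = {"super", "raise"}
RESULTS = {"temporal", "scalar", "bool_gt0"}


def validate_member(member, block):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    if block.get("fallback") not in FALLBACKS:
        errors.append(f"{member}: unknown fallback {block.get('fallback')!r}")
    if block.get("result") not in RESULTS:
        errors.append(f"{member}: unknown result {block.get('result')!r}")
    for index, entry in enumerate(block.get("dispatch", [])):
        where = f"{member}[{index}]"
        # An entry either calls a C function or coerces and defers to super().
        if ("fn" in entry) == ("coerce" in entry):
            errors.append(f"{where}: needs exactly one of 'fn' or 'coerce'")
        if "coerce" in entry and entry.get("via") != "super":
            errors.append(f"{where}: 'coerce' requires via:'super'")
        if entry.get("py") == "scalar" and "scalarType" not in entry:
            errors.append(f"{where}: scalar entry without 'scalarType'")
        transform = entry.get("argTransform")
        if transform is not None and transform not in ARG_TRANSFORMS:
            errors.append(f"{where}: unknown argTransform {transform!r}")
    return errors


# The ttext `at` block from the table above, written as a Python dict.
TTEXT_AT = {
    "fallback": "super", "result": "temporal", "dispatch": [
        {"py": "scalar", "scalarType": "str", "fn": "ttext_at_value",
         "argTransform": "scalarValue"},
        {"py": "list[str]", "fn": "temporal_at_values",
         "argTransform": "textsetMake"},
    ],
}

# A deliberately broken copy: misspelled transform, missing scalarType.
BROKEN = {
    "fallback": "super", "result": "temporal", "dispatch": [
        {"py": "scalar", "fn": "ttext_at_value", "argTransform": "scalarvalue"},
    ],
}

print(validate_member("at", TTEXT_AT))
print(validate_member("at", BROKEN))
```

A check of this kind only guards the shape of the transcription. Whether each `fn` matches the oracle is still decided by the A/B suite.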