diff --git a/README.md b/README.md index fb0a8d0..a812024 100644 --- a/README.md +++ b/README.md @@ -9,14 +9,17 @@ This catalog is the foundation for generating language bindings (Python, Java, R - [How it works](#how-it-works) - [Getting started](#getting-started) - [Output format](#output-format) +- [Service-projection metadata](#service-projection-metadata) - [Adding metadata](#adding-metadata) ## How it works -The pipeline runs in two steps: +The pipeline runs in four steps: 1. **Parser** — scans the MEOS `.h` header files using libclang and extracts every function signature, struct, and enum into structured JSON. -2. **Merger** — enriches the parser output with manual annotations from `meta/meos-meta.json`, such as documentation and memory ownership rules. +2. **Reconcile** — restores opaque types the PostgreSQL stub headers `#define` to `int` (`Interval`, `text`, …) from the header source, so they are not mistaken for `int *` out-parameters. +3. **Enrich** — derives the service-projection metadata (`category` / `typeEncodings` / `network` / `wire`). +4. **Merger** — applies manual annotations from `meta/meos-meta.json` (documentation, ownership, overrides) on top. ## Getting started @@ -80,6 +83,57 @@ A typical function entry looks like this: } ``` +## Service-projection metadata + +C headers describe *signatures*; they do not say what a function **is**, how an +opaque type crosses the wire, or whether an operation can be served +*statelessly*. A second pass (`parser/enrich.py`) derives that — the metadata a +service generator (OpenAPI, MCP, gRPC, …) needs to project MEOS onto a network +API. It runs **before** the merge, so every derived field is overridable from +`meta/meos-meta.json`. 
+ +Each function gains a `category`, a `network` verdict, and a `wire` mapping: + +```json +{ + "name": "temporal_eq", + "returnType": { "c": "bool", "canonical": "int" }, + "params": [ { "name": "temp1", "canonical": "const struct Temporal *" }, + { "name": "temp2", "canonical": "const struct Temporal *" } ], + "category": "predicate", + "network": { "exposable": true, "method": "POST", "reason": null }, + "wire": { + "params": [ + { "name": "temp1", "kind": "serialized", "cType": "const struct Temporal *", + "decode": "temporal_in", "encodings": ["mfjson","text","wkb"] }, + { "name": "temp2", "kind": "serialized", "cType": "const struct Temporal *", + "decode": "temporal_in", "encodings": ["mfjson","text","wkb"] } + ], + "result": { "kind": "json", "json": "integer" } + } +} +``` + +(MEOS predicates return `int`, and libclang emits canonical spellings such as +`const struct Temporal *` — the enrichment matches those.) + +```text +Live coverage (MobilityDB master): 2161 public + 511 internal functions. +The service projects the public user API; internal (meos_internal*.h, +Datum-generic) is policy-excluded. + 1963 / 2161 = 91% of the public API stateless-exposable (verified). +``` + +The catalog also gains a top-level `typeEncodings` map (opaque type → its +in/out functions) and an `enrichment` summary (category counts, exposable +count) for coverage tracking. Non-exposable functions carry a precise +`reason` (`array-or-out-param:…`, `no-encoder:…`, `lifecycle`, `index`, …) so +generators can report exactly what they can and cannot emit. + +See [`docs/enrichment.md`](docs/enrichment.md) for the full contract and +[`tests/test_enrich.py`](tests/test_enrich.py) for worked examples on real +MEOS signatures (run: `python3 tests/test_enrich.py`). + ## Adding metadata -Manual annotations (ownership rules, additional documentation, deprecation flags, etc.) live in `meta/meos-meta.json`. 
The merger applies them on top of the libclang-parsed structure when generating the final catalog. +Manual annotations (ownership rules, additional documentation, deprecation flags, etc.) live in `meta/meos-meta.json`. The merger applies them on top of the libclang-parsed structure when generating the final catalog — including any field derived by the service-projection pass (e.g. correcting a `category` or forcing `network.exposable`). diff --git a/docs/enrichment.md b/docs/enrichment.md new file mode 100644 index 0000000..388cf09 --- /dev/null +++ b/docs/enrichment.md @@ -0,0 +1,251 @@ +# Service-projection enrichment + +The libclang parser can only report what is written in the C headers: +function signatures, structs, enums. Projecting MEOS onto a network service +(OpenAPI / MCP / gRPC) needs information the headers do **not** contain — what +each function *is*, how each opaque type crosses the wire, and whether an +operation can be served *statelessly*. + +`parser/enrich.py` derives that metadata from the parsed catalog. It runs +**before** `merge_meta`, so every field below can be overridden per +function/type from `meta/meos-meta.json`. + +> All values here are heuristic defaults. They are intended to be *curated* +> over time through `meta/meos-meta.json`; the heuristics give a correct-by- +> default starting point and keep coverage measurable. + +## 1. 
Function `category` + +Each function gets one `category` (first matching rule wins): + +| category | rule (by name / signature) | +|------------------|-------------------------------------------------------------| +| `lifecycle` | name starts `meos_` (init/finalize/configuration) | +| `index` | name starts `rtree_` | +| `io` | name matches an in/out encoding pattern (`_in`, `_out`, `_from_mfjson`, `_as_hexwkb`, `_from_wkb`, …) | +| `aggregate` | name ends `_transfn`/`_combinefn`/`_finalfn`, or `_tagg`/`_collect` | +| `predicate` | comparison/topological/temporal predicate by name (`*_eq`, `*_lt`, `contains_*`, `intersects_*`, `dwithin_*`, `ever_*`, `always_*`, `t{eq,contains,…}_*`, …) or returns `bool`. MEOS predicates return `int`, so name patterns — not the return type — drive this. | +| `constructor` | name ends `_make`/`_copy` | +| `setop` | name starts `union_`/`intersection_`/`minus_`/`difference_` | +| `conversion` | name contains `_to_`/`_from_`/`_from_base`/`_as_` | +| `accessor` | name ends with a component/property pattern (`_value`, `_srid`, `_duration`, `_num_*`, …) | +| `transformation` | default: value → value of the same family | + +## 2. `typeEncodings` + +A top-level map: opaque C type → how it round-trips to the wire. Built by +scanning the catalog for the type's own in/out functions. 
+ +```json +"typeEncodings": { + "Temporal": { + "encodings": ["mfjson", "text", "wkb"], + "decoders": { "text": "temporal_in", "mfjson": "temporal_from_mfjson" }, + "encoders": { "text": "temporal_out", "wkb": "temporal_as_hexwkb" }, + "in": "temporal_in", + "out": "temporal_out" + } +} +``` + +- **decoder** — `const char * (+ aux) → T *` (`*_in`, `*_from_mfjson`, …) +- **encoder** — `const T * (+ aux) → char *` (`*_out`, `*_as_mfjson`, …) +- `in`/`out` — the preferred decoder/encoder, `text` > `mfjson` > `wkb`; + among candidates the **generic root** (`_in`/`_out`) is preferred + (so `temporal_out` serialises *every* subtype), else a deterministic + alphabetical pick. `in_aux`/`out_aux` carry the trailing args. + +> **Auxiliary arguments.** Real MEOS in/out wrappers (the public functions +> in the `*_meos.c` files) are not pure `(str)->T` / `(T)->str`: they take +> trailing *formatting* scalars — `temporal_out(temp, int maxdd)`, +> `*_as_mfjson(temp, with_bbox, flags, precision, srs)`. Those are safe to +> default (`maxdd`/`precision` → 15, flags/bbox → 0, `srs` → NULL), so the +> wrapper still satisfies the stateless contract; the defaults are recorded +> in `in_aux`/`out_aux` for the runtime to pass. A trailing arg that is +> *not* a defaultable formatting scalar disqualifies the wrapper: a +> semantic `*type` tag (`temporal_in`'s `temptype` — tagged +> `@ingroup meos_internal` in MEOS) or a pointer/array (`*_as_wkb`'s +> `size_out`). So polymorphic `Temporal` *decoding* resolves to a typed +> wrapper (`tbool_in`, …) — subtype-narrow on input; carrying the subtype +> on the wire for a universal decode is future work. *Encoding* is already +> universal via the generic `temporal_out`. + +The same data is folded onto each `structs[*]` entry as `serialization`. + +## 3. 
`network` and `wire` + +For every function: + +```json +"category": "predicate", +"network": { "exposable": true, "method": "POST", "reason": null }, +"wire": { + "params": [ + { "name": "temp1", "kind": "serialized", "cType": "Temporal *", + "decode": "temporal_in", "encodings": ["mfjson","text","wkb"] }, + { "name": "temp2", "kind": "serialized", "cType": "Temporal *", + "decode": "temporal_in", "encodings": ["mfjson","text","wkb"] } + ], + "result": { "kind": "json", "json": "boolean" } +} +``` + +`wire` element `kind`: + +| kind | meaning | JSON Schema hint | +|--------------|------------------------------------------------------|------------------| +| `json` | scalar; `json` ∈ integer/number/boolean/string. Enum types add `"enum": ""` | direct | +| `serialized` | opaque value carried as a string in its `encodings`; `decode`/`encode` names the MEOS function | `{"type":"string"}` + media type | +| `array` | JSON array of `element` (an `Elem` builder param, or an `Elem **`+count return) | `{"type":"array","items":…}` | +| `void` | no return value | `204` | +| `unsupported`| cannot be represented in a stateless request/response| — | + +> **Canonical spellings.** libclang emits canonical C, not source aliases: +> `struct Temporal *` (not `Temporal *`), `unsigned char` (not `uint8_t`), +> `long` (not `int64_t`), and MEOS uses `int` for booleans. The heuristics +> match canonical spellings; `struct`/`union`/`enum` qualifiers are stripped +> so `typeEncodings` keys are clean (`Temporal`, not `struct Temporal`). +> `enum` parameters are scalars; only declared structs become `serialized`. + +### Exposability + +`network.exposable` is `true` iff **every** parameter is `json` or +`serialized` **and** the result is `json`, `serialized`, or `void`. 
+Otherwise `exposable` is `false` and `reason` lists the blockers +(deduplicated, `;`-joined): + +| reason | cause | +|------------------------------|-------------------------------------------------------------| +| `lifecycle` / `index` | library plumbing, not a domain operation | +| `array-or-out-param:` | parameter is a pointer to a scalar/array or an out-parameter (`T **`, `int *`, `double *`, …) | +| `no-decoder:` | opaque parameter type has no parser function | +| `no-encoder:` | opaque return type has no serializer function | +| `unsupported-return:` | return cannot be represented on the wire | + +> **Out-parameters.** A common MEOS accessor shape — +> `bool f(.., T *result)` (or `T **result`) — returns its value through a +> trailing out-parameter, with the `bool`/`int` return acting as a presence +> flag (`void` = always present). Two safe shapes are recognised: a +> **scalar** `T *result` becomes the JSON `result`; an **opaque** +> `T **result` becomes a `serialized` result via the type's encoder. In +> both cases `from_outparam`/`out_ctype`/`presence_return` annotate the +> wire result, the function is `exposable`, and a false presence return +> maps to *no value* (HTTP 204). This recovers the public `*_value_n`, +> box-bound, and `*_value_at`/geo-accessor families. + +> **Input-array builders.** A builder taking `(Elem **arr, int count)` +> becomes one wire param of `kind: "array"` (element = the serialized +> `Elem`); the `count` is implicit (the JSON array length). This recovers +> `*_make` / `*_merge_array` / `*arr_to_*` builders whose element type is +> decodable. + +> **Array returns.** An accessor `Elem **f(.., int *count)` returning a +> freshly-allocated element array becomes a `result` of `kind: "array"` +> (element = the serialized `Elem`, `count_outparam` names the byref +> length). Recovers `temporal_instants`/`segments`/`sequences`, +> `tgeo_values`, `geo_pointarr`, … whose element type is encodable. 
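The exposability boundary described above can be checked from the consumer side. The sketch below is a hypothetical helper, not part of `parser/enrich.py`: the name `invariant_violations` and the two kind sets are assumptions for illustration. It assumes that `array` counts as an allowed kind alongside `json`/`serialized` (per the input-array and array-return notes), and it reads only the `network` and `wire` fields documented in this section.

```python
# Hypothetical consumer-side check; names here are illustrative assumptions,
# not the invariant actually shipped with the catalog.
PARAM_KINDS = {"json", "serialized", "array"}
RESULT_KINDS = {"json", "serialized", "array", "void"}


def invariant_violations(fn: dict) -> list:
    """Return human-readable violations for one catalog function entry."""
    network, wire = fn["network"], fn["wire"]
    if not network["exposable"]:
        # A non-exposable function must carry a reason.
        return [] if network.get("reason") else [f"{fn['name']}: missing reason"]
    found = []
    for param in wire["params"]:
        if param["kind"] not in PARAM_KINDS:
            found.append(f"{fn['name']}: param {param['name']} is {param['kind']}")
    if wire["result"]["kind"] not in RESULT_KINDS:
        found.append(f"{fn['name']}: result is {wire['result']['kind']}")
    return found


# Entry shaped like the `temporal_eq` example in this document (trimmed).
temporal_eq = {
    "name": "temporal_eq",
    "network": {"exposable": True, "method": "POST", "reason": None},
    "wire": {
        "params": [
            {"name": "temp1", "kind": "serialized", "decode": "temporal_in"},
            {"name": "temp2", "kind": "serialized", "decode": "temporal_in"},
        ],
        "result": {"kind": "json", "json": "integer"},
    },
}

# A deliberately inconsistent entry: marked exposable but with an
# unsupported parameter, so the check should flag it.
broken = {
    "name": "broken_fn",
    "network": {"exposable": True, "method": "POST", "reason": None},
    "wire": {
        "params": [{"name": "arr", "kind": "unsupported"}],
        "result": {"kind": "void"},
    },
}
```

Running the helper over every function in a generated catalog and asserting an empty result is one way a generator could guard against emitting an endpoint it cannot actually serve.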
+ +This is the precise, machine-checkable boundary of what a generator can emit +today. Functions still blocked only by `array-or-out-param` (multi- or +array out-params, builder `T **`+count) are the candidates for the next, +hand-designed composite-endpoint unit; everything `exposable` can be +generated mechanically. + +## 4. Overriding + +`merge_meta` applies `meta/meos-meta.json` *after* enrichment, so any derived +field can be corrected by hand: + +```json +{ + "functions": { + "temporal_at_value": { "category": "transformation" }, + "some_fn": { "network": { "exposable": false, "reason": "side-effect" } } + } +} +``` + +## 5. Catalog summary + +`enrich_idl` adds an `enrichment` block for coverage tracking. Run against +the live MobilityDB `master` catalog (2672 functions, 47 structs, 6 enums): + +```json +"enrichment": { + "categoryCounts": { ... }, + "publicFunctions": 2161, + "internalFunctions": 511, + "exposableFunctions": 1963 +} +``` + +MEOS has two API surfaces: the **public user API** (`meos.h` + the public +type headers, 2161 functions) and the **internal programmer API** +(`meos_internal*.h`, 511 functions — type-erased `Datum`-generic, +undocumented for end users). A network service projects the *user* API, so +internal functions are **policy-excluded** (`reason: internal`, like +`lifecycle`/`index`); `132/133` `Datum` functions are internal and never +belonged in the parity denominator. + +So **1963 / 2161 = 91% of the public API** projects onto a stateless +endpoint as-is — verified by a strict invariant (every exposable function +has only scalar/enum/serialized/array params with a real decoder, and an +encodable result; 0 violations; 0 internal leaks). 
The 198-function public +remainder is dominated by **irregular** signatures with no clean stateless +shape (mixed/odd `array-or-out-param`, raw-array `unsupported-return`, +out-params of codec-less types) plus `SkipList` aggregate state and +`lifecycle`/`index` plumbing. All of them are excluded with a truthful +`reason`, never silently mis-called. + +`report.py` emits `output/meos-coverage.json`: an **actionable worklist**. +Every non-exposable public function carries a `class` and a concrete +`suggest` (the precise upstream regularization that closes it), and +`byClass` ranks classes by leverage. Live (`master`, gap 198): + +| class | n | upstream action | +|---|---|---| +| `out-param-naming` | 76 | rename the lone out-parameter to `result` (one-liner each) | +| `plumbing` | 37 | none: `lifecycle`/`index`, intentionally not exposed | +| `stateful` | 30 | none: aggregate state; needs a stateful endpoint, not a stateless RPC | +| `array-return-shape` | 21 | add a trailing `int *count` (+ element encoder) | +| `multi-out` | 13 | return a struct, or split into single-result accessors | +| `other` / `no-codec` / `array-shape` / `internal-generic` | 11+7+3+1 | add a 1-arg `T_in`/`T_out`; keep `Datum`-generic internal | + +**Honest ceiling.** ~91% is the *safe principled* maximum for this layer. +The `out-param-naming` 76 are **not** closed on this side on purpose: +a name-agnostic "trailing pointer ⇒ out-parameter" rule would misread +genuine pointer *inputs* as out-parameters and produce silently wrong +answers. Correctness-over-coverage makes these a cheap **upstream** rename, +now enumerated. Everything remaining is therefore upstream (precisely +specified above) or definitional (`stateful`/`plumbing`: correct, +labelled exclusions, never silent). The catalog regenerates from headers, +so upstream fixes flow in automatically on the next run (no coupling). 
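The coverage figures quoted in this section follow directly from the `enrichment` block. As a minimal sketch, assuming only the `publicFunctions` and `exposableFunctions` keys shown earlier (the helper name `coverage` is hypothetical and not part of `report.py`), the gap and percentage can be recomputed like this:

```python
def coverage(enrichment: dict) -> tuple:
    """Return (gap, percent) for the public API from an `enrichment` block."""
    public = enrichment["publicFunctions"]
    exposable = enrichment["exposableFunctions"]
    gap = public - exposable  # public functions not yet exposable
    percent = round(100 * exposable / public)  # rounded to a whole percent
    return gap, percent


# Values copied from the summary block documented above.
summary = {
    "publicFunctions": 2161,
    "internalFunctions": 511,
    "exposableFunctions": 1963,
}
gap, percent = coverage(summary)  # gap is 198, percent is 91
```

Internal functions are left out of the denominator on purpose, matching the policy exclusion described above.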
+ +### Compensation register (retirement path) + +Each heuristic exists only to absorb a current irregularity and is +**deleted** once the irregularity is uniformized upstream: + +| compensation | absorbs | retire when upstream… | +|---|---|---| +| `header_types.reconcile` / `_preserved_opaque` | stub `#define`s erasing `Interval`/`text`/`Datum` to `int` | the public headers no longer route opaque types through `int` stubs | +| `_aux_specs` (default `maxdd`/mfjson flags) | in/out wrappers carrying formatting args | I/O wrappers are pure `(str)->T`/`(T)->str` | +| typed-decode (`tbool_in` for polymorphic `Temporal`) | generic `temporal_in` needing a `meosType` tag | a tag-free polymorphic decoder exists | +| out-param / array-builder / array-return shapes | irregular out/array signatures | signatures follow the canonical shapes in the worklist | + +`tests/test_coverage_gate.py` asserts coverage does not regress. + +> **Type reconciliation.** The PostgreSQL stub headers `#define` +> `Interval`/`text`/`TimestampTz`/… to `int` *before* libclang parses, so +> those opaque pointers reached the catalog as `int *` — +> indistinguishable from a real `int *` out-parameter. `parser/ +> header_types.py` re-scans the header *source* and restores the true +> named type wherever libclang produced a bare scalar but the source +> declares a distinct named pointer (scalar typedefs like `TimestampTz` +> are deliberately left resolved). This is primarily a *correctness* fix — +> `add_timestamptz_interval`'s `interv` is now honestly `Interval`, not a +> phantom `int *`. `text *` is then treated as a JSON string (it *is* one), +> and the opaque-codec gate spans *any* named non-scalar pointer type (not +> just parsed structs), so reconciled types register their own in/out +> (`Interval` ↔ `pg_interval_in`/`interval_out`). Together this lifts +> verified public coverage to 1963/2161 (91%). 
diff --git a/parser/enrich.py b/parser/enrich.py new file mode 100644 index 0000000..e699e85 --- /dev/null +++ b/parser/enrich.py @@ -0,0 +1,549 @@ +"""Service-projection enrichment. + +Derives, for every function and type in the parsed catalog, the metadata a +service generator (OpenAPI, MCP, gRPC, ...) needs but that *cannot* be read +from C headers: + +- ``category`` — a coarse semantic class (constructor, predicate, io, ...). +- ``typeEncodings``— for each opaque C type, how it round-trips to the wire + (text / MF-JSON / WKB) and the function names that do it. +- ``network`` — whether the function can be projected onto a *stateless* + endpoint, and if not, why. +- ``wire`` — per-parameter and return value, the concrete request / + response representation a generator should emit. + +Everything here is a heuristic *default*. The pass runs before the manual +merge step, so any field can be overridden per function/type from +``meta/meos-meta.json`` (the merger applies on top). This module is +deliberately free of any libclang dependency: it operates purely on the +parsed ``idl`` dict and is therefore unit-testable on its own. + +Note: libclang emits *canonical* C spellings — ``struct Temporal *`` (not +``Temporal *``), ``unsigned char`` (not ``uint8_t``), ``long`` (not +``int64_t``), and MEOS uses ``int`` for booleans. The heuristics below match +those canonical spellings. See ``docs/enrichment.md`` for the full contract. +""" + +import re + +# Ordered category vocabulary. The first matching rule wins. +CATEGORIES = ( + "lifecycle", # library/process setup, configuration, teardown + "index", # in-memory index objects (RTree, ...) + "io", # parse/serialize between a value and a wire encoding + "aggregate", # aggregate transition/combine/final functions + "predicate", # boolean question about value(s) + "constructor", # build a value (_make, _copy) + "setop", # set algebra (union/intersection/minus/...) 
+ "conversion", # convert one value into another representation + "accessor", # read a component/property of a value + "transformation", # value -> value of the same family + "other", # anything not classified above +) + +# Canonical scalar spellings as emitted by libclang. +_INT_BASES = { + "char", "signed char", "unsigned char", + "short", "unsigned short", "int", "unsigned int", + "long", "unsigned long", "long long", "unsigned long long", +} +_FLOAT_BASES = {"float", "double", "long double"} +_BOOL_BASES = {"bool", "_Bool"} +# char-like pointers carry text or bytes (WKB) — represented as a JSON string. +_STRING_PTR_BASES = {"char", "signed char", "unsigned char"} + +# name suffix -> wire encoding produced (encoder) / consumed (decoder) +_ENCODERS = [ + (re.compile(r"_out$"), "text"), + (re.compile(r"_as_text$"), "text"), + (re.compile(r"_as_e?wkt$"), "text"), + (re.compile(r"_as_mfjson$"), "mfjson"), + (re.compile(r"_as_geojson$"), "mfjson"), + (re.compile(r"_as_hex_?wkb$"), "wkb"), + (re.compile(r"_as_e?wkb$"), "wkb"), +] +_DECODERS = [ + (re.compile(r"_in$"), "text"), + (re.compile(r"_from_e?wkt$"), "text"), + (re.compile(r"_from_text$"), "text"), + (re.compile(r"_from_mfjson$"), "mfjson"), + (re.compile(r"_from_geojson$"), "mfjson"), + (re.compile(r"_from_hex_?wkb$"), "wkb"), + (re.compile(r"_from_e?wkb$"), "wkb"), +] +_IO_RE = [rx for rx, _ in _DECODERS + _ENCODERS] + +_LIFECYCLE_RE = re.compile(r"^meos_") +_INDEX_RE = re.compile(r"^rtree_") +_AGG_RE = re.compile(r"(_transfn|_combinefn|_finalfn)$|_tagg|_collect$") +_CONSTRUCTOR_RE = re.compile(r"(_make|_copy)$") +_SETOP_RE = re.compile(r"^(union|intersection|minus|difference)_") +_CONVERSION_RE = re.compile(r"_to_|_from_base|_from_|_as_") +_ACCESSOR_RE = re.compile( + r"(_values?|_start_value|_end_value|_min_value|_max_value|_srid|_timespan|" + r"_duration|_length|_num_[a-z]+|_n|_lower|_upper|_start_[a-z]+|_end_[a-z]+|" + r"_get_[a-z]+|_value_n)$" +) +# MEOS predicates return `int`, so they must be 
recognised by name. +_PREDICATE_RE = re.compile( + r"^(ever|always)_|(_eq|_ne|_lt|_le|_gt|_ge|_cmp)$|" + r"^(contains|contained|overlaps|overbefore|overafter|overleft|overright|" + r"overbelow|overabove|overfront|overback|left|right|below|above|front|" + r"back|before|after|adjacent|same|intersects|disjoint|touches|dwithin|" + r"covers|coveredby|equals|crosses|within|relate)_|" + r"^t(eq|ne|lt|le|gt|ge|contains|intersects|disjoint|touches|dwithin)_" +) + +_QUAL_RE = re.compile(r"\b(const|volatile|struct|union|enum)\b") + + +def _base(c_type: str) -> str: + """Bare type token: qualifiers, ``struct``/``union``/``enum`` and ``*`` + stripped (so ``const struct Temporal *`` -> ``Temporal``).""" + return " ".join(_QUAL_RE.sub(" ", c_type).replace("*", " ").split()) + + +def _ptr_depth(c_type: str) -> int: + return c_type.count("*") + + +def _scalar_wire(c_type: str, enums: set): + """Wire descriptor for a non-opaque scalar/string/enum, else ``None``.""" + base, depth = _base(c_type), _ptr_depth(c_type) + if base == "void" and depth == 0: + return {"kind": "void"} + # char-likes and PostgreSQL `text` are just strings on the wire. + if depth == 1 and (base in _STRING_PTR_BASES or base == "text"): + return {"kind": "json", "json": "string"} + if depth == 0: + if base in _BOOL_BASES: + return {"kind": "json", "json": "boolean"} + if base in _FLOAT_BASES: + return {"kind": "json", "json": "number"} + if base in _INT_BASES: + return {"kind": "json", "json": "integer"} + if base in enums: + return {"kind": "json", "json": "string", "enum": base} + return None + + +def _is_scalar_pointer(c_type: str, enums: set) -> bool: + """``int *`` / ``double *`` / ``interpType *`` etc. 
— an array/out-param, + as opposed to a pointer to an opaque value (``struct Temporal *``).""" + base, depth = _base(c_type), _ptr_depth(c_type) + if depth >= 2: + return True + if depth == 1 and base not in _STRING_PTR_BASES: + return (base in _INT_BASES or base in _FLOAT_BASES + or base in _BOOL_BASES or base in enums) + return False + + +def _aux_specs(params: list): + """Defaults for the trailing args of an in/out helper. + + Real MEOS in/out helpers are not pure ``(str)->T`` / ``(T)->str``: they + take trailing *formatting* scalars — ``temporal_out(temp, int maxdd)``, + ``*_as_mfjson(temp, with_bbox, flags, precision, srs)``. Those are safe + to default, so the helper still satisfies the stateless contract. + + Returns the aux spec list (one ``{name, kind, default}`` per trailing + param), or ``None`` if any trailing parameter is *not* a defaultable + formatting scalar — a semantic ``*type`` tag (``temporal_in``'s + ``temptype``), a pointer/array (``*_as_wkb``'s ``size_out``), etc. — + which disqualifies the helper entirely. + """ + specs = [] + for p in params: + sc = _scalar_wire(p["canonical"], set()) + if sc is None or sc.get("kind") != "json": + return None # pointer / array / opaque aux + nm = p["name"].lower() + if "type" in nm: # temptype/basetype/settype tag + return None + j = sc["json"] + if j == "integer": + default = (15 if any(k in nm for k in + ("maxdd", "decimal", "digit", "precision")) + else 0) + elif j == "number": + default = 0.0 + elif j == "boolean": + default = False + else: # string (e.g. srs) -> NULL + default = None + specs.append({"name": p["name"], "kind": j, "default": default}) + return specs + + +def build_type_encodings(functions: list, structs: set) -> dict: + """Scan the catalog for the in/out functions of every opaque struct. + + A *decoder* turns a wire string into an object (returns ``struct T *``, + first arg a char-like string). 
An *encoder* turns an object into a wire + string (returns ``char *``, first arg ``const struct T *``). Trailing + *formatting* scalars are allowed and defaulted (see ``_aux_specs``); a + non-defaultable trailing arg disqualifies the helper. Only declared + structs qualify, so primitives never register by accident. + """ + enc: dict[str, dict] = {} + + def slot(b: str) -> dict: + return enc.setdefault(b, {"encodings": set(), "encoders": {}, + "decoders": {}}) + + for fn in functions: + name = fn["name"] + ret = fn["returnType"]["canonical"] + params = fn.get("params", []) + if not params: + continue + p0 = params[0]["canonical"] + rb, rd = _base(ret), _ptr_depth(ret) + pb, pd = _base(p0), _ptr_depth(p0) + aux = _aux_specs(params[1:]) # None => non-defaultable trailing arg + + # Decoder: const char* (+ defaultable scalar aux) -> opaque struct + if (aux is not None and rd >= 1 and rb in structs + and pd == 1 and pb in _STRING_PTR_BASES): + for rx, encoding in _DECODERS: + if rx.search(name): + s = slot(rb) + s["encodings"].add(encoding) + s["decoders"].setdefault(encoding, {})[name] = aux + break + + # Encoder: const struct T* (+ defaultable scalar aux) -> char* + if (aux is not None and rd == 1 and rb in _STRING_PTR_BASES + and pd >= 1 and pb in structs): + for rx, encoding in _ENCODERS: + if rx.search(name): + s = slot(pb) + s["encodings"].add(encoding) + s["encoders"].setdefault(encoding, {})[name] = aux + break + + # Several typed functions can serve one encoding (e.g. tbool_in, + # tint_in, ... all decode a `Temporal *`). Prefer the *generic root* + # (`_in` / `_out`) so the chosen in/out works for every subtype + # (and `temporal_out` correctly serialises any subtype); fall back to a + # deterministic alphabetical pick otherwise. 
+ order = ("text", "mfjson", "wkb") + dec_suffix = {"text": "_in", "mfjson": "_from_mfjson", + "wkb": "_from_hexwkb"} + enc_suffix = {"text": "_out", "mfjson": "_as_mfjson", + "wkb": "_as_hexwkb"} + + def choose(cands: dict, base: str, suffix: str) -> str: + generic = base.lower() + suffix + return generic if generic in cands else sorted(cands)[0] + + out: dict[str, dict] = {} + for base, s in enc.items(): + dec = {e: choose(c, base, dec_suffix[e]) + for e, c in s["decoders"].items()} + encd = {e: choose(c, base, enc_suffix[e]) + for e, c in s["encoders"].items()} + in_e = next((e for e in order if e in dec), None) + out_e = next((e for e in order if e in encd), None) + out[base] = { + "encodings": sorted(s["encodings"]), + "decoders": dec, + "encoders": encd, + "in": dec.get(in_e) if in_e else None, + "out": encd.get(out_e) if out_e else None, + "in_aux": s["decoders"][in_e][dec[in_e]] if in_e else [], + "out_aux": s["encoders"][out_e][encd[out_e]] if out_e else [], + } + return out + + +def classify_category(fn: dict) -> str: + name = fn["name"] + ret = fn["returnType"]["canonical"] + + if _LIFECYCLE_RE.match(name): + return "lifecycle" + if _INDEX_RE.match(name): + return "index" + if any(rx.search(name) for rx in _IO_RE): + return "io" + if _AGG_RE.search(name): + return "aggregate" + if _PREDICATE_RE.search(name) or _base(ret) in _BOOL_BASES: + return "predicate" + if _CONSTRUCTOR_RE.search(name): + return "constructor" + if _SETOP_RE.match(name): + return "setop" + if _CONVERSION_RE.search(name): + return "conversion" + if _ACCESSOR_RE.search(name): + return "accessor" + return "transformation" + + +# MEOS splits its API: the public *user* surface (meos.h + the public type +# headers) and the *internal* programmer surface. The latter is type-erased +# (``Datum``-generic), undocumented for end users, and must not be projected +# onto a network service — it is policy-excluded, like lifecycle/index. 
+_INTERNAL_FILES = {"meos_internal.h", "meos_internal_geo.h"} + + +def _outparam(fn: dict, enums: set, type_encodings: dict): + """A MEOS accessor of the form ``bool f(.., T *result)`` returns its + value through a trailing out-parameter, with the ``bool``/``int`` return + (or ``void``) as a presence flag. Two safe shapes: + + - ``scalar``: ``T *result`` where ``T`` is a JSON scalar. + - ``opaque``: ``T **result`` where ``T`` has an encoder (serialised). + + Returns ``(leading_params, outparam, mode)`` or ``(None, None, None)``. + """ + ps = fn.get("params", []) + ret = fn["returnType"]["canonical"] + if (not ps or _ptr_depth(ret) != 0 + or _base(ret) not in ("bool", "_Bool", "int", "void")): + return None, None, None + last = ps[-1] + if last["name"] not in ("result", "value"): + return None, None, None + c = last["canonical"] + if _ptr_depth(c) == 1: + pointee = c.replace("const", "").replace("*", "").strip() + sw = _scalar_wire(pointee, enums) + if sw is not None and sw.get("kind") == "json": + return ps[:-1], last, "scalar" + elif _ptr_depth(c) == 2: + te = type_encodings.get(_base(c)) + if te and te.get("out"): + return ps[:-1], last, "opaque" + return None, None, None + + +def _array_param(params: list, type_encodings: dict): + """A MEOS builder takes an element array as a ``(Elem **arr, int count)`` + pair. Detect the first such pair whose element type is decodable, so the + array can be projected as a JSON list (the ``count`` is then implicit). + Returns ``(arr_index, count_index)`` or ``(None, None)``. 
+ """ + for i, p in enumerate(params): + c = p["canonical"] + if (_ptr_depth(c) == 2 and p["name"] not in ("result", "value") + and i + 1 < len(params)): + te = type_encodings.get(_base(c)) + nxt = params[i + 1]["canonical"] + if (te and te.get("in") and _base(nxt) == "int" + and _ptr_depth(nxt) == 0): + return i, i + 1 + return None, None + + +def _array_return(fn: dict, type_encodings: dict): + """A MEOS accessor that returns a freshly-allocated element array as + ``Elem **f(.., int *count)`` (e.g. ``temporal_sequences``). The element + type must be encodable. Returns ``(count_param_name, elem_base)`` or + ``(None, None)``. + """ + ret = fn["returnType"]["canonical"] + if _ptr_depth(ret) != 2: + return None, None + rb = _base(ret) + te = type_encodings.get(rb) + if not (te and te.get("out")): + return None, None + cnt = [p for p in fn.get("params", []) + if _ptr_depth(p["canonical"]) == 1 + and _base(p["canonical"]) in ("int", "long") + and p["name"] in ("count", "n", "nvalues", "size", "npoints")] + if len(cnt) != 1: + return None, None + return cnt[0]["name"], rb + + +def assess(fn: dict, type_encodings: dict, enums: set) -> tuple: + """Return ``(network, wire)`` for one function. + + Exposable over a stateless endpoint iff every parameter can be decoded + from the request and the return can be encoded into the response. + Pointer-to-scalar parameters (arrays / out-params) and opaque types + lacking an in/out function make it non-exposable; the reason is recorded. 
+ """ + reasons: list[str] = [] + wire_params = [] + + out_lead, out_p, out_mode = _outparam(fn, enums, type_encodings) + eff = out_lead if out_p is not None else fn.get("params", []) + arr_i, count_i = _array_param(eff, type_encodings) + ret_count_name, ret_elem = _array_return(fn, type_encodings) + ret_count_i = (next((i for i, p in enumerate(eff) + if p["name"] == ret_count_name), None) + if ret_elem is not None else None) + + for idx, p in enumerate(eff): + if idx == count_i or idx == ret_count_i: + continue # array length is implicit + c = p["canonical"] + if idx == arr_i: + elem = type_encodings[_base(c)] + wire_params.append({ + "name": p["name"], "kind": "array", + "count_param": eff[count_i]["name"], + "element": { + "kind": "serialized", + "cType": " ".join(c.replace("*", " ").split()) + " *", + "decode": elem["in"], + "decode_aux": elem.get("in_aux", []), + "encodings": elem["encodings"], + }, + }) + continue + scalar = _scalar_wire(c, enums) + if scalar is not None: + wire_params.append({"name": p["name"], **scalar}) + continue + if _is_scalar_pointer(c, enums): + reasons.append(f"array-or-out-param:{p['name']}") + wire_params.append({"name": p["name"], "kind": "unsupported"}) + continue + base = _base(c) + te = type_encodings.get(base) + if te and te["in"]: + wire_params.append({ + "name": p["name"], "kind": "serialized", "cType": c, + "decode": te["in"], "decode_aux": te.get("in_aux", []), + "encodings": te["encodings"], + }) + else: + reasons.append(f"no-decoder:{base}") + wire_params.append({"name": p["name"], "kind": "unsupported"}) + + ret = fn["returnType"]["canonical"] + if out_p is not None: + # a bool/int C return is a presence flag; void = always present + presence = _base(ret) in ("bool", "_Bool", "int") + if out_mode == "scalar": + pointee = out_p["canonical"].replace("const", "").replace( + "*", "").strip() + wire_result = { + **_scalar_wire(pointee, enums), # kind:"json", json:… + "from_outparam": out_p["name"], + "out_ctype": 
out_p["canonical"], + "presence_return": presence, + } + else: # opaque T **result + te = type_encodings[_base(out_p["canonical"])] + wire_result = { + "kind": "serialized", + "cType": out_p["canonical"], + "encode": te["out"], "encode_aux": te.get("out_aux", []), + "encodings": te["encodings"], + "from_outparam": out_p["name"], + "out_ctype": out_p["canonical"], + "presence_return": presence, + } + scalar = "handled" + elif ret_elem is not None: + te = type_encodings[ret_elem] + wire_result = { + "kind": "array", + "element": { + "kind": "serialized", + "cType": " ".join(ret.replace("*", " ").split()) + " *", + "encode": te["out"], "encode_aux": te.get("out_aux", []), + "encodings": te["encodings"], + }, + "count_outparam": ret_count_name, + } + scalar = "handled" + else: + scalar = _scalar_wire(ret, enums) + if scalar == "handled": + pass + elif scalar is not None: + wire_result = scalar + elif _is_scalar_pointer(ret, enums): + reasons.append(f"unsupported-return:{ret}") + wire_result = {"kind": "unsupported"} + else: + base, depth = _base(ret), _ptr_depth(ret) + te = type_encodings.get(base) + if depth == 1 and te and te["out"]: + wire_result = {"kind": "serialized", "cType": ret, + "encode": te["out"], + "encode_aux": te.get("out_aux", []), + "encodings": te["encodings"]} + else: + reasons.append( + f"no-encoder:{base}" if depth == 1 + else f"unsupported-return:{ret}" + ) + wire_result = {"kind": "unsupported"} + + if fn["category"] in ("lifecycle", "index"): + reasons.insert(0, fn["category"]) + if fn.get("api") == "internal": + reasons.insert(0, "internal") + + exposable = not reasons + network = { + "exposable": exposable, + "method": "POST" if exposable else None, + "reason": None if exposable else "; ".join(dict.fromkeys(reasons)), + } + return network, {"params": wire_params, "result": wire_result} + + +def enrich_idl(idl: dict) -> dict: + """Augment ``idl`` in place with service-projection metadata.""" + functions = idl.get("functions", []) + 
struct_names = {s["name"] for s in idl.get("structs", [])} + enum_names = {e["name"] for e in idl.get("enums", [])} + + # An opaque type is any *named* pointer type that is not a scalar / enum + # / string. Beyond parsed structs this also covers reconciled + # PostgreSQL/PostGIS types (`Interval`, `GBOX`, ...) so their own in/out + # wrappers can register a codec instead of being dead `no-decoder`s. + _scalarish = (_INT_BASES | _FLOAT_BASES | _BOOL_BASES | _STRING_PTR_BASES + | enum_names | {"void", "text"}) + opaque_names = set(struct_names) + for fn in functions: + for c in ([fn["returnType"]["canonical"]] + + [p["canonical"] for p in fn.get("params", [])]): + b = _base(c) + if _ptr_depth(c) >= 1 and b and b not in _scalarish: + opaque_names.add(b) + + type_encodings = build_type_encodings(functions, opaque_names) + + for fn in functions: + fn["api"] = ("internal" if fn.get("file") in _INTERNAL_FILES + else "public") + fn["category"] = classify_category(fn) + network, wire = assess(fn, type_encodings, enum_names) + fn["network"] = network + fn["wire"] = wire + + for struct in idl.get("structs", []): + te = type_encodings.get(struct["name"]) + if te: + struct["serialization"] = { + "encodings": te["encodings"], "in": te["in"], "out": te["out"], + } + idl["typeEncodings"] = type_encodings + + counts: dict[str, int] = {} + for fn in functions: + counts[fn["category"]] = counts.get(fn["category"], 0) + 1 + public = [fn for fn in functions if fn["api"] == "public"] + idl["enrichment"] = { + "categoryCounts": counts, + "publicFunctions": len(public), + "internalFunctions": len(functions) - len(public), + # Internal functions are policy-excluded, so this equals the + # public-exposable count — the meaningful parity numerator. 
+ "exposableFunctions": sum( + 1 for fn in functions if fn["network"]["exposable"] + ), + } + return idl diff --git a/parser/extractors.py b/parser/extractors.py index a5855e9..e67f542 100644 --- a/parser/extractors.py +++ b/parser/extractors.py @@ -31,6 +31,48 @@ def _c_spelling(ty) -> str: return spelling +# Canonical spellings of plain C scalars/builtins. +_SCALAR_CANON = { + "void", "_Bool", "bool", "char", "signed char", "unsigned char", + "short", "unsigned short", "int", "unsigned int", "long", + "unsigned long", "long long", "unsigned long long", + "float", "double", "long double", +} +# Named opaque types that the PostgreSQL *stub* headers collapse to a bare +# scalar even without a pointer (type-erased values). Kept by name so they +# read as themselves, not as the stub's underlying integer. +_EXPLICIT_OPAQUE = {"Datum"} + + +def _strip(s: str) -> str: + return " ".join( + re.sub(r"\b(const|volatile|struct|union|enum)\b", " ", s) + .replace("*", " ").split() + ) + + +def _preserved_opaque(ty) -> str | None: + """Keep the *declared* name of opaque types the PG stubs canonicalise to + a bare scalar (``Interval *`` / ``text *`` -> ``const int *``, ``Datum`` + -> ``unsigned long``). A pointer whose typedef'd pointee resolves to a + plain scalar is, in practice, always a stubbed opaque struct — so the + declared spelling is the truthful one. Genuine scalar pointers + (``int *result``) are unaffected: their pointee is a builtin, not a + distinct typedef name. 
+ """ + if ty.kind == clang.cindex.TypeKind.POINTER: + pointee = ty.get_pointee() + dname = _strip(pointee.spelling) + cname = _strip(pointee.get_canonical().spelling) + if (dname and dname not in _SCALAR_CANON and "(" not in dname + and cname in _SCALAR_CANON and dname != cname): + return ty.spelling.replace("_Bool", "bool") + return None + if _strip(ty.spelling) in _EXPLICIT_OPAQUE: + return _strip(ty.spelling) + return None + + def _canonical_c_spelling(ty) -> str: # Like ``_canonical_spelling`` but normalises boolean types to ``"bool"``. # Handles: @@ -42,6 +84,9 @@ def _canonical_c_spelling(ty) -> str: # Fallback: also catch _Bool reached through other typedef chains if ty.get_canonical().kind == clang.cindex.TypeKind.BOOL: return "bool" + preserved = _preserved_opaque(ty) + if preserved is not None: + return preserved return _canonical_spelling(ty) diff --git a/parser/header_types.py b/parser/header_types.py new file mode 100644 index 0000000..550f3ee --- /dev/null +++ b/parser/header_types.py @@ -0,0 +1,123 @@ +"""Recover opaque parameter/return types from the header *source*. + +The PostgreSQL stub headers ``#define`` several opaque types (``Interval``, +``text``, ``TimestampTz`` …) to ``int`` *before* libclang parses, so the +typedef name is destroyed: a ``const Interval *`` argument reaches the +catalog as ``const int *``, indistinguishable from a real ``int *`` +out-parameter and impossible to project correctly. + +The public MEOS headers, however, declare every function on a regular +``extern ();`` line with the *true* spellings. This +module re-scans that source and reconciles the catalog: where libclang +produced a bare scalar but the header says a distinct **named pointer** +type, the header is the truth. Scalar typedefs (``TimestampTz``, ``int64``, +enums) are deliberately left as their resolved scalar — only mis-rendered +*opaque pointers* are restored. + +Pure text + ``re``; no libclang. 
+""" + +import re +from pathlib import Path + +_COMMENT = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL) +_DECL = re.compile( + r"\bextern\b\s+(?P<sig>[^;{}]+?\([^;{}]*\))\s*;", re.DOTALL) +_FUNC = re.compile(r"^(?P<ret>.+?)\b(?P<name>\w+)\s*\((?P<params>.*)\)$", + re.DOTALL) + +_SCALARS = { + "void", "bool", "_Bool", "char", "signed char", "unsigned char", + "short", "unsigned short", "int", "unsigned int", "long", + "unsigned long", "long long", "unsigned long long", "size_t", + "float", "double", "long double", + # scalar typedefs we *want* left resolved to their integer form + "int8", "int16", "int32", "int64", "uint8", "uint16", "uint32", + "uint64", "int8_t", "int16_t", "int32_t", "int64_t", "uint8_t", + "uint16_t", "uint32_t", "uint64_t", "TimestampTz", "TimeADT", + "DateADT", "Timestamp", "Datum", "meosType", "interpType", +} + + +def _norm(t: str) -> str: + return " ".join(t.replace("*", " * ").split()) + + +def _base(t: str) -> str: + return " ".join( + re.sub(r"\b(const|volatile|struct|union|enum)\b", " ", t) + .replace("*", " ").split() + ) + + +def _split_params(s: str) -> list: + out, depth, cur = [], 0, "" + for ch in s: + if ch == "(": + depth += 1 + elif ch == ")": + depth -= 1 + if ch == "," and depth == 0: + out.append(cur) + cur = "" + else: + cur += ch + if cur.strip(): + out.append(cur) + return out + + +def _param_type(p: str) -> str: + p = p.strip() + if p in ("void", ""): + return "void" + p = re.sub(r"\[[^\]]*\]", " *", p) # arr[] -> arr * + m = re.match(r"^(.*?)(\w+)\s*$", p, re.DOTALL) # strip trailing name + return _norm(m.group(1) if m and "*" not in m.group(2) else p) + + +def scan_headers(headers_dir: Path) -> dict: + """``{func_name: {"ret": type, "params": [type, …]}}`` from source.""" + out: dict = {} + for h in sorted(Path(headers_dir).glob("**/*.h")): + text = _COMMENT.sub(" ", h.read_text(errors="replace")) + for d in _DECL.finditer(text): + m = _FUNC.match(" ".join(d.group("sig").split())) + if not m: + continue + params = [_param_type(x) for x in 
_split_params(m.group("params"))] + out[m.group("name")] = { + "ret": _norm(m.group("ret")), + "params": params if params != ["void"] else [], + } + return out + + +def _restore(decl_canon: str, header_t: str, enums: set) -> str | None: + """Return the header type if ``decl_canon`` is a scalar but the header + says a distinct named *pointer* opaque type, else ``None``.""" + cb = _base(decl_canon) + hb, hd = _base(header_t), header_t.count("*") + if (hd >= 1 and hb and hb not in _SCALARS and hb not in enums + and cb in _SCALARS): + return _norm(header_t) + return None + + +def reconcile(idl: dict, headers_dir: Path) -> dict: + """Restore opaque pointer types the stub headers erased to ``int``.""" + headers = scan_headers(headers_dir) + enums = {e["name"] for e in idl.get("enums", [])} + for fn in idl.get("functions", []): + h = headers.get(fn["name"]) + if not h: + continue + rt = fn["returnType"] + fixed = _restore(rt["canonical"], h["ret"], enums) + if fixed: + rt["c"] = rt["canonical"] = fixed + for p, ht in zip(fn.get("params", []), h["params"]): + fixed = _restore(p["canonical"], ht, enums) + if fixed: + p["cType"] = p["canonical"] = fixed + return idl diff --git a/report.py b/report.py new file mode 100644 index 0000000..29faa70 --- /dev/null +++ b/report.py @@ -0,0 +1,153 @@ +# Emit a coverage / irregularity report from the enriched catalog. +# +# python run.py # produce output/meos-idl.json +# python report.py # -> output/meos-coverage.json (+ stderr summary) +# +# `worklist` is the actionable form: one entry per non-exposable public +# function with a `class` and a concrete `suggest` — the precise upstream +# regularization that would make it stateless-projectable (e.g. "rename the +# out-parameter to `result`", "add a trailing `int *count`", "return a +# struct instead of N out-parameters", "add a single-arg `T_in`/`T_out`"). +# `byClass` ranks the classes by size, so the upstream work is prioritised +# by leverage. 
Fixing an irregularity upstream removes its entry and lifts +# coverage toward 100%; internal (`meos_internal*.h`) functions and +# inherently-stateful aggregates are reported but are correct exclusions, +# never silent gaps. This is the direct, no-coupling input for the +# cross-repo API-uniformization workstream. + +import json +import re +import sys +from collections import Counter, defaultdict +from pathlib import Path + + +def _base(c: str) -> str: + return " ".join(re.sub(r"\b(const|volatile|struct|union|enum)\b", " ", + c).replace("*", " ").split()) + + +def _dep(c: str) -> int: + return c.count("*") + + +def _classify(fn: dict): + """``(klass, suggestion)`` — the concrete upstream regularization that + would make this function stateless-projectable. Turns the worklist from + *what is broken* into *what to change upstream*. + """ + reason = fn["network"]["reason"] or "" + tags = {p.split(":")[0] for p in reason.split("; ")} + detail = {p.split(":", 1)[1] for p in reason.split("; ") if ":" in p} + params = fn.get("params", []) + ret = fn.get("returnType", {}).get("canonical", "") + + if tags & {"lifecycle", "index"}: + return ("plumbing", + "intentionally not exposed (process/library plumbing)") + if fn.get("category") == "aggregate" or "SkipList" in detail: + return ("stateful", + "stateful aggregation — expose via a stateful endpoint, " + "not a stateless RPC") + if "Datum" in detail or "MeosArray" in detail: + return ("internal-generic", + "Datum/array-generic — expose a typed variant; keep the " + "generic form internal") + + outptrs = [p for p in params if _dep(p["canonical"]) >= 2 + or (_dep(p["canonical"]) == 1 and _base(p["canonical"]) in + ("int", "long", "double", "float", "bool"))] + if "unsupported-return" in tags and _dep(ret) >= 1: + return ("array-return-shape", + f"array return `{ret}` lacks a length: add a trailing " + "`int *count` out-parameter (+ an element encoder)") + if len(outptrs) >= 2: + return ("multi-out", + f"{len(outptrs)} 
out-parameters: return a struct (or split " + "into separate single-result accessors)") + if len(outptrs) == 1 and outptrs[0]["name"] not in ("result", "value"): + return ("out-param-naming", + f"rename out-parameter `{outptrs[0]['name']}` to `result` " + "(`bool f(.., T *result)` convention)") + nocodec = sorted(d for d in detail if d[:1].isupper()) + if tags & {"no-decoder", "no-encoder"} and nocodec: + t = nocodec[0] + return ("no-codec", + f"type `{t}` has no stateless codec: add a single-argument " + f"`{t.lower()}_in`/`{t.lower()}_out` wrapper, or keep it " + "internal") + if "array-or-out-param" in tags: + return ("array-shape", + "pass element arrays as an adjacent `(Elem **arr, int " + "count)`; use `bool f(.., T *result)` for out-values") + return ("other", "regularize the signature to a stateless shape") + + +IN_PATH = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("output/meos-idl.json") +OUT_PATH = Path(sys.argv[2]) if len(sys.argv) > 2 else Path("output/meos-coverage.json") + + +def build_report(catalog: dict) -> dict: + fns = catalog.get("functions", []) + pub = [f for f in fns if f.get("api") == "public"] + exposable = [f for f in pub if f["network"]["exposable"]] + by_reason: dict = defaultdict(list) + for f in pub: + if f["network"]["exposable"]: + continue + # collapse "tag:detail; tag:detail" to the set of distinct tags + tags = sorted({p.split(":")[0] + for p in (f["network"]["reason"] or "").split("; ")}) + by_reason["; ".join(tags)].append(f["name"]) + + worklist = [] + for f in pub: + if f["network"]["exposable"]: + continue + klass, suggest = _classify(f) + worklist.append({ + "name": f["name"], "file": f.get("file"), + "reason": f["network"]["reason"], + "class": klass, "suggest": suggest, + }) + worklist.sort(key=lambda w: (w["class"], w["name"])) + by_class = Counter(w["class"] for w in worklist) + + total = len(pub) + n_exp = len(exposable) + return { + "publicTotal": total, + "exposable": n_exp, + "coveragePct": round(n_exp * 100 / 
total, 1) if total else 0, + "internalExcluded": len(fns) - total, + "gap": total - n_exp, + "byClass": dict(by_class.most_common()), + "byReason": {k: sorted(v) + for k, v in sorted(by_reason.items(), + key=lambda kv: -len(kv[1]))}, + # actionable: one upstream-change suggestion per gap function + "worklist": worklist, + } + + +def main() -> None: + if not IN_PATH.exists(): + sys.exit(f"Catalog not found: {IN_PATH} — run `python run.py` first.") + catalog = json.loads(IN_PATH.read_text()) + if not any("network" in f for f in catalog.get("functions", [])): + sys.exit(f"{IN_PATH} is not enriched.") + + rep = build_report(catalog) + OUT_PATH.parent.mkdir(parents=True, exist_ok=True) + OUT_PATH.write_text(json.dumps(rep, indent=2)) + + print(f"[coverage] public {rep['exposable']}/{rep['publicTotal']} " + f"({rep['coveragePct']}%), gap {rep['gap']}, " + f"{rep['internalExcluded']} internal excluded → {OUT_PATH}", + file=sys.stderr) + for klass, n in rep["byClass"].items(): + print(f" {n:4d} {klass}", file=sys.stderr) + + +if __name__ == "__main__": + main() diff --git a/run.py b/run.py index 0161d22..82cafe4 100644 --- a/run.py +++ b/run.py @@ -3,6 +3,8 @@ from pathlib import Path from parser.parser import parse_all_headers, merge_meta +from parser.header_types import reconcile +from parser.enrich import enrich_idl HEADERS_DIR = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("./meos/include") @@ -14,22 +16,33 @@ def main(): OUTPUT_DIR.mkdir(parents=True, exist_ok=True) # 1. Parse C headers - print(f"[1/2] Parsing {HEADERS_DIR}...", file=sys.stderr) + print(f"[1/4] Parsing {HEADERS_DIR}...", file=sys.stderr) idl = parse_all_headers(HEADERS_DIR) - # 2. Merge with manual metadata + # 2. Restore opaque types the PG stub headers #define'd away to int. + print(f"[2/4] Reconciling types from header source...", file=sys.stderr) + idl = reconcile(idl, HEADERS_DIR) + + # 3. Derive service-projection metadata (category / encodings / network). 
+ # Runs before the merge so manual annotations override the heuristics. + print(f"[3/4] Enriching {len(idl['functions'])} functions...", file=sys.stderr) + idl = enrich_idl(idl) + + # 4. Merge with manual metadata if META_PATH.exists(): - print(f"[2/2] Merging with {META_PATH}...", file=sys.stderr) + print(f"[4/4] Merging with {META_PATH}...", file=sys.stderr) idl = merge_meta(idl, META_PATH) else: - print(f"[2/2] No meta found at {META_PATH}, skipping.", file=sys.stderr) + print(f"[4/4] No meta found at {META_PATH}, skipping.", file=sys.stderr) idl_path = OUTPUT_DIR / "meos-idl.json" with open(idl_path, "w") as f: json.dump(idl, f, indent=2) print(f" → {idl_path} written", file=sys.stderr) - print(f"\nDone: {len(idl['functions'])} functions, " + exposable = idl.get("enrichment", {}).get("exposableFunctions", 0) + print(f"\nDone: {len(idl['functions'])} functions " + f"({exposable} stateless-exposable), " f"{len(idl['structs'])} structs, " f"{len(idl['enums'])} enums", file=sys.stderr) diff --git a/tests/test_coverage_gate.py b/tests/test_coverage_gate.py new file mode 100644 index 0000000..5b0e8e9 --- /dev/null +++ b/tests/test_coverage_gate.py @@ -0,0 +1,53 @@ +"""Coverage non-regression gate. python3 tests/test_coverage_gate.py + +Skipped unless an enriched ``output/meos-idl.json`` is present (so CI +without libclang still passes). When present, it asserts public coverage +does not regress below the established floor and that the worklist stays +consistent — a heuristic change that silently drops coverage fails here. 
+""" + +import json +import sys +import unittest +from pathlib import Path + +sys.path.insert(0, str(Path(__file__).resolve().parents[1])) + +from report import build_report + +_CATALOG = Path(__file__).resolve().parents[1] / "output" / "meos-idl.json" +_FLOOR = 90.0 # public %, ratchet up as upstream uniformization lands + + +@unittest.skipUnless(_CATALOG.exists(), "run `python run.py` first") +class CoverageGateTests(unittest.TestCase): + @classmethod + def setUpClass(cls): + cls.r = build_report(json.loads(_CATALOG.read_text())) + + def test_public_coverage_not_regressed(self): + self.assertGreaterEqual( + self.r["coveragePct"], _FLOOR, + f"public coverage {self.r['coveragePct']}% < floor {_FLOOR}% " + "— a heuristic change dropped coverage") + + def test_worklist_consistent(self): + # one actionable entry per gap; classes partition the gap + self.assertEqual(len(self.r["worklist"]), self.r["gap"]) + self.assertEqual(sum(self.r["byClass"].values()), self.r["gap"]) + self.assertTrue(all(w["suggest"] and w["class"] + for w in self.r["worklist"])) + + def test_no_internal_or_silent_gap(self): + # internal never counted as public; every gap has a reason + cat = json.loads(_CATALOG.read_text()) + for f in cat["functions"]: + if f.get("api") == "internal": + continue + if not f["network"]["exposable"]: + self.assertTrue(f["network"]["reason"], + f"{f['name']}: gap without a reason") + + +if __name__ == "__main__": + unittest.main(verbosity=2) diff --git a/tests/test_enrich.py b/tests/test_enrich.py new file mode 100644 index 0000000..df2b823 --- /dev/null +++ b/tests/test_enrich.py @@ -0,0 +1,287 @@ +"""Unit tests for parser/enrich.py. + +Runs without libclang or pytest: python3 tests/test_enrich.py + +The fixture uses the *canonical* C spellings libclang actually emits +(``struct Temporal *``, ``unsigned char``, ``int`` for booleans, enum +parameters), so the assertions double as a specification. 
+""" + +import sys +import unittest +from pathlib import Path + +sys.path.insert(0, str(Path(__file__).resolve().parents[1])) + +from parser.enrich import enrich_idl, classify_category, build_type_encodings + + +def fn(name, ret, *params): + return { + "name": name, + "file": "meos.h", + "returnType": {"c": ret, "canonical": ret}, + "params": [{"name": n, "cType": t, "canonical": t} for t, n in params], + } + + +T = "const struct Temporal *" +FUNCTIONS = [ + fn("temporal_in", "struct Temporal *", ("const char *", "str")), + fn("temporal_out", "char *", (T, "temp")), + fn("temporal_from_mfjson", "struct Temporal *", ("const char *", "str")), + fn("temporal_as_hexwkb", "char *", + (T, "temp"), ("unsigned char", "variant"), ("int *", "size_out")), + fn("bigintset_in", "struct Set *", ("const char *", "str")), + fn("bigintset_out", "char *", ("const struct Set *", "set")), + fn("temporal_eq", "int", (T, "temp1"), (T, "temp2")), + fn("tpoint_speed", "struct Temporal *", (T, "temp")), + fn("tjsonb_to_ttext", "struct Temporal *", (T, "temp")), + fn("union_set_set", "struct Set *", + ("const struct Set *", "s1"), ("const struct Set *", "s2")), + fn("temporal_num_instants", "int", (T, "temp")), + fn("tsequence_make", "struct TSequence *", + ("struct TInstant **", "instants"), ("int", "count"), + ("interpType", "interp")), + fn("tjsonb_value_at_timestamptz", "int", + (T, "temp"), ("long", "t"), ("int", "strict"), + ("struct Jsonb **", "value")), + fn("temporal_set_interp", "struct Temporal *", + (T, "temp"), ("interpType", "interp")), + fn("meos_initialize", "void"), + fn("rtree_insert", "void", + ("struct RTree *", "rtree"), ("void *", "box"), ("long", "id")), + fn("temporal_timestamps", "int *", (T, "temp"), ("int *", "count")), + # aux args: a defaultable formatting scalar (maxdd) is allowed and + # defaulted; a semantic *type tag disqualifies the helper entirely. 
+ fn("box_in", "struct Box *", ("const char *", "str")), + fn("box_out", "char *", + ("const struct Box *", "box"), ("int", "maxdd")), + fn("weird_in", "struct Weird *", + ("const char *", "str"), ("int", "basetype")), + # An otherwise-exposable function living in the internal header: it must + # be policy-excluded (api=internal), like the programmer Datum API. + dict(fn("internal_op", "struct Temporal *", (T, "temp")), + file="meos_internal.h"), + # Scalar out-parameter accessor: bool f(.., int *result) — the value is + # returned through the trailing out-param, the bool is a presence flag. + fn("setspan_value_n", "int", + ("const struct Set *", "s"), ("int", "n"), ("int *", "result")), + # Opaque out-parameter accessor: bool f(.., Box **result) — the value + # comes back as an opaque pointer, serialised via the type's encoder. + fn("boxset_value_n", "int", + ("const struct Set *", "s"), ("int", "n"), + ("struct Box **", "result")), + # Input-array builder: f(Elem **arr, int count) — the (array,count) pair + # becomes one JSON-array wire param; the count is implicit. + fn("temporal_merge_array", "struct Temporal *", + ("struct Temporal **", "temparr"), ("int", "count")), + # Array return: Elem **f(.., int *count) — a freshly-allocated element + # array; the count out-param is implicit, result is a JSON array. 
+ fn("temporal_components", "struct Temporal **", + (T, "temp"), ("int *", "count")), +] + +STRUCTS = [{"name": n, "fields": []} for n in + ("Temporal", "TSequence", "Set", "RTree", "Jsonb", "TInstant", + "Box", "Weird")] +ENUMS = [{"name": "interpType", "values": []}] + + +def make_idl(): + return enrich_idl({ + "functions": [dict(f, returnType=dict(f["returnType"]), + params=[dict(p) for p in f["params"]]) + for f in FUNCTIONS], + "structs": [dict(s) for s in STRUCTS], + "enums": [dict(e) for e in ENUMS], + }) + + +def by_name(idl): + return {f["name"]: f for f in idl["functions"]} + + +class CategoryTests(unittest.TestCase): + def test_categories(self): + c = {f["name"]: classify_category(f) for f in FUNCTIONS} + self.assertEqual(c["temporal_in"], "io") + self.assertEqual(c["temporal_from_mfjson"], "io") + self.assertEqual(c["temporal_as_hexwkb"], "io") + self.assertEqual(c["temporal_eq"], "predicate") # returns int + self.assertEqual(c["tpoint_speed"], "transformation") + self.assertEqual(c["tjsonb_to_ttext"], "conversion") + self.assertEqual(c["union_set_set"], "setop") + self.assertEqual(c["temporal_num_instants"], "accessor") + self.assertEqual(c["tsequence_make"], "constructor") + self.assertEqual(c["meos_initialize"], "lifecycle") + self.assertEqual(c["rtree_insert"], "index") + + +class TypeEncodingTests(unittest.TestCase): + def setUp(self): + self.te = build_type_encodings( + FUNCTIONS, {s["name"] for s in STRUCTS}) + + def test_struct_prefix_stripped_and_round_trip(self): + self.assertIn("Temporal", self.te) # not "struct Temporal" + # temporal_as_hexwkb is still excluded — its `size_out` is a pointer + # (out-param), not a defaultable scalar — so no wkb encoder here. 
+ self.assertEqual(self.te["Temporal"]["encodings"], + ["mfjson", "text"]) + self.assertEqual(self.te["Temporal"]["in"], "temporal_in") + self.assertEqual(self.te["Temporal"]["out"], "temporal_out") + self.assertEqual(self.te["Set"]["in"], "bigintset_in") + self.assertEqual(self.te["Set"]["out"], "bigintset_out") + + def test_defaultable_aux_accepted_type_tag_rejected(self): + # box_out(box, int maxdd) qualifies; maxdd defaults to 15. + self.assertEqual(self.te["Box"]["out"], "box_out") + self.assertEqual(self.te["Box"]["out_aux"], + [{"name": "maxdd", "kind": "integer", + "default": 15}]) + self.assertEqual(self.te["Box"]["in"], "box_in") + self.assertEqual(self.te["Box"]["in_aux"], []) + # weird_in(str, int basetype): the *type tag disqualifies it, so + # Weird gets no decoder at all. + self.assertNotIn("Weird", self.te) + + def test_no_primitive_or_intermediate_false_positives(self): + self.assertNotIn("int", self.te) # was a real false positive + self.assertNotIn("char", self.te) + self.assertNotIn("TSequence", self.te) # builder-only type + for k in self.te: + self.assertNotIn("struct ", k) + + def test_struct_serialization_folded(self): + s = {x["name"]: x for x in make_idl()["structs"]} + self.assertIn("serialization", s["Temporal"]) + self.assertNotIn("serialization", s["TSequence"]) + + +class ExposabilityTests(unittest.TestCase): + def setUp(self): + self.fns = by_name(make_idl()) + + def n(self, name): + return self.fns[name]["network"] + + def test_int_returning_predicate_exposable(self): + self.assertTrue(self.n("temporal_eq")["exposable"]) + self.assertEqual(self.fns["temporal_eq"]["wire"]["result"], + {"kind": "json", "json": "integer"}) + + def test_serialized_round_trip(self): + w = self.fns["tpoint_speed"]["wire"] + self.assertTrue(self.n("tpoint_speed")["exposable"]) + self.assertEqual(w["params"][0]["kind"], "serialized") + self.assertEqual(w["params"][0]["decode"], "temporal_in") + self.assertEqual(w["result"]["encode"], "temporal_out") + + def 
test_enum_param_is_scalar_and_exposable(self): + f = self.fns["temporal_set_interp"] + self.assertTrue(f["network"]["exposable"]) + self.assertEqual(f["wire"]["params"][1], + {"name": "interp", "kind": "json", + "json": "string", "enum": "interpType"}) + + def test_io_parse_serialize_exposable(self): + for name in ("temporal_in", "temporal_out", "temporal_from_mfjson", + "bigintset_in", "bigintset_out"): + self.assertTrue(self.n(name)["exposable"], name) + + def test_out_param_not_exposable(self): + r = self.n("temporal_as_hexwkb")["reason"] + self.assertFalse(self.n("temporal_as_hexwkb")["exposable"]) + self.assertIn("array-or-out-param:size_out", r) + r2 = self.n("tjsonb_value_at_timestamptz")["reason"] + self.assertIn("array-or-out-param:value", r2) + + def test_array_param_and_missing_encoder(self): + r = self.n("tsequence_make")["reason"] + self.assertFalse(self.n("tsequence_make")["exposable"]) + self.assertIn("array-or-out-param:instants", r) + self.assertIn("no-encoder:TSequence", r) + + def test_array_return_not_exposable(self): + r = self.n("temporal_timestamps")["reason"] + self.assertFalse(self.n("temporal_timestamps")["exposable"]) + self.assertIn("unsupported-return:int *", r) + + def test_lifecycle_and_index_not_exposable(self): + self.assertIn("lifecycle", self.n("meos_initialize")["reason"]) + self.assertIn("index", self.n("rtree_insert")["reason"]) + + +class ApiClassificationTests(unittest.TestCase): + def setUp(self): + self.fns = by_name(make_idl()) + + def test_internal_policy_excluded(self): + f = self.fns["internal_op"] + self.assertEqual(f["api"], "internal") + self.assertFalse(f["network"]["exposable"]) + self.assertIn("internal", f["network"]["reason"]) + + def test_public_default(self): + self.assertEqual(self.fns["temporal_eq"]["api"], "public") + self.assertTrue(self.fns["temporal_eq"]["network"]["exposable"]) + + def test_scalar_outparam_projected_as_result(self): + f = self.fns["setspan_value_n"] + 
self.assertTrue(f["network"]["exposable"]) + pnames = [p["name"] for p in f["wire"]["params"]] + self.assertEqual(pnames, ["s", "n"]) # 'result' not a param + r = f["wire"]["result"] + self.assertEqual(r["kind"], "json") + self.assertEqual(r["json"], "integer") + self.assertEqual(r["from_outparam"], "result") + self.assertTrue(r["presence_return"]) # int return = presence + + def test_opaque_outparam_projected_as_serialized(self): + f = self.fns["boxset_value_n"] + self.assertTrue(f["network"]["exposable"]) + self.assertEqual([p["name"] for p in f["wire"]["params"]], ["s", "n"]) + r = f["wire"]["result"] + self.assertEqual(r["kind"], "serialized") # opaque -> encoded + self.assertEqual(r["encode"], "box_out") + self.assertEqual(r["from_outparam"], "result") + self.assertTrue(r["presence_return"]) + + def test_array_return(self): + f = self.fns["temporal_components"] + self.assertTrue(f["network"]["exposable"]) + self.assertEqual([p["name"] for p in f["wire"]["params"]], ["temp"]) + r = f["wire"]["result"] + self.assertEqual(r["kind"], "array") + self.assertEqual(r["count_outparam"], "count") + self.assertEqual(r["element"]["kind"], "serialized") + self.assertEqual(r["element"]["encode"], "temporal_out") + + def test_input_array_builder(self): + f = self.fns["temporal_merge_array"] + self.assertTrue(f["network"]["exposable"]) + params = f["wire"]["params"] + self.assertEqual(len(params), 1) # count is implicit + a = params[0] + self.assertEqual(a["name"], "temparr") + self.assertEqual(a["kind"], "array") + self.assertEqual(a["count_param"], "count") + self.assertEqual(a["element"]["kind"], "serialized") + self.assertEqual(a["element"]["decode"], "temporal_in") + self.assertEqual(f["wire"]["result"]["kind"], "serialized") + + +class SummaryTests(unittest.TestCase): + def test_enrichment_summary(self): + e = make_idl()["enrichment"] + self.assertEqual(sum(e["categoryCounts"].values()), len(FUNCTIONS)) + self.assertEqual(e["internalFunctions"], 1) # internal_op + 
self.assertEqual(e["publicFunctions"], len(FUNCTIONS) - 1)
+        # 13 + setspan_value_n + boxset_value_n + temporal_merge_array
+        # + temporal_components (array return); internal_op excluded.
+        self.assertEqual(e["exposableFunctions"], 17)
+
+
+if __name__ == "__main__":
+    unittest.main(verbosity=2)
diff --git a/tests/test_header_types.py b/tests/test_header_types.py
new file mode 100644
index 0000000..a91cb74
--- /dev/null
+++ b/tests/test_header_types.py
@@ -0,0 +1,98 @@
+"""Unit tests for parser/header_types.py.
+
+Runs without libclang or pytest: python3 tests/test_header_types.py
+"""
+
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
+
+from parser.header_types import scan_headers, reconcile
+
+_HEADER = """
+/* a comment
+   spanning lines */
+extern TimestampTz add_timestamptz_interval(TimestampTz t,
+                                            const Interval *interv);
+extern bool contains_set_text(const Set *s, text *t); // trailing comment
+extern bool bigintset_value_n(const Set *s, int n, int64 *result);
+extern Temporal *temporal_copy(const Temporal *temp);
+extern void meos_initialize(void);
+"""
+
+
+class ScanTests(unittest.TestCase):
+    def setUp(self):
+        self.d = tempfile.TemporaryDirectory()
+        (Path(self.d.name) / "meos.h").write_text(_HEADER)
+        self.h = scan_headers(Path(self.d.name))
+
+    def tearDown(self):
+        self.d.cleanup()
+
+    def test_signatures_recovered(self):
+        self.assertEqual(self.h["add_timestamptz_interval"]["params"],
+                         ["TimestampTz", "const Interval *"])
+        self.assertEqual(self.h["contains_set_text"]["params"],
+                         ["const Set *", "text *"])
+        self.assertEqual(self.h["meos_initialize"]["params"], [])
+        self.assertEqual(self.h["temporal_copy"]["ret"], "Temporal *")
+
+
+class ReconcileTests(unittest.TestCase):
+    def setUp(self):
+        self.d = tempfile.TemporaryDirectory()
+        (Path(self.d.name) / "meos.h").write_text(_HEADER)
+
+    def tearDown(self):
+        self.d.cleanup()
+
+    def idl(self):
+        # Mimic the libclang output *after* the stub erased the names.
+        return {
+            "enums": [],
+            "functions": [
+                {"name": "add_timestamptz_interval",
+                 "returnType": {"c": "int", "canonical": "int"},
+                 "params": [{"name": "t", "cType": "int", "canonical": "int"},
+                            {"name": "interv", "cType": "const int *",
+                             "canonical": "const int *"}]},
+                {"name": "contains_set_text",
+                 "returnType": {"c": "int", "canonical": "int"},
+                 "params": [{"name": "s", "cType": "const struct Set *",
+                             "canonical": "const struct Set *"},
+                            {"name": "t", "cType": "int *",
+                             "canonical": "int *"}]},
+                {"name": "bigintset_value_n",
+                 "returnType": {"c": "int", "canonical": "int"},
+                 "params": [{"name": "s", "cType": "const struct Set *",
+                             "canonical": "const struct Set *"},
+                            {"name": "n", "cType": "int", "canonical": "int"},
+                            {"name": "result", "cType": "int *",
+                             "canonical": "int *"}]},
+            ],
+        }
+
+    def test_opaque_pointers_restored_scalars_left_alone(self):
+        idl = reconcile(self.idl(), Path(self.d.name))
+        f = {x["name"]: x for x in idl["functions"]}
+        # const Interval * restored from the header source
+        self.assertEqual(f["add_timestamptz_interval"]["params"][1]["canonical"],
+                         "const Interval *")
+        # TimestampTz return stays the resolved scalar (int) — not restored
+        self.assertEqual(f["add_timestamptz_interval"]["returnType"]["canonical"],
+                         "int")
+        # text * restored
+        self.assertEqual(f["contains_set_text"]["params"][1]["canonical"],
+                         "text *")
+        # genuine int* out-param (header also says int64*) is a scalar
+        # pointer -> left exactly as libclang produced it
+        self.assertEqual(f["bigintset_value_n"]["params"][2]["canonical"],
+                         "int *")
+
+
+if __name__ == "__main__":
+    unittest.main(verbosity=2)
diff --git a/tests/test_report.py b/tests/test_report.py
new file mode 100644
index 0000000..213a921
--- /dev/null
+++ b/tests/test_report.py
@@ -0,0 +1,72 @@
+"""Unit tests for report.py. python3 tests/test_report.py"""
+
+import sys
+import unittest
+from pathlib import Path
+
+sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
+
+from report import build_report
+
+
+def f(name, api, exposable, reason=None, ret="int", params=None):
+    return {"name": name, "api": api,
+            "returnType": {"canonical": ret}, "params": params or [],
+            "network": {"exposable": exposable, "reason": reason}}
+
+
+CATALOG = {"functions": [
+    f("temporal_eq", "public", True),
+    f("tpoint_speed", "public", True),
+    f("tsequence_make", "public", False, "array-or-out-param:instants"),
+    f("geo_collect", "public", False,
+      "array-or-out-param:a; no-decoder:GBOX"),
+    f("temporal_tagg", "public", False, "no-decoder:SkipList"),
+    f("meos_initialize", "public", False, "lifecycle"),
+    f("some_internal", "internal", False, "internal; no-decoder:Datum"),
+]}
+
+
+class ReportTests(unittest.TestCase):
+    def setUp(self):
+        self.r = build_report(CATALOG)
+
+    def test_counts(self):
+        self.assertEqual(self.r["publicTotal"], 6)  # excludes internal
+        self.assertEqual(self.r["exposable"], 2)
+        self.assertEqual(self.r["gap"], 4)
+        self.assertEqual(self.r["internalExcluded"], 1)
+        self.assertEqual(self.r["coveragePct"], round(2 * 100 / 6, 1))
+
+    def test_grouping_by_reason_tagset(self):
+        br = self.r["byReason"]
+        # detail is stripped; multi-tag reasons collapse to the tag set
+        self.assertIn("array-or-out-param", br)
+        self.assertEqual(br["array-or-out-param"], ["tsequence_make"])
+        self.assertEqual(br["array-or-out-param; no-decoder"], ["geo_collect"])
+        self.assertEqual(br["no-decoder"], ["temporal_tagg"])
+        self.assertEqual(br["lifecycle"], ["meos_initialize"])
+        # internal never appears as a public gap
+        self.assertNotIn("some_internal",
+                         [n for v in br.values() for n in v])
+
+    def test_byreason_sorted_largest_first(self):
+        sizes = [len(v) for v in self.r["byReason"].values()]
+        self.assertEqual(sizes, sorted(sizes, reverse=True))
+
+    def test_worklist_is_actionable(self):
+        wl = {w["name"]: w for w in self.r["worklist"]}
+        self.assertEqual(len(wl), self.r["gap"])  # one entry per gap
+        self.assertNotIn("some_internal", wl)  # internal excluded
+        # each gap gets a class + a concrete upstream suggestion
+        self.assertEqual(wl["meos_initialize"]["class"], "plumbing")
+        self.assertEqual(wl["temporal_tagg"]["class"], "stateful")
+        gc = wl["geo_collect"]  # no-decoder:GBOX
+        self.assertEqual(gc["class"], "no-codec")
+        self.assertIn("gbox_in", gc["suggest"])
+        self.assertTrue(all(w["suggest"] for w in self.r["worklist"]))
+        self.assertEqual(sum(self.r["byClass"].values()), self.r["gap"])
+
+
+if __name__ == "__main__":
+    unittest.main(verbosity=2)