diff --git a/README.md b/README.md index f3812d1d..328f0ffc 100644 --- a/README.md +++ b/README.md @@ -98,6 +98,7 @@ Two further modules build but are never published: `sdk-example`, the runnable e | [I/O module](docs/io.md) | I/O contracts and the `IoProvider` seam | | [HTTP body logging and concurrency](docs/http-body-logging-and-concurrency.md) | Body logging system, concurrency model, thread safety | | [Pipeline mechanism](docs/pipelines.md) | Pipeline architecture, stages, step composition, async pipeline | +| [Codegen design specs](docs/codegen/README.md) | Design specifications for the planned model-layer code generator | | [Style guides](styleguide/README.md) | Kotlin and Kotlin-on-JVM style guides this codebase follows | ## Usage diff --git a/docs/codegen/README.md b/docs/codegen/README.md new file mode 100644 index 00000000..b799a449 --- /dev/null +++ b/docs/codegen/README.md @@ -0,0 +1,41 @@ +# Codegen design specifications + +This directory holds design specifications for the planned model-layer code generator. They are +**design documents only** — there is no generator code, no KotlinPoet templates, and no generated +sources in this repository yet. Every Kotlin/Java snippet in these specs is illustrative *target +output*: it shows the shape a future generator would emit, and is not compiled as part of the build. + +The guiding principle across all of these specs is the same one that already governs `sdk-core`: +**logic lives in a hand-written runtime, generated code is thin.** A generated model is a field list +plus accessors; everything that is invariant across models — the four-state field representation, +serde wiring, validation scoring, dual typed/raw access — is written once in `sdk-core` (or an +adapter) and shared. This keeps generated files small, keeps the binary-compatibility baseline for +generated code stable, and keeps the coverage floor meaningful. + +These specs build on the existing `sdk-core` serde surface — primarily +[`Tristate`](../../sdk-core/src/main/kotlin/org/dexpace/sdk/core/serde/Tristate.kt) and the +[`Serde`](../../sdk-core/src/main/kotlin/org/dexpace/sdk/core/serde/Serde.kt) SPI — and on the +Jackson adapter pattern established by +[`TristateModule`](../../sdk-serde-jackson/src/main/kotlin/org/dexpace/sdk/serde/jackson/TristateModule.kt). + +## Specifications + +| Spec | Covers | +|---|---| +| [The four-state JSON field model](json-field-model.md) | `JsonField` + `RawJson`: a dependency-free four-state field wrapper and embedded JSON tree, with all Jackson conversion behind the `Serde` SPI. The foundation the rest build on. | +| [Thin model classes over a hand-written runtime](model-classes.md) | Generated models as a field map + typed accessors; runtime carries the invariant machinery. Coverage and binary-compatibility strategy for generated code. | +| [The validate()/isValid()/validity() triad](model-validation.md) | An opt-in, memoized, fail-soft validation triad on generated models — never run on the deserialize path; the fallback union-disambiguation strategy. | +| [Discriminator and const fields](discriminator-const-fields.md) | Const/discriminator fields generated as defaulted raw values with dual (typed + raw) accessors. | + +## Dependency order + +``` +json-field-model.md (the foundation: JsonField + RawJson) + | + +-- model-classes.md (thin models over the runtime) + +-- model-validation.md (validate/isValid/validity triad) + +-- discriminator-const-fields.md (defaulted raw + dual accessors) +``` + +Read `json-field-model.md` first; the other three assume its vocabulary (`Known` / `Missing` / +`Null` / `Raw`, `RawJson`, the read-path vs PATCH-path boundary). diff --git a/docs/codegen/discriminator-const-fields.md b/docs/codegen/discriminator-const-fields.md new file mode 100644 index 00000000..a80db4e2 --- /dev/null +++ b/docs/codegen/discriminator-const-fields.md @@ -0,0 +1,154 @@ +# Discriminator and const fields + +> **Status:** design specification. The snippets show the *target shape* a future generator would +> emit. Nothing here is compiled in this repository. + +Builds on [the four-state JSON field model](json-field-model.md) and +[thin model classes](model-classes.md). + +## Problem + +Two closely related field kinds fall out of the four-state field model and need a dedicated codegen +template: + +- **Const fields.** A schema pins a field to a fixed value (`"object": "user"`, `"version": 2`). The + generated model should default it to that value so a caller never has to set it, while still + surviving a server that sends something *other* than the expected constant (forward compatibility — + a const today may gain new allowed values tomorrow). +- **Discriminator fields.** A union keys member selection off a field's value (`"type": "circle"` vs + `"type": "square"`). The discriminator must be readable *both* as the typed enum the model expects + *and* as the raw wire value, because union resolution reads the raw value before any member has been + chosen, and forward compat requires tolerating a discriminator value we do not yet have a member + for. + +Both want the same thing: a field that has a **sensible default** and exposes **both a typed and a +raw view**. That is the dual-accessor pattern. + +## Proposed shape: defaulted raw value + dual accessors + +A const/discriminator field is generated as a `JsonField` (from +[json-field-model.md](json-field-model.md)) that **defaults to the const's raw value** and exposes +two getters and two setters. + +```kotlin +// TARGET OUTPUT — generated const field on a model. Illustrative; not compiled here. +public class User /* private constructor(...) : JsonModel() */ { + + // ---- const field "object" pinned to "user" ---- + + /** Typed accessor: the const projected to its declared type. Returns the default const when the + * field was absent, and the typed value when the server sent the expected shape. A server value + * that does not match T comes back via the raw accessor instead (forward-compat). */ + public fun objectType(): JsonField = field("object").orDefault(DEFAULT_OBJECT) + + /** Raw accessor: the underlying wire value, whatever it was — including an unexpected constant a + * newer server sent that this model has no typed mapping for. */ + public fun _objectType(): RawJson = field("object").asRaw(DEFAULT_OBJECT_RAW) + + public companion object { + private const val DEFAULT_OBJECT: String = "user" + private val DEFAULT_OBJECT_RAW: RawJson = RawJson.Str("user") + } + + public class Builder /* : Builder */ { + private val acc = LinkedHashMap>() + + /** Typed setter: set the const/discriminator to a typed value. */ + public fun objectType(value: String): Builder = apply { acc["object"] = JsonField.known(value) } + + /** Raw setter: forward an arbitrary wire value verbatim — used to round-trip an unknown + * discriminator value the SDK does not model yet. */ + public fun objectType(raw: RawJson): Builder = apply { acc["object"] = JsonField.raw(raw) } + } +} +``` + +### The default is applied at the accessor, not baked into the stored field + +The stored field map ([model-classes.md](model-classes.md)) keeps the *actual* state — `Missing` when +the server omitted the key, `Known`/`Raw` when it sent something. The const default is applied by the +accessor (`orDefault(...)` / `asRaw(default)`), not written into the map on construction. This keeps +two properties: + +- **Round-trip fidelity.** A model that was deserialized from a payload that omitted the const + re-serializes without inventing a key (`additionalProperties()` and the serializer see `Missing`), + unless a caller explicitly set it. The default is a *read-time* convenience, not a *write-time* + fabrication. +- **Forward compatibility.** A server that sends an unexpected const value stores it as `Raw`; the + typed accessor still has a sane default to fall back on, and the raw accessor surfaces the real + value so nothing is lost. + +### Discriminator fields are the same template, plus a const value the union keys on + +A discriminator is a const field whose value is what union resolution matches against. The dual +accessor is what makes resolution work *before* a member is chosen: + +```kotlin +// TARGET OUTPUT — union member matching by discriminator. Not compiled here. +public fun matchByDiscriminator(raw: RawJson): Shape? { + // Read the RAW discriminator off the undecoded tree — no member committed yet. + val tag = (raw as? RawJson.Object)?.entries?.get("type") as? RawJson.Str ?: return null + return when (tag.value) { + "circle" -> Circle.fromRaw(raw) + "square" -> Square.fromRaw(raw) + else -> null // unknown tag: let the caller fall back to validity scoring + } +} +``` + +The raw accessor is load-bearing here: resolution must read the discriminator from `RawJson` before +any typed model exists, and an unknown tag returns `null` so the union strategy can fall back to +[validity scoring](model-validation.md) rather than throwing. This is exactly why the discriminator +is the **first**, cheap path in `resolveUnion` (one raw read, O(1) member lookup) and scoring is the +fallback. + +## Why two getters and two setters per field + +The dual-accessor pattern means every const/discriminator field contributes **two getters +(`objectType()` typed, `_objectType()` raw) and two setters (typed `objectType(String)`, raw +`objectType(RawJson)`)** to the public surface. This is deliberate, and it reinforces the binary- +compatibility decision from [model-classes.md](model-classes.md): + +- The generated surface is **wide and regular** — exactly the kind of large, mechanically-generated + API that should live behind its **own `.api` baseline**, separate from the curated `sdk-core` + surface, and be regenerated (`apiDump`) as part of a schema-update change rather than mixed into a + hand-written API change. +- Because every field follows the identical two-getter/two-setter template, the baseline churns + predictably with the schema and never with the runtime. + +## How it ties into the existing runtime + +- **`JsonField` / `RawJson`.** The whole template is expressed in the four-state field types from + [json-field-model.md](json-field-model.md): the typed accessor reads `Known`/falls back to the + const default, the raw accessor reads `asRaw(...)`, and an unexpected server constant lands in + `Raw`. No new field machinery is introduced. +- **`Serde` SPI.** Re-serializing a const field round-trips through the + [`Serde`](../../sdk-core/src/main/kotlin/org/dexpace/sdk/core/serde/Serde.kt) serializer; an + explicitly-set typed value serializes as the value, a `Raw` serializes verbatim, and a `Missing` + const is omitted (the Jackson `JsonField` module's property writer skips it, the same way + [`TristatePropertyWriter`](../../sdk-serde-jackson/src/main/kotlin/org/dexpace/sdk/serde/jackson/TristateModule.kt) + skips `Tristate.Absent`). +- **Union resolution and validation.** The discriminator template is the cheap first path of the + union strategy described in [model-validation.md](model-validation.md); validity scoring is only + reached when the discriminator is absent or carries an unknown value. + +## Design decisions and trade-offs + +- **Default at read time, not construction time.** Applying the const default in the accessor keeps + re-serialization faithful (no fabricated keys) and keeps the stored field map a truthful record of + the wire. The trade-off is that the default lives in a generated constant per field rather than in + the map; that is cheap and keeps models immutable and round-trip-safe. +- **Dual accessors instead of a single typed-or-throw accessor.** A typed-only accessor would have to + throw or lie when a server sends an unmodelled const/discriminator value. The raw sibling makes + forward compatibility explicit and gives union resolution the pre-decode read it needs. +- **Accepting the wide surface.** Two getters and two setters per field is more public API than a + single typed accessor, but it is the cost of honest forward compatibility, and the separate `.api` + baseline absorbs the churn so it never destabilizes the curated `sdk-core` surface. + +## Acceptance mapping + +- Const/discriminator template — const and discriminator fields generated as `JsonField` defaulted + to the const's raw value, with the default applied at the accessor and the discriminator readable + pre-decode for union matching. +- Dual accessors generated — a typed getter + raw getter and a typed setter + raw setter per field, + forming the wide, regular surface that lives behind the separate generated-code `.api` baseline. diff --git a/docs/codegen/json-field-model.md b/docs/codegen/json-field-model.md new file mode 100644 index 00000000..400f0740 --- /dev/null +++ b/docs/codegen/json-field-model.md @@ -0,0 +1,221 @@ +# The four-state JSON field model + +> **Status:** design specification. No code in this document is compiled. The Kotlin snippets show +> the *target shape* of types that would live in `sdk-core` (`JsonField`, `RawJson`) and in the +> `sdk-serde-jackson` adapter (the conversion module). They are illustrative, not the real API. + +This is the foundation the rest of the codegen specs build on +([model classes](model-classes.md), [validation](model-validation.md), +[discriminator/const fields](discriminator-const-fields.md)). Read it first. + +## Problem + +A model generator needs a per-field wrapper that distinguishes **four** states, not three: + +1. **present and well-typed** — the key is in the payload and decoded cleanly to the declared `T`. +2. **present but the wrong shape** — the key is in the payload but does not match `T` (a server + sent an object where we expected a string, or a newer enum case we do not know). We must not + throw on the deserialize path; we must keep the raw shape for round-tripping and let an explicit + opt-in step decide what to do with it. +3. **explicit null** — the key is present with a JSON `null`. +4. **absent** — the key is missing from the payload entirely. + +The existing [`Tristate`](../../sdk-core/src/main/kotlin/org/dexpace/sdk/core/serde/Tristate.kt) +covers states 3, 4, and the well-typed half of 1 (`Absent` / `Null` / `Present`). It deliberately +has no "present but wrong type" escape hatch and carries no embedded JSON tree — by design, because +its job is the PATCH-write boundary, where the only question is *did the caller touch this field* +(absent vs null vs a value). `Tristate.Present` is even bounded `T : Any` specifically to make the +illegal "present-null" fourth state unconstructable. + +Forward compatibility — surviving an unknown or wrong-typed value without throwing and without +losing it on re-serialize — is a different problem from the PATCH-write boundary, and it needs a +different type. That type is `JsonField`. + +### Prior art and the constraint it must not repeat + +openai-java's `JsonField` / `JsonValue` (its `Values.kt`) solves the same four-state problem, but it +welds Jackson onto the value type: `JsonValue` is a Jackson tree, and `JsonField` carries Jackson +annotations and `JsonNode`s directly. That violates the `sdk-core` rule that the core has **zero +non-SLF4J runtime dependencies**. We must split exactly the way `Tristate` already splits: the sum +type and the JSON tree live dependency-free in `sdk-core`, and *all* Jackson ↔ core-type conversion +lives in `sdk-serde-jackson`, reachable only through the [`Serde`](../../sdk-core/src/main/kotlin/org/dexpace/sdk/core/serde/Serde.kt) +SPI. + +## Proposed shape + +### `RawJson` — a dependency-free JSON tree (target output) + +`RawJson` is a small immutable sum type modelling the seven JSON shapes. It is the "escape hatch" +container: when a value is present but does not match the declared `T`, we keep it as `RawJson` so it +can be inspected, validated, or re-serialized byte-faithfully. It is the embedded tree that +`Tristate` deliberately lacks. + +```kotlin +// TARGET OUTPUT — lives in sdk-core, package org.dexpace.sdk.core.serde. Not compiled here. +public sealed class RawJson { + public object Null : RawJson() + public data class Bool(public val value: Boolean) : RawJson() + + /** Numbers are kept as their lexical form so 1e3, 1000, and 1000.0 round-trip unchanged and + * no precision is lost coercing through Double. Typed reads parse on demand. */ + public data class Number(public val literal: String) : RawJson() + public data class Str(public val value: String) : RawJson() + public data class Array(public val elements: List) : RawJson() + + /** Insertion-ordered to preserve wire order on re-serialize. */ + public data class Object(public val entries: Map) : RawJson() +} +``` + +`RawJson` compiles to Java-8 bytecode: a Kotlin `sealed class` lowers to an abstract class plus +subclasses with no `permits` clause, so it is safe in the Java-8 `sdk-core` module (the same property +that lets `Tristate` be sealed there). It has no dependency on any parser; it is just data. + +### `JsonField` — the four-state wrapper (target output) + +```kotlin +// TARGET OUTPUT — lives in sdk-core, package org.dexpace.sdk.core.serde. Not compiled here. +public sealed class JsonField { + + /** State 1: present and decoded to the declared T. */ + public data class Known(public val value: T) : JsonField() + + /** State 4: key absent from the payload. Singleton (like Tristate.Absent). */ + public object Missing : JsonField() + + /** State 3: key present, explicit JSON null. Singleton (like Tristate.Null). */ + public object Null : JsonField() + + /** State 2: present but not matching T — kept as the raw tree for round-trip and validation. */ + public data class Raw(public val raw: RawJson) : JsonField() + + public val isKnown: Boolean get() = this is Known<*> + public val isMissing: Boolean get() = this is Missing + public val isNull: Boolean get() = this is Null + public val isRaw: Boolean get() = this is Raw + + /** Typed value if and only if state 1; null for Missing / Null / Raw. A Raw value is NOT + * silently coerced here — coercion is an explicit, opt-in step (see model-validation.md). */ + public fun getOrNull(): T? = (this as? Known)?.value + + /** The underlying raw tree for ANY state, for byte-faithful re-serialization. */ + public fun asRaw(): RawJson = when (this) { + is Raw -> raw + is Known -> error("requires Serde to project T back to RawJson; see asRaw(Serde)") + Null -> RawJson.Null + Missing -> RawJson.Null // callers must check isMissing to omit the key entirely + } + + public inline fun fold( + onMissing: () -> R, + onNull: () -> R, + onRaw: (RawJson) -> R, + onKnown: (T) -> R, + ): R = when (this) { + Missing -> onMissing() + Null -> onNull() + is Raw -> onRaw(raw) + is Known -> onKnown(value) + } + + public companion object { + @JvmStatic public fun missing(): JsonField = Missing + @JvmStatic @JvmName("nullValue") public fun nullValue(): JsonField = Null + @JvmStatic public fun known(value: T): JsonField = Known(value) + @JvmStatic public fun raw(raw: RawJson): JsonField = Raw(raw) + } +} +``` + +This mirrors `Tristate`'s shape on purpose: singleton objects for the value-less states (`Missing`, +`Null`) so they satisfy any `JsonField` target via `Nothing`, a `data class` for the carriers, and +`@JvmStatic` factories so Java callers and generated code can construct fields without touching the +constructors. `Known` is bounded `T : Any` for the same reason `Tristate.Present` is — a +`Known(null)` would be an illegal alias of `Null`. + +### Jackson conversion lives only in the adapter + +The conversion between Jackson's `JsonNode`/parser tokens and `RawJson`/`JsonField` goes in +`sdk-serde-jackson`, structured exactly like +[`TristateModule`](../../sdk-serde-jackson/src/main/kotlin/org/dexpace/sdk/serde/jackson/TristateModule.kt): + +- A `ContextualDeserializer` resolves the declared `T` per call-site (the same + `createContextual` + `containedType(0)` trick `TristateDeserializer` already uses to extract the + `T` from `Tristate`). On a present, well-typed token it produces `Known`; on a present token + that fails to bind to `T` it does **not** throw — it captures the subtree as `RawJson` and produces + `Raw`. `getNullValue` produces `Null`; `getEmptyValue` / the field default produces `Missing`. +- A `BeanSerializerModifier` + custom `BeanPropertyWriter` omits the property entirely for `Missing` + (the same mechanism `TristatePropertyWriter` uses for `Tristate.Absent`) and emits the captured + raw tree verbatim for `Raw`. + +Generated models therefore declare `JsonField` fields without importing anything from Jackson, and +a different `Serde` implementation (a future XML or CBOR adapter) can supply its own conversion. The +[`Serde`](../../sdk-core/src/main/kotlin/org/dexpace/sdk/core/serde/Serde.kt) interface stays the +single injection point: a model never reaches for a concrete mapper. + +The one place this needs the `Serde` SPI inside `sdk-core` is projecting a `Known` *back* to +`RawJson` (the `asRaw(serde)` overload above) — that round-trips `T` through +`serde.serializer.serializeToByteArray` and re-parses, which is why the no-arg `asRaw()` cannot do it +for `Known`. Validation ([model-validation.md](model-validation.md)) is the main caller. + +## The read-path vs PATCH-path boundary + +This is the rule that keeps `JsonField` and `Tristate` from being wrongly merged, and it must be +honored by the generator and the runtime: + +- **Read path (responses, forward-compat): `JsonField`.** Decoding a server response must never + throw on an unexpected shape and must preserve unknowns for re-serialization. State 2 (`Raw`) is + the whole point. A response model's fields are `JsonField`. +- **Write path (PATCH bodies, three-state intent): `Tristate`.** A PATCH body asks a different + question — *did the caller intend to set, clear, or leave alone this field*. There is no + "wrong-typed" state on the write side; the caller is constructing well-typed values. A PATCH + request model's fields stay `Tristate`. + +Mapping between them is one-directional and lossy on purpose: + +```kotlin +// TARGET OUTPUT — interop helper, sdk-core. Not compiled here. +public fun JsonField.toTristate(): Tristate = when (this) { + is JsonField.Known -> Tristate.Present(value) + JsonField.Null -> Tristate.Null + JsonField.Missing -> Tristate.Absent + // A read-side Raw has no well-typed value to PATCH with. Collapsing it to Absent ("leave + // alone") is the only safe default; a caller that wants to forward an unknown verbatim must + // do so explicitly rather than have it silently become a typed PATCH. + is JsonField.Raw -> Tristate.Absent +} +``` + +There is intentionally **no** total `Tristate -> JsonField` that fabricates a `Raw`: a +three-state write value can only ever land in `Known` / `Null` / `Missing`, never `Raw`. Keeping the +`Raw` case unreachable from the write side is what prevents a forward-compat unknown from leaking into +a PATCH payload as if the caller had set it. + +## Design decisions and trade-offs + +- **Why a separate type instead of a fourth `Tristate` case.** Adding `Raw` to `Tristate` would force + every PATCH-write call-site to handle a state it can never legitimately produce, and would weaken + the `T : Any` guarantee that makes `Tristate.Present` safe. Two narrow types each enforce their own + invariant; one wide type enforces neither. +- **Numbers as lexical strings.** Keeping `RawJson.Number` as its source literal avoids the classic + `Double` round-trip bugs (`20000000000000001` → `2.0E16`, trailing-zero loss) and lets typed reads + choose `Int`/`Long`/`BigDecimal` on demand. The cost is that numeric equality is lexical unless a + caller parses; that is the right default for a faithful-round-trip container. +- **`Raw` does not auto-coerce to `T`.** `getOrNull()` returns null for `Raw` rather than attempting a + late parse. Coercion is expensive and can fail; it belongs in the explicit, opt-in validation step + ([model-validation.md](model-validation.md)), never on an accessor that callers expect to be cheap. +- **Insertion-ordered `RawJson.Object`.** Preserves wire order on re-serialize, which matters for + signatures, golden-file tests, and human-diffable payloads. + +## Acceptance mapping + +- Dependency-free `JsonField` + `RawJson` in `sdk-core` — the `RawJson` / `JsonField` snippets above, + with no parser dependency, Java-8-safe sealed classes. +- Jackson conversion in the adapter only — the `ContextualDeserializer` + `BeanSerializerModifier` + module in `sdk-serde-jackson`, mirroring `TristateModule`, behind the `Serde` SPI. +- `Tristate` interop documented, including the read-path (forward-compat) vs PATCH-path (three-state) + boundary — the section above, with the one-directional `toTristate()` and the deliberately-absent + reverse mapping. +- `apiDump` — adding `JsonField` / `RawJson` to `sdk-core` is a public-API change and requires a + regenerated `sdk-core/api/*.api` snapshot committed with the implementation (out of scope for this + design doc, noted for the implementing change). diff --git a/docs/codegen/model-classes.md b/docs/codegen/model-classes.md new file mode 100644 index 00000000..92a12955 --- /dev/null +++ b/docs/codegen/model-classes.md @@ -0,0 +1,186 @@ +# Thin model classes over a hand-written runtime + +> **Status:** design specification. The Kotlin/Java snippets show the *target shape* a future +> generator would emit and the *target shape* of the hand-written runtime they sit on. Nothing here +> is compiled in this repository. + +Builds on [the four-state JSON field model](json-field-model.md) — read that first for the +`JsonField` / `RawJson` vocabulary used throughout. + +## Problem + +The reference SDK we are modelling against is roughly **850k lines across ~1,100 model files**. If +every generated model inlines its own copy of the forward-compatibility machinery — the +additional-properties map, the [validate/validity triad](model-validation.md), the +[dual typed/raw accessors](discriminator-const-fields.md), null/absent bookkeeping — three things +break: + +1. **The coverage floor.** `sdk-core` enforces an **aggregate 80% line-coverage floor** (`minBound(80)` + on the root-aggregate `:koverVerify`, wired into `check`). A thousand model files each carrying a + few hundred lines of near-identical, largely-untested boilerplate would swamp the aggregate and + make the floor meaningless — and would be impossible to test meaningfully per-class. +2. **The binary-compatibility baseline.** `apiCheck` validates every public signature against a + committed `.api` snapshot. Inlined machinery means every model contributes dozens of public + members to the baseline; a single change to the shared shape (say, adding a `validity()` overload) + would require regenerating a thousand `.api` entries. +3. **Build and review cost.** Detekt, ktlint, and `allWarningsAsErrors` run over every line. A + deprecation or style nit replicated a thousand times is a thousand failures. + +## Proposed shape: thin model, fat runtime + +Push the invariant machinery into a hand-written runtime and emit only **the field list plus +accessors** per model — target ≈ under 100 lines each. + +### The runtime base (target output, hand-written) + +```kotlin +// TARGET OUTPUT — hand-written runtime in sdk-core, package org.dexpace.sdk.core.model. +// Not compiled here. +public abstract class JsonModel { + + /** The full field map, including unknown keys the server sent that this model has no accessor + * for. Insertion-ordered so re-serialization preserves wire order. This is the single source + * of truth; every typed accessor reads through it. */ + protected abstract val fields: Map> + + /** Forward-compat: keys present on the wire with no declared accessor. Backed by the same map. */ + public fun additionalProperties(): Map = + fields.filterKeys { it !in declaredKeys() } + .mapValues { (_, f) -> f.asRaw() } + + /** Declared keys for this model — generated as a constant set per subclass (see below). */ + protected abstract fun declaredKeys(): Set + + /** Shared accessor helper: read a declared field, four-state-aware, no coercion of Raw. */ + protected fun field(name: String): JsonField { + @Suppress("UNCHECKED_CAST") + return (fields[name] as? JsonField) ?: JsonField.Missing + } + + override fun equals(other: Any?): Boolean = + other is JsonModel && other::class == this::class && other.fields == fields + override fun hashCode(): Int = fields.hashCode() +} +``` + +The runtime owns: the field map, `additionalProperties()` (forward-compat round-tripping), +equality/hashing, and the typed-read helper. It is hand-written, fully tested **once**, and counts +toward coverage **once**. + +### The generated model (target output, generated — thin) + +```kotlin +// TARGET OUTPUT — generated, one file per model. Illustrative; not compiled here. +public class User private constructor( + override val fields: Map>, +) : JsonModel() { + + // Typed accessors. Each is a one-liner delegating to the runtime helper. No logic here. + public fun id(): JsonField = field("id") + public fun email(): JsonField = field("email") + public fun roles(): JsonField> = field("roles") + + // Raw accessors (the dual-accessor pattern — see discriminator-const-fields.md) are likewise + // one-liners: public fun _id(): RawJson = field("id").asRaw() + + override fun declaredKeys(): Set = DECLARED_KEYS + + public companion object { + private val DECLARED_KEYS = setOf("id", "email", "roles") + @JvmStatic public fun builder(): Builder = Builder() + } + + public class Builder internal constructor() : org.dexpace.sdk.core.util.Builder { + private val acc = LinkedHashMap>() + public fun id(value: String): Builder = apply { acc["id"] = JsonField.known(value) } + public fun email(value: String): Builder = apply { acc["email"] = JsonField.known(value) } + // raw setter sibling: public fun id(raw: RawJson): Builder = apply { acc["id"] = JsonField.raw(raw) } + override fun build(): User = User(acc.toMap()) + } +} +``` + +Everything model-specific is data: the accessor names, their `JsonField` return types, and the +`DECLARED_KEYS` constant. There is no per-class machinery — no inlined validation, no inlined +additional-properties handling, no inlined serde. A 40-field model is ~80 accessor lines plus a +builder, not several hundred lines of replicated logic. + +The model follows the house conventions already in `sdk-core`: private constructor + `Builder` +implementing `Builder` (`apply { … }` setters with explicit return types under explicit-API strict +mode), `@JvmStatic` factories for Java callers, immutable backing map. + +## How it ties into the existing runtime + +- **Serde SPI.** Deserialization produces the `Map>` via the + [`Serde`](../../sdk-core/src/main/kotlin/org/dexpace/sdk/core/serde/Serde.kt) deserializer; the + Jackson adapter's `JsonField` module (see [json-field-model.md](json-field-model.md)) decodes each + field into `Known` / `Null` / `Raw` / `Missing`. The model class itself imports nothing from + Jackson. +- **`Tristate` for writes.** Request/PATCH models stay + [`Tristate`](../../sdk-core/src/main/kotlin/org/dexpace/sdk/core/serde/Tristate.kt)-typed per the + read-path vs PATCH-path boundary in the field-model spec. The generator emits `JsonModel`-based + read models and `Tristate`-field write models from the same schema, never mixing the two. +- **Pipelines and paging.** A generated model is an ordinary immutable value; it flows through the + existing request/response pipeline and is what a + [`Paginator`](../../sdk-core/src/main/kotlin/org/dexpace/sdk/core/pagination/Paginator.kt) + yields as its `T` (`iterateAll()` / `streamAll()`). Nothing in the model layer reaches into the + pipeline; the model is just the payload type. +- **Request bodies.** A write model serializes through `Serde` and is wrapped in a + [`RequestBody`](../../sdk-core/src/main/kotlin/org/dexpace/sdk/core/http/request/RequestBody.kt) + (`RequestBody.create(...)`); because the serialized form is a buffered byte payload it is naturally + `isReplayable()` for retries. + +## Coverage and binary-compatibility strategy for generated code + +This is an explicit, up-front decision, not something to discover later: + +### Coverage — exclude generated modules from the aggregate floor + +Generated model code is thin delegation with no branching logic to test; the logic it delegates to +(the runtime base, the serde module) is hand-written and tested directly. Running the aggregate 80% +floor over generated code would either force meaningless generated tests or drag the floor down. + +**Decision:** generated model code lives in its own module(s), and those modules are **excluded from +the root aggregate Kover floor**. The runtime base (`JsonModel` and friends) stays in `sdk-core` and +remains fully covered by the existing floor. Concretely: the generated module applies Kover but is +not summed into the root-aggregate `:koverVerify`, mirroring how the root Kover filter already +excludes test fixtures (`org.dexpace.sdk.core.testing.*`) — except here the exclusion is module-scoped +rather than a package glob, so we never reintroduce a broad package-glob exclude in `sdk-core`. + +This keeps the invariant in `CLAUDE.md` honest: the 80% floor still means something for hand-written +code, and the generated-module exclusion is narrow and deliberate. + +### Binary compatibility — a separate `.api` baseline for generated code + +The generated public surface is large and churns with the upstream schema. Validating it against the +same baseline as the curated `sdk-core` surface would mean a schema bump regenerates a thousand `.api` +entries inside the same review as a hand-written change. + +**Decision:** generated modules get their **own `.api` baseline**, separate from `sdk-core`'s. The +binary-compatibility-validator runs per-module, so the generated module's `api/*.api` snapshot lives +with the generated module and is regenerated (`apiDump`) as part of a schema-update change — never +mixed into a hand-written `sdk-core` API change. The two-getters-plus-two-setters dual-accessor +pattern from [discriminator-const-fields.md](discriminator-const-fields.md) is exactly the kind of +wide, regular surface that benefits from living in its own baseline. + +## Design decisions and trade-offs + +- **One backing map vs. one field per property.** A single `Map>` is what makes + models thin and makes `additionalProperties()` free. The cost is a map lookup per accessor and a + cast; both are negligible next to the JSON parse that produced the model, and the cast is contained + in one runtime helper. +- **Accessors return `JsonField`, not `T?`.** Callers see the four-state distinction (known / + null / raw / missing) rather than a lossy `T?`. A convenience `…OrNull()` sibling can be generated + for callers that only want the happy path, but the four-state accessor is primary so forward-compat + information is never silently discarded at the accessor boundary. +- **Generated module separation is structural, not cosmetic.** Putting generated code in its own + module is what makes both the coverage exclusion and the separate `.api` baseline clean — they fall + out of module boundaries rather than needing per-package allowlists. + +## Acceptance mapping + +- Generated models are thin — the `User` snippet: field list + one-line accessors + builder, the + invariant machinery pushed into the hand-written `JsonModel` runtime. +- Coverage + binary-compat strategy for generated code documented — the two "Decision" subsections: + module-scoped Kover exclusion for the aggregate floor, and a separate per-module `.api` baseline + regenerated on schema bumps. diff --git a/docs/codegen/model-validation.md b/docs/codegen/model-validation.md new file mode 100644 index 00000000..d73d4cc6 --- /dev/null +++ b/docs/codegen/model-validation.md @@ -0,0 +1,174 @@ +# The validate() / isValid() / validity() triad + +> **Status:** design specification. The snippets show the *target shape* of the generated triad and +> the hand-written runtime that scores validity. Nothing here is compiled in this repository. + +Builds on [the four-state JSON field model](json-field-model.md) and +[thin model classes](model-classes.md). + +## Problem + +Two features need to ask "how well does this payload match this model?": + +1. **Union disambiguation.** When a schema says a value is one of several shapes and there is no + discriminator to key on (see [discriminator-const-fields.md](discriminator-const-fields.md) for + the case where there *is* one), the only way to pick the right member is to score how well the + payload fits each candidate. +2. **Explicit caller-side validation.** A caller building a request, or inspecting a response, may + want to know whether a model is fully populated and well-typed before acting on it. + +But scoring is **expensive** — it walks the whole tree, attempts typed coercions of every `Raw` +field, and recurses into nested models — and it must **never run implicitly**. If validation ran on +the deserialize path, every response would pay full-tree validation cost whether or not anyone asked, +and forward-compatibility would be defeated: an unknown field that decode tolerated as `Raw` would +suddenly make deserialization "fail." Validation has to be opt-in and off the hot path. + +## Proposed shape: a memoized, fail-soft triad + +Generate three members on each model, all delegating to a hand-written runtime scorer: + +```kotlin +// TARGET OUTPUT — generated on each model; bodies delegate to the runtime. Not compiled here. +public class User /* ... : JsonModel() ... */ { + + /** Force full validation, returning the structured result. Memoized. Never called on the + * deserialize path. */ + public fun validity(): Validity = VALIDATION.validate(this) + + /** Convenience: validity().isValid. */ + public fun isValid(): Boolean = validity().isValid + + /** Throws if invalid, returns this otherwise — for callers that want fail-fast ergonomics on + * top of the fail-soft core. */ + public fun validate(): User = also { check(isValid()) { validity().describe() } } + + private companion object { + // The per-model validation plan: which fields are required, their expected types, and which + // are themselves models to recurse into. Generated as data, scored by the runtime. + private val VALIDATION = ModelValidation.of(/* required keys, field type table, nested-model refs */) + } +} +``` + +### The structured result (target output, hand-written runtime) + +```kotlin +// TARGET OUTPUT — hand-written runtime in sdk-core. Not compiled here. +public class Validity internal constructor( + /** A score in [0.0, 1.0]: fraction of fields that matched their declared shape, recursively. */ + public val score: Double, + /** Per-field problems: missing-required, wrong-type (a Raw where a typed value was required), + * or a nested model's own problems. Empty iff fully valid. */ + public val problems: List, +) { + public val isValid: Boolean get() = problems.isEmpty() + public fun describe(): String = problems.joinToString("; ") { it.path + ": " + it.reason } + + public data class Problem(public val path: String, public val reason: String) +} +``` + +### The scorer (target output, hand-written runtime) + +```kotlin +// TARGET OUTPUT — hand-written runtime in sdk-core. Not compiled here. +public class ModelValidation internal constructor(/* generated plan */) { + + // Memoized per instance — see "Memoization" below. + public fun validate(model: M): Validity { /* walk fields against the plan */ } +} +``` + +The scorer is **fail-soft**: it never throws on a bad shape. A required field that is `Missing` or a +field that arrived as `Raw` where a typed value was required becomes a `Problem`, not an exception. +The result is *structured* — a score plus an itemized problem list with JSON-pointer-ish paths — so +both the union strategy (which wants the score) and a human caller (who wants the reasons) are served +by one pass. + +## Recursion + +Validation is recursive: a nested model field is validated by recursing into *its* triad, and its +problems are folded into the parent's `problems` with the path prefixed. Because the per-model plan +knows which fields are themselves `JsonModel` subtypes, the runtime walks the object graph without +the generated model carrying any traversal code — the generated side is still just the plan-as-data +from [model-classes.md](model-classes.md). + +Cycles (a model that can transitively contain itself) are handled in the runtime by tracking visited +instances, so recursion terminates regardless of schema shape. + +## Memoization + +`validity()` is **memoized per model instance** — the first call scores the tree, subsequent calls +return the cached `Validity`. This matters because the union strategy may call `validity()` on the +same candidate more than once, and because a caller doing `isValid()` then `validity().describe()` +should pay for one pass, not two. Models are immutable (the backing field map never changes after +`build()`), so the memoized result can never go stale. + +The cache lives in the runtime, keyed off the instance, so the generated model carries no mutable +state and stays a clean immutable value. + +## Validity-scoring as the *fallback* union strategy + +Validation scoring is the **last resort** for union disambiguation, not the first: + +1. **Prefer a discriminator.** If the union has a discriminator field + ([discriminator-const-fields.md](discriminator-const-fields.md)), read it once, look up the member + by its const value, and deserialize exactly that one member. This is O(1) member selection plus a + single deserialization — no scoring. +2. **Fall back to validity scoring only when there is no discriminator.** Deserialize the payload + against each candidate member (each tolerant, producing `Raw` for fields it cannot bind), call + `validity()` on each, and pick the highest `score`. This is N deserializations plus N validations, + which is exactly why it is the fallback and why scoring must be cheap to *avoid* (discriminator + first) rather than cheap to *run*. + +```kotlin +// TARGET OUTPUT — union resolution sketch in the runtime. Not compiled here. +public fun resolveUnion(raw: RawJson, members: List>): U { + val discriminated = members.firstNotNullOfOrNull { it.matchByDiscriminator(raw) } + if (discriminated != null) return discriminated // path 1: no scoring + + // path 2: fall back to scoring, highest validity wins + return members + .map { it.deserializeTolerant(raw) to it } + .maxByOrNull { (model, _) -> model.validity().score } + ?.first ?: error("no union member matched: " + raw) +} +``` + +## How it ties into the existing runtime + +- **`JsonField` / `RawJson`.** A `Problem` of kind "wrong type" is precisely a field that decoded to + `JsonField.Raw` where the plan required a typed value; the scorer reads the four-state field map + from [json-field-model.md](json-field-model.md) directly. Projecting a `Known` back to `RawJson` + for re-emission uses the `asRaw(serde)` overload, which is why scoring needs a + [`Serde`](../../sdk-core/src/main/kotlin/org/dexpace/sdk/core/serde/Serde.kt) handle when it must + re-serialize a value. +- **Off the deserialize path.** The Jackson `JsonField` module (see the field-model spec) never calls + the triad; deserialization stays tolerant and fast. The triad is reachable only through the three + generated members, which a caller — or the union strategy — invokes explicitly. +- **Thin generated side.** Per [model-classes.md](model-classes.md), the generated model contributes + only the three one-line members plus a `VALIDATION` plan constant; all walking, scoring, + memoization, and cycle-handling is hand-written runtime, tested and covered once. + +## Design decisions and trade-offs + +- **Fail-soft core, fail-fast wrapper.** The scorer never throws (`validity()` / `isValid()` are + total), and `validate()` adds the throwing ergonomics on top. This serves both the union strategy + (which must compare invalid candidates without exceptions) and callers who want `validate()` to be a + guard clause. +- **Structured result, not a boolean.** Returning a score plus itemized problems means one pass + serves disambiguation (needs the number) and debugging (needs the reasons). A boolean would force a + second pass for diagnostics. +- **Discriminator-first.** Making scoring the explicit fallback — and documenting it as such — is what + keeps the common case (discriminated unions) at one deserialization instead of N. Scoring is the + safety net for schemas that genuinely lack a discriminator, not the default path. +- **Memoize because immutable.** Immutability is what makes per-instance memoization correct; we lean + on the existing immutable-model invariant rather than adding cache-invalidation logic. + +## Acceptance mapping + +- Opt-in triad generated — the three generated members (`validate()` / `isValid()` / `validity()`), + delegating to the hand-written `ModelValidation` runtime, memoized per instance. +- Not invoked during deserialize — the deserialize path (the Jackson `JsonField` module) never calls + the triad; it stays tolerant and produces `Raw` rather than scoring. Validation is reachable only + through the explicit members and the union fallback.