40 changes: 40 additions & 0 deletions docs/codegen/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
# Code-generation design specs

This directory holds **design specifications** for a future code generator that emits per-API
SDKs on top of the hand-written `sdk-core` toolkit. Nothing here is built yet, and nothing here
ships as part of `sdk-core`.

Two ground rules apply to every spec in this directory:

1. **`sdk-core` stays a toolkit, not a generator.** No KotlinPoet, no generator runtime, and no
schema/validation library ever lands in `sdk-core` or any published toolkit module. Anything a
generated SDK needs at runtime is expressed in terms of types `sdk-core` already exposes
(`Serde`, `Tristate`, `RequestBody`, `Paginator`, the context chain, `HttpClient` /
`AsyncHttpClient`, `HttpPipeline`, `Io` / `IoProvider`).
2. **Generated artifacts are physically separate from hand-written code.** A generator writes into
a generated SDK's own module(s); it never edits the toolkit. Any code-shaped snippet in these
docs is *target generator output*, labelled as such — it is not compiled in this repository.

## Specs

| Spec | Topic |
|---|---|
| [strict-structured-output-schema.md](strict-structured-output-schema.md) | Strict JSON-schema encoding rules for structured outputs: all-required + `additionalProperties:false` + optional-as-nullable-union. Adapter-only derivation, hand-rolled subset validator. |
| [fail-soft-validator-skeleton.md](fail-soft-validator-skeleton.md) | Design of a reusable fail-soft recursive validator skeleton for generator output (path-prefixed error collection, recursion guard, deterministic definition names). Deferred — design only, no runtime type today. |
| [generated-sdk-provenance.md](generated-sdk-provenance.md) | Provenance file stamped into generated SDKs: generator version + input-contract hash, format, and location. Generated output only. |
| [spring-boot-starter.md](spring-boot-starter.md) | Per-API Spring Boot starter shape: `@ConfigurationProperties`, a `fun interface` customizer, and an `@AutoConfiguration` bean assembling `{IoProvider + transport + HttpPipeline}`. Spring deps confined to the generated starter. |

## Grounding in `sdk-core`

Each spec cites the real runtime types it builds on so that the eventual generator targets the
current API rather than an invented one. The most-referenced anchors:

- **`org.dexpace.sdk.core.serde`** — `Serde`, `Serializer`, `Deserializer`, and `Tristate<T>`
(the three-state container for `absent` / `null` / `present` PATCH fields).
- **`org.dexpace.sdk.core.client`** — `HttpClient` / `AsyncHttpClient` transport SPIs.
- **`org.dexpace.sdk.core.http.pipeline`** — `HttpPipeline` / `HttpPipelineBuilder` and the
stage-ordered `HttpStep` model.
- **`org.dexpace.sdk.core.io`** — the `Io.installProvider(...)` seam and `IoProvider`.
- **`org.dexpace.sdk.core.pagination`** — `Paginator` and its `PaginationStrategy`.
- **`org.dexpace.sdk.core.http.context`** — the `CallContext` → `DispatchContext` →
`RequestContext` → `ExchangeContext` promotion chain.
113 changes: 113 additions & 0 deletions docs/codegen/fail-soft-validator-skeleton.md
@@ -0,0 +1,113 @@
# Fail-soft recursive validator skeleton

Status: design spec, **deferred**. Closes #67.

> This captures the *design* of a validator idiom for generator output. There is **no runtime type
> for it today** and none is added by this document. It is built only when the code generator
> exists; until then it lives here as a shape to implement against. Like all codegen runtime, it
> would land in a generator/adapter module, never in `sdk-core`.

## Problem

Validators over a spec or schema tree — "is this input contract well-formed?", "is this derived
strict schema in the allowed subset?" (see
[strict-structured-output-schema.md](strict-structured-output-schema.md)) — want a consistent
**fail-soft** shape:

- collect **all** problems with a path prefix, rather than throwing on the first one, so a user
fixing a contract sees every error in one pass;
- guard against **cyclic** trees (a `$ref` chain that loops) so recursion terminates;
- name definitions **deterministically** so two distinct nodes with the same simple name do not
alias each other in error messages or in a visited-set.

The idiom is tiny — on the order of fifteen lines of recursion — but it has to be the *same* tiny
idiom everywhere a validator is written, so that error output and cycle handling are uniform.

## The skeleton

Parameterised over **our own tree type**, not a third-party node model. The generator already has a
normalized in-memory representation of the input contract; the validator walks that. The shape:

```kotlin
// DESIGN ONLY: not built today; would live in a generator module, never sdk-core.

/** One problem, addressed by a slash-joined path from the tree root. */
data class ValidationError(val path: String, val message: String)

class Validator(private val errors: MutableList<ValidationError> = mutableListOf()) {

    // Recursion guard keyed by deterministic node id; see "Deterministic definition names".
    private val visiting = HashSet<String>()

    /** Verify [cond]; on failure record a path-prefixed error and signal "stop this branch". */
    private inline fun verify(cond: Boolean, path: String, message: () -> String): Boolean {
        if (!cond) errors += ValidationError(path, message())
        return cond
    }

    fun validate(node: Node, path: String = "") {
        val id = node.definitionName          // FQN/deterministic, never the simple name
        if (!visiting.add(id)) return         // one-shot recursion guard: already on this branch
        try {
            // early-return verify helper: a failed precondition stops descent into a broken node,
            // but earlier siblings' errors are already collected.
            if (!verify(node.isWellFormed, path) { "malformed node '$id'" }) return
            for ((key, child) in node.children) {
                validate(child, if (path.isEmpty()) key else "$path/$key")
            }
        } finally {
            visiting.remove(id)               // pop on the way out so siblings can revisit shared defs
        }
    }

    fun result(): List<ValidationError> = errors.toList()
}
```

Three load-bearing pieces, matching the issue's acceptance criteria:

1. **One-shot recursion guard.** `visiting.add(id)` returns `false` if `id` is already on the
current descent path; we return immediately, so a cyclic `$ref` cannot loop forever. The guard is
popped in `finally` so the same shared definition can be re-entered down a *different* branch
(we guard against cycles, not against repeated visits).
2. **Path-prefixed error list.** Every `verify` failure is recorded as `(path, message)` and
appended; nothing throws mid-walk. The caller gets the full list from `result()` and decides
whether a non-empty list is fatal.
3. **Early-return `verify` helper.** `verify` both records the error *and* returns the boolean, so a
call site can `if (!verify(...)) return` to stop descending into a node that is too broken to walk
while still having collected the error and everything found before it.
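
The three pieces above can be exercised end to end with a small, self-contained sketch. The `Node` class here is a hypothetical stand-in for the generator's normalized tree type, which this spec does not define; `validateSampleTree` and the sample definition names are likewise illustrative assumptions, not part of any existing module.

```kotlin
// ILLUSTRATIVE SKETCH: `Node` is a hypothetical minimal tree type, not a real sdk-core
// or generator class. The Validator mirrors the skeleton's shape.
class Node(
    val definitionName: String,
    val isWellFormed: Boolean = true,
    val children: MutableMap<String, Node> = linkedMapOf(),
)

data class ValidationError(val path: String, val message: String)

class Validator {
    private val errors = mutableListOf<ValidationError>()
    private val visiting = HashSet<String>()

    // Records a path-prefixed error on failure and returns the condition to the caller.
    private inline fun verify(cond: Boolean, path: String, message: () -> String): Boolean {
        if (!cond) errors += ValidationError(path, message())
        return cond
    }

    fun validate(node: Node, path: String = "") {
        val id = node.definitionName
        if (!visiting.add(id)) return // id is already on the descent path: a cycle
        try {
            if (!verify(node.isWellFormed, path) { "malformed node '$id'" }) return
            for ((key, child) in node.children) {
                validate(child, if (path.isEmpty()) key else "$path/$key")
            }
        } finally {
            visiting.remove(id) // pop so another branch may re-enter a shared definition
        }
    }

    fun result(): List<ValidationError> = errors.toList()
}

/** Builds a tree with a cycle (Order -> Customer -> Order) and a shared, broken definition. */
fun validateSampleTree(): List<ValidationError> {
    val broken = Node("com.example.a.Metadata", isWellFormed = false)
    val customer = Node("com.example.Customer")
    val order = Node("com.example.Order")
    order.children["customer"] = customer
    order.children["meta"] = broken
    customer.children["lastOrder"] = order // closes the cycle
    customer.children["meta"] = broken     // same definition reached down a second branch
    return Validator().also { it.validate(order) }.result()
}
```

In this sketch the walk terminates despite the cycle, and the shared broken definition is reported once per path that reaches it (`customer/meta` and `meta`), which is the "guard against cycles, not repeated visits" behaviour described in item 1.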

## Deterministic definition names

`definitionName` must be a **fully-qualified, deterministic** identifier — the package-qualified
type name plus a stable mangling for generic arguments — never the bare simple name. This matters in
two places:

- **The recursion guard.** Two unrelated nodes that happen to share a simple name (`Metadata` in two
packages) must hash to *different* ids, or the guard would wrongly treat the second as a cycle of
the first and skip it.
- **Error messages.** `"malformed node 'com.example.a.Metadata'"` is actionable; `"malformed node
'Metadata'"` is ambiguous across a large API surface.

This is the same deterministic-name rule the strict-schema spec applies to `$defs` keys
([strict-structured-output-schema.md](strict-structured-output-schema.md) §R6), so a single naming
function serves both the schema derivation and its validator.
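
A minimal sketch of such a naming function follows. The `TypeRef` model and the angle-bracket mangling of generic arguments are assumptions made for illustration; the spec only requires that the result be package-qualified and stable, and the actual mangling is whatever §R6 of the strict-schema spec fixes.

```kotlin
// ILLUSTRATIVE SKETCH: `TypeRef` is a hypothetical type reference, and the "<a,b>"
// mangling is an assumed scheme chosen only to show determinism.
data class TypeRef(
    val packageName: String,
    val simpleName: String,
    val typeArguments: List<TypeRef> = emptyList(),
)

/** Package-qualified name plus a stable, order-preserving mangling of generic arguments. */
fun definitionName(type: TypeRef): String {
    val base =
        if (type.packageName.isEmpty()) type.simpleName
        else "${type.packageName}.${type.simpleName}"
    if (type.typeArguments.isEmpty()) return base
    // Arguments are rendered recursively in declaration order, so equal types always
    // produce equal names regardless of when or where they are encountered.
    val args = type.typeArguments.joinToString(separator = ",") { definitionName(it) }
    return "$base<$args>"
}
```

Under these assumptions, `Metadata` in `com.example.a` and in `com.example.b` map to different ids, so neither the recursion guard nor an error message can confuse them.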

## Why fail-soft (decisions / trade-offs)

- **All errors in one pass.** Throwing on the first problem forces an edit-rerun-edit loop over a
contract; collecting every error lets a user fix a batch at once. The cost is that the walk must be
defensive — hence the early-return helper, so a broken node does not cause a cascade of spurious
child errors.
- **Our tree type, not a generic JSON model.** Validating the generator's own normalized model keeps
the validator decoupled from whatever parser produced the contract and lets the same skeleton check
both input contracts and derived strict schemas.
- **No library.** The skeleton is small enough to hand-write and keeps shipped modules free of a
schema/validation dependency, consistent with the rest of the codegen design.

## Acceptance mapping

- *Validator skeleton* — the `Validator` shape above (recursion guard + path-prefixed list +
early-return `verify`).
- *Deterministic definition names* — `definitionName` is FQN/deterministic, used for both the guard
and error messages; shared with the strict-schema `$defs` naming.
104 changes: 104 additions & 0 deletions docs/codegen/generated-sdk-provenance.md
@@ -0,0 +1,104 @@
# Generated-SDK provenance file

Status: design spec. Closes #68.

## Problem

A generated SDK is a build artifact: the same input contract run through a different generator
version can produce different output. When a bug is reported against a generated SDK, the first two
questions are always "which generator produced this?" and "from which input contract?". Without a
recorded answer, the only way to reproduce is to guess the generator version and re-derive — which is
exactly the situation provenance metadata exists to avoid.

This spec defines a small, machine-readable **provenance file** stamped into generated output.

## Scope: generated output only

The provenance file is written **only into generated SDKs, never into the hand-written toolkit.**
`sdk-core` and the other published toolkit modules are authored by hand and version themselves
through `gradle.properties` (`group` / `version`); stamping a generator provenance file into them
would be meaningless and is forbidden. The generator writes the file into the SDK module it emits and
touches nothing else.

## What metadata

| Field | Meaning |
|---|---|
| `generatorName` | Stable identifier of the generator (e.g. `org.dexpace:sdk-codegen`). |
| `generatorVersion` | Exact released version of the generator that produced this SDK. |
| `inputContractHash` | Content hash of the **normalized** input contract (OpenAPI / JSON-schema), `sha256:` prefixed. The contract is normalized first (keys sorted, insignificant whitespace stripped) so that semantically identical contracts hash identically regardless of formatting. |
| `inputContractName` | Human-readable contract identifier (e.g. the API title + version from the OpenAPI `info` block). |
| `generatedAt` | ISO-8601 UTC timestamp of generation. Informational only — it is **excluded** from any reproducibility comparison (see below). |
| `schemaVersion` | Version of this provenance-file format itself, so consumers can parse older stamps. |

`generatorVersion` + `inputContractHash` together are the reproducibility key: same generator
version + same normalized contract hash ⇒ byte-identical SDK (modulo the timestamp). `generatedAt` is
deliberately **not** part of that key, so reproducibility checks compare everything except the
timestamp.
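
The key comparison can be sketched as follows. The `Provenance` data class simply mirrors the fields in the table above; `ReproducibilityKey`, `reproducibilityKey()`, and `sameBuildExpected` are hypothetical names introduced for this example and do not exist in the toolkit.

```kotlin
// ILLUSTRATIVE SKETCH: field names follow the table above; the key type and helper
// functions are hypothetical.
data class Provenance(
    val schemaVersion: String,
    val generatorName: String,
    val generatorVersion: String,
    val inputContractName: String,
    val inputContractHash: String,
    val generatedAt: String,
)

/** Only the two fields that determine the generated output; `generatedAt` is left out. */
data class ReproducibilityKey(val generatorVersion: String, val inputContractHash: String)

fun Provenance.reproducibilityKey() = ReproducibilityKey(generatorVersion, inputContractHash)

/** True when two stamps claim the same generator version and the same normalized contract. */
fun sameBuildExpected(a: Provenance, b: Provenance): Boolean =
    a.reproducibilityKey() == b.reproducibilityKey()
```

Two stamps that differ only in `generatedAt` compare equal under this key, while a change in either `generatorVersion` or `inputContractHash` makes them differ.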

## Format

JSON, so it is trivially machine-readable and round-trips through the existing `Serde` /
`Deserializer` SPI at runtime without any new dependency — a tool can read it back with
`Deserializer.deserialize(json, GeneratedProvenance::class.java)` using the toolkit's own serde, with
no schema/provenance library involved.

Target output:

```json
{
  "schemaVersion": "1",
  "generatorName": "org.dexpace:sdk-codegen",
  "generatorVersion": "0.4.2",
  "inputContractName": "Example API 2025-11",
  "inputContractHash": "sha256:9f2b1c…",
  "generatedAt": "2026-06-17T09:30:00Z"
}
```

## Location

Two copies, serving two different consumers:

1. **On the classpath, as a resource:**
`META-INF/dexpace/<artifactId>/generated-provenance.json` in the generated module's
`src/main/resources` (so it ships inside the published jar). Programmatic access:

```kotlin
// GENERATED accessor: illustrative target output, not compiled here.
public object GeneratedProvenance {
    public fun read(): String =
        checkNotNull(
            javaClass.getResourceAsStream(
                "/META-INF/dexpace/example-api/generated-provenance.json",
            ),
        ).bufferedReader().use { it.readText() }
}
```

Namespacing under `META-INF/dexpace/<artifactId>/` keeps multiple generated SDKs on one classpath
from clobbering each other's stamp.

2. **At the generated source root**, as a checked-in `PROVENANCE.json`, so the metadata is visible in
the SDK's repository / diff without unpacking a jar. Both copies are written from the same values
in one pass, so they cannot drift.
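
The single write pass for both copies might look like the sketch below. `writeProvenance` is a hypothetical helper, and placing `PROVENANCE.json` directly under the module root is an assumption about what "generated source root" means; only the resource path comes from this spec.

```kotlin
import java.nio.file.Files
import java.nio.file.Path

// ILLUSTRATIVE SKETCH: hypothetical generator-side helper, not part of sdk-core.
// Assumes the generated module root doubles as the "generated source root".
fun writeProvenance(moduleRoot: Path, artifactId: String, json: String): List<Path> {
    val targets = listOf(
        // Classpath resource, namespaced per artifact so stamps from several SDKs coexist.
        moduleRoot.resolve("src/main/resources/META-INF/dexpace/$artifactId/generated-provenance.json"),
        // Checked-in copy for humans reading the generated repository.
        moduleRoot.resolve("PROVENANCE.json"),
    )
    val bytes = json.toByteArray(Charsets.UTF_8)
    for (target in targets) {
        Files.createDirectories(target.parent)
        Files.write(target, bytes) // the same bytes go to both destinations
    }
    return targets
}
```

Because both files are written from one byte array in one call, there is no code path in this sketch that updates one copy without the other.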

## Decisions / trade-offs

- **Normalized hash, not raw-bytes hash.** A reformatted-but-equivalent contract should not look like
a different input. Hashing the normalized form makes the reproducibility key robust to cosmetic
changes.
- **Timestamp present but excluded from the key.** Engineers want to know *when* an SDK was cut;
reproducibility checks must not be defeated by that timestamp. Keeping `generatedAt` informational
satisfies both.
- **JSON over `.properties` or a manifest entry.** JSON nests cleanly (room for future fields like a
list of source files or a transport matrix) and reuses the toolkit's existing serde, so no new
parsing dependency is introduced.
- **Resource + source copy.** The resource serves runtime/diagnostic code; the source copy serves
humans reading the generated repo. One write pass, two destinations, no drift.
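
The normalized-hash decision can be illustrated with a small sketch that hashes an already-parsed contract. It assumes the contract is available as nested `Map` / `List` / `String` / `Number` / `Boolean` / `null` values; `canonicalize` and `inputContractHash` are hypothetical function names, and the string escaping is deliberately simplified (only quotes and backslashes), so this is not a complete canonical-JSON encoder.

```kotlin
import java.security.MessageDigest

// ILLUSTRATIVE SKETCH: assumes a parsed contract of nested maps, lists, and primitives.
// Escaping is simplified; a real normalizer would also handle control characters.
private fun quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

/** Renders a value with sorted object keys and no insignificant whitespace. */
fun canonicalize(value: Any?): String = when (value) {
    null -> "null"
    is String -> quote(value)
    is Number, is Boolean -> value.toString()
    is Map<*, *> -> value.entries
        .sortedBy { it.key.toString() }
        .joinToString(separator = ",", prefix = "{", postfix = "}") { e ->
            quote(e.key.toString()) + ":" + canonicalize(e.value)
        }
    is List<*> -> value.joinToString(separator = ",", prefix = "[", postfix = "]") {
        canonicalize(it)
    }
    else -> error("unsupported contract value: $value")
}

/** SHA-256 over the canonical form, rendered with the `sha256:` prefix used in the stamp. */
fun inputContractHash(contract: Any?): String {
    val digest = MessageDigest.getInstance("SHA-256")
        .digest(canonicalize(contract).toByteArray(Charsets.UTF_8))
    return "sha256:" + digest.joinToString("") { "%02x".format(it) }
}
```

With this approach, two contracts that differ only in key order or formatting yield the same hash, while any change to a key or value changes it.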

## Acceptance mapping

- *Provenance stamped in generated output* — `generatorVersion` + `inputContractHash` (plus
supporting fields) written to `META-INF/dexpace/<artifactId>/generated-provenance.json` and a
source-root `PROVENANCE.json`, in generated output only.