diff --git a/docs/decoders.md b/docs/decoders.md new file mode 100644 index 0000000..2f8b35a --- /dev/null +++ b/docs/decoders.md @@ -0,0 +1,138 @@ +# Writing a custom decoder + +`httpware`'s typed-response extension point is the **`ResponseDecoder` protocol**. A decoder turns raw response bytes into a typed object: when you pass `response_model=` to `send` / `send_with_response`, the client walks its decoder list, picks the first one that claims your model, and hands it the body. + +The built-in `PydanticDecoder` and `MsgspecDecoder` are themselves implementations of this protocol; nothing about them is privileged. Reach for a custom decoder when you need a body **format** the built-ins don't speak (CSV, XML, MessagePack, a bespoke binary frame) or a **type system** they don't cover (`attrs`, `marshmallow`, your own class hierarchy). If pydantic or msgspec already decodes your model, you don't need one — see [When NOT to write a decoder](#when-not-to-write-a-decoder). + +## The protocol + +One symbol, exported from `httpware`: + +```python +from typing import Protocol, TypeVar, runtime_checkable + +T = TypeVar("T") + + +@runtime_checkable +class ResponseDecoder(Protocol): + def can_decode(self, model: type) -> bool: ... + def decode(self, content: bytes, model: type[T]) -> T: ... +``` + +Two methods, two distinct jobs: + +- **`can_decode(model) -> bool`** — the dispatch predicate. The client walks `decoders=[...]` in order and picks the **first** decoder that returns `True`. Claim every model you can actually handle (broad is correct — list ordering, not narrow predicates, encodes the caller's preference), but **reject another library's native types**: a CSV decoder has no business claiming a `pydantic.BaseModel`. `can_decode` **MUST NOT raise** — it runs at dispatch time, before the HTTP call and *outside* the `DecodeError` wrap that protects `decode`, so an exception here escapes `httpware`'s `ClientError` contract instead of being translated. 
A decoder that can't decide must return `False` (decline), not raise. +- **`decode(content, model) -> T`** — the decode itself, raw response bytes in, a `model` instance out. Any exception you raise here is caught by the client and wrapped as `httpware.DecodeError` (carrying `response`, `model`, and the `original` exception). You do **not** need to raise `DecodeError` yourself — raise whatever your parser raises and let the seam translate it. + +The protocol is `@runtime_checkable` and structural: any object with these two methods satisfies it. You do not subclass anything. + +## How the client resolves a model + +Both clients take `decoders: Sequence[ResponseDecoder] | None = None`, composed once at `__init__` and frozen for the client's lifetime. + +- **Order is preference.** `decoders=[CsvDecoder(), PydanticDecoder()]` asks the CSV decoder first; pydantic only sees models CSV declined. List position is how you disambiguate a shape two decoders could both claim. +- **`decoders=None`** resolves against installed extras — pydantic-first when both are present, either-only when one is, an empty tuple when neither. To *add* a decoder without losing the built-ins, list them explicitly: `decoders=[CsvDecoder(), PydanticDecoder()]`. +- **No claimer is a pre-flight error.** When `response_model=` is set and no decoder claims it, the client raises `MissingDecoderError` **before** sending the request — you find out at wiring time, not after a wasted round-trip. This is distinct from `DecodeError`: `MissingDecoderError` means *nothing handles this model* (fix: install an extra or pass `decoders=[...]`); `DecodeError` means *a decoder ran and the payload was malformed* (fix: the server or the model). See [Errors](errors.md). + +## Decoders are sync — for both clients + +Unlike middleware, which has separate `AsyncMiddleware` and `Middleware` flavors, there is **one** `ResponseDecoder` protocol, shared by `AsyncClient` and `Client` alike. 
`decode` is a synchronous method: by the time it runs, the body has already been read off the wire, so decoding is pure CPU work with nothing to await. Write one decoder and pass it to either client. + +## Worked example: a CSV decoder + +A decoder for `text/csv` endpoints that returns a `list` of dataclass rows. Both built-ins are JSON, so this is the case they can't cover — and it shows the seam's real shape: raw bytes in, typed object out, no JSON anywhere. + +```python +import csv +import dataclasses +import io +import typing + +from httpware import AsyncClient +from httpware.decoders.pydantic import PydanticDecoder + +T = typing.TypeVar("T") + + +class CsvDecoder: + """Decode a text/csv body into a list of dataclass rows. + + Claims only `list[X]` where `X` is a dataclass; declines everything else so the JSON + decoders keep their models. + """ + + def can_decode(self, model: type) -> bool: + if typing.get_origin(model) is not list: + return False + args = typing.get_args(model) + return len(args) == 1 and dataclasses.is_dataclass(args[0]) + + def decode(self, content: bytes, model: type[T]) -> T: + (row_type,) = typing.get_args(model) + field_types = {f.name: f.type for f in dataclasses.fields(row_type)} + reader = csv.DictReader(io.StringIO(content.decode("utf-8"))) + return [ + row_type(**{name: field_types[name](value) for name, value in row.items()}) + for row in reader + ] +``` + +`can_decode` is total and never raises: a non-`list` model, a bare `list`, or `list[int]` all fall through to `False`. `decode` coerces each CSV cell with its field's type (CSV values arrive as strings) — a real decoder would handle optionals, dates, and missing columns; this is where your domain logic goes. 
Wire it ahead of the built-ins so it gets first refusal on `list[...]` models while pydantic still handles everything else: + +```python +@dataclasses.dataclass +class Sale: + id: int + amount: float + region: str + + +async def main() -> None: + async with AsyncClient( + base_url="https://reports.example.com", + decoders=[CsvDecoder(), PydanticDecoder()], + ) as client: + sales = await client.send( + client.build_request("GET", "/sales.csv"), + response_model=list[Sale], + ) + # sales: list[Sale] +``` + +The same decoder instance works with a sync `Client(decoders=[CsvDecoder(), PydanticDecoder()])`. + +## A note on claiming the right models + +`can_decode` is a contract with the *rest of the list*. Claim too broadly and you steal models from decoders behind you; claim too narrowly and your decoder never runs. The rule of thumb: claim exactly the types you natively own, and reject another library's. An adapter for a third-party type system narrows its claim to that system — for example, a [`cattrs`](https://catt.rs)-backed decoder for `attrs` classes: + +```python +import json + +import attrs + + +class CattrsDecoder: + def __init__(self, converter): # a configured cattrs.Converter + self._converter = converter + + def can_decode(self, model: type) -> bool: + return attrs.has(model) # only attrs classes; everything else declines + + def decode(self, content, model): + return self._converter.structure(json.loads(content), model) +``` + +Note this decoder is **two-pass** (`json.loads`, then `structure`). The built-in adapters deliberately decode in a single bytes-in pass (`TypeAdapter.validate_json`, `msgspec.json.Decoder.decode`) to skip the intermediate `dict` allocation — but that's a *performance choice for the built-ins*, not a protocol obligation. A custom decoder may go two-pass when its underlying library only structures from native Python objects; you pay one extra allocation, nothing more. 
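
The interplay between list order and `can_decode` claims can be tried out without `httpware` installed. The block below is a minimal standalone sketch of first-match dispatch, not the client's actual implementation: `resolve`, `NoDecoderClaims`, `DataclassListDecoder`, and `CatchAllJsonDecoder` are hypothetical names invented for illustration, and `NoDecoderClaims` only stands in for the role `MissingDecoderError` plays in the real client.

```python
import dataclasses
import json
import typing


class NoDecoderClaims(LookupError):
    """Hypothetical stand-in for httpware's MissingDecoderError."""


class DataclassListDecoder:
    """Toy decoder: claims only list[X] where X is a dataclass."""

    def can_decode(self, model: typing.Any) -> bool:
        args = typing.get_args(model)
        return (
            typing.get_origin(model) is list
            and len(args) == 1
            and dataclasses.is_dataclass(args[0])
        )

    def decode(self, content: bytes, model: typing.Any) -> typing.Any:
        (row_type,) = typing.get_args(model)
        return [row_type(**row) for row in json.loads(content)]


class CatchAllJsonDecoder:
    """Toy decoder that claims every model (deliberately too broad)."""

    def can_decode(self, model: typing.Any) -> bool:
        return True

    def decode(self, content: bytes, model: typing.Any) -> typing.Any:
        return json.loads(content)


def resolve(decoders, model):
    """Return the first decoder in list order whose can_decode(model) is True."""
    for decoder in decoders:
        if decoder.can_decode(model):
            return decoder
    raise NoDecoderClaims(f"no decoder claims {model!r}")


@dataclasses.dataclass
class Point:
    x: int
    y: int


narrow, broad = DataclassListDecoder(), CatchAllJsonDecoder()

# Narrow decoder first: it gets first refusal, the catch-all takes the rest.
first_for_points = resolve([narrow, broad], list[Point])
first_for_dict = resolve([narrow, broad], dict)

# Catch-all first: it takes list[Point] too, so the narrow decoder never runs.
stolen = resolve([broad, narrow], list[Point])
```

With the narrow decoder listed first, `list[Point]` resolves to it and everything else falls through to the catch-all. Swapping the order hands `list[Point]` to the catch-all, which is the "claim too broadly and you steal models" failure in miniature.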
+ +## When NOT to write a decoder + +- **Your model is JSON.** Dataclasses, `TypedDict`s, primitives, pydantic models, and msgspec `Struct`s are all covered by the built-in `PydanticDecoder` / `MsgspecDecoder`. Install the extra (`httpware[pydantic]` or `httpware[msgspec]`) instead of writing a decoder. +- **You only want raw bytes or text.** Don't pass `response_model=` at all — call `send` (or a verb method) without it and read `response.content` / `response.text` directly. Decoders are for *typed* bodies. +- **The transform is per-call, not per-type.** If the shaping depends on the request rather than the model, it's a [middleware](middleware.md) concern, not a decoder. + +## See also + +- **[`architecture/decoders.md`](https://github.com/modern-python/httpware/blob/main/architecture/decoders.md) (Seam B)** — the formal protocol contract: dispatch order, the `can_decode` no-raise obligation, the single-pass rule, and the per-instance adapter cache. +- **`src/httpware/decoders/pydantic.py` and `msgspec.py`** — the built-in adapters as reference implementations, including how they memoize a `can_decode` verdict and cache the underlying parser per model. +- **[Quick-Start: typed responses](index.md)** — composing `response_model=` with the default decoder list. diff --git a/mkdocs.yml b/mkdocs.yml index b7cbef1..151fc74 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -7,6 +7,7 @@ edit_uri: edit/main/docs/ nav: - Quick-Start: index.md - Middleware: middleware.md + - Decoders: decoders.md - Resilience: resilience.md - Errors: errors.md - Testing: testing.md diff --git a/planning/README.md b/planning/README.md index f6f5602..545cef8 100644 --- a/planning/README.md +++ b/planning/README.md @@ -74,6 +74,7 @@ _None._ ### Archived (shipped) +- **[custom-decoder-guide](changes/archive/2026-06-15.01-custom-decoder-guide/change.md)** (#67, 2026-06-15) — Docs: a "write your own `ResponseDecoder`" guide for Seam B, mirroring `docs/middleware.md`. Closed deferred item G6. 
- **[audit-doc-fixes](changes/archive/2026-06-14.06-audit-doc-fixes/change.md)** (#66, 2026-06-14) — Closed the [deep-audit](audits/2026-06-14-deep-audit.md) doc-accuracy findings: `Client.stream()` docs, terminal-call attribution, the four auto-raise sites, the pydantic upper bound, and root import paths. - **[audit-test-quality](changes/archive/2026-06-14.05-audit-test-quality/change.md)** (#65, 2026-06-14) — Closed 11 [deep-audit](audits/2026-06-14-deep-audit.md) test-quality findings: sync-terminal + CookieConflict coverage, the `StatusError.__init__` invariant, missing status constructions, sync mirrors, typing overloads, a deterministic bulkhead barrier, a pinned budget clock, an observability assertion, and the `TimeoutError` circuit trigger. - **[audit-correctness](changes/archive/2026-06-14.04-audit-correctness/change.md)** (#64, 2026-06-14) — Closed 8 [deep-audit](audits/2026-06-14-deep-audit.md) correctness + public-API findings: RetryBudget token ordering, two `OverflowError` crashes, the redaction triple-slash, the msgspec guard, streaming-body symmetry, the RetryBudget docstring caveat, and `middleware/__all__`. diff --git a/planning/changes/archive/2026-06-15.01-custom-decoder-guide/change.md b/planning/changes/archive/2026-06-15.01-custom-decoder-guide/change.md new file mode 100644 index 0000000..6e741ba --- /dev/null +++ b/planning/changes/archive/2026-06-15.01-custom-decoder-guide/change.md @@ -0,0 +1,95 @@ +--- +status: shipped +date: 2026-06-15 +slug: custom-decoder-guide +supersedes: null +superseded_by: null +pr: 67 +outcome: Shipped docs/decoders.md (the Seam B "write your own ResponseDecoder" guide); closed deferred item G6. +--- + +# Change: Add a "Writing a custom decoder" guide + +**Lane:** lightweight — docs-only. New page + one-line nav edit + an +`architecture/decoders.md` cross-link on ship. No source change, no public-API +change. Mirrors the existing `docs/middleware.md` extension-seam guide. 
+ +Closes deferred item **G6** (custom-`ResponseDecoder` guide), the +[2026-06-13 docs audit](../../../audits/2026-06-13-docs-audit.md) finding parked +in [`deferred.md`](../../../deferred.md). Revisit trigger now met: the guide was +explicitly requested. + +## Goal + +Seam B (`ResponseDecoder`) is a documented extension point, but unlike +middleware it has no "write your own" guide. Add `docs/decoders.md` showing the +`can_decode` / `decode` protocol, how `decoders=[...]` ordering resolves a +model, and a worked custom-decoder example. Prose carries the signatures — no +mkdocstrings / auto API reference (per the `2026-06-14.01` docs-UX decision). + +## Approach + +A prose + code-block page modeled on `docs/middleware.md`, the sibling +"write your own" guide for Seam A. Sections, scaled to complexity: + +1. **Intro / when to write one** — Seam B in a paragraph; reach for a custom + decoder when you need a body *format* (non-JSON) or a *type system* the + pydantic/msgspec built-ins don't cover. +2. **The protocol** — the `ResponseDecoder` Protocol verbatim from + `src/httpware/decoders/__init__.py`; `can_decode(model) -> bool` + (first-match dispatch, claim broadly but reject other libraries' native + types, **MUST NOT raise** — runs outside the `DecodeError` wrap, decline by + returning False) and `decode(content: bytes, model) -> T` (raw bytes in; + any exception is auto-wrapped as `DecodeError`, so don't raise it yourself). +3. **How the client resolves a model** — `decoders=[...]` order = preference, + first claimer wins; `decoders=None` default is pydantic-first; + `MissingDecoderError` fires *before* the HTTP call when nothing claims; the + `MissingDecoderError` (no decoder) vs `DecodeError` (decoder ran, payload + bad) distinction and their distinct corrective actions. +4. **Sync, not async** (callout) — one sync protocol shared by `Client` *and* + `AsyncClient`; there is no async `decode`, in contrast to middleware's two + flavors. 
`decode` runs synchronously after the body is read. +5. **Worked example: a CSV decoder** — `text/csv` bytes → `list[X]` for a dataclass `X`. + Chosen because both built-ins are JSON, so the highest-value lesson is that + the seam is raw-bytes-in / typed-object-out and **not** JSON-bound. Stdlib + `csv` only (a reader runs it with zero extra installs), naturally + single-pass. `can_decode` claims `list[X]` and rejects everything + else; wired as `decoders=[CsvDecoder(), PydanticDecoder()]`. +6. **A note on claiming the right models** — the `can_decode` discrimination + obligation (claiming too broadly steals models from later decoders in the + list); how an adapter for another type system (e.g. cattrs/attrs) narrows + its claim to its own types; and that the single-pass rule is a *built-in + performance choice*, not a hard protocol obligation — a custom decoder may + go two-pass (`json.loads` → structure) at the cost of one extra allocation. +7. **When NOT to write a decoder** — the built-ins already cover + pydantic/msgspec/dataclasses/primitives; if you only want raw bytes or text, + use `response.content` / `response.text` without `response_model=`. +8. **See also** — `architecture/decoders.md` (Seam B, the formal contract), + the built-in adapters (`decoders/pydantic.py`, `decoders/msgspec.py`) as + reference implementations, the Quick-Start typed-response example. + +Truth home: [`architecture/decoders.md`](../../../../architecture/decoders.md) +— Seam B's contract does not move; on ship, add a cross-link from it to the new +guide. + +## Files + +- `docs/decoders.md` — new guide (the work). +- `mkdocs.yml` — add `- Decoders: decoders.md` after the Middleware nav entry. +- `planning/deferred.md` — remove the G6 entry (closed). + +No `architecture/decoders.md` cross-link: `architecture/middleware.md` does not +link to its `docs/middleware.md` guide either, so adding one only for decoders +would break that symmetry. 
Seam B's contract is unchanged, so `architecture/` +needs no promotion edit. + +## Verification + +- [ ] Every code block in the guide is runnable as written — the CSV + `can_decode` predicate and `decode` body type-check and execute against + the real `ResponseDecoder` protocol (manually exercised, not a doctest). +- [ ] `uv run mkdocs build --strict` — clean (no broken internal links, nav + resolves). +- [ ] `just lint` — clean (eof-fixer / formatting on the new markdown). +- [ ] Cross-references resolve: links to `architecture/decoders.md`, + `middleware.md`, and `index.md` are valid. diff --git a/planning/deferred.md b/planning/deferred.md index 7ec6272..1e5c21e 100644 --- a/planning/deferred.md +++ b/planning/deferred.md @@ -36,6 +36,4 @@ As of 0.7.0, all planned epics (3, 4, 5, 6) are closed — see the [change Index ### Documentation -- **Custom-`ResponseDecoder` guide** (audit finding G6, [2026-06-13 docs audit](audits/2026-06-13-docs-audit.md)) — the decoder seam (Seam B) is a documented extension point, but unlike middleware it has no "write your own" guide. A short page would show the `can_decode(model: type) -> bool` / `decode(content: bytes, model: type[T]) -> T` protocol, how `decoders=[...]` ordering resolves a model, and a worked third-party-adapter example. Decided alongside the `2026-06-14.01` docs-UX restructure: **defer the guide, and ship no auto API reference / mkdocstrings** (prose carries the signatures). Demand-gated. Revisit trigger: someone asks how to write a custom decoder, a third-party decoder adapter ships, or the `decoders/` protocol surface changes. (`docs/`, `src/httpware/decoders/`) - - **Non-streaming hard response-body cap** (2026-06-14 deep audit, Medium) — for a non-streaming `send()`, httpx2 buffers the whole body before httpware reaches the decode seam, so a true cap needs a streaming-with-capped-accumulator rework of the Seam-A terminal. 
The current `max_error_body_bytes` guard only applies at `stream()` entry and only when `Content-Length` is declared. Revisit trigger: the Seam-A terminal is next reworked, or a concrete large-response abuse is reported. (`src/httpware/client.py`)