138 changes: 138 additions & 0 deletions docs/decoders.md
@@ -0,0 +1,138 @@
# Writing a custom decoder

`httpware`'s typed-response extension point is the **`ResponseDecoder` protocol**. A decoder turns raw response bytes into a typed object: when you pass `response_model=` to `send` / `send_with_response`, the client walks its decoder list, picks the first one that claims your model, and hands it the body.

The built-in `PydanticDecoder` and `MsgspecDecoder` are themselves implementations of this protocol; nothing about them is privileged. Reach for a custom decoder when you need a body **format** the built-ins don't speak (CSV, XML, MessagePack, a bespoke binary frame) or a **type system** they don't cover (`attrs`, `marshmallow`, your own class hierarchy). If pydantic or msgspec already decodes your model, you don't need one — see [When NOT to write a decoder](#when-not-to-write-a-decoder).

## The protocol

One symbol, exported from `httpware`:

```python
from typing import Protocol, TypeVar, runtime_checkable

T = TypeVar("T")


@runtime_checkable
class ResponseDecoder(Protocol):
    def can_decode(self, model: type) -> bool: ...
    def decode(self, content: bytes, model: type[T]) -> T: ...
```

Two methods, two distinct jobs:

- **`can_decode(model) -> bool`** — the dispatch predicate. The client walks `decoders=[...]` in order and picks the **first** decoder that returns `True`. Claim every model you can actually handle (broad is correct — list ordering, not narrow predicates, encodes the caller's preference), but **reject another library's native types**: a CSV decoder has no business claiming a `pydantic.BaseModel`. `can_decode` **MUST NOT raise** — it runs at dispatch time, before the HTTP call and *outside* the `DecodeError` wrap that protects `decode`, so an exception here escapes `httpware`'s `ClientError` contract instead of being translated. A decoder that can't decide must return `False` (decline), not raise.
- **`decode(content, model) -> T`** — the decode itself, raw response bytes in, a `model` instance out. Any exception you raise here is caught by the client and wrapped as `httpware.DecodeError` (carrying `response`, `model`, and the `original` exception). You do **not** need to raise `DecodeError` yourself — raise whatever your parser raises and let the seam translate it.

The protocol is `@runtime_checkable` and structural: any object with these two methods satisfies it. You do not subclass anything.
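
Because the check is structural, conformance can be verified with a plain `isinstance`. The sketch below redeclares the protocol locally so that it runs without `httpware` installed; `UpperTextDecoder` and the `probes` list are hypothetical names used only for illustration. It also feeds `can_decode` a few awkward inputs to confirm that it declines rather than raises.

```python
from typing import Protocol, TypeVar, runtime_checkable

T = TypeVar("T")


@runtime_checkable
class ResponseDecoder(Protocol):
    # Local copy of the protocol shown above, so the sketch is self-contained.
    def can_decode(self, model: type) -> bool: ...
    def decode(self, content: bytes, model: type[T]) -> T: ...


class UpperTextDecoder:
    """Hypothetical decoder: claims only `str` and upper-cases the body."""

    def can_decode(self, model: type) -> bool:
        # An identity check cannot raise, whatever `model` turns out to be.
        return model is str

    def decode(self, content: bytes, model: type[T]) -> T:
        return model(content.decode("utf-8").upper())


decoder = UpperTextDecoder()

# Structural check: no subclassing, only matching method names.
conforms = isinstance(decoder, ResponseDecoder)

# Totality probe: every input should yield a bool and none should raise.
probes = [str, int, list[int], None, "not a type"]
verdicts = [decoder.can_decode(probe) for probe in probes]
```

Note that `runtime_checkable` only checks that the methods exist, not their signatures, so a probe like this is a cheap complement to a type checker.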

## How the client resolves a model

Both clients take `decoders: Sequence[ResponseDecoder] | None = None`, composed once at `__init__` and frozen for the client's lifetime.

- **Order is preference.** `decoders=[CsvDecoder(), PydanticDecoder()]` asks the CSV decoder first; pydantic only sees models CSV declined. List position is how you disambiguate a shape two decoders could both claim.
- **`decoders=None`** resolves against the installed extras: pydantic first when both are present, whichever one is installed when only one is, and an empty tuple when neither is. To *add* a decoder without losing the built-ins, list them explicitly: `decoders=[CsvDecoder(), PydanticDecoder()]`.
- **No claimer is a pre-flight error.** When `response_model=` is set and no decoder claims it, the client raises `MissingDecoderError` **before** sending the request — you find out at wiring time, not after a wasted round-trip. This is distinct from `DecodeError`: `MissingDecoderError` means *nothing handles this model* (fix: install an extra or pass `decoders=[...]`); `DecodeError` means *a decoder ran and the payload was malformed* (fix: the server or the model). See [Errors](errors.md).
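
The resolution rules above can be sketched in a few lines. This is an illustrative model, not `httpware`'s actual implementation: `resolve`, `decode_body`, and `IntDecoder` are hypothetical names, and the two exception classes are local stand-ins for the real `MissingDecoderError` and `DecodeError` (the stand-in `DecodeError` omits the `response` attribute).

```python
from typing import Any, Sequence


class MissingDecoderError(Exception):
    """Stand-in for httpware's pre-flight error."""


class DecodeError(Exception):
    """Stand-in for httpware's wrapper around decoder failures."""

    def __init__(self, model: Any, original: Exception) -> None:
        super().__init__(f"could not decode into {model!r}: {original!r}")
        self.model = model
        self.original = original


def resolve(decoders: Sequence[Any], model: Any) -> Any:
    """Return the first decoder that claims `model`; list order is preference."""
    for decoder in decoders:
        # Deliberately not wrapped in try/except: can_decode must not raise.
        if decoder.can_decode(model):
            return decoder
    # Raised before any request would be sent.
    raise MissingDecoderError(f"no decoder claims {model!r}")


def decode_body(decoder: Any, content: bytes, model: Any) -> Any:
    """Run the decoder, translating any failure into DecodeError."""
    try:
        return decoder.decode(content, model)
    except Exception as exc:
        raise DecodeError(model, exc) from exc


class IntDecoder:
    """Hypothetical decoder used only to exercise the sketch."""

    def can_decode(self, model: Any) -> bool:
        return model is int

    def decode(self, content: bytes, model: Any) -> Any:
        return int(content)
```

With `IntDecoder` alone in the list, `resolve([IntDecoder()], str)` fails up front with `MissingDecoderError`, while `decode_body(IntDecoder(), b"nope", int)` fails with `DecodeError` whose `original` is the parser's `ValueError`.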

## Decoders are sync — for both clients

Unlike middleware, which has separate `AsyncMiddleware` and `Middleware` flavors, there is **one** `ResponseDecoder` protocol, shared by `AsyncClient` and `Client` alike. `decode` is a synchronous method: by the time it runs, the body has already been read off the wire, so decoding is pure CPU work with nothing to await. Write one decoder and pass it to either client.

## Worked example: a CSV decoder

A decoder for `text/csv` endpoints that returns a `list` of dataclass rows. Both built-ins are JSON, so this is the case they can't cover — and it shows the seam's real shape: raw bytes in, typed object out, no JSON anywhere.

```python
import csv
import dataclasses
import io
import typing

from httpware import AsyncClient
from httpware.decoders.pydantic import PydanticDecoder

T = typing.TypeVar("T")


class CsvDecoder:
    """Decode a text/csv body into a list of dataclass rows.

    Claims only `list[<dataclass>]`; declines everything else so the JSON
    decoders keep their models.
    """

    def can_decode(self, model: type) -> bool:
        if typing.get_origin(model) is not list:
            return False
        args = typing.get_args(model)
        return len(args) == 1 and dataclasses.is_dataclass(args[0])

    def decode(self, content: bytes, model: type[T]) -> T:
        (row_type,) = typing.get_args(model)
        field_types = {f.name: f.type for f in dataclasses.fields(row_type)}
        reader = csv.DictReader(io.StringIO(content.decode("utf-8")))
        return [
            row_type(**{name: field_types[name](value) for name, value in row.items()})
            for row in reader
        ]
```

`can_decode` is total and never raises: a non-`list` model, a bare `list`, or `list[int]` all fall through to `False`. `decode` coerces each CSV cell with its field's type (CSV values arrive as strings) — a real decoder would handle optionals, dates, and missing columns; this is where your domain logic goes. Wire it ahead of the built-ins so it gets first refusal on `list[...]` models while pydantic still handles everything else:

```python
@dataclasses.dataclass
class Sale:
    id: int
    amount: float
    region: str


async def main() -> None:
    async with AsyncClient(
        base_url="https://reports.example.com",
        decoders=[CsvDecoder(), PydanticDecoder()],
    ) as client:
        sales = await client.send(
            client.build_request("GET", "/sales.csv"),
            response_model=list[Sale],
        )
        # sales: list[Sale]
```

The same decoder instance works with a sync `Client(decoders=[CsvDecoder(), PydanticDecoder()])`.
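
The example coerces each cell by calling the field's type directly, which breaks down for optional fields, dates, and booleans. One way to extend it is a small per-cell helper; the sketch below is a hypothetical `coerce_cell` function, not part of `httpware`, and it assumes ISO-formatted dates and a fixed set of truthy strings. Missing columns are not handled here.

```python
import datetime
import types
import typing

# types.UnionType (the `X | None` spelling) only exists on Python 3.10+.
_UNION_ORIGINS = {typing.Union, getattr(types, "UnionType", typing.Union)}


def coerce_cell(raw: str, target: typing.Any) -> typing.Any:
    """Convert one CSV cell (always a string) to the annotated field type."""
    if typing.get_origin(target) in _UNION_ORIGINS:
        args = typing.get_args(target)
        # Optional[X]: an empty cell means None.
        if raw == "" and type(None) in args:
            return None
        members = [arg for arg in args if arg is not type(None)]
        if len(members) != 1:
            # Only Optional[X] is supported in this sketch.
            raise TypeError(f"unsupported union: {target!r}")
        return coerce_cell(raw, members[0])
    if target is datetime.date:
        # Assumes ISO 8601 dates such as 2026-06-15.
        return datetime.date.fromisoformat(raw)
    if target is bool:
        # bool("false") is True, so parse the text explicitly.
        return raw.strip().lower() in {"1", "true", "yes"}
    return target(raw)
```

Inside `decode`, `field_types[name](value)` would become `coerce_cell(value, field_types[name])`. Any exception the helper raises is still wrapped as `DecodeError` by the client, so it does not need its own error translation.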

## A note on claiming the right models

`can_decode` is a contract with the *rest of the list*. Claim too broadly and you steal models from decoders behind you; claim too narrowly and your decoder never runs. The rule of thumb: claim exactly the types you natively own, and reject another library's. An adapter for a third-party type system narrows its claim to that system — for example, a [`cattrs`](https://catt.rs)-backed decoder for `attrs` classes:

```python
import json

import attrs


class CattrsDecoder:
    def __init__(self, converter):  # a configured cattrs.Converter
        self._converter = converter

    def can_decode(self, model: type) -> bool:
        return attrs.has(model)  # only attrs classes; everything else declines

    def decode(self, content, model):
        return self._converter.structure(json.loads(content), model)
```

Note this decoder is **two-pass** (`json.loads`, then `structure`). The built-in adapters deliberately decode in a single bytes-in pass (`TypeAdapter.validate_json`, `msgspec.json.Decoder.decode`) to skip the intermediate `dict` allocation — but that's a *performance choice for the built-ins*, not a protocol obligation. A custom decoder may go two-pass when its underlying library only structures from native Python objects; you pay one extra allocation, nothing more.
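
To make the "claim too broadly" failure concrete, here is a minimal sketch with two toy decoders and a first-match lookup. All names (`GreedyDecoder`, `DataclassDecoder`, `first_claimer`, `Point`) are hypothetical, and `first_claimer` only imitates the first-match rule described earlier; it is not the client's real code.

```python
import dataclasses
from typing import Any, Optional, Sequence


class GreedyDecoder:
    """Hypothetical decoder that claims far too much: every class."""

    def can_decode(self, model: Any) -> bool:
        return isinstance(model, type)


class DataclassDecoder:
    """Hypothetical decoder that claims only dataclass types."""

    def can_decode(self, model: Any) -> bool:
        return isinstance(model, type) and dataclasses.is_dataclass(model)


def first_claimer(decoders: Sequence[Any], model: Any) -> Optional[Any]:
    """Imitate first-match dispatch: return the first decoder that says yes."""
    return next((d for d in decoders if d.can_decode(model)), None)


@dataclasses.dataclass
class Point:
    x: int
    y: int


greedy, narrow = GreedyDecoder(), DataclassDecoder()

# Greedy sits first and claims Point, so the dataclass decoder never runs.
stolen = first_claimer([greedy, narrow], Point)

# Reordered, the narrow decoder gets its own type back.
kept = first_claimer([narrow, greedy], Point)

# The narrow decoder declines int, so the lookup falls through to greedy.
fallback = first_claimer([narrow, greedy], int)
```

Reordering the list fixes the symptom, but the durable fix is to tighten the greedy predicate so that list order only has to settle genuine overlaps.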

## When NOT to write a decoder

- **Your model is JSON.** Dataclasses, `TypedDict`s, primitives, pydantic models, and msgspec `Struct`s are all covered by the built-in `PydanticDecoder` / `MsgspecDecoder`. Install the extra (`httpware[pydantic]` or `httpware[msgspec]`) instead of writing a decoder.
- **You only want raw bytes or text.** Don't pass `response_model=` at all — call `send` (or a verb method) without it and read `response.content` / `response.text` directly. Decoders are for *typed* bodies.
- **The transform is per-call, not per-type.** If the shaping depends on the request rather than the model, it's a [middleware](middleware.md) concern, not a decoder.

## See also

- **[`architecture/decoders.md`](https://github.com/modern-python/httpware/blob/main/architecture/decoders.md) (Seam B)** — the formal protocol contract: dispatch order, the `can_decode` no-raise obligation, the single-pass rule, and the per-instance adapter cache.
- **`src/httpware/decoders/pydantic.py` and `msgspec.py`** — the built-in adapters as reference implementations, including how they memoize a `can_decode` verdict and cache the underlying parser per model.
- **[Quick-Start: typed responses](index.md)** — composing `response_model=` with the default decoder list.
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -7,6 +7,7 @@ edit_uri: edit/main/docs/
nav:
  - Quick-Start: index.md
  - Middleware: middleware.md
  - Decoders: decoders.md
  - Resilience: resilience.md
  - Errors: errors.md
  - Testing: testing.md
1 change: 1 addition & 0 deletions planning/README.md
@@ -74,6 +74,7 @@ _None._

### Archived (shipped)

- **[custom-decoder-guide](changes/archive/2026-06-15.01-custom-decoder-guide/change.md)** (#67, 2026-06-15) — Docs: a "write your own `ResponseDecoder`" guide for Seam B, mirroring `docs/middleware.md`. Closed deferred item G6.
- **[audit-doc-fixes](changes/archive/2026-06-14.06-audit-doc-fixes/change.md)** (#66, 2026-06-14) — Closed the [deep-audit](audits/2026-06-14-deep-audit.md) doc-accuracy findings: `Client.stream()` docs, terminal-call attribution, the four auto-raise sites, the pydantic upper bound, and root import paths.
- **[audit-test-quality](changes/archive/2026-06-14.05-audit-test-quality/change.md)** (#65, 2026-06-14) — Closed 11 [deep-audit](audits/2026-06-14-deep-audit.md) test-quality findings: sync-terminal + CookieConflict coverage, the `StatusError.__init__` invariant, missing status constructions, sync mirrors, typing overloads, a deterministic bulkhead barrier, a pinned budget clock, an observability assertion, and the `TimeoutError` circuit trigger.
- **[audit-correctness](changes/archive/2026-06-14.04-audit-correctness/change.md)** (#64, 2026-06-14) — Closed 8 [deep-audit](audits/2026-06-14-deep-audit.md) correctness + public-API findings: RetryBudget token ordering, two `OverflowError` crashes, the redaction triple-slash, the msgspec guard, streaming-body symmetry, the RetryBudget docstring caveat, and `middleware/__all__`.
@@ -0,0 +1,95 @@
---
status: shipped
date: 2026-06-15
slug: custom-decoder-guide
supersedes: null
superseded_by: null
pr: 67
outcome: Shipped docs/decoders.md (the Seam B "write your own ResponseDecoder" guide); closed deferred item G6.
---

# Change: Add a "Writing a custom decoder" guide

**Lane:** lightweight — docs-only. New page + one-line nav edit + an
`architecture/decoders.md` cross-link on ship. No source change, no public-API
change. Mirrors the existing `docs/middleware.md` extension-seam guide.

Closes deferred item **G6** (custom-`ResponseDecoder` guide), the
[2026-06-13 docs audit](../../../audits/2026-06-13-docs-audit.md) finding parked
in [`deferred.md`](../../../deferred.md). Revisit trigger now met: the guide was
explicitly requested.

## Goal

Seam B (`ResponseDecoder`) is a documented extension point, but unlike
middleware it has no "write your own" guide. Add `docs/decoders.md` showing the
`can_decode` / `decode` protocol, how `decoders=[...]` ordering resolves a
model, and a worked custom-decoder example. Prose carries the signatures — no
mkdocstrings / auto API reference (per the `2026-06-14.01` docs-UX decision).

## Approach

A prose + code-block page modeled on `docs/middleware.md`, the sibling
"write your own" guide for Seam A. Sections, scaled to complexity:

1. **Intro / when to write one** — Seam B in a paragraph; reach for a custom
decoder when you need a body *format* (non-JSON) or a *type system* the
pydantic/msgspec built-ins don't cover.
2. **The protocol** — the `ResponseDecoder` Protocol verbatim from
`src/httpware/decoders/__init__.py`; `can_decode(model) -> bool`
(first-match dispatch, claim broadly but reject other libraries' native
types, **MUST NOT raise** — runs outside the `DecodeError` wrap, decline by
returning False) and `decode(content: bytes, model) -> T` (raw bytes in;
any exception is auto-wrapped as `DecodeError`, so don't raise it yourself).
3. **How the client resolves a model** — `decoders=[...]` order = preference,
first claimer wins; `decoders=None` default is pydantic-first;
`MissingDecoderError` fires *before* the HTTP call when nothing claims; the
`MissingDecoderError` (no decoder) vs `DecodeError` (decoder ran, payload
bad) distinction and their distinct corrective actions.
4. **Sync, not async** (callout) — one sync protocol shared by `Client` *and*
`AsyncClient`; there is no async `decode`, in contrast to middleware's two
flavors. `decode` runs synchronously after the body is read.
5. **Worked example: a CSV decoder** — `text/csv` bytes → `list[<dataclass>]`.
Chosen because both built-ins are JSON, so the highest-value lesson is that
the seam is raw-bytes-in / typed-object-out and **not** JSON-bound. Stdlib
`csv` only (a reader runs it with zero extra installs), naturally
single-pass. `can_decode` claims `list[<dataclass>]` and rejects everything
else; wired as `decoders=[CsvDecoder(), PydanticDecoder()]`.
6. **A note on claiming the right models** — the `can_decode` discrimination
obligation (claiming too broadly steals models from later decoders in the
list); how an adapter for another type system (e.g. cattrs/attrs) narrows
its claim to its own types; and that the single-pass rule is a *built-in
performance choice*, not a hard protocol obligation — a custom decoder may
go two-pass (`json.loads` → structure) at the cost of one extra allocation.
7. **When NOT to write a decoder** — the built-ins already cover
pydantic/msgspec/dataclasses/primitives; if you only want raw bytes or text,
use `response.content` / `response.text` without `response_model=`.
8. **See also** — `architecture/decoders.md` (Seam B, the formal contract),
the built-in adapters (`decoders/pydantic.py`, `decoders/msgspec.py`) as
reference implementations, the Quick-Start typed-response example.

Truth home: [`architecture/decoders.md`](../../../../architecture/decoders.md)
— Seam B's contract does not move; on ship, add a cross-link from it to the new
guide.

## Files

- `docs/decoders.md` — new guide (the work).
- `mkdocs.yml` — add `- Decoders: decoders.md` after the Middleware nav entry.
- `planning/deferred.md` — remove the G6 entry (closed).

No `architecture/decoders.md` cross-link was added, contrary to the plan above:
`architecture/middleware.md` does not link to its `docs/middleware.md` guide
either, so adding one only for decoders would break that symmetry. Seam B's
contract is unchanged, so `architecture/` needs no promotion edit.

## Verification

- [ ] Every code block in the guide is runnable as written — the CSV
`can_decode` predicate and `decode` body type-check and execute against
the real `ResponseDecoder` protocol (manually exercised, not a doctest).
- [ ] `uv run mkdocs build --strict` — clean (no broken internal links, nav
resolves).
- [ ] `just lint` — clean (eof-fixer / formatting on the new markdown).
- [ ] Cross-references resolve: links to `architecture/decoders.md`,
`middleware.md`, and `index.md` are valid.
2 changes: 0 additions & 2 deletions planning/deferred.md
@@ -36,6 +36,4 @@ As of 0.7.0, all planned epics (3, 4, 5, 6) are closed — see the [change Index

### Documentation

- **Custom-`ResponseDecoder` guide** (audit finding G6, [2026-06-13 docs audit](audits/2026-06-13-docs-audit.md)) — the decoder seam (Seam B) is a documented extension point, but unlike middleware it has no "write your own" guide. A short page would show the `can_decode(model: type) -> bool` / `decode(content: bytes, model: type[T]) -> T` protocol, how `decoders=[...]` ordering resolves a model, and a worked third-party-adapter example. Decided alongside the `2026-06-14.01` docs-UX restructure: **defer the guide, and ship no auto API reference / mkdocstrings** (prose carries the signatures). Demand-gated. Revisit trigger: someone asks how to write a custom decoder, a third-party decoder adapter ships, or the `decoders/` protocol surface changes. (`docs/`, `src/httpware/decoders/`)

- **Non-streaming hard response-body cap** (2026-06-14 deep audit, Medium) — for a non-streaming `send()`, httpx2 buffers the whole body before httpware reaches the decode seam, so a true cap needs a streaming-with-capped-accumulator rework of the Seam-A terminal. The current `max_error_body_bytes` guard only applies at `stream()` entry and only when `Content-Length` is declared. Revisit trigger: the Seam-A terminal is next reworked, or a concrete large-response abuse is reported. (`src/httpware/client.py`)