# Writing a custom decoder

`httpware`'s typed-response extension point is the **`ResponseDecoder` protocol**. A decoder turns raw response bytes into a typed object: when you pass `response_model=` to `send` / `send_with_response`, the client walks its decoder list, picks the first one that claims your model, and hands it the body.

The built-in `PydanticDecoder` and `MsgspecDecoder` are themselves implementations of this protocol; nothing about them is privileged. Reach for a custom decoder when you need a body **format** the built-ins don't speak (CSV, XML, MessagePack, a bespoke binary frame) or a **type system** they don't cover (`attrs`, `marshmallow`, your own class hierarchy). If pydantic or msgspec already decodes your model, you don't need one; see [When NOT to write a decoder](#when-not-to-write-a-decoder).

## The protocol

One symbol, exported from `httpware`:

```python
from typing import Protocol, TypeVar, runtime_checkable

T = TypeVar("T")


@runtime_checkable
class ResponseDecoder(Protocol):
    def can_decode(self, model: type) -> bool: ...
    def decode(self, content: bytes, model: type[T]) -> T: ...
```

Two methods, two distinct jobs:

- **`can_decode(model) -> bool`**: the dispatch predicate. The client walks `decoders=[...]` in order and picks the **first** decoder that returns `True`. Claim every model you can actually handle (broad is correct: list ordering, not narrow predicates, encodes the caller's preference), but **reject another library's native types**; a CSV decoder has no business claiming a `pydantic.BaseModel`. `can_decode` **MUST NOT raise**. It runs at dispatch time, before the HTTP call and *outside* the `DecodeError` wrap that protects `decode`, so an exception here escapes `httpware`'s `ClientError` contract instead of being translated. A decoder that can't decide must return `False` (decline), not raise.
- **`decode(content, model) -> T`**: the decode itself, with raw response bytes in and a `model` instance out. Any exception you raise here is caught by the client and wrapped as `httpware.DecodeError` (carrying `response`, `model`, and the `original` exception). You do **not** need to raise `DecodeError` yourself; raise whatever your parser raises and let the seam translate it.
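
Here is a minimal sketch of both obligations together, as a decoder for your own class hierarchy. Every name in it (`Record`, `Point`, `from_bytes`, the `b"3,4"` wire format, `RecordDecoder`) is hypothetical. `can_decode` guards the `issubclass` call, which raises `TypeError` when its first argument is not a class, so the predicate stays total. `decode` deliberately has no `try`/`except` and leaves the client to wrap parser errors as `DecodeError`.

```python
import typing

T = typing.TypeVar("T")


class Record:
    """Hypothetical base class for an in-house type hierarchy."""

    @classmethod
    def from_bytes(cls, content: bytes) -> "Record":
        raise NotImplementedError


class Point(Record):
    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y

    @classmethod
    def from_bytes(cls, content: bytes) -> "Point":
        # Hypothetical wire format: b"3,4". Malformed input raises ValueError
        # (wrong number of parts, or int() rejecting a part).
        x, y = content.split(b",")
        return cls(int(x), int(y))


class RecordDecoder:
    def can_decode(self, model: type) -> bool:
        # Total predicate: issubclass() raises TypeError for non-classes,
        # so check isinstance(model, type) first, and still turn any
        # TypeError into "decline" instead of letting it escape.
        try:
            return isinstance(model, type) and issubclass(model, Record)
        except TypeError:
            return False

    def decode(self, content: bytes, model: type[T]) -> T:
        # No try/except on purpose: parser errors propagate, and the
        # client wraps them as DecodeError.
        return model.from_bytes(content)  # type: ignore[attr-defined]
```

A decoder is a plain object, so a unit test can call both methods directly without a client or any HTTP:

```python
decoder = RecordDecoder()
decoder.can_decode(Point)      # True
decoder.can_decode(list[int])  # False, without raising
decoder.decode(b"3,4", Point)  # a Point with x == 3 and y == 4
```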

The protocol is `@runtime_checkable` and structural: any object with these two methods satisfies it. You do not subclass anything.
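
The following self-contained sketch shows what "structural" means at runtime. It re-declares the protocol as shown above so it runs without `httpware` installed. `IniDecoder`, `DecodeOnly`, and `WrongSignature` are hypothetical classes. `isinstance` against a `@runtime_checkable` protocol only checks that the named methods exist. It does not check their signatures, so treat it as a smoke test and leave signature checking to a static type checker.

```python
import configparser
from typing import Protocol, TypeVar, runtime_checkable

T = TypeVar("T")


@runtime_checkable
class ResponseDecoder(Protocol):
    # Local copy of the protocol shown above.
    def can_decode(self, model: type) -> bool: ...
    def decode(self, content: bytes, model: type[T]) -> T: ...


class IniDecoder:
    """Hypothetical INI decoder: no base class, just the two methods."""

    def can_decode(self, model: type) -> bool:
        return model is configparser.ConfigParser

    def decode(self, content: bytes, model: type[T]) -> T:
        parser = configparser.ConfigParser()
        parser.read_string(content.decode("utf-8"))
        return parser  # type: ignore[return-value]


class DecodeOnly:
    """Missing can_decode, so it does not satisfy the protocol."""

    def decode(self, content: bytes, model: type[T]) -> T:
        raise NotImplementedError


class WrongSignature:
    """Both names exist with the wrong signatures; isinstance still passes."""

    def can_decode(self) -> None: ...
    def decode(self) -> None: ...


assert isinstance(IniDecoder(), ResponseDecoder)
assert not isinstance(DecodeOnly(), ResponseDecoder)
assert isinstance(WrongSignature(), ResponseDecoder)  # names only, not signatures
```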

## How the client resolves a model

Both clients take `decoders: Sequence[ResponseDecoder] | None = None`, composed once at `__init__` and frozen for the client's lifetime.

- **Order is preference.** `decoders=[CsvDecoder(), PydanticDecoder()]` asks the CSV decoder first; pydantic only sees models CSV declined. List position is how you disambiguate a shape two decoders could both claim.
- **`decoders=None`** resolves against installed extras: pydantic first when both are present, whichever one is installed when only one is, and an empty tuple when neither is. To *add* a decoder without losing the built-ins, list them explicitly: `decoders=[CsvDecoder(), PydanticDecoder()]`.
- **No claimer is a pre-flight error.** When `response_model=` is set and no decoder claims it, the client raises `MissingDecoderError` **before** sending the request, so you find out at wiring time rather than after a wasted round-trip. This is distinct from `DecodeError`. `MissingDecoderError` means *nothing handles this model* (fix: install an extra or pass `decoders=[...]`). `DecodeError` means *a decoder ran and the payload was malformed* (fix: the server or the model). See [Errors](errors.md).
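
Taken together, these resolution rules amount to a first-match scan with a pre-flight failure. The sketch below only illustrates those semantics and is not `httpware`'s implementation. `pick_decoder` and `ClaimsTypes` are hypothetical, and `MissingDecoderError` here is a local stand-in for the real exception.

```python
from collections.abc import Sequence
from typing import Protocol, TypeVar


class MissingDecoderError(Exception):
    """Local stand-in for httpware's MissingDecoderError."""


class Claims(Protocol):
    # The only half of the decoder protocol that dispatch needs.
    def can_decode(self, model: type) -> bool: ...


D = TypeVar("D", bound=Claims)


class ClaimsTypes:
    """Hypothetical decoder that claims a fixed set of types."""

    def __init__(self, name: str, *types: type) -> None:
        self.name = name
        self._types = types

    def can_decode(self, model: type) -> bool:
        return model in self._types


def pick_decoder(decoders: Sequence[D], model: type) -> D:
    # First claimer wins: list position encodes the caller's preference.
    for decoder in decoders:
        if decoder.can_decode(model):
            return decoder
    # Nothing claims the model: fail before any request is sent.
    raise MissingDecoderError(f"no decoder claims {model!r}")
```

Take the list `[ClaimsTypes("csv", list), ClaimsTypes("json", list, dict)]`. Here `list` resolves to the `"csv"` decoder and `dict` falls through to `"json"`. Reversing the list hands `list` to `"json"` instead. An empty list raises `MissingDecoderError` for every model, which is what `decoders=None` produces when no extras are installed.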

## Decoders are sync — for both clients

Unlike middleware, which has separate `AsyncMiddleware` and `Middleware` flavors, there is **one** `ResponseDecoder` protocol, shared by `AsyncClient` and `Client` alike. `decode` is a synchronous method: by the time it runs, the body has already been read off the wire, so decoding is pure CPU work with nothing to await. Write one decoder and pass it to either client.

## Worked example: a CSV decoder

A decoder for `text/csv` endpoints that returns a `list` of dataclass rows. Both built-ins decode JSON, so this is a case they can't cover. It also shows the seam's real shape: raw bytes in, typed object out, no JSON anywhere.

```python
import csv
import dataclasses
import io
import typing

from httpware import AsyncClient
from httpware.decoders.pydantic import PydanticDecoder

T = typing.TypeVar("T")


class CsvDecoder:
    """Decode a text/csv body into a list of dataclass rows.

    Claims only `list[<dataclass>]`; declines everything else so the JSON
    decoders keep their models.
    """

    def can_decode(self, model: type) -> bool:
        if typing.get_origin(model) is not list:
            return False
        args = typing.get_args(model)
        return len(args) == 1 and dataclasses.is_dataclass(args[0])

    def decode(self, content: bytes, model: type[T]) -> T:
        (row_type,) = typing.get_args(model)
        # get_type_hints() resolves string annotations (for example under
        # `from __future__ import annotations`); Field.type would not.
        hints = typing.get_type_hints(row_type)
        field_types = {f.name: hints[f.name] for f in dataclasses.fields(row_type)}
        # utf-8-sig also strips a leading BOM, common in spreadsheet exports.
        reader = csv.DictReader(io.StringIO(content.decode("utf-8-sig")))
        # CSV cells arrive as strings: coerce each one with its field's type.
        rows = [
            row_type(**{name: field_types[name](value) for name, value in row.items()})
            for row in reader
        ]
        return typing.cast(T, rows)
```

`can_decode` is total and never raises: a non-`list` model, a bare `list`, or `list[int]` all fall through to `False`. `decode` coerces each CSV cell with its field's type, because CSV values arrive as strings. A real decoder would also handle optionals, dates, and missing columns; this is where your domain logic goes. Wire it ahead of the built-ins so it gets first refusal on `list[...]` models while pydantic still handles everything else:

```python
import asyncio


@dataclasses.dataclass
class Sale:
    id: int
    amount: float
    region: str


async def main() -> None:
    async with AsyncClient(
        base_url="https://reports.example.com",
        decoders=[CsvDecoder(), PydanticDecoder()],
    ) as client:
        sales = await client.send(
            client.build_request("GET", "/sales.csv"),
            response_model=list[Sale],
        )
        # sales: list[Sale]


asyncio.run(main())
```

The same decoder instance works with a sync `Client(decoders=[CsvDecoder(), PydanticDecoder()])`.

## A note on claiming the right models

`can_decode` is a contract with the *rest of the list*. Claim too broadly and you steal models from decoders behind you; claim too narrowly and your decoder never runs. The rule of thumb: claim every type you can genuinely decode, but never another library's native types. An adapter for a third-party type system narrows its claim to that system. For example, a [`cattrs`](https://catt.rs)-backed decoder for `attrs` classes:

```python
import json
import typing

import attrs
import cattrs

T = typing.TypeVar("T")


class CattrsDecoder:
    def __init__(self, converter: cattrs.Converter) -> None:
        self._converter = converter  # a configured cattrs.Converter

    def can_decode(self, model: type) -> bool:
        # Only attrs classes; everything else declines. The try/except keeps
        # the predicate total even for models that are not classes at all.
        try:
            return attrs.has(model)
        except TypeError:
            return False

    def decode(self, content: bytes, model: type[T]) -> T:
        # Two passes: bytes -> native Python objects -> attrs instance.
        return self._converter.structure(json.loads(content), model)
```

Note that this decoder is **two-pass** (`json.loads`, then `structure`). The built-in adapters deliberately decode in a single bytes-in pass (`TypeAdapter.validate_json`, `msgspec.json.Decoder.decode`) to skip building an intermediate `dict`. That is a *performance choice for the built-ins*, not a protocol obligation. A custom decoder may go two-pass when its underlying library only structures from native Python objects. The cost is building the intermediate objects, nothing more.
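
Single-pass parsing is one performance lever. Another is doing per-model setup once instead of on every response. The built-in adapters cache per model on the decoder instance; their reference implementations are listed under See also. Below is a minimal sketch of the same idea for dataclass rows. `FieldPlanCache` is a hypothetical helper that a CSV-style decoder could hold as an attribute.

```python
import dataclasses
import typing


class FieldPlanCache:
    """Hypothetical per-instance cache: row type -> {field name: type}."""

    def __init__(self) -> None:
        self._plans: dict[type, dict[str, typing.Any]] = {}

    def plan_for(self, row_type: type) -> dict[str, typing.Any]:
        plan = self._plans.get(row_type)
        if plan is None:
            # Resolve annotations once per row type (get_type_hints also
            # handles string annotations), then reuse the result.
            hints = typing.get_type_hints(row_type)
            plan = {f.name: hints[f.name] for f in dataclasses.fields(row_type)}
            self._plans[row_type] = plan
        return plan
```

The cache lives on the instance rather than in a module-level `functools.lru_cache`. That ties its lifetime to the decoder, and therefore to the client that holds it.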

## When NOT to write a decoder

- **Your model is JSON.** Between them, the built-in `PydanticDecoder` and `MsgspecDecoder` cover dataclasses, `TypedDict`s, primitives, pydantic models, and msgspec `Struct`s. Install the matching extra (`httpware[pydantic]` or `httpware[msgspec]`) instead of writing a decoder.
- **You only want raw bytes or text.** Don't pass `response_model=` at all. Call `send` (or a verb method) without it and read `response.content` / `response.text` directly. Decoders are for *typed* bodies.
- **The transform is per-call, not per-type.** If the shaping depends on the request rather than the model, it's a [middleware](middleware.md) concern, not a decoder.

## See also

- **[`architecture/decoders.md`](https://github.com/modern-python/httpware/blob/main/architecture/decoders.md) (Seam B)**: the formal protocol contract, covering dispatch order, the `can_decode` no-raise obligation, the single-pass rule, and the per-instance adapter cache.
- **`src/httpware/decoders/pydantic.py` and `msgspec.py`**: the built-in adapters as reference implementations, including how they memoize a `can_decode` verdict and cache the underlying parser per model.
- **[Quick-Start: typed responses](index.md)**: composing `response_model=` with the default decoder list.