Releases: modern-python/faststream-outbox

0.10.3 — schema validation matches the lease CHECK by predicate, not name

17 Jun 05:30
7015ade

faststream-outbox 0.10.3 — schema validation matches the lease CHECK by predicate, not name

Patch release. One bug fix to validate_schema()'s CHECK-constraint probe,
correcting the 0.10.2 approach. No public-API change. The only change to the
installed package is how the probe identifies the lease CHECK; everything else is
identical to 0.10.2.

Fixes

  • validate_schema() no longer false-fails when the lease CHECK was created
    under a literal name while the MetaData carries a ck naming_convention.

    0.10.2 made the probe read the expected name off the Table object, which
    under a ck convention resolves to the doubled ck_<table>_<table>_lease_ck.
    But a hand-written migration —
    op.create_check_constraint('<table>_lease_ck', ...) — creates the literal
    name verbatim, because Alembic op functions don't apply target_metadata's
    convention. So the probe demanded a name the migration never produces and
    raised a spurious

    Outbox schema mismatch: missing CHECK constraint 'ck_<table>_<table>_lease_ck' …

    on a valid schema. The live constraint name is simply not predictable from
    the package side — it depends on how the migration was authored
    (MetaData.create_all / convention-aware autogenerate → doubled name; a
    hand-written op.create_check_constraint → literal name).

    _validate_check_constraints_sync now matches the lease CHECK by predicate,
    not name: it normalizes every live CHECK's definition and passes if one
    enforces (acquired_token IS NULL) = (acquired_at IS NULL) under any name.
    When none does (including a drifted predicate — that just means the correct one
    is absent), it reports

    missing CHECK constraint enforcing 'acquired_token is null = acquired_at is null' (the lease invariant; name it e.g. <table>_lease_ck)

    This is strictly more correct: a CHECK's name is irrelevant to whether it
    enforces the invariant. _resolve_check_constraint_name (added in 0.10.2) is
    removed.
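
A minimal sketch of match-by-predicate, for illustration only. It is not the package's code: normalize_check, has_lease_check, and the normalization rules (lowercase, drop the CHECK keyword, drop parentheses, collapse whitespace) are assumptions chosen so the result lines up with the predicate text quoted in the error message above.

```python
import re

# Target form, as quoted in the 0.10.3 error message.
LEASE_PREDICATE = "acquired_token is null = acquired_at is null"


def normalize_check(definition: str) -> str:
    """Lowercase, drop the CHECK keyword and parentheses, collapse whitespace.

    Dropping parentheses is lossy in general (it discards grouping); it is
    tolerable here only because we compare against one fixed predicate.
    """
    text = definition.lower()
    text = re.sub(r"^\s*check\b", "", text)
    text = text.replace("(", " ").replace(")", " ")
    return " ".join(text.split())


def has_lease_check(live_checks: dict[str, str]) -> bool:
    """True if any live CHECK, under any name, enforces the lease predicate."""
    return any(
        normalize_check(definition) == LEASE_PREDICATE
        for definition in live_checks.values()
    )
```

The constraint name is carried in the mapping only so a caller could report it; it plays no part in the match, which is the point of the fix.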

Compatibility

validate_schema() is opt-in (you call it from a health check / CI gate; it
is never run by broker.start()). If you don't call it, nothing changes.

If you do: any schema whose lease CHECK enforces the correct predicate now
validates regardless of the constraint's name — with no convention, with a ck
convention and the doubled name, or with a ck convention and a literal-named
hand-written migration (the case 0.10.2 broke). Schemas that genuinely lack the
invariant still fail, now with a predicate-describing message instead of a
guessed name.

The "wrong predicate" drift case now reports as "missing CHECK constraint
enforcing …" rather than "has wrong predicate" — a drifted CHECK doesn't enforce
the invariant, so it reads as the correct one being absent. The remediation
pointer (#fixing-drift-autogenerate-cant-see) still fires.

No other behavior change — producers, subscribers, the lease / terminal-write
paths, timers, the index/uniqueness probes, and the dlq_table=None path are all
identical to 0.10.2.

Docs

  • docs/operations/alembic.md — replaced the "introspect the rendered name"
    recipe with "the CHECK name doesn't matter; match the predicate." The literal
    op.create_check_constraint('outbox_lease_ck', ...) recipe is now correct even
    under a convention.
  • architecture/dlq.md — recorded the match-by-predicate invariant and why
    name-prediction was reverted.

Touched surface

  • faststream_outbox/client.py — _validate_check_constraints_sync matches by
    predicate; _resolve_check_constraint_name and the now-unused CheckConstraint
    import removed. Only package code change.
  • docs/operations/alembic.md, architecture/dlq.md — recipe + invariant.
  • tests/test_unit.py, tests/test_integration.py — regression coverage
    (predicate matched under convention-doubled name, under literal name, missing
    describes the predicate, drifted-reads-as-missing, and an end-to-end Postgres
    pass for convention metadata + a literally-named constraint).

See also

  • Corrects the 0.10.2 approach (#102).

0.10.2 — schema validation honors naming_convention

16 Jun 19:34
b95225a

faststream-outbox 0.10.2 — schema validation honors naming_convention

Patch release. One bug fix to validate_schema()'s CHECK-constraint probe.
No public-API change. The only change to the installed package is which
constraint name the probe looks up; everything else is identical to 0.10.1.

Fixes

  • validate_schema() no longer false-fails under a SQLAlchemy ck
    naming_convention.

    A MetaData carrying a naming_convention with a ck
    key re-templates the package's explicitly-named lease CheckConstraint — the
    given name becomes the %(constraint_name)s token, so the live constraint is
    named e.g. ck_<table>_<table>_lease_ck, not <table>_lease_ck. The probe
    hard-coded the literal <table>_lease_ck, never found the re-templated name,
    and raised a spurious

    Outbox schema mismatch: missing CHECK constraint '<table>_lease_ck' …

    on a perfectly valid schema. _validate_check_constraints_sync now reads the
    expected name off the Table object (identifying the lease constraint by
    its normalized predicate and using its convention-resolved .name), so the
    expectation always matches what SQLAlchemy / Alembic emit from your metadata —
    convention or not. The explicitly-named indexes were never affected: the
    ix / uq convention keys only re-template auto-named indexes.
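
The doubled name can be reproduced with plain string formatting. The sketch below only mimics the substitution step; it is not SQLAlchemy's code, and the template ck_%(table_name)s_%(constraint_name)s is an assumed convention (these notes do not quote the one in use). rendered_ck_name is a hypothetical helper.

```python
from typing import Optional

# Assumed "ck" convention; substitute your own MetaData's template.
CK_TEMPLATE = "ck_%(table_name)s_%(constraint_name)s"


def rendered_ck_name(
    table_name: str, given_name: str, template: Optional[str] = None
) -> str:
    """Return the name a CHECK would carry with or without a ck convention."""
    if template is None:
        # No convention: the explicitly given name is used verbatim.
        return given_name
    # With a convention, the given name fills the %(constraint_name)s token.
    return template % {"table_name": table_name, "constraint_name": given_name}
```

With table "outbox" and given name "outbox_lease_ck", the assumed template yields ck_outbox_outbox_lease_ck, which is the doubled shape described above.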

Compatibility

validate_schema() is opt-in (you call it from a health check / CI gate; it
is never run by broker.start()). If you don't call it, nothing changes.

If you do, and you use no naming_convention, behavior is identical to
0.10.1 — the probe still resolves to the literal <table>_lease_ck. If you use a
ck convention, validation that previously raised spuriously now passes,
provided the constraint in your DB carries the convention-rendered name (which it
does when the schema was created via MetaData.create_all or a
convention-aware Alembic migration). Hand-written migrations must create the
constraint under that same rendered name — see the updated
naming_convention guidance in the Alembic docs.

No other behavior change — producers, subscribers, the lease / terminal-write
paths, timers, the index/uniqueness probes, and the dlq_table=None path are
all identical to 0.10.1.

Docs

  • docs/operations/alembic.md — replaced the misleading
    "wrap each name in op.f('outbox_lease_ck')" caveat with an
    introspect-the-rendered-name recipe (the probe now expects the
    convention-resolved name, not the literal).
  • architecture/dlq.md — recorded the convention-awareness invariant in the
    validate_schema() mechanics section.

Touched surface

  • faststream_outbox/client.py — new _resolve_check_constraint_name +
    _validate_check_constraints_sync reads the expected CHECK name off the
    Table. Only package code change.
  • docs/operations/alembic.md, architecture/dlq.md — caveat fix + invariant.
  • tests/test_unit.py, tests/test_integration.py — regression coverage
    (convention-resolved name honored, missing-under-convention, literal-name
    fallback, and an end-to-end Postgres pass under a ck convention).

0.10.1 — actionable schema-drift errors

16 Jun 16:53
b784b80

Patch release. One opt-in diagnostic refinement plus docs/branding. No public-API change and no breaking change beyond 0.10.0's. The only change to the installed package is validate_schema()'s error text.

Diagnostics

  • validate_schema() now tells you how to fix Alembic-blind drift (#99). A missing/altered <table>_lease_ck CHECK and a drifted partial-index predicate cannot be remediated by alembic revision --autogenerate (no check-constraint comparator; the index comparator ignores postgresql_where). When one of those probes fires, the raised RuntimeError now appends a pointer to the hand-written-migration recipe. Autogenerate-fixable drift (columns, plain indexes, DLQ) gets no pointer.

Compatibility

validate_schema() is opt-in; if you don't call it, nothing changes. If you do, the error gains a trailing pointer block but the "Outbox schema mismatch: " prefix and per-error strings are unchanged, so existing greps / match= assertions still hold. No other behavior change.

Docs

  • New "Fixing drift autogenerate can't see" section in docs/operations/alembic.md + cross-link (#99).
  • Org logo, favicon, brand palette (#100).
  • architecture/ deep-dive refresh for 0.10.0 surfaces (#98).

Full notes: planning/releases/0.10.1.md. PRs: #99, #100, #98.

0.10.0 — pass-3 audit closure: a High fix, two features, a hardened tail

14 Jun 14:44
f0f4d6c

Minor release. Full resolution of the 2026-06-14 pass-3 deep audit — one High correctness fix, two additive features, a cluster of robustness/validation fixes, and a large test-hardening + docs sweep. Backward-compatible by default; three opt-in/behavior refinements noted below.

New features

  • OutboxBroker(..., last_exception_renderer=...) — opt-in Callable[[BaseException], str | None] to redact or drop the DLQ last_exception (default keeps repr). For deployments whose exceptions can embed payloads/PII/credentials. Also on the FastAPI OutboxRouter. (#93)
  • FastAPI OutboxRouter forwards dlq_table + metrics_recorder — DLQ archival and the recorder-metrics seam were previously unreachable for FastAPI users. (#88)
  • drain_timeout observability — a stop() drain exceeding graceful_timeout now emits a WARNING + a metric (faststream_outbox_drain_timeout_total / messaging.outbox.drain_timeout). (#92, #96)
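
Two renderers matching the Callable[[BaseException], str | None] shape above, as a sketch. The function names are hypothetical, and reading a None return as "store no last_exception" is an assumption based on the "redact or drop" wording.

```python
from typing import Optional


def type_only_renderer(exc: BaseException) -> Optional[str]:
    """Redact: keep the exception class name, never its message or args."""
    return type(exc).__name__


def drop_renderer(exc: BaseException) -> Optional[str]:
    """Drop: return None so nothing derived from the exception is kept
    (assumed meaning of None; check the package docs)."""
    return None
```

Either would be passed as OutboxBroker(..., last_exception_renderer=type_only_renderer), per the signature quoted in the feature note.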

Bug fixes

  • [High] propagate_inbound_headers=True no longer poisons a successful relay: a chained OutboxResponse no longer inherits the inbound content-type/correlation_id, which previously raised in _encode_payload and nacked the successful inbound row to retry-exhaustion. (#85)
  • Completed the eager-validation fix: OutboxResponse and empty publish_batch now reject a bad queue/session at the call site (one shared _validate_publish_args) instead of masquerading as a handler failure at dispatch. (#86)
  • fetch_unprocessed(limit<1) rejected; ping() bounded by a timeout. (#89)

Robustness / correctness

  • validate_schema() flags a non-unique timer_id_uq (indisunique). (#92)
  • 63-byte identifier guard also enforced in OutboxClient.__init__. (#92)
  • OutboxBroker.stop() sets running=False before the subscriber-stop gather; reconnect backoff measures healthy time from a live connection. (#92)

Behavior notes (read before upgrading in place)

  1. propagate_inbound_headers=True: a chained OutboxResponse no longer receives the inbound content-type/correlation_id (foreign-broker relays unaffected). (#85)
  2. OutboxResponse / empty publish_batch validate eagerly — a bad queue/session raises earlier (at the call site). (#86)
  3. validate_schema() is stricter (opt-in): raises on a non-unique timer_id_uq. No-op if you don't call it. (#92)

Hot paths, lease/terminal-write invariants, and the dlq_table=None path are otherwise unchanged.

Full notes: planning/releases/0.10.0.md. Disposition ledger: planning/audits/2026-06-14-deep-audit-pass3-findings.md. PRs: #85–#97.

0.9.1 — suspected-findings fixes + diagnostics cleanup

13 Jun 10:24
04a93af

faststream-outbox 0.9.1 — suspected-findings fixes + diagnostics cleanup

Patch release. Closes the five "suspected" findings from the 2026-06-12 audit (4 fixed, 1 already resolved in 0.9.0) plus three diagnostics / test-broker cleanups. No new features and no new breaking changes beyond 0.9.0's. One opt-in behavior change worth a look before upgrading: validate_schema() now catches partial-index drift it previously missed (see Behavior change).

Robustness / correctness

  • S1 — bounded the LISTEN-connection teardown close. On reconnect/shutdown, a graceful close() of the raw LISTEN connection ran unbounded — on the same half-dead socket the health probe may have just flagged, it could block on the kernel keepalive and wedge the fetch loop. It's now wait_for(close, _LISTEN_CLOSE_TIMEOUT) with an immediate terminate() fallback, best-effort so teardown never raises.
  • S2 — validate_schema() detects partial-index predicate drift. Alembic's autogenerate diff compares index columns + uniqueness but ignores postgresql_where, so a {table}_timer_id_uq built with the wrong predicate (WHERE timer_id IS NULL) — or as a plain non-partial UNIQUE (queue, timer_id) — passed validation, then broke the producer's ON CONFLICT arbiter inference at publish time. validate_schema() now probes the live partial-index predicates (pg_get_expr(indpred, …)) for all three indexes and flags both a wrong predicate and a present-but-non-partial index.
  • S4 — import faststream_outbox is resilient to upstream module moves. The faststream...try_it_out import (best-effort ASGI registry wiring) sat outside the try/except meant to tolerate upstream rename/removal, so a module move would have broken import faststream_outbox entirely. The import is now inside the guard.
  • S3 — already resolved in 0.9.0 (recorded for completeness). The processed_total lease-lost double-count was closed by 0.9.0's P17: outcome metrics emit only after a successful flush, so a lease-lost row emits lease_lost instead of a paired acked/nacked.
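
The S1 pattern (bounded graceful close, then a hard terminate, never raising) can be sketched with asyncio alone. FakeListenConnection, close_bounded, and the timeout value are stand-ins invented for this example; the real connection type and the value of _LISTEN_CLOSE_TIMEOUT are not given in these notes.

```python
import asyncio

LISTEN_CLOSE_TIMEOUT = 0.05  # assumed value, kept short for the demo


class FakeListenConnection:
    """Stand-in for a raw LISTEN connection (hypothetical)."""

    def __init__(self, hangs: bool) -> None:
        self.hangs = hangs
        self.closed = False
        self.terminated = False

    async def close(self) -> None:
        if self.hangs:
            await asyncio.sleep(3600)  # simulates a half-dead socket
        self.closed = True

    def terminate(self) -> None:
        self.terminated = True


async def close_bounded(conn, timeout: float = LISTEN_CLOSE_TIMEOUT) -> None:
    """Try a graceful close within `timeout`; otherwise terminate. Never raises."""
    try:
        await asyncio.wait_for(conn.close(), timeout)
    except Exception:  # includes the timeout; cancellation still propagates
        try:
            conn.terminate()
        except Exception:
            pass  # best-effort: teardown must not raise


def demo():
    healthy, wedged = FakeListenConnection(False), FakeListenConnection(True)

    async def main() -> None:
        await close_bounded(healthy)
        await close_bounded(wedged)

    asyncio.run(main())
    return healthy, wedged
```

In the demo the healthy connection closes gracefully, while the wedged one is terminated once the timeout elapses instead of blocking the caller.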

Diagnostics / test broker

  • S5 — sync-mode batch publish mirrors production ordering. TestOutboxBroker (sync mode) used to dispatch each handler mid-feed — a handler could observe a half-inserted batch (impossible in production) — and emit published after the handlers ran. It now inserts the whole batch → emits published → then dispatches, matching the production order (atomic batch INSERT → published → subscriber fetch).
  • P27 — subscriber-misconfig warnings point at your call site. The UserWarnings from @broker.subscriber(...) / @router.subscriber(...) misconfigurations used a static stacklevel that was wrong through the FastAPI-router path (it landed on a faststream-internal frame). They now use skip_file_prefixes (3.12+) and are attributed to your decorator line on both the direct and router paths.
  • P29 / P30 — test-broker internals. The four fake publish paths were deduplicated behind two shared helpers (no behavior change). And loop-mode TestOutboxBroker now wakes the fetch loop on feed/publish via the subscriber's _notify_event, mirroring production NOTIFY — so loop-mode tests no longer need a tight min_fetch_interval to be responsive.
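
For P27, the standard-library mechanism is the skip_file_prefixes keyword of warnings.warn, available from Python 3.12. The sketch below shows the general shape with a version guard; warn_misconfig and the way prefixes are passed in are assumptions, not the package's code.

```python
import sys
import warnings


def warn_misconfig(message: str, internal_prefixes: tuple) -> None:
    """Emit a UserWarning attributed to the first frame outside `internal_prefixes`.

    `internal_prefixes` is a tuple of path prefixes (library directories)
    whose frames should be skipped when attributing the warning.
    """
    if sys.version_info >= (3, 12):
        # Frames from files under these prefixes are ignored, so the warning
        # points at the caller's code regardless of internal call depth.
        warnings.warn(message, UserWarning, skip_file_prefixes=internal_prefixes)
    else:
        # Older interpreters: fall back to a fixed stacklevel, which is only
        # right for one particular call depth.
        warnings.warn(message, UserWarning, stacklevel=2)
```

A fixed stacklevel counts frames, so it goes stale whenever a call path gains or loses a layer; prefix skipping ties attribution to file location instead.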

Behavior change: validate_schema() is stricter

validate_schema() is opt-in (you call it from a health check / startup hook — it isn't run by broker.start()). If you call it and your outbox table's partial indexes were hand-built or migrated with a predicate that differs from what make_outbox_table declares — most importantly a timer_id_uq that isn't … WHERE timer_id IS NOT NULL — it will now raise where 0.9.0 passed. That's the point: such an index silently breaks publish(..., timer_id=…)'s ON CONFLICT at runtime, and this surfaces it at validation time. The fix is to recreate the index with the correct predicate (regenerate your Alembic migration). If you don't call validate_schema(), nothing changes.
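
A rough sketch of the comparison such a probe has to make, given the predicate text for an index as returned by pg_get_expr(indpred, …), or None when the index is not partial. The helper names and error strings are hypothetical and do not reproduce the package's messages.

```python
from typing import Optional


def normalize_predicate(expr: str) -> str:
    """Lowercase, drop parentheses, collapse whitespace (lossy but sufficient here)."""
    text = expr.lower().replace("(", " ").replace(")", " ")
    return " ".join(text.split())


def check_partial_index(
    index_name: str, live_predicate: Optional[str], expected: str
) -> Optional[str]:
    """Return an error description, or None when the live predicate matches."""
    if live_predicate is None:
        # Present but non-partial, e.g. a plain UNIQUE (queue, timer_id).
        return f"{index_name}: index is not partial (expected WHERE {expected})"
    if normalize_predicate(live_predicate) != normalize_predicate(expected):
        return (
            f"{index_name}: wrong predicate {live_predicate!r} "
            f"(expected WHERE {expected})"
        )
    return None
```

Both failure modes named in S2 are covered: a wrong predicate and a present-but-non-partial index.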

No other behavior change. Producers, subscribers, the lease/terminal-write paths, and the dlq_table=None path are all unchanged.

Touched surface

  • faststream_outbox/subscriber/usecase.py — bounded LISTEN close (S1).
  • faststream_outbox/client.py — partial-index predicate probe in validate_schema (S2).
  • faststream_outbox/__init__.py — guarded try_it_out import (S4).
  • faststream_outbox/subscriber/factory.py — skip_file_prefixes warning attribution (P27).
  • faststream_outbox/testing.py — sync-batch ordering (S5), fake-publish dedup (P29), loop-mode NOTIFY (P30).

0.9.0 — code-audit hardening

13 Jun 09:20
a9013e5

faststream-outbox 0.9.0 — code-audit hardening

Hardening release from a full code audit (2026-06-12). No new features. 16 confirmed bugs fixed (several silent data-loss paths), input validation tightened, a schema CHECK constraint and a DLQ column added, and metric accuracy corrected. Bumped minor, not patch: the new validation rejects some calls that previously succeeded, and the schema gains a constraint + column that need a migration.

Two operator actions before upgrading — see Migration:

  1. Generate and apply a new Alembic migration (a CHECK constraint on the outbox table; a timer_id column on the DLQ table if you use one).
  2. Review any alerting keyed off the old published{status="error"} / acked / nacked_* metric semantics.

Data-loss bugs fixed (delivery-behavior change)

The most important fixes. Each previously deleted a row that should have been retried, or dropped a publish silently.

  • B5 / B6 / B7 — the reject-fallback trap. Three paths landed a failed delivery in the destructive "manual-reject" fallback that DELETEs the row:
    • B5 — AckPolicy.MANUAL + a handler exception (e.g. a DB blip before msg.ack()) deleted the row. It now honors the retry strategy (nack), matching every native FastStream broker's "unacked failure → redeliver" semantics.
    • B6 — raise NackMessage(delay=…) / AckMessage(**opts) (FastStream's documented idiom) raised TypeError inside the ack middleware (our ack/nack/reject took no kwargs), which was swallowed and fell through to reject → row deleted even under the default NACK_ON_ERROR. ack/nack/reject now accept-and-ignore **options.
    • B7 — any exception from a retry strategy turned the nack into a terminal reject. An in-box trigger shipped: ExponentialRetry(max_attempts=None) raised OverflowError (2.0 ** 1024) after ~3.5 days of a persistently-failing row. The strategy call is now wrapped (a raising strategy degrades to retry_terminal, logged), and ExponentialRetry clamps the exponent.
  • B8 — publish_batch dropped a leading None body. publish_batch(None, x) inserted 1 row; publish_batch(None) inserted nothing, no error, no metric (upstream's batch_bodies excludes body is None, while publish(None) inserts b""). OutboxPublishCommand now keeps every positional body.
  • B10 — the DLQ ignored Table.schema. With MetaData(schema="app") + a DLQ, every terminal failure hit UndefinedTable (poison rows retried forever, the outbox grew) or silently wrote to a same-named search_path table. The DLQ CTE and validate_schema() are now schema-qualified (format_table / MetaData(schema=…)).
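
The B7 overflow is easy to reproduce in plain Python: 2.0 ** 1024 exceeds the float range and raises OverflowError. The sketch below contrasts an unclamped backoff with a clamped one. The function names, parameters, and the cap of 32 are illustrative assumptions, not ExponentialRetry's actual signature or values.

```python
MAX_EXPONENT = 32  # assumed cap; any value well below 1024 avoids the overflow


def unclamped_delay(attempt: int, base: float = 1.0, multiplier: float = 2.0,
                    max_delay: float = 300.0) -> float:
    # The power is computed before min() can cap it, so a large attempt
    # count overflows the float and raises OverflowError.
    return min(max_delay, base * multiplier ** attempt)


def clamped_delay(attempt: int, base: float = 1.0, multiplier: float = 2.0,
                  max_delay: float = 300.0) -> float:
    # Clamp the exponent first; the result is still capped by max_delay.
    exponent = min(attempt, MAX_EXPONENT)
    return min(max_delay, base * multiplier ** exponent)
```

The delay cap alone does not help, because the exception is raised while evaluating the power, before the cap is applied.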

Liveness / health

  • B1–B4 — fetch-loop fixes. No more drain-window connection-churn storm / test-broker livelock (the fetch loop now halts on drain); start() resets the drain flag so a stopped-then-started subscriber fetches again; reconnect backoff resets after sustained uptime (no lockstep reconnect herd); the LISTEN connection is closed when add_listener fails or is cancelled (was one leaked socket per reconnect cycle).
  • B11 / B12 — ping(). Now walks the subscribers property (so router-registered subscribers — the FastAPI pattern — are health-checked) and honors its timeout via anyio.move_on_after (was accepted-and-ignored; could hang for minutes on a black-holed socket — the exact partition ping exists to detect).
  • B13 — missing-extra ImportErrors are reachable. Importing faststream_outbox.metrics.prometheus / the native middleware without the extra now raises the friendly "install …[prometheus]" message instead of a raw ModuleNotFoundError from an upstream frame.

Observability accuracy

  • B9 — the Prometheus in-process gauge no longer goes negative. The max_deliveries terminal emitted nacked_terminal with no preceding dispatched; the adapter dec'd unconditionally. The .dec() is now gated on duration_seconds (present only for post-dispatched terminals).
  • P17 — outcome metrics emit only after the flush lands. A lease-lost row (its lease reclaimed by a newer fetch) used to emit acked/nacked_* and get redelivered → counted twice. It now emits lease_lost instead (no paired ack/nack).
  • P28 — published{status="error"} fires at count=0. The error-status series — the one dashboards alert on — was gated behind if count > 0 and could never increment. It now increments once per failed publish.
  • P34 — dlq_written omits exception_type when there is no exception (e.g. max_deliveries) instead of emitting it as None, matching the nacked_terminal convention.
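
The B9 gating can be pictured with a toy in-process gauge. This is a simplified model, not the Prometheus adapter: InProcessGauge and record are hypothetical, and only the event names and the duration_seconds rule come from the notes above.

```python
from typing import Optional


class InProcessGauge:
    """Toy gauge counting rows currently being processed."""

    def __init__(self) -> None:
        self.value = 0

    def inc(self) -> None:
        self.value += 1

    def dec(self) -> None:
        self.value -= 1


def record(gauge: InProcessGauge, event: str,
           duration_seconds: Optional[float] = None) -> None:
    if event == "dispatched":
        gauge.inc()
    elif event in ("acked", "nacked_terminal"):
        # Only outcomes that followed a dispatch carry a duration; a terminal
        # emitted without a preceding dispatch must not decrement.
        if duration_seconds is not None:
            gauge.dec()
```

A nacked_terminal with no duration leaves the gauge at zero rather than pushing it to -1.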

Breaking changes

Stricter validation — rejects input that previously "worked"

These now raise at the call/decoration site instead of failing opaquely (or silently) later:

| Improvement | Now rejects |
|---|---|
| P2 | an explicit correlation_id= that conflicts with headers["correlation_id"] (was silently dropped; the kwarg now wins when there's no conflict). |
| P4 | timer_id / correlation_id on a batch publish (multiple bodies) — they were silently dropped. |
| P5 | a queue that is empty, non-str, or longer than the 255-char column. |
| P10 | delete_with_lease(dlq_payload=…) on a client with no dlq_table (was a silent plain DELETE — lost audit data). |
| P12 | non-positive min_fetch_interval / max_fetch_interval / lease_ttl_seconds (would busy-poll / instantly expire the lease). |
| P23 | invalid retry-strategy knobs (jitter_factor outside [0, 2], non-positive delays/multiplier, max_attempts < 1, non-positive caps). |

If you were (knowingly or not) relying on any of these, the upgrade trips immediately with a clear error.
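
Two of the checks listed above (P5 and P2), sketched as standalone functions to show what "rejects at the call site" means in practice. The function names, exception types, and messages are assumptions; only the rules themselves come from these notes.

```python
from typing import Mapping, Optional

MAX_QUEUE_LENGTH = 255  # the column width named under P5


def check_queue(queue: object) -> str:
    """P5: reject a queue that is non-str, empty, or too long for the column."""
    if not isinstance(queue, str):
        raise TypeError(f"queue must be str, got {type(queue).__name__}")
    if not queue:
        raise ValueError("queue must not be empty")
    if len(queue) > MAX_QUEUE_LENGTH:
        raise ValueError(f"queue exceeds {MAX_QUEUE_LENGTH} characters")
    return queue


def resolve_correlation_id(
    explicit: Optional[str], headers: Mapping[str, str]
) -> Optional[str]:
    """P2: an explicit value wins unless it conflicts with the header."""
    from_headers = headers.get("correlation_id")
    if explicit is not None and from_headers is not None and explicit != from_headers:
        raise ValueError("correlation_id conflicts with headers['correlation_id']")
    return explicit if explicit is not None else from_headers
```

Raising here, rather than at dispatch, keeps the failure next to the call that caused it.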

Schema — migration required

  • P8 — a CHECK ((acquired_token IS NULL) = (acquired_at IS NULL)) constraint on the outbox table makes a half-set lease unrepresentable.
  • P9 — a nullable timer_id column on the DLQ table, so a terminally-failed timer keeps its dedup key in the audit trail.

Both are declared on the tables make_outbox_table / make_dlq_table return, so Alembic autogenerate brings them up — but you must generate and apply the migration. The outbox column set is otherwise unchanged.
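
The effect of the P8 predicate can be tried without Postgres. The sketch below uses the standard-library sqlite3 module purely as a demonstration vehicle (the package itself targets Postgres); the table layout and column types are simplified assumptions.

```python
import sqlite3


def half_set_lease_rejected() -> bool:
    """Return True if the CHECK refuses a row with only one lease column set."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE outbox ("
            " id INTEGER PRIMARY KEY,"
            " acquired_token TEXT,"
            " acquired_at TEXT,"
            " CONSTRAINT outbox_lease_ck"
            " CHECK ((acquired_token IS NULL) = (acquired_at IS NULL)))"
        )
        # Both NULL (unleased) and both set (leased) satisfy the predicate.
        conn.execute("INSERT INTO outbox (acquired_token, acquired_at) VALUES (NULL, NULL)")
        conn.execute("INSERT INTO outbox (acquired_token, acquired_at) VALUES ('t1', 'ts1')")
        try:
            # Half-set lease: token present, timestamp missing.
            conn.execute("INSERT INTO outbox (acquired_token, acquired_at) VALUES ('t2', NULL)")
        except sqlite3.IntegrityError:
            return True
        return False
    finally:
        conn.close()
```

The equality of the two IS NULL tests holds only when both columns are NULL or both are set, which is exactly the lease invariant.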

Metric semantics

  • P17 (above): a lease-lost row now emits lease_lost instead of acked/nacked_*. If you alert on acked rate, the lease-loss tail is no longer double-counted.
  • P28 (above): published{status="error"} now actually increments. Alerts that were silently never firing will start firing on real publish errors.

These are corrections, but they change what existing dashboards see — review before upgrading in production.

Test broker (run_loops=True)

  • B14 — loop-mode TestOutboxBroker spawned every fetch/worker loop twice (so max_workers=1 ran 2 workers); it now spawns each once.
  • B15 — loop-mode worker tasks are cancelled on context exit (no more "Task was destroyed but it is pending!" noise / stale workers).
  • B16 — the test broker's patched fetch_unprocessed now accepts limit= (a production-valid call previously raised TypeError).

Tests using TestOutboxBroker(run_loops=True) get more faithful behavior; a test that implicitly depended on the double-spawn (e.g. asserting worker count) should be re-checked.


Other improvements

Quality / hygiene with no migration impact:

  • P1 empty publish_batch validates the session type up front · P3 encode failures emit the published error metric · P6 documents the count=0 convention (error vs timer_id conflict) · P7 the table-name length guard covers derived index/constraint names, not just the NOTIFY channel · P11 fetch uses queue = ANY(:queues) (stable SQL → prepared-statement reuse; verified to still drive both partial indexes) · P14 _on_notify wakes only for queues the subscriber serves (no cross-queue wakeup storm) · P16 stop() awaits the cancelled tasks before returning (a caller's immediate engine.dispose() no longer races teardown) · P18 a worker config error (OutboxResponse + foreign publisher) is logged, not routed through the reconnect/backoff path that throttled unrelated rows · P19/P13/P15 clearer logs/docs · P20 broker.stop() snapshots subscribers once (never-raise contract) · P21 the unstarted-foreign-broker warning names only the decorated subscriber's queues · P22 a start()-time duplicate-queue check across routers · P24 dead OutboxBrokerConfig.connect()/disconnect() removed · P25/P26 annotation + divergence-marker fixes · P31/P32 the test broker validates tz-aware feed() datetimes and copies headers at its boundaries · P33/P35 clarifying comments.

Plus 8 new test-quality guards (T1–T8) pinning invariants the suite previously didn't (lease-token filter on the retry path, the manual-reject fallback through dispatch, consume-escape row preservation, drain "no new claims", the relay dual-fire guard ordering, the index-implying fetch-CTE shape, recorder event sequencing, and off-Postgres drain coverage).

Deferred: P29/P30 (a coupled test-broker dedup + notify-on-feed) and P27 (a stacklevel with no single correct value across call paths) — see the audit findings.


Migration

  1. Schema. Regenerate your Alembic migration after upgrading; autogenerate will add the <table>_lease_ck CHECK constraint and, if you use a DLQ, the outbox_dlq.timer_id column. Apply it before deploying the new code. (A pre-existing half-set lease — if any operator ever set one by hand — must be cleaned up before the CHECK can be added.)
  2. Validation. If any publish/publish_batch/subscriber/retry-strategy call relied on the now-rejected inputs in the table above, fix the call. These surface immediately on the first use (or at decoration time), not in production.
  3. Metrics / alerts. Re-ch...

0.8.0

04 Jun 21:24
0442b35

What's Changed

  • feat: foreign-broker relay from OutboxSubscriber by @lesnik512 in #44

Full Changelog: 0.7.1...0.8.0

0.7.1

04 Jun 14:16
22de000

What's Changed

  • chore: adopt faststream 0.7.1 TestBroker typing fix by @lesnik512 in #43

Full Changelog: 0.7.0...0.7.1

0.7.0

03 Jun 19:43
b36dcc5

What's Changed

Full Changelog: 0.6.1...0.7.0

0.6.1

03 Jun 09:27
0ebccf3

What's Changed

  • chore: add 'all' extra and planning/ workflow directory by @lesnik512 in #41

Full Changelog: 0.6.0...0.6.1