Releases: modern-python/faststream-outbox
0.10.3 — schema validation matches the lease CHECK by predicate, not name
Patch release. One bug fix to validate_schema()'s CHECK-constraint probe,
correcting the 0.10.2 approach. No public-API change. The only change to the
installed package is how the probe identifies the lease CHECK; everything else is
identical to 0.10.2.
Fixes
- validate_schema() no longer false-fails when the lease CHECK was created
under a literal name while the MetaData carries a ck naming_convention.
0.10.2 made the probe read the expected name off the Table object, which
under a ck convention resolves to the doubled ck_<table>_<table>_lease_ck.
But a hand-written migration,
op.create_check_constraint('<table>_lease_ck', ...), creates the literal
name verbatim, because Alembic op functions don't apply target_metadata's
convention. So the probe demanded a name the migration never produces and
raised a spurious
"Outbox schema mismatch: missing CHECK constraint 'ck_<table>_<table>_lease_ck' …"
on a valid schema. The live constraint name is simply not predictable from
the package side: it depends on how the migration was authored
(MetaData.create_all / convention-aware autogenerate → doubled name; a
hand-written op.create_check_constraint → literal name).

_validate_check_constraints_sync now matches the lease CHECK by predicate,
not name: it normalizes every live CHECK's definition and passes if one
enforces (acquired_token IS NULL) = (acquired_at IS NULL) under any name.
When none does (including a drifted predicate, which just means the correct one
is absent), it reports
"missing CHECK constraint enforcing 'acquired_token is null = acquired_at is null' (the lease invariant; name it e.g. <table>_lease_ck)".
This is strictly more correct: a CHECK's name is irrelevant to whether it
enforces the invariant. _resolve_check_constraint_name (added in 0.10.2) is
removed.
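To make the match-by-predicate idea concrete, here is a minimal sketch. It is not the package's code: normalize_predicate and has_lease_check are hypothetical helpers, and the normalization shown (lowercase, drop the CHECK keyword, parentheses and double quotes, collapse whitespace) is an assumed simplification of what the real probe does.

```python
import re
from typing import Mapping

# Normalized form of the lease invariant, as quoted in the error message above.
LEASE_PREDICATE = "acquired_token is null = acquired_at is null"


def normalize_predicate(definition: str) -> str:
    """Reduce a CHECK definition to a comparable string (illustrative only)."""
    text = definition.lower()
    text = re.sub(r"^\s*check\b", "", text)  # drop the leading CHECK keyword
    text = re.sub(r'[()"]', " ", text)       # drop parentheses and identifier quotes
    return " ".join(text.split())            # collapse runs of whitespace


def has_lease_check(live_checks: Mapping[str, str]) -> bool:
    """True if any live CHECK enforces the lease invariant, whatever its name."""
    return any(normalize_predicate(d) == LEASE_PREDICATE for d in live_checks.values())


definition = "CHECK (((acquired_token IS NULL) = (acquired_at IS NULL)))"
doubled = {"ck_outbox_outbox_lease_ck": definition}   # convention-rendered name
literal = {"outbox_lease_ck": definition}             # hand-written migration name
drifted = {"outbox_lease_ck": "CHECK ((acquired_token IS NOT NULL))"}

print(has_lease_check(doubled), has_lease_check(literal), has_lease_check(drifted))
```

Dropping parentheses is lossy in general, so a real probe would need a more careful normalizer; the point here is only that the constraint's name never enters the comparison.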
Compatibility
validate_schema() is opt-in (you call it from a health check / CI gate; it
is never run by broker.start()). If you don't call it, nothing changes.
If you do: any schema whose lease CHECK enforces the correct predicate now
validates regardless of the constraint's name — with no convention, with a ck
convention and the doubled name, or with a ck convention and a literal-named
hand-written migration (the case 0.10.2 broke). Schemas that genuinely lack the
invariant still fail, now with a predicate-describing message instead of a
guessed name.
The "wrong predicate" drift case now reports as "missing CHECK constraint
enforcing …" rather than "has wrong predicate" — a drifted CHECK doesn't enforce
the invariant, so it reads as the correct one being absent. The remediation
pointer (#fixing-drift-autogenerate-cant-see) still fires.
No other behavior change — producers, subscribers, the lease / terminal-write
paths, timers, the index/uniqueness probes, and the dlq_table=None path are all
identical to 0.10.2.
Docs
- docs/operations/alembic.md: replaced the "introspect the rendered name"
recipe with "the CHECK name doesn't matter; match the predicate." The literal
op.create_check_constraint('outbox_lease_ck', ...) recipe is now correct even
under a convention.
- architecture/dlq.md: recorded the match-by-predicate invariant and why
name-prediction was reverted.
Touched surface
- faststream_outbox/client.py: _validate_check_constraints_sync matches by
predicate; _resolve_check_constraint_name and the now-unused CheckConstraint
import removed. Only package code change.
- docs/operations/alembic.md, architecture/dlq.md: recipe + invariant.
- tests/test_unit.py, tests/test_integration.py: regression coverage
(predicate matched under convention-doubled name, under literal name, missing
describes the predicate, drifted-reads-as-missing, and an end-to-end Postgres
pass for convention metadata + a literally-named constraint).
See also
- Corrects the 0.10.2 approach
(#102).
0.10.2 — schema validation honors naming_convention
Patch release. One bug fix to validate_schema()'s CHECK-constraint probe.
No public-API change. The only change to the installed package is which
constraint name the probe looks up; everything else is identical to 0.10.1.
Fixes
- validate_schema() no longer false-fails under a SQLAlchemy ck
naming_convention. A MetaData carrying a naming_convention with a ck
key re-templates the package's explicitly-named lease CheckConstraint: the
given name becomes the %(constraint_name)s token, so the live constraint is
named e.g. ck_<table>_<table>_lease_ck, not <table>_lease_ck. The probe
hard-coded the literal <table>_lease_ck, never found the re-templated name,
and raised a spurious
"Outbox schema mismatch: missing CHECK constraint '<table>_lease_ck' …"
on a perfectly valid schema.

_validate_check_constraints_sync now reads the
expected name off the Table object (identifying the lease constraint by
its normalized predicate and using its convention-resolved .name), so the
expectation always matches what SQLAlchemy / Alembic emit from your metadata,
convention or not. The explicitly-named indexes were never affected: the
ix/uq convention keys only re-template auto-named indexes.
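The doubling described above can be reproduced with plain string templating. This is a sketch only: rendered_check_name is a hypothetical helper, and the template is an assumption (the "ck" entry commonly shown in SQLAlchemy's naming-convention documentation); your MetaData may use a different one.

```python
# Assumed "ck" convention template; not read from the package or your metadata.
CK_TEMPLATE = "ck_%(table_name)s_%(constraint_name)s"


def rendered_check_name(table_name: str, given_name: str, template: str = CK_TEMPLATE) -> str:
    """Mimic how a %(constraint_name)s template wraps an explicitly given name."""
    return template % {"table_name": table_name, "constraint_name": given_name}


literal = "outbox_lease_ck"                        # the name declared on the constraint
rendered = rendered_check_name("outbox", literal)  # what the convention turns it into

print(literal)   # outbox_lease_ck
print(rendered)  # ck_outbox_outbox_lease_ck
```

Because the given name already contains the table name, substituting it into a template that also inserts the table name yields the table name twice.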
Compatibility
validate_schema() is opt-in (you call it from a health check / CI gate; it
is never run by broker.start()). If you don't call it, nothing changes.
If you do, and you use no naming_convention, behavior is identical to
0.10.1 — the probe still resolves to the literal <table>_lease_ck. If you use a
ck convention, validation that previously raised spuriously now passes,
provided the constraint in your DB carries the convention-rendered name (which it
does when the schema was created via MetaData.create_all or a
convention-aware Alembic migration). Hand-written migrations must create the
constraint under that same rendered name — see the updated
naming_convention guidance
in the Alembic docs.
No other behavior change — producers, subscribers, the lease / terminal-write
paths, timers, the index/uniqueness probes, and the dlq_table=None path are
all identical to 0.10.1.
Docs
- docs/operations/alembic.md: replaced the misleading
"wrap each name in op.f('outbox_lease_ck')" caveat with an
introspect-the-rendered-name recipe (the probe now expects the
convention-resolved name, not the literal).
- architecture/dlq.md: recorded the convention-awareness invariant in the
validate_schema() mechanics section.
Touched surface
- faststream_outbox/client.py: new _resolve_check_constraint_name +
_validate_check_constraints_sync reads the expected CHECK name off the
Table. Only package code change.
- docs/operations/alembic.md, architecture/dlq.md: caveat fix + invariant.
- tests/test_unit.py, tests/test_integration.py: regression coverage
(convention-resolved name honored, missing-under-convention, literal-name
fallback, and an end-to-end Postgres pass under a ck convention).
See also
- Follow-up to the 0.10.1 schema-validation work
(#99; change bundle: planning/changes/archive/2026-06-16.01-actionable-schema-drift-error/).
0.10.1 — actionable schema-drift errors
Patch release. One opt-in diagnostic refinement plus docs/branding. No public-API change and no breaking change beyond 0.10.0's. The only change to the installed package is validate_schema()'s error text.
Diagnostics
validate_schema() now tells you how to fix Alembic-blind drift (#99). A missing/altered <table>_lease_ck CHECK and a drifted partial-index predicate cannot be remediated by alembic revision --autogenerate (no check-constraint comparator; the index comparator ignores postgresql_where). When one of those probes fires, the raised RuntimeError now appends a pointer to the hand-written-migration recipe. Autogenerate-fixable drift (columns, plain indexes, DLQ) gets no pointer.
Compatibility
validate_schema() is opt-in; if you don't call it, nothing changes. If you do, the error gains a trailing pointer block but the "Outbox schema mismatch: " prefix and per-error strings are unchanged, so existing greps / match= assertions still hold. No other behavior change.
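A CI gate or health check that relies on the stable prefix might look like the sketch below. The names run_schema_gate and MISMATCH_PREFIX are hypothetical, and validate stands for any zero-argument callable that wraps your own validate_schema() call; how validate_schema() itself is invoked (on which object, sync or async) is not shown and is left as an assumption.

```python
from typing import Callable, Tuple

# Prefix quoted in the notes above; the rest of the message may grow over time.
MISMATCH_PREFIX = "Outbox schema mismatch: "


def run_schema_gate(validate: Callable[[], None]) -> Tuple[bool, str]:
    """Return (ok, detail); only schema-mismatch RuntimeErrors count as drift."""
    try:
        validate()
    except RuntimeError as exc:
        message = str(exc)
        if message.startswith(MISMATCH_PREFIX):
            return False, message
        raise  # an unrelated RuntimeError is not a schema finding
    return True, ""


def fake_drifted() -> None:
    # Stand-in for a validate_schema() call that finds drift.
    raise RuntimeError(MISMATCH_PREFIX + "missing CHECK constraint ...\n(pointer block)")


ok, detail = run_schema_gate(fake_drifted)
print(ok, detail.splitlines()[0])
```

Matching on the prefix rather than the full text keeps the gate working when the trailing pointer block is appended.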
Docs
- New "Fixing drift autogenerate can't see" section in docs/operations/alembic.md + cross-link (#99).
- Org logo, favicon, brand palette (#100).
- architecture/deep-dive refresh for 0.10.0 surfaces (#98).
Full notes: planning/releases/0.10.1.md. PRs: #99, #100, #98.
0.10.0 — pass-3 audit closure: a High fix, two features, a hardened tail
Minor release. Full resolution of the 2026-06-14 pass-3 deep audit — one High correctness fix, two additive features, a cluster of robustness/validation fixes, and a large test-hardening + docs sweep. Backward-compatible by default; three opt-in/behavior refinements noted below.
New features
- OutboxBroker(..., last_exception_renderer=...): opt-in Callable[[BaseException], str | None] to redact or drop the DLQ last_exception (default keeps repr). For deployments whose exceptions can embed payloads/PII/credentials. Also on the FastAPI OutboxRouter. (#93)
- FastAPI OutboxRouter forwards dlq_table + metrics_recorder: DLQ archival and the recorder-metrics seam were previously unreachable for FastAPI users. (#88)
- drain_timeout observability: a stop() drain exceeding graceful_timeout now emits a WARNING + a metric (faststream_outbox_drain_timeout_total / messaging.outbox.drain_timeout). (#92, #96)
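A renderer matching the Callable[[BaseException], str | None] shape could be as small as the sketch below. Both function names are hypothetical, and treating a None return as "store nothing" is an assumption drawn from the "redact or drop" wording, not confirmed behavior.

```python
from typing import Optional


def type_only_renderer(exc: BaseException) -> Optional[str]:
    """Keep only the exception class name, never its message or arguments."""
    return type(exc).__name__


def drop_renderer(exc: BaseException) -> Optional[str]:
    """Assumed meaning of None: record no last_exception at all."""
    return None


# Wiring, per the signature quoted above (other arguments elided):
#   OutboxBroker(..., last_exception_renderer=type_only_renderer)

leaky = ValueError("login failed for password=hunter2")
print(repr(leaky))                # the default repr would carry the secret
print(type_only_renderer(leaky))  # ValueError
```

Rendering only the class name still leaves enough in the DLQ to group failures by type.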
Bug fixes
- [High] propagate_inbound_headers=True no longer poisons a successful relay: a chained OutboxResponse no longer inherits the inbound content-type / correlation_id, which previously raised in _encode_payload and nacked the successful inbound row to retry-exhaustion. (#85)
- Completed the eager-validation fix: OutboxResponse and empty publish_batch now reject a bad queue / session at the call site (one shared _validate_publish_args) instead of masquerading as a handler failure at dispatch. (#86)
- fetch_unprocessed(limit<1) rejected; ping() bounded by a timeout. (#89)
Robustness / correctness
- validate_schema() flags a non-unique timer_id_uq (indisunique). (#92)
- 63-byte identifier guard also enforced in OutboxClient.__init__. (#92)
- OutboxBroker.stop() sets running=False before the subscriber-stop gather; reconnect backoff measures healthy time from a live connection. (#92)
Behavior notes (read before upgrading in place)
- propagate_inbound_headers=True: a chained OutboxResponse no longer receives the inbound content-type / correlation_id (foreign-broker relays unaffected). (#85)
- OutboxResponse / empty publish_batch validate eagerly: a bad queue / session raises earlier (at the call site). (#86)
- validate_schema() is stricter (opt-in): raises on a non-unique timer_id_uq. No-op if you don't call it. (#92)
Hot paths, lease/terminal-write invariants, and the dlq_table=None path are otherwise unchanged.
Full notes: planning/releases/0.10.0.md. Disposition ledger: planning/audits/2026-06-14-deep-audit-pass3-findings.md. PRs: #85–#97.
0.9.1 — suspected-findings fixes + diagnostics cleanup
Patch release. Closes the five "suspected" findings from the 2026-06-12 audit (4 fixed, 1 already resolved in 0.9.0) plus three diagnostics / test-broker cleanups. No new features and no new breaking changes beyond 0.9.0's. One opt-in behavior change worth a look before upgrading: validate_schema() now catches partial-index drift it previously missed (see Behavior change).
Robustness / correctness
- S1: bounded the LISTEN-connection teardown close. On reconnect/shutdown, a graceful close() of the raw LISTEN connection ran unbounded; on the same half-dead socket the health probe may have just flagged, it could block on the kernel keepalive and wedge the fetch loop. It's now wait_for(close, _LISTEN_CLOSE_TIMEOUT) with an immediate terminate() fallback, best-effort so teardown never raises.
- S2: validate_schema() detects partial-index predicate drift. Alembic's autogenerate diff compares index columns + uniqueness but ignores postgresql_where, so a {table}_timer_id_uq built with the wrong predicate (WHERE timer_id IS NULL), or as a plain non-partial UNIQUE (queue, timer_id), passed validation, then broke the producer's ON CONFLICT arbiter inference at publish time. validate_schema() now probes the live partial-index predicates (pg_get_expr(indpred, …)) for all three indexes and flags both a wrong predicate and a present-but-non-partial index.
- S4: import faststream_outbox is resilient to upstream module moves. The faststream...try_it_out import (best-effort ASGI registry wiring) sat outside the try/except meant to tolerate upstream rename/removal, so a module move would have broken import faststream_outbox entirely. The import is now inside the guard.
- S3: already resolved in 0.9.0 (recorded for completeness). The processed_total lease-lost double-count was closed by 0.9.0's P17: outcome metrics emit only after a successful flush, so a lease-lost row emits lease_lost instead of a paired acked/nacked.
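The bounded close from S1 follows a common asyncio pattern, sketched here against a stand-in connection. FakeConnection, bounded_close and the timeout value are all hypothetical; the real _LISTEN_CLOSE_TIMEOUT value is not stated in these notes.

```python
import asyncio

LISTEN_CLOSE_TIMEOUT = 0.05  # illustrative value only


class FakeConnection:
    """Stand-in for a raw LISTEN connection whose close() may hang."""

    def __init__(self, hang: bool) -> None:
        self.hang = hang
        self.closed_gracefully = False
        self.terminated = False

    async def close(self) -> None:
        if self.hang:
            await asyncio.sleep(3600)  # simulate a half-dead socket
        self.closed_gracefully = True

    def terminate(self) -> None:
        self.terminated = True


async def bounded_close(conn: FakeConnection, timeout: float = LISTEN_CLOSE_TIMEOUT) -> None:
    """Try a graceful close within a deadline, else terminate; never raise."""
    try:
        await asyncio.wait_for(conn.close(), timeout)
    except Exception:  # timeout or close error; cancellation still propagates
        try:
            conn.terminate()
        except Exception:
            pass  # best-effort: teardown must not fail


healthy, wedged = FakeConnection(hang=False), FakeConnection(hang=True)
asyncio.run(bounded_close(healthy))
asyncio.run(bounded_close(wedged))
print(healthy.closed_gracefully, wedged.terminated)
```

Catching Exception rather than BaseException lets task cancellation pass through, so a shutdown in progress is not swallowed by the teardown helper.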
Diagnostics / test broker
- S5: sync-mode batch publish mirrors production ordering. TestOutboxBroker (sync mode) used to dispatch each handler mid-feed (a handler could observe a half-inserted batch, impossible in production) and emit published after the handlers ran. It now inserts the whole batch → emits published → then dispatches, matching the production order (atomic batch INSERT → published → subscriber fetch).
- P27: subscriber-misconfig warnings point at your call site. The UserWarnings from @broker.subscriber(...) / @router.subscriber(...) misconfigurations used a static stacklevel that was wrong through the FastAPI-router path (it landed on a faststream-internal frame). They now use skip_file_prefixes (3.12+) and are attributed to your decorator line on both the direct and router paths.
- P29 / P30: test-broker internals. The four fake publish paths were deduplicated behind two shared helpers (no behavior change). And loop-mode TestOutboxBroker now wakes the fetch loop on feed/publish via the subscriber's _notify_event, mirroring production NOTIFY, so loop-mode tests no longer need a tight min_fetch_interval to be responsive.
Behavior change: validate_schema() is stricter
validate_schema() is opt-in (you call it from a health check / startup hook — it isn't run by broker.start()). If you call it and your outbox table's partial indexes were hand-built or migrated with a predicate that differs from what make_outbox_table declares — most importantly a timer_id_uq that isn't … WHERE timer_id IS NOT NULL — it will now raise where 0.9.0 passed. That's the point: such an index silently breaks publish(..., timer_id=…)'s ON CONFLICT at runtime, and this surfaces it at validation time. The fix is to recreate the index with the correct predicate (regenerate your Alembic migration). If you don't call validate_schema(), nothing changes.
No other behavior change. Producers, subscribers, the lease/terminal-write paths, and the dlq_table=None path are all unchanged.
Touched surface
- faststream_outbox/subscriber/usecase.py: bounded LISTEN close (S1).
- faststream_outbox/client.py: partial-index predicate probe in validate_schema (S2).
- faststream_outbox/__init__.py: guarded try_it_out import (S4).
- faststream_outbox/subscriber/factory.py: skip_file_prefixes warning attribution (P27).
- faststream_outbox/testing.py: sync-batch ordering (S5), fake-publish dedup (P29), loop-mode NOTIFY (P30).
See also
- Audit findings + resolution: planning/active/2026-06-12-code-audit-findings.md.
- PRs: #68 (S1–S5), #69 (P27), #70 (P29/P30).
0.9.0 — code-audit hardening
Hardening release from a full code audit (2026-06-12). No new features. 16 confirmed bugs fixed (several silent data-loss paths), input validation tightened, a schema CHECK constraint and a DLQ column added, and metric accuracy corrected. Bumped minor, not patch: the new validation rejects some calls that previously succeeded, and the schema gains a constraint + column that need a migration.
Two operator actions before upgrading — see Migration:
- Generate and apply a new Alembic migration (a CHECK constraint on the outbox table; a timer_id column on the DLQ table if you use one).
- Review any alerting keyed off the old published{status="error"} / acked / nacked_* metric semantics.
Data-loss bugs fixed (delivery-behavior change)
The most important fixes. Each previously deleted a row that should have been retried, or dropped a publish silently.
- B5 / B6 / B7: the reject-fallback trap. Three paths landed a failed delivery in the destructive "manual-reject" fallback that DELETEs the row:
  - B5: AckPolicy.MANUAL + a handler exception (e.g. a DB blip before msg.ack()) deleted the row. It now honors the retry strategy (nack), matching every native FastStream broker's "unacked failure → redeliver" semantics.
  - B6: raise NackMessage(delay=…) / AckMessage(**opts) (FastStream's documented idiom) raised TypeError inside the ack middleware (our ack/nack/reject took no kwargs), which was swallowed and fell through to reject → row deleted even under the default NACK_ON_ERROR. ack/nack/reject now accept-and-ignore **options.
  - B7: any exception from a retry strategy turned the nack into a terminal reject. An in-box trigger shipped: ExponentialRetry(max_attempts=None) raised OverflowError (2.0 ** 1024) after ~3.5 days of a persistently-failing row. The strategy call is now wrapped (a raising strategy degrades to retry_terminal, logged), and ExponentialRetry clamps the exponent.
- B8: publish_batch dropped a leading None body. publish_batch(None, x) inserted 1 row; publish_batch(None) inserted nothing, no error, no metric (upstream's batch_bodies excludes body is None, while publish(None) inserts b""). OutboxPublishCommand now keeps every positional body.
- B10: the DLQ ignored Table.schema. With MetaData(schema="app") + a DLQ, every terminal failure hit UndefinedTable (poison rows retried forever, the outbox grew) or silently wrote to a same-named search_path table. The DLQ CTE and validate_schema() are now schema-qualified (format_table / MetaData(schema=…)).
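The B7 overflow and the exponent clamp can be illustrated in a few lines. This is a sketch, not ExponentialRetry's code: naive_delay, clamped_delay, and every parameter name and default below are assumptions chosen for the example.

```python
def naive_delay(attempt: int, base_delay: float = 1.0, multiplier: float = 2.0) -> float:
    """Unclamped backoff: float exponentiation overflows for large attempts."""
    return base_delay * multiplier ** attempt


def clamped_delay(
    attempt: int,
    base_delay: float = 1.0,
    multiplier: float = 2.0,
    max_delay: float = 3600.0,
    max_exponent: int = 32,  # hypothetical cap, far below the float overflow point
) -> float:
    """Clamp the exponent before exponentiating, then cap the resulting delay."""
    exponent = min(max(attempt, 0), max_exponent)
    return min(base_delay * multiplier ** exponent, max_delay)


try:
    naive_delay(1024)  # 2.0 ** 1024 exceeds the float range
    overflowed = False
except OverflowError:
    overflowed = True

print(overflowed)            # True
print(clamped_delay(3))      # 8.0
print(clamped_delay(1024))   # 3600.0
```

Capping only the result is not enough, because the exponentiation itself raises before min() is reached; the clamp has to be applied to the exponent.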
Liveness / health
- B1–B4: fetch-loop fixes. No more drain-window connection-churn storm / test-broker livelock (the fetch loop now halts on drain); start() resets the drain flag so a stopped-then-started subscriber fetches again; reconnect backoff resets after sustained uptime (no lockstep reconnect herd); the LISTEN connection is closed when add_listener fails or is cancelled (was one leaked socket per reconnect cycle).
- B11 / B12: ping(). Now walks the subscribers property (so router-registered subscribers, the FastAPI pattern, are health-checked) and honors its timeout via anyio.move_on_after (was accepted-and-ignored; could hang for minutes on a black-holed socket, the exact partition ping exists to detect).
- B13: missing-extra ImportErrors are reachable. Importing faststream_outbox.metrics.prometheus / the native middleware without the extra now raises the friendly "install…[prometheus]" message instead of a raw ModuleNotFoundError from an upstream frame.
Observability accuracy
- B9: the Prometheus in-process gauge no longer goes negative. The max_deliveries terminal emitted nacked_terminal with no preceding dispatched; the adapter dec'd unconditionally. The .dec() is now gated on duration_seconds (present only for post-dispatched terminals).
- P17: outcome metrics emit only after the flush lands. A lease-lost row (its lease reclaimed by a newer fetch) used to emit acked / nacked_* and get redelivered → counted twice. It now emits lease_lost instead (no paired ack/nack).
- P28: published{status="error"} fires at count=0. The error-status series (the one dashboards alert on) was gated behind if count > 0 and could never increment. It now increments once per failed publish.
- P34: dlq_written omits exception_type when there is no exception (e.g. max_deliveries) instead of emitting it as None, matching the nacked_terminal convention.
Breaking changes
Stricter validation — rejects input that previously "worked"
These now raise at the call/decoration site instead of failing opaquely (or silently) later:
| Improvement | Now rejects |
|---|---|
| P2 | an explicit correlation_id= that conflicts with headers["correlation_id"] (was silently dropped; the kwarg now wins when there's no conflict). |
| P4 | timer_id / correlation_id on a batch publish (multiple bodies) — they were silently dropped. |
| P5 | a queue that is empty, non-str, or longer than the 255-char column. |
| P10 | delete_with_lease(dlq_payload=…) on a client with no dlq_table (was a silent plain DELETE — lost audit data). |
| P12 | non-positive min_fetch_interval / max_fetch_interval / lease_ttl_seconds (would busy-poll / instantly expire the lease). |
| P23 | invalid retry-strategy knobs (jitter_factor outside [0, 2], non-positive delays/multiplier, max_attempts < 1, non-positive caps). |
If you were (knowingly or not) relying on any of these, the upgrade trips immediately with a clear error.
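As an illustration of the P2 rule in the table above, the sketch below resolves a correlation id from the kwarg and the headers. resolve_correlation_id is a hypothetical helper, and raising ValueError is an assumption; the notes do not say which exception type the package uses.

```python
from typing import Mapping, Optional


def resolve_correlation_id(
    correlation_id: Optional[str],
    headers: Optional[Mapping[str, str]],
) -> Optional[str]:
    """Kwarg wins when there is no conflict; a conflicting pair is rejected."""
    header_value = (headers or {}).get("correlation_id")
    if (
        correlation_id is not None
        and header_value is not None
        and correlation_id != header_value
    ):
        # Exception type assumed for this sketch.
        raise ValueError(
            f"correlation_id={correlation_id!r} conflicts with "
            f"headers['correlation_id']={header_value!r}"
        )
    return correlation_id if correlation_id is not None else header_value


print(resolve_correlation_id("abc", None))                         # abc
print(resolve_correlation_id(None, {"correlation_id": "from-h"}))  # from-h
print(resolve_correlation_id("abc", {"correlation_id": "abc"}))    # abc
```

Rejecting the conflict at the call site replaces the earlier behavior in which one of the two values was silently dropped.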
Schema — migration required
- P8: a CHECK ((acquired_token IS NULL) = (acquired_at IS NULL)) constraint on the outbox table makes a half-set lease unrepresentable.
- P9: a nullable timer_id column on the DLQ table, so a terminally-failed timer keeps its dedup key in the audit trail.
Both are declared on the tables make_outbox_table / make_dlq_table return, so Alembic autogenerate brings them up — but you must generate and apply the migration. The outbox column set is otherwise unchanged.
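The P8 predicate reads as "both lease columns are NULL, or neither is". A small Python mirror of that rule, using a hypothetical lease_invariant_holds helper with None standing in for SQL NULL, makes the four cases explicit:

```python
from datetime import datetime, timezone
from typing import Optional


def lease_invariant_holds(acquired_token: Optional[str], acquired_at: Optional[datetime]) -> bool:
    """Mirror of (acquired_token IS NULL) = (acquired_at IS NULL)."""
    return (acquired_token is None) == (acquired_at is None)


now = datetime.now(timezone.utc)

print(lease_invariant_holds(None, None))    # unleased row: allowed
print(lease_invariant_holds("tok-1", now))  # fully leased row: allowed
print(lease_invariant_holds("tok-1", None)) # half-set lease: rejected
print(lease_invariant_holds(None, now))     # half-set lease: rejected
```

Only the two half-set combinations fail, which is the state a pre-existing row must not be in when the constraint is added.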
Metric semantics
- P17 (above): a lease-lost row now emits lease_lost instead of acked / nacked_*. If you alert on acked rate, the lease-loss tail is no longer double-counted.
- P28 (above): published{status="error"} now actually increments. Alerts that were silently never firing will start firing on real publish errors.
These are corrections, but they change what existing dashboards see — review before upgrading in production.
Test broker (run_loops=True)
- B14: loop-mode TestOutboxBroker spawned every fetch/worker loop twice (so max_workers=1 ran 2 workers); it now spawns each once.
- B15: loop-mode worker tasks are cancelled on context exit (no more "Task was destroyed but it is pending!" noise / stale workers).
- B16: the test broker's patched fetch_unprocessed now accepts limit= (a production-valid call previously raised TypeError).
Tests using TestOutboxBroker(run_loops=True) get more faithful behavior; a test that implicitly depended on the double-spawn (e.g. asserting worker count) should be re-checked.
Other improvements
Quality / hygiene with no migration impact:
- P1 empty publish_batch validates the session type up front · P3 encode failures emit the published error metric · P6 documents the count=0 convention (error vs timer_id conflict) · P7 the table-name length guard covers derived index/constraint names, not just the NOTIFY channel · P11 fetch uses queue = ANY(:queues) (stable SQL → prepared-statement reuse; verified to still drive both partial indexes) · P14 _on_notify wakes only for queues the subscriber serves (no cross-queue wakeup storm) · P16 stop() awaits the cancelled tasks before returning (a caller's immediate engine.dispose() no longer races teardown) · P18 a worker config error (OutboxResponse + foreign publisher) is logged, not routed through the reconnect/backoff path that throttled unrelated rows · P19/P13/P15 clearer logs/docs · P20 broker.stop() snapshots subscribers once (never-raise contract) · P21 the unstarted-foreign-broker warning names only the decorated subscriber's queues · P22 a start()-time duplicate-queue check across routers · P24 dead OutboxBrokerConfig.connect() / disconnect() removed · P25/P26 annotation + divergence-marker fixes · P31/P32 the test broker validates tz-aware feed() datetimes and copies headers at its boundaries · P33/P35 clarifying comments.
Plus 8 new test-quality guards (T1–T8) pinning invariants the suite previously didn't (lease-token filter on the retry path, the manual-reject fallback through dispatch, consume-escape row preservation, drain "no new claims", the relay dual-fire guard ordering, the index-implying fetch-CTE shape, recorder event sequencing, and off-Postgres drain coverage).
Deferred: P29/P30 (a coupled test-broker dedup + notify-on-feed) and P27 (a stacklevel with no single correct value across call paths) — see the audit findings.
Migration
- Schema. Regenerate your Alembic migration after upgrading; autogenerate will add the <table>_lease_ck CHECK constraint and, if you use a DLQ, the outbox_dlq.timer_id column. Apply it before deploying the new code. (A pre-existing half-set lease, if any operator ever set one by hand, must be cleaned up before the CHECK can be added.)
- Validation. If any publish / publish_batch / subscriber / retry-strategy call relied on the now-rejected inputs in the table above, fix the call. These surface immediately on the first use (or at decoration time), not in production.
- Metrics / alerts. Re-ch...
0.8.0
What's Changed
- feat: foreign-broker relay from OutboxSubscriber by @lesnik512 in #44
Full Changelog: 0.7.1...0.8.0
0.7.1
What's Changed
- chore: adopt faststream 0.7.1 TestBroker typing fix by @lesnik512 in #43
Full Changelog: 0.7.0...0.7.1
0.7.0
0.6.1
What's Changed
- chore: add 'all' extra and planning/ workflow directory by @lesnik512 in #41
Full Changelog: 0.6.0...0.6.1