Skip to content

[fix][broker] Defer ack state updates until persistence succeeds#25528

Draft
nodece wants to merge 3 commits intoapache:masterfrom
nodece:fix-broker-ack-state-updates
Draft

[fix][broker] Defer ack state updates until persistence succeeds#25528
nodece wants to merge 3 commits intoapache:masterfrom
nodece:fix-broker-ack-state-updates

Conversation

@nodece
Copy link
Copy Markdown
Member

@nodece nodece commented Apr 15, 2026

Motivation

The broker ack path was updating consumer-side in-memory ack state before the underlying subscription ack path had fully completed. That makes ack failures harder to reason about and can leave broker-side state transitions out of sync with the persisted ack result and ack-receipt flow.

Changes

  • add an async-first subscription ack path and route broker ack handling through it
  • defer consumer unacked and pending-ack state updates until the async ack operation completes successfully
  • keep ack-receipt handling aligned with the async broker ack completion in ServerCnx
  • update subscription implementations that participate in the shared ack flow, including persistent, non-persistent, compactor, replicated-subscription, dispatcher, and pending-ack paths
  • adjust broker-side regression tests covering cumulative ack, transaction marker delete, persistent subscription behavior, and topic-level backlog checks

@codelipenghui
Copy link
Copy Markdown
Contributor

Will the broker stop dispatch messages to the consumer if the consumer has been set to a lower max unack message and the persistence interval of the cursor set to a higher value?

@nodece
Copy link
Copy Markdown
Member Author

nodece commented Apr 16, 2026

@codelipenghui

The broker should not stop dispatching messages just because managedLedgerCursorPositionFlushSeconds is higher.

In this implementation, unackedMessages is decremented when acknowledgeMessageAsync(...) completes, and that completion is driven by the managed cursor ack callback path, not by the periodic cursor flush task itself.

For the normal path, the ack immediately triggers async mark-delete/delete persistence, so it does not wait for managedLedgerCursorPositionFlushSeconds.

managedLedgerCursorPositionFlushSeconds is only a fallback for dirty/rate-limited cursor updates. In that case the cursor callback is completed immediately and the actual flush is deferred, so dispatch is not blocked waiting for that interval either.

@nodece nodece force-pushed the fix-broker-ack-state-updates branch 2 times, most recently from 21fe8ef to b6b33f1 Compare April 17, 2026 10:43
@nodece nodece force-pushed the fix-broker-ack-state-updates branch from b6b33f1 to 65f7a2a Compare April 17, 2026 10:46
@nodece nodece marked this pull request as draft April 20, 2026 11:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants