staticaddr/withdraw: add retry and reorg handling for withdrawals#1134
staticaddr/withdraw: add retry and reorg handling for withdrawals#1134kaldun-tech wants to merge 1 commit into
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request improves the reliability of the static address withdrawal manager by addressing critical gaps in error handling and state management. By introducing robust retry mechanisms for notification registration, implementing reorg detection, and ensuring state is persisted before monitoring begins, the system is now more resilient to transient chain issues and process restarts. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request enhances the reliability of static address withdrawals by implementing automatic retries for notification registrations and robust handling for blockchain reorgs. It also reorders state persistence to ensure that withdrawals can be recovered correctly after a restart. The review feedback identifies several opportunities to harden the implementation, specifically by handling potential channel closures to prevent panics or tight loops during shutdown, and ensuring that the correct transaction hash is cleared from the internal tracking map when Replace-By-Fee (RBF) transactions are confirmed.
a39322a to
692327e
Compare
|
@hieblmi Ready for review |
|
@kaldun-tech, remember to re-request review from reviewers when ready |
|
@hieblmi @ffranr @alexbosworth @starius Sorry for ping, reminder to review |
|
Doing additional review and adding tests. Please stand by. |
This commit fixes three related issues in the withdrawal manager: 1. No retry when RegisterSpendNtfn/RegisterConfirmationsNtfn fails 2. No handling for reorgs (tx confirmed then reorged out) 3. State not persisted before handleWithdrawal, causing recovery failures Changes: - Reorder state persistence to happen BEFORE handleWithdrawal so recoverWithdrawals can find deposits in Withdrawing state on restart - Add retry-on-next-block for registration failures using block notifications - Add reorg detection via lndclient.WithReOrgChan for both spend and confirmation notifications - On reorg after spend detection, return to spend watching (not just re-register for confirmations) since the spend tx may be invalidated - Make handleWithdrawal async/best-effort since tx is already published - Added appropriate tests in manager_test.go Fixes issue lightninglabs#1087 Signed-off-by: kaldun-tech <tsmereka@protonmail.com>
692327e to
e33e525
Compare
| // Withdrawing), meaning this is the first withdrawal attempt. Start the | ||
| // spend monitor goroutine. Fee bumps reuse the existing monitor since | ||
| // we ensure the same deposit ensemble is used for republishing. | ||
| if allDeposited { |
There was a problem hiding this comment.
I'd like to rename the allDeposited variable to isFirstWithdrawal to better reflect how it's being used here
kaldun-tech
left a comment
There was a problem hiding this comment.
Ready for review. Added inline comments on two areas where I'd appreciate maintainer input.
| m.mu.Lock() | ||
| delete(m.finalizedWithdrawalTxns, txHash) | ||
| delete(m.finalizedWithdrawalTxns, *spentTx.SpenderTxHash) |
There was a problem hiding this comment.
I added a defensive nil check here.
My concern in the RBF (Replace-By-Fee) is the case where when the replacement confirms, the replacement's hash is used to delete here. But the original txHash was added to the map initially. Does it remain as a leak? Should both be cleaned up here or does the map get cleared via another path?
Summary
Fixes Issue #1087
Addresses three related issues in the static address withdrawal manager:
RegisterSpendNtfn/RegisterConfirmationsNtfnfailures silently stopped monitoringhandleWithdrawalfailure prevented state persistence, breaking recoveryChanges
handleWithdrawalso recovery works even if monitoring setup failslndclient.WithReOrgChan()- on reorg after spend detection, returns to spend watching (not just confirmation re-registration) since the spend tx itself may be invalidatedhandleWithdrawalasync/best-effort since tx is already publishedDesign Note: Best-Effort Monitoring
After
WithdrawDepositspublishes the transaction and persists deposits to Withdrawing state, subsequent operations (withdrawal record creation, spend monitoring) are best-effort. Errors are logged but don't fail the RPC API call because:recoverWithdrawalswill retry monitoring on restartTesting
Unit tests added for retry logic and clean shutdown:
TestHandleWithdrawal_HappyPathTestHandleWithdrawal_SpendRegistrationRetryTestHandleWithdrawal_ConfirmationRegistrationRetryTestHandleWithdrawal_ContextCanceledTestHandleWithdrawal_BlockChannelClosedTestHandleWithdrawal_SpendErrorReregistersI decided to defer tests related to reorg. These would require
MockChainNotifierextension to capturelndclient.NotifierOptionQuestions for Maintainers
SpenderTxHashis used to delete fromfinalizedWithdrawalTxns. If the original tx was fee-bumped, does the original txHash remain in the map? Should both be cleaned up?allDepositedflag indicates first withdrawal vs fee bump. How about we rename this toisFirstWithdrawalfor clarity?Pull Request Checklist
release_notes.mdif your PR contains major features, breaking changes or bugfixes