Implement Exponential Backoff for Transient Sync Errors by tosynthegeek · Pull Request #588 · lightningdevkit/ldk-node

tosynthegeek · 2025-06-26T14:09:53Z

This PR attempts to improve the retry behavior in the bitcoind RPC synchronization loop from #587 by replacing the existing linear backoff with a proper exponential backoff strategy. We could also add a maximum delay to prevent excessively long waits, but I am not sure it would really help.

ldk-reviews-bot · 2025-06-26T14:09:58Z

👋 Thanks for assigning @tnull as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

tnull

Thanks!

CI is unhappy though, please check that the code compiles before pushing.

tnull · 2025-06-27T07:16:45Z

 							log_error!(logger, "Failed to synchronize chain listeners: {:?}", e);
-							tokio::time::sleep(Duration::from_secs(CHAIN_POLLING_INTERVAL_SECS))
-								.await;
+							backoff_delay_multiplier += 1;


I don't think we should let the backoff delay grow infinitely. Please include a sane upper bound.

Got it, I am adding a max backoff delay of 300 secs for this.

tnull · 2025-06-27T07:18:12Z

+							} else {
+								log_error!(
+									logger,
+									"Failed to synchronize chain listeners: {:?}, e"


This doesn't compile. Also, as discussed elsewhere, we at least need to error out. But technically, we should keep retrying for now, as otherwise we'd need to add a panic here, since permanently being unable to connect to the chain source is an unrecoverable error.

@tnull As you suggested, panicking isn't ideal, and since we can't error out here I stuck with retrying, but with a delay of 300 secs. Exponential backoff starting from CHAIN_POLLING_INTERVAL_SECS is only applied to transient errors.

tosynthegeek · 2025-06-27T07:20:28Z

Thanks!

CI is unhappy though, please check that the code compiles before pushing.

Yeah, just saw that, will check it out now.

tnull

Please also make sure to always provide rationale for the change in the commit message, and format it properly (heading shouldn't be larger than one line). Feel free to refer to https://cbea.ms/git-commit/ on how to write good commit messages.

Previously, chain synchronization failures would retry immediately without any delay, which could lead to tight retry loops and high CPU usage during failures. This change introduces exponential backoff for transient errors, starting at 2 seconds and doubling each time up to a maximum of 300 seconds. Persistent errors also now delay retries by the maximum backoff duration to prevent rapid loops while maintaining eventual recovery. Fixes lightningdevkit#587
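The delay schedule described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `SyncErrorKind`, `retry_delay`, and the two constants are hypothetical names, and only the values (2 seconds initial, doubling, 300 seconds maximum) come from the message above.

```rust
use std::time::Duration;

// Values taken from the commit message; the constant names are assumptions.
const INITIAL_BACKOFF_SECS: u64 = 2;
const MAX_BACKOFF_SECS: u64 = 300;

/// Hypothetical classification of a sync failure, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq)]
enum SyncErrorKind {
    Transient,
    Persistent,
}

/// Returns how long to wait before the next sync attempt.
/// `prior_failures` is the number of consecutive transient failures
/// that happened before this one (0 for the first failure).
fn retry_delay(kind: SyncErrorKind, prior_failures: u32) -> Duration {
    let secs = match kind {
        // Persistent errors always wait the maximum delay.
        SyncErrorKind::Persistent => MAX_BACKOFF_SECS,
        SyncErrorKind::Transient => {
            // 2^prior_failures, saturating instead of overflowing.
            let factor = 1u64.checked_shl(prior_failures).unwrap_or(u64::MAX);
            INITIAL_BACKOFF_SECS
                .saturating_mul(factor)
                .min(MAX_BACKOFF_SECS)
        },
    };
    Duration::from_secs(secs)
}
```

Under these assumptions, transient failures wait 2, 4, 8, ... seconds until the doubling reaches the 300 second cap, and the counter would be reset to zero after a successful sync.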

tnull

Thanks!

Previously, the background reconnection task retried every persisted peer on a fixed 60s interval with no backoff, so an unreachable peer was retried indefinitely at the same cadence — log spam and wasted work. This became more visible after lightningdevkit#895 retained peers across force-closes so that channel_reestablish recovery can run. Track per-peer reconnect state in ConnectionManager: on failure, double the retry interval up to PEER_RECONNECTION_MAX_INTERVAL (30 min); on success (including user-initiated connects), clear the state so a subsequent drop retries promptly. The 60s tokio::time::interval is kept as the wakeup, gated per-peer by next_retry_at, since lightningdevkit#588's inline-sleep form does not generalize to N peers. Backoff state is in-memory and resets on restart — a fresh post-restart attempt is the correct behavior. State is also cleared when a peer is removed from the persisted store. Closes lightningdevkit#918.
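The per-peer gating described in this commit message could look something like the sketch below. It is an illustration under assumptions, not the ConnectionManager code: `ReconnectBackoff`, `PeerRetryState`, the `String` peer key, and the method names are hypothetical, and the choice that the first failure waits the base 60s interval before doubling starts is also an assumption. Only the 60s wakeup, the 30 min maximum, and the `next_retry_at` field name come from the message.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Interval values from the commit message; the first constant name is assumed.
const PEER_RECONNECTION_INTERVAL: Duration = Duration::from_secs(60);
const PEER_RECONNECTION_MAX_INTERVAL: Duration = Duration::from_secs(30 * 60);

/// Placeholder peer identifier; a real node would key on the peer's public key.
type PeerId = String;

struct PeerRetryState {
    interval: Duration,
    next_retry_at: Instant,
}

/// Hypothetical in-memory backoff tracker, one entry per failing peer.
#[derive(Default)]
struct ReconnectBackoff {
    peers: HashMap<PeerId, PeerRetryState>,
}

impl ReconnectBackoff {
    /// Called on every periodic wakeup: a peer with no recorded failure,
    /// or whose `next_retry_at` has passed, is due for an attempt.
    fn should_attempt(&self, peer: &str, now: Instant) -> bool {
        self.peers.get(peer).map_or(true, |s| now >= s.next_retry_at)
    }

    /// Doubles the peer's retry interval, capped at the maximum.
    fn on_failure(&mut self, peer: &str, now: Instant) {
        let interval = match self.peers.get(peer) {
            Some(s) => (s.interval * 2).min(PEER_RECONNECTION_MAX_INTERVAL),
            None => PEER_RECONNECTION_INTERVAL,
        };
        let state = PeerRetryState { interval, next_retry_at: now + interval };
        self.peers.insert(peer.to_owned(), state);
    }

    /// Clears state on a successful connect or when the peer is removed,
    /// so a later drop is retried promptly.
    fn clear(&mut self, peer: &str) {
        self.peers.remove(peer);
    }
}
```

Because the state lives only in this map, it is lost on restart, which matches the behavior the message describes as intended.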

ldk-reviews-bot requested a review from valentinewallace June 26, 2025 14:20

tosynthegeek mentioned this pull request Jun 26, 2025

bitcoind rpc gets stuck in an infinite loop if incorrect password is provided #587

Closed

tnull requested review from tnull and removed request for valentinewallace June 27, 2025 07:14

tnull reviewed Jun 27, 2025

tosynthegeek closed this Jun 28, 2025

tosynthegeek force-pushed the main branch from 3e79263 to 243c874 Compare June 28, 2025 07:25

tosynthegeek reopened this Jun 28, 2025

tosynthegeek force-pushed the main branch from dac462c to 3954355 Compare June 30, 2025 08:06

tnull approved these changes Jun 30, 2025

tnull merged commit 775e0db into lightningdevkit:main Jun 30, 2025
20 of 24 checks passed

Jolah1 mentioned this pull request Jun 3, 2026

Add configurable reconnection backoff for persisted peers #918

Open
