feat: bound, retry, and resume peer downloads end-to-end (#13 5/5) #20
Merged
roziscoding merged 2 commits into feat/harden-peer-downloads/4-attempts-schema on Jun 6, 2026
Rewire DownloadsService around a shared Semaphore (maxConcurrentDownloads), a retry loop (bounded backoff+jitter, attempts tracked, transient vs permanent classification, Retry-After honored), and resume: the restart progress event persists via markResumeReset, and an active/reenqueued dedupe prevents duplicate rows. On startup, index.ts re-drives stale downloading rows with resumeStaleDownloads() before the watcher scans, falling back to reconcileStaleDownloads() when downloads is unconfigured. Closes #13.
- Dedupe stale downloading rows by destPath before re-driving: only one row per destination is resumable (they share the same .part), so mark the superseded duplicates failed instead of letting the second silently early-return in runDownload and stay stuck in downloading.
- Release the reenqueued claim on successful resume (stub already unlinked, so no scan race) so a later legitimate re-drop of the same torrent filename is not silently skipped for the rest of the process.
Refs #13.
f441524 into feat/harden-peer-downloads/4-attempts-schema (6 checks passed)
roziscoding added a commit that referenced this pull request on Jun 6, 2026
…4/5) (#19)
* feat: track download attempts and expose stale rows for re-drive
  Add an attempts column to the downloads table (additive migration) and repository methods: incrementAttempts, markResumeReset (reset downloadedBytes and record the resume-from-zero transition), and listStaleDownloads (returns stale downloading rows without mutating them, for active startup re-enqueue). reconcileStaleDownloads is kept as the fallback for when downloads is unconfigured. Refs #13.
* feat: bound, retry, and resume peer downloads end-to-end (#13 5/5) (#20)
* fix: harden startup re-enqueue dedupe (review feedback)
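As a rough illustration of the three repository methods named in that commit message, here is a minimal in-memory sketch. The row shape, the `resumeResets` counter, the `insert` helper, and the rule that every row still marked `downloading` counts as stale are all assumptions made for the example; the real repository is backed by the downloads table and is not shown on this page.

```typescript
// Hypothetical row shape: only the columns the commit message mentions,
// plus an assumed counter for "recording" the resume-from-zero transition.
type DownloadStatus = "downloading" | "completed" | "failed";

interface DownloadRow {
  id: number;
  destPath: string;
  status: DownloadStatus;
  attempts: number;
  downloadedBytes: number;
  resumeResets: number;
}

class InMemoryDownloadsRepository {
  private readonly rows = new Map<number, DownloadRow>();

  // Test helper, not part of the described API.
  insert(row: DownloadRow): void {
    this.rows.set(row.id, { ...row });
  }

  // Bumps the attempts counter and returns the new value.
  incrementAttempts(id: number): number {
    const row = this.getOrThrow(id);
    row.attempts += 1;
    return row.attempts;
  }

  // Resets progress to zero and records that a restart happened.
  markResumeReset(id: number): void {
    const row = this.getOrThrow(id);
    row.downloadedBytes = 0;
    row.resumeResets += 1;
  }

  // Returns copies of rows still marked downloading, without mutating them.
  listStaleDownloads(): DownloadRow[] {
    return Array.from(this.rows.values())
      .filter((row) => row.status === "downloading")
      .map((row) => ({ ...row }));
  }

  private getOrThrow(id: number): DownloadRow {
    const row = this.rows.get(id);
    if (row === undefined) throw new Error(`no download row with id ${id}`);
    return row;
  }
}
```

Returning copies from `listStaleDownloads` is one way to honour the "without mutating them" wording: callers can inspect or reorder the list without touching stored state.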
roziscoding added a commit that referenced this pull request on Jun 6, 2026
…5) (#18)
* feat: add retry, semaphore, and download concurrency/retry config
  Add a generic retry() helper (bounded attempts, exponential backoff with full jitter, optional Retry-After override, injectable sleep/random) and a download retry classifier (transient: network/timeout/5xx/429/incomplete stream; permanent: non-429 4xx and others). Add a FIFO async Semaphore. Extend DownloadsConfig with maxConcurrentDownloads and retry knobs (all defaulted so existing configs keep parsing). Primitives are wired into DownloadsService in a later change. Refs #13.
* feat: track download attempts and expose stale rows for re-drive (#13 4/5) (#19)
* feat: bound, retry, and resume peer downloads end-to-end (#13 5/5) (#20)
* fix: harden startup re-enqueue dedupe (review feedback)
* fix: address retry review feedback
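The retry helper and classifier described in that commit message could look roughly like the sketch below. It is not the project's implementation: the option names, the `HttpStatusError` and `NetworkError` classes, and the choice to apply a Retry-After value unmodified are assumptions for illustration. `IncompleteDownloadError` is named in the PR stack, but its shape here is assumed.

```typescript
// Hypothetical error types; the project's real classes may differ.
class HttpStatusError extends Error {
  readonly status: number;
  readonly retryAfterMs: number | undefined;
  constructor(status: number, retryAfterMs?: number) {
    super(`HTTP ${status}`);
    this.status = status;
    this.retryAfterMs = retryAfterMs;
  }
}
class IncompleteDownloadError extends Error {}
class NetworkError extends Error {}

// Transient: network/timeout, 5xx, 429, truncated stream. Anything else is permanent.
function isTransient(error: unknown): boolean {
  if (error instanceof NetworkError || error instanceof IncompleteDownloadError) return true;
  if (error instanceof HttpStatusError) return error.status === 429 || error.status >= 500;
  return false;
}

interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
  maxDelayMs: number;
  isRetryable: (error: unknown) => boolean;
  retryAfterMs?: (error: unknown) => number | undefined;
  sleep: (ms: number) => Promise<void>; // injected so tests never wait on real timers
  random: () => number; // injected so jitter is deterministic in tests
}

async function retry<T>(operation: (attempt: number) => Promise<T>, options: RetryOptions): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      // Give up on permanent errors or once the attempt budget is spent.
      if (attempt >= options.maxAttempts || !options.isRetryable(error)) throw error;
      // Full jitter: a uniform delay in [0, min(maxDelay, base * 2^(attempt-1))).
      const ceiling = Math.min(options.maxDelayMs, options.baseDelayMs * 2 ** (attempt - 1));
      const jittered = Math.floor(options.random() * ceiling);
      // Assumption: a server-supplied Retry-After replaces the computed delay as-is.
      const override = options.retryAfterMs?.(error);
      await options.sleep(override ?? jittered);
    }
  }
}
```

Injecting `sleep` and `random` keeps the backoff schedule testable without touching the clock, which matches the "injectable sleep/random" note in the commit message.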
roziscoding added a commit that referenced this pull request on Jun 6, 2026
* feat: resume interrupted peer downloads via HTTP Range
  downloadFile now detects an existing .part file and sends Range: bytes=<size>-, validating the peer's 206 + Content-Range against the persisted expected size before appending. On 200 (range ignored), a Content-Range mismatch, or 416 it discards the stale .part and restarts from byte 0, emitting a restart progress event. The write path uses a node:fs FileHandle (append/write) with datasync at checkpoints, and the .part is preserved on error so the next attempt can resume. A truncated stream throws a retryable IncompleteDownloadError. Refs #13.
* feat: retry, semaphore, and download concurrency/retry config (#13 3/5) (#18)
* feat: track download attempts and expose stale rows for re-drive (#13 4/5) (#19)
* feat: bound, retry, and resume peer downloads end-to-end (#13 5/5) (#20)
* fix: harden startup re-enqueue dedupe (review feedback)
* fix: address retry review feedback
* fix: guard non-ok resume responses
* fix: avoid leaked peer download reader lock
* fix: close peer download handle on reader failure
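The Range-resume rules in the first commit message above (append on a 206 whose Content-Range matches, restart on 200, 416, or a mismatch) can be expressed as a small pure function. This is a sketch under stated assumptions: `decideResume`, `parseContentRange`, and the `ResumeDecision` type are hypothetical names, and throwing on any other status is a guess at what "guard non-ok resume responses" covers.

```typescript
// Hypothetical result type; the real downloadFile internals are not shown on this page.
type ResumeDecision =
  | { action: "append"; offset: number }
  | { action: "restart"; reason: string };

// Parses a header such as "bytes 100-999/1000"; returns null for any other form.
function parseContentRange(header: string | null): { start: number; end: number; total: number } | null {
  if (header === null) return null;
  const match = /^bytes (\d+)-(\d+)\/(\d+)$/.exec(header.trim());
  if (match === null) return null;
  return { start: Number(match[1]), end: Number(match[2]), total: Number(match[3]) };
}

// partSize: bytes already in the .part file (the offset sent as Range: bytes=<partSize>-).
// expectedSize: the total size persisted from the earlier attempt.
function decideResume(
  status: number,
  contentRange: string | null,
  partSize: number,
  expectedSize: number,
): ResumeDecision {
  if (status === 200) return { action: "restart", reason: "peer ignored the Range header" };
  if (status === 416) return { action: "restart", reason: "requested range not satisfiable" };
  if (status !== 206) {
    // Assumption: any other status is surfaced as an error for the retry classifier.
    throw new Error(`unexpected status ${status} for a ranged request`);
  }
  const range = parseContentRange(contentRange);
  if (range === null || range.start !== partSize || range.total !== expectedSize) {
    return { action: "restart", reason: "Content-Range does not match the stored .part" };
  }
  return { action: "append", offset: partSize };
}
```

For instance, `decideResume(206, "bytes 100-999/1000", 100, 1000)` yields an append at offset 100, while the same call with `"bytes 0-999/1000"` yields a restart because the peer did not start where the `.part` file ends.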
Stack 5/5 for #13 (top). Base: feat/harden-peer-downloads/4-attempts-schema (#19). Integrates PRs 2–4.
What this PR does
Wires the primitives together in DownloadsService and rewires startup, completing the hardened flow.
- runDownload() runs every download through a shared Semaphore(maxConcurrentDownloads) and a retry() loop: increments attempts per attempt, classifies transient vs permanent (PR 3), and persists the restart event via markResumeReset (PR 4).
- An active set (keyed by destPath) and a reenqueued set (torrent filenames) dedupe so the startup re-enqueue and the watcher never double-process the same file.
- resumeStaleDownloads() actively re-drives stale downloading rows from the DB (resuming from their .part) instead of failing them. index.ts calls it before the watcher scans; reconcileStaleDownloads() is the fallback only when downloads is unconfigured.
- Validation of release.filename and the atomic-rename completion are preserved.
Review-feedback fix (commit 5d38820)
A code review caught two edge cases in the re-enqueue dedupe, both fixed with regression tests:
- Stale downloading rows with the same destPath are deduped (they share one .part); superseded duplicates are marked failed instead of silently orphaning a downloading row.
- The reenqueued claim is released on successful resume (after the stub is unlinked, so no scan race), so a later legitimate re-drop of the same filename isn't silently skipped.
Files
- apps/backend/src/modules/downloads/downloads.service.ts (full rewrite)
- apps/backend/src/index.ts (startup wiring)
- apps/backend/src/__tests__/downloads-service.test.ts (full rewrite, 10 tests)
Testing
10 service tests: row lifecycle, no-row-on-metadata-fail, path traversal, permanent failure (no retry), transient retry-then-succeed, resume reset persisted, concurrency cap, stale re-drive, duplicate-destPath dedupe, re-enqueue-claim release. Full suite: 129/129 green, lint + typecheck clean.
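To make the duplicate-destPath case concrete, here is a minimal sketch of the partitioning step. `dedupeByDestPath` and `StaleRow` are hypothetical names, and keeping the first row seen per destination is an assumption; the PR only states that one row per destPath is resumable and the rest are marked failed.

```typescript
// Hypothetical minimal row shape for the example.
interface StaleRow {
  id: number;
  destPath: string;
}

// Splits stale rows into one resumable row per destPath and the superseded rest.
// Assumption: the first row encountered for a destination is the one kept.
function dedupeByDestPath(rows: StaleRow[]): { resumable: StaleRow[]; superseded: StaleRow[] } {
  const seen = new Set<string>();
  const resumable: StaleRow[] = [];
  const superseded: StaleRow[] = [];
  for (const row of rows) {
    if (seen.has(row.destPath)) {
      superseded.push(row); // shares a .part with an earlier row; the caller marks it failed
    } else {
      seen.add(row.destPath);
      resumable.push(row);
    }
  }
  return { resumable, superseded };
}
```

A caller would then mark every `superseded` row failed and hand each `resumable` row to the download path, so no row is left stuck in downloading.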
Manual verification (post-merge)
Boot Jack with a leftover *.part and a stale downloading row and confirm it re-enqueues, issues a Range request, resumes, and renames into the completed folder.
Review focus
runDownload/downloadWithRetry, the active/reenqueued dedupe lifecycle, and the index.ts startup ordering (resumeStaleDownloads() before watcher.start()).
Greptile Summary
[Linus Torvalds Mode] Oh look, someone finally decided to wire together all the primitives they've been lovingly building for the last four stacked PRs — if you're going to take five PRs to ship a download manager, at least make the fifth one actually work, which, grudgingly, this does.
This PR completes the hardened peer-download flow by integrating Semaphore, retry, and markResumeReset into DownloadsService:
- runDownload/downloadWithRetry: every download acquires a Semaphore(maxConcurrentDownloads) permit, then runs inside retry() with exponential-backoff + jitter. Permanent errors (4xx non-429) are not retried; transient ones (5xx, 429, network errors, IncompleteDownloadError) are.
- active + reenqueued dedup sets: active prevents duplicate rows/writers for concurrent live drops; reenqueued gates the watcher away from stubs owned by startup re-enqueue, cleared only on successful resume.
- resumeStaleDownloads: dedupes downloading rows by destPath, marks superseded rows failed, fires background runDownload calls before watcher.start().
- index.ts startup ordering: correct. resumeStaleDownloads() runs before the watcher when configured; reconcileStaleDownloads() is the fallback when not.
Confidence Score: 5/5
[Linus Torvalds Mode] Five stacked PRs to ship a download manager — I've seen glaciers move faster. That said, this final piece holds together: semaphore, retry, and dedup logic are correct, startup ordering is right, and the tests cover the real edge cases. Safe to merge.
The single finding is a P2 type annotation in a test helper. No functional bugs found. The active.add synchronicity guarantee, reenqueued lifecycle, semaphore fairness, retry policy, and startup ordering are all correct. 10-test suite validates every meaningful invariant.
Only downloads-service.test.ts needs the minor type fix on the downloadsConfig helper; service and startup wiring are solid.
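The two guarantees the summary leans on, a FIFO permit queue and a synchronous claim on the active set, can be sketched as follows. This is an illustrative reconstruction rather than the service's code: `runExclusive` is a hypothetical wrapper, and returning `undefined` for a duplicate destination is an assumption about what the early return looks like.

```typescript
// A FIFO async semaphore: waiters are resumed in the order they called acquire().
class Semaphore {
  private available: number;
  private readonly waiters: Array<() => void> = [];

  constructor(permits: number) {
    if (!Number.isInteger(permits) || permits < 1) {
      throw new RangeError("permits must be a positive integer");
    }
    this.available = permits;
  }

  acquire(): Promise<void> {
    if (this.available > 0) {
      this.available -= 1;
      return Promise.resolve();
    }
    return new Promise<void>((resolve) => {
      this.waiters.push(resolve);
    });
  }

  release(): void {
    const next = this.waiters.shift();
    if (next !== undefined) {
      next(); // hand the permit directly to the oldest waiter
    } else {
      this.available += 1;
    }
  }
}

// Destinations that currently have a writer.
const active = new Set<string>();

// Hypothetical wrapper showing the claim / acquire / release / unclaim ordering.
async function runExclusive<T>(
  destPath: string,
  semaphore: Semaphore,
  task: () => Promise<T>,
): Promise<T | undefined> {
  if (active.has(destPath)) return undefined; // another writer owns this destination
  active.add(destPath); // claimed synchronously, before the first await
  try {
    await semaphore.acquire();
    try {
      return await task();
    } finally {
      semaphore.release();
    }
  } finally {
    active.delete(destPath);
  }
}
```

Because `active.add` runs before any `await`, two calls for the same destPath issued in the same tick cannot both pass the `active.has` check, which is the synchronicity guarantee the review refers to.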
Important Files Changed
Sequence Diagram
sequenceDiagram
    participant idx as index.ts
    participant svc as DownloadsService
    participant sem as Semaphore
    participant repo as DownloadsRepository
    participant peer as PeerConnector
    idx->>svc: resumeStaleDownloads()
    svc->>repo: listStaleDownloads()
    repo-->>svc: stale DownloadRecord[]
    note over svc: Dedupe by destPath, mark superseded failed, add filenames to reenqueued
    svc-)svc: void runDownload(record) fire-and-forget
    svc-->>idx: resumed count
    idx->>svc: BlackholeWatcher triggers processTorrentFile
    note over svc: reenqueued.has? skip. active.has? skip.
    svc->>peer: getRelease(itemId)
    peer-->>svc: Release validated
    svc->>repo: create(downloadInput)
    repo-->>svc: DownloadRecord
    svc->>svc: runDownload(record)
    note over svc: active.add(destPath) synchronous
    svc->>sem: acquire()
    sem-->>svc: permit
    loop retry up to maxDownloadAttempts
        svc->>repo: incrementAttempts(id)
        svc->>peer: downloadFile(itemId, destPath, partPath, onProgress)
        peer->>svc: onProgress headers
        svc->>repo: setExpectedBytes
        peer->>svc: onProgress restart optional
        svc->>repo: markResumeReset
        peer->>svc: onProgress completed
        svc->>repo: markCompleted
        peer-->>svc: resolved or transient error
    end
    svc->>svc: unlink stub
    svc->>svc: triggerImport
    svc->>repo: markImportQueued
    svc->>svc: reenqueued.delete
    svc->>sem: release
    note over svc: active.delete in finally block
Reviews (1): Last reviewed commit: "fix: harden startup re-enqueue dedupe (r..."
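The startup ordering shown at the top of the diagram can be summarised in a few lines. The interfaces, the `startUp` function, and the placement of `reconcileStaleDownloads()` on a repository object are assumptions for illustration; the actual wiring lives in index.ts and is not reproduced on this page.

```typescript
// Hypothetical minimal interfaces; only the methods named in the PR are included.
interface DownloadsServiceLike {
  resumeStaleDownloads(): Promise<number>; // resolves to the resumed count
}
interface DownloadsRepositoryLike {
  reconcileStaleDownloads(): Promise<number>;
}
interface WatcherLike {
  start(): Promise<void>;
}

interface StartupDeps {
  downloads: DownloadsServiceLike | undefined; // undefined when downloads is unconfigured
  repository: DownloadsRepositoryLike;
  watcher: WatcherLike;
}

// Re-drive (or reconcile) stale rows first, and only then let the watcher scan.
async function startUp(deps: StartupDeps): Promise<"resumed" | "reconciled"> {
  let mode: "resumed" | "reconciled";
  if (deps.downloads !== undefined) {
    await deps.downloads.resumeStaleDownloads();
    mode = "resumed";
  } else {
    await deps.repository.reconcileStaleDownloads();
    mode = "reconciled";
  }
  await deps.watcher.start();
  return mode;
}
```

Starting the watcher last means the reenqueued claims are already in place when the first scan sees leftover stubs, so the watcher cannot race the startup re-enqueue for the same file.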