feat: bound, retry, and resume peer downloads end-to-end (#13 5/5) by roziscoding · Pull Request #20 · roziscoding/jack

roziscoding · 2026-06-06T01:21:41Z

Stack 5/5 for #13 (top). Base: feat/harden-peer-downloads/4-attempts-schema (#19). Integrates PRs 2–4.

What this PR does

Wires the primitives together in DownloadsService and rewires startup, completing the hardened flow.

runDownload() runs every download through a shared Semaphore(maxConcurrentDownloads) and a retry() loop that increments the attempts counter on each try, classifies errors as transient or permanent (PR 3), and persists the restart event via markResumeReset (PR 4).
An active set (keyed by destPath) and a reenqueued set (torrent filenames) dedupe so the startup re-enqueue and the watcher never double-process the same file.
resumeStaleDownloads() actively re-drives stale downloading rows from the DB (resuming from their .part) instead of failing them. index.ts calls it before the watcher scans; reconcileStaleDownloads() remains as the fallback, used only when downloads is unconfigured.
The path-traversal guard on the peer-controlled release.filename and the atomic-rename completion are preserved.
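As an illustration of the kind of guard the last item refers to, here is a minimal sketch of resolving a peer-supplied filename inside the download directory. The helper name `resolveInsideDir` and its signature are hypothetical; the PR's actual guard may be implemented differently.

```typescript
import * as path from "node:path";

/**
 * Resolve a peer-supplied filename inside downloadDir, or throw if the
 * result would land outside it. Hypothetical helper, not the PR's code.
 */
export function resolveInsideDir(downloadDir: string, peerFilename: string): string {
  const root = path.resolve(downloadDir);
  const candidate = path.resolve(root, peerFilename);
  const relative = path.relative(root, candidate);
  // An empty relative path means the name resolved to the directory itself;
  // a leading ".." segment or an absolute result means it escaped the root.
  const escapes =
    relative === "" ||
    relative === ".." ||
    relative.startsWith(".." + path.sep) ||
    path.isAbsolute(relative);
  if (escapes) {
    throw new Error(`Refusing unsafe release filename: ${peerFilename}`);
  }
  return candidate;
}
```

Comparing the relative path, rather than checking the raw string for "..", also rejects absolute filenames and names that only escape after normalization (for example `sub/../../x`).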

Review-feedback fix (commit `5d38820`)

A code review caught two edge cases in the re-enqueue dedupe, both fixed with regression tests:

Stale rows sharing a destPath are deduped (they share one .part); superseded duplicates are marked failed instead of silently orphaning a downloading row.
The reenqueued claim is released on successful resume (after the stub is unlinked, so no scan race), so a later legitimate re-drop of the same filename isn't silently skipped.
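The first fix can be pictured with a small sketch: partition the stale rows so that only one row per destPath is re-driven and the rest are reported as superseded. The `StaleRow` shape and the `dedupeByDestPath` name are assumptions for illustration, and keeping the first row seen is also an assumption, since the PR text does not say which duplicate wins.

```typescript
// Hypothetical minimal row shape; the real DownloadRecord has more fields.
interface StaleRow {
  id: number;
  destPath: string;
}

interface DedupeResult {
  resumable: StaleRow[];
  superseded: StaleRow[];
}

/** Keep the first row per destPath; every later row with that path is superseded. */
export function dedupeByDestPath(rows: StaleRow[]): DedupeResult {
  const seen = new Set<string>();
  const resumable: StaleRow[] = [];
  const superseded: StaleRow[] = [];
  for (const row of rows) {
    if (seen.has(row.destPath)) {
      superseded.push(row);
    } else {
      seen.add(row.destPath);
      resumable.push(row);
    }
  }
  return { resumable, superseded };
}
```

A caller would then mark each superseded row failed and start a download only for the resumable ones, so no row is left in downloading without a writer.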

Files

apps/backend/src/modules/downloads/downloads.service.ts (full rewrite)
apps/backend/src/index.ts (startup wiring)
apps/backend/src/__tests__/downloads-service.test.ts (full rewrite, 10 tests)

Testing

10 service tests: row lifecycle, no-row-on-metadata-fail, path traversal, permanent failure (no retry), transient retry-then-succeed, resume reset persisted, concurrency cap, stale re-drive, duplicate-destPath dedupe, re-enqueue-claim release. Full suite: 129/129 green, lint + typecheck clean.

Manual verification (post-merge)

Boot Jack with a leftover *.part + a stale downloading row and confirm it re-enqueues, issues a Range request, resumes, and renames into the completed folder.

Review focus

The semaphore/retry/resume wiring in runDownload/downloadWithRetry, the active/reenqueued dedupe lifecycle, and the index.ts startup ordering (resumeStaleDownloads() before watcher.start()).

Greptile Summary

[Linus Torvalds Mode] Oh look, someone finally decided to wire together all the primitives they've been lovingly building for the last four stacked PRs — if you're going to take five PRs to ship a download manager, at least make the fifth one actually work, which, grudgingly, this does.

This PR completes the hardened peer-download flow by integrating Semaphore, retry, and markResumeReset into DownloadsService:

runDownload / downloadWithRetry: every download acquires a Semaphore(maxConcurrentDownloads) permit, then runs inside retry() with exponential-backoff + jitter. Permanent errors (4xx non-429) are not retried; transient ones (5xx, 429, network errors, IncompleteDownloadError) are.
active + reenqueued dedup sets: active prevents duplicate rows/writers for concurrent live drops; reenqueued gates the watcher away from stubs owned by startup re-enqueue, cleared only on successful resume.
resumeStaleDownloads: dedupes downloading rows by destPath, marks superseded rows failed, fires background runDownload calls before watcher.start().
index.ts startup ordering: correct — resumeStaleDownloads() before watcher when configured; reconcileStaleDownloads() fallback when not.
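The retry policy summarized above can be sketched as follows. This is a simplified illustration, not the PR's retry() helper: the option names, `HttpStatusError`, and `isTransientError` are hypothetical, the classifier covers only HTTP statuses (the PR also treats network errors and IncompleteDownloadError as transient), and Retry-After handling is omitted.

```typescript
// Hypothetical error type carrying an HTTP status.
export class HttpStatusError extends Error {
  readonly status: number;
  constructor(status: number) {
    super(`HTTP ${status}`);
    this.status = status;
  }
}

/** 429 and 5xx are retryable; other statuses are treated as permanent. */
export function isTransientStatus(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

export function isTransientError(error: unknown): boolean {
  return error instanceof HttpStatusError && isTransientStatus(error.status);
}

export interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
  maxDelayMs: number;
  isTransient: (error: unknown) => boolean;
  sleep?: (ms: number) => Promise<void>; // injectable for tests
  random?: () => number; // injectable for tests
}

export async function retry<T>(
  task: (attempt: number) => Promise<T>,
  options: RetryOptions,
): Promise<T> {
  const sleep =
    options.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const random = options.random ?? Math.random;
  for (let attempt = 1; ; attempt++) {
    try {
      return await task(attempt);
    } catch (error) {
      // Give up on permanent errors or once the attempt budget is spent.
      if (attempt >= options.maxAttempts || !options.isTransient(error)) throw error;
      // Full jitter: wait a random time in [0, min(maxDelay, base * 2^(attempt-1))).
      const ceiling = Math.min(options.maxDelayMs, options.baseDelayMs * 2 ** (attempt - 1));
      await sleep(random() * ceiling);
    }
  }
}
```

Injecting `sleep` and `random` keeps tests deterministic and fast, which matches the "injectable sleep/random" wording in the stack's commit messages.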

Confidence Score: 5/5

[Linus Torvalds Mode] Five stacked PRs to ship a download manager — I've seen glaciers move faster. That said, this final piece holds together: semaphore, retry, and dedup logic are correct, startup ordering is right, and the tests cover the real edge cases. Safe to merge.

The single finding is a P2 type annotation in a test helper. No functional bugs found. The active.add synchronicity guarantee, reenqueued lifecycle, semaphore fairness, retry policy, and startup ordering are all correct. 10-test suite validates every meaningful invariant.

Only downloads-service.test.ts needs the minor type fix on the downloadsConfig helper; service and startup wiring are solid.

Important Files Changed

Filename	Overview
apps/backend/src/modules/downloads/downloads.service.ts	Full rewrite wiring Semaphore concurrency, retry loop, active/reenqueued dedup sets, and resumeStaleDownloads; all edge cases handled correctly.
apps/backend/src/index.ts	Startup wiring correctly calls resumeStaleDownloads() before watcher.start(); fallback reconcileStaleDownloads() path is correct.
apps/backend/src/__tests__/downloads-service.test.ts	10 well-targeted tests covering full lifecycle, retry, concurrency cap, stale re-drive, dedupe, and reenqueue-claim release; one minor type issue in the downloadsConfig helper.

Sequence Diagram

sequenceDiagram
    participant idx as index.ts
    participant svc as DownloadsService
    participant sem as Semaphore
    participant repo as DownloadsRepository
    participant peer as PeerConnector

    idx->>svc: resumeStaleDownloads()
    svc->>repo: listStaleDownloads()
    repo-->>svc: stale DownloadRecord[]
    note over svc: Dedupe by destPath, mark superseded failed, add filenames to reenqueued
    svc-)svc: void runDownload(record) fire-and-forget
    svc-->>idx: resumed count

    idx->>svc: BlackholeWatcher triggers processTorrentFile
    note over svc: reenqueued.has? skip. active.has? skip.
    svc->>peer: getRelease(itemId)
    peer-->>svc: Release validated
    svc->>repo: create(downloadInput)
    repo-->>svc: DownloadRecord
    svc->>svc: runDownload(record)
    note over svc: active.add(destPath) synchronous
    svc->>sem: acquire()
    sem-->>svc: permit

    loop retry up to maxDownloadAttempts
        svc->>repo: incrementAttempts(id)
        svc->>peer: downloadFile(itemId, destPath, partPath, onProgress)
        peer->>svc: onProgress headers
        svc->>repo: setExpectedBytes
        peer->>svc: onProgress restart optional
        svc->>repo: markResumeReset
        peer->>svc: onProgress completed
        svc->>repo: markCompleted
        peer-->>svc: resolved or transient error
    end

    svc->>svc: unlink stub
    svc->>svc: triggerImport
    svc->>repo: markImportQueued
    svc->>svc: reenqueued.delete
    svc->>sem: release
    note over svc: active.delete in finally block
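The "active.add(destPath) synchronous" note in the diagram matters because an async function runs synchronously up to its first await. A minimal sketch of that claim-before-await pattern, using a hypothetical `ActiveSet` class rather than the service's real fields:

```typescript
export class ActiveSet {
  private readonly active = new Set<string>();

  /** Runs task unless one with the same key is already in flight. */
  async runExclusive<T>(key: string, task: () => Promise<T>): Promise<T | "skipped"> {
    if (this.active.has(key)) return "skipped";
    // Claim the key before the first await, so a second caller arriving on
    // the same tick already sees it and backs off.
    this.active.add(key);
    try {
      return await task();
    } finally {
      // Always release the claim, whether the task resolved or threw.
      this.active.delete(key);
    }
  }
}
```

If the add happened after an await (for example after acquiring the semaphore), two callers could both pass the has() check before either claimed the key.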

Reviews (1): Last reviewed commit: "fix: harden startup re-enqueue dedupe (r..."

Greptile also left 1 inline comment on this PR.

Rewire DownloadsService around a shared Semaphore (maxConcurrentDownloads), a retry loop (bounded backoff+jitter, attempts tracked, transient vs permanent classification, Retry-After honored), and resume: the restart progress event persists via markResumeReset, and an active/reenqueued dedupe prevents duplicate rows. On startup, index.ts re-drives stale downloading rows with resumeStaleDownloads() before the watcher scans, falling back to reconcileStaleDownloads() when downloads is unconfigured. Closes #13.

- Dedupe stale downloading rows by destPath before re-driving: only one row per destination is resumable (they share the same .part), so mark the superseded duplicates failed instead of letting the second silently early-return in runDownload and stay stuck in downloading.
- Release the reenqueued claim on successful resume (stub already unlinked, so no scan race) so a later legitimate re-drop of the same torrent filename is not silently skipped for the rest of the process.

Refs #13.

…4/5) (#19)

* feat: track download attempts and expose stale rows for re-drive
  Add an attempts column to the downloads table (additive migration) and repository methods: incrementAttempts, markResumeReset (reset downloadedBytes and record the resume-from-zero transition), and listStaleDownloads (returns stale downloading rows without mutating them, for active startup re-enqueue). reconcileStaleDownloads is kept as the fallback for when downloads is unconfigured. Refs #13.
* feat: bound, retry, and resume peer downloads end-to-end (#13 5/5) (#20)
* feat: bound, retry, and resume peer downloads end-to-end
* fix: harden startup re-enqueue dedupe (review feedback)

…5) (#18)

* feat: add retry, semaphore, and download concurrency/retry config
  Add a generic retry() helper (bounded attempts, exponential backoff with full jitter, optional Retry-After override, injectable sleep/random) and a download retry classifier (transient: network/timeout/5xx/429/incomplete stream; permanent: non-429 4xx and others). Add a FIFO async Semaphore. Extend DownloadsConfig with maxConcurrentDownloads and retry knobs (all defaulted so existing configs keep parsing). Primitives are wired into DownloadsService in a later change. Refs #13.
* feat: track download attempts and expose stale rows for re-drive (#13 4/5) (#19)
* feat: track download attempts and expose stale rows for re-drive
* feat: bound, retry, and resume peer downloads end-to-end (#13 5/5) (#20)
* feat: bound, retry, and resume peer downloads end-to-end
* fix: harden startup re-enqueue dedupe (review feedback)
* fix: address retry review feedback
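For readers unfamiliar with the primitive, a FIFO async semaphore of the kind named in this commit message can be sketched in a few lines. This is an illustrative version, not the PR's Semaphore: the `run` convenience method and the constructor validation are assumptions.

```typescript
export class Semaphore {
  private available: number;
  private readonly waiters: Array<() => void> = [];

  constructor(permits: number) {
    if (!Number.isInteger(permits) || permits < 1) {
      throw new RangeError("permits must be a positive integer");
    }
    this.available = permits;
  }

  /** Resolves once a permit is held; waiters are served in arrival order. */
  acquire(): Promise<void> {
    if (this.available > 0) {
      this.available -= 1;
      return Promise.resolve();
    }
    return new Promise<void>((resolve) => {
      this.waiters.push(resolve);
    });
  }

  /** Hands the permit straight to the oldest waiter, or returns it to the pool. */
  release(): void {
    const next = this.waiters.shift();
    if (next) {
      next();
    } else {
      this.available += 1;
    }
  }

  /** Convenience wrapper: hold a permit for the duration of task. */
  async run<T>(task: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }
}
```

Handing the permit directly to the next waiter in release(), instead of incrementing the counter first, is what keeps the queue FIFO: a newcomer cannot grab the permit ahead of a task that was already waiting.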

* feat: resume interrupted peer downloads via HTTP Range
  downloadFile now detects an existing .part file and sends Range: bytes=<size>-, validating the peer's 206 + Content-Range against the persisted expected size before appending. On 200 (range ignored), a Content-Range mismatch, or 416 it discards the stale .part and restarts from byte 0, emitting a restart progress event. The write path uses a node:fs FileHandle (append/write) with datasync at checkpoints, and the .part is preserved on error so the next attempt can resume. A truncated stream throws a retryable IncompleteDownloadError. Refs #13.
* feat: retry, semaphore, and download concurrency/retry config (#13 3/5) (#18)
* feat: add retry, semaphore, and download concurrency/retry config
* feat: track download attempts and expose stale rows for re-drive (#13 4/5) (#19)
* feat: track download attempts and expose stale rows for re-drive
* feat: bound, retry, and resume peer downloads end-to-end (#13 5/5) (#20)
* feat: bound, retry, and resume peer downloads end-to-end
* fix: harden startup re-enqueue dedupe (review feedback)
* fix: address retry review feedback
* fix: guard non-ok resume responses
* fix: avoid leaked peer download reader lock
* fix: close peer download handle on reader failure
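The resume rules in the first commit message above (206 with a matching Content-Range appends; 200, a mismatch, or 416 restarts from byte 0) can be written as a small pure function. The names `rangeHeaderFor` and `decideResume` are hypothetical, and two behaviors are assumptions not stated in the source: any other status is reported as "fail", and a Content-Range with an unknown total (`*`) is treated as a mismatch.

```typescript
export type ResumeDecision = "append" | "restart" | "fail";

/** Range header asking for everything from the end of the existing .part. */
export function rangeHeaderFor(partSize: number): string {
  return `bytes=${partSize}-`;
}

export function decideResume(
  status: number,
  contentRange: string | null,
  partSize: number,
  expectedBytes: number | null,
): ResumeDecision {
  // 200 means the peer ignored the Range header; 416 means the offset is not
  // satisfiable. Either way the stale .part cannot be trusted.
  if (status === 200 || status === 416) return "restart";
  if (status !== 206) return "fail"; // assumption: other statuses are errors
  // Content-Range for a satisfied range looks like "bytes 100-999/1000".
  const match = /^bytes (\d+)-(\d+)\/(\d+)$/.exec(contentRange ?? "");
  if (!match) return "restart";
  const start = Number(match[1]);
  const total = Number(match[3]);
  const startsAtPart = start === partSize;
  const totalMatches = expectedBytes === null || total === expectedBytes;
  return startsAtPart && totalMatches ? "append" : "restart";
}
```

Keeping the decision separate from the I/O makes each branch easy to test without a peer or a filesystem.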

roziscoding added 2 commits June 6, 2026 02:59

roziscoding mentioned this pull request Jun 6, 2026

feat: serve HTTP byte ranges on the peer file endpoint (#13 1/5) #16

Open

roziscoding marked this pull request as ready for review June 6, 2026 01:22

greptile-apps Bot reviewed Jun 6, 2026

Comment thread apps/backend/src/__tests__/downloads-service.test.ts

roziscoding merged commit f441524 into feat/harden-peer-downloads/4-attempts-schema Jun 6, 2026
6 checks passed

roziscoding deleted the feat/harden-peer-downloads/5-service-wiring branch June 6, 2026 09:47
