feat: bound, retry, and resume peer downloads end-to-end (#13 5/5) #20

Merged
roziscoding merged 2 commits into feat/harden-peer-downloads/4-attempts-schema from feat/harden-peer-downloads/5-service-wiring on Jun 6, 2026

@roziscoding (Owner) commented Jun 6, 2026

Stack 5/5 for #13 (top). Base: feat/harden-peer-downloads/4-attempts-schema (#19). Integrates PRs 2–4.

What this PR does

Wires the primitives together in DownloadsService and rewires startup, completing the hardened flow.

  • runDownload() runs every download through a shared Semaphore(maxConcurrentDownloads) and a retry() loop: increments attempts per attempt, classifies transient vs permanent (PR 3), and persists the restart event via markResumeReset (PR 4).
  • An active set (keyed by destPath) and a reenqueued set (torrent filenames) dedupe so the startup re-enqueue and the watcher never double-process the same file.
  • resumeStaleDownloads() actively re-drives stale downloading rows from the DB (resuming from their .part) instead of failing them. index.ts calls it before the watcher scans; reconcileStaleDownloads() is the fallback only when downloads is unconfigured.
  • The path-traversal guard on the peer-controlled release.filename and the atomic-rename completion are preserved.
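The first two bullets can be illustrated with a minimal sketch. It assumes a FIFO semaphore and an `active` set that is claimed synchronously, before the first `await`. `DownloadRunner`, `run`, and this `Semaphore` are hypothetical stand-ins, not the classes in this PR, and the retry loop and persistence are left out.

```typescript
// Hypothetical sketch: none of these names are taken from the PR's code.
class Semaphore {
  private readonly waiters: Array<() => void> = [];
  private available: number;

  constructor(permits: number) {
    this.available = permits;
  }

  async acquire(): Promise<void> {
    if (this.available > 0) {
      this.available -= 1;
      return;
    }
    // Queue in FIFO order; release() hands the permit to the oldest waiter.
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) {
      next(); // the permit transfers directly, so `available` is unchanged
    } else {
      this.available += 1;
    }
  }
}

class DownloadRunner {
  private readonly active = new Set<string>();
  private readonly semaphore: Semaphore;

  constructor(maxConcurrentDownloads: number) {
    this.semaphore = new Semaphore(maxConcurrentDownloads);
  }

  // Resolves to false when another run already owns destPath.
  async run(destPath: string, work: () => Promise<void>): Promise<boolean> {
    if (this.active.has(destPath)) return false;
    this.active.add(destPath); // claimed before any await, so no interleaving
    try {
      await this.semaphore.acquire();
      try {
        await work();
      } finally {
        this.semaphore.release();
      }
      return true;
    } finally {
      this.active.delete(destPath);
    }
  }
}
```

Claiming `destPath` before the first `await` is what lets a second caller for the same file bail out without creating a second writer, even while the first caller is still queued for a permit.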

Review-feedback fix (commit 5d38820)

A code review caught two edge cases in the re-enqueue dedupe, both fixed with regression tests:

  • Stale rows sharing a destPath are deduped (they share one .part); superseded duplicates are marked failed instead of silently orphaning a downloading row.
  • The reenqueued claim is released on successful resume (after the stub is unlinked, so no scan race), so a later legitimate re-drop of the same filename isn't silently skipped.
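The destPath dedupe in the first bullet can be pictured as a pure partition step. In this sketch the `StaleRow` shape and `partitionStaleRows` are hypothetical, and the choice that the first row seen for a destination wins is an assumption; the PR text does not say which duplicate is kept.

```typescript
// Hypothetical row shape; the real record has more fields.
interface StaleRow {
  id: number;
  destPath: string;
}

// Split stale rows into one resumable row per destPath and the superseded rest.
// Assumption: the first row encountered for a destPath is the one re-driven.
function partitionStaleRows(rows: StaleRow[]): {
  resumable: StaleRow[];
  superseded: StaleRow[];
} {
  const seen = new Set<string>();
  const resumable: StaleRow[] = [];
  const superseded: StaleRow[] = [];
  for (const row of rows) {
    if (seen.has(row.destPath)) {
      superseded.push(row); // shares a .part with an earlier row: mark failed
    } else {
      seen.add(row.destPath);
      resumable.push(row);
    }
  }
  return { resumable, superseded };
}
```

The caller would then mark every `superseded` row failed and re-drive only the `resumable` ones, so no row is left in `downloading` without an owner.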

Files

  • apps/backend/src/modules/downloads/downloads.service.ts (full rewrite)
  • apps/backend/src/index.ts (startup wiring)
  • apps/backend/src/__tests__/downloads-service.test.ts (full rewrite, 10 tests)

Testing

10 service tests: row lifecycle, no-row-on-metadata-fail, path traversal, permanent failure (no retry), transient retry-then-succeed, resume reset persisted, concurrency cap, stale re-drive, duplicate-destPath dedupe, re-enqueue-claim release. Full suite: 129/129 green, lint + typecheck clean.

Manual verification (post-merge)

Boot Jack with a leftover *.part + a stale downloading row and confirm it re-enqueues, issues a Range request, resumes, and renames into the completed folder.

Review focus

  • The semaphore/retry/resume wiring in runDownload/downloadWithRetry, the active/reenqueued dedupe lifecycle, and the index.ts startup ordering (resumeStaleDownloads() before watcher.start()).
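For reviewers tracing the `reenqueued` lifecycle, the sketch below models it as three operations: claim at startup, skip in the watcher, release after a successful resume. `ReenqueueClaims` and its method names are hypothetical; the service keeps this state inline rather than in a separate class.

```typescript
// Hypothetical helper illustrating the claim lifecycle only.
class ReenqueueClaims {
  private readonly reenqueued = new Set<string>();

  // Startup: claim each stale row's torrent filename before the watcher scans.
  claimAll(filenames: string[]): void {
    for (const name of filenames) this.reenqueued.add(name);
  }

  // Watcher: a claimed stub is owned by the startup re-enqueue, so skip it.
  shouldProcess(filename: string): boolean {
    return !this.reenqueued.has(filename);
  }

  // Successful resume: release once the stub has been unlinked, so a later
  // re-drop of the same filename is processed normally.
  release(filename: string): void {
    this.reenqueued.delete(filename);
  }
}
```

Because claims are taken before the watcher starts, the initial scan cannot race the re-enqueue for the same stub.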

Greptile Summary

[Linus Torvalds Mode] Oh look, someone finally decided to wire together all the primitives they've been lovingly building for the last four stacked PRs — if you're going to take five PRs to ship a download manager, at least make the fifth one actually work, which, grudgingly, this does.

This PR completes the hardened peer-download flow by integrating Semaphore, retry, and markResumeReset into DownloadsService:

  • runDownload / downloadWithRetry: every download acquires a Semaphore(maxConcurrentDownloads) permit, then runs inside retry() with exponential-backoff + jitter. Permanent errors (4xx non-429) are not retried; transient ones (5xx, 429, network errors, IncompleteDownloadError) are.
  • active + reenqueued dedup sets: active prevents duplicate rows/writers for concurrent live drops; reenqueued gates the watcher away from stubs owned by startup re-enqueue, cleared only on successful resume.
  • resumeStaleDownloads: dedupes downloading rows by destPath, marks superseded rows failed, fires background runDownload calls before watcher.start().
  • index.ts startup ordering: correct — resumeStaleDownloads() before watcher when configured; reconcileStaleDownloads() fallback when not.
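The retry policy summarized above amounts to a small classifier. The sketch below models failures as a tagged union purely for illustration; `DownloadFailure` and `isTransient` are hypothetical names, and the real classifier inspects thrown errors such as IncompleteDownloadError instead.

```typescript
// Hypothetical failure shape standing in for the errors the service sees.
type DownloadFailure =
  | { kind: "http"; status: number }
  | { kind: "network" }
  | { kind: "incomplete" } // stands in for IncompleteDownloadError
  | { kind: "other" };

// Transient failures are retried; everything else fails the download at once.
function isTransient(failure: DownloadFailure): boolean {
  switch (failure.kind) {
    case "network":
    case "incomplete":
      return true;
    case "http":
      // 429 and 5xx are worth retrying; any other status is permanent.
      return failure.status === 429 || (failure.status >= 500 && failure.status <= 599);
    case "other":
      return false;
  }
}
```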

Confidence Score: 5/5

[Linus Torvalds Mode] Five stacked PRs to ship a download manager — I've seen glaciers move faster. That said, this final piece holds together: semaphore, retry, and dedup logic are correct, startup ordering is right, and the tests cover the real edge cases. Safe to merge.

The single finding is a P2 type annotation in a test helper. No functional bugs found. The active.add synchronicity guarantee, reenqueued lifecycle, semaphore fairness, retry policy, and startup ordering are all correct. 10-test suite validates every meaningful invariant.

Only downloads-service.test.ts needs the minor type fix on the downloadsConfig helper; service and startup wiring are solid.

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/backend/src/modules/downloads/downloads.service.ts | Full rewrite wiring Semaphore concurrency, retry loop, active/reenqueued dedup sets, and resumeStaleDownloads; all edge cases handled correctly. |
| apps/backend/src/index.ts | Startup wiring correctly calls resumeStaleDownloads() before watcher.start(); fallback reconcileStaleDownloads() path is correct. |
| apps/backend/src/__tests__/downloads-service.test.ts | 10 well-targeted tests covering full lifecycle, retry, concurrency cap, stale re-drive, dedupe, and reenqueue-claim release; one minor type issue in the downloadsConfig helper. |

Sequence Diagram

sequenceDiagram
    participant idx as index.ts
    participant svc as DownloadsService
    participant sem as Semaphore
    participant repo as DownloadsRepository
    participant peer as PeerConnector

    idx->>svc: resumeStaleDownloads()
    svc->>repo: listStaleDownloads()
    repo-->>svc: stale DownloadRecord[]
    note over svc: Dedupe by destPath, mark superseded failed, add filenames to reenqueued
    svc-)svc: void runDownload(record) fire-and-forget
    svc-->>idx: resumed count

    idx->>svc: BlackholeWatcher triggers processTorrentFile
    note over svc: reenqueued.has? skip. active.has? skip.
    svc->>peer: getRelease(itemId)
    peer-->>svc: Release validated
    svc->>repo: create(downloadInput)
    repo-->>svc: DownloadRecord
    svc->>svc: runDownload(record)
    note over svc: active.add(destPath) synchronous
    svc->>sem: acquire()
    sem-->>svc: permit

    loop retry up to maxDownloadAttempts
        svc->>repo: incrementAttempts(id)
        svc->>peer: downloadFile(itemId, destPath, partPath, onProgress)
        peer->>svc: onProgress headers
        svc->>repo: setExpectedBytes
        peer->>svc: onProgress restart optional
        svc->>repo: markResumeReset
        peer->>svc: onProgress completed
        svc->>repo: markCompleted
        peer-->>svc: resolved or transient error
    end

    svc->>svc: unlink stub
    svc->>svc: triggerImport
    svc->>repo: markImportQueued
    svc->>svc: reenqueued.delete
    svc->>sem: release
    note over svc: active.delete in finally block

Reviews (1): Last reviewed commit: "fix: harden startup re-enqueue dedupe (r..."

Greptile also left 1 inline comment on this PR.

feat: bound, retry, and resume peer downloads end-to-end

Rewire DownloadsService around a shared Semaphore (maxConcurrentDownloads),
a retry loop (bounded backoff+jitter, attempts tracked, transient vs
permanent classification, Retry-After honored), and resume: the restart
progress event persists via markResumeReset, and an active/reenqueued
dedupe prevents duplicate rows. On startup, index.ts re-drives stale
downloading rows with resumeStaleDownloads() before the watcher scans,
falling back to reconcileStaleDownloads() when downloads is unconfigured.

Closes #13.

fix: harden startup re-enqueue dedupe (review feedback)

- Dedupe stale downloading rows by destPath before re-driving: only one
  row per destination is resumable (they share the same .part), so mark
  the superseded duplicates failed instead of letting the second silently
  early-return in runDownload and stay stuck in downloading.
- Release the reenqueued claim on successful resume (stub already
  unlinked, so no scan race) so a later legitimate re-drop of the same
  torrent filename is not silently skipped for the rest of the process.

Refs #13.
Comment thread apps/backend/src/__tests__/downloads-service.test.ts
@roziscoding roziscoding merged commit f441524 into feat/harden-peer-downloads/4-attempts-schema Jun 6, 2026
6 checks passed
@roziscoding roziscoding deleted the feat/harden-peer-downloads/5-service-wiring branch June 6, 2026 09:47
roziscoding added a commit that referenced this pull request Jun 6, 2026
…4/5) (#19)

* feat: track download attempts and expose stale rows for re-drive

Add an attempts column to the downloads table (additive migration) and
repository methods: incrementAttempts, markResumeReset (reset
downloadedBytes and record the resume-from-zero transition), and
listStaleDownloads (returns stale downloading rows without mutating
them, for active startup re-enqueue). reconcileStaleDownloads is kept
as the fallback for when downloads is unconfigured.

Refs #13.
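A minimal in-memory sketch of the three repository methods named in this commit message follows. The row shape, the `resumeResets` counter used to record the reset, and the rule that any row still in `downloading` counts as stale are all assumptions made for illustration; the real repository is backed by the downloads table.

```typescript
type DownloadStatus = "downloading" | "completed" | "failed";

// Hypothetical row shape; `resumeResets` is an assumed way to record resets.
interface DownloadRow {
  id: number;
  status: DownloadStatus;
  attempts: number;
  downloadedBytes: number;
  resumeResets: number;
}

class InMemoryDownloadsRepository {
  private readonly rows = new Map<number, DownloadRow>();

  insert(row: DownloadRow): void {
    this.rows.set(row.id, { ...row });
  }

  // Bumps the attempt counter and returns the new value.
  incrementAttempts(id: number): number {
    const row = this.mustGet(id);
    row.attempts += 1;
    return row.attempts;
  }

  // Records a resume-from-zero: progress is reset and the reset is counted.
  markResumeReset(id: number): void {
    const row = this.mustGet(id);
    row.downloadedBytes = 0;
    row.resumeResets += 1;
  }

  // Returns copies of rows still `downloading`, leaving stored rows untouched.
  listStaleDownloads(): DownloadRow[] {
    return [...this.rows.values()]
      .filter((row) => row.status === "downloading")
      .map((row) => ({ ...row }));
  }

  private mustGet(id: number): DownloadRow {
    const row = this.rows.get(id);
    if (!row) throw new Error(`download ${id} not found`);
    return row;
  }
}
```

Returning copies keeps `listStaleDownloads` read-only, which is the property the startup re-enqueue relies on.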

roziscoding added a commit that referenced this pull request Jun 6, 2026
…5) (#18)

* feat: add retry, semaphore, and download concurrency/retry config

Add a generic retry() helper (bounded attempts, exponential backoff with
full jitter, optional Retry-After override, injectable sleep/random) and
a download retry classifier (transient: network/timeout/5xx/429/incomplete
stream; permanent: non-429 4xx and others). Add a FIFO async Semaphore.
Extend DownloadsConfig with maxConcurrentDownloads and retry knobs (all
defaulted so existing configs keep parsing). Primitives are wired into
DownloadsService in a later change.

Refs #13.
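The retry() helper described in this commit message might look roughly like the sketch below: bounded attempts, exponential backoff with full jitter, an optional Retry-After hint, and injectable `sleep`/`random` for deterministic tests. The option names are hypothetical, and using the hint as-is without capping it is an assumption.

```typescript
// Hypothetical option names; only the behaviors come from the commit message.
interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
  maxDelayMs: number;
  isRetryable: (error: unknown) => boolean;
  retryAfterMs?: (error: unknown) => number | undefined;
  sleep: (ms: number) => Promise<void>;
  random: () => number; // expected to return a value in [0, 1)
}

async function retry<T>(
  fn: (attempt: number) => Promise<T>,
  opts: RetryOptions,
): Promise<T> {
  for (let attempt = 1; ; attempt += 1) {
    try {
      return await fn(attempt);
    } catch (error) {
      // Give up on permanent errors or once the attempt budget is spent.
      if (attempt >= opts.maxAttempts || !opts.isRetryable(error)) throw error;
      // Full jitter: pick uniformly in [0, cap), where cap doubles per attempt.
      const cap = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** (attempt - 1));
      const hinted = opts.retryAfterMs?.(error);
      await opts.sleep(hinted ?? Math.floor(opts.random() * cap));
    }
  }
}
```

Injecting `sleep` and `random` keeps tests fast and deterministic: a test can record the requested delays instead of waiting for them.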

* fix: address retry review feedback
roziscoding added a commit that referenced this pull request Jun 6, 2026
* feat: resume interrupted peer downloads via HTTP Range

downloadFile now detects an existing .part file and sends
Range: bytes=<size>-, validating the peer's 206 + Content-Range against
the persisted expected size before appending. On 200 (range ignored), a
Content-Range mismatch, or 416 it discards the stale .part and restarts
from byte 0, emitting a restart progress event. The write path uses a
node:fs FileHandle (append/write) with datasync at checkpoints, and the
.part is preserved on error so the next attempt can resume. A truncated
stream throws a retryable IncompleteDownloadError.

Refs #13.
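The resume decision described in this commit message can be sketched as a pure function over the response status and Content-Range header. `decideResume` and `parseContentRange` are hypothetical names. Two points are assumptions: checking only the range start and total (not the end), and treating any status other than 200, 206, or 416 as an error left to the retry classifier.

```typescript
type ResumeDecision = "append" | "restart" | "error";

interface ContentRange {
  start: number;
  end: number;
  total: number;
}

// Parses "bytes <start>-<end>/<total>"; unknown totals ("*") are rejected.
function parseContentRange(header: string | null): ContentRange | null {
  if (header === null) return null;
  const match = /^bytes (\d+)-(\d+)\/(\d+)$/.exec(header.trim());
  if (!match) return null;
  return { start: Number(match[1]), end: Number(match[2]), total: Number(match[3]) };
}

// Decides what to do with an existing .part of `partSize` bytes after sending
// `Range: bytes=<partSize>-` for a file expected to be `expectedBytes` long.
function decideResume(
  status: number,
  contentRange: string | null,
  partSize: number,
  expectedBytes: number,
): ResumeDecision {
  // 200 means the peer ignored the range; 416 means it could not satisfy it.
  // Either way the stale .part is discarded and the download restarts at 0.
  if (status === 200 || status === 416) return "restart";
  if (status !== 206) return "error";
  const range = parseContentRange(contentRange);
  if (range === null) return "restart";
  const matches = range.start === partSize && range.total === expectedBytes;
  return matches ? "append" : "restart";
}
```

A "restart" outcome is the point at which the restart progress event would be emitted, so the service can persist the reset.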

* fix: guard non-ok resume responses

* fix: avoid leaked peer download reader lock

* fix: close peer download handle on reader failure