feat: retry, semaphore, and download concurrency/retry config (#13 3/5)#18

Merged
roziscoding merged 3 commits into feat/harden-peer-downloads/2-client-resume from feat/harden-peer-downloads/3-retry-concurrency
Jun 6, 2026
@roziscoding roziscoding commented Jun 6, 2026

Stack 3/5 for #13. Base: feat/harden-peer-downloads/2-client-resume (#17).

What this PR does

Adds the two cross-cutting primitives the hardened download flow needs, plus their config knobs. These are standalone and unit-tested here; they're wired into DownloadsService in PR 5.

  • retry() (src/lib/retry.ts) — bounded attempts, exponential backoff with full jitter, optional Retry-After override (capped at maxDelayMs), and injectable sleep/random for deterministic tests.
  • isTransientDownloadError / downloadRetryAfterMs (src/modules/downloads/retry-policy.ts) — transient (retry): network failures, timeouts, HTTP 5xx, 429, and IncompleteDownloadError; permanent (no retry): non-429 4xx and everything else. Parses Retry-After (seconds or HTTP-date).
  • Semaphore (src/lib/semaphore.ts) — FIFO async counting semaphore with direct permit handoff; throws on a non-positive permit count.
  • DownloadsConfig gains maxConcurrentDownloads (default 3), maxDownloadAttempts (5), retryBaseDelayMs (1000), retryMaxDelayMs (60000) — all defaulted so existing configs keep parsing. examples/config.jsonc documents them.
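
The retry() behavior described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in src/lib/retry.ts: the function name retrySketch, the RetryOptions shape, and every option name are assumptions made for the example. It shows bounded attempts, an exponential ceiling with full jitter (a uniformly random delay between 0 and the ceiling), a Retry-After style override capped at maxDelayMs, and injectable sleep/random.

```typescript
// Hypothetical option names; the real retry() signature may differ.
interface RetryOptions {
  maxAttempts: number; // total attempts, including the first one
  baseDelayMs: number;
  maxDelayMs: number;
  isRetryable: (err: unknown) => boolean;
  retryAfterMs?: (err: unknown) => number | undefined;
  sleep?: (ms: number) => Promise<void>; // injectable for deterministic tests
  random?: () => number; // injectable, expected to return a value in [0, 1)
}

async function retrySketch<T>(
  fn: (attempt: number) => Promise<T>,
  opts: RetryOptions,
): Promise<T> {
  // Validate before the first attempt is made.
  if (!Number.isInteger(opts.maxAttempts) || opts.maxAttempts < 1) {
    throw new RangeError("maxAttempts must be an integer >= 1");
  }
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const random = opts.random ?? Math.random;

  for (let attempt = 1; ; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      // Rethrow directly on exhaustion or on a permanent error.
      if (attempt >= opts.maxAttempts || !opts.isRetryable(err)) throw err;

      // Exponential ceiling: base * 2^(attempt - 1), capped at maxDelayMs.
      const ceiling = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** (attempt - 1));
      const hinted = opts.retryAfterMs?.(err);
      const delay =
        hinted !== undefined
          ? Math.min(hinted, opts.maxDelayMs) // server hint wins, but stays capped
          : random() * ceiling; // full jitter
      await sleep(delay);
    }
  }
}
```

With the defaults listed above (retryBaseDelayMs 1000, retryMaxDelayMs 60000), the jitter ceiling in this sketch would be 1000 ms after the first failure, 2000 ms after the second, and so on until it reaches the cap.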

Files

  • apps/backend/src/lib/retry.ts, apps/backend/src/lib/semaphore.ts (new)
  • apps/backend/src/modules/downloads/retry-policy.ts (new)
  • apps/backend/src/lib/config.ts, examples/config.jsonc
  • apps/backend/src/__tests__/retry.test.ts, semaphore.test.ts (new); config.test.ts (+2)
  • apps/backend/src/__tests__/{downloads-api,integration}.test.ts — config literals switched to AppConfig.parse({...}) because the new defaulted fields are required on the output type.
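
To illustrate why defaulted fields stay optional on the input but are required on the parsed output, here is a hand-rolled sketch without any schema library. The names DownloadsRetryConfig, parseDownloadsRetryConfig, and requireIntAtLeast are hypothetical and the project's actual schema code is not shown in this PR page. The defaults (3, 5, 1000, 60000) and the minimums for maxConcurrentDownloads and the two delay fields come from the description and review summary; the minimum of 1 for maxDownloadAttempts is an assumption.

```typescript
// Output type: every field is present after parsing.
interface DownloadsRetryConfig {
  maxConcurrentDownloads: number;
  maxDownloadAttempts: number;
  retryBaseDelayMs: number;
  retryMaxDelayMs: number;
}

const DEFAULTS: DownloadsRetryConfig = {
  maxConcurrentDownloads: 3,
  maxDownloadAttempts: 5,
  retryBaseDelayMs: 1000,
  retryMaxDelayMs: 60000,
};

function requireIntAtLeast(name: string, value: number, min: number): void {
  if (!Number.isInteger(value) || value < min) {
    throw new RangeError(`${name} must be an integer >= ${min}`);
  }
}

// Input type: every field is optional, so existing configs keep parsing.
function parseDownloadsRetryConfig(
  input: Partial<DownloadsRetryConfig> = {},
): DownloadsRetryConfig {
  const config: DownloadsRetryConfig = {
    maxConcurrentDownloads: input.maxConcurrentDownloads ?? DEFAULTS.maxConcurrentDownloads,
    maxDownloadAttempts: input.maxDownloadAttempts ?? DEFAULTS.maxDownloadAttempts,
    retryBaseDelayMs: input.retryBaseDelayMs ?? DEFAULTS.retryBaseDelayMs,
    retryMaxDelayMs: input.retryMaxDelayMs ?? DEFAULTS.retryMaxDelayMs,
  };
  requireIntAtLeast("maxConcurrentDownloads", config.maxConcurrentDownloads, 1);
  requireIntAtLeast("maxDownloadAttempts", config.maxDownloadAttempts, 1); // assumed minimum
  requireIntAtLeast("retryBaseDelayMs", config.retryBaseDelayMs, 0); // 0 allowed for tests
  requireIntAtLeast("retryMaxDelayMs", config.retryMaxDelayMs, 0);
  return config;
}
```

A test that builds the config as a plain object literal of the output type would have to spell out all four fields, which is presumably why the test literals were routed through the parser instead.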

Testing

12 retry/classifier tests, 4 semaphore tests, 2 config tests. Full suite green.

Review focus

  • Backoff/jitter math and the Retry-After cap; the transient-vs-permanent classification boundaries.
  • Semaphore FIFO fairness and permit release on throw (no permit leak).
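
For reviewers checking the fairness and no-leak properties, a FIFO counting semaphore with direct permit handoff might look like the sketch below. The class name SemaphoreSketch and its private method names are hypothetical; only run(), the FIFO ordering, the handoff, the try/finally release, and the constructor check are taken from the descriptions on this page.

```typescript
class SemaphoreSketch {
  private available: number;
  private readonly waiters: Array<() => void> = []; // FIFO queue of blocked acquirers

  constructor(permits: number) {
    if (!Number.isInteger(permits) || permits < 1) {
      throw new RangeError("permits must be a positive integer");
    }
    this.available = permits;
  }

  private acquire(): Promise<void> {
    if (this.available > 0) {
      this.available--;
      return Promise.resolve();
    }
    // No permit free: wait in arrival order.
    return new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  private release(): void {
    const next = this.waiters.shift();
    if (next) {
      // Direct handoff: the permit goes straight to the oldest waiter and
      // never returns to the pool, so a late arrival cannot jump the queue.
      next();
    } else {
      this.available++;
    }
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release(); // runs on success and on throw, so no permit leaks
    }
  }
}
```

The handoff matters for fairness: if release() incremented the counter and merely woke a waiter, a task calling run() in the same tick could take the permit before the woken waiter resumed.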

Greptile Summary

This PR introduces two cross-cutting primitives — a bounded-retry helper with exponential backoff/full-jitter and a FIFO counting semaphore — plus a download retry classification policy and four new DownloadsConfig knobs. DownloadsService is refactored to wire these together: concurrent downloads are now capped by a semaphore, transient failures are retried with Retry-After support, and stale rows from prior runs are actively re-driven instead of bulk-failed.

  • retry.ts / semaphore.ts: New standalone primitives with injected sleep/random for deterministic testing; retry validates maxAttempts ≥ 1 eagerly and uses AWS full-jitter backoff; Semaphore hands permits directly to FIFO waiters and releases correctly on throw.
  • retry-policy.ts: isTransientDownloadError correctly classifies 5xx/429/network TypeError/TimeoutError/IncompleteDownloadError as transient and manual AbortError as permanent; downloadRetryAfterMs parses both seconds and HTTP-date Retry-After headers.
  • downloads.service.ts: Adds resumeStaleDownloads() with destPath deduplication, reenqueued guard to block the watcher from duplicating resuming stubs, and retry-wrapped downloadWithRetry; the active Set check and add happen with no await in between, so there is no TOCTOU race in the duplicate-destPath guard.
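
The classification and Retry-After parsing summarized above could be approximated like this. It is a sketch under assumptions: HttpStatusError is a hypothetical error type invented for the example, matching errors by name is a guess at how the real classifier distinguishes them, and isTransientSketch and parseRetryAfterMs are illustrative names rather than the functions in retry-policy.ts.

```typescript
// Hypothetical error type carrying the HTTP status of a failed response.
class HttpStatusError extends Error {
  status: number;
  constructor(status: number) {
    super(`HTTP ${status}`);
    this.name = "HttpStatusError";
    this.status = status;
  }
}

function isTransientSketch(err: unknown): boolean {
  if (err instanceof HttpStatusError) {
    return err.status >= 500 || err.status === 429; // other 4xx are permanent
  }
  if (err instanceof Error) {
    if (err.name === "AbortError") return false; // manual abort: do not retry
    if (err.name === "TimeoutError") return true;
    if (err.name === "IncompleteDownloadError") return true; // truncated stream
    if (err instanceof TypeError) return true; // fetch reports network failures as TypeError
  }
  return false; // everything else is treated as permanent
}

// Retry-After is either a number of seconds or an HTTP-date.
function parseRetryAfterMs(
  header: string | undefined,
  now: number = Date.now(),
): number | undefined {
  if (header === undefined) return undefined;
  const trimmed = header.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  const date = Date.parse(trimmed);
  // A date in the past floors to 0 instead of producing a negative delay.
  return Number.isNaN(date) ? undefined : Math.max(0, date - now);
}
```

Trying the seconds form first keeps a bare integer such as "120" from being handed to the date parser.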

Confidence Score: 5/5

Safe to merge. The new primitives are standalone, well-tested, and the service wiring is correct.

The retry helper, semaphore, and retry policy are all independently unit-tested with injected dependencies. AbortError is correctly classified as permanent. The FIFO semaphore releases permits correctly on throw. The active/reenqueued guards in DownloadsService are set synchronously before any await, preventing TOCTOU races. All issues raised in prior review rounds have been addressed.
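
The TOCTOU argument rests on JavaScript's run-to-completion semantics: as long as the check and the claim sit in the same synchronous stretch with no await between them, no other task can interleave. A minimal sketch of that pattern, using a hypothetical runOnce helper and activeDestPaths set rather than the service's actual code:

```typescript
const activeDestPaths = new Set<string>();

// Runs `work` unless another call already holds the same destPath.
// Returns false when the call was skipped as a duplicate.
async function runOnce(destPath: string, work: () => Promise<void>): Promise<boolean> {
  if (activeDestPaths.has(destPath)) return false; // check
  activeDestPaths.add(destPath); // claim, with no await since the check
  try {
    await work();
    return true;
  } finally {
    activeDestPaths.delete(destPath); // release even when work() throws
  }
}
```

If an await were placed between has() and add(), two callers could both pass the check before either claimed the path.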

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/backend/src/lib/retry.ts | New retry primitive: eager maxAttempts validation, infinite loop with direct throw on exhaustion (no dead-code lastError), full AWS jitter, Retry-After cap at maxDelayMs, injectable sleep/random. Logic is correct. |
| apps/backend/src/lib/semaphore.ts | FIFO counting semaphore with direct permit handoff; run() uses try/finally for guaranteed release on throw; constructor rejects non-positive permits. Correct. |
| apps/backend/src/modules/downloads/retry-policy.ts | Correct transient/permanent classification: AbortError is permanent, TimeoutError is transient; Retry-After parsed as seconds then HTTP-date with Math.max(0,...) floor. No issues. |
| apps/backend/src/modules/downloads/downloads.service.ts | Refactored to use semaphore + retry; reenqueued/active sets correctly guard against duplicates; resumeStaleDownloads deduplicates by destPath and claims stubs synchronously before the watcher starts. triggerImport never throws (internal try/catch), so reenqueued.delete always runs on success. |
| apps/backend/src/lib/config.ts | Four new DownloadsConfig fields with defaults (maxConcurrentDownloads: min 1, retryBaseDelayMs/retryMaxDelayMs: min 0 for test support). Existing configs continue to parse. |
| apps/backend/src/__tests__/retry.test.ts | Comprehensive: covers first-success, retryable failure, permanent failure, exhausted attempts, retryAfterMs override, invalid maxAttempts, error classification, and both Retry-After parse paths with mocked Date.now. |
| apps/backend/src/__tests__/semaphore.test.ts | Four tests cover: concurrency limit enforcement, FIFO ordering, permit release on throw, and constructor validation. |
| apps/backend/src/__tests__/downloads-service.test.ts | Extended with new tests for permanent-error no-retry, transient-error retry-then-succeed, concurrency limiting, stale resume, destPath deduplication, and reenqueued claim lifecycle. |
| apps/backend/src/modules/downloads/downloads.repository.ts | Adds incrementAttempts (atomic SQL increment), markResumeReset, and listStaleDownloads alongside the existing reconcileStaleDownloads. Clean additions. |
| apps/backend/drizzle/0001_tearful_the_fallen.sql | Adds attempts INTEGER DEFAULT 0 NOT NULL column to downloads table. Safe additive migration. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[processTorrentFile] --> B{reenqueued?}
    B -- yes --> Z1[skip]
    B -- no --> C{file exists?}
    C -- no --> Z2[skip]
    C -- yes --> D[parse stub]
    D --> E{active.has destPath?}
    E -- yes --> Z3[skip duplicate]
    E -- no --> F[create DB row]
    F --> G[runDownload]
    R[resumeStaleDownloads] --> S[listStaleDownloads]
    S --> T[dedupe by destPath]
    T --> U[claim reenqueued set]
    U --> G
    G --> H{active.has destPath?}
    H -- yes --> Z4[return]
    H -- no --> I[active.add]
    I --> J[semaphore.run downloadWithRetry]
    J --> K{peer found?}
    K -- no --> Z5[markFailed]
    K -- yes --> L[retry: incrementAttempts + downloadFile]
    L --> M{success?}
    M -- yes --> N[unlink + triggerImport + markImportQueued + reenqueued.delete]
    M -- transient --> O[sleep jitter or Retry-After]
    O --> L
    M -- permanent --> P[markFailed]
    J --> Q[finally: active.delete]

Reviews (3): Last reviewed commit: "fix: address retry review feedback"

Add a generic retry() helper (bounded attempts, exponential backoff with
full jitter, optional Retry-After override, injectable sleep/random) and
a download retry classifier (transient: network/timeout/5xx/429/incomplete
stream; permanent: non-429 4xx and others). Add a FIFO async Semaphore.
Extend DownloadsConfig with maxConcurrentDownloads and retry knobs (all
defaulted so existing configs keep parsing). Primitives are wired into
DownloadsService in a later change.

Refs #13.
Comment thread apps/backend/src/modules/downloads/retry-policy.ts Outdated
Comment thread apps/backend/src/lib/retry.ts Outdated
Comment thread apps/backend/src/__tests__/retry.test.ts
roziscoding and others added 2 commits June 6, 2026 11:48
…4/5) (#19)

* feat: track download attempts and expose stale rows for re-drive

Add an attempts column to the downloads table (additive migration) and
repository methods: incrementAttempts, markResumeReset (reset
downloadedBytes and record the resume-from-zero transition), and
listStaleDownloads (returns stale downloading rows without mutating
them, for active startup re-enqueue). reconcileStaleDownloads is kept
as the fallback for when downloads is unconfigured.

Refs #13.

* feat: bound, retry, and resume peer downloads end-to-end (#13 5/5) (#20)

* feat: bound, retry, and resume peer downloads end-to-end

Rewire DownloadsService around a shared Semaphore (maxConcurrentDownloads),
a retry loop (bounded backoff+jitter, attempts tracked, transient vs
permanent classification, Retry-After honored), and resume: the restart
progress event persists via markResumeReset, and an active/reenqueued
dedupe prevents duplicate rows. On startup, index.ts re-drives stale
downloading rows with resumeStaleDownloads() before the watcher scans,
falling back to reconcileStaleDownloads() when downloads is unconfigured.

Closes #13.

* fix: harden startup re-enqueue dedupe (review feedback)

- Dedupe stale downloading rows by destPath before re-driving: only one
  row per destination is resumable (they share the same .part), so mark
  the superseded duplicates failed instead of letting the second silently
  early-return in runDownload and stay stuck in downloading.
- Release the reenqueued claim on successful resume (stub already
  unlinked, so no scan race) so a later legitimate re-drop of the same
  torrent filename is not silently skipped for the rest of the process.

Refs #13.
@roziscoding roziscoding merged commit ce79179 into feat/harden-peer-downloads/2-client-resume Jun 6, 2026
6 checks passed
@roziscoding roziscoding deleted the feat/harden-peer-downloads/3-retry-concurrency branch June 6, 2026 10:29
roziscoding added a commit that referenced this pull request Jun 6, 2026
* feat: resume interrupted peer downloads via HTTP Range

downloadFile now detects an existing .part file and sends
Range: bytes=<size>-, validating the peer's 206 + Content-Range against
the persisted expected size before appending. On 200 (range ignored), a
Content-Range mismatch, or 416 it discards the stale .part and restarts
from byte 0, emitting a restart progress event. The write path uses a
node:fs FileHandle (append/write) with datasync at checkpoints, and the
.part is preserved on error so the next attempt can resume. A truncated
stream throws a retryable IncompleteDownloadError.

Refs #13.

* feat: retry, semaphore, and download concurrency/retry config (#13 3/5) (#18)

* feat: add retry, semaphore, and download concurrency/retry config

Add a generic retry() helper (bounded attempts, exponential backoff with
full jitter, optional Retry-After override, injectable sleep/random) and
a download retry classifier (transient: network/timeout/5xx/429/incomplete
stream; permanent: non-429 4xx and others). Add a FIFO async Semaphore.
Extend DownloadsConfig with maxConcurrentDownloads and retry knobs (all
defaulted so existing configs keep parsing). Primitives are wired into
DownloadsService in a later change.

Refs #13.

* feat: track download attempts and expose stale rows for re-drive (#13 4/5) (#19)

* feat: track download attempts and expose stale rows for re-drive

Add an attempts column to the downloads table (additive migration) and
repository methods: incrementAttempts, markResumeReset (reset
downloadedBytes and record the resume-from-zero transition), and
listStaleDownloads (returns stale downloading rows without mutating
them, for active startup re-enqueue). reconcileStaleDownloads is kept
as the fallback for when downloads is unconfigured.

Refs #13.

* feat: bound, retry, and resume peer downloads end-to-end (#13 5/5) (#20)

* feat: bound, retry, and resume peer downloads end-to-end

Rewire DownloadsService around a shared Semaphore (maxConcurrentDownloads),
a retry loop (bounded backoff+jitter, attempts tracked, transient vs
permanent classification, Retry-After honored), and resume: the restart
progress event persists via markResumeReset, and an active/reenqueued
dedupe prevents duplicate rows. On startup, index.ts re-drives stale
downloading rows with resumeStaleDownloads() before the watcher scans,
falling back to reconcileStaleDownloads() when downloads is unconfigured.

Closes #13.

* fix: harden startup re-enqueue dedupe (review feedback)

- Dedupe stale downloading rows by destPath before re-driving: only one
  row per destination is resumable (they share the same .part), so mark
  the superseded duplicates failed instead of letting the second silently
  early-return in runDownload and stay stuck in downloading.
- Release the reenqueued claim on successful resume (stub already
  unlinked, so no scan race) so a later legitimate re-drop of the same
  torrent filename is not silently skipped for the rest of the process.

Refs #13.

* fix: address retry review feedback

* fix: guard non-ok resume responses

* fix: avoid leaked peer download reader lock

* fix: close peer download handle on reader failure