Skip to content

feat: resume interrupted peer downloads via HTTP Range (#13 2/5)#17

Merged
roziscoding merged 5 commits into
feat/harden-peer-downloads/1-range-servingfrom
feat/harden-peer-downloads/2-client-resume
Jun 6, 2026
Merged

feat: resume interrupted peer downloads via HTTP Range (#13 2/5)#17
roziscoding merged 5 commits into
feat/harden-peer-downloads/1-range-servingfrom
feat/harden-peer-downloads/2-client-resume

Conversation

@roziscoding
Copy link
Copy Markdown
Owner

@roziscoding roziscoding commented Jun 6, 2026

Stack 2/5 for #13. Base: feat/harden-peer-downloads/1-range-serving (#16). Consumes the range-serving from PR 1.

What this PR does

Rewrites PeerConnector.downloadFile() so an interrupted download resumes from its existing .part file instead of restarting from byte 0.

  • Detects an existing .part, sends Range: bytes=<existingSize>-, and validates the peer's 206 + Content-Range (start matches the offset; total matches the persisted expected size) before appending.
  • Restarts cleanly from byte 0 — discarding the stale .part and emitting a restart progress event — when the peer ignores the range (200), returns a mismatched Content-Range total, or returns 416.
  • Switches the write path from Bun.file().writer() (which truncates) to a node:fs FileHandle opened in append/write mode, with datasync() at progress checkpoints for durable resume bytes.
  • Preserves the .part on error (no longer unlinks it) so the next attempt can resume; the atomic rename to the final path is retained, so a failed download never lands in the completed folder.
  • A truncated stream now throws a typed IncompleteDownloadError (classified retryable in PR 3). A 206 to a non-range (fresh) request is rejected so a partial body is never renamed as the whole file.
  • The 100 GB guard remains effective on the resume path (downloadedBytes is seeded from the local .part size).

Files

  • apps/backend/src/lib/servers/peer.ts
  • apps/backend/src/lib/errors/IncompleteDownloadError.ts (new)
  • apps/backend/src/modules/downloads/downloads.service.ts (temporary restart no-op branch to keep typecheck green; superseded in PR 5)
  • apps/backend/src/__tests__/peer-download.test.ts

Testing

6 new resume tests (append on 206, restart on 200/mismatch/416, .part preserved on incomplete stream, 206-for-fresh rejected) plus the 5 pre-existing download tests still pass.

Review focus

  • The 206/200/416/mismatch control flow and the resuming decision.
  • FileHandle append-vs-write selection, datasync cadence (checkpoint-only, not per-chunk), and reader/handle cleanup on all paths.
  • Note: await response.body?.cancel() is fired without awaiting on the restart path to avoid an already-closed-stream hang; correctness comes from the subsequent unlink + fresh fetch.

Greptile Summary

This PR rewrites PeerConnector.downloadFile() to resume interrupted downloads from an existing .part file via Range requests, rather than always restarting from byte 0. It also adds typed IncompleteDownloadError, retry integration, stale-download re-drive, and a suite of 6 new resume tests covering all the key HTTP response branches.

  • Resume logic: Reads .part size → sends Range: bytes=N- → validates 206/Content-Range start and total before appending; falls back to a fresh download on 200 (range ignored), Content-Range total mismatch, or 416; rejects non-ok non-416 responses immediately with FetchError.
  • Write path: Switched from Bun.file().writer() (truncates) to node:fs FileHandle opened in 'a' (append) or 'w' (fresh), with datasync() at progress checkpoints; open() is acquired before getReader() so a failed open() never leaks a reader lock.
  • Durability & atomics: .part is preserved on error (enables resume on retry); rename() to destPath is only reached after a complete byte-count match, so a partial stream never lands in the completed folder.

Confidence Score: 5/5

The PR is safe to merge. The resume control flow is well-guarded, the two issues flagged in the previous review round are both fixed, and every key branch is covered by a corresponding test.

The 206/200/416/non-ok branching is correct and exhaustive: non-ok resumes throw FetchError before any file I/O, the fresh-200 restart discards and re-downloads cleanly, and the 206-for-non-range request is explicitly rejected. FileHandle acquisition now precedes getReader() so a failed open cannot strand a locked reader. The .part is preserved on all error paths, and the atomic rename is only reached after a full byte-count match.

apps/backend/src/lib/servers/peer.ts — the expectedBytesSource label on the resume path and the schema check constraint (snapshot 0001) that would need updating if that label is ever corrected.

Important Files Changed

Filename Overview
apps/backend/src/lib/servers/peer.ts Core resume logic rewrite: correctly handles 206/200/416/non-ok branches, acquires FileHandle before reader to prevent lock leaks, and preserves .part on error.
apps/backend/src/lib/errors/IncompleteDownloadError.ts New typed error class extending AppError; wires into retry-policy as a retryable error, enabling clean resume-on-retry semantics for truncated streams.
apps/backend/src/modules/downloads/downloads.service.ts Adds retry loop with semaphore, stale-download re-drive via resumeStaleDownloads, and restart progress event handling.
apps/backend/src/tests/peer-download.test.ts Comprehensive test coverage for all resume branches: append on 206, restart on 200/mismatch/416, non-ok resume rejected, .part preserved on incomplete stream, and 206-for-fresh rejected.
apps/backend/src/modules/downloads/retry-policy.ts Added IncompleteDownloadError as a transient/retryable error; other classification logic is unchanged and correct.
apps/backend/drizzle/0001_tearful_the_fallen.sql Adds the attempts column (integer, NOT NULL, DEFAULT 0) to the downloads table; straightforward additive migration.

Comments Outside Diff (1)

  1. apps/backend/src/lib/servers/peer.ts, line 258-273 (link)

    P2 expectedBytesSource mislabelled on resume

    Congratulations, you've invented content_length as a synonym for content_range. This is the kind of creative redefinition that makes observability dashboards completely useless.

    On the resume path, expectedBytes is parsed from Content-Range, not Content-Length. Yet expectedBytesSource is hardcoded to 'content_length' in both the span attribute (line 261) and the onProgress headers event (line 272). A caller that later adds a 'content_range' source variant will silently get the wrong value today.

    Consider either extending the source type to include 'content_range' and branching on resuming, or at minimum documenting that the field is intentionally coarse for now.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

    Fix in Claude Code Fix in Codex

Reviews (4): Last reviewed commit: "fix: close peer download handle on reade..." | Re-trigger Greptile

downloadFile now detects an existing .part file and sends
Range: bytes=<size>-, validating the peer's 206 + Content-Range against
the persisted expected size before appending. On 200 (range ignored), a
Content-Range mismatch, or 416 it discards the stale .part and restarts
from byte 0, emitting a restart progress event. The write path uses a
node:fs FileHandle (append/write) with datasync at checkpoints, and the
.part is preserved on error so the next attempt can resume. A truncated
stream throws a retryable IncompleteDownloadError.

Refs #13.
Comment thread apps/backend/src/lib/servers/peer.ts
roziscoding and others added 2 commits June 6, 2026 12:29
…5) (#18)

* feat: add retry, semaphore, and download concurrency/retry config

Add a generic retry() helper (bounded attempts, exponential backoff with
full jitter, optional Retry-After override, injectable sleep/random) and
a download retry classifier (transient: network/timeout/5xx/429/incomplete
stream; permanent: non-429 4xx and others). Add a FIFO async Semaphore.
Extend DownloadsConfig with maxConcurrentDownloads and retry knobs (all
defaulted so existing configs keep parsing). Primitives are wired into
DownloadsService in a later change.

Refs #13.

* feat: track download attempts and expose stale rows for re-drive (#13 4/5) (#19)

* feat: track download attempts and expose stale rows for re-drive

Add an attempts column to the downloads table (additive migration) and
repository methods: incrementAttempts, markResumeReset (reset
downloadedBytes and record the resume-from-zero transition), and
listStaleDownloads (returns stale downloading rows without mutating
them, for active startup re-enqueue). reconcileStaleDownloads is kept
as the fallback for when downloads is unconfigured.

Refs #13.

* feat: bound, retry, and resume peer downloads end-to-end (#13 5/5) (#20)

* feat: bound, retry, and resume peer downloads end-to-end

Rewire DownloadsService around a shared Semaphore (maxConcurrentDownloads),
a retry loop (bounded backoff+jitter, attempts tracked, transient vs
permanent classification, Retry-After honored), and resume: the restart
progress event persists via markResumeReset, and an active/reenqueued
dedupe prevents duplicate rows. On startup, index.ts re-drives stale
downloading rows with resumeStaleDownloads() before the watcher scans,
falling back to reconcileStaleDownloads() when downloads is unconfigured.

Closes #13.

* fix: harden startup re-enqueue dedupe (review feedback)

- Dedupe stale downloading rows by destPath before re-driving: only one
  row per destination is resumable (they share the same .part), so mark
  the superseded duplicates failed instead of letting the second silently
  early-return in runDownload and stay stuck in downloading.
- Release the reenqueued claim on successful resume (stub already
  unlinked, so no scan race) so a later legitimate re-drop of the same
  torrent filename is not silently skipped for the rest of the process.

Refs #13.

* fix: address retry review feedback
Comment thread apps/backend/src/lib/servers/peer.ts Outdated
@roziscoding roziscoding merged commit 4a3cb0f into feat/harden-peer-downloads/1-range-serving Jun 6, 2026
6 checks passed
@roziscoding roziscoding deleted the feat/harden-peer-downloads/2-client-resume branch June 6, 2026 11:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant