feat: track download attempts and expose stale rows for re-drive (#13 4/5) #19

Merged
roziscoding merged 2 commits into feat/harden-peer-downloads/3-retry-concurrency from feat/harden-peer-downloads/4-attempts-schema
Jun 6, 2026

roziscoding (Owner) commented Jun 6, 2026

Stack 4/5 for #13. Base: feat/harden-peer-downloads/3-retry-concurrency (#18).

What this PR does

Adds the persistence the startup re-enqueue and retry accounting need (consumed in PR 5). No call sites change here, so it ships independently with green tests.

  • Adds an attempts column to the downloads table (additive Drizzle migration 0001_tearful_the_fallen.sql: ALTER TABLE downloads ADD attempts integer DEFAULT 0 NOT NULL). Migrations auto-apply at DB open, so the test suite exercises it on fresh DBs.
  • DownloadsRepository gains:
    • incrementAttempts(id) — atomic attempts = attempts + 1, returns the new count.
    • markResumeReset(id) — resets downloadedBytes to 0 and records the resume-from-zero transition (status stays downloading).
    • listStaleDownloads() — returns stale downloading rows without mutating them, for active re-drive.
  • reconcileStaleDownloads() is intentionally retained as the fallback for when downloads is unconfigured.
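The semantics of the three new repository methods can be modeled without a database. The sketch below is an in-memory stand-in, not the Drizzle implementation from this PR: the `DownloadRow` shape, the `InMemoryDownloadsRepository` class, and the message written by `markResumeReset` are assumptions made for illustration.

```typescript
// In-memory model of the repository semantics described above.
// Assumption: the row shape and class name are illustrative only.
type DownloadStatus = "downloading" | "completed" | "failed";

interface DownloadRow {
  id: number;
  status: DownloadStatus;
  downloadedBytes: number;
  attempts: number;
  error: string | null;
}

class InMemoryDownloadsRepository {
  private rows = new Map<number, DownloadRow>();

  insert(row: DownloadRow): void {
    this.rows.set(row.id, { ...row });
  }

  // Mirrors `attempts = attempts + 1` and returns the new count.
  // Returns 0 when the id is unknown (the silent-zero behavior noted in review).
  incrementAttempts(id: number): number {
    const row = this.rows.get(id);
    if (!row) return 0;
    row.attempts += 1;
    return row.attempts;
  }

  // Resets progress and records the resume-from-zero transition; status is untouched.
  markResumeReset(id: number): void {
    const row = this.rows.get(id);
    if (!row) return;
    row.downloadedBytes = 0;
    row.error = "resume reset: restarted from byte 0"; // hypothetical message
  }

  // Read-only: returns copies so callers cannot mutate stored rows.
  listStaleDownloads(): DownloadRow[] {
    return Array.from(this.rows.values())
      .filter((row) => row.status === "downloading")
      .map((row) => ({ ...row }));
  }
}
```

Returning copies from `listStaleDownloads` is one way to make the "without mutating them" guarantee hold even if a caller edits the result.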

Files

  • apps/backend/src/database/schema.ts
  • apps/backend/drizzle/0001_tearful_the_fallen.sql + drizzle/meta/* (generated)
  • apps/backend/src/modules/downloads/downloads.repository.ts
  • apps/backend/src/__tests__/database.test.ts (+2)

Testing

2 new repository tests (attempts increment + resume reset; stale listing without mutation). Full suite green.

Review focus

  • Migration is additive with a default (safe for existing rows).
  • incrementAttempts uses a parameterized Drizzle sql`${downloads.attempts} + 1` column reference (no injection); listStaleDownloads is read-only.

Greptile Summary

[Linus Torvalds Mode]

Congratulations, you wrote a stack-PR that is actually coherent — I almost had to check if someone else wrote it. Fine, it moves, it doesn't explode, let's talk about what it actually does.

This PR (4/5 of the harden-peer-downloads stack) adds the persistence layer that the upcoming retry-and-re-enqueue machinery in PR 5 will consume. The core deliverables are:

  • Schema/migration: Additive ALTER TABLE downloads ADD attempts INTEGER DEFAULT 0 NOT NULL — safe for existing rows, no drama.
  • incrementAttempts(id): Atomic attempts = attempts + 1 via a parameterized Drizzle sql expression; returns the new count. Called inside the retry loop so each attempt is tracked.
  • markResumeReset(id): Resets downloadedBytes to 0 and records the resume-from-zero transition in error; status stays downloading, cleared on eventual markCompleted.
  • listStaleDownloads(): Read-only WHERE status = 'downloading' — consumed by the new resumeStaleDownloads() in the service.
  • resumeStaleDownloads() in DownloadsService: Deduplicates by destPath (marking superseded rows failed), claims all resumable stubs in reenqueued synchronously (before the watcher starts), then fires re-drives in the background behind the semaphore. reenqueued is only cleared on success so a crashed re-drive keeps its claim for the next restart.
  • index.ts: reconcileStaleDownloads (mark-all-failed fallback) moved to the else branch when config.downloads is absent; resumeStaleDownloads runs when it is present.
  • Tests: 2 new repository unit tests + 6 new service integration tests covering retry, resume-reset, concurrency cap, stale re-drive, duplicate-destPath deduplication, and claim-release-on-success.
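The dedupe-then-claim step of resumeStaleDownloads() can be sketched as two small pure functions. This is a simplified illustration, not the service code: `StaleRow`, `planResume`, `claimStubs`, and `shouldSkip` are hypothetical names, and the rule that the highest id wins among rows sharing a destPath is an assumption, since the summary does not say which duplicate is kept.

```typescript
interface StaleRow {
  id: number;
  destPath: string;
  torrentFilename: string;
}

interface ResumePlan {
  resumable: StaleRow[];
  superseded: StaleRow[];
}

// Keep one row per destPath; every other row is reported as superseded
// so the caller can mark it failed.
// Assumption: the row with the highest id wins.
function planResume(stale: StaleRow[]): ResumePlan {
  const winners = new Map<string, StaleRow>();
  const superseded: StaleRow[] = [];
  for (const row of stale) {
    const current = winners.get(row.destPath);
    if (!current) {
      winners.set(row.destPath, row);
    } else if (row.id > current.id) {
      superseded.push(current);
      winners.set(row.destPath, row);
    } else {
      superseded.push(row);
    }
  }
  return { resumable: Array.from(winners.values()), superseded };
}

// Claim every resumable stub synchronously, before any watcher scan can run.
function claimStubs(plan: ResumePlan, reenqueued: Set<string>): void {
  for (const row of plan.resumable) reenqueued.add(row.torrentFilename);
}

// Watcher-side guard: a claimed filename is skipped instead of creating a new row.
function shouldSkip(torrentFilename: string, reenqueued: Set<string>): boolean {
  return reenqueued.has(torrentFilename);
}
```

Because the claim happens before the first `await`, a watcher that starts afterwards always sees the full set of claimed filenames.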

The two previously-flagged concerns on listStaleDownloads (no time-based predicate) and incrementAttempts (silent 0 on missing ID) remain open — go read those threads again if you haven't.
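For reference, one possible shape for addressing the two open concerns is sketched below. It is not code from this PR or from the review threads: `incrementAttemptsStrict`, `listStaleSince`, `DownloadNotFoundError`, and the `updatedAt` field are all hypothetical, and the sketch operates on an in-memory map rather than the real table.

```typescript
interface TrackedRow {
  id: number;
  status: "downloading" | "completed" | "failed";
  attempts: number;
  updatedAt: number; // epoch milliseconds; hypothetical column
}

class DownloadNotFoundError extends Error {
  constructor(id: number) {
    super(`download ${id} not found`);
    this.name = "DownloadNotFoundError";
  }
}

// Variant of incrementAttempts that fails loudly instead of returning 0
// when the id does not exist.
function incrementAttemptsStrict(rows: Map<number, TrackedRow>, id: number): number {
  const row = rows.get(id);
  if (!row) throw new DownloadNotFoundError(id);
  row.attempts += 1;
  return row.attempts;
}

// Variant of listStaleDownloads with a time-based predicate: a row only counts
// as stale once it has been idle for at least minIdleMs.
function listStaleSince(
  rows: Map<number, TrackedRow>,
  now: number,
  minIdleMs: number,
): TrackedRow[] {
  const result: TrackedRow[] = [];
  rows.forEach((row) => {
    if (row.status === "downloading" && now - row.updatedAt >= minIdleMs) {
      result.push({ ...row });
    }
  });
  return result;
}
```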

Confidence Score: 5/5

[Linus Torvalds Mode] Two open prior-thread concerns exist and haven't been resolved, which is annoying, but they don't block this additive PR from landing safely — no new P0/P1 issues found, migration is safe, tests pass.

I'm not handing out 5s like candy, but this one earns it. The prior thread issues (silent-zero return from incrementAttempts on a missing ID, and listStaleDownloads having no time-based staleness predicate) are already tracked and the author clearly knows about them. Within the scope of this PR: the migration is additive with a default, the reenqueued guard correctly prevents watcher duplication, the id: -1 fallback is unreachable when repo is defined, and the fire-and-forget re-drive pattern is both intentional and properly handled. No new correctness or data-integrity issues discovered.

apps/backend/src/modules/downloads/downloads.repository.ts — the two already-flagged issues live here; if you're feeling brave, fix them before PR 5 makes callers depend on the lying return value.

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/backend/drizzle/0001_tearful_the_fallen.sql | Additive migration: ALTER TABLE adds attempts INTEGER DEFAULT 0 NOT NULL; safe for existing rows, no breakpoints needed. |
| apps/backend/src/database/schema.ts | Adds attempts integer column with notNull().default(0); matches the migration exactly. |
| apps/backend/src/modules/downloads/downloads.repository.ts | Adds incrementAttempts, markResumeReset, and listStaleDownloads; incrementAttempts returns 0 silently on nonexistent ID (previously flagged); listStaleDownloads has no time-based staleness predicate (previously flagged). |
| apps/backend/src/modules/downloads/downloads.service.ts | Substantial refactor: adds Semaphore, retry loop, resumeStaleDownloads, runDownload, downloadWithRetry; fire-and-forget re-drive pattern is intentional and guarded by the reenqueued set; id: -1 fallback is safe because repo is undefined whenever created is undefined. |
| apps/backend/src/index.ts | Moves reconcileStaleDownloads into the else branch (no downloads config) and adds resumeStaleDownloads when config is present; correct conditional logic. |
| apps/backend/src/__tests__/database.test.ts | Two new tests cover incrementAttempts/markResumeReset round-trip and listStaleDownloads non-mutation; both thorough and correct. |
| apps/backend/src/__tests__/downloads-service.test.ts | Large expansion: adds tests for retry, resume-reset, concurrency limit, stale re-drive, duplicate destPath deduplication, and re-enqueue claim release; good coverage of the new service paths. |

Sequence Diagram

sequenceDiagram
    participant idx as index.ts
    participant svc as DownloadsService
    participant repo as DownloadsRepository
    participant db as SQLite DB
    participant peer as PeerConnector

    idx->>svc: resumeStaleDownloads()
    svc->>repo: listStaleDownloads()
    repo->>db: "SELECT WHERE status='downloading'"
    db-->>repo: stale rows
    repo-->>svc: DownloadRecord[]
    svc->>repo: markFailed(id, 'superseded') [duplicates]
    svc->>svc: reenqueued.add(torrentFilename)
    svc-->>idx: resumable.length (fire-and-forget started)

    idx->>idx: BlackholeWatcher.start()

    par background re-drives
        svc->>svc: runDownload(record)
        svc->>svc: semaphore.run(...)
        svc->>repo: incrementAttempts(id)
        repo->>db: "UPDATE attempts = attempts + 1"
        svc->>peer: downloadFile(...)
        peer-->>svc: onProgress(restart)
        svc->>repo: markResumeReset(id)
        repo->>db: "UPDATE downloadedBytes=0, error=..."
        peer-->>svc: onProgress(completed)
        svc->>repo: markCompleted(id, bytes)
        svc->>repo: markImportQueued(id)
        svc->>svc: reenqueued.delete(torrentFilename)
    end

    idx->>svc: processTorrentFile(filePath, filename)
    alt filename in reenqueued
        svc-->>idx: skip (stub owned by re-enqueue)
    else normal processing
        svc->>repo: create(...)
        svc->>svc: runDownload(record)
    end

Reviews (2): Last reviewed commit: "feat: bound, retry, and resume peer down..."

Add an attempts column to the downloads table (additive migration) and
repository methods: incrementAttempts, markResumeReset (reset
downloadedBytes and record the resume-from-zero transition), and
listStaleDownloads (returns stale downloading rows without mutating
them, for active startup re-enqueue). reconcileStaleDownloads is kept
as the fallback for when downloads is unconfigured.

Refs #13.
Comment threads (2): apps/backend/src/modules/downloads/downloads.repository.ts
* feat: bound, retry, and resume peer downloads end-to-end

Rewire DownloadsService around a shared Semaphore (maxConcurrentDownloads),
a retry loop (bounded backoff+jitter, attempts tracked, transient vs
permanent classification, Retry-After honored), and resume: the restart
progress event persists via markResumeReset, and an active/reenqueued
dedupe prevents duplicate rows. On startup, index.ts re-drives stale
downloading rows with resumeStaleDownloads() before the watcher scans,
falling back to reconcileStaleDownloads() when downloads is unconfigured.

Closes #13.
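To illustrate the concurrency bound, a FIFO async semaphore with a `run` wrapper can be written in a few lines. This is a minimal sketch assuming a promise-based design; the actual Semaphore class in the repository may differ in API and internals.

```typescript
// Minimal FIFO async semaphore: at most `permits` tasks run at once,
// and waiters are released in arrival order.
class Semaphore {
  private available: number;
  private readonly waiters: Array<() => void> = [];

  constructor(permits: number) {
    if (!Number.isInteger(permits) || permits < 1) {
      throw new RangeError("permits must be a positive integer");
    }
    this.available = permits;
  }

  private acquire(): Promise<void> {
    if (this.available > 0) {
      this.available -= 1;
      return Promise.resolve();
    }
    // No permit free: queue the resolver so release() can wake it later.
    return new Promise<void>((resolve) => {
      this.waiters.push(resolve);
    });
  }

  private release(): void {
    const next = this.waiters.shift();
    if (next) {
      next(); // hand the permit directly to the oldest waiter
    } else {
      this.available += 1;
    }
  }

  // Runs the task under a permit and always releases it, even on failure.
  async run<T>(task: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }
}
```

Handing the permit straight to the next waiter, rather than incrementing the counter and letting waiters race, is what keeps the queue strictly first-in, first-out.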

* fix: harden startup re-enqueue dedupe (review feedback)

- Dedupe stale downloading rows by destPath before re-driving: only one
  row per destination is resumable (they share the same .part), so mark
  the superseded duplicates failed instead of letting the second silently
  early-return in runDownload and stay stuck in downloading.
- Release the reenqueued claim on successful resume (stub already
  unlinked, so no scan race) so a later legitimate re-drop of the same
  torrent filename is not silently skipped for the rest of the process.

Refs #13.
roziscoding merged commit 7c8844e into feat/harden-peer-downloads/3-retry-concurrency on Jun 6, 2026
6 checks passed
roziscoding deleted the feat/harden-peer-downloads/4-attempts-schema branch on June 6, 2026 09:48
roziscoding added a commit that referenced this pull request Jun 6, 2026
…5) (#18)

* feat: add retry, semaphore, and download concurrency/retry config

Add a generic retry() helper (bounded attempts, exponential backoff with
full jitter, optional Retry-After override, injectable sleep/random) and
a download retry classifier (transient: network/timeout/5xx/429/incomplete
stream; permanent: non-429 4xx and others). Add a FIFO async Semaphore.
Extend DownloadsConfig with maxConcurrentDownloads and retry knobs (all
defaulted so existing configs keep parsing). Primitives are wired into
DownloadsService in a later change.

Refs #13.
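A retry helper along the lines described above might look like the following. This is a sketch under stated assumptions, not the repository's retry(): the `RetryOptions` field names, the `isTransientStatus` helper, and the choice to let a Retry-After value replace the jittered delay without capping it are all hypothetical.

```typescript
interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
  maxDelayMs: number;
  isTransient: (error: unknown) => boolean;
  retryAfterMs?: (error: unknown) => number | undefined; // optional override
  sleep?: (ms: number) => Promise<void>; // injectable for tests
  random?: () => number; // injectable for tests
}

// Status classification matching the commit message: 5xx and 429 are
// transient, every other 4xx is permanent.
function isTransientStatus(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

async function retry<T>(
  fn: (attempt: number) => Promise<T>,
  opts: RetryOptions,
): Promise<T> {
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((r) => setTimeout(r, ms)));
  const random = opts.random ?? Math.random;
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn(attempt);
    } catch (error) {
      // Give up on the last attempt or on a permanent error.
      if (attempt >= opts.maxAttempts || !opts.isTransient(error)) throw error;
      // Full jitter: pick uniformly in [0, cap), where cap doubles per attempt
      // up to maxDelayMs.
      const cap = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** (attempt - 1));
      const override = opts.retryAfterMs?.(error);
      await sleep(override ?? random() * cap);
    }
  }
}
```

Injecting `sleep` and `random` lets a test assert the exact delays without waiting on real timers.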

* feat: track download attempts and expose stale rows for re-drive (#13 4/5) (#19)

* feat: bound, retry, and resume peer downloads end-to-end (#13 5/5) (#20)

* fix: address retry review feedback
roziscoding added a commit that referenced this pull request Jun 6, 2026
* feat: resume interrupted peer downloads via HTTP Range

downloadFile now detects an existing .part file and sends
Range: bytes=<size>-, validating the peer's 206 + Content-Range against
the persisted expected size before appending. On 200 (range ignored), a
Content-Range mismatch, or 416 it discards the stale .part and restarts
from byte 0, emitting a restart progress event. The write path uses a
node:fs FileHandle (append/write) with datasync at checkpoints, and the
.part is preserved on error so the next attempt can resume. A truncated
stream throws a retryable IncompleteDownloadError.

Refs #13.
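The client-side resume decision described above can be reduced to a pure function over the response status and Content-Range header. The sketch below is an illustration only: `rangeRequestHeader`, `decideResume`, and `ResumeDecision` are hypothetical names, and throwing on any status other than 200, 206, or 416 is an assumption.

```typescript
type ResumeDecision =
  | { action: "append"; from: number }
  | { action: "restart"; reason: string };

// Header sent when a .part file of `partSize` bytes already exists.
function rangeRequestHeader(partSize: number): string {
  return `bytes=${partSize}-`;
}

// Decide whether the response can be appended to the existing .part
// or whether the download has to restart from byte 0.
function decideResume(
  status: number,
  contentRange: string | null,
  partSize: number,
  expectedSize: number,
): ResumeDecision {
  if (status === 200) return { action: "restart", reason: "peer ignored Range" };
  if (status === 416) return { action: "restart", reason: "range not satisfiable" };
  if (status !== 206) throw new Error(`unexpected status ${status}`);

  // Expected form: "bytes <start>-<end>/<total>".
  const match = /^bytes (\d+)-(\d+)\/(\d+)$/.exec(contentRange ?? "");
  if (!match) {
    return { action: "restart", reason: "missing or malformed Content-Range" };
  }
  const start = Number(match[1]);
  const total = Number(match[3]);
  if (start !== partSize || total !== expectedSize) {
    return { action: "restart", reason: "Content-Range mismatch" };
  }
  return { action: "append", from: start };
}
```

For a 500-byte .part of a 1000-byte file, the request carries `Range: bytes=500-` and only a `206` with `Content-Range: bytes 500-999/1000` leads to an append.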

* feat: retry, semaphore, and download concurrency/retry config (#13 3/5) (#18)

* feat: track download attempts and expose stale rows for re-drive (#13 4/5) (#19)

* feat: bound, retry, and resume peer downloads end-to-end (#13 5/5) (#20)

* fix: address retry review feedback

* fix: guard non-ok resume responses

* fix: avoid leaked peer download reader lock

* fix: close peer download handle on reader failure
roziscoding added a commit that referenced this pull request Jun 6, 2026
* feat: serve HTTP byte ranges on peer file endpoint

Parse the Range header in the peer file route and serve 206 Partial
Content with Content-Range, 416 Range Not Satisfiable for unsatisfiable
ranges, and Accept-Ranges: bytes on full responses. streamFile now
returns a discriminated result (full/partial/unsatisfiable) resolved
against the file size, streaming only the requested slice.

Foundation for resumable peer downloads (#13).
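Resolving a Range header against the file size into a discriminated result could look like the sketch below. It follows the general single-range rules for HTTP byte ranges and is not the route's actual code: `resolveRange`, `contentRangeHeader`, and the `RangeResult` field names are hypothetical, and treating malformed or multi-range headers as a full response is an assumption.

```typescript
type RangeResult =
  | { kind: "full"; size: number }
  | { kind: "partial"; start: number; end: number; size: number }
  | { kind: "unsatisfiable"; size: number };

// Resolve a single-range "bytes=..." header against the file size.
function resolveRange(header: string | undefined, size: number): RangeResult {
  if (!header) return { kind: "full", size };
  const match = /^bytes=(\d*)-(\d*)$/.exec(header.trim());
  // Unparsable or multi-range headers are ignored (assumption).
  if (!match) return { kind: "full", size };
  const rawStart = match[1] ?? "";
  const rawEnd = match[2] ?? "";
  if (rawStart === "" && rawEnd === "") return { kind: "full", size };

  if (rawStart === "") {
    // Suffix form "bytes=-N": the last N bytes of the file.
    const suffix = Number(rawEnd);
    if (suffix === 0 || size === 0) return { kind: "unsatisfiable", size };
    return { kind: "partial", start: Math.max(0, size - suffix), end: size - 1, size };
  }

  const start = Number(rawStart);
  if (start >= size) return { kind: "unsatisfiable", size };
  // An open-ended or overlong end is clamped to the last byte.
  const end = rawEnd === "" ? size - 1 : Math.min(Number(rawEnd), size - 1);
  if (end < start) return { kind: "full", size }; // invalid range: ignore it
  return { kind: "partial", start, end, size };
}

// Content-Range value for 206 and 416 responses; undefined for full responses.
function contentRangeHeader(result: RangeResult): string | undefined {
  if (result.kind === "partial") {
    return `bytes ${result.start}-${result.end}/${result.size}`;
  }
  if (result.kind === "unsatisfiable") return `bytes */${result.size}`;
  return undefined;
}
```

A caller would map `partial` to 206, `unsatisfiable` to 416, and `full` to 200 with `Accept-Ranges: bytes`.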

* feat: resume interrupted peer downloads via HTTP Range (#13 2/5) (#17)

* feat: retry, semaphore, and download concurrency/retry config (#13 3/5) (#18)

* feat: track download attempts and expose stale rows for re-drive (#13 4/5) (#19)

* feat: bound, retry, and resume peer downloads end-to-end (#13 5/5) (#20)

* fix: address retry review feedback

* fix: guard non-ok resume responses

* fix: avoid leaked peer download reader lock

* fix: close peer download handle on reader failure

* test: use Bun.write instead of node:fs writeFile in range test

* chore: sync bun.lock with hono bump from main