
feat(cp): stream downloads and report progress #89

Open
acmore wants to merge 7 commits into main from feat/cp-streaming-progress

Conversation


@acmore acmore commented Apr 20, 2026

Summary

  • Stream downloads: okdev cp no longer spools the full remote tar archive to a temp file before extracting. Files are extracted directly into the destination as bytes arrive, so intermediate files become visible while the copy is in progress.
  • Progress reporting: A throttled single-line status on stderr shows bytes transferred, transfer rate, file count, and current file. Multi-pod mode shows aggregate bytes plus `N/M pods` completion. A final summary line is printed to stdout on completion. Progress output is TTY-aware; non-interactive runs stay silent until the summary.
  • Safer retries: Copy retries now only fire when nothing has been written locally, so a flaky mid-transfer error can't silently restart a partially-completed large copy. Remote tar stderr is captured and surfaced in error messages.
  • New surface: Added `kube.CopyOptions` and `kube.CopyProgress` with `*WithOptions` variants of the four copy methods. Existing signatures remain and forward to the options variants with no callbacks.
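
The forwarding pattern described in the last bullet can be sketched as follows. The PR names `kube.CopyOptions` and `kube.CopyProgress`, but the field names and the example method below are illustrative assumptions, not the actual API:

```go
package main

import "fmt"

// CopyProgress: hypothetical callback shape; real field names may differ.
type CopyProgress struct {
	OnBytes func(n int64)  // called as bytes land locally
	OnPhase func(p string) // called around pre-transfer steps
}

// CopyOptions: hypothetical option bag plumbed through the copy paths.
type CopyOptions struct {
	Progress *CopyProgress // nil means "no callbacks", matching the legacy paths
	Compress bool          // gzip the tar stream on the wire
	Parallel int           // concurrent tar streams (1 = sequential)
}

// CopyDirFromPodWithOptions is a stand-in for one of the four options variants.
func CopyDirFromPodWithOptions(src, dst string, opts CopyOptions) error {
	if opts.Progress != nil && opts.Progress.OnPhase != nil {
		opts.Progress.OnPhase("listing remote files")
	}
	// ... transfer elided ...
	if opts.Progress != nil && opts.Progress.OnBytes != nil {
		opts.Progress.OnBytes(1024)
	}
	return nil
}

// The pre-existing signature keeps working by forwarding zero-value options.
func CopyDirFromPod(src, dst string) error {
	return CopyDirFromPodWithOptions(src, dst, CopyOptions{})
}

func main() {
	var total int64
	p := &CopyProgress{OnBytes: func(n int64) { total += n }}
	if err := CopyDirFromPodWithOptions("pod:/workspace", "./out", CopyOptions{Progress: p}); err != nil {
		panic(err)
	}
	fmt.Println("bytes seen:", total)
}
```

Because the legacy signature forwards with a zero `CopyOptions`, existing callers compile unchanged and pay no callback overhead.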

Test plan

  • `go test ./...` (kube + cli + everything else)
  • `.venv/bin/pre-commit run --all-files --hook-stage manual okdev-gofmt`
  • Rebuilt `okdev` locally and verified `okdev cp --help` still renders
  • Manual e2e: upload a directory to a running session and confirm progress line renders and files appear incrementally
  • Manual e2e: download a directory from a session (single-pod and `--all`) and confirm files materialize during transfer

Made with Cursor

acmore added 7 commits April 20, 2026 19:55
Downloads used to spool the entire remote tar archive into a temp file
before extracting, so `okdev cp` blocked silently and nothing showed up
in the destination until the very end. Copies also had no progress
indicator, so large transfers looked hung.

Extract tar streams directly into the destination so intermediate files
materialize as they arrive, and surface a throttled single-line status
(bytes, rate, file count, current file) on stderr with a summary line on
completion. Retries now only fire when nothing has been written locally
so flaky mid-transfer errors can no longer silently restart a partially
completed large copy.

Made-with: Cursor
Print a one-shot announce line before the transfer starts with the
operation and total size (e.g. "Downloading :/workspace/data -> ./out
(4.32 GB) via sess-master-0") and add an ETA field to the live progress
line when a total is known. For multi-pod copies the announce summarizes
per-pod and aggregate bytes. Also render the progress bar immediately on
construction so users see the prefix + planned total without waiting for
the first tick.

Made-with: Cursor
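
The ETA field added here is just remaining bytes over observed rate, shown only when a total is known. A minimal sketch, with an illustrative helper name (not the actual okdev function):

```go
package main

import (
	"fmt"
	"time"
)

// etaString returns a human-readable ETA, or "" when no total was probed,
// nothing has transferred yet, or the transfer is already complete.
func etaString(done, total int64, bytesPerSec float64) string {
	if total <= 0 || bytesPerSec <= 0 || done >= total {
		return ""
	}
	secs := float64(total-done) / bytesPerSec
	return time.Duration(secs * float64(time.Second)).Round(time.Second).String()
}

func main() {
	fmt.Println(etaString(50, 100, 25)) // halfway through 100 B at 25 B/s
	fmt.Println(etaString(0, 0, 25))    // total unknown: empty string
}
```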
A single exec/SPDY stream against the API server is usually the bottleneck
on large directory copies. Add a --parallel flag (default 4, capped at
16) that splits the source file list into size-balanced buckets and
issues one concurrent tar stream per bucket.

Enumeration:
- Remote: `find -printf` with a stat fallback for busybox images
- Local: filepath.Walk

Bucketing uses an LPT heuristic so a few large files don't starve
workers. Buckets are disjoint file sets, so extracting concurrently into
the same destination tree is safe (MkdirAll is idempotent; file paths
don't collide). In multi-pod mode the effective per-pod parallelism is
clamped so fanout × parallel stays under a soft cap to protect the API
server. Single-file copies ignore --parallel for now; range-based
single-file parallelism is left as a follow-up.

Made-with: Cursor
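
The LPT heuristic mentioned above is the classic longest-processing-time rule: sort files by size descending, then always hand the next file to the currently lightest bucket. A minimal sketch (names illustrative):

```go
package main

import (
	"fmt"
	"sort"
)

type file struct {
	path string
	size int64
}

// lptBuckets splits files into n disjoint, size-balanced buckets so a few
// large files don't starve workers.
func lptBuckets(files []file, n int) [][]file {
	sorted := append([]file(nil), files...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].size > sorted[j].size })
	buckets := make([][]file, n)
	loads := make([]int64, n)
	for _, f := range sorted {
		// Find the bucket with the smallest total size so far.
		min := 0
		for i := 1; i < n; i++ {
			if loads[i] < loads[min] {
				min = i
			}
		}
		buckets[min] = append(buckets[min], f)
		loads[min] += f.size
	}
	return buckets
}

func main() {
	files := []file{{"big", 100}, {"a", 30}, {"b", 30}, {"c", 30}, {"d", 10}}
	for i, b := range lptBuckets(files, 2) {
		fmt.Println("bucket", i, b)
	}
}
```

Because every file lands in exactly one bucket, the buckets are disjoint by construction, which is what makes concurrent extraction into the same destination tree safe.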
Two fixes prompted by users perceiving the parallel copy path as
progress-less:

1. Before the first byte lands, the parallel path spends real time
   doing a remote `find -printf` enumeration (and, for uploads, a local
   filepath.Walk). With no phase signal and no elapsed-time field in the
   status line, successive renders looked identical and the copy felt
   frozen. Now:

   - CopyProgress has an OnPhase callback.
   - CopyDir*ParallelWithOptions signals phases ("listing remote files"
     / "scanning local files") around those pre-transfer steps.
   - cpProgress renders "<prefix> · <phase> · <elapsed>" while bytes=0
     and a phase is set, and falls back to the normal byte/rate/ETA
     view as soon as data flows.
   - Every render now includes elapsed so the line visibly changes
     every tick even if bytes or phase don't.

2. Add --quiet/-q to opt out of the announce, progress bar, and
   summary. Errors still surface; stdout/stderr stay silent otherwise.

Tested: unit coverage for phase wiring on both kube() and
kubeBytesOnly(), render output with phase set, render transition once
bytes arrive, and flag presence on the cobra command.

Made-with: Cursor
… path

Real-world reports show the default --parallel 4 path is often slower
than sequential. Root cause is structural:

- The sequential path runs a single recursive `tar cf - <dir>` with a
  warm dir walk (one readdir per subdir, stream stats).
- The parallel path runs N tars with per-file args, each paying its
  own stat/open/read overhead. For trees with many small files the
  per-file tax can exceed any parallelism gain.
- When the apiserver/kubelet/runtime is the true bottleneck (common),
  extra SPDY exec streams add CPU and framing overhead without adding
  throughput.
- The parallel path also pays a mandatory remote `find -printf` round
  trip before the first byte flows.

Changes:
- Flip --parallel default from 4 to 1 so callers opt in explicitly.
- Short-circuit to the recursive single-stream path when the
  enumerated tree has fewer than 8 files (enumeration + N tars costs
  more than one recursive tar for tiny trees).
- Rewrite runBucketDownload to feed the per-bucket file list via
  stdin (`tar cf - -C <dir> -T -`) instead of packing every path onto
  `sh -lc`'s command line. The previous approach tripped ARG_MAX
  (~128KB–2MB) on large buckets; stdin-fed file lists scale
  indefinitely. Tradeoff: paths containing literal newlines are not
  supported in the parallel path (rare in practice; users can fall
  back to --parallel 1).

Docs and flag help now explain when to raise --parallel (high-latency
/ high-bandwidth links where a single SPDY stream under-utilizes the
pipe) and when it hurts (apiserver/kubelet-bound copies).

Made-with: Cursor
…st byte

Two changes addressing reports of low observed throughput (e.g. ~6 KB/s)
on `okdev cp`:

1. Transfer rate was measured from cpProgress creation, not from the
   first byte. Pre-transfer work — IsRemoteDir probe, remote `du -sb`
   for the announce line, tar process startup, `find -printf`
   enumeration in the parallel path — all counted toward the rate
   divisor and artificially deflated the displayed speed. Now a
   first-byte instant is CAS-recorded on the first OnBytes call and
   rate/ETA are computed from that moment forward. The total elapsed
   in the final summary still reflects wall time.

2. Add --compress/-z to gzip the tar stream on the wire:

   - CopyOptions.Compress plumbed through all copy paths.
   - Directory download: remote emits `tar czf -`; local wraps the
     pipe reader in gzip.NewReader before feeding the tar extractor.
   - Directory upload: local tar output is wrapped in gzip.NewWriter;
     remote runs `tar xzf -`. A small gzipPipeWriter adapter keeps
     the gzip trailer flush and the pipe close independent so the
     exec stream sees EOF only after the gzip footer is written.
   - Parallel download/upload paths mirror the same wiring per
     bucket.

   Gzip is opt-in: on fast links or pre-compressed payloads it just
   burns pod CPU, but on the slow links that motivated this change
   it turns bytes-on-wire into the dominant saving. Requires `gzip`
   in the pod image, which is ubiquitous in both GNU and busybox
   containers.

Tests: gzip round-trip through writeFilesTar + extractTarToDir; the
first-byte rate semantics (transferElapsed is 0 before bytes,
positive after, monotonic, and excludes pre-byte latency); --compress
flag presence on the cobra command.

Made-with: Cursor
…ount

Split large single-file downloads into N byte ranges that stream
concurrently through separate exec streams, each writing to its own
offset in a pre-allocated temp file. Pick N adaptively from the
probed remote size (<16 MiB stays sequential).

For the sequential path, track the last written offset and resume on
retry with `tail -c +N` instead of restarting from byte 0, so transient
stream errors no longer restart the transfer.

Fix a latent bug where `extractSingleFileFromTar` / `extractTarToDir`
only marked `produced=true` after io.Copy returned successfully. When a
copy failed mid-stream the outer retry loop saw produced=false, deleted
the partially-written temp file, and restarted — double-counting bytes
in the progress bar (the "194 MB / 180 MB (100%)" case) and silently
redoing partial transfers. Flip the flag as soon as any byte lands on
disk.

Add `probeRemoteFile` helper that returns {size, mode, isRegular} in
one exec call, unit tests for range-split math and offsetWriter
concurrency, a mid-stream-failure test for the produced-flag fix,
and docs updates describing the new behavior.

Made-with: Cursor
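
The range-split math and the 1-based `tail -c +N` resume offset can be sketched as follows (names illustrative, not the actual helpers):

```go
package main

import "fmt"

type byteRange struct{ off, length int64 }

// splitRanges divides size bytes into n contiguous ranges whose lengths
// differ by at most one byte; the remainder is spread over leading ranges.
func splitRanges(size int64, n int) []byteRange {
	if n < 1 {
		n = 1
	}
	base, rem := size/int64(n), size%int64(n)
	ranges := make([]byteRange, 0, n)
	var off int64
	for i := 0; i < n; i++ {
		l := base
		if int64(i) < rem {
			l++ // leading ranges absorb the remainder
		}
		ranges = append(ranges, byteRange{off, l})
		off += l
	}
	return ranges
}

// resumeArg builds the remote fragment that skips bytes already written
// locally: tail's +N is 1-based, so offset 0 maps to +1.
func resumeArg(written int64) string {
	return fmt.Sprintf("tail -c +%d", written+1)
}

func main() {
	fmt.Println(splitRanges(10, 3)) // [{0 4} {4 3} {7 3}]
	fmt.Println(resumeArg(4096))    // tail -c +4097
}
```

Each range can then stream through its own exec stream into a pre-allocated file via `WriteAt` at its offset, since the ranges never overlap.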