Downloads used to spool the entire remote tar archive into a temp file before extracting, so `okdev cp` blocked silently and nothing showed up in the destination until the very end. Copies also had no progress indicator, so large transfers looked hung.

Now the tar stream is extracted directly into the destination, so intermediate files materialize as they arrive, and a throttled single-line status (bytes, rate, file count, current file) is surfaced on stderr, with a summary line on completion. Retries now fire only when nothing has been written locally, so flaky mid-transfer errors can no longer silently restart a partially completed large copy.

Made-with: Cursor
Print a one-shot announce line before the transfer starts with the operation and total size (e.g. "Downloading :/workspace/data -> ./out (4.32 GB) via sess-master-0"), and add an ETA field to the live progress line when a total is known. For multi-pod copies the announce summarizes per-pod and aggregate bytes. Also render the progress bar immediately on construction, so users see the prefix and planned total without waiting for the first tick.

Made-with: Cursor
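A sketch of the two pieces of arithmetic behind the announce and progress lines, under the assumption that the "(4.32 GB)" size uses decimal units; the helper names are illustrative:

```go
package main

import "fmt"

// formatBytes renders a human-readable decimal size like the announce
// line's "(4.32 GB)".
func formatBytes(n int64) string {
	units := []string{"KB", "MB", "GB", "TB"}
	f, unit := float64(n), "B"
	for _, u := range units {
		if f < 1000 {
			break
		}
		f, unit = f/1000, u
	}
	if unit == "B" {
		return fmt.Sprintf("%d B", n)
	}
	return fmt.Sprintf("%.2f %s", f, unit)
}

// etaSeconds: remaining bytes over the observed rate; ok=false when no
// total is known or no rate has been established yet, in which case the
// progress line omits the ETA field.
func etaSeconds(done, total int64, bytesPerSec float64) (float64, bool) {
	if total <= 0 || bytesPerSec <= 0 || done >= total {
		return 0, false
	}
	return float64(total-done) / bytesPerSec, true
}

func main() {
	fmt.Println(formatBytes(4320000000)) // 4.32 GB
	eta, ok := etaSeconds(1000000, 5000000, 2000000)
	fmt.Println(eta, ok) // 2 true
}
```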
A single exec/SPDY stream against the API server is usually the bottleneck on large directory copies. Add a --parallel flag (default 4, capped at 16) that splits the source file list into size-balanced buckets and issues one concurrent tar stream per bucket.

Enumeration:
- Remote: `find -printf` with a stat fallback for busybox images
- Local: filepath.Walk

Bucketing uses an LPT heuristic so a few large files don't starve workers. Buckets are disjoint file sets, so extracting concurrently into the same destination tree is safe (MkdirAll is idempotent; file paths don't collide). In multi-pod mode the effective per-pod parallelism is clamped so fanout × parallel stays under a soft cap to protect the API server.

Single-file copies ignore --parallel for now; range-based single-file parallelism is left as a follow-up.

Made-with: Cursor
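The LPT (longest-processing-time) heuristic mentioned above can be sketched as follows; the type and function names here are illustrative, not okdev's actual API. Sort files by size descending, then always assign the next file to the currently lightest bucket:

```go
package main

import (
	"fmt"
	"sort"
)

type fileEntry struct {
	path string
	size int64
}

// lptBuckets splits files into n size-balanced, disjoint buckets.
// Placing each file (largest first) into the lightest bucket keeps a
// few huge files from starving the other workers.
func lptBuckets(files []fileEntry, n int) [][]fileEntry {
	sorted := append([]fileEntry(nil), files...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].size > sorted[j].size })
	buckets := make([][]fileEntry, n)
	totals := make([]int64, n)
	for _, f := range sorted {
		min := 0
		for i := 1; i < n; i++ {
			if totals[i] < totals[min] {
				min = i
			}
		}
		buckets[min] = append(buckets[min], f)
		totals[min] += f.size
	}
	return buckets
}

func main() {
	files := []fileEntry{{"a", 90}, {"b", 10}, {"c", 40}, {"d", 50}}
	for i, b := range lptBuckets(files, 2) {
		var total int64
		for _, f := range b {
			total += f.size
		}
		fmt.Printf("bucket %d: %d bytes, %d files\n", i, total, len(b))
	}
}
```

Because every file lands in exactly one bucket, the concurrent extractors never write the same path, which is what makes the shared destination tree safe.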
Two fixes prompted by users perceiving the parallel copy path as
progress-less:
1. Before the first byte lands, the parallel path spends real time
doing a remote `find -printf` enumeration (and, for uploads, a local
filepath.Walk). With no phase signal and no elapsed in the status
line, successive renders looked identical and felt frozen. Now:
- CopyProgress has an OnPhase callback.
- CopyDir*ParallelWithOptions signals phases ("listing remote files"
/ "scanning local files") around those pre-transfer steps.
- cpProgress renders "<prefix> · <phase> · <elapsed>" while bytes=0
and a phase is set, and falls back to the normal byte/rate/ETA
view as soon as data flows.
- Every render now includes elapsed so the line visibly changes
every tick even if bytes or phase don't.
2. Add --quiet/-q to opt out of the announce, progress bar, and
summary. Errors still surface; stdout/stderr stay silent otherwise.
Tested: unit coverage for phase wiring on both kube() and
kubeBytesOnly(), render output with phase set, render transition once
bytes arrive, and flag presence on the cobra command.
Made-with: Cursor
… path

Real-world reports show the default --parallel 4 path is often slower than sequential. The root cause is structural:
- The sequential path runs a single recursive `tar cf - <dir>` with a warm dir walk (one readdir per subdir, streamed stats).
- The parallel path runs N tars with per-file args, each paying its own stat/open/read overhead. For trees with many small files the per-file tax can exceed any parallelism gain.
- When the apiserver/kubelet/runtime is the true bottleneck (common), extra SPDY exec streams add CPU and framing overhead without adding throughput.
- The parallel path also pays a mandatory remote `find -printf` round trip before the first byte flows.

Changes:
- Flip the --parallel default from 4 to 1 so callers opt in explicitly.
- Short-circuit to the recursive single-stream path when the enumerated tree has fewer than 8 files (enumeration + N tars cost more than one recursive tar for tiny trees).
- Rewrite runBucketDownload to feed the per-bucket file list via stdin (`tar cf - -C <dir> -T -`) instead of packing every path onto `sh -lc`'s command line. The previous approach tripped ARG_MAX (~128 KB–2 MB) on large buckets; stdin-fed file lists scale indefinitely. Tradeoff: paths containing literal newlines are not supported in the parallel path (rare in practice; users can fall back to --parallel 1).

Docs and flag help now explain when to raise --parallel (high-latency / high-bandwidth links where a single SPDY stream under-utilizes the pipe) and when it hurts (apiserver/kubelet-bound copies).

Made-with: Cursor
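A sketch of the stdin-fed file list, with an illustrative helper name (`bucketTarStdin`): the newline-delimited payload is what gets piped to `tar cf - -C <dir> -T -` on the remote side, so argv stays tiny regardless of bucket size, and the one path shape a newline-delimited list cannot represent is rejected up front.

```go
package main

import (
	"fmt"
	"strings"
)

// bucketTarStdin builds the newline-delimited file list fed to
// `tar cf - -C <dir> -T -` via stdin. Paths containing literal
// newlines cannot be represented in this format, so they are rejected
// (matching the documented limitation; such users fall back to
// --parallel 1).
func bucketTarStdin(paths []string) (string, error) {
	if len(paths) == 0 {
		return "", nil
	}
	for _, p := range paths {
		if strings.ContainsRune(p, '\n') {
			return "", fmt.Errorf("path %q contains a newline; use --parallel 1", p)
		}
	}
	return strings.Join(paths, "\n") + "\n", nil
}

func main() {
	in, err := bucketTarStdin([]string{"a/b.txt", "c d/e.bin"})
	fmt.Printf("%q %v\n", in, err)
	// Per bucket, the remote side runs roughly:
	//   tar cf - -C <dir> -T -
	// with `in` written to the exec stream's stdin.
}
```

Spaces and most special characters are fine in `-T -` input; only the list delimiter itself (newline) is off-limits.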
…st byte
Two changes addressing reports of low observed throughput (e.g. ~6 KB/s)
on `okdev cp`:
1. Transfer rate was measured from cpProgress creation, not from the
first byte. Pre-transfer work — IsRemoteDir probe, remote `du -sb`
for the announce line, tar process startup, `find -printf`
enumeration in the parallel path — all counted toward the rate
divisor and artificially deflated the displayed speed. Now a
first-byte instant is CAS-recorded on the first OnBytes call and
rate/ETA are computed from that moment forward. The total elapsed
in the final summary still reflects wall time.
2. Add --compress/-z to gzip the tar stream on the wire:
- CopyOptions.Compress plumbed through all copy paths.
- Directory download: remote emits `tar czf -`; local wraps the
pipe reader in gzip.NewReader before feeding the tar extractor.
- Directory upload: local tar output is wrapped in gzip.NewWriter;
remote runs `tar xzf -`. A small gzipPipeWriter adapter keeps
the gzip trailer flush and the pipe close independent so the
exec stream sees EOF only after the gzip footer is written.
- Parallel download/upload paths mirror the same wiring per
bucket.
Gzip is opt-in: on fast links or pre-compressed payloads it just
burns pod CPU, but on the slow links that motivated this change,
shrinking bytes-on-wire is the dominant saving. Requires `gzip`
in the pod image, which is ubiquitous in both GNU and busybox
containers.
Tests: gzip round-trip through writeFilesTar + extractTarToDir; the
first-byte rate semantics (transferElapsed is 0 before bytes,
positive after, monotonic, and excludes pre-byte latency); --compress
flag presence on the cobra command.
Made-with: Cursor
…ount
Split large single-file downloads into N byte ranges that stream
concurrently through separate exec streams, each writing to its own
offset in a pre-allocated temp file. Pick N adaptively from the
probed remote size (<16 MiB stays sequential).
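The range-split math can be sketched like this; `splitRanges` mirrors the even division into {offset, length} pairs, while `pickRangeCount` keeps only the stated 16 MiB sequential threshold — the growth rule beyond it (one range per 64 MiB, capped at 8) is an assumed policy for illustration, not the actual one.

```go
package main

import "fmt"

// splitRanges divides a probed remote size into n near-equal
// {offset, length} pairs; each range streams through its own exec
// stream into its own offset of the pre-allocated temp file.
func splitRanges(size int64, n int) [][2]int64 {
	chunk, rem := size/int64(n), size%int64(n)
	out := make([][2]int64, 0, n)
	var off int64
	for i := 0; i < n; i++ {
		l := chunk
		if int64(i) < rem { // spread the remainder over the first buckets
			l++
		}
		out = append(out, [2]int64{off, l})
		off += l
	}
	return out
}

// pickRangeCount: below 16 MiB stay sequential (per the text); the
// scaling above that threshold is a hypothetical stand-in.
func pickRangeCount(size int64) int {
	const seqThreshold = 16 << 20
	if size < seqThreshold {
		return 1
	}
	n := int(size / (64 << 20))
	if n < 2 {
		n = 2
	}
	if n > 8 {
		n = 8
	}
	return n
}

func main() {
	fmt.Println(splitRanges(10, 3)) // [[0 4] [4 3] [7 3]]
	fmt.Println(pickRangeCount(8<<20), pickRangeCount(1<<30))
}
```

The ranges tile the file exactly (offsets are cumulative, lengths sum to size), so the concurrent writers never overlap.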
For the sequential path, track the last written offset and resume on
retry with `tail -c +N` instead of restarting from byte 0, so transient
stream errors no longer restart the transfer.
Fix a latent bug where `extractSingleFileFromTar` / `extractTarToDir`
only marked `produced=true` after io.Copy returned successfully. When a
copy failed mid-stream the outer retry loop saw produced=false, deleted
the partially-written temp file, and restarted — double-counting bytes
in the progress bar (the "194 MB / 180 MB (100%)" case) and silently
redoing partial transfers. Flip the flag as soon as any byte lands on
disk.
Add `probeRemoteFile` helper that returns {size, mode, isRegular} in
one exec call, unit tests for range-split math and offsetWriter
concurrency, a mid-stream-failure test for the produced-flag fix,
and docs updates describing the new behavior.
Made-with: Cursor
Summary

`okdev cp` no longer spools the full remote tar archive to a temp file before extracting. Files are extracted directly into the destination as bytes arrive, so intermediate files become visible while the copy is in progress.

Test plan

Made with Cursor