fix(compression): correctness findings from compression audit#2803
fix(compression): correctness findings from compression audit#2803ValentaTomas wants to merge 9 commits into
Conversation
PR SummaryMedium Risk Overview Reviewed by Cursor Bugbot for commit 8755e37. Bugbot is set up for automated code reviews on this repo. Configure here. |
❌ 3 Tests Failed:
View the full list of 5 ❄️ flaky test(s)
To view more test analytics, go to the Test Analytics Dashboard |
There was a problem hiding this comment.
Code Review
The alignment check in WriteAtWithoutLock only verifies that the write is at least one block long but does not ensure the total length is a multiple of the block size, which could lead to a panic when slicing the buffer for the final block; the condition should be updated to require that len(b) is an exact multiple of c.blockSize.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Idle timer not reset after successful Read completion
- Added timer.Reset(r.idle) after successful reads to give consumers the full idle budget between reads instead of reducing it by the read duration.
Or push these changes by commenting:
@cursor push f40a25dca8
Preview (f40a25dca8)
diff --git a/packages/shared/pkg/storage/storage_google.go b/packages/shared/pkg/storage/storage_google.go
--- a/packages/shared/pkg/storage/storage_google.go
+++ b/packages/shared/pkg/storage/storage_google.go
@@ -303,6 +303,8 @@
n, err := r.ReadCloser.Read(p)
if err != nil {
r.timer.Stop()
+ } else {
+ r.timer.Reset(r.idle)
}
return n, err
}You can send follow-ups to the cloud agent here.
1a7b59d to
c8b0411
Compare
c8b0411 to
d26c7de
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: GCS ErrObjectNotExist check never matches in peerSeekable
- Added error translation in openRangeReader to wrap cloud library ErrObjectNotExist with local package sentinel, matching the pattern used in Size() and WriteTo() methods.
Or push these changes by commenting:
@cursor push 3750f3cd4c
Preview (3750f3cd4c)
diff --git a/packages/shared/pkg/storage/storage_google.go b/packages/shared/pkg/storage/storage_google.go
--- a/packages/shared/pkg/storage/storage_google.go
+++ b/packages/shared/pkg/storage/storage_google.go
@@ -274,6 +274,10 @@
if err != nil {
cancel()
+ if errors.Is(err, storage.ErrObjectNotExist) {
+ return nil, fmt.Errorf("failed to create GCS range reader for %q at %d+%d: %w", o.path, off, length, ErrObjectNotExist)
+ }
+
return nil, fmt.Errorf("failed to create GCS range reader for %q at %d+%d: %w", o.path, off, length, err)
}You can send follow-ups to the cloud agent here.
4560047 to
ddf1bab
Compare
A buffer shorter than blockSize or a misaligned offset panicked on the header.IsZero(b[:c.blockSize]) slice. Return an explicit error so a future caller cannot SIGSEGV the orchestrator.
uploadFramed always wrote h.Builds[buildID] even when srcPath was "" and selfBuild was the zero value. The on-disk header was then structurally indistinguishable from an empty-size build. No reader trusts the entry today, but tighten the serialization so future callers can rely on the mapping fall-through instead.
deserializeV4 ignored the uint32 size prefix and decompressLZ4 did an unbounded io.ReadAll, so a malformed header (corrupt write, partial upload, or actor with bucket-write access) could allocate up to LZ4's ~255x expansion of the compressed body. Read the prefix, reject sizes above 64 MiB, and cap the LZ4 reader at expected+1 bytes.
A truncated or otherwise short .frm file (partial NFS write, disk full mid-write) used to decompress cleanly into a short stream and serve incorrect bytes to the chunker, or stay poisoned across reads until manual eviction. Stat the cache file on hit and compare against the FrameTable's compressed length; on mismatch remove the file and fall through to the miss path, which refetches and rewrites it.
peerSeekable.OpenRangeReader fires PeerTransitionedError at most once when the peer flips uploaded=true. If GCS then 404s the just-finalized object due to gRPC client visibility lag, every subsequent call falls through to base and returns a permanent ErrObjectNotExist because the transition flag is sticky. Track when the first transition was emitted and re-emit PeerTransitionedError on base 404 for the next 30 s so the upper retry loop reloads the header and tries again.
The 10 s googleReadTimeout was applied to the context that backed every subsequent Read on the returned ReadCloser. For compressed reads the consumer pulls bytes through decompressor → tee → raw → GCS, and slow NBD/UFFD back-pressure or cache writeback can expand the drain past 10 s and surface as context deadline exceeded. Keep the absolute deadline only on NewRangeReader; cap each Read with an idle timer that resets on progress.
ddf1bab to
5584b06
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ef6431c76c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Concurrent reads lose transition retry
- Removed the conditional CAS check so all concurrent goroutines in the retry window receive PeerTransitionedError and can retry after header swap.
Or push these changes by commenting:
@cursor push 2c923d7144
Preview (2c923d7144)
diff --git a/packages/orchestrator/pkg/sandbox/template/peerclient/seekable.go b/packages/orchestrator/pkg/sandbox/template/peerclient/seekable.go
--- a/packages/orchestrator/pkg/sandbox/template/peerclient/seekable.go
+++ b/packages/orchestrator/pkg/sandbox/template/peerclient/seekable.go
@@ -145,9 +145,8 @@
if errors.Is(err, storage.ErrObjectNotExist) {
at := s.transitionAt.Load()
if at != 0 && time.Since(time.Unix(0, at)) < postTransitionRetryWindow {
- if s.transitionAt.CompareAndSwap(at, 0) {
- return nil, &storage.PeerTransitionedError{}
- }
+ s.transitionAt.CompareAndSwap(at, 0)
+ return nil, &storage.PeerTransitionedError{}
}
}You can send follow-ups to the cloud agent here.
Reviewed by Cursor Bugbot for commit 60ed9c2. Configure here.


Fixes compression correctness issues around aligned cache writes, V4 header bounds, empty diffs, corrupt compressed cache entries, post-transition reads, and GCS read idle timeouts.