fix(compression): correctness findings from compression audit #2803

Open
ValentaTomas wants to merge 9 commits into main from lev-compression-correctness-fixes

@ValentaTomas
Member

@ValentaTomas ValentaTomas commented May 22, 2026

Fixes compression correctness issues around aligned cache writes, V4 header bounds, empty diffs, corrupt compressed cache entries, post-transition reads, and GCS read idle timeouts.

@cla-bot cla-bot Bot added the cla-signed label May 22, 2026
@cursor

cursor Bot commented May 22, 2026

PR Summary

Medium Risk
Changes retry behavior on the hot path for build reads and alters the PeerTransitionedError contract, which could impact latency and transition correctness if mis-tuned. Scope is limited to peer transition handling and header swap timing.

Overview
Fixes a tight retry loop during peer-to-storage transitions, where storage can briefly return 404 after the upload completes. PeerTransitionedError now carries a RetryAfter delay, and the caller sleeps for that long before reloading the header, which reduces churn and transient failures.

Reviewed by Cursor Bugbot for commit 8755e37.

@codecov

codecov Bot commented May 22, 2026

❌ 3 Tests Failed:

Tests completed Failed Passed Skipped
2708 3 2705 7
View the full list of 5 ❄️ flaky test(s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/templates::TestTemplateBuildRUN

Flake rate in main: 46.65% (Passed 549 times, Failed 480 times)

Stack Traces | 0s run time
=== RUN   TestTemplateBuildRUN
=== PAUSE TestTemplateBuildRUN
=== CONT  TestTemplateBuildRUN
--- FAIL: TestTemplateBuildRUN (0.00s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/templates::TestTemplateBuildRUN/Single_RUN_command

Flake rate in main: 46.65% (Passed 549 times, Failed 480 times)

Stack Traces | 168s run time
=== RUN   TestTemplateBuildRUN/Single_RUN_command
=== PAUSE TestTemplateBuildRUN/Single_RUN_command
=== CONT  TestTemplateBuildRUN/Single_RUN_command
    build_template_test.go:133: test-ubuntu-run: [info] Building template 8ubo7hdntyt0rrl3s6sf/68f47dfb-54e8-40e9-ac84-c2d5bdd73b58
    build_template_test.go:133: test-ubuntu-run: [info] [base] FROM ubuntu:22.04 [853db866ee97500d93b96c12372d9b297568ba646efed473c099998ba016e737]
    build_template_test.go:133: test-ubuntu-run: [info] Base Docker image size: 30 MB
    build_template_test.go:133: test-ubuntu-run: [info] Creating file system and pulling Docker image
    build_template_test.go:133: test-ubuntu-run: [info] Uncompressing layer sha256:40d16f30db405106ef8074779bdf41f012465c2a785bbeaa2eab9f2081099b47 30 MB
    build_template_test.go:133: test-ubuntu-run: [info] Uncompressing layer sha256:851a5c9481ab7137223fda95a4f3b35a7885e8d8c68c3330cd961a247accb2ce 16 MB
    build_template_test.go:133: test-ubuntu-run: [info] Uncompressing layer sha256:8c4b1b28875140ed3abacaf16ad0d696f6bef912f52d2148f261a23e3349465b 168 B
    build_template_test.go:133: test-ubuntu-run: [info] Layers extracted
    build_template_test.go:133: test-ubuntu-run: [info] Root filesystem structure: bin, boot, dev, etc, home, lib, lib32, lib64, libx32, media, mnt, opt, proc, root, run, sbin, srv, sys, tmp, usr, var
    build_template_test.go:133: test-ubuntu-run: [info] Provisioning sandbox template
    build_template_test.go:133: test-ubuntu-run: [info] Provisioning was successful, cleaning up
    build_template_test.go:133: test-ubuntu-run: [info] Sandbox template provisioned
    build_template_test.go:133: test-ubuntu-run: [info] [base] DEFAULT USER user [b97b5fcc78b0f792b4eba66f81f8a39fd3a6d714ef9c470af8d9c70526254cc5]
    build_template_test.go:133: test-ubuntu-run: [info] [builder 1/1] RUN echo 'Hello, World!' [fee4392ea8b74715fa4dc3ee06a32c86fad386a78b32dd7036ea57170ab70553]
    build_template_test.go:133: test-ubuntu-run: [info] [builder 1/1] [stdout]: Hello, World!
    build_template_test.go:133: test-ubuntu-run: [info] [finalize] Finalizing template build [22385bb4f7feb0fd660dd4f18c4a00f87af47c4039c0186f9fe9e340faa018b6]
    build_template_test.go:133: test-ubuntu-run: [error] Build failed: build was cancelled
    build_template_test.go:166: Build failed: {<nil> build was cancelled <nil>}
--- FAIL: TestTemplateBuildRUN/Single_RUN_command (168.47s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/templates::TestUpdateTemplateNotOwnedByTeam

Flake rate in main: 46.54% (Passed 549 times, Failed 478 times)

Stack Traces | 150s run time
=== RUN   TestUpdateTemplateNotOwnedByTeam
=== PAUSE TestUpdateTemplateNotOwnedByTeam
=== CONT  TestUpdateTemplateNotOwnedByTeam
    template_update_test.go:205: Build failed: {<nil> build was cancelled <nil>}
--- FAIL: TestUpdateTemplateNotOwnedByTeam (149.97s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxMemoryIntegrity

Flake rate in main: 60.55% (Passed 555 times, Failed 852 times)

Stack Traces | 78.4s run time
=== RUN   TestSandboxMemoryIntegrity
=== PAUSE TestSandboxMemoryIntegrity
=== CONT  TestSandboxMemoryIntegrity
    sandbox_memory_integrity_test.go:27: Build completed successfully
--- FAIL: TestSandboxMemoryIntegrity (78.43s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxMemoryIntegrity/tmpfs_hash

Flake rate in main: 60.73% (Passed 545 times, Failed 843 times)

Stack Traces | 107s run time
=== RUN   TestSandboxMemoryIntegrity/tmpfs_hash
=== PAUSE TestSandboxMemoryIntegrity/tmpfs_hash
=== CONT  TestSandboxMemoryIntegrity/tmpfs_hash
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{start:{pid:1264}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Total memory: 985 MB\nUsed memory before tmpfs mount: 192 MB\nFree memory before tmpfs mount: 792 MB\nMemory to use in integrity test (80% of free, min 64MB): 633 MB\n"}}
Executing command bash in sandbox ijvc6igexiq6lg4r4v2eo (user: root)
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"633+0 records in\n633+0 records out\n663748608 bytes (664 MB, 633 MiB) copied, 3.62939 s, 183 MB/s\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"\t"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"C"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"o"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"m"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"m"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"a"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"d"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:" "}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"b"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"e"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"i"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"g"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:" "}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"t"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"i"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"m"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"e"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"d"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:":"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:" "}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"\""}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"dd if=/dev/urandom of=/mnt/testfile bs=1M count=633\"\n\tUser time (seconds): 0.00\n\tSystem time (seconds): 3.59\n\tPercent of CPU this job got: 98%\n\tElapsed (wall clock) time (h:mm:ss or m:ss): 0:03.63\n\tAverage shared text size (kbytes): 0\n\tAverage unshared data size (kbytes): 0\n\tAve"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"rage stack size "}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"(kbytes"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"): 0\n\t"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"Averag"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"e total size (kbytes): 0\n\tMaximum resid"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"ent set siz"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"e (kbyt"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"es): 2620"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"\n\tAver"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"age reside"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"nt set size (kbyte"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"s): 0\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"\tMajor"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:" (requiri"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"ng I/O"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:") page"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:" fault"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"s: 3\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"\tMinor (r"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"eclaimi"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"ng a f"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"rame) "}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"page f"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"aults: 343\n\tVol"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"untar"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"y cont"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"ext sw"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"itches"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:": 4\n\tInvoluntary c"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"ontext"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:" switch"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"es: 6\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"\tSwaps: "}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"0\n\tFil"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"e syst"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"em inp"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"uts: "}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"176\n\tFile "}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"system"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:" outpu"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"ts: 0\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"\tSocke"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"t messages sent: 0\n\tSocket messages received: 0\n\tSignals delivered: 0\n\tPage size (bytes): 4096\n\tExit status: 0\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Used memory after tmpfs mount and file fill: 828 MB\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{end:{exited:true status:"exit status 0"}}
    sandbox_memory_integrity_test.go:70: Command [bash] completed successfully in sandbox iuvgfndwb0uz2qe4rax80
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{start:{pid:1281}}
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{data:{stdout:"fb275e0562eb488867f5d8f7c6c9277577d99d0443f4cfad4583e77f56de6553\n"}}
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{end:{exited:true status:"exit status 0"}}
    sandbox_memory_integrity_test.go:80: Command [bash] completed successfully in sandbox iuvgfndwb0uz2qe4rax80
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{start:{pid:1284}}
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
Executing command bash in sandbox iuvgfndwb0uz2qe4rax80 (user: root)
    sandbox_memory_integrity_test.go:110: 
        	Error Trace:	.../tests/orchestrator/sandbox_memory_integrity_test.go:81
        	            				.../hostedtoolcache/go/1.26.3.../src/runtime/asm_amd64.s:1771
        	Error:      	Received unexpected error:
        	            	failed to execute command bash in sandbox iuvgfndwb0uz2qe4rax80: unavailable: HTTP status 502 Bad Gateway
    sandbox_memory_integrity_test.go:110: 
        	Error Trace:	.../tests/orchestrator/sandbox_memory_integrity_test.go:78
        	            				.../tests/orchestrator/sandbox_memory_integrity_test.go:110
        	Error:      	Condition never satisfied
        	Test:       	TestSandboxMemoryIntegrity/tmpfs_hash
--- FAIL: TestSandboxMemoryIntegrity/tmpfs_hash (106.93s)

@ValentaTomas ValentaTomas changed the title from "fix(compression): correctness-critical findings from compression audit" to "fix(compression): correctness + perf findings from compression audit" May 22, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

The alignment check in WriteAtWithoutLock only verifies that the write is at least one block long; it does not ensure that the total length is a multiple of the block size. A trailing partial block could therefore cause a panic when the buffer is sliced for the final block. The condition should require that len(b) is an exact multiple of c.blockSize.

Comment thread packages/orchestrator/pkg/sandbox/block/cache.go Outdated

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Idle timer not reset after successful Read completion
    • Added timer.Reset(r.idle) after successful reads to give consumers the full idle budget between reads instead of reducing it by the read duration.

Or push these changes by commenting:

@cursor push f40a25dca8
Preview (f40a25dca8)
diff --git a/packages/shared/pkg/storage/storage_google.go b/packages/shared/pkg/storage/storage_google.go
--- a/packages/shared/pkg/storage/storage_google.go
+++ b/packages/shared/pkg/storage/storage_google.go
@@ -303,6 +303,8 @@
 	n, err := r.ReadCloser.Read(p)
 	if err != nil {
 		r.timer.Stop()
+	} else {
+		r.timer.Reset(r.idle)
 	}
 	return n, err
 }

Comment thread packages/shared/pkg/storage/storage_google.go
@ValentaTomas ValentaTomas force-pushed the lev-compression-correctness-fixes branch from 1a7b59d to c8b0411 Compare May 22, 2026 21:12
@ValentaTomas ValentaTomas changed the title from "fix(compression): correctness + perf findings from compression audit" to "fix(compression): correctness findings from compression audit" May 22, 2026
@ValentaTomas ValentaTomas force-pushed the lev-compression-correctness-fixes branch from c8b0411 to d26c7de Compare May 22, 2026 22:00

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: GCS ErrObjectNotExist check never matches in peerSeekable
    • Added error translation in openRangeReader to wrap cloud library ErrObjectNotExist with local package sentinel, matching the pattern used in Size() and WriteTo() methods.

Or push these changes by commenting:

@cursor push 3750f3cd4c
Preview (3750f3cd4c)
diff --git a/packages/shared/pkg/storage/storage_google.go b/packages/shared/pkg/storage/storage_google.go
--- a/packages/shared/pkg/storage/storage_google.go
+++ b/packages/shared/pkg/storage/storage_google.go
@@ -274,6 +274,10 @@
 	if err != nil {
 		cancel()
 
+		if errors.Is(err, storage.ErrObjectNotExist) {
+			return nil, fmt.Errorf("failed to create GCS range reader for %q at %d+%d: %w", o.path, off, length, ErrObjectNotExist)
+		}
+
 		return nil, fmt.Errorf("failed to create GCS range reader for %q at %d+%d: %w", o.path, off, length, err)
 	}

Comment thread packages/orchestrator/pkg/sandbox/template/peerclient/seekable.go
@ValentaTomas ValentaTomas marked this pull request as ready for review May 22, 2026 22:49
@ValentaTomas ValentaTomas force-pushed the lev-compression-correctness-fixes branch from 4560047 to ddf1bab Compare May 22, 2026 22:50

A buffer shorter than blockSize or a misaligned offset panicked on the
header.IsZero(b[:c.blockSize]) slice. Return an explicit error so a
future caller cannot crash the orchestrator with an out-of-range slice.

uploadFramed always wrote h.Builds[buildID] even when srcPath was "" and
selfBuild was the zero value. The on-disk header was then structurally
indistinguishable from an empty-size build. No reader trusts the entry
today, but tighten the serialization so future callers can rely on the
mapping fall-through instead.

deserializeV4 ignored the uint32 size prefix and decompressLZ4 did an
unbounded io.ReadAll, so a malformed header (corrupt write, partial
upload, or actor with bucket-write access) could allocate up to LZ4's
~255x expansion of the compressed body. Read the prefix, reject sizes
above 64 MiB, and cap the LZ4 reader at expected+1 bytes.

A truncated or otherwise short .frm file (partial NFS write, disk full
mid-write) used to decompress cleanly into a short stream and serve
incorrect bytes to the chunker, or stay poisoned across reads until
manual eviction. Stat the cache file on hit and compare against the
FrameTable's compressed length; on mismatch remove the file and fall
through to the miss path, which refetches and rewrites it.

peerSeekable.OpenRangeReader fires PeerTransitionedError at most once
when the peer flips uploaded=true. If GCS then 404s the just-finalized
object due to gRPC client visibility lag, every subsequent call falls
through to base and returns a permanent ErrObjectNotExist because the
transition flag is sticky. Track when the first transition was emitted
and re-emit PeerTransitionedError on base 404 for the next 30 s so the
upper retry loop reloads the header and tries again.

The 10 s googleReadTimeout was applied to the context that backed every
subsequent Read on the returned ReadCloser. For compressed reads the
consumer pulls bytes through decompressor → tee → raw → GCS, and slow
NBD/UFFD back-pressure or cache writeback can expand the drain past
10 s and surface as context deadline exceeded. Keep the absolute
deadline only on NewRangeReader; cap each Read with an idle timer that
resets on progress.
@ValentaTomas ValentaTomas force-pushed the lev-compression-correctness-fixes branch from ddf1bab to 5584b06 Compare May 22, 2026 22:51

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ef6431c76c

Comment thread packages/orchestrator/pkg/sandbox/template/peerclient/seekable.go Outdated

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Concurrent reads lose transition retry
    • Removed the conditional CAS check so all concurrent goroutines in the retry window receive PeerTransitionedError and can retry after header swap.

Or push these changes by commenting:

@cursor push 2c923d7144
Preview (2c923d7144)
diff --git a/packages/orchestrator/pkg/sandbox/template/peerclient/seekable.go b/packages/orchestrator/pkg/sandbox/template/peerclient/seekable.go
--- a/packages/orchestrator/pkg/sandbox/template/peerclient/seekable.go
+++ b/packages/orchestrator/pkg/sandbox/template/peerclient/seekable.go
@@ -145,9 +145,8 @@
 	if errors.Is(err, storage.ErrObjectNotExist) {
 		at := s.transitionAt.Load()
 		if at != 0 && time.Since(time.Unix(0, at)) < postTransitionRetryWindow {
-			if s.transitionAt.CompareAndSwap(at, 0) {
-				return nil, &storage.PeerTransitionedError{}
-			}
+			s.transitionAt.CompareAndSwap(at, 0)
+			return nil, &storage.PeerTransitionedError{}
 		}
 	}

Reviewed by Cursor Bugbot for commit 60ed9c2.

Comment thread packages/orchestrator/pkg/sandbox/template/peerclient/seekable.go Outdated