Fix race in closing transfer streams#154
Conversation
Signed-off-by: Derek McGowan <derek@mcg.dev>
There was a problem hiding this comment.
Pull request overview
This PR updates the transfer-stream framing/teardown protocol to avoid transport-level half-closes (which vsock-proxy can translate into bidirectional shutdown) by introducing an application-level EOF marker (zero-length frame), and adds tests/CI coverage around the framing layer.
Changes:
- Switch stream shutdown semantics from
Close()/CloseWrite()to a zero-length “EOF frame”, and makeClose()idempotent withsync.Once. - Add frame-size limits (10 MiB) and extend the shim bridge to understand zero-length EOF frames.
- Add unit + fuzz tests for framing, and add CI jobs to run unit tests (race) and fuzz targets.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| plugins/vminit/streaming/plugin.go | Adds EOF-frame handling, maxFrameSize, and makes Close() send EOF once (no transport close). |
| plugins/vminit/streaming/plugin_test.go | Adds unit and fuzz coverage for the framing protocol and EOF behavior. |
| plugins/shim/streaming/plugin.go | Updates shim bridge to send/recognize EOF frames instead of CloseWrite(). |
| integration/transfer_test.go | Updates integration test stream framing to match EOF-frame protocol + size guard. |
| .github/workflows/ci.yml | Adds unit-test (race) and fuzz-test CI jobs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| func (s *vsockStream) Close() error { | ||
| return s.conn.Close() | ||
| var err error | ||
| s.once.Do(func() { | ||
| // Send a zero-length frame as an application-level EOF marker. | ||
| // Do NOT close the underlying connection — the vsock proxy sends | ||
| // a bidirectional SHUTDOWN on transport close, which can race with | ||
| // in-flight data packets and cause the peer to lose the last chunk. | ||
| // The connection is cleaned up when the VM shuts down. | ||
| err = binary.Write(s.conn, binary.BigEndian, uint32(0)) |
There was a problem hiding this comment.
vsockStream.Close() no longer closes the underlying net.Conn, but streamGetter.Get explicitly states the caller is responsible for closing the stream. With the stream removed from s.streams, service.Shutdown() also won’t close it, so this can leak per-stream connections until VM shutdown. Consider either (a) updating the contract/comment and ensuring another component closes these conns, or (b) implementing a safe transport teardown once both directions have observed/sent the zero-length EOF marker (or otherwise deterministically closing after EOF is written).
| // Send a zero-length frame as an application-level EOF marker | ||
| // so the VM sees EOF on its reads. We avoid CloseWrite() | ||
| // because the vsock proxy turns transport-level shutdown into | ||
| // a bidirectional SHUTDOWN, which kills the reverse direction | ||
| // (VM -> TTRPC) and can cause the peer to lose in-flight data. | ||
| binary.Write(vmConn, binary.BigEndian, uint32(0)) |
There was a problem hiding this comment.
The zero-length EOF write ignores the binary.Write error. If this fails (e.g., VM conn already closed/backpressured), the bridge will still report only bridgeTTRPCToVM’s error, making failures harder to debug. Capture and propagate/log the EOF write error (and consider sending the EOF marker only when bridgeTTRPCToVM ends with an expected stream EOF rather than an arbitrary error).
| // Send a zero-length frame as an application-level EOF marker | |
| // so the VM sees EOF on its reads. We avoid CloseWrite() | |
| // because the vsock proxy turns transport-level shutdown into | |
| // a bidirectional SHUTDOWN, which kills the reverse direction | |
| // (VM -> TTRPC) and can cause the peer to lose in-flight data. | |
| binary.Write(vmConn, binary.BigEndian, uint32(0)) | |
| if err == nil || err == io.EOF { | |
| // Send a zero-length frame as an application-level EOF marker | |
| // so the VM sees EOF on its reads. We avoid CloseWrite() | |
| // because the vsock proxy turns transport-level shutdown into | |
| // a bidirectional SHUTDOWN, which kills the reverse direction | |
| // (VM -> TTRPC) and can cause the peer to lose in-flight data. | |
| if eofErr := binary.Write(vmConn, binary.BigEndian, uint32(0)); eofErr != nil { | |
| if err != nil { | |
| done <- fmt.Errorf("ttrpc->vm bridge ended with %w and failed to write vm EOF marker: %v", err, eofErr) | |
| return | |
| } | |
| done <- fmt.Errorf("failed to write vm EOF marker: %w", eofErr) | |
| return | |
| } | |
| } |
| var length uint32 | ||
| if err := binary.Read(conn, binary.BigEndian, &length); err != nil { | ||
| return err | ||
| } | ||
| // A zero-length frame is an application-level EOF marker. | ||
| if length == 0 { | ||
| return nil | ||
| } | ||
| data := make([]byte, length) | ||
| if _, err := io.ReadFull(conn, data); err != nil { |
There was a problem hiding this comment.
bridgeVMToTTRPC allocates data := make([]byte, length) directly from a VM-provided length prefix without any upper bound. A buggy/malicious VM can send a huge length and cause the shim to attempt a massive allocation (OOM/DoS). Add a maxFrameSize guard here (matching the vminit/framedStream logic) and reject oversized frames before allocating/reading.
| // Send a zero-length frame as an application-level EOF signal. | ||
| // We avoid CloseWrite()/Close() here because the vsock proxy | ||
| // sends a full bidirectional SHUTDOWN when it sees transport-level | ||
| // EOF, which kills the reverse direction (window updates) too. | ||
| err = binary.Write(s.conn, binary.BigEndian, uint32(0)) |
There was a problem hiding this comment.
framedStream.Close() now only writes the zero-length EOF marker and does not close the underlying net.Conn. In this integration test helper, that means each StartStream connection can remain open until the VM/test teardown, which can accumulate across tests or subtests. Consider closing the conn after writing EOF (or exposing an explicit transport-close used by tests) so integration tests don’t leak resources.
| // Send a zero-length frame as an application-level EOF signal. | |
| // We avoid CloseWrite()/Close() here because the vsock proxy | |
| // sends a full bidirectional SHUTDOWN when it sees transport-level | |
| // EOF, which kills the reverse direction (window updates) too. | |
| err = binary.Write(s.conn, binary.BigEndian, uint32(0)) | |
| // Send a zero-length frame as an application-level EOF signal, | |
| // then close the underlying connection so integration tests do | |
| // not leak stream transports across test cases. | |
| writeErr := binary.Write(s.conn, binary.BigEndian, uint32(0)) | |
| closeErr := s.conn.Close() | |
| if writeErr != nil { | |
| err = writeErr | |
| return | |
| } | |
| err = closeErr |
473b3bd to
1b5dbb8
Compare
Close()/CloseWrite()) with application-level EOF (zero-length frame) for transfer streams. The vsock proxy sends a bidirectional SHUTDOWN ontransport close, which races with in-flight data and can cause the peer to lose the last chunk.
plugins/shim/streaming) to use the same zero-length EOF protocol — the original commit only fixed the direct-vsock path but the production TTRPC→VM pathstill used
CloseWrite().bridgeVMToTTRPCso the shim bridge exits cleanly on VM-side EOF instead of forwarding an empty message.maxFrameSize(10 MiB) guard onRecv()to prevent OOM from a buggy or malicious peer sending a huge length prefix.sync.OncetoClose()so the EOF marker is sent exactly once.connection error handling.
Fuzz*targets across all packages, 60s each).