Skip to content

Fix race in closing transfer streams#154

Open
dmcgowan wants to merge 1 commit intocontainerd:mainfrom
dmcgowan:fix-transfer-close-race
Open

Fix race in closing transfer streams#154
dmcgowan wants to merge 1 commit intocontainerd:mainfrom
dmcgowan:fix-transfer-close-race

Conversation

@dmcgowan
Copy link
Copy Markdown
Member

  • Replace transport-level close (Close()/CloseWrite()) with application-level EOF (zero-length frame) for transfer streams. The vsock proxy sends a bidirectional SHUTDOWN on
    transport close, which races with in-flight data and can cause the peer to lose the last chunk.
  • Fix the shim bridge (plugins/shim/streaming) to use the same zero-length EOF protocol — the original commit only fixed the direct-vsock path but the production TTRPC→VM path
    still used CloseWrite().
  • Handle zero-length frames in bridgeVMToTTRPC so the shim bridge exits cleanly on VM-side EOF instead of forwarding an empty message.
  • Add maxFrameSize (10 MiB) guard on Recv() to prevent OOM from a buggy or malicious peer sending a huge length prefix.
  • Add sync.Once to Close() so the EOF marker is sent exactly once.
  • Add unit tests (10 cases) and fuzz tests (3 targets) for the streaming framing protocol covering roundtrip, EOF, bidirectional, large messages, oversized frame rejection, and
    connection error handling.
  • Add CI jobs for unit tests (with race detector) and fuzz tests (auto-discovers all Fuzz* targets across all packages, 60s each).

Signed-off-by: Derek McGowan <derek@mcg.dev>
Copilot AI review requested due to automatic review settings April 11, 2026 06:19
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the transfer-stream framing/teardown protocol to avoid transport-level half-closes (which vsock-proxy can translate into bidirectional shutdown) by introducing an application-level EOF marker (zero-length frame), and adds tests/CI coverage around the framing layer.

Changes:

  • Switch stream shutdown semantics from Close()/CloseWrite() to a zero-length “EOF frame”, and make Close() idempotent with sync.Once.
  • Add frame-size limits (10 MiB) and extend the shim bridge to understand zero-length EOF frames.
  • Add unit + fuzz tests for framing, and add CI jobs to run unit tests (race) and fuzz targets.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
plugins/vminit/streaming/plugin.go Adds EOF-frame handling, maxFrameSize, and makes Close() send EOF once (no transport close).
plugins/vminit/streaming/plugin_test.go Adds unit and fuzz coverage for the framing protocol and EOF behavior.
plugins/shim/streaming/plugin.go Updates shim bridge to send/recognize EOF frames instead of CloseWrite().
integration/transfer_test.go Updates integration test stream framing to match EOF-frame protocol + size guard.
.github/workflows/ci.yml Adds unit-test (race) and fuzz-test CI jobs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 263 to +271
func (s *vsockStream) Close() error {
return s.conn.Close()
var err error
s.once.Do(func() {
// Send a zero-length frame as an application-level EOF marker.
// Do NOT close the underlying connection — the vsock proxy sends
// a bidirectional SHUTDOWN on transport close, which can race with
// in-flight data packets and cause the peer to lose the last chunk.
// The connection is cleaned up when the VM shuts down.
err = binary.Write(s.conn, binary.BigEndian, uint32(0))
Copy link

Copilot AI Apr 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vsockStream.Close() no longer closes the underlying net.Conn, but streamGetter.Get explicitly states the caller is responsible for closing the stream. With the stream removed from s.streams, service.Shutdown() also won’t close it, so this can leak per-stream connections until VM shutdown. Consider either (a) updating the contract/comment and ensuring another component closes these conns, or (b) implementing a safe transport teardown once both directions have observed/sent the zero-length EOF marker (or otherwise deterministically closing after EOF is written).

Copilot uses AI. Check for mistakes.
Comment on lines +104 to +109
// Send a zero-length frame as an application-level EOF marker
// so the VM sees EOF on its reads. We avoid CloseWrite()
// because the vsock proxy turns transport-level shutdown into
// a bidirectional SHUTDOWN, which kills the reverse direction
// (VM -> TTRPC) and can cause the peer to lose in-flight data.
binary.Write(vmConn, binary.BigEndian, uint32(0))
Copy link

Copilot AI Apr 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The zero-length EOF write ignores the binary.Write error. If this fails (e.g., VM conn already closed/backpressured), the bridge will still report only bridgeTTRPCToVM’s error, making failures harder to debug. Capture and propagate/log the EOF write error (and consider sending the EOF marker only when bridgeTTRPCToVM ends with an expected stream EOF rather than an arbitrary error).

Suggested change
// Send a zero-length frame as an application-level EOF marker
// so the VM sees EOF on its reads. We avoid CloseWrite()
// because the vsock proxy turns transport-level shutdown into
// a bidirectional SHUTDOWN, which kills the reverse direction
// (VM -> TTRPC) and can cause the peer to lose in-flight data.
binary.Write(vmConn, binary.BigEndian, uint32(0))
if err == nil || err == io.EOF {
// Send a zero-length frame as an application-level EOF marker
// so the VM sees EOF on its reads. We avoid CloseWrite()
// because the vsock proxy turns transport-level shutdown into
// a bidirectional SHUTDOWN, which kills the reverse direction
// (VM -> TTRPC) and can cause the peer to lose in-flight data.
if eofErr := binary.Write(vmConn, binary.BigEndian, uint32(0)); eofErr != nil {
if err != nil {
done <- fmt.Errorf("ttrpc->vm bridge ended with %w and failed to write vm EOF marker: %v", err, eofErr)
return
}
done <- fmt.Errorf("failed to write vm EOF marker: %w", eofErr)
return
}
}

Copilot uses AI. Check for mistakes.
Comment on lines 168 to 177
var length uint32
if err := binary.Read(conn, binary.BigEndian, &length); err != nil {
return err
}
// A zero-length frame is an application-level EOF marker.
if length == 0 {
return nil
}
data := make([]byte, length)
if _, err := io.ReadFull(conn, data); err != nil {
Copy link

Copilot AI Apr 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bridgeVMToTTRPC allocates data := make([]byte, length) directly from a VM-provided length prefix without any upper bound. A buggy/malicious VM can send a huge length and cause the shim to attempt a massive allocation (OOM/DoS). Add a maxFrameSize guard here (matching the vminit/framedStream logic) and reject oversized frames before allocating/reading.

Copilot uses AI. Check for mistakes.
Comment on lines +186 to +190
// Send a zero-length frame as an application-level EOF signal.
// We avoid CloseWrite()/Close() here because the vsock proxy
// sends a full bidirectional SHUTDOWN when it sees transport-level
// EOF, which kills the reverse direction (window updates) too.
err = binary.Write(s.conn, binary.BigEndian, uint32(0))
Copy link

Copilot AI Apr 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

framedStream.Close() now only writes the zero-length EOF marker and does not close the underlying net.Conn. In this integration test helper, that means each StartStream connection can remain open until the VM/test teardown, which can accumulate across tests or subtests. Consider closing the conn after writing EOF (or exposing an explicit transport-close used by tests) so integration tests don’t leak resources.

Suggested change
// Send a zero-length frame as an application-level EOF signal.
// We avoid CloseWrite()/Close() here because the vsock proxy
// sends a full bidirectional SHUTDOWN when it sees transport-level
// EOF, which kills the reverse direction (window updates) too.
err = binary.Write(s.conn, binary.BigEndian, uint32(0))
// Send a zero-length frame as an application-level EOF signal,
// then close the underlying connection so integration tests do
// not leak stream transports across test cases.
writeErr := binary.Write(s.conn, binary.BigEndian, uint32(0))
closeErr := s.conn.Close()
if writeErr != nil {
err = writeErr
return
}
err = closeErr

Copilot uses AI. Check for mistakes.
@dmcgowan dmcgowan force-pushed the fix-transfer-close-race branch from 473b3bd to 1b5dbb8 Compare April 13, 2026 06:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants