Skip to content

tee: don't treat a short read as end-of-file#11755

Closed
kevinburke wants to merge 1 commit intouutils:mainfrom
kevinburke:tee-fix-short-read
Closed

tee: don't treat a short read as end-of-file#11755
kevinburke wants to merge 1 commit intouutils:mainfrom
kevinburke:tee-fix-short-read

Conversation

@kevinburke
Copy link
Copy Markdown
Contributor

Commit 13fb3be ("tee: increase buf size for large input") split copy() into a one-shot first read followed by a loop over a larger buffer. The first branch returned early if read(2) returned fewer bytes than FIRST_BUF_SIZE:

if bytes_count < FIRST_BUF_SIZE {
    output.flush()?;
    return Ok(len);
}

A short read is not EOF. POSIX read(2) only returns 0 at end-of-file; a return of N < buflen just means "that's what was available without blocking". Any pipeline where the upstream writer pauses between writes — a slow producer, a sleep in a shell pipeline, a systemd service emitting log lines in bursts — will see a short read on the very first call and will be cut off after one chunk.

Downstream this manifests as the writer dying with SIGPIPE (exit 141) on its next write, because tee has already closed its end of the pipe. We hit this with a Go program writing structured logs through 2>&1 | tee -a /var/log/foo.log in a systemd oneshot: the first slog line made it through, tee exited, and the next write killed the Go process before it could do any work.

Keep the two-buffer optimization — it's useful for large inputs — but gate the upgrade to the larger buffer on actually seeing a full-sized read, and keep looping on the small buffer until we do. Only read == 0 terminates the loop.

Add a regression test that writes two chunks separated by a delay, drains the first one from tee's stdout so we know tee has processed its first read/write cycle, then writes the second chunk and asserts it makes it through to both stdout and the output file. Without this fix the second write_in fails with "Broken pipe (os error 32)".

Commit 13fb3be ("tee: increase buf size for large input") split
copy() into a one-shot first read followed by a loop over a larger
buffer. The first branch returned early if `read(2)` returned fewer
bytes than FIRST_BUF_SIZE:

    if bytes_count < FIRST_BUF_SIZE {
        output.flush()?;
        return Ok(len);
    }

A short read is not EOF. POSIX `read(2)` only returns 0 at
end-of-file; a return of N < buflen just means "that's what was
available without blocking". Any pipeline where the upstream writer
pauses between writes — a slow producer, a `sleep` in a shell
pipeline, a systemd service emitting log lines in bursts — will see
a short read on the very first call and will be cut off after one
chunk.

Downstream this manifests as the writer dying with SIGPIPE (exit
141) on its next write, because tee has already closed its end of
the pipe. We hit this with a Go program writing structured logs
through `2>&1 | tee -a /var/log/foo.log` in a systemd oneshot: the
first slog line made it through, tee exited, and the next write
killed the Go process before it could do any work.

Keep the two-buffer optimization — it's useful for large inputs —
but gate the upgrade to the larger buffer on actually seeing a
full-sized read, and keep looping on the small buffer until we do.
Only `read == 0` terminates the loop.

Add a regression test that writes two chunks separated by a delay,
drains the first one from tee's stdout so we know tee has
processed its first read/write cycle, then writes the second chunk
and asserts it makes it through to both stdout and the output
file. Without this fix the second `write_in` fails with "Broken
pipe (os error 32)".
output.flush()?;
return Ok(len);
len += bytes_count;
if bytes_count == FIRST_BUF_SIZE {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think == is too strict and difficult to switch to code path for large file.

@github-actions
Copy link
Copy Markdown

GNU testsuite comparison:

GNU test failed: tests/tail/retry. tests/tail/retry is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/date/date-locale-hour (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/tty/tty-eof (fails in this run but passes in the 'main' branch)
Note: The gnu test tests/csplit/csplit-heap is now being skipped but was previously passing.
Note: The gnu test tests/cut/bounded-memory is now being skipped but was previously passing.
Note: The gnu test tests/seq/seq-epipe is now being skipped but was previously passing.

@kevinburke
Copy link
Copy Markdown
Contributor Author

This is also fixed by #11686, closing.

@kevinburke kevinburke closed this Apr 11, 2026
@oech3
Copy link
Copy Markdown
Contributor

oech3 commented Apr 12, 2026

merged. You can open a PR for pure-Rust test for more targets.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants