fix(arrow-ipc): bound MessageReader allocations by actual stream bytes by masumi-ryugo · Pull Request #9869 · apache/arrow-rs

masumi-ryugo · 2026-05-01T15:19:16Z

Which issue does this PR close?

N/A — found via cargo-fuzz libFuzzer harness over StreamReader::try_new.
Happy to file a tracking issue first if maintainers prefer.

Rationale for this change

MessageReader::maybe_next decodes the on-wire meta_len (after the
existing check that rejects negative values) and the FlatBuffer message's
bodyLength, and uses both directly for up-front allocations:

self.buf.resize(meta_len, 0);                                    // attacker-controlled
let mut buf = MutableBuffer::from_len_zeroed(message.bodyLength() as usize);

A 4-byte input, 00 1b 00 48, claims a ~1.2 GB metadata payload via
meta_len = i32::from_le_bytes(...) = 0x4800_1b00, driving a ~1.2 GB
Vec::resize before any short read could fail. That is roughly a 3×10⁸
amplification factor from input bytes to allocation, enough to OOM-kill the
process on a fuzzer with a 2 GiB RSS limit or on a memory-constrained service.
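The length arithmetic can be checked directly. This is a minimal sketch under stated assumptions: it treats the first four bytes as the little-endian i32 metadata length (as the repro implies) and ignores the IPC format's 0xFFFFFFFF continuation marker that a real reader checks first. `claimed_meta_len` is a hypothetical helper, not arrow-ipc API.

```rust
/// Hypothetical helper (not arrow-ipc API): decode a 4-byte prefix as the
/// little-endian i32 metadata length and reject negative values, mirroring
/// the existing negative-length check. Continuation markers are not handled.
fn claimed_meta_len(prefix: [u8; 4]) -> Option<usize> {
    let meta_len = i32::from_le_bytes(prefix);
    usize::try_from(meta_len).ok()
}

fn main() {
    // The 4-byte repro from this PR.
    let repro: [u8; 4] = [0x00, 0x1b, 0x00, 0x48];
    let len = claimed_meta_len(repro).expect("length is non-negative");
    assert_eq!(len, 0x4800_1b00); // 1,207,966,464 bytes

    // Requested allocation per byte of input: roughly 3 x 10^8.
    let amplification = len / repro.len();
    println!("meta_len = {len} bytes, amplification = {amplification}x");

    // The second 4-byte crasher reported in this thread decodes the same way.
    assert_eq!(claimed_meta_len([0x30, 0x22, 0x32, 0x2f]), Some(791_814_704));

    // i32::MIN (sign bit set) is rejected instead of being cast to usize.
    assert_eq!(claimed_meta_len([0x00, 0x00, 0x00, 0x80]), None);
}
```

Dividing the declared length by the four input bytes gives 301,991,616, which matches the ~3×10⁸ figure above.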

Per SECURITY.md this is a bug, not a vulnerability (no RCE and no
information disclosure, only an availability impact), but it is reachable
from the public StreamReader::try_new entrypoint and has the same shape as
the bug addressed by the recent fix that rejects negative meta_len values.

What changes are included in this PR?

* Read the metadata bytes via (&mut self.reader).take(meta_len).read_to_end(&mut self.buf)
  followed by an explicit length check, so the buffer grows as bytes
  actually arrive instead of being eagerly resized to the (untrusted)
  declared length.
* Add a read_body_into_buffer helper that fills a MutableBuffer in
  64 KiB chunks via extend_from_slice. This preserves the cache-line-aligned
  allocation that downstream Arrow consumers rely on, while keeping the
  high-water-mark allocation proportional to the bytes actually delivered
  by the underlying reader.
* Validate bodyLength (i64) via usize::try_from, surfacing a negative or
  out-of-range value as a ParseError instead of silently wrapping it into
  a huge usize on 64-bit targets (and a different huge usize on 32-bit
  targets).
* Add one regression test, test_stream_reader_huge_meta_len_does_not_oom,
  that runs the 4-byte fuzzer repro through StreamReader::try_new and
  asserts a clean Err.
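The two incremental reads described above can be sketched with only the standard library. This is an illustration under stated assumptions, not the PR's code: `read_metadata` and `read_body` are hypothetical names, errors are plain `String`/`io::Error` instead of `ArrowError::ParseError`, and `read_body` writes into a `Vec<u8>` rather than arrow's cache-line-aligned `MutableBuffer`.

```rust
use std::io::{self, Read};

/// Chunk size for body reads (64 KiB, as in the PR description).
const CHUNK: usize = 64 * 1024;

/// Hypothetical sketch of the bounded metadata read. `take` caps the reader at
/// the declared length and `read_to_end` grows `buf` only as bytes arrive, so a
/// huge `meta_len` backed by a short stream never triggers a huge allocation.
fn read_metadata<R: Read>(
    reader: &mut R,
    meta_len: usize,
    buf: &mut Vec<u8>,
) -> Result<(), String> {
    buf.clear();
    reader
        .by_ref()
        .take(meta_len as u64)
        .read_to_end(buf)
        .map_err(|e| e.to_string())?;
    // A short stream is an error, reported with the declared and actual sizes.
    if buf.len() != meta_len {
        return Err(format!(
            "Unexpected EOF reading {meta_len} bytes of message metadata, got {}",
            buf.len()
        ));
    }
    Ok(())
}

/// Hypothetical stand-in for `read_body_into_buffer`: copies at most CHUNK
/// bytes at a time into the output, so the output only grows with data that
/// was actually read. Uses `Vec<u8>` instead of arrow's `MutableBuffer`.
fn read_body<R: Read>(reader: &mut R, body_len: usize) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    let mut chunk = vec![0u8; CHUNK.min(body_len)];
    let mut remaining = body_len;
    while remaining > 0 {
        let n = remaining.min(chunk.len());
        // Fails with UnexpectedEof as soon as the stream runs dry.
        reader.read_exact(&mut chunk[..n])?;
        out.extend_from_slice(&chunk[..n]);
        remaining -= n;
    }
    Ok(out)
}

fn main() {
    // ~1.2 GB of metadata declared, zero bytes supplied.
    let mut empty = io::Cursor::new(Vec::<u8>::new());
    let mut buf = Vec::new();
    let err = read_metadata(&mut empty, 0x4800_1b00, &mut buf).unwrap_err();
    println!("{err} (buffer capacity: {} bytes)", buf.capacity());

    // A well-formed 100,000-byte body is read in two chunks.
    let body = read_body(&mut io::Cursor::new(vec![7u8; 100_000]), 100_000).unwrap();
    assert_eq!(body.len(), 100_000);
}
```

The design point is that neither function ever sizes an allocation from the declared length alone: the metadata buffer is bounded by what `read_to_end` actually receives, and the body's scratch chunk is capped at 64 KiB regardless of `body_len`.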

Are these changes tested?

Yes — new unit test as above. cargo test -p arrow-ipc --release
(112 unit + 17 integration tests), cargo clippy -p arrow-ipc --all-targets -- -D warnings, and cargo fmt --check are all clean.

Are there any user-facing changes?

Yes — malformed IPC streams that previously triggered a multi-GB
allocation now return an ArrowError::ParseError early. No behavior
change for well-formed streams; allocation is still cache-line aligned
and the final buffer shape (MutableBuffer) is unchanged.
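The bodyLength validation comes down to a checked integer conversion. A minimal sketch, assuming a `String` error in place of `ArrowError::ParseError`; `body_len_to_usize` and its message text are hypothetical, since the PR's exact error wording for this case is not shown.

```rust
/// Hypothetical helper: convert the FlatBuffer `bodyLength` (an i64) into a
/// usize, rejecting negative or out-of-range values instead of casting them.
fn body_len_to_usize(body_length: i64) -> Result<usize, String> {
    usize::try_from(body_length)
        .map_err(|_| format!("Invalid bodyLength {body_length} in IPC message"))
}

fn main() {
    // An unchecked `as` cast wraps -1 to usize::MAX on both 32- and 64-bit targets.
    assert_eq!(-1i64 as usize, usize::MAX);

    // The checked conversion reports the same values as errors.
    assert!(body_len_to_usize(-1).is_err());
    assert!(body_len_to_usize(i64::MIN).is_err());
    assert_eq!(body_len_to_usize(4096), Ok(4096));
}
```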

Reproducer

4 bytes:

0x00 0x1b 0x00 0x48

use std::io::Cursor;
use arrow_ipc::reader::StreamReader;
let bytes: [u8; 4] = [0x00, 0x1b, 0x00, 0x48];
assert!(StreamReader::try_new(Cursor::new(bytes), None).is_err());

Before this PR: a ~1.2 GB allocation completes (or OOM-kills the
process under a memory limit) before read_exact discovers there are
0 bytes left and returns an EOF error.

After this PR: Err(ArrowError::ParseError("Unexpected EOF reading 1207975168 bytes of message metadata, got 0")), with peak allocation
on the order of the 64 KiB read chunk plus a small flatbuffer scratch.

Found via

cargo-fuzz libFuzzer harness wrapping StreamReader::try_new.

`MessageReader::maybe_next` decoded the on-wire `meta_len` (after the round-1 check that rejects negative values) and the FlatBuffer message's `bodyLength`, and used both directly for up-front allocations:

    self.buf.resize(meta_len, 0); // attacker-controlled
    let mut buf = MutableBuffer::from_len_zeroed(message.bodyLength() as usize);

A 4-byte input, `00 1b 00 48`, claims a ~1.2 GB metadata payload via `meta_len = i32::from_le_bytes(...) = 0x4800_1b00`, driving a ~1.2 GB `Vec::resize` before any short read could fail. That is a ~3×10^8 amplification factor from input bytes to allocation, which OOM-kills the process on a fuzzer with a 2 GB RSS limit.

Read both metadata and body via incremental reads tied to the bytes actually delivered by the underlying `Read`:

* metadata uses `take(meta_len).read_to_end(&mut self.buf)` followed by an explicit length check;
* body is filled by a new `read_body_into_buffer` helper that `extend_from_slice`s 64 KiB chunks into a `MutableBuffer`, preserving the cache-line-aligned allocation that downstream Arrow consumers rely on while keeping the high-water mark proportional to the bytes actually received.

Add `bodyLength` validation (`usize::try_from`) so a negative i64 is surfaced as a `ParseError` instead of wrapping into a huge `usize`.

Add a regression test (`test_stream_reader_huge_meta_len_does_not_oom`) that feeds the 4-byte fuzzer repro through `StreamReader::try_new` and asserts a clean `Err`.

Found via cargo-fuzz libFuzzer harness wrapping `StreamReader::try_new`.

youichi-uda · 2026-05-03T07:20:20Z

Independent confirmation from a fresh cargo-fuzz harness on this same code path — flagging this here because it's exactly the kind of evidence #5332 was set up to produce.

Setup: I added an arrow-ipc/fuzz/ipc_stream_reader cargo-fuzz target as part of the fuzz infrastructure proposed in #5332 (branch fuzz/initial-harnesses on my fork). The harness just feeds &[u8] straight into StreamReader::try_new(Cursor::new(data), None) and iterates batches.

Pre-fix (current main, with no seed corpus, no dictionary):

* libFuzzer hits an OOM in well under 60 seconds of run time.
* The smallest crasher it produces is 4 bytes: [0x30, 0x22, 0x32, 0x2f]. Decoded as a little-endian i32, that's a meta_len of 791,814,704 (~755 MiB), which goes straight into self.buf.resize(meta_len, 0) before any short read can surface.
* This is a different trigger from the [0x00, 0x1b, 0x00, 0x48] regression test in this PR, but the same root cause and the same code path. Two distinct 4-byte inputs hitting the same OOM is a good sign the regression test isn't over-fitted.

Post-fix (this PR's code):

* The 4-byte repro [0x30, 0x22, 0x32, 0x2f] exits in 0 ms with a ParseError and no allocation spike. ✓
* 200,000 fuzz runs in 60 s under -rss_limit_mb=512 (well below the libFuzzer default of 2048 MB): 0 OOMs, 0 crashes, peak RSS 121 MiB, 246 edges / 288 features / 29 corpus entries. That is reasonable coverage even from an empty corpus, suggesting both the meta_len and bodyLength paths are being exercised.

So this PR cleanly defuses the entire class of "a single length field in the header drives a multi-GB allocation" for StreamReader, not just the one specific trigger in the regression test. Happy to keep a periodic libFuzzer run pointed at this once #5332 lands so we have a place to park future regressions like this in CI.

alamb · 2026-05-07T12:59:24Z

run benchmark ipc_reader ipc_writer

adriangbot · 2026-05-07T13:02:47Z

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4397308106-2048-ckfgz 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing fix/arrow-ipc-body-length-bounded (c36a092) to fd86c75 (merge-base) diff
BENCH_NAME=ipc_reader
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench ipc_reader
BENCH_FILTER=
Results will be posted here when complete

File an issue against this benchmark runner

adriangbot · 2026-05-07T13:03:21Z

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4397308106-2049-2xz8p 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux


Comparing fix/arrow-ipc-body-length-bounded (c36a092) to fd86c75 (merge-base) diff
BENCH_NAME=ipc_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench ipc_writer
BENCH_FILTER=
Results will be posted here when complete


adriangbot · 2026-05-07T13:04:46Z

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


Details

group                                                 fix_arrow-ipc-body-length-bounded      main
-----                                                 ---------------------------------      ----
arrow_ipc_stream_writer/FileWriter/write_10           1.08    186.3±1.95µs        ? ?/sec    1.00    172.5±1.77µs        ? ?/sec
arrow_ipc_stream_writer/StreamWriter/write_10         1.09    186.0±1.90µs        ? ?/sec    1.00    170.1±1.83µs        ? ?/sec
arrow_ipc_stream_writer/StreamWriter/write_10/zstd    1.01      7.3±0.03ms        ? ?/sec    1.00      7.3±0.07ms        ? ?/sec

Resource Usage

base (merge-base)

Metric	Value
Wall time	35.0s
Peak memory	2.6 GiB
Avg memory	2.6 GiB
CPU user	31.8s
CPU sys	0.7s
Peak spill	0 B

branch

Metric	Value
Wall time	30.0s
Peak memory	2.6 GiB
Avg memory	2.6 GiB
CPU user	27.1s
CPU sys	0.1s
Peak spill	0 B


adriangbot · 2026-05-07T13:06:01Z

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


Details

group                                                       fix_arrow-ipc-body-length-bounded      main
-----                                                       ---------------------------------      ----
arrow_ipc_reader/FileReader/no_validation/read_10           1.00    122.1±3.78µs        ? ?/sec    1.01    123.0±7.20µs        ? ?/sec
arrow_ipc_reader/FileReader/no_validation/read_10/mmap      1.00     57.3±0.76µs        ? ?/sec    1.00     57.1±0.63µs        ? ?/sec
arrow_ipc_reader/FileReader/read_10                         1.01   417.5±67.08µs        ? ?/sec    1.00   415.4±42.33µs        ? ?/sec
arrow_ipc_reader/FileReader/read_10/mmap                    1.01   469.1±56.62µs        ? ?/sec    1.00   466.7±54.18µs        ? ?/sec
arrow_ipc_reader/StreamReader/no_validation/read_10         1.86   242.0±13.31µs        ? ?/sec    1.00   130.1±13.80µs        ? ?/sec
arrow_ipc_reader/StreamReader/no_validation/read_10/zstd    1.01      2.5±0.02ms        ? ?/sec    1.00      2.5±0.02ms        ? ?/sec
arrow_ipc_reader/StreamReader/read_10                       1.14   512.6±55.61µs        ? ?/sec    1.00   449.5±59.10µs        ? ?/sec
arrow_ipc_reader/StreamReader/read_10/zstd                  1.01      2.7±0.07ms        ? ?/sec    1.00      2.7±0.07ms        ? ?/sec

Resource Usage

base (merge-base)

Metric	Value
Wall time	85.0s
Peak memory	2.7 GiB
Avg memory	2.6 GiB
CPU user	71.4s
CPU sys	11.3s
Peak spill	0 B

branch

Metric	Value
Wall time	85.0s
Peak memory	2.7 GiB
Avg memory	2.6 GiB
CPU user	72.3s
CPU sys	10.6s
Peak spill	0 B


alamb · 2026-05-08T00:10:46Z

This implementation seems to be quite a bit slower

github-actions bot added the arrow label (Changes to the arrow crate) on May 1, 2026

youichi-uda mentioned this pull request May 3, 2026

Fuzz tests for Arrow/Parquet #5332

Open

youichi-uda mentioned this pull request May 3, 2026

fix(parquet): Prevent negative list sizes in Thrift compact protocol parser #9868

Merged

This was referenced May 3, 2026

fix(parquet): bound thrift list capacity by remaining input bytes #9883

Closed

fix(rust): Bound polars-parquet thrift list capacity by remaining input pola-rs/polars#27490

Open

masumi-ryugo wants to merge 1 commit into apache:main from masumi-ryugo:fix/arrow-ipc-body-length-bounded


4 participants
