Skip to content

fix: reject truncated BinaryRow serialized bytes instead of panicking#364

Open
tonghuaroot wants to merge 1 commit into
apache:mainfrom
tonghuaroot:fix-binaryrow-truncated-body
Open

fix: reject truncated BinaryRow serialized bytes instead of panicking#364
tonghuaroot wants to merge 1 commit into
apache:mainfrom
tonghuaroot:fix-binaryrow-truncated-body

Conversation

@tonghuaroot
Copy link
Copy Markdown

What

BinaryRow::from_serialized_bytes currently validates only the 4-byte arity prefix and then passes the remaining bytes straight to from_bytes. A buffer that carries a valid arity prefix but a body shorter than the null bitmap therefore decodes "successfully", and the panic surfaces later when a reader touches the missing null-bitmap byte:

pub fn is_null_at(&self, pos: usize) -> bool {
    let bit_index = pos + Self::HEADER_SIZE_IN_BYTES as usize;
    let byte_index = bit_index / 8;
    let bit_offset = bit_index % 8;
    (self.data[byte_index] & (1 << bit_offset)) != 0   // index panic on a short buffer
}

from_serialized_bytes is used throughout the manifest / stats / partition read path (e.g. stats.rs, table_scan.rs, partition_listing.rs, the DataFusion system tables), and the decoded row is typically handed to a consumer that calls is_null_at first — format_partition_value, predicate evaluation, get_datum — so a truncated or malformed serialized BinaryRow aborts the reader with a bounds panic instead of a recoverable error.

Change

  • In from_serialized_bytes, after reading the arity, reject inputs whose body is shorter than cal_fix_part_size_in_bytes(arity) (null bitmap + fixed part), returning the crate's existing Error::UnexpectedError rather than constructing a row that will panic on read. A negative arity is also rejected, and the required size is computed in i64 so an absurd arity in malformed input cannot overflow the i32 size math.
  • As a second layer of defense, make is_null_at index the null bitmap with get, so a short buffer reports the field as not-null and the typed field readers (already bounds-checked via read_slice / read_byte_at) return a graceful Err instead of is_null_at panicking.

Tests

Added regression tests in the binary_row test module:

  • test_from_serialized_bytes_truncated_body — valid arity prefix but empty / too-short body → Err, no panic.
  • test_from_serialized_bytes_negative_arity — negative arity → Err.
  • test_is_null_at_short_buffer_does_not_panicis_null_at on a buffer lacking the null bitmap does not panic; the typed reader then returns Err.
  • test_from_serialized_bytes_well_formed_decodes — negative control: a correctly sized body still decodes and reads back.

cargo test -p paimon, cargo build -p paimon, cargo fmt --all -- --check, and cargo clippy -p paimon --all-targets -- -D warnings all pass locally.

`BinaryRow::from_serialized_bytes` only validated the 4-byte arity
prefix and then handed the remaining bytes straight to `from_bytes`. A
buffer that carries a valid arity prefix but a body shorter than the
null bitmap therefore decoded "successfully" and panicked later, when a
reader called `is_null_at` and indexed the missing null-bitmap byte
(e.g. via `format_partition_value`, predicate evaluation, or
`get_datum`).

Add a body-length check after reading the arity: reject inputs whose
body is shorter than `cal_fix_part_size_in_bytes(arity)` (null bitmap +
fixed part), and reject a negative arity. The required size is computed
in i64 so an absurd arity in malformed input cannot overflow. As a
second layer of defense, make `is_null_at` index the null bitmap with
`get` so a short buffer reports not-null and the typed field readers
return a graceful error rather than panicking.

Add regression tests covering a truncated body, a too-short body, a
negative arity, a short-buffer `is_null_at`, and a well-formed control.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant