fix: reject truncated BinaryRow serialized bytes instead of panicking #364
Open
tonghuaroot wants to merge 1 commit into
`BinaryRow::from_serialized_bytes` only validated the 4-byte arity prefix and then handed the remaining bytes straight to `from_bytes`. A buffer that carries a valid arity prefix but a body shorter than the null bitmap therefore decoded "successfully" and panicked later, when a reader called `is_null_at` and indexed the missing null-bitmap byte (e.g. via `format_partition_value`, predicate evaluation, or `get_datum`). Add a body-length check after reading the arity: reject inputs whose body is shorter than `cal_fix_part_size_in_bytes(arity)` (null bitmap + fixed part), and reject a negative arity. The required size is computed in i64 so an absurd arity in malformed input cannot overflow. As a second layer of defense, make `is_null_at` index the null bitmap with `get` so a short buffer reports not-null and the typed field readers return a graceful error rather than panicking. Add regression tests covering a truncated body, a too-short body, a negative arity, a short-buffer `is_null_at`, and a well-formed control.
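The body-length check described above can be sketched roughly as follows. This is an illustrative stand-alone version, not the code in this PR: `DecodeError`, `fixed_part_size`, and `split_serialized_row` are hypothetical names, and the sketch assumes a big-endian arity prefix and a fixed part made of a null bitmap (8 header bits plus one bit per field, rounded up to 8-byte words) followed by 8 bytes per field.

```rust
/// Hypothetical stand-in for the crate's error type.
#[derive(Debug, PartialEq)]
pub enum DecodeError {
    /// Fewer than 4 bytes, so the arity prefix itself is missing.
    TooShortForArity,
    /// The arity prefix decoded to a negative number.
    NegativeArity(i32),
    /// The body is shorter than null bitmap + fixed part.
    TruncatedBody { required: i64, actual: usize },
}

/// Assumed header width inside the null bitmap.
const HEADER_SIZE_IN_BITS: i64 = 8;

/// Assumed layout: null bitmap rounded up to 8-byte words, then 8 bytes per field.
/// Computed in i64 so that even arity == i32::MAX cannot overflow.
fn fixed_part_size(arity: i64) -> i64 {
    let null_bitmap_bytes = ((arity + 63 + HEADER_SIZE_IN_BITS) / 64) * 8;
    null_bitmap_bytes + 8 * arity
}

/// Validates the arity prefix and the body length, returning (arity, body).
pub fn split_serialized_row(data: &[u8]) -> Result<(i32, &[u8]), DecodeError> {
    if data.len() < 4 {
        return Err(DecodeError::TooShortForArity);
    }
    let (prefix, body) = data.split_at(4);
    // Assumption: the arity prefix is stored big-endian.
    let arity = i32::from_be_bytes([prefix[0], prefix[1], prefix[2], prefix[3]]);
    if arity < 0 {
        return Err(DecodeError::NegativeArity(arity));
    }
    let required = fixed_part_size(i64::from(arity));
    let actual = i64::try_from(body.len()).unwrap_or(i64::MAX);
    if actual < required {
        return Err(DecodeError::TruncatedBody {
            required,
            actual: body.len(),
        });
    }
    Ok((arity, body))
}
```

Under these assumptions, an arity of 1 requires a 16-byte body (8 bytes of null bitmap plus one 8-byte field slot), so a 4-byte buffer holding only the prefix is rejected up front instead of producing a row that fails on first read.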
What
`BinaryRow::from_serialized_bytes` currently validates only the 4-byte arity prefix and then passes the remaining bytes straight to `from_bytes`. A buffer that carries a valid arity prefix but a body shorter than the null bitmap therefore decodes "successfully", and the panic surfaces later when a reader touches the missing null-bitmap byte.
`from_serialized_bytes` is used throughout the manifest / stats / partition read path (e.g. `stats.rs`, `table_scan.rs`, `partition_listing.rs`, the DataFusion system tables), and the decoded row is typically handed to a consumer that calls `is_null_at` first (`format_partition_value`, predicate evaluation, `get_datum`), so a truncated or malformed serialized `BinaryRow` aborts the reader with a bounds panic instead of a recoverable error.
Change
- In `from_serialized_bytes`, after reading the arity, reject inputs whose body is shorter than `cal_fix_part_size_in_bytes(arity)` (null bitmap + fixed part), returning the crate's existing `Error::UnexpectedError` rather than constructing a row that will panic on read. A negative arity is also rejected, and the required size is computed in `i64` so an absurd arity in malformed input cannot overflow the `i32` size math.
- Make `is_null_at` index the null bitmap with `get`, so a short buffer reports the field as not-null and the typed field readers (already bounds-checked via `read_slice` / `read_byte_at`) return a graceful `Err` instead of `is_null_at` panicking.
Tests
Added regression tests in the `binary_row` test module:
- `test_from_serialized_bytes_truncated_body`: valid arity prefix but empty / too-short body → `Err`, no panic.
- `test_from_serialized_bytes_negative_arity`: negative arity → `Err`.
- `test_is_null_at_short_buffer_does_not_panic`: `is_null_at` on a buffer lacking the null bitmap does not panic; the typed reader then returns `Err`.
- `test_from_serialized_bytes_well_formed_decodes`: negative control; a correctly sized body still decodes and reads back.
`cargo test -p paimon`, `cargo build -p paimon`, `cargo fmt --all -- --check`, and `cargo clippy -p paimon --all-targets -- -D warnings` all pass locally.
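For readers who want to see the second layer of defense in isolation, here is a minimal sketch of a `get`-based null check paired with a bounds-checked typed reader. It is not the crate's implementation: `RowView`, `read_i64_at`, and the `String` error are hypothetical, and the sketch assumes the null bit for field `pos` sits at bit `pos + 8` of the buffer and that fixed-width slots are 8 little-endian bytes placed after the null bitmap.

```rust
/// Hypothetical read-only view over a row's backing bytes.
pub struct RowView<'a> {
    arity: usize,
    data: &'a [u8],
}

/// Assumed header width inside the null bitmap.
const HEADER_SIZE_IN_BITS: usize = 8;

/// Assumed null bitmap size: header bits + one bit per field, in 8-byte words.
fn null_bitmap_bytes(arity: usize) -> usize {
    ((arity + 63 + HEADER_SIZE_IN_BITS) / 64) * 8
}

impl<'a> RowView<'a> {
    pub fn new(arity: usize, data: &'a [u8]) -> Self {
        RowView { arity, data }
    }

    /// Uses `get` instead of indexing: a missing bitmap byte reads as
    /// "not null" rather than panicking with an out-of-bounds index.
    pub fn is_null_at(&self, pos: usize) -> bool {
        let bit = pos + HEADER_SIZE_IN_BITS;
        self.data
            .get(bit / 8)
            .map_or(false, |&byte| (byte & (1u8 << (bit % 8))) != 0)
    }

    /// Bounds-checked typed read: a short buffer yields Err, not a panic.
    pub fn read_i64_at(&self, pos: usize) -> Result<i64, String> {
        let start = null_bitmap_bytes(self.arity) + 8 * pos;
        let bytes = self
            .data
            .get(start..start + 8)
            .ok_or_else(|| format!("field {pos} is out of bounds"))?;
        let mut buf = [0u8; 8];
        buf.copy_from_slice(bytes); // lengths match: the range is exactly 8 bytes
        // Assumption: fixed-width values are stored little-endian.
        Ok(i64::from_le_bytes(buf))
    }
}
```

The design point is that the null check itself never fails; the error is deferred to the typed reader, which already has a `Result` return type to carry it.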