perf: fast-path inline strings in ByteViewGroupValueBuilder::vectorized_append#21618
Open
EeshanBembi wants to merge 3 commits intoapache:mainfrom
Open
perf: fast-path inline strings in ByteViewGroupValueBuilder::vectorized_append#21618EeshanBembi wants to merge 3 commits intoapache:mainfrom
EeshanBembi wants to merge 3 commits intoapache:mainfrom
Conversation
…on types Closes apache#21144 Implements DFExtensionType for all remaining canonical Arrow extension types so they are recognized and pretty-printed by the extension type registry: - Bool8: displays Int8 values as 'true'/'false' instead of raw integers - Json: uses default string formatter (values are already valid JSON) - Opaque: uses default formatter - FixedShapeTensor: uses default formatter, storage_type computed from value_type and list_size - VariableShapeTensor: uses default formatter, storage_type computed from value_type and dimensions - TimestampWithOffset: uses default formatter All six types are registered in MemoryExtensionTypeRegistry::new_with_canonical_extension_types() alongside the existing UUID registration.
…ed_append When the input StringView/BinaryView array has no data buffers (all values ≤12 bytes, stored inline), skip the value() → make_view() round-trip in do_append_val_inner and instead copy the u128 views directly. Arrow guarantees valid arrays have zero-padded inline views, so the direct copy is semantically identical and lets the compiler vectorize the loop. Also pre-reserve views capacity in the slow path (non-inline strings) to avoid repeated Vec reallocation. Closes apache#21568
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Closes #21568.
Rationale for this change
ByteViewGroupValueBuilder::vectorized_appendwas doing unnecessary work for short strings (≤12 bytes): for each row it calledarray.value(row)to decode the u128 view into a&[u8], then calledmake_viewto re-encode it back into a u128. The inputGenericByteViewArrayalready stores inline values in exactly that u128 format, so the round-trip is redundant.This mirrors the existing
HAS_BUFFERSspecialisation invectorized_equal_to_inner, which uses the samedata_buffers().is_empty()guard to take a direct-view-compare fast path for inline strings.What changes are included in this PR?
In
vectorized_append_inner, theNulls::Nonebranch now dispatches onarr.data_buffers().is_empty():self.views.extend(rows.iter().map(|&row| arr.views()[row])). Arrow's validity invariant guarantees inline views are zero-padded, so direct copy is semantically identical tovalue() → make_view().self.views.reserve(rows.len())before the existing loop to avoid repeated reallocation.Are these changes tested?
Covered by the existing 6 unit tests in
bytes_view::tests, all passing unchanged.test_byte_view_vectorized_operation_special_caseexercises the fast path directly (11-byte strings, no data buffers).Are there any user-facing changes?
No. Internal performance improvement only.
Benchmark
inline_null_0.0_size_1000/vectorized_append(8-byte strings, no nulls, 1 000 rows):