
perf: fast-path inline strings in ByteViewGroupValueBuilder::vectorized_append #21618

Open
EeshanBembi wants to merge 3 commits into apache:main from EeshanBembi:main


@EeshanBembi
Contributor

Which issue does this PR close?

Closes #21568.

Rationale for this change

ByteViewGroupValueBuilder::vectorized_append was doing unnecessary work for short strings (≤12 bytes): for each row it called array.value(row) to decode the u128 view into a &[u8], then called make_view to re-encode that slice as a u128. The input GenericByteViewArray already stores inline values in exactly that u128 format, so the round-trip is redundant.
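To make the redundancy concrete, here is a minimal standalone sketch of the inline view layout. It assumes the Arrow StringView/BinaryView format (bytes 0..4 hold the length as a little-endian u32, bytes 4..16 hold the data, zero-padded). make_inline_view, inline_value and decode_then_reencode are hypothetical helpers written for illustration, not the arrow-rs API.

```rust
/// Largest value stored inline in an Arrow view.
const MAX_INLINE_LEN: usize = 12;

/// Hypothetical encoder for inline values: length in bytes 0..4
/// (little-endian u32), data in bytes 4..16, remaining bytes zero.
fn make_inline_view(data: &[u8]) -> u128 {
    assert!(data.len() <= MAX_INLINE_LEN, "value does not fit inline");
    let mut buf = [0u8; 16];
    buf[0..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
    buf[4..4 + data.len()].copy_from_slice(data);
    u128::from_le_bytes(buf)
}

/// Hypothetical decoder: the low 32 bits of the view are the length,
/// and the value bytes follow it inside the view itself.
fn inline_value(view: u128) -> Vec<u8> {
    let len = view as u32 as usize;
    assert!(len <= MAX_INLINE_LEN, "view is not inline");
    view.to_le_bytes()[4..4 + len].to_vec()
}

/// The per-row round-trip the old code performed: decode, then re-encode.
/// For a valid (zero-padded) inline view it returns the input unchanged.
fn decode_then_reencode(view: u128) -> u128 {
    make_inline_view(&inline_value(view))
}
```

For example, decode_then_reencode(make_inline_view(b"hello world!")) returns the original view bit for bit, which is why copying the view directly gives the same result.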

This mirrors the existing HAS_BUFFERS specialisation in vectorized_equal_to_inner, which uses the same data_buffers().is_empty() guard to take a direct-view-compare fast path for inline strings.
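The guard-plus-const-generic pattern can be sketched in isolation. This is a simplified illustration under stated assumptions, not DataFusion's vectorized_equal_to_inner: MiniViewArray, row_equal and vectorized_equal_to are hypothetical names, and the sketch assumes valid Arrow views (values of 12 bytes or fewer are always inline and zero-padded).

```rust
/// Hypothetical, simplified stand-in for an Arrow view array (not arrow-rs):
/// raw u128 views plus the data buffers that non-inline views point into.
struct MiniViewArray {
    views: Vec<u128>,
    buffers: Vec<Vec<u8>>,
}

impl MiniViewArray {
    /// Decode row `i`, handling both inline and buffer-backed views.
    fn value(&self, i: usize) -> Vec<u8> {
        let view = self.views[i];
        let len = view as u32 as usize; // bytes 0..4: length
        if len <= 12 {
            // Inline: data sits in bytes 4..4 + len of the view.
            view.to_le_bytes()[4..4 + len].to_vec()
        } else {
            // Non-inline: bytes 8..12 = buffer index, bytes 12..16 = offset.
            let buffer_index = (view >> 64) as u32 as usize;
            let offset = (view >> 96) as u32 as usize;
            self.buffers[buffer_index][offset..offset + len].to_vec()
        }
    }
}

/// Hypothetical inline-view encoder used to build example data.
fn make_inline_view(data: &[u8]) -> u128 {
    assert!(data.len() <= 12);
    let mut buf = [0u8; 16];
    buf[0..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
    buf[4..4 + data.len()].copy_from_slice(data);
    u128::from_le_bytes(buf)
}

/// Compare one stored row with one input row. When HAS_BUFFERS is false the
/// input view is inline and zero-padded, so a raw u128 comparison is exact:
/// an equal stored value must have an identical view, and a non-inline
/// stored value has a different length field.
fn row_equal<const HAS_BUFFERS: bool>(
    stored: &MiniViewArray,
    stored_row: usize,
    input: &MiniViewArray,
    input_row: usize,
) -> bool {
    if HAS_BUFFERS {
        stored.value(stored_row) == input.value(input_row)
    } else {
        stored.views[stored_row] == input.views[input_row]
    }
}

/// Evaluate the buffers-empty guard once per batch, then run a loop that is
/// monomorphized for that case.
fn vectorized_equal_to(
    stored: &MiniViewArray,
    stored_rows: &[usize],
    input: &MiniViewArray,
    input_rows: &[usize],
) -> Vec<bool> {
    fn run<const HAS_BUFFERS: bool>(
        stored: &MiniViewArray,
        stored_rows: &[usize],
        input: &MiniViewArray,
        input_rows: &[usize],
    ) -> Vec<bool> {
        stored_rows
            .iter()
            .zip(input_rows)
            .map(|(&s, &i)| row_equal::<HAS_BUFFERS>(stored, s, input, i))
            .collect()
    }
    if input.buffers.is_empty() {
        run::<false>(stored, stored_rows, input, input_rows)
    } else {
        run::<true>(stored, stored_rows, input, input_rows)
    }
}
```

The branch on HAS_BUFFERS is resolved at compile time, so the no-buffers loop reduces to plain u128 comparisons with no decoding.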

What changes are included in this PR?

In vectorized_append_inner, the Nulls::None branch now dispatches on arr.data_buffers().is_empty():

  • Fast path (no data buffers → all values ≤12 bytes inline): copies the u128 views directly via self.views.extend(rows.iter().map(|&row| arr.views()[row])). Arrow's validity invariant guarantees that inline views are zero-padded, so a direct copy is semantically identical to value() → make_view().
  • Slow path (array has non-inline strings): adds self.views.reserve(rows.len()) before the existing loop to avoid repeated reallocation.
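The two paths can be modelled with a small standalone sketch. It assumes a null-free input and the Arrow view layout; MiniViewArray, append_value and the local make_view are hypothetical stand-ins (the builder here writes long values into a single buffer), not the DataFusion or arrow-rs implementations.

```rust
/// Hypothetical, simplified stand-in for both the input view array and the
/// builder's storage (not the arrow-rs or DataFusion types).
struct MiniViewArray {
    views: Vec<u128>,
    buffers: Vec<Vec<u8>>,
}

/// Build a view in the Arrow layout: inline for <= 12 bytes, otherwise
/// length, 4-byte prefix, buffer index and offset.
fn make_view(data: &[u8], buffer_index: u32, offset: u32) -> u128 {
    let len = data.len();
    if len <= 12 {
        let mut buf = [0u8; 16];
        buf[0..4].copy_from_slice(&(len as u32).to_le_bytes());
        buf[4..4 + len].copy_from_slice(data);
        u128::from_le_bytes(buf)
    } else {
        let prefix = u32::from_le_bytes([data[0], data[1], data[2], data[3]]);
        (len as u128)
            | (prefix as u128) << 32
            | (buffer_index as u128) << 64
            | (offset as u128) << 96
    }
}

impl MiniViewArray {
    /// Decode row `i`, handling both inline and buffer-backed views.
    fn value(&self, i: usize) -> Vec<u8> {
        let view = self.views[i];
        let len = view as u32 as usize;
        if len <= 12 {
            view.to_le_bytes()[4..4 + len].to_vec()
        } else {
            let buffer_index = (view >> 64) as u32 as usize;
            let offset = (view >> 96) as u32 as usize;
            self.buffers[buffer_index][offset..offset + len].to_vec()
        }
    }

    /// Slow-path helper: encode one value, copying long values into buffer 0.
    fn append_value(&mut self, data: &[u8]) {
        let view = if data.len() <= 12 {
            make_view(data, 0, 0)
        } else {
            if self.buffers.is_empty() {
                self.buffers.push(Vec::new());
            }
            let offset = self.buffers[0].len() as u32;
            self.buffers[0].extend_from_slice(data);
            make_view(data, 0, offset)
        };
        self.views.push(view);
    }

    /// Append the selected rows of a null-free `input`.
    fn vectorized_append(&mut self, input: &MiniViewArray, rows: &[usize]) {
        if input.buffers.is_empty() {
            // Fast path: no data buffers means every view is inline and
            // self-contained, so copy the u128 views as they are.
            self.views.extend(rows.iter().map(|&row| input.views[row]));
        } else {
            // Slow path: decode and re-encode, reserving view capacity once.
            self.views.reserve(rows.len());
            for &row in rows {
                let value = input.value(row);
                self.append_value(&value);
            }
        }
    }
}
```

Because an inline view never refers to a buffer, copying it into a builder with different buffers is safe, and appending the same rows through either path produces identical views.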

Are these changes tested?

Covered by the existing 6 unit tests in bytes_view::tests, all passing unchanged. test_byte_view_vectorized_operation_special_case exercises the fast path directly (11-byte strings, no data buffers).

Are there any user-facing changes?

No. Internal performance improvement only.

Benchmark

inline_null_0.0_size_1000/vectorized_append (8-byte strings, no nulls, 1 000 rows):

Before: 3.37 µs
After: 495 ns
Change: −85.3% (6.8× faster)
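The headline figures follow directly from the two timings (simple arithmetic on the reported numbers, with the per-row values obtained by dividing by the 1 000 rows):

```math
\frac{3.37\,\mu\text{s}}{495\,\text{ns}} = \frac{3370}{495} \approx 6.8\times,
\qquad
1 - \frac{495}{3370} \approx 0.853 = 85.3\%\ \text{reduction},
\qquad
\text{per row: } 3.37\,\text{ns} \to 0.495\,\text{ns}
```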

ebembi-crdb and others added 3 commits April 7, 2026 18:33
…on types

Closes apache#21144

Implements DFExtensionType for all remaining canonical Arrow extension
types so they are recognized and pretty-printed by the extension type
registry:

- Bool8: displays Int8 values as 'true'/'false' instead of raw integers
- Json: uses default string formatter (values are already valid JSON)
- Opaque: uses default formatter
- FixedShapeTensor: uses default formatter, storage_type computed from
  value_type and list_size
- VariableShapeTensor: uses default formatter, storage_type computed
  from value_type and dimensions
- TimestampWithOffset: uses default formatter

All six types are registered in
MemoryExtensionTypeRegistry::new_with_canonical_extension_types()
alongside the existing UUID registration.
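The commit message does not show the formatter code, so the following is only an illustration of the rules it describes. It assumes the Arrow canonical extension semantics: Bool8 stores booleans as Int8 with 0 meaning false and any non-zero value meaning true, and a FixedShapeTensor's FixedSizeList list_size equals the product of the tensor shape. format_bool8 and fixed_shape_list_size are hypothetical helpers, not DataFusion's DFExtensionType API.

```rust
/// Hypothetical helper: render one Bool8 storage value (an Int8) as text.
/// Per the Arrow Bool8 extension, 0 is false and any non-zero value is true.
fn format_bool8(value: i8) -> &'static str {
    if value != 0 {
        "true"
    } else {
        "false"
    }
}

/// Hypothetical helper: the FixedSizeList length backing a FixedShapeTensor
/// is the number of elements in one tensor, i.e. the product of its shape.
fn fixed_shape_list_size(shape: &[usize]) -> usize {
    shape.iter().product()
}
```

For example, Int8 storage values [0, 1, -3, 0] render as false, true, true, false, and a tensor of shape [2, 3, 4] is stored as a FixedSizeList of 24 values.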
…ed_append

When the input StringView/BinaryView array has no data buffers (all values
≤12 bytes, stored inline), skip the value() → make_view() round-trip in
do_append_val_inner and instead copy the u128 views directly. Arrow
guarantees valid arrays have zero-padded inline views, so the direct copy
is semantically identical and lets the compiler vectorize the loop.

Also pre-reserve views capacity in the slow path (non-inline strings) to
avoid repeated Vec reallocation.

Closes apache#21568
github-actions bot added the logical-expr, core, common, and physical-plan labels on Apr 14, 2026

Labels

common (Related to common crate), core (Core DataFusion crate), logical-expr (Logical plan and expressions), physical-plan (Changes to the physical-plan crate)

Development

Successfully merging this pull request may close these issues.

Optimize ByteViewGroupValueBuilder vectorized_append
