feat(parquet): RowSelection can be backed by a BooleanBuffer#10141
feat(parquet): RowSelection can be backed by a BooleanBuffer#10141haohuaijin wants to merge 13 commits into
RowSelection can be backed by a BooleanBuffer#10141Conversation
|
Thanks @alamb for the pointers. I looked at #6624, #7454, and the work that landed in #8733. My understanding is:
This does not replace selector-backed selections. Selectors are still better for clustered/page-index-style selections. The case this helps is when the caller already has a row-level bitmap, such as from an external index, FTS index, or bitmap index. Today that bitmap has to go through So I see this as a small representation-layer improvement that complements #8733. #8733 added mask execution; this PR adds a direct bitmap-backed input path for it. It does not try to implement the broader adaptive predicate-pushdown/page-cache design from #7454. |
|
run benchmark arrow_reader arrow_reader_row_filter |
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing export-mask (1720537) to 2e035fd (merge-base) diff File an issue against this benchmark runner |
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing export-mask (1720537) to 2e035fd (merge-base) diff File an issue against this benchmark runner |
|
run benchmark arrow_reader_clickbench |
|
Merging up to fix CI |
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing export-mask (7b102f4) to 6ba533d (merge-base) diff File an issue against this benchmark runner |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
|
run benchmark row_selector row_selection_cursor |
alamb
left a comment
There was a problem hiding this comment.
Thank you @haohuaijin . I think this is totally the right direction and I really like where this is going
As long as this doesn't cause a performance regression I think it is good to go.
I have actually tried this myself (see #6624) as did @XiangpengHao in #6624.
Some other thoughts
- We need to make sure it doesn't cause a performance regression (I can help here)
- It currently has a public API change, so we can't merge it until main opens for the next major release (likely in mid-late July). If we can figure out how to keep the API the same we could potentially release it sooner
I haven' tmade it through the entire PR yet, but I did fire off some benchmarks
|
|
||
| #[inline] | ||
| fn next(&mut self) -> Option<RowSelector> { | ||
| match self { |
There was a problem hiding this comment.
In general I think this will add a branch in the main loops - every chunk of bits will require matching -- it might make sense to figure out how to have iterators for the two different types, so that code paths that want to iterate over the selection can match outside
match iter {
RowSelectionIter::Selectors(iter) => { for i in iter ... }, // special loop for selector backed
RowSelectionIter::Mask(iter) => { for i in iter ... }, // special loop for mask backed
}| if self.finished { | ||
| return None; | ||
| } | ||
| match self.slices.next() { |
| impl Iterator for MaskRunIter<'_> { | ||
| type Item = RowSelector; | ||
|
|
||
| fn next(&mut self) -> Option<RowSelector> { |
There was a problem hiding this comment.
I think this is the main API that is not compatible -- it need to return &RowSelector -- if we could change that we can merge this PR before the next major release (with breaking API changes)
There was a problem hiding this comment.
I updated RowSelection::iter() to keep returning Iterator<Item = &RowSelector>.
- For selector-backed selections this still borrows from the internal
Vec<RowSelector>. - For mask-backed selections I added a lazy selector cache, because the returned
&RowSelectorvalues need stable storage insideRowSelection; otherwise they would point into a temporaryVec.
The internal hot paths still avoid this cache and use the BooleanBuffer or MaskRunIter directly.
the struct is below
pub struct RowSelection {
inner: RowSelectionInner,
}
pub(crate) enum RowSelectionInner {
Selectors(Vec<RowSelector>),
Mask(Box<MaskSelection>),
}
pub(crate) struct MaskSelection {
mask: BooleanBuffer,
selectors: OnceLock<Vec<RowSelector>>,
}I am not fully sure this is the best approach, so suggestions and feedback are welcome.
| if let (Some(l), Some(r)) = (self.as_mask(), other.as_mask()) { | ||
| return Self::from_boolean_buffer(intersect_masks(l, r)); | ||
| } | ||
| let l = self.materialize_for_combine(); |
There was a problem hiding this comment.
this is a clever idea -- as a follow on we could potentially implement specialized versions of each of these implementations
There was a problem hiding this comment.
for example, I think and_then can use some of the fancy BMI instructions that @devanbenz is investigating in #10136
| // Verify that the size of RowGroupDecoderState does not grow too large | ||
| fn test_structure_size() { | ||
| assert_eq!(std::mem::size_of::<RowGroupDecoderState>(), 232); | ||
| assert_eq!(std::mem::size_of::<RowGroupDecoderState>(), 256); |
There was a problem hiding this comment.
I think this is bigger because a RowSelection is now bigger.
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing export-mask (7b102f4) to 6ba533d (merge-base) diff File an issue against this benchmark runner |
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing export-mask (7b102f4) to 6ba533d (merge-base) diff File an issue against this benchmark runner |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
|
🤔 looks like arrow_reader_row_filter is slightly slower. Rerunning to try an reproduce |
|
run benchmark arrow_reader_row_filter |
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing export-mask (0ffbf26) to 6ba533d (merge-base) diff File an issue against this benchmark runner |
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing export-mask (0ffbf26) to 6ba533d (merge-base) diff File an issue against this benchmark runner |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
|
Thanks for your reviews @alamb , i revert the break api change, and fix some performance issue, i retrigger the benchmark, let see the result. One thing I noticed is that I am not sure if we should do that in this PR, as it would make the diff larger, or leave it as a follow-up. |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
|
run benchmark arrow_reader_clickbench |
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing export-mask (0ffbf26) to 6ba533d (merge-base) diff File an issue against this benchmark runner |
|
the update: the |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
|
run benchmark arrow_reader_clickbench |
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing export-mask (3121682) to 6ba533d (merge-base) diff File an issue against this benchmark runner |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
|
run benchmark arrow_reader_clickbench |
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing export-mask (3121682) to 6ba533d (merge-base) diff File an issue against this benchmark runner |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
|
run benchmark arrow_reader_clickbench |
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing export-mask (ee2df31) to 7fb7751 (merge-base) diff File an issue against this benchmark runner |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
|
weird, in my local(M2 Max), i can't reproduce the result for git checkout export-mask
cargo bench --bench arrow_reader_clickbench --features="arrow,async,object_store,experimental,test_common" -- --save-baseline 10141
git checkout main
cargo bench --bench arrow_reader_clickbench --features="arrow,async,object_store,experimental,test_common" -- --save-baseline main
critcmp main 10141
|
|
I tried one more change in #10209: inline the selector-backed bodies of I ran
Same commit, but results swing 30–46 percentage points between runs. The Update: Today I ran more benchmarks on #10214 with the same commit id in this branch. The results are still not stable, but overall they show this branch may have a 1–3% regression in some queries, possibly related to this PR adding a match arm and extracting some function bodies into new functions. will check later. |
|
run benchmark arrow_reader_clickbench |
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing export-mask (ee2df31) to 7fb7751 (merge-base) diff File an issue against this benchmark runner |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
|
Will try and review this one more carefully today |
|
FYI, today I ran
|
|
I had similar issues with |



Which issue does this PR close?
RowSelectionto be backed by aBooleanBufferto reduce memory usage #10140Rationale for this change
RowSelectioncurrently stores selections asVec<RowSelector>(16 bytes per selector). This is compact for long runs, but expensive for scattered matches. With ~35% isolated single-row hits, it uses about 11.2 bytes per input row. ABooleanBufferuses 1 bit per input row, about 90x less memory.The reader can also choose the
Maskstrategy, which converts selectors back into a bitmap. When the caller already had a bitmap, this conversion round-trip is unnecessary.This PR lets
RowSelectionpreserve a caller-provided bitmap and pass it directly to mask execution.This is not intended to claim broad DataFusion / TPC-DS / ClickBench speedups. Current common DataFusion SQL paths generally do not naturally produce bitmap-backed
RowSelections. The practical benefit is for integrations that already have a row-level bitmap and need Parquet to consume it without materializing a large selector list.What changes are included in this PR?
RowSelectioncan now be backed by eitherVec<RowSelector>orBooleanBuffer. New public construction:Methods that can work directly on the bitmap now do so:
iter()streams viaBitSliceIteratorrow_count/skipped_row_countusecount_set_bitsselects_anyusesset_indices().next()trimpreserves mask backing viaBooleanBuffer::sliceintersection/uniononMask+MaskuseBitAnd/BitOrsplit_offon a mask usesBooleanBuffer::slice(O(1), both halves stay mask-backed)limitslices at the selected-row boundary viafind_nth_set_bit_position, staying mask-backedoffsetfinds the first selected row to keep viafind_nth_set_bit_positionand rebuilds only the mask buffer, avoiding selector materializationand_thenapplies the inner selection over the mask's set positions, returning a mask-backed resultFromIterator<RowSelection>concatenatesBooleanBuffers when every input is mask-backedMixed inputs, and existing selector-backed inputs, still use the existing selector helpers. Existing callers keep the same behavior.
The reader (
ReadPlanBuilder::build) passes a mask-backed selection straight toRowSelectionCursor::new_mask_from_buffer, so it skips rebuilding the bitmap from selectors.Are these changes tested?
Yes. This PR extends the existing
RowSelectionunit tests with coverage for:BooleanBuffer, including empty and all-unset masksFrom<BooleanBuffer>split_off,limit,offset,and_then, and all-maskFromIterator<RowSelection>intersection/union, including uneven-length inputsfrom_filtersselector pathAre there any user-facing changes?
no breaking api change