feat/perf: add insert_range and contains to DeleteVector#2292
feat/perf: add insert_range and contains to DeleteVector#2292Baunsgaard wants to merge 3 commits intoapache:mainfrom
Conversation
Expose RoaringTreemap::insert_range through DeleteVector for bulk range insertion of deleted positions, avoiding per-position insert overhead.
A reversed range (start > end) indicates a bug in the caller and should fail fast with a panic rather than silently return 0. An empty range (start == end) remains a valid no-op. This makes insert_range behavior consistent across the Java, C++, and Rust Iceberg implementations.
|
Just to make it complete, here are results from the Rust benchmark. Related PRs:
Benchmark results
|
xanderbailey
left a comment
There was a problem hiding this comment.
Took a quick look and it seems to make sense to me. I'm wondering where we're expected to use these new paths from?
| /// | ||
| /// Returns the number of newly inserted positions. | ||
| #[allow(unused)] | ||
| pub fn insert_range(&mut self, start: u64, end: u64) -> u64 { |
There was a problem hiding this comment.
Should these be pub if delete_vector is mod delete_vector;? Is the idea to make delete_vector public?
Which issue does this PR close?
This is the Rust counterpart of the Java optimization in apache/iceberg#15791.
What changes are included in this PR?
DeleteVector::insert_range(start, end)that delegates toRoaringTreemap::insert_rangefor bulk range insertionDeleteVector::contains(pos)for efficient single-position lookupsAre these changes tested?
Yes, unit tests covering all edge cases listed above are included in
delete_vector::tests.