Describe the enhancement requested
org.apache.parquet.internal.filter2.columnindex.RowRanges is currently only constructible internally — instances are produced by column-index filtering (ColumnIndexFilter.calculateRowRanges(...)) or by the package-private factory/union helpers. All of its constructors are private, so there is no supported way for code outside the package to build a RowRanges from an arbitrary set of selected rows.
This is limiting for external readers that determine which rows to read from a source other than column-index filtering. A concrete motivating case is a materialization path that receives a stream of selected row indices from a downstream operator (for example, the row positions surviving a filter or join) and needs to turn that stream into a RowRanges to drive reads, without knowing page boundaries ahead of time.
Proposed change
Add a small RowRanges.Builder that lets callers append selected row indices incrementally and coalesces consecutive indices into Range entries:
RowRanges.Builder builder = RowRanges.builder();
for (long row : selectedRowsInOrder) {
    builder.addSelected(row);
}
RowRanges ranges = builder.build();
Semantics:
- addSelected(long) must be called in strictly increasing order. Consecutive indices are merged into a single Range; a gap closes the current run and starts a new one.
- Calling addSelected with a value <= the previous value throws IllegalArgumentException (rejects out-of-order and duplicate indices).
- build() returns RowRanges.EMPTY when no rows were selected.
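To make the semantics above concrete, here is a minimal standalone sketch of the coalescing logic. It is not the proposed Parquet implementation: RowRangeCoalescer and Run are hypothetical stand-ins for RowRanges.Builder and Range, and an empty list stands in for RowRanges.EMPTY. How the real builder should treat negative indices is not specified here, so the sketch does not address it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the proposed RowRanges.Builder; illustration only. */
public class RowRangeCoalescer {

    /** Inclusive [from, to] run of consecutive row indices (stand-in for Range). */
    public static final class Run {
        public final long from;
        public final long to;

        Run(long from, long to) {
            this.from = from;
            this.to = to;
        }

        @Override
        public String toString() {
            return "[" + from + ", " + to + "]";
        }
    }

    private final List<Run> closedRuns = new ArrayList<>();
    private boolean hasOpenRun = false;
    private long runStart;
    private long last;

    /** Appends one selected row; rows must arrive in strictly increasing order. */
    public RowRangeCoalescer addSelected(long row) {
        if (hasOpenRun && row <= last) {
            // Rejects both out-of-order and duplicate indices.
            throw new IllegalArgumentException(
                "Row indices must be strictly increasing: got " + row + " after " + last);
        }
        if (hasOpenRun && row != last + 1) {
            // A gap closes the current run.
            closedRuns.add(new Run(runStart, last));
            hasOpenRun = false;
        }
        if (!hasOpenRun) {
            // Start a new run at this row.
            runStart = row;
            hasOpenRun = true;
        }
        last = row;
        return this;
    }

    /** Returns the coalesced runs; an empty list when no rows were selected. */
    public List<Run> build() {
        List<Run> result = new ArrayList<>(closedRuns);
        if (hasOpenRun) {
            result.add(new Run(runStart, last));
        }
        return result;
    }

    public static void main(String[] args) {
        RowRangeCoalescer builder = new RowRangeCoalescer();
        for (long row : new long[] {0, 1, 2, 5, 6, 9}) {
            builder.addSelected(row);
        }
        System.out.println(builder.build()); // prints [[0, 2], [5, 6], [9, 9]]
    }
}
```

Because each row is compared only with the previous one, the sketch uses constant extra state per call and never needs to sort or re-scan the accumulated runs, which is what the strictly-increasing contract buys.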
Why a builder (vs. exposing a constructor)
A builder keeps RowRanges immutable and its internal Range list encapsulated, while still letting callers feed rows one at a time. The coalescing logic lives in one place rather than being re-implemented by each external caller, and the strictly-increasing contract is enforced at the point of construction.
Scope
- Additive, Core only. No change to existing RowRanges behavior or to the column-index filtering path.
- No user-facing API removal or behavioral change to existing methods.
This is the first of two related enhancements opening up RowRanges/reader APIs needed by the materialization feature described above; a follow-up (#3597) will expose per-row-range reader APIs on ParquetFileReader.
Component(s)
Core