Add a RowRanges.Builder for incremental construction from selected row indices #3596

@peter-toth

Description

Describe the enhancement requested

org.apache.parquet.internal.filter2.columnindex.RowRanges can currently be constructed only from inside its own package: instances are produced by column-index filtering (ColumnIndexFilter.calculateRowRanges(...)) or by the package-private factory/union helpers. All of its constructors are private, so there is no supported way for code outside the package to build a RowRanges from an arbitrary set of selected rows.

This is limiting for external readers that determine which rows to read from a source other than column-index filtering. A concrete motivating case is a materialization path that receives a stream of selected row indices from a downstream operator (for example, the row positions surviving a filter or join) and needs to turn that stream into a RowRanges to drive reads, without knowing page boundaries ahead of time.

Proposed change

Add a small RowRanges.Builder that lets callers append selected row indices incrementally and coalesces consecutive indices into Range entries:

RowRanges.Builder builder = RowRanges.builder();
for (long row : selectedRowsInOrder) {
  builder.addSelected(row);
}
RowRanges ranges = builder.build();

Semantics:

  • addSelected(long) must be called in strictly increasing order. Consecutive indices are merged into a single Range; a gap closes the current run and starts a new one.
  • Calling addSelected with a value <= the previous value throws IllegalArgumentException (rejects out-of-order and duplicate indices).
  • build() returns RowRanges.EMPTY when no rows were selected.
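
For illustration, the coalescing semantics above could be sketched roughly as follows. This is a standalone stand-in, not the proposed parquet-java code: the class name RowRangesBuilderSketch and its nested Range type are hypothetical, ranges are modeled as inclusive [from, to] pairs, and build() returns a plain list (empty when nothing was selected) rather than a RowRanges.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the proposed builder; not part of parquet-java.
final class RowRangesBuilderSketch {

  /** Inclusive run of consecutive row indices (assumed representation). */
  static final class Range {
    final long from;
    final long to;

    Range(long from, long to) {
      this.from = from;
      this.to = to;
    }

    @Override
    public String toString() {
      return "[" + from + ", " + to + "]";
    }
  }

  private final List<Range> closedRuns = new ArrayList<>();
  private boolean hasOpenRun = false;
  private long runStart;
  private long runEnd; // last accepted index while a run is open

  /** Appends one selected row index; indices must be strictly increasing. */
  RowRangesBuilderSketch addSelected(long row) {
    if (hasOpenRun && row <= runEnd) {
      // Rejects both out-of-order and duplicate indices.
      throw new IllegalArgumentException(
          "Row indices must be strictly increasing: " + row + " after " + runEnd);
    }
    if (!hasOpenRun) {
      // First index ever: open the first run.
      runStart = row;
      hasOpenRun = true;
    } else if (row != runEnd + 1) {
      // Gap: close the current run and start a new one.
      closedRuns.add(new Range(runStart, runEnd));
      runStart = row;
    }
    runEnd = row;
    return this;
  }

  /** Returns the coalesced runs; an empty list when no rows were added. */
  List<Range> build() {
    List<Range> out = new ArrayList<>(closedRuns);
    if (hasOpenRun) {
      out.add(new Range(runStart, runEnd));
    }
    return Collections.unmodifiableList(out);
  }
}
```

With this sketch, appending 0, 1, 2, 5, 7, 8 yields three runs: [0, 2], [5, 5] and [7, 8]. Only the current run is held open while rows arrive, so the caller never needs to buffer the full index stream.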

Why a builder (vs. exposing a constructor)

A builder keeps RowRanges immutable and its internal Range list encapsulated, while still letting callers feed rows one at a time. The coalescing logic lives in one place rather than being re-implemented by each external caller, and the strictly-increasing contract is enforced at the point of construction.

Scope

  • Additive, Core only. No change to existing RowRanges behavior or to the column-index filtering path.
  • No user-facing API removal or behavioral change to existing methods.

This is the first of two related enhancements opening up RowRanges/reader APIs needed by the materialization feature described above; a follow-up (#3597) will expose per-row-range reader APIs on ParquetFileReader.

Component(s)

Core
