feat: clustered segments pt.2 (write side support)#19579
Conversation
changes: * adds `BaseTableProjectionSpec` interface to capture the operator facing shape of V10 base table schemas * adds `ClusteredValueGroupsBaseTableSchema` implementation for ingesting clustered segments * adds `DataSchema.baseTable`, a `BaseTableProjectionSpec` which when set puts the DataSchema into a new mode where the majority of the schema is defined via the baseTable, rejecting other top level fields * adds `DataSchema.segmentGranularity` to use when `baseTable` is set, which captures the segment granularity and intervals (query granularity is defined by the baseTable) * adds `AdaptedBaseTableProjectionSpec` implementation for converting classic `DataSchema` fields to a `BaseTableProjectionSpec` * adds `OnHeapClusteredBaseTable`, `OnHeapClusterGroup` used by `OnHeapIncrementalIndex` to build clustered segments * adds `IndexMergerV10.makeClusteredIndexFiles` which merges and builds clustered v10 segments * `Sink`/`BatchAppenderator`/`StreamAppenderator` wiring for clustered segments so that the cluster group tuples appear on the `DataSegment` * known issues: unbounded, no aggregate projections, no compaction support, no time ordered cursor support
FrankChen021
left a comment
There was a problem hiding this comment.
| Severity | Findings |
|---|---|
| P0 | 0 |
| P1 | 0 |
| P2 | 1 |
| P3 | 0 |
| Total | 1 |
Reviewed 53 of 53 changed files.
This is an automated review by Codex GPT-5.5
| this.metrics = metrics == null ? new AggregatorFactory[0] : metrics; | ||
| validateNoOverlap(this.clusteringColumns, this.nonClusteringDimensions); | ||
| this.dimensionsSpec = computeDimensionsSpec(this.clusteringColumns, this.nonClusteringDimensions); | ||
| this.ordering = declaredOrdering != null |
There was a problem hiding this comment.
[P2] Declared ordering can be advertised without being honored
The spec accepts any declaredOrdering here, but the clustered write path does not use it to build the actual per-group row comparator: OnHeapClusterGroup derives ordering from timePosition/DimensionsSpec, while IndexMergerV10 emits groups in ascending tuple order and then stores firstSchema.getOrdering() in metadata. A spec such as [tenant ASC, page ASC, __time ASC], or any DESC order, can therefore be persisted with advertised ordering that the data does not actually satisfy. Query engines may trust CursorHolder/DataSegment ordering and skip sorting, producing misordered results. Please reject unsupported ordering or wire the declared order through ingestion and merge.
There was a problem hiding this comment.
ah this is fair, i should probably make it purely computed for now like AggregateProjectionSpec since its a bit of work to actually wire this up to honor it
| { | ||
| final int numClusteringColumns = clusteringColumns.size(); | ||
| return OnheapIncrementalIndex.ROUGH_OVERHEAD_PER_MAP_ENTRY | ||
| + (long) numClusteringColumns * (Long.BYTES * 2 + Long.BYTES) |
Description
Follow-up to #19460, this PR introduces the writer side stuff so that the segments can actually be created.
As part of this I'm experimenting with a new V10 oriented way to express
DataSchema, where the ingest spec looks a lot more like how the V10 segment metadata is organized, as 1 or more projections. Using the wikipedia example, a clustered segment ingest spec looks something like this:Like projections, it requires expressing the query granularity as a virtual column transformation of __time (though a bit more fragile than expressions since it requires the column be named
__virtualGranularityright now, thinking about changing this to 'resolve' it like we do for projections, or something else, not sure yet). To support this, when usingbaseTablethere is also asegmentGranularityfield which splits out the segment granularity parts of the existingGranularitySpec. To be less disruptive for now, these can be computed into aGranularitySpec, but over time i'd like to migrate to this model of expressing the schema.Includes the fix part o #19578 since this is the branch where I ran into that problem.
changes:
BaseTableProjectionSpecinterface to capture the operator facing shape of V10 base table schemasClusteredValueGroupsBaseTableSchemaimplementation for ingesting clustered segmentsDataSchema.baseTable, aBaseTableProjectionSpecwhich when set puts the DataSchema into a new mode where the majority of the schema is defined via the baseTable, rejecting other top level fieldsDataSchema.segmentGranularityto use whenbaseTableis set, which captures the segment granularity and intervals (query granularity is defined by the baseTable)AdaptedBaseTableProjectionSpecimplementation for converting classicDataSchemafields to aBaseTableProjectionSpecOnHeapClusteredBaseTable,OnHeapClusterGroupused byOnHeapIncrementalIndexto build clustered segmentsIndexMergerV10.makeClusteredIndexFileswhich merges and builds clustered v10 segmentsSink/BatchAppenderator/StreamAppenderatorwiring for clustered segments so that the cluster group tuples appear on theDataSegment