Count aware aggregates #36776

Draft
frankmcsherry wants to merge 2 commits into MaterializeInc:main from frankmcsherry:count-aware-aggregates

frankmcsherry commented May 28, 2026

Summary

Two related optimizer/eval changes that together let a row-wise count(*)
over an integer generate_series evaluate in O(1) instead of materializing
one row per generated value.

  • Count-aware aggregate eval. AggregateFunc::eval now consumes
    Item = (Datum, Diff) instead of bare Datum. count sums the diffs,
    multiplicity-insensitive aggregates (min/max/any/all) drop them, and the
    rest expand by count as before. Avoids fold_reduce_constant expanding a
    compact (row, N) into N rows before aggregation.

  • Collapse unused integer generate_series. When a
    generate_series(start, stop, step) FlatMap's output column is not
    demanded, projection_pushdown rewrites it into a RepeatRowNonNegative
    emitting the same row count, encoded as a single (empty_row, diff=N).
    The rewrite uses a sign-guarded emptiness CASE so that truncating DivInt64
    coincides with floor division and empty series collapse to zero. It fires
    only for non-zero literal steps of integer series.
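
To make the sign guard concrete, here is a rough sketch of the row-count arithmetic for an integer series. It is not the optimizer code from this PR: `series_len` is a hypothetical helper written for illustration, and returning `None` on overflow is an assumption of the sketch, not something the PR describes.

```rust
/// Hypothetical helper: number of rows produced by an integer
/// `generate_series(start, stop, step)`.
///
/// Returns `None` when `step` is zero (the rewrite does not fire in that
/// case) or when the intermediate arithmetic overflows (an assumption of
/// this sketch).
fn series_len(start: i64, stop: i64, step: i64) -> Option<i64> {
    if step == 0 {
        return None;
    }
    // Sign guard: an ascending series is empty when start > stop, a
    // descending one when start < stop.
    let empty = if step > 0 { start > stop } else { start < stop };
    if empty {
        return Some(0);
    }
    // Past the guard, `stop - start` is zero or has the same sign as `step`,
    // so the quotient is non-negative and truncating division equals floor
    // division.
    stop.checked_sub(start)?.checked_div(step)?.checked_add(1)
}
```

Without the guard, `generate_series(2, 1, 3)` would compute `(1 - 2) / 3 + 1`, which truncates to `0 + 1 = 1` even though the series is empty. With the guard it yields zero.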

This reproduces the work of #22753, but with the assistance of 🤖 to grind through the aggregate eval work.

cc: @def- : this has the potential to make some of your generate-series based stress tests less stressful.

frankmcsherry and others added 2 commits May 28, 2026 15:15
Change the aggregate evaluation surface from `Item = Datum` to
`Item = (Datum, Diff)` so aggregates can consume input multiplicity
directly rather than requiring callers to expand each value into `diff`
copies. `count` sums the diffs, multiplicity-insensitive aggregates
(min/max/any/all) drop them, and the rest expand as before.

This avoids materializing a number of rows proportional to the diff in
`fold_reduce_constant`, where a compact `(row, N)` previously expanded to
N rows before aggregation.
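
A minimal sketch of the idea, not the actual `AggregateFunc::eval` code: values are plain `i64` instead of `Datum`, `Agg` and `eval` are hypothetical names, and diffs are assumed to be non-negative.

```rust
// Hypothetical stand-in for the real `Diff` type.
type Diff = i64;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Agg {
    Count,
    Max,
    Sum,
}

/// Evaluate an aggregate over `(value, diff)` pairs.
fn eval(agg: Agg, items: impl IntoIterator<Item = (i64, Diff)>) -> Option<i64> {
    let items = items.into_iter();
    match agg {
        // `count` only needs the total multiplicity: sum the diffs.
        Agg::Count => Some(items.map(|(_, d)| d).sum()),
        // `max` is multiplicity-insensitive: drop the diffs.
        Agg::Max => items.map(|(v, _)| v).max(),
        // Everything else (here `sum`) still expands each value `diff` times.
        Agg::Sum => items
            .flat_map(|(v, d)| {
                let n = usize::try_from(d).expect("sketch assumes non-negative diffs");
                std::iter::repeat(v).take(n)
            })
            .fold(None::<i64>, |acc, v| Some(acc.unwrap_or(0) + v)),
    }
}
```

With this shape, `eval(Agg::Count, vec![(7, 1_000_000_000)])` does a single addition rather than iterating a billion copies of the same value.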

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…egative

When a `generate_series(start, stop, step)` FlatMap's output column is not
demanded, projection_pushdown now rewrites it into a `RepeatRowNonNegative`
emitting the same row count. This is the count-aware-aggregate synergy: the
cardinality is encoded as a single `(empty_row, diff=N)` rather than N physical
rows, so downstream count-aware reduces stay O(1).

Uses a sign-guarded emptiness `CASE` so truncating `DivInt64` coincides with
floor division and empty series collapse to zero. Only fires for non-zero
literal steps of integer series.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>