Count aware aggregates by frankmcsherry · Pull Request #36776 · MaterializeInc/materialize

frankmcsherry · 2026-05-28T19:55:13Z

Summary

Two related optimizer/eval changes that together let a row-wise count(*)
over an integer generate_series evaluate in O(1) instead of materializing
one row per generated value.

Count-aware aggregate eval. AggregateFunc::eval now consumes
Item = (Datum, Diff) instead of a bare Datum. count sums the diffs,
multiplicity-insensitive aggregates (min/max/any/all) drop them, and the
rest expand by count as before. This avoids fold_reduce_constant expanding
a compact (row, N) into N rows before aggregation.

Collapse unused integer generate_series. When the output column of a
generate_series(start, stop, step) FlatMap is not demanded,
projection_pushdown rewrites it into a RepeatRowNonNegative emitting the
same row count, encoded as a single (empty_row, diff=N). The rewrite uses a
sign-guarded emptiness CASE so that truncating DivInt64 coincides with
floor division and empty series collapse to zero. It fires only for
non-zero literal steps of integer series.

This reproduces the work of #22753, but with the assistance of 🤖 to grind through the aggregate eval work.

cc: @def- : this has the potential to make some of your generate-series based stress tests less stressful.

Change the aggregate evaluation surface from `Item = Datum` to `Item = (Datum, Diff)` so aggregates can consume input multiplicity directly rather than requiring callers to expand each value into `diff` copies. `count` sums the diffs, multiplicity-insensitive aggregates (min/max/any/all) drop them, and the rest expand as before. This avoids materializing a number of rows proportional to the diff in `fold_reduce_constant`, where a compact `(row, N)` previously expanded to N rows before aggregation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
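To illustrate the idea in this commit message, here is a minimal sketch of count-aware evaluation. It is not the code from this PR: `Agg`, `eval`, and the use of `Option<i64>` in place of `Datum` are hypothetical simplifications, and it assumes non-negative diffs and a `count` that skips nulls.

```rust
/// Hypothetical stand-in for a signed multiplicity such as `Diff`.
type Diff = i64;

/// Hypothetical, heavily reduced set of aggregates for illustration only.
#[derive(Clone, Copy)]
enum Agg {
    Count,
    Max,
    Sum,
}

/// Evaluate an aggregate over `(value, diff)` pairs.
/// Assumption: diffs are non-negative; `None` plays the role of a null datum.
fn eval(agg: Agg, items: &[(Option<i64>, Diff)]) -> Option<i64> {
    match agg {
        // count: add up the multiplicities of non-null values, no expansion.
        Agg::Count => Some(
            items
                .iter()
                .filter(|(v, _)| v.is_some())
                .map(|(_, d)| *d)
                .sum::<i64>(),
        ),
        // max: insensitive to multiplicity, so the diff is dropped
        // (items with a zero diff are treated as absent).
        Agg::Max => items
            .iter()
            .filter(|(_, d)| *d > 0)
            .filter_map(|(v, _)| *v)
            .max(),
        // Everything else: fall back to expanding each value `diff` times.
        Agg::Sum => {
            let mut expanded = items
                .iter()
                .flat_map(|(v, d)| std::iter::repeat(*v).take((*d).max(0) as usize))
                .flatten()
                .peekable();
            if expanded.peek().is_none() {
                None
            } else {
                Some(expanded.sum::<i64>())
            }
        }
    }
}
```

In this sketch, `eval(Agg::Count, &[(Some(7), 1_000_000)])` touches one pair rather than a million expanded values, which is the effect the commit describes for a compact `(row, N)` input.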

…egative

When a `generate_series(start, stop, step)` FlatMap's output column is not demanded, `projection_pushdown` now rewrites it into a `RepeatRowNonNegative` emitting the same row count. This is the count-aware-aggregate synergy: the cardinality is encoded as a single `(empty_row, diff=N)` rather than N physical rows, so downstream count-aware reduces stay O(1). Uses a sign-guarded emptiness `CASE` so truncating `DivInt64` coincides with floor division and empty series collapse to zero. Only fires for non-zero literal steps of integer series.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
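The row-count arithmetic described above can be sketched as follows. This is an assumption about the shape of the computation, not the expression the PR emits: `series_len` is a hypothetical helper, and it widens to `i128` so that `stop - start` cannot overflow.

```rust
/// Number of values produced by an integer `generate_series(start, stop, step)`.
/// Hypothetical helper; returns `None` for a zero step, which is out of scope
/// for the rewrite described above.
fn series_len(start: i64, stop: i64, step: i64) -> Option<i128> {
    if step == 0 {
        return None;
    }
    // Sign-guarded emptiness check: an ascending series is empty when
    // start > stop, a descending one when start < stop.
    let empty = if step > 0 { start > stop } else { start < stop };
    if empty {
        return Some(0);
    }
    // Past the guard, (stop - start) is zero or has the same sign as step,
    // so the quotient is non-negative and truncating division equals floor
    // division. Without the guard, (5, 4, 3) would compute -1 / 3 + 1 = 1.
    Some((stop as i128 - start as i128) / step as i128 + 1)
}
```

For example, `series_len(1, 10, 3)` covers the values 1, 4, 7, 10 and yields `Some(4)`, while `series_len(5, 4, 3)` yields `Some(0)`. That single number is what would be carried as the diff N in `(empty_row, diff=N)`.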

frankmcsherry and others added 2 commits May 28, 2026 15:15

frankmcsherry mentioned this pull request May 28, 2026

Rowwise subquery eval #36735

Open

Count aware aggregates #36776
frankmcsherry wants to merge 2 commits into MaterializeInc:main from frankmcsherry:count-aware-aggregates
