compute: insert temporal bucketing before ArrangeBy nodes#35935
Open
antiguru wants to merge 10 commits intoMaterializeInc:mainfrom
Open
compute: insert temporal bucketing before ArrangeBy nodes#35935antiguru wants to merge 10 commits intoMaterializeInc:mainfrom
antiguru wants to merge 10 commits intoMaterializeInc:mainfrom
Conversation
Contributor
|
Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone. PR title guidelines
Pre-merge checklist
|
650ca9b to
fff4565
Compare
teskje
reviewed
Apr 13, 2026
Contributor
teskje
left a comment
There was a problem hiding this comment.
This make sense to me, though somebody who understand the lowering should probably review that part, as it's also the part that has the most potential for logic bugs.
| until: Antichain<mz_repr::Timestamp>, | ||
| config_set: &ConfigSet, | ||
| input_has_future_updates: bool, | ||
| as_of: Antichain<mz_repr::Timestamp>, |
Contributor
There was a problem hiding this comment.
Nit: Move as_of before until.
Add a simple helper method that checks whether any predicate in an MFP contains a temporal expression (mz_now()), without modifying the MFP. This will be used in LIR lowering to detect when temporal bucketing is needed before arrangement-forming operators. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Structural change only — the field is false everywhere. The lowering logic to set it follows in the next commit. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Tracks whether each plan node's output may contain updates at future timestamps (from temporal MFPs using mz_now()). Sets input_has_future_updates on ArrangeBy nodes so the renderer knows where to insert temporal bucketing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dispatches temporal bucketing based on timestamp type. For mz_repr::Timestamp (TotalOrder), applies actual bucketing. For Product timestamps in iterative scopes, no-op for now. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When an ArrangeBy node's input_has_future_updates flag is set and the ENABLE_TEMPORAL_BUCKETING dyncfg is on, applies temporal bucketing to the collection before forming arrangements. This buffers far-future updates in a BucketChain, preventing the arrangement merge batcher from doing unnecessary work on them.
The targeted ArrangeBy-level bucketing from the previous commit subsumes this. Source data itself doesn't contain future updates; future updates are produced by temporal MFPs downstream.
The previous commit only bucketed when ensure_collections formed a new collection from scratch (self.collection.is_none()). In the typical temporal-MFP -> ArrangeBy flow, the upstream Mfp/Get already produced the collection, so that path was skipped. Add a second bucketing site at arrangement consumption for pre-existing collections.
PassArrangements passes through the raw collection unchanged, which may contain future timestamps from an upstream temporal MFP. Track which ids have future updates in a BTreeSet during lowering so that a downstream Get with PassArrangements correctly reports has_future_updates. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a benchmark that creates an index on a view with a temporal filter over 1M rows joined with a table. Exercises the temporal bucketing path in ArrangeBy rendering. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
489b798 to
cfdd140
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
has_temporal_predicates()helper toMapFilterProjectto detect temporal MFPs (those usingmz_now())has_future_updates: boolbit through LIR plan lowering — every plan node now knows whether its output may contain future-timestamped updatesinput_has_future_updates: boolonPlanNode::ArrangeBy/RenderPlanNode::ArrangeByENABLE_TEMPORAL_BUCKETINGis on, inserts temporal bucketing (viaBucketChain) betweenas_collection_coreoutput and arrangement formation inensure_collectionsmaybe_apply_temporal_bucketingtoRenderTimestampto handle theTotalOrderbound: real bucketing formz_repr::Timestamp, no-op forProduct<...>in iterative scopes (TODO for future work)Motivation
Arrangement merge batchers sort/consolidate by
(data, time). When a temporal MFP (e.g.WHERE mz_now() > ts) produces updates with far-future timestamps, the batcher does unnecessary work merging data that won't be read for a long time. Temporal bucketing buffers these future updates in a logarithmicBucketChainand releases them as the frontier advances.The old source-level bucketing applied to all source imports regardless of whether future updates were actually produced. This change targets bucketing precisely at
ArrangeBynodes that receive future updates from upstream temporal MFPs.Test plan
cargo test -p mz-expr mz-compute-types mz-computepasses locallyENABLE_TEMPORAL_BUCKETINGdyncfg (default false)🤖 Generated with Claude Code