Skip to content

fix(substrait): plan nested projected window expressions#22630

Open
bvolpato wants to merge 1 commit into
apache:mainfrom
bvolpato:bvolpato/substrait-nested-window-expression
Open

fix(substrait): plan nested projected window expressions#22630
bvolpato wants to merge 1 commit into
apache:mainfrom
bvolpato:bvolpato/substrait-nested-window-expression

Conversation

@bvolpato
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Rationale for this change

The Substrait ProjectRel consumer only added a WindowAggr relation when the root projected expression was a WindowFunction. A scalar expression wrapping a valid window function therefore left the window directly inside Projection, which cannot be physically planned.

Minimal reproducer represented by the added Substrait fixture:

SELECT 1 + count(*) OVER () FROM DATA;

Before this patch, the regression test produced this plan difference:

 Projection: Int64(1) + count(Int64(1)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS EXPR$0
-  WindowAggr: windowExpr=[[count(Int64(1)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING]]
-    TableScan: DATA
+  TableScan: DATA

This is the same class of failure that reaches physical planning as an unsupported nested WindowFunction expression.

What changes are included in this PR?

  • Use existing find_window_exprs(...) recursion while consuming Substrait ProjectRel expressions, retaining current HashSet deduplication across projections.
  • Add a minimal Substrait JSON fixture with a window function nested inside an arithmetic scalar expression.
  • Add a logical-plan snapshot plus execution regression, proving WindowAggr is inserted and physically executable.

Are these changes tested?

Pre-fix evidence:

  • cargo test -p datafusion-substrait --test substrait_integration nested_window_function_in_expression -- --nocapture failed because the consumed plan omitted expected WindowAggr and put Projection directly above TableScan.

Final validation on Apache main commit d8c458828:

  • cargo fmt --all -- --check
  • cargo test -p datafusion-substrait (49 unit passed, 200 integration passed, 3 doctests passed; 6 existing ignored)
  • cargo check --all-targets -p datafusion-substrait
  • cargo check --no-default-features -p datafusion-substrait
  • cargo check --no-default-features -p datafusion-substrait --features=physical
  • cargo check --no-default-features -p datafusion-substrait --features=protoc
  • cargo clippy --all-targets --all-features -- -D warnings
  • ./dev/rust_lint.sh

Are there any user-facing changes?

Substrait producers may now send projected expressions that contain nested window functions; DataFusion consumes them into executable logical plans instead of leaving unsupported window expressions in projections.

Collect window expressions recursively when consuming ProjectRel so scalar-wrapped windows are placed in WindowAggr nodes before physical planning.

Add an executing Substrait fixture regression for 1 + count(*) OVER ().

Refs apache#22629.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

substrait Changes to the substrait crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Substrait consumer does not plan nested window expressions in projections

1 participant