CometProject and CometHashAggregate do not perform cross-sibling subexpression elimination over ScalaUDF

When a `ScalaUDF` is dispatched into the native plan via the JVM Scala UDF codegen dispatcher (enabled by default in #4514), Comet's `CometProject` and `CometHashAggregate` do not implement Spark's cross-sibling common subexpression elimination over `ScalaUDF`. An expression such as `sum(udf(b) + udf(b) + udf(b))` therefore invokes the UDF body once per reference instead of once.

This is observable in Spark's `SQLQuerySuite` "Common subexpression elimination" test: the call count for the aggregate case is 3 under Comet versus 1 in vanilla Spark. The query result is unchanged; only the number of UDF invocations differs.

Follow-on from #4514. We should extend the cross-sibling CSE that `CometProject` performs to the aggregate operator's input projection (and any other operator that builds an input projection) so that repeated `ScalaUDF` references are evaluated once.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CometProject and CometHashAggregate do not perform cross-sibling subexpression elimination over ScalaUDF #4516

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

CometProject and CometHashAggregate do not perform cross-sibling subexpression elimination over ScalaUDF #4516

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions