[SPARK-57176][SQL] Extend nested column pruning through array-returning functions by sunchao · Pull Request #56227 · apache/spark

sunchao · 2026-05-30T18:13:28Z

Why are the changes needed?

SPARK-57176 follows SPARK-57022, which added nested column pruning for transform over array<struct> inputs.

Array-returning functions still retain the complete input element struct even when downstream expressions and lambdas only require a subset of nested fields. For example:

SELECT filter(friends, friend -> friend.last = 'Smith').first
FROM contacts

If friends contains first, middle, and last, Spark currently reads all three fields even though the query only requires first and last.
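As a rough illustration of why the narrower read is safe here, the sketch below replays the example query on plain Python data. It is a toy model, not Spark code: the sample rows and the `run_query` helper are assumptions made up for this example.

```python
# Toy model of the example query on plain Python data; this is not Spark code.
friends = [
    {"first": "Ann", "middle": "B", "last": "Smith"},
    {"first": "Bob", "middle": "C", "last": "Jones"},
    {"first": "Cy", "middle": "D", "last": "Smith"},
]

def run_query(rows):
    # Models: filter(friends, friend -> friend.last = 'Smith').first
    return [f["first"] for f in rows if f["last"] == "Smith"]

# Simulate a scan that reads only the fields the query touches.
pruned = [{"first": f["first"], "last": f["last"]} for f in friends]

full_result = run_query(friends)
pruned_result = run_query(pruned)
print(full_result == pruned_result)  # True
```

The lambda reads only last and the result projection reads only first, so never materializing middle cannot affect the output.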

What changes were proposed in this PR?

Merge downstream result-field requirements with lambda requirements for filter and comparator-based array_sort.
Propagate projected element schemas through reverse, shuffle, slice, and array_compact.
Rewrite bound lambda variable types and nested field ordinals after pruning.
Retain the complete element schema when the whole result is used, when a lambda consumes the whole element, or when default array_sort natural ordering requires the full struct.
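The rules above can be sketched as a small decision function. This is a hypothetical model in plain Python, not the Catalyst implementation: `merge_requirements`, `remap_ordinals`, and the `WHOLE` sentinel are names invented for illustration, and structs are reduced to lists of field names.

```python
# Hypothetical model of the pruning decision; names are illustrative, not Spark's.
WHOLE = None  # sentinel meaning "the entire element struct is needed"

def merge_requirements(schema, lambda_req, downstream_req):
    """Return the field names to keep, in original schema order."""
    if lambda_req is WHOLE or downstream_req is WHOLE:
        return list(schema)  # keep the complete element schema
    needed = set(lambda_req) | set(downstream_req)
    return [name for name in schema if name in needed]

def remap_ordinals(schema, kept):
    """Map each surviving field's old ordinal to its ordinal after pruning."""
    return {schema.index(name): new for new, name in enumerate(kept)}

schema = ["first", "middle", "last"]
kept = merge_requirements(schema, {"last"}, {"first"})
ordinals = remap_ordinals(schema, kept)
print(kept)      # ['first', 'last']
print(ordinals)  # {0: 0, 2: 1}
```

In this model, last moves from ordinal 2 to ordinal 1 once middle is dropped, which is why field accesses inside the lambda need their ordinals rewritten after pruning.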

Functions that inspect full element equality or natural ordering remain out of scope because dropping nested fields could change results.
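A small example may help show the hazard. The sketch below is plain Python, not Spark code, and uses an order-preserving `distinct` helper (hypothetical, written for this example) as a stand-in for an equality-based function such as array_distinct. The PR text does not enumerate which functions it excludes, so treating array_distinct as one of them is an assumption.

```python
# Toy illustration in plain Python; this is not Spark code.
friends = [
    {"first": "Ann", "middle": "B", "last": "Smith"},
    {"first": "Ann", "middle": "Z", "last": "Smith"},
]

def distinct(rows):
    """Order-preserving de-duplication using full-element equality."""
    out = []
    for row in rows:
        if row not in out:
            out.append(row)
    return out

# Correct order: de-duplicate full structs, then project `first`.
expected = [r["first"] for r in distinct(friends)]

# Incorrect order: drop `middle` before de-duplicating.
pruned = [{"first": r["first"], "last": r["last"]} for r in friends]
wrong = [r["first"] for r in distinct(pruned)]

print(expected)  # ['Ann', 'Ann']
print(wrong)     # ['Ann']
```

The two rows differ only in middle, so pruning that field before the equality check collapses them into one element and changes the result.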

Does this PR introduce any user-facing change?

Yes. Eligible queries using array-returning functions over arrays of structs can read a narrower input schema. Query results and SQL APIs are unchanged.

How was this patch tested?

JAVA_HOME=/opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home PATH=/opt/homebrew/opt/openjdk@17/bin:$PATH build/sbt "catalyst/testOnly org.apache.spark.sql.catalyst.expressions.SchemaPruningSuite" "sql/testOnly org.apache.spark.sql.execution.datasources.parquet.ParquetV1SchemaPruningSuite org.apache.spark.sql.execution.datasources.parquet.ParquetV2SchemaPruningSuite org.apache.spark.sql.execution.datasources.orc.OrcV1SchemaPruningSuite org.apache.spark.sql.execution.datasources.orc.OrcV2SchemaPruningSuite -- -z Array"
JAVA_HOME=/opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home PATH=/opt/homebrew/opt/openjdk@17/bin:$PATH build/sbt catalyst/scalastyle sql/scalastyle
git diff --check

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Codex (GPT-5)

[SPARK-57176][SQL] Extend nested column pruning through array-returning functions

f1fb146

sunchao wants to merge 1 commit into apache:master from sunchao:dev/chao/codex/spark-array-returning-function-pruning
