Skip to content

[SPARK-57188][SQL] Parameterless function takes precedence over UDF parameter#56239

Open
dejankrak-db wants to merge 2 commits into
apache:masterfrom
dejankrak-db:paramless-func-over-udf-param-oss
Open

[SPARK-57188][SQL] Parameterless function takes precedence over UDF parameter#56239
dejankrak-db wants to merge 2 commits into
apache:masterfrom
dejankrak-db:paramless-func-over-udf-param-oss

Conversation

@dejankrak-db
Copy link
Copy Markdown
Contributor

@dejankrak-db dejankrak-db commented May 31, 2026

What changes were proposed in this pull request?

A parameterless built-in function (current_user, current_date, current_time, current_timestamp, user, session_user, grouping__id) now takes precedence over a SQL UDF parameter that shares its name. Previously the UDF parameter alias shadowed the built-in, e.g. CREATE FUNCTION f(current_user STRING) RETURNS STRING RETURN current_user; SELECT f('alice') returned 'alice' instead of the actual current user.

SQL UDF input-parameter aliases are marked with a named Metadata key, SessionCatalog.SQL_FUNCTION_PARAMETER_ALIAS_METADATA_KEY. Producers are SessionCatalog.makeSQLFunctionPlan and CreateSQLFunctionCommand (CREATE time). Storing the marker in Metadata (rather than a TreeNodeTag) lets it propagate automatically through Alias.toAttribute and AttributeReference copies.

In ColumnResolutionHelper, when name-based resolution returns an attribute carrying this metadata, LiteralFunctionResolution is preferred over it. Real columns from relations (no marker) still win, preserving the overall precedence (column > parameterless function > UDF parameter).

The new behavior is gated behind a legacy kill-switch, spark.sql.legacy.allowUdfParameterToShadowParameterlessFunction (default false); set to true to restore the previous behavior.

Why are the changes needed?

To match the documented SQL name resolution rules: a parameterless built-in function should not be shadowed by a same-named UDF parameter.

Does this PR introduce any user-facing change?

Yes. A SQL UDF parameter named like a parameterless built-in function no longer shadows that function in the function body. The legacy conf restores the old behavior.

How was this patch tested?

Added golden-file tests sql-udf-name-precedence(.legacy) and parameterless-function-name-precedence(.legacy) covering column > param, LCA > param, outer-ref > param, parameterless function > param (scalar and table UDF), param > session variable, nested-UDF inner-scope binding, and the legacy (flag-on) behavior. All 4 scenarios pass.

Was this patch authored or co-authored using generative AI tooling?

Yes, co-authored using Claude code.

…arameter

### What changes were proposed in this pull request?

A parameterless built-in function (`current_user`, `current_date`, `current_time`,
`current_timestamp`, `user`, `session_user`, `grouping__id`) now takes precedence
over a SQL UDF parameter that shares its name. Previously the UDF parameter alias
shadowed the built-in, e.g. `CREATE FUNCTION f(current_user STRING) RETURNS STRING
RETURN current_user; SELECT f('alice')` returned `'alice'` instead of the actual
current user.

SQL UDF input-parameter aliases are marked with a named `Metadata` key,
`SessionCatalog.SQL_FUNCTION_PARAMETER_ALIAS_METADATA_KEY`. Producers are
`SessionCatalog.makeSQLFunctionPlan` and `makeSQLTableFunctionPlan` (CALL time,
scalar and table UDFs) and `CreateSQLFunctionCommand` (CREATE time). Storing the
marker in `Metadata` (rather than a `TreeNodeTag`) lets it propagate automatically
through `Alias.toAttribute` and `AttributeReference` copies.

In `ColumnResolutionHelper`, when name-based resolution returns an attribute
carrying this metadata, `LiteralFunctionResolution` is preferred over it. Real
columns from relations (no marker) still win, preserving the overall precedence
(column > parameterless function > UDF parameter).

The new behavior is gated behind a legacy kill-switch,
`spark.sql.legacy.allowUdfParameterToShadowParameterlessFunction` (default
`false`); set to `true` to restore the previous behavior.

### Why are the changes needed?

To match the documented SQL name resolution rules: a parameterless built-in
function should not be shadowed by a same-named UDF parameter.

### Does this PR introduce _any_ user-facing change?

Yes. A SQL UDF parameter named like a parameterless built-in function no longer
shadows that function in the function body. The legacy conf restores the old
behavior.

### How was this patch tested?

Added golden-file tests `sql-udf-name-precedence(.legacy)` and
`parameterless-function-name-precedence(.legacy)` covering column > param,
LCA > param, outer-ref > param, parameterless function > param (scalar and table
UDF), param > session variable, nested-UDF inner-scope binding, and the legacy
(flag-on) behavior. All 4 scenarios pass.
Copy link
Copy Markdown
Contributor

@cloud-fan cloud-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 blocking, 1 non-blocking, 0 nits.
The fix is correct and well-tested for the scalar direct-body UDF case (the bug-report shape). One non-blocking design point: the table-function marker is dead code and the description/test overstate the table-function fix.

Design / architecture (1)

  • SessionCatalog.scala:2058: the makeSQLTableFunctionPlan marker is never consumed; table UDFs were already correct via function-vs-outer-reference precedence, and the legacy flag does not restore param-shadowing for them — see inline.

Verification

Traced both UDF plan shapes. Scalar UDFs with a direct expression body (makeSQLFunctionPlan builds Project(Alias(body), Project(params, OneRowRelation))) reference the param as a local column, so resolveColumnByName returns it with the marker and isSQLFunctionParameterAlias flips precedence to the function — this is the path the fix actually corrects, and the goldens confirm it (function_won=true; legacy flips to false, current_date'int', grouping__id→42). Table UDFs build LateralJoin(Project(params), LateralSubquery(body)), so the param is an outer reference: resolveColumnByName returns None and LiteralFunctionResolution wins first (the pre-existing function-beats-outer-reference rule), before the marker is read. Proof the marker is inert there: the legacy golden shows function_won=true for the TVF case, identical to non-legacy — the flag changes nothing for table UDFs.

case a: Attribute => OuterReference(a)
}
Alias(Cast(outer, param.dataType), param.name)(qualifier = qualifier)
// Mark the alias as function input so name resolution can give a parameterless
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The marker stamped here on makeSQLTableFunctionPlan parameter aliases is never consumed. A table UDF's body is a query inside LateralJoin(Project(params), LateralSubquery(body)), so a body reference to a parameter resolves as an outer reference (the param lives in the lateral-join's left child, not the body's local scope). In ColumnResolutionHelper, resolveColumnByName returns None for it, so LiteralFunctionResolution already wins via the pre-existing "function beats outer reference" precedence — isSQLFunctionParameterAlias is never evaluated on this path.

Consequences:

  • The legacy flag does not restore param-shadowing for table UDFs. The legacy golden sql-udf-name-precedence-legacy.sql.out shows function_won = true for the tvf_paramless_vs_param case, identical to the non-legacy golden — table UDFs were never buggy here.
  • The PR description ("makeSQLTableFunctionPlan ... CALL time") and the test comment "same rule must apply on the makeSQLTableFunctionPlan path" are misleading: the function wins via an unrelated, pre-existing rule, not this PR's marker logic. The same applies to the CREATE-time marker as applied to table functions.

Suggest either dropping the makeSQLTableFunctionPlan marker change (keep the TVF test as a regression guard, but retitle its comment to note table UDFs are already correct via function-vs-outer-reference precedence), or — if the marker is intended as defensive future-proofing — adding a comment saying it is not consumed today and not claiming the table-function path is fixed/restorable by the flag. Non-blocking: default behavior is correct on all tested paths.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan, good catch - the table-function marker was never consumed (outer-reference path). Dropped it from makeSQLTableFunctionPlan, gated the CREATE-time marker to scalar functions, and reworded the TVF test as a regression guard for the pre-existing function-vs-outer-reference precedence.

… markers

Per review feedback: the SQL_FUNCTION_PARAMETER_ALIAS_METADATA_KEY marker stamped
on table-UDF parameter aliases was never consumed. A table UDF body references its
parameter as an outer reference (the param lives in the lateral join's left child),
so `resolveColumnByName` returns None and a parameterless built-in function already
wins via the pre-existing "function beats outer reference" precedence -- before
`isSQLFunctionParameterAlias` is evaluated. The legacy flag therefore never restored
param-shadowing for table UDFs (the legacy golden showed `function_won = true` for
the TVF case, identical to non-legacy).

- Remove the marker from `SessionCatalog.makeSQLTableFunctionPlan` (with a comment
  explaining why table UDFs need no marker).
- Restrict the `CreateSQLFunctionCommand` marker to scalar functions (`!isTableFunc`).
- Reword the TVF golden test comment: it is a regression guard for the pre-existing
  function-vs-outer-reference precedence, not this PR's marker logic.

Goldens are unchanged (table-UDF behavior is identical), confirming the markers were
inert. The scalar direct-body UDF fix (the bug-report shape) is unaffected.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants