[SPARK-57188][SQL] Parameterless function takes precedence over UDF parameter#56239
[SPARK-57188][SQL] Parameterless function takes precedence over UDF parameter#56239dejankrak-db wants to merge 2 commits into
Conversation
…arameter
### What changes were proposed in this pull request?
A parameterless built-in function (`current_user`, `current_date`, `current_time`,
`current_timestamp`, `user`, `session_user`, `grouping__id`) now takes precedence
over a SQL UDF parameter that shares its name. Previously the UDF parameter alias
shadowed the built-in, e.g. `CREATE FUNCTION f(current_user STRING) RETURNS STRING
RETURN current_user; SELECT f('alice')` returned `'alice'` instead of the actual
current user.
SQL UDF input-parameter aliases are marked with a named `Metadata` key,
`SessionCatalog.SQL_FUNCTION_PARAMETER_ALIAS_METADATA_KEY`. Producers are
`SessionCatalog.makeSQLFunctionPlan` and `makeSQLTableFunctionPlan` (CALL time,
scalar and table UDFs) and `CreateSQLFunctionCommand` (CREATE time). Storing the
marker in `Metadata` (rather than a `TreeNodeTag`) lets it propagate automatically
through `Alias.toAttribute` and `AttributeReference` copies.
In `ColumnResolutionHelper`, when name-based resolution returns an attribute
carrying this metadata, `LiteralFunctionResolution` is preferred over it. Real
columns from relations (no marker) still win, preserving the overall precedence
(column > parameterless function > UDF parameter).
The new behavior is gated behind a legacy kill-switch,
`spark.sql.legacy.allowUdfParameterToShadowParameterlessFunction` (default
`false`); set to `true` to restore the previous behavior.
### Why are the changes needed?
To match the documented SQL name resolution rules: a parameterless built-in
function should not be shadowed by a same-named UDF parameter.
### Does this PR introduce _any_ user-facing change?
Yes. A SQL UDF parameter named like a parameterless built-in function no longer
shadows that function in the function body. The legacy conf restores the old
behavior.
### How was this patch tested?
Added golden-file tests `sql-udf-name-precedence(.legacy)` and
`parameterless-function-name-precedence(.legacy)` covering column > param,
LCA > param, outer-ref > param, parameterless function > param (scalar and table
UDF), param > session variable, nested-UDF inner-scope binding, and the legacy
(flag-on) behavior. All 4 scenarios pass.
cloud-fan
left a comment
There was a problem hiding this comment.
0 blocking, 1 non-blocking, 0 nits.
The fix is correct and well-tested for the scalar direct-body UDF case (the bug-report shape). One non-blocking design point: the table-function marker is dead code and the description/test overstate the table-function fix.
Design / architecture (1)
- SessionCatalog.scala:2058: the
makeSQLTableFunctionPlanmarker is never consumed; table UDFs were already correct via function-vs-outer-reference precedence, and the legacy flag does not restore param-shadowing for them — see inline.
Verification
Traced both UDF plan shapes. Scalar UDFs with a direct expression body (makeSQLFunctionPlan builds Project(Alias(body), Project(params, OneRowRelation))) reference the param as a local column, so resolveColumnByName returns it with the marker and isSQLFunctionParameterAlias flips precedence to the function — this is the path the fix actually corrects, and the goldens confirm it (function_won=true; legacy flips to false, current_date→'int', grouping__id→42). Table UDFs build LateralJoin(Project(params), LateralSubquery(body)), so the param is an outer reference: resolveColumnByName returns None and LiteralFunctionResolution wins first (the pre-existing function-beats-outer-reference rule), before the marker is read. Proof the marker is inert there: the legacy golden shows function_won=true for the TVF case, identical to non-legacy — the flag changes nothing for table UDFs.
| case a: Attribute => OuterReference(a) | ||
| } | ||
| Alias(Cast(outer, param.dataType), param.name)(qualifier = qualifier) | ||
| // Mark the alias as function input so name resolution can give a parameterless |
There was a problem hiding this comment.
The marker stamped here on makeSQLTableFunctionPlan parameter aliases is never consumed. A table UDF's body is a query inside LateralJoin(Project(params), LateralSubquery(body)), so a body reference to a parameter resolves as an outer reference (the param lives in the lateral-join's left child, not the body's local scope). In ColumnResolutionHelper, resolveColumnByName returns None for it, so LiteralFunctionResolution already wins via the pre-existing "function beats outer reference" precedence — isSQLFunctionParameterAlias is never evaluated on this path.
Consequences:
- The legacy flag does not restore param-shadowing for table UDFs. The legacy golden
sql-udf-name-precedence-legacy.sql.outshowsfunction_won = truefor thetvf_paramless_vs_paramcase, identical to the non-legacy golden — table UDFs were never buggy here. - The PR description ("
makeSQLTableFunctionPlan... CALL time") and the test comment "same rule must apply on the makeSQLTableFunctionPlan path" are misleading: the function wins via an unrelated, pre-existing rule, not this PR's marker logic. The same applies to the CREATE-time marker as applied to table functions.
Suggest either dropping the makeSQLTableFunctionPlan marker change (keep the TVF test as a regression guard, but retitle its comment to note table UDFs are already correct via function-vs-outer-reference precedence), or — if the marker is intended as defensive future-proofing — adding a comment saying it is not consumed today and not claiming the table-function path is fixed/restorable by the flag. Non-blocking: default behavior is correct on all tested paths.
There was a problem hiding this comment.
@cloud-fan, good catch - the table-function marker was never consumed (outer-reference path). Dropped it from makeSQLTableFunctionPlan, gated the CREATE-time marker to scalar functions, and reworded the TVF test as a regression guard for the pre-existing function-vs-outer-reference precedence.
… markers Per review feedback: the SQL_FUNCTION_PARAMETER_ALIAS_METADATA_KEY marker stamped on table-UDF parameter aliases was never consumed. A table UDF body references its parameter as an outer reference (the param lives in the lateral join's left child), so `resolveColumnByName` returns None and a parameterless built-in function already wins via the pre-existing "function beats outer reference" precedence -- before `isSQLFunctionParameterAlias` is evaluated. The legacy flag therefore never restored param-shadowing for table UDFs (the legacy golden showed `function_won = true` for the TVF case, identical to non-legacy). - Remove the marker from `SessionCatalog.makeSQLTableFunctionPlan` (with a comment explaining why table UDFs need no marker). - Restrict the `CreateSQLFunctionCommand` marker to scalar functions (`!isTableFunc`). - Reword the TVF golden test comment: it is a regression guard for the pre-existing function-vs-outer-reference precedence, not this PR's marker logic. Goldens are unchanged (table-UDF behavior is identical), confirming the markers were inert. The scalar direct-body UDF fix (the bug-report shape) is unaffected.
What changes were proposed in this pull request?
A parameterless built-in function (
current_user,current_date,current_time,current_timestamp,user,session_user,grouping__id) now takes precedence over a SQL UDF parameter that shares its name. Previously the UDF parameter alias shadowed the built-in, e.g.CREATE FUNCTION f(current_user STRING) RETURNS STRING RETURN current_user; SELECT f('alice')returned'alice'instead of the actual current user.SQL UDF input-parameter aliases are marked with a named
Metadatakey,SessionCatalog.SQL_FUNCTION_PARAMETER_ALIAS_METADATA_KEY. Producers areSessionCatalog.makeSQLFunctionPlanandCreateSQLFunctionCommand(CREATE time). Storing the marker inMetadata(rather than aTreeNodeTag) lets it propagate automatically throughAlias.toAttributeandAttributeReferencecopies.In
ColumnResolutionHelper, when name-based resolution returns an attribute carrying this metadata,LiteralFunctionResolutionis preferred over it. Real columns from relations (no marker) still win, preserving the overall precedence (column > parameterless function > UDF parameter).The new behavior is gated behind a legacy kill-switch,
spark.sql.legacy.allowUdfParameterToShadowParameterlessFunction(defaultfalse); set totrueto restore the previous behavior.Why are the changes needed?
To match the documented SQL name resolution rules: a parameterless built-in function should not be shadowed by a same-named UDF parameter.
Does this PR introduce any user-facing change?
Yes. A SQL UDF parameter named like a parameterless built-in function no longer shadows that function in the function body. The legacy conf restores the old behavior.
How was this patch tested?
Added golden-file tests
sql-udf-name-precedence(.legacy)andparameterless-function-name-precedence(.legacy)covering column > param, LCA > param, outer-ref > param, parameterless function > param (scalar and table UDF), param > session variable, nested-UDF inner-scope binding, and the legacy (flag-on) behavior. All 4 scenarios pass.Was this patch authored or co-authored using generative AI tooling?
Yes, co-authored using Claude code.