[SPARK-57187][SQL] Fix INTERNAL_ERROR when current_user() is used as DEFAULT for CHAR/VARCHAR columns#56238
Conversation
…DEFAULT for CHAR/VARCHAR columns ### What changes were proposed in this pull request? Two related fixes for non-foldable DEFAULT expressions (e.g. `current_user()`) on CHAR/VARCHAR columns: 1. **Fix INTERNAL_ERROR at DDL time.** `ResolveDefaultColumns.coerceDefaultValue` called `CharVarcharUtils.stringLengthCheck(ret, dataType).eval(EmptyRow)` to validate CHAR/VARCHAR default lengths at compile time, without checking foldability. For a non-foldable default such as `current_user()` (processed before it is replaced by a literal), `.eval(EmptyRow)` throws "Cannot evaluate expression", surfacing as INTERNAL_ERROR. Added an `&& ret.foldable` guard so the eager check only runs for foldable expressions. 2. **Add runtime CHAR/VARCHAR length enforcement for implicit defaults.** When a column with a DEFAULT is omitted from an INSERT column list, `TableOutputResolver.reorderColumnsByName` filled the default but did not apply `stringLengthCheck`, so oversized non-foldable defaults silently succeeded. It now wraps the default expression with the write-side length check, using `CharVarcharUtils.getRawType(metadata)` to detect CHAR/VARCHAR for both V1 and V2 tables. The explicit `DEFAULT` keyword path already goes through `checkField`. ### Why are the changes needed? `CREATE TABLE t(s CHAR(100) DEFAULT current_user()) USING parquet` (or the equivalent `ALTER TABLE ... SET DEFAULT`) previously failed with INTERNAL_ERROR. Oversized non-foldable defaults for omitted columns also bypassed length checks. ### Does this PR introduce _any_ user-facing change? Yes. Non-foldable expressions are now allowed as CHAR/VARCHAR defaults (instead of throwing INTERNAL_ERROR), and oversized non-foldable defaults on omitted INSERT columns now fail with EXCEED_LIMIT_LENGTH at runtime instead of succeeding. ### How was this patch tested? Added 6 tests to `ResolveDefaultColumnsSuite` (CREATE/ALTER with `current_user()` on CHAR/VARCHAR; foldable oversize fails at DDL; non-foldable oversize fails at INSERT for both implicit and explicit DEFAULT paths). Full suite: 19 tests pass.
cloud-fan
left a comment
There was a problem hiding this comment.
Nice, clean fix for the INTERNAL_ERROR — the foldability guard and the by-name runtime enforcement both look right and are well covered by the new tests. One completeness gap and one simplification below.
1 blocking, 1 non-blocking, 0 nits.
Correctness (1)
- TableOutputResolver.scala:330: the runtime length check is only added to the by-name default-fill path; the by-position path (
resolveColumnsByPosition) has the same bypass — see inline
Suggestions (1)
- TableOutputResolver.scala:339: the Alias-field preservation is discarded by
applyColumnMetadataand thecase otherbranch is dead — see inline
| ) | ||
| } | ||
| Some(applyColumnMetadata(defaultExpr.get, expectedCol)) | ||
| // Apply CHAR/VARCHAR length check to the default expression so that non-foldable |
There was a problem hiding this comment.
This runtime enforcement is only added to the by-name default-fill path. The by-position fill in resolveColumnsByPosition (around lines 461-470) is structurally identical — getDefaultValueExprOrNullLit + applyColumnMetadata with no stringLengthCheck — and is reached for a by-position INSERT under V2 schema evolution (RECURSE; see Analyzer.scala's coerceInsertNestedTypes && schemaEvolutionEnabled branch) when a trailing CHAR/VARCHAR column is omitted. So an oversized non-foldable default there would still be written without EXCEED_LIMIT_LENGTH — the same bypass this PR closes for by-name. I traced both sites and the matched-column analogues (checkField/checkUpdate both wrap with stringLengthCheck) but didn't run an end-to-end schema-evolution write — can you confirm the by-position path isn't enforced elsewhere? If not, I'd extract a small shared helper (e.g. applyDefaultWithLengthCheck(expectedCol, conf)) called from both default-fill sites, and add a test covering the by-position/RECURSE path.
There was a problem hiding this comment.
Extracted a shared applyDefaultWithLengthCheck(defaultExpr, expectedCol, conf) helper, called from both default-fill sites — reorderColumnsByName (by-name) and resolveColumnsByPosition (by-position, reached under V2 schema-evolution RECURSE). Both paths now apply the CHAR/VARCHAR write-side length check identically. Thanks!
| if (!conf.charVarcharAsString && CharVarcharUtils.hasCharVarchar(rawType)) { | ||
| expr match { | ||
| case a: Alias => | ||
| a.copy(child = CharVarcharUtils.stringLengthCheck(a.child, rawType))( |
There was a problem hiding this comment.
This branch carefully preserves exprId/qualifier/explicitMetadata/nonInheritableMetadataKeys, but applyColumnMetadata on the next line strips the Alias down to its child and re-wraps it in a fresh Alias(stripAlias, name)(explicitMetadata = Some(requiredMetadata)), so all of that is discarded. The case other branch is also dead — getDefaultValueExprOrNullLit always returns an Alias. The whole block can collapse to wrapping the default's child with stringLengthCheck, which folds naturally into the shared helper suggested above.
There was a problem hiding this comment.
Dead code collapsed - the helper drops the pointless Alias-field preservation (applyColumnMetadata strips + re-wraps anyway) and the dead case other branch; it just wraps the default's value with stringLengthCheck
…ass, dedup helper Per review feedback: - The CHAR/VARCHAR write-side length check for filled default values was only applied on the by-name path (`reorderColumnsByName`). The by-position path (`resolveColumnsByPosition`, reached under V2 schema evolution / RECURSE when a trailing column is omitted) had the same bypass. Extract a shared `applyDefaultWithLengthCheck` helper and call it from both default-fill sites so an oversized non-foldable default is caught at runtime on both paths. - Collapse the dead Alias-field preservation: `applyColumnMetadata` strips the default's alias to its child and re-wraps it with the required metadata, so preserving exprId/qualifier/explicitMetadata was a no-op, and the `case other` branch was dead (`getDefaultValueExprOrNullLit` always returns an Alias). The helper now just wraps the default's value with `stringLengthCheck`. Added a regression test driving `resolveOutputColumns(byName = false, RECURSE)` with a trailing `CHAR(100) DEFAULT current_user()` column omitted, asserting the filled default is wrapped with the `charTypeWriteSideCheck` write-side check. Full ResolveDefaultColumnsSuite: 20 tests pass.
|
thanks, merging to master/4.x/4.2 (bug fix) |
cloud-fan
left a comment
There was a problem hiding this comment.
2 addressed, 0 remaining, 1 new. (1 newly introduced, 0 late catches.)
Both prior findings cleanly addressed - the by-position bypass is closed by routing both default-fill sites through the shared applyDefaultWithLengthCheck, and the dead alias-preservation block is collapsed into it.
Nits: 1 minor item (see inline comments).
Verification
Checked fix completeness: of the 7 applyColumnMetadata call sites, only two are default-fills (reorderColumnsByName :355, resolveColumnsByPosition :476) and both now route through the shared length-check helper. The matched-column (checkUpdate :205-208) and explicit-DEFAULT (checkField) paths already wrap with stringLengthCheck, so no remaining write path bypasses the check. The by-position test asserts the charTypeWriteSideCheck StaticInvoke directly.
| * `applyColumnMetadata` strips the default's outer alias and re-wraps it with the required | ||
| * metadata, so the length check is applied to the default value itself (the alias child). |
There was a problem hiding this comment.
Minor doc-accuracy nit: this sentence attributes the alias unwrap to applyColumnMetadata, but in the CHAR/VARCHAR path the helper itself unwraps the alias (case a: Alias => a.child below) before the length check, and applyColumnMetadata then runs on the StaticInvoke (no longer an Alias), so it never strips the default's alias here. The unwrap is needed precisely because otherwise stringLengthCheck would wrap the Alias and applyColumnMetadata would not strip it. Behavior is correct; only the explanation is backwards. Suggested rewrite:
| * `applyColumnMetadata` strips the default's outer alias and re-wraps it with the required | |
| * metadata, so the length check is applied to the default value itself (the alias child). | |
| * We unwrap the default's outer alias before the length check so the check wraps the | |
| * default value itself, not the alias; `applyColumnMetadata` then re-adds the required | |
| * alias and metadata afterward. |
…DEFAULT for CHAR/VARCHAR columns ### What changes were proposed in this pull request? Two related fixes for non-foldable DEFAULT expressions (e.g. `current_user()`) on CHAR/VARCHAR columns: 1. **Fix INTERNAL_ERROR at DDL time.** `ResolveDefaultColumns.coerceDefaultValue` called `CharVarcharUtils.stringLengthCheck(ret, dataType).eval(EmptyRow)` to validate CHAR/VARCHAR default lengths at compile time, without checking foldability. For a non-foldable default such as `current_user()` (processed before it is replaced by a literal), `.eval(EmptyRow)` throws "Cannot evaluate expression", surfacing as INTERNAL_ERROR. Added an `&& ret.foldable` guard so the eager check only runs for foldable expressions. 2. **Add runtime CHAR/VARCHAR length enforcement for implicit defaults.** When a column with a DEFAULT is omitted from an INSERT column list, `TableOutputResolver.reorderColumnsByName` filled the default but did not apply `stringLengthCheck`, so oversized non-foldable defaults silently succeeded. It now wraps the default expression with the write-side length check, using `CharVarcharUtils.getRawType(metadata)` to detect CHAR/VARCHAR for both V1 and V2 tables. The explicit `DEFAULT` keyword path already goes through `checkField`. ### Why are the changes needed? `CREATE TABLE t(s CHAR(100) DEFAULT current_user()) USING parquet` (or the equivalent `ALTER TABLE ... SET DEFAULT`) previously failed with INTERNAL_ERROR. Oversized non-foldable defaults for omitted columns also bypassed length checks. ### Does this PR introduce _any_ user-facing change? Yes. Non-foldable expressions are now allowed as CHAR/VARCHAR defaults (instead of throwing INTERNAL_ERROR), and oversized non-foldable defaults on omitted INSERT columns now fail with EXCEED_LIMIT_LENGTH at runtime instead of succeeding. ### How was this patch tested? Added 6 tests to `ResolveDefaultColumnsSuite` (CREATE/ALTER with `current_user()` on CHAR/VARCHAR; foldable oversize fails at DDL; non-foldable oversize fails at INSERT for both implicit and explicit DEFAULT paths). Full suite: 19 tests pass. ### Was this patch authored or co-authored using generative AI tooling? Yes, co-authored using Claude code. Closes #56238 from dejankrak-db/current-user-default-char-oss. Authored-by: Dejan Krakovic <dejan.krakovic@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit 3bf85b7) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…DEFAULT for CHAR/VARCHAR columns ### What changes were proposed in this pull request? Two related fixes for non-foldable DEFAULT expressions (e.g. `current_user()`) on CHAR/VARCHAR columns: 1. **Fix INTERNAL_ERROR at DDL time.** `ResolveDefaultColumns.coerceDefaultValue` called `CharVarcharUtils.stringLengthCheck(ret, dataType).eval(EmptyRow)` to validate CHAR/VARCHAR default lengths at compile time, without checking foldability. For a non-foldable default such as `current_user()` (processed before it is replaced by a literal), `.eval(EmptyRow)` throws "Cannot evaluate expression", surfacing as INTERNAL_ERROR. Added an `&& ret.foldable` guard so the eager check only runs for foldable expressions. 2. **Add runtime CHAR/VARCHAR length enforcement for implicit defaults.** When a column with a DEFAULT is omitted from an INSERT column list, `TableOutputResolver.reorderColumnsByName` filled the default but did not apply `stringLengthCheck`, so oversized non-foldable defaults silently succeeded. It now wraps the default expression with the write-side length check, using `CharVarcharUtils.getRawType(metadata)` to detect CHAR/VARCHAR for both V1 and V2 tables. The explicit `DEFAULT` keyword path already goes through `checkField`. ### Why are the changes needed? `CREATE TABLE t(s CHAR(100) DEFAULT current_user()) USING parquet` (or the equivalent `ALTER TABLE ... SET DEFAULT`) previously failed with INTERNAL_ERROR. Oversized non-foldable defaults for omitted columns also bypassed length checks. ### Does this PR introduce _any_ user-facing change? Yes. Non-foldable expressions are now allowed as CHAR/VARCHAR defaults (instead of throwing INTERNAL_ERROR), and oversized non-foldable defaults on omitted INSERT columns now fail with EXCEED_LIMIT_LENGTH at runtime instead of succeeding. ### How was this patch tested? Added 6 tests to `ResolveDefaultColumnsSuite` (CREATE/ALTER with `current_user()` on CHAR/VARCHAR; foldable oversize fails at DDL; non-foldable oversize fails at INSERT for both implicit and explicit DEFAULT paths). Full suite: 19 tests pass. ### Was this patch authored or co-authored using generative AI tooling? Yes, co-authored using Claude code. Closes #56238 from dejankrak-db/current-user-default-char-oss. Authored-by: Dejan Krakovic <dejan.krakovic@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit 3bf85b7) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
|
@dejankrak-db you can address the last minor review comments in your other related PRs. |
Added the suggested minor fix as part of #56237 |
What changes were proposed in this pull request?
Two related fixes for non-foldable DEFAULT expressions (e.g.
current_user()) on CHAR/VARCHAR columns:Fix INTERNAL_ERROR at DDL time.
ResolveDefaultColumns.coerceDefaultValuecalledCharVarcharUtils.stringLengthCheck(ret, dataType).eval(EmptyRow)to validate CHAR/VARCHAR default lengths at compile time, without checking foldability. For a non-foldable default such ascurrent_user()(processed before it is replaced by a literal),.eval(EmptyRow)throws "Cannot evaluate expression", surfacing as INTERNAL_ERROR. Added an&& ret.foldableguard so the eager check only runs for foldable expressions.Add runtime CHAR/VARCHAR length enforcement for implicit defaults. When a column with a DEFAULT is omitted from an INSERT column list,
TableOutputResolver.reorderColumnsByNamefilled the default but did not applystringLengthCheck, so oversized non-foldable defaults silently succeeded. It now wraps the default expression with the write-side length check, usingCharVarcharUtils.getRawType(metadata)to detect CHAR/VARCHAR for both V1 and V2 tables. The explicitDEFAULTkeyword path already goes throughcheckField.Why are the changes needed?
CREATE TABLE t(s CHAR(100) DEFAULT current_user()) USING parquet(or the equivalentALTER TABLE ... SET DEFAULT) previously failed with INTERNAL_ERROR. Oversized non-foldable defaults for omitted columns also bypassed length checks.Does this PR introduce any user-facing change?
Yes. Non-foldable expressions are now allowed as CHAR/VARCHAR defaults (instead of throwing INTERNAL_ERROR), and oversized non-foldable defaults on omitted INSERT columns now fail with EXCEED_LIMIT_LENGTH at runtime instead of succeeding.
How was this patch tested?
Added 6 tests to
ResolveDefaultColumnsSuite(CREATE/ALTER withcurrent_user()on CHAR/VARCHAR; foldable oversize fails at DDL; non-foldable oversize fails at INSERT for both implicit and explicit DEFAULT paths). Full suite: 19 tests pass.Was this patch authored or co-authored using generative AI tooling?
Yes, co-authored using Claude code.