Skip to content

[SPARK-57187][SQL] Fix INTERNAL_ERROR when current_user() is used as DEFAULT for CHAR/VARCHAR columns#56238

Closed
dejankrak-db wants to merge 2 commits into
apache:masterfrom
dejankrak-db:current-user-default-char-oss
Closed

[SPARK-57187][SQL] Fix INTERNAL_ERROR when current_user() is used as DEFAULT for CHAR/VARCHAR columns#56238
dejankrak-db wants to merge 2 commits into
apache:masterfrom
dejankrak-db:current-user-default-char-oss

Conversation

@dejankrak-db
Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Two related fixes for non-foldable DEFAULT expressions (e.g. current_user()) on CHAR/VARCHAR columns:

  1. Fix INTERNAL_ERROR at DDL time. ResolveDefaultColumns.coerceDefaultValue called CharVarcharUtils.stringLengthCheck(ret, dataType).eval(EmptyRow) to validate CHAR/VARCHAR default lengths at compile time, without checking foldability. For a non-foldable default such as current_user() (processed before it is replaced by a literal), .eval(EmptyRow) throws "Cannot evaluate expression", surfacing as INTERNAL_ERROR. Added an && ret.foldable guard so the eager check only runs for foldable expressions.

  2. Add runtime CHAR/VARCHAR length enforcement for implicit defaults. When a column with a DEFAULT is omitted from an INSERT column list, TableOutputResolver.reorderColumnsByName filled the default but did not apply stringLengthCheck, so oversized non-foldable defaults silently succeeded. It now wraps the default expression with the write-side length check, using CharVarcharUtils.getRawType(metadata) to detect CHAR/VARCHAR for both V1 and V2 tables. The explicit DEFAULT keyword path already goes through checkField.

Why are the changes needed?

CREATE TABLE t(s CHAR(100) DEFAULT current_user()) USING parquet (or the equivalent ALTER TABLE ... SET DEFAULT) previously failed with INTERNAL_ERROR. Oversized non-foldable defaults for omitted columns also bypassed length checks.

Does this PR introduce any user-facing change?

Yes. Non-foldable expressions are now allowed as CHAR/VARCHAR defaults (instead of throwing INTERNAL_ERROR), and oversized non-foldable defaults on omitted INSERT columns now fail with EXCEED_LIMIT_LENGTH at runtime instead of succeeding.

How was this patch tested?

Added 6 tests to ResolveDefaultColumnsSuite (CREATE/ALTER with current_user() on CHAR/VARCHAR; foldable oversize fails at DDL; non-foldable oversize fails at INSERT for both implicit and explicit DEFAULT paths). Full suite: 19 tests pass.

Was this patch authored or co-authored using generative AI tooling?

Yes, co-authored using Claude code.

…DEFAULT for CHAR/VARCHAR columns

### What changes were proposed in this pull request?

Two related fixes for non-foldable DEFAULT expressions (e.g. `current_user()`) on
CHAR/VARCHAR columns:

1. **Fix INTERNAL_ERROR at DDL time.** `ResolveDefaultColumns.coerceDefaultValue`
   called `CharVarcharUtils.stringLengthCheck(ret, dataType).eval(EmptyRow)` to
   validate CHAR/VARCHAR default lengths at compile time, without checking
   foldability. For a non-foldable default such as `current_user()` (processed
   before it is replaced by a literal), `.eval(EmptyRow)` throws
   "Cannot evaluate expression", surfacing as INTERNAL_ERROR. Added an
   `&& ret.foldable` guard so the eager check only runs for foldable expressions.

2. **Add runtime CHAR/VARCHAR length enforcement for implicit defaults.** When a
   column with a DEFAULT is omitted from an INSERT column list,
   `TableOutputResolver.reorderColumnsByName` filled the default but did not apply
   `stringLengthCheck`, so oversized non-foldable defaults silently succeeded. It
   now wraps the default expression with the write-side length check, using
   `CharVarcharUtils.getRawType(metadata)` to detect CHAR/VARCHAR for both V1 and
   V2 tables. The explicit `DEFAULT` keyword path already goes through `checkField`.

### Why are the changes needed?

`CREATE TABLE t(s CHAR(100) DEFAULT current_user()) USING parquet` (or the
equivalent `ALTER TABLE ... SET DEFAULT`) previously failed with INTERNAL_ERROR.
Oversized non-foldable defaults for omitted columns also bypassed length checks.

### Does this PR introduce _any_ user-facing change?

Yes. Non-foldable expressions are now allowed as CHAR/VARCHAR defaults (instead of
throwing INTERNAL_ERROR), and oversized non-foldable defaults on omitted INSERT
columns now fail with EXCEED_LIMIT_LENGTH at runtime instead of succeeding.

### How was this patch tested?

Added 6 tests to `ResolveDefaultColumnsSuite` (CREATE/ALTER with `current_user()`
on CHAR/VARCHAR; foldable oversize fails at DDL; non-foldable oversize fails at
INSERT for both implicit and explicit DEFAULT paths). Full suite: 19 tests pass.
Copy link
Copy Markdown
Contributor

@cloud-fan cloud-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, clean fix for the INTERNAL_ERROR — the foldability guard and the by-name runtime enforcement both look right and are well covered by the new tests. One completeness gap and one simplification below.

1 blocking, 1 non-blocking, 0 nits.

Correctness (1)

  • TableOutputResolver.scala:330: the runtime length check is only added to the by-name default-fill path; the by-position path (resolveColumnsByPosition) has the same bypass — see inline

Suggestions (1)

  • TableOutputResolver.scala:339: the Alias-field preservation is discarded by applyColumnMetadata and the case other branch is dead — see inline

)
}
Some(applyColumnMetadata(defaultExpr.get, expectedCol))
// Apply CHAR/VARCHAR length check to the default expression so that non-foldable
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This runtime enforcement is only added to the by-name default-fill path. The by-position fill in resolveColumnsByPosition (around lines 461-470) is structurally identical — getDefaultValueExprOrNullLit + applyColumnMetadata with no stringLengthCheck — and is reached for a by-position INSERT under V2 schema evolution (RECURSE; see Analyzer.scala's coerceInsertNestedTypes && schemaEvolutionEnabled branch) when a trailing CHAR/VARCHAR column is omitted. So an oversized non-foldable default there would still be written without EXCEED_LIMIT_LENGTH — the same bypass this PR closes for by-name. I traced both sites and the matched-column analogues (checkField/checkUpdate both wrap with stringLengthCheck) but didn't run an end-to-end schema-evolution write — can you confirm the by-position path isn't enforced elsewhere? If not, I'd extract a small shared helper (e.g. applyDefaultWithLengthCheck(expectedCol, conf)) called from both default-fill sites, and add a test covering the by-position/RECURSE path.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extracted a shared applyDefaultWithLengthCheck(defaultExpr, expectedCol, conf) helper, called from both default-fill sites — reorderColumnsByName (by-name) and resolveColumnsByPosition (by-position, reached under V2 schema-evolution RECURSE). Both paths now apply the CHAR/VARCHAR write-side length check identically. Thanks!

if (!conf.charVarcharAsString && CharVarcharUtils.hasCharVarchar(rawType)) {
expr match {
case a: Alias =>
a.copy(child = CharVarcharUtils.stringLengthCheck(a.child, rawType))(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This branch carefully preserves exprId/qualifier/explicitMetadata/nonInheritableMetadataKeys, but applyColumnMetadata on the next line strips the Alias down to its child and re-wraps it in a fresh Alias(stripAlias, name)(explicitMetadata = Some(requiredMetadata)), so all of that is discarded. The case other branch is also dead — getDefaultValueExprOrNullLit always returns an Alias. The whole block can collapse to wrapping the default's child with stringLengthCheck, which folds naturally into the shared helper suggested above.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dead code collapsed - the helper drops the pointless Alias-field preservation (applyColumnMetadata strips + re-wraps anyway) and the dead case other branch; it just wraps the default's value with stringLengthCheck

…ass, dedup helper

Per review feedback:

- The CHAR/VARCHAR write-side length check for filled default values was only
  applied on the by-name path (`reorderColumnsByName`). The by-position path
  (`resolveColumnsByPosition`, reached under V2 schema evolution / RECURSE when a
  trailing column is omitted) had the same bypass. Extract a shared
  `applyDefaultWithLengthCheck` helper and call it from both default-fill sites so
  an oversized non-foldable default is caught at runtime on both paths.
- Collapse the dead Alias-field preservation: `applyColumnMetadata` strips the
  default's alias to its child and re-wraps it with the required metadata, so
  preserving exprId/qualifier/explicitMetadata was a no-op, and the `case other`
  branch was dead (`getDefaultValueExprOrNullLit` always returns an Alias). The
  helper now just wraps the default's value with `stringLengthCheck`.

Added a regression test driving `resolveOutputColumns(byName = false, RECURSE)`
with a trailing `CHAR(100) DEFAULT current_user()` column omitted, asserting the
filled default is wrapped with the `charTypeWriteSideCheck` write-side check.
Full ResolveDefaultColumnsSuite: 20 tests pass.
@cloud-fan
Copy link
Copy Markdown
Contributor

thanks, merging to master/4.x/4.2 (bug fix)

Copy link
Copy Markdown
Contributor

@cloud-fan cloud-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 addressed, 0 remaining, 1 new. (1 newly introduced, 0 late catches.)

Both prior findings cleanly addressed - the by-position bypass is closed by routing both default-fill sites through the shared applyDefaultWithLengthCheck, and the dead alias-preservation block is collapsed into it.

Nits: 1 minor item (see inline comments).

Verification

Checked fix completeness: of the 7 applyColumnMetadata call sites, only two are default-fills (reorderColumnsByName :355, resolveColumnsByPosition :476) and both now route through the shared length-check helper. The matched-column (checkUpdate :205-208) and explicit-DEFAULT (checkField) paths already wrap with stringLengthCheck, so no remaining write path bypasses the check. The by-position test asserts the charTypeWriteSideCheck StaticInvoke directly.

Comment on lines +293 to +294
* `applyColumnMetadata` strips the default's outer alias and re-wraps it with the required
* metadata, so the length check is applied to the default value itself (the alias child).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor doc-accuracy nit: this sentence attributes the alias unwrap to applyColumnMetadata, but in the CHAR/VARCHAR path the helper itself unwraps the alias (case a: Alias => a.child below) before the length check, and applyColumnMetadata then runs on the StaticInvoke (no longer an Alias), so it never strips the default's alias here. The unwrap is needed precisely because otherwise stringLengthCheck would wrap the Alias and applyColumnMetadata would not strip it. Behavior is correct; only the explanation is backwards. Suggested rewrite:

Suggested change
* `applyColumnMetadata` strips the default's outer alias and re-wraps it with the required
* metadata, so the length check is applied to the default value itself (the alias child).
* We unwrap the default's outer alias before the length check so the check wraps the
* default value itself, not the alias; `applyColumnMetadata` then re-adds the required
* alias and metadata afterward.

@cloud-fan cloud-fan closed this in 3bf85b7 Jun 2, 2026
cloud-fan pushed a commit that referenced this pull request Jun 2, 2026
…DEFAULT for CHAR/VARCHAR columns

### What changes were proposed in this pull request?

Two related fixes for non-foldable DEFAULT expressions (e.g. `current_user()`) on CHAR/VARCHAR columns:

1. **Fix INTERNAL_ERROR at DDL time.** `ResolveDefaultColumns.coerceDefaultValue` called `CharVarcharUtils.stringLengthCheck(ret, dataType).eval(EmptyRow)` to validate CHAR/VARCHAR default lengths at compile time, without checking foldability. For a non-foldable default such as `current_user()` (processed before it is replaced by a literal), `.eval(EmptyRow)` throws "Cannot evaluate expression", surfacing as INTERNAL_ERROR. Added an `&& ret.foldable` guard so the eager check only runs for foldable expressions.

2. **Add runtime CHAR/VARCHAR length enforcement for implicit defaults.** When a column with a DEFAULT is omitted from an INSERT column list, `TableOutputResolver.reorderColumnsByName` filled the default but did not apply `stringLengthCheck`, so oversized non-foldable defaults silently succeeded. It now wraps the default expression with the write-side length check, using `CharVarcharUtils.getRawType(metadata)` to detect CHAR/VARCHAR for both V1 and V2 tables. The explicit `DEFAULT` keyword path already goes through `checkField`.

### Why are the changes needed?

`CREATE TABLE t(s CHAR(100) DEFAULT current_user()) USING parquet` (or the equivalent `ALTER TABLE ... SET DEFAULT`) previously failed with INTERNAL_ERROR. Oversized non-foldable defaults for omitted columns also bypassed length checks.

### Does this PR introduce _any_ user-facing change?

Yes. Non-foldable expressions are now allowed as CHAR/VARCHAR defaults (instead of throwing INTERNAL_ERROR), and oversized non-foldable defaults on omitted INSERT columns now fail with EXCEED_LIMIT_LENGTH at runtime instead of succeeding.

### How was this patch tested?

Added 6 tests to `ResolveDefaultColumnsSuite` (CREATE/ALTER with `current_user()` on CHAR/VARCHAR; foldable oversize fails at DDL; non-foldable oversize fails at INSERT for both implicit and explicit DEFAULT paths). Full suite: 19 tests pass.

### Was this patch authored or co-authored using generative AI tooling?

Yes, co-authored using Claude code.

Closes #56238 from dejankrak-db/current-user-default-char-oss.

Authored-by: Dejan Krakovic <dejan.krakovic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 3bf85b7)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan pushed a commit that referenced this pull request Jun 2, 2026
…DEFAULT for CHAR/VARCHAR columns

### What changes were proposed in this pull request?

Two related fixes for non-foldable DEFAULT expressions (e.g. `current_user()`) on CHAR/VARCHAR columns:

1. **Fix INTERNAL_ERROR at DDL time.** `ResolveDefaultColumns.coerceDefaultValue` called `CharVarcharUtils.stringLengthCheck(ret, dataType).eval(EmptyRow)` to validate CHAR/VARCHAR default lengths at compile time, without checking foldability. For a non-foldable default such as `current_user()` (processed before it is replaced by a literal), `.eval(EmptyRow)` throws "Cannot evaluate expression", surfacing as INTERNAL_ERROR. Added an `&& ret.foldable` guard so the eager check only runs for foldable expressions.

2. **Add runtime CHAR/VARCHAR length enforcement for implicit defaults.** When a column with a DEFAULT is omitted from an INSERT column list, `TableOutputResolver.reorderColumnsByName` filled the default but did not apply `stringLengthCheck`, so oversized non-foldable defaults silently succeeded. It now wraps the default expression with the write-side length check, using `CharVarcharUtils.getRawType(metadata)` to detect CHAR/VARCHAR for both V1 and V2 tables. The explicit `DEFAULT` keyword path already goes through `checkField`.

### Why are the changes needed?

`CREATE TABLE t(s CHAR(100) DEFAULT current_user()) USING parquet` (or the equivalent `ALTER TABLE ... SET DEFAULT`) previously failed with INTERNAL_ERROR. Oversized non-foldable defaults for omitted columns also bypassed length checks.

### Does this PR introduce _any_ user-facing change?

Yes. Non-foldable expressions are now allowed as CHAR/VARCHAR defaults (instead of throwing INTERNAL_ERROR), and oversized non-foldable defaults on omitted INSERT columns now fail with EXCEED_LIMIT_LENGTH at runtime instead of succeeding.

### How was this patch tested?

Added 6 tests to `ResolveDefaultColumnsSuite` (CREATE/ALTER with `current_user()` on CHAR/VARCHAR; foldable oversize fails at DDL; non-foldable oversize fails at INSERT for both implicit and explicit DEFAULT paths). Full suite: 19 tests pass.

### Was this patch authored or co-authored using generative AI tooling?

Yes, co-authored using Claude code.

Closes #56238 from dejankrak-db/current-user-default-char-oss.

Authored-by: Dejan Krakovic <dejan.krakovic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 3bf85b7)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@cloud-fan
Copy link
Copy Markdown
Contributor

@dejankrak-db you can address the last minor review comments in your other related PRs.

@dejankrak-db
Copy link
Copy Markdown
Contributor Author

@dejankrak-db you can address the last minor review comments in your other related PRs.

Added the suggested minor fix as part of #56237

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants