Skip to content

[SPARK-57208][SQL] Simplify Ascii codegen by extracting a static Java helper#56267

Closed
LuciferYang wants to merge 1 commit into
apache:masterfrom
LuciferYang:ascii-codegen-helper
Closed

[SPARK-57208][SQL] Simplify Ascii codegen by extracting a static Java helper#56267
LuciferYang wants to merge 1 commit into
apache:masterfrom
LuciferYang:ascii-codegen-helper

Conversation

@LuciferYang
Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Ascii implemented the same first-character logic twice — once in nullSafeEval and once inlined in doGenCode (a ~6-line substring(0, 1) / numChars() > 0 ? codePointAt(0) : 0 block). This moves that logic into a single ExpressionImplUtils.ascii(UTF8String): int helper; nullSafeEval delegates to it and doGenCode becomes a one-line defineCodeGen call, so eval and codegen share one implementation.

This follows the other SPARK-56908 helpers in ExpressionImplUtils and lands alongside Crc32 (#56222), regexp (#56223), Chr (#56224), Acosh (#56228), and Asinh (#56229), which all append to the same file.

Why are the changes needed?

It collapses the inlined block into a single invokestatic per ascii call site — fewer constant-pool entries and a smaller generated method, which helps with the JVM 64KB method / constant-pool limits, Janino compile time, and JIT work — and removes the duplicated logic between eval and codegen. Part of SPARK-56908.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing StringExpressionsSuite ascii tests, plus a new assertion for a supplementary-plane code point (ascii('😀') = 128512) that pins the codePointAt (rather than charAt) behavior. checkEvaluation runs both the interpreted and codegen paths.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

… helper

Move Ascii's first-character logic into a single ExpressionImplUtils.ascii(UTF8String) helper. nullSafeEval delegates to it and doGenCode becomes a one-line defineCodeGen call, so eval and codegen share one implementation instead of duplicating the substring/codePointAt block. This collapses the inlined block into one invokestatic per call site (fewer constant-pool entries, smaller generated method), helping with the JVM 64KB method / constant-pool limits, Janino compile time, and JIT work.

Part of SPARK-56908.
@LuciferYang LuciferYang force-pushed the ascii-codegen-helper branch from 76715b1 to 2078851 Compare June 3, 2026 02:30
@LuciferYang
Copy link
Copy Markdown
Contributor Author

rebased

LuciferYang added a commit that referenced this pull request Jun 3, 2026
… helper

### What changes were proposed in this pull request?

`Ascii` implemented the same first-character logic twice — once in `nullSafeEval` and once inlined in `doGenCode` (a ~6-line `substring(0, 1)` / `numChars() > 0 ? codePointAt(0) : 0` block). This moves that logic into a single `ExpressionImplUtils.ascii(UTF8String): int` helper; `nullSafeEval` delegates to it and `doGenCode` becomes a one-line `defineCodeGen` call, so eval and codegen share one implementation.

This follows the other SPARK-56908 helpers in `ExpressionImplUtils` and lands alongside Crc32 (#56222), regexp (#56223), Chr (#56224), Acosh (#56228), and Asinh (#56229), which all append to the same file.

### Why are the changes needed?

It collapses the inlined block into a single `invokestatic` per `ascii` call site — fewer constant-pool entries and a smaller generated method, which helps with the JVM 64KB method / constant-pool limits, Janino compile time, and JIT work — and removes the duplicated logic between eval and codegen. Part of SPARK-56908.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing `StringExpressionsSuite` `ascii` tests, plus a new assertion for a supplementary-plane code point (`ascii('😀') = 128512`) that pins the `codePointAt` (rather than `charAt`) behavior. `checkEvaluation` runs both the interpreted and codegen paths.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

Closes #56267 from LuciferYang/ascii-codegen-helper.

Authored-by: YangJie <yangjie01@baidu.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
(cherry picked from commit 1fe8493)
Signed-off-by: yangjie01 <yangjie01@baidu.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants