
feat: support dayname and monthname natively #4544

Merged
mbutrovich merged 5 commits into apache:main from andygrove:feat-datetime-dayname-monthname
Jun 2, 2026


Member

@andygrove andygrove commented May 30, 2026

Which issue does this PR close?

Part of #4418.

Rationale for this change

dayname and monthname (Spark 4.0+) had no Comet support and forced a fallback to Spark.

What changes are included in this PR?

This work was scaffolded with the implement-comet-expression skill.

dayname and monthname are implemented natively. Spark's DayName / MonthName map a DateType value through DayOfWeek / Month .getDisplayName(TextStyle.SHORT, Locale.US). DateFormatter.defaultLocale is the constant Locale.US, so the result is a fixed set of abbreviated English names (Mon..Sun, Jan..Dec) with no session-locale or timezone dependence. The native Rust scalar function reproduces this exactly with US-English lookup tables keyed on the weekday / month computed from the Date32 value.
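The lookup-table approach described above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code in day_month_name.rs: the function names (`day_name`, `month_name`, `day_name_opt`) and table names are hypothetical, it works on plain `i32` day counts rather than Arrow arrays, and it assumes Date32 means days since 1970-01-01 in the proleptic Gregorian calendar. The month is derived with the well-known "civil from days" integer algorithm, which is a swapped-in technique for illustration.

```rust
/// Abbreviated US-English weekday names, indexed 0 = Monday .. 6 = Sunday.
/// (Hypothetical table name, for illustration only.)
const DAY_NAMES: [&str; 7] = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"];

/// Abbreviated US-English month names, indexed 0 = January .. 11 = December.
const MONTH_NAMES: [&str; 12] = [
    "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
];

/// Weekday name for a Date32-style value (days since 1970-01-01).
pub fn day_name(days: i32) -> &'static str {
    // 1970-01-01 was a Thursday, which is index 3 in DAY_NAMES.
    // rem_euclid keeps the index in 0..7 for pre-epoch (negative) dates.
    let idx = (i64::from(days) + 3).rem_euclid(7) as usize;
    DAY_NAMES[idx]
}

/// Month name for a Date32-style value (days since 1970-01-01).
pub fn month_name(days: i32) -> &'static str {
    // Shift the epoch to 0000-03-01 so that leap days fall at the end of
    // each (March-based) year, then work within one 400-year era.
    let z = i64::from(days) + 719_468;
    let doe = z.rem_euclid(146_097); // day of era, 0..=146_096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of March-based year
    let mp = (5 * doy + 2) / 153; // 0 = March .. 11 = February
    let month = if mp < 10 { mp + 3 } else { mp - 9 }; // 1 = January .. 12 = December
    MONTH_NAMES[(month - 1) as usize]
}

/// Null-propagating wrapper: a null date yields a null name.
pub fn day_name_opt(days: Option<i32>) -> Option<&'static str> {
    days.map(day_name)
}
```

Because both tables are constants and the index depends only on the day count, the sketch has no locale or timezone input, which mirrors the reasoning given for why a native implementation can match Spark here.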

  • A dayname / monthname scalar function in native/spark-expr/src/datetime_funcs/day_month_name.rs, registered in comet_scalar_funcs.
  • Serde wiring in the spark-4.0 / 4.1 / 4.2 CometExprShim (these are Spark 4.0+ expression classes, so they cannot live in shared serde) that emits the scalar function proto.

Both expressions exist only in Spark 4.0+, so support is gated to those versions and kept out of the shared serde.

The documentation-only corrections for the rewrite-backed and constant-folded datetime functions (to_date, to_timestamp, make_timestamp_*, current_*, now, curdate) are split out into #4543.

How are these changes tested?

A native Rust unit test covers the weekday / month name mapping, including the epoch boundary and a pre-epoch date. A CometExpressionSuite test (gated to Spark 4.0+) uses checkSparkAnswerAndOperator to assert both functions execute fully in Comet (no fallback) and return Spark-identical results across column references, a null date, and a constant-folding-disabled literal. Verified locally against the Spark 4.0 profile.

@andygrove andygrove force-pushed the feat-datetime-dayname-monthname branch from c08890c to f60f5b1 Compare May 30, 2026 20:37
Add Comet support for the Spark 4.0+ dayname and monthname expressions by
routing them through the Arrow-direct codegen dispatcher in the spark-4.0
shim, so they run Spark's own generated code for exact parity including
locale handling.

Also add tests confirming that several datetime functions already execute in
Comet via Spark's rewrite rules, and update the expression support doc to
reflect this coverage:

- to_date / to_timestamp / to_timestamp_ntz / to_timestamp_ltz reduce to
  Cast (no format) or GetTimestamp (with format)
- make_timestamp_ntz / make_timestamp_ltz (6-argument form) reduce to
  MakeTimestamp
- current_date / current_timestamp / now / curdate are constant-folded to
  literals before Comet sees the plan
@andygrove andygrove force-pushed the feat-datetime-dayname-monthname branch from f60f5b1 to 5488ec0 Compare May 30, 2026 21:56
andygrove added 2 commits May 31, 2026 08:25
… ci]

The docs-only corrections for to_date/to_timestamp/make_timestamp/current_* move
to apache#4543. This PR's doc change is limited to the new dayname and monthname
expressions. monthname is now also checked off (it was missed).
Replace the JVM codegen-dispatch path with a native Rust scalar function.
Spark's DayName/MonthName map a DateType value through
DayOfWeek/Month.getDisplayName(TextStyle.SHORT, Locale.US), which is locale-fixed
and timezone-independent, so the native function reproduces it exactly with
US-English lookup tables keyed on the weekday/month of the Date32 value.

Wired through CometExprShim for Spark 4.0/4.1/4.2 via scalarFunctionExprToProto,
registered as dayname/monthname in comet_scalar_funcs. The test now asserts native
execution without the codegen dispatcher and covers null and literal inputs.
@andygrove andygrove changed the title feat: support dayname and monthname expressions feat: support dayname and monthname natively May 31, 2026
andygrove added 2 commits May 31, 2026 08:59
…to those functions [skip ci]

The dayname/monthname conversion was identical across the spark-4.0/4.1/4.2
CometExprShim files. Move it into a single CometExprShim4x trait under the shared
spark-4.x source root (compiled for all 4.x profiles) and have each per-version
shim mix it in and delegate, removing the triplication.

Also drop the to_date/to_timestamp, make_timestamp, and current_date/now tests so
this PR is strictly the dayname/monthname feature; those cover rewrite-backed and
constant-folded functions whose documentation lives in the docs PR.
Replace the Scala CometExpressionSuite test with one SQL file per function under
expressions/datetime, gated to Spark 4.0+ via MinSparkVersion so they skip cleanly
on 3.4/3.5 where the functions do not exist. The default query mode asserts native
Comet execution and Spark-identical results; coverage includes a null date and a
literal argument.
Contributor

@mbutrovich mbutrovich left a comment


LGTM, thanks @andygrove!

@mbutrovich mbutrovich merged commit 1296312 into apache:main Jun 2, 2026