[datafusion-spark] Support 2-argument ceil(value, scale) #21710

Open
diegoQuinas wants to merge 13 commits into apache:main from diegoQuinas:feat/ceil-two-args

@diegoQuinas diegoQuinas commented Apr 17, 2026

Which issue does this PR close?

Part of #21560

Rationale for this change

The Spark ceil function supports an optional scale parameter that controls
the decimal position to round up to. This was not yet implemented in
datafusion-spark.

What changes are included in this PR?

  • Updated Signature to accept 1 or 2 arguments, following the same pattern as SparkRound
  • Updated return_type: floats preserve their type when a scale is provided (instead of returning Int64); scale=0 preserves the original behavior
  • Added get_scale() helper to extract the optional scale argument, returning None for NULL scale (which produces a NULL result)
  • Added ceil_float() helper for ceiling floats at arbitrary decimal positions
  • Updated spark_ceil_scalar and spark_ceil_array to apply the scale
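As an illustration of the last three items, here is a minimal sketch of ceiling a float at a given decimal position and of propagating a NULL scale. It is not the code from this PR: the body of `ceil_float` is a guess at one reasonable approach, `ceil_with_scale` is a hypothetical stand-in for the combination of `get_scale()` and the scalar path, and `Option` stands in for SQL NULL. The return type stays `f64` for every scale, which is a simplification.

```rust
/// Hypothetical sketch: round `value` up at `scale` decimal places.
/// A positive scale keeps digits after the decimal point; a negative
/// scale rounds up to a multiple of 10^|scale|.
/// Extreme scales overflow to infinity and are not handled here.
fn ceil_float(value: f64, scale: i32) -> f64 {
    if scale >= 0 {
        let factor = 10f64.powi(scale);
        (value * factor).ceil() / factor
    } else {
        // Divide first so the power of ten is an exact integer value.
        let factor = 10f64.powi(scale.saturating_neg());
        (value / factor).ceil() * factor
    }
}

/// Hypothetical wrapper: a NULL value or a NULL scale (modelled as
/// `None`) yields a NULL result.
fn ceil_with_scale(value: Option<f64>, scale: Option<i32>) -> Option<f64> {
    Some(ceil_float(value?, scale?))
}
```

Because the multiplication happens in binary floating point, inputs that are not exactly representable can land slightly above or below the intended boundary, so results for such inputs may differ from a decimal-based implementation.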

Are these changes tested?

Yes, unit tests covering:

  • Float64/Float32 scalar with positive scale, negative scale, zero scale, and NULL scale
  • Float64 array with scale
  • Existing 1-argument tests continue to pass

Are there any user-facing changes?

Yes — ceil(expr, scale) is now supported in addition to the existing ceil(expr).

@kosiew kosiew left a comment

@diegoQuinas
Thanks for the work here.
There are a few inconsistencies between the declared return types and the actual execution paths that could lead to planner/runtime mismatches. I’ve called those out below. Once those are addressed and we have a couple of type-contract tests in place, this should be in a much safer spot.

Comment thread datafusion/spark/src/function/math/ceil.rs
Comment thread datafusion/spark/src/function/math/ceil.rs
Comment thread datafusion/spark/src/function/math/ceil.rs
Comment thread datafusion/spark/src/function/math/ceil.rs Outdated
Comment thread datafusion/spark/src/function/math/ceil.rs Outdated
Comment thread datafusion/spark/src/function/math/ceil.rs Outdated
Comment thread datafusion/spark/src/function/math/ceil.rs Outdated
@github-actions github-actions Bot added the sqllogictest SQL Logic Tests (.slt) label Apr 28, 2026
@kosiew kosiew left a comment

@diegoQuinas
Nice follow-up overall. This addresses all the original comments and the refactor looks clean. The type handling is much more consistent now and the SLT coverage is solid. I left a few small suggestions, mostly around the decimal 2-arg behavior and some minor readability and edge-case clarifications.

Comment thread datafusion/spark/src/function/math/ceil.rs Outdated
Comment thread datafusion/spark/src/function/math/ceil.rs
Comment thread datafusion/spark/src/function/math/ceil.rs
Comment thread datafusion/spark/src/function/math/ceil.rs Outdated
Comment thread datafusion/spark/src/function/math/ceil.rs Outdated
// Decimal128 with positive scale (1-arg only).
ScalarValue::Decimal128(_, _, _) if has_scale => {
return not_impl_err!(
"2-argument ceil is not yet supported for decimal inputs"
Contributor

If we aren't supporting this yet we should keep the original issue open instead of marking this PR as closing it

Contributor Author

I'm working on this. I may be able to add it tonight.

Comment thread datafusion/spark/src/function/math/scale.rs
Comment thread datafusion/spark/src/function/math/ceil.rs
github-actions Bot commented May 7, 2026

Thank you for opening this pull request!

Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

Details
     Cloning apache/main
    Building datafusion-spark v53.1.0 (current)
       Built [  57.513s] (current)
     Parsing datafusion-spark v53.1.0 (current)
      Parsed [   0.060s] (current)
    Building datafusion-spark v53.1.0 (baseline)
       Built [  56.585s] (baseline)
     Parsing datafusion-spark v53.1.0 (baseline)
      Parsed [   0.062s] (baseline)
    Checking datafusion-spark v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.362s] 222 checks: 218 pass, 4 fail, 0 warn, 30 skip

--- failure function_missing: pub fn removed or renamed ---

Description:
A publicly-visible function cannot be imported by its prior path. A `pub use` may have been removed, or the function itself may have been renamed or removed entirely.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/function_missing.ron

Failed in:
  function datafusion_spark::function::math::pow, previously in file /home/runner/work/datafusion/datafusion/target/semver-checks/git-apache_main/6cd247a557d226d9f3b72e3f2b0f8df2be95b610/datafusion/spark/src/function/math/mod.rs:46
  function datafusion_spark::function::math::expr_fn::pow, previously in file /home/runner/work/datafusion/datafusion/target/semver-checks/git-apache_main/6cd247a557d226d9f3b72e3f2b0f8df2be95b610/datafusion/spark/src/function/math/mod.rs:71
  function datafusion_spark::expr_fn::pow, previously in file /home/runner/work/datafusion/datafusion/target/semver-checks/git-apache_main/6cd247a557d226d9f3b72e3f2b0f8df2be95b610/datafusion/spark/src/function/math/mod.rs:71

--- failure function_parameter_count_changed: pub fn parameter count changed ---

Description:
A publicly-visible function now takes a different number of parameters.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#fn-change-arity
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/function_parameter_count_changed.ron

Failed in:
  datafusion_spark::function::math::expr_fn::ceil now takes 2 parameters instead of 1, in /home/runner/work/datafusion/datafusion/datafusion/spark/src/function/math/mod.rs:59
  datafusion_spark::expr_fn::ceil now takes 2 parameters instead of 1, in /home/runner/work/datafusion/datafusion/datafusion/spark/src/function/math/mod.rs:59

--- failure module_missing: pub module removed or renamed ---

Description:
A publicly-visible module cannot be imported by its prior path. A `pub use` may have been removed, or the module may have been renamed, removed, or made non-public.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/module_missing.ron

Failed in:
  mod datafusion_spark::function::math::pow, previously in file /home/runner/work/datafusion/datafusion/target/semver-checks/git-apache_main/6cd247a557d226d9f3b72e3f2b0f8df2be95b610/datafusion/spark/src/function/math/pow.rs:18

--- failure struct_missing: pub struct removed or renamed ---

Description:
A publicly-visible struct cannot be imported by its prior path. A `pub use` may have been removed, or the struct itself may have been renamed or removed entirely.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/struct_missing.ron

Failed in:
  struct datafusion_spark::function::math::pow::SparkPow, previously in file /home/runner/work/datafusion/datafusion/target/semver-checks/git-apache_main/6cd247a557d226d9f3b72e3f2b0f8df2be95b610/datafusion/spark/src/function/math/pow.rs:42

     Summary semver requires new major version: 4 major and 0 minor checks failed
    Finished [ 116.160s] datafusion-spark
    Building datafusion-sqllogictest v53.1.0 (current)
       Built [ 169.240s] (current)
     Parsing datafusion-sqllogictest v53.1.0 (current)
      Parsed [   0.023s] (current)
    Building datafusion-sqllogictest v53.1.0 (baseline)
       Built [ 168.895s] (baseline)
     Parsing datafusion-sqllogictest v53.1.0 (baseline)
      Parsed [   0.024s] (baseline)
    Checking datafusion-sqllogictest v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.091s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [ 341.391s] datafusion-sqllogictest

@github-actions github-actions Bot added the auto detected api change Auto detected API change label May 7, 2026
- Fix type contract: 2-arg ceil now preserves input type for any scale
  (incl. scale=0 on floats); decimal 2-arg surfaces NotImplemented
- Implement integer ceil at negative scale (ceiling toward +inf, distinct
  from HALF_UP round); positive scale on integer is a no-op
- Extract shared get_scale helper into math/scale.rs and reuse from round
- Simplify Signature using TypeSignature::Numeric + Coercible([Numeric,
  Int32]) instead of hand-enumerated type lists
- Move Rust unit tests into the SLT, including type-contract assertions
  via arrow_typeof for every 2-arg overload
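The integer behavior described in the second item can be sketched as follows. This is an illustrative guess rather than the PR's implementation: `ceil_int` is a hypothetical name, and returning `None` when the power of ten or the result does not fit in `i64` is an assumption about overflow handling.

```rust
/// Hypothetical sketch: ceil an integer at `scale`.
/// A non-negative scale is a no-op; a negative scale rounds toward
/// +infinity to the next multiple of 10^|scale|.
/// Returns `None` when the computation does not fit in `i64`.
fn ceil_int(value: i64, scale: i32) -> Option<i64> {
    if scale >= 0 {
        return Some(value);
    }
    let factor = 10i64.checked_pow(scale.unsigned_abs())?;
    // `rem_euclid` is always in 0..factor, also for negative values.
    let rem = value.rem_euclid(factor);
    if rem == 0 {
        Some(value)
    } else {
        value.checked_add(factor - rem)
    }
}
```

The difference from HALF_UP rounding shows at a value such as 21 with scale -1: rounding gives 20, while the ceiling gives 30. For -25 with scale -1 the ceiling is -20, because the result moves toward +infinity.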
@diegoQuinas diegoQuinas force-pushed the feat/ceil-two-args branch from 808a7b6 to bc19931 Compare May 24, 2026 02:00
@diegoQuinas diegoQuinas marked this pull request as draft May 24, 2026 03:36
@diegoQuinas diegoQuinas marked this pull request as ready for review May 27, 2026 10:56
@diegoQuinas
Contributor Author

@Jefffrey, I will merge this one without closing the issue, since @kosiew already approved it. I will open another PR for 2-argument ceil on decimal inputs.

diegoQuinas and others added 5 commits May 27, 2026 10:17
The return_type branch for Decimal128 with a scale argument used
exec_err! while the two evaluate paths use not_impl_err!. The
ceil.slt test expects the "This feature is not implemented" message,
so the mismatch failed CI. Make all three paths consistent.
An unrelated submodule bump (13bbae387 -> eccb0e4a4) was accidentally
committed alongside the ceil work. Restore it to the value tracked by
main so the diff stays scoped to the ceil change.
Jefffrey commented Jun 2, 2026

I was checking some outputs against Spark 4.1.2, and it seems they actually operate only on decimal types for the 2-arg variant. See:

>>> spark.sql("select ceil(-25::int)").printSchema()
root
 |-- CEIL(CAST(-25 AS INT)): long (nullable = true)

>>> spark.sql("select ceil(-25::int, -1)").printSchema()
root
 |-- ceil(CAST(-25 AS INT), -1): decimal(11,0) (nullable = true)

And see reference code:

case class RoundCeil(child: Expression, scale: Expression)
  extends RoundBase(child, scale, BigDecimal.RoundingMode.CEILING, "ROUND_CEILING") {

  override def inputTypes: Seq[AbstractDataType] = Seq(DecimalType, IntegerType)

  override def nodeName: String = "ceil"

  override protected def withNewChildrenInternal(
      newLeft: Expression, newRight: Expression): RoundCeil =
    copy(child = newLeft, scale = newRight)
}

So I think this PR won't align with Spark since we need to operate only on decimals (and subsequently return decimal types)?
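To make the decimal-only semantics concrete, here is a rough sketch of a ceiling on a decimal stored as an unscaled `i128` plus a scale, which is the layout Arrow's Decimal128 uses. The function name `ceil_decimal` is hypothetical, and the sketch keeps the input's scale for the result. Spark derives a new result precision and scale (the `decimal(11,0)` in the schema above), and that derivation is not modelled here.

```rust
/// Hypothetical sketch: ceil a decimal value `unscaled * 10^-scale`
/// at `target_scale` digits after the decimal point. The result is
/// returned as an unscaled value that still uses the input `scale`.
/// Returns `None` on overflow.
fn ceil_decimal(unscaled: i128, scale: i32, target_scale: i32) -> Option<i128> {
    if target_scale >= scale {
        // Nothing to drop: the value already has at most that many digits.
        return Some(unscaled);
    }
    let dropped_digits = scale.checked_sub(target_scale)?.unsigned_abs();
    let factor = 10i128.checked_pow(dropped_digits)?;
    // Distance above the previous multiple of `factor`, always in 0..factor.
    let rem = unscaled.rem_euclid(factor);
    if rem == 0 {
        Some(unscaled)
    } else {
        unscaled.checked_add(factor - rem)
    }
}
```

With this representation, an integer input such as -25 is the decimal with unscaled value -25 and scale 0, so `ceil_decimal(-25, 0, -1)` gives -20, and 2.341 at target scale 2 becomes 2.350 (unscaled 2350 at scale 3).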

Labels

auto detected api change Auto detected API change spark sqllogictest SQL Logic Tests (.slt)
