From 39ff17929af2d59a528b1419cbe813b6c960dc8c Mon Sep 17 00:00:00 2001
From: Andy Grove
Date: Wed, 27 May 2026 18:27:34 -0600
Subject: [PATCH] chore(audit): audit predicate expressions across Spark 3.4.3, 3.5.8, 4.0.1, 4.1.1

Add per-version audit sub-bullets to all 19 supported predicate SQL
function names (`!`, `<`, `<=`, `<=>`, `=`, `==`, `>`, `>=`, `and`,
`between`, `ilike`, `in`, `isnan`, `isnotnull`, `isnull`, `like`, `not`,
`or`, `rlike`) in
`docs/source/contributor-guide/spark_expressions_support.md`.

The Spark expression classes are identical across the four versions
apart from the `NullIntolerant` -> `nullIntolerant` trait refactor that
lands in Spark 4.0, which causes no runtime change. `!` and `==` are
registry aliases for `Not` and `EqualTo`. `between` is rewritten by the
parser to `expr >= low AND expr <= high`. `ilike` is `RuntimeReplaceable`
and rewrites to `Like(Lower(left), Lower(right))`. `like` and `rlike`
cross-reference the existing string-expressions audit (#4461).

No support-level consistency issues were found in the predicate serdes.
`CometNot` already optimizes a few special cases (`Not(EqualTo)`,
`Not(EqualNullSafe)`, `Not(In)`). No new tracking issues are filed.
---
 .../spark_expressions_support.md | 70 +++++++++++++++++++
 1 file changed, 70 insertions(+)

diff --git a/docs/source/contributor-guide/spark_expressions_support.md b/docs/source/contributor-guide/spark_expressions_support.md
index 8f82847bb6..0fcf36b19d 100644
--- a/docs/source/contributor-guide/spark_expressions_support.md
+++ b/docs/source/contributor-guide/spark_expressions_support.md
@@ -499,26 +499,96 @@
 ### predicate_funcs
 
 - [x] `!`
+  - Spark 3.4.3 (audited 2026-05-27): registry alias of `Not`. Same support as `not`.
+  - Spark 3.5.8 (audited 2026-05-27): identical to 3.4.3.
+  - Spark 4.0.1 (audited 2026-05-27): identical to 3.4.3.
+  - Spark 4.1.1 (audited 2026-05-27): identical to 3.4.3.
 - [x] `<`
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `LessThan(left, right) extends BinaryComparison`. Comet routes via `CometLessThan` to the proto's `lt` binary expression.
+  - Spark 4.0.1 (audited 2026-05-27): semantics unchanged.
+  - Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
 - [x] `<=`
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `LessThanOrEqual(left, right) extends BinaryComparison`. Comet routes via `CometLessThanOrEqual` to the proto's `lt_eq` binary expression.
+  - Spark 4.0.1 (audited 2026-05-27): semantics unchanged.
+  - Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
 - [x] `<=>`
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `EqualNullSafe(left, right) extends BinaryComparison`; treats two NULLs as equal and a NULL with a non-NULL as not equal. Comet routes via `CometEqualNullSafe` to the proto's `eq_null_safe` binary expression.
+  - Spark 4.0.1 (audited 2026-05-27): semantics unchanged.
+  - Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
 - [x] `=`
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `EqualTo(left, right) extends BinaryComparison`. Comet routes via `CometEqualTo` to the proto's `eq` binary expression.
+  - Spark 4.0.1 (audited 2026-05-27): semantics unchanged.
+  - Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
 - [x] `==`
+  - Spark 3.4.3 (audited 2026-05-27): registry alias of `EqualTo`. Same support as `=`.
+  - Spark 3.5.8 (audited 2026-05-27): identical to 3.4.3.
+  - Spark 4.0.1 (audited 2026-05-27): identical to 3.4.3.
+  - Spark 4.1.1 (audited 2026-05-27): identical to 3.4.3.
 - [x] `>`
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `GreaterThan(left, right) extends BinaryComparison`. Comet routes via `CometGreaterThan` to the proto's `gt` binary expression.
+  - Spark 4.0.1 (audited 2026-05-27): semantics unchanged.
+  - Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
 - [x] `>=`
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `GreaterThanOrEqual(left, right) extends BinaryComparison`. Comet routes via `CometGreaterThanOrEqual` to the proto's `gt_eq` binary expression.
+  - Spark 4.0.1 (audited 2026-05-27): semantics unchanged.
+  - Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
 - [x] and
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `And(left, right) extends BinaryOperator with Predicate`; short-circuit left-to-right. Comet routes via `CometAnd` to the proto's `and` binary expression.
+  - Spark 4.0.1 (audited 2026-05-27): semantics unchanged.
+  - Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
 - [x] between
+  - Spark 3.4.3 (audited 2026-05-27): the SQL form `expr BETWEEN low AND high` is rewritten at the parser level to `expr >= low AND expr <= high`. Comet sees only the resulting `And(GreaterThanOrEqual, LessThanOrEqual)` and routes via `CometAnd` + `CometGreaterThanOrEqual` + `CometLessThanOrEqual`.
+  - Spark 3.5.8 (audited 2026-05-27): identical to 3.4.3.
+  - Spark 4.0.1 (audited 2026-05-27): identical to 3.4.3.
+  - Spark 4.1.1 (audited 2026-05-27): identical to 3.4.3.
 - [x] ilike
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `ILike(left, right, escapeChar) extends RuntimeReplaceable`; the analyzer rewrites to `Like(Lower(left), Lower(right), escapeChar)`. Comet handles via `CometLike` and `CometLower` (case-conversion path, gated by `spark.comet.caseConversion.enabled`, which defaults to `false`).
+  - Spark 4.0.1 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
 - [x] in
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `In(value, list: Seq[Expression]) extends Predicate`; NULL-sensitive set membership (a NULL element in the list yields NULL when no match is found). Comet routes via `CometIn` to the proto's `In` expression with `negated = false`; `InSet` is rewritten to `In` so the native side can perform its own set-lookup optimization.
+  - Spark 4.0.1 (audited 2026-05-27): semantics unchanged.
+  - Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
 - [x] isnan
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `IsNaN(child) extends UnaryExpression`; returns `BooleanType`; `false` for NULL or non-float types. Comet routes via `CometIsNaN` to the native `isnan` scalar.
+  - Spark 4.0.1 (audited 2026-05-27): semantics unchanged.
+  - Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
 - [x] isnotnull
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `IsNotNull(child) extends UnaryExpression with Predicate`. Comet routes via `CometIsNotNull` to the proto's `is_not_null` unary expression.
+  - Spark 4.0.1 (audited 2026-05-27): semantics unchanged.
+  - Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
 - [x] isnull
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `IsNull(child) extends UnaryExpression with Predicate`. Comet routes via `CometIsNull` to the proto's `is_null` unary expression.
+  - Spark 4.0.1 (audited 2026-05-27): semantics unchanged.
+  - Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
 - [x] like
+  - See `string_funcs / like` (audited in PR #4461). `CometLike` only supports the default `\\` escape character.
 - [x] not
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `Not(child) extends UnaryExpression with Predicate`. Comet routes via `CometNot`, which optimizes a few special cases: `Not(EqualTo)` -> proto `neq`, `Not(EqualNullSafe)` -> proto `neq_null_safe`, `Not(In)` -> proto `In(negated = true)`.
+  - Spark 4.0.1 (audited 2026-05-27): semantics unchanged.
+  - Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
 - [x] or
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `Or(left, right) extends BinaryOperator with Predicate`; short-circuit left-to-right. Comet routes via `CometOr` to the proto's `or` binary expression.
+  - Spark 4.0.1 (audited 2026-05-27): semantics unchanged.
+  - Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
 - [ ] regexp
 - [ ] regexp_like
 - [x] rlike
+  - See `string_funcs / regexp_replace` and the `CometRLike` notes (audited in PR #4461). Uses the Rust `regex` crate, which differs from Java's `Pattern` engine; requires `spark.comet.expression.regexp.allowIncompatible=true`.
 
 ### string_funcs
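
Reviewer note (not part of the diff): the NULL-handling rules that the audit bullets above describe for `=`, `<=>`, `in`, `not`, and `between` can be sketched as a small pure-Python model. This is only an illustration under stated assumptions: `None` stands in for SQL NULL, every function name below is hypothetical (none of them are Spark or Comet APIs), and `IN` over an empty list is deliberately not modeled.

```python
from typing import Any, Optional, Sequence

# Hypothetical model of SQL three-valued logic; None plays the role of NULL.
# These helpers are illustrative only and are not Spark or Comet APIs.


def eq(a: Any, b: Any) -> Optional[bool]:
    """`=` / `==`: the result is NULL when either operand is NULL."""
    if a is None or b is None:
        return None
    return a == b


def eq_null_safe(a: Any, b: Any) -> bool:
    """`<=>`: two NULLs are equal; NULL vs non-NULL is not equal."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def sql_and(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """Three-valued AND: FALSE dominates, otherwise NULL propagates."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def sql_not(a: Optional[bool]) -> Optional[bool]:
    """Three-valued NOT: NOT NULL is NULL."""
    return None if a is None else not a


def sql_in(value: Any, items: Sequence[Any]) -> Optional[bool]:
    """`in` over a non-empty list: a NULL element yields NULL when no match is found."""
    if value is None:
        return None
    saw_null = False
    for item in items:
        if item is None:
            saw_null = True
        elif item == value:
            return True
    return None if saw_null else False


def between(x: Any, low: Any, high: Any) -> Optional[bool]:
    """`between`, modeled as the rewrite `x >= low AND x <= high`."""
    ge = None if x is None or low is None else x >= low
    le = None if x is None or high is None else x <= high
    return sql_and(ge, le)
```

In this model `sql_not(sql_in(3, [1, None]))` stays NULL, which is why a negated `In` has to preserve the NULL-aware membership rule rather than simply inverting a boolean.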