From e2b9356080cf080f743a51a972a82fa90f3f0a5c Mon Sep 17 00:00:00 2001
From: Andy Grove <agrove@apache.org>
Date: Wed, 27 May 2026 07:44:20 -0600
Subject: [PATCH 1/4] docs: clarify support-level and reason consistency in
 audit-comet-expression skill

Expand the audit skill so it produces consistent, accurate findings about
`getSupportLevel`, `getIncompatibleReasons`, `getUnsupportedReasons`, and
`convert`.

Step 3 gains a dedicated subsection covering:

- A decision rule for `Unsupported` vs `Incompatible` (wrong answer vs
  fallback/error).
- The runtime-vs-docs split: support-level `notes` flow into EXPLAIN via
  the dispatcher, while `get*Reasons()` is read only by `GenerateDocs`.
- Six consistency rules covering every branch, dead-reason detection,
  shared `private val` for reason strings, `Incompatible(Some(reason))`
  preference, and gating decisions in `getSupportLevel` rather than
  `convert`.
- Wording guidelines for reason strings (user-observable effect first,
  sentence case, backtick configs/types, tracking issue links).
- Real antipattern callouts naming specific serdes (`CometHour`,
  `CometMinute`, `CometSecond`, `CometInitCap`).

Step 5 gains a structured 9-item support-level consistency audit
checklist that produces findings against the four methods.

Step 7 splits into two asks: one for missing tests, one for consistency
fixes, so the user can opt in to either independently.
---
 .../skills/audit-comet-expression/SKILL.md    | 206 ++++++++++++++++--
 1 file changed, 188 insertions(+), 18 deletions(-)

diff --git a/.claude/skills/audit-comet-expression/SKILL.md b/.claude/skills/audit-comet-expression/SKILL.md
index e2704bb47d..bfabfeaa4e 100644
--- a/.claude/skills/audit-comet-expression/SKILL.md
+++ b/.claude/skills/audit-comet-expression/SKILL.md
@@ -119,19 +119,125 @@ grep -r "$ARGUMENTS" spark/src/main/scala/org/apache/comet/ --include="*.scala"
 Read the serde implementation and check:
 
 - Which Spark versions the serde handles
-- Whether `getSupportLevel` is implemented and accurate
-- Whether all input types are handled
-- Whether any types are explicitly marked `Unsupported`
-- Whether `getIncompatibleReasons()` and `getUnsupportedReasons()` are overridden.
-  `getSupportLevel` controls runtime fallback, but `GenerateDocs` reads these two
-  methods to build the Compatibility Guide. If `getSupportLevel` returns
-  `Incompatible(Some(...))` or `Unsupported(Some(...))` but the corresponding
-  `get*Reasons()` method is not overridden, the reason will not appear in the
-  published docs. Expect both to return a `Seq[String]` containing the same
-  reason text used in `getSupportLevel`. Follow the pattern in
-  `spark/src/main/scala/org/apache/comet/serde/structs.scala::CometStructsToJson`
-  or `spark/src/main/scala/org/apache/comet/serde/datetime.scala::CometHour`:
-  extract the reason as a `private val` and reference it from both places.
+- Whether all input types Spark accepts are handled
+- Whether `convert` validates expression shape (e.g. literal-only arguments) before serializing
+
+Then audit the support-level and reason methods as described below.
+
+#### Audit `getSupportLevel`, `getIncompatibleReasons`, `getUnsupportedReasons`, and `convert`
+
+These four methods must stay aligned. Each has a distinct purpose, and the
+most common bugs in Comet serdes are misalignments between them.
+
+**Pick the right support level.**
+
+- `Unsupported(Some(reason))`: Comet cannot run this case at all. The
+  dispatcher in `QueryPlanSerde.exprToProtoInternal` falls back to Spark
+  unconditionally. Use this when an input type, option, or expression shape
+  is not implemented, or when running the native path would crash or error.
+- `Incompatible(Some(reason))`: Comet can run this, but results may differ
+  from Spark. The dispatcher only allows it when
+  `spark.comet.expr.allowIncompatible=true` (or the per-expression
+  equivalent). Use this for known result differences such as locale
+  sensitivity, timezone handling, ordering ambiguity, or floating-point
+  precision.
+- `Compatible(None)`: full Spark compatibility for this combination of
+  inputs and options.
+- `Compatible(Some(note))`: fully compatible but with a docs-only caveat.
+  Note that any `Some(...)` on `Compatible` triggers a runtime
+  `logWarning`, so reserve it for genuinely useful caveats.
+
+Decision rule: if the user would be surprised by a *wrong answer*, it is
+`Incompatible`. If the user would be surprised by a *runtime error or
+unsupported-type message*, it is `Unsupported`.
+
+**Runtime vs docs split.**
+
+- The `notes` field on `Incompatible(Some(...))` and `Unsupported(Some(...))`
+  flows into EXPLAIN output via the dispatcher (see
+  `QueryPlanSerde.exprToProtoInternal`, around the `case Incompatible` /
+  `case Unsupported` branches). This is what users see when they ask why
+  Comet fell back.
+- `getIncompatibleReasons()` and `getUnsupportedReasons()` are read only by
+  `GenerateDocs` when building
+  `docs/source/user-guide/compatibility.md`. They are static (no `expr`
+  argument) and should enumerate every distinct reason the expression
+  could ever return at runtime.
+- `convert` may also call `withInfo(expr, "reason")` and return `None` for
+  cases that cannot be detected from the expression alone (for example,
+  non-literal arguments, or child conversion failures). Those reasons
+  belong in `getUnsupportedReasons()` too.
+
+**Consistency rules.**
+
+1. If `getSupportLevel` can return `Incompatible(...)` for any input,
+   override `getIncompatibleReasons()` and include a reason for every
+   incompatible branch.
+2. If `getSupportLevel` can return `Unsupported(...)` for any input,
+   override `getUnsupportedReasons()` and include a reason for every
+   unsupported branch.
+3. If `convert` has its own `withInfo(...); None` fallbacks (e.g. literal
+   checks), enumerate those reasons in `getUnsupportedReasons()` too.
+4. Extract each reason into a `private val` and reference it from both
+   `getSupportLevel` and `get*Reasons()`. Do not duplicate the string
+   inline. Canonical example:
+   `spark/src/main/scala/org/apache/comet/serde/arrays.scala::CometArrayIntersect`.
+   It declares `private val incompatReason` and
+   `private val unsupportedCollationReason`, and each is referenced from
+   `getSupportLevel` and the matching reasons method.
+5. Prefer `Incompatible(Some(reason))` over `Incompatible(None)`. The
+   `None` form drops the reason from EXPLAIN output, leaving users with a
+   generic "not fully compatible" message and forcing them to read the
+   docs to find out why.
+6. Gate compatibility decisions in `getSupportLevel`, not inside `convert`.
+   Putting the check inside `convert` (e.g. reading a config flag and
+   calling `withInfo`) bypasses the dispatcher's `allowIncompatible`
+   handling and the EXPLAIN message becomes inconsistent with what the
+   doc generator produces.
+
+**Wording guidelines for reason strings.**
+
+Reasons appear verbatim in the Compatibility Guide (rendered as Markdown)
+and in EXPLAIN output, so they are user-facing.
+
+- Lead with the user-observable effect, then the cause if helpful.
+  ✅ "Result array element order may differ from Spark when the right array
+  is longer than the left."
+  ❌ "DataFusion probes the longer side."
+- Use sentence case and end with a period.
+- Use backticks around config keys, type names, and SQL identifiers.
+- Link to a tracking GitHub issue for known incompatibilities so users can
+  follow progress: `(https://github.com/apache/datafusion-comet/issues/NNNN)`.
+- Keep it concise. Single sentence is best.
+- Do not write "Incompatible reason: ..." or "Unsupported because ...".
+  The doc generator adds the framing.
+- Phrase Incompatible reasons as *what differs from Spark*, not *what is
+  missing*. Phrase Unsupported reasons as *what does not run*. If you find
+  yourself writing an "Incompatible" reason that says "Comet only supports
+  X" or "Y is not supported", the support level is probably wrong: it
+  should be `Unsupported`.
+
+**Common antipatterns to flag during the audit.**
+
+- A `private val reason` constant declared near the top of the object, but
+  `getSupportLevel` hardcodes a different string inline. The doc and the
+  EXPLAIN message will drift.
+  Real example: `CometHour` declares `incompatReason` then hardcodes a
+  near-duplicate string in `getSupportLevel`.
+- Reason text duplicated in two places without a shared constant.
+  Real examples: `CometMinute`, `CometSecond`.
+- `Incompatible(None)` paired with a populated `getIncompatibleReasons()`.
+  The reason reaches the docs but not EXPLAIN output.
+  Real example: `CometInitCap`.
+- `getIncompatibleReasons()` overridden but `getSupportLevel` never returns
+  `Incompatible(...)`. Either the reasons method is dead code, or
+  `getSupportLevel` is missing a branch.
+- A reason string phrased as "X is not supported" attached to an
+  `Incompatible` branch (or vice versa). Re-read it and decide which
+  support level it really belongs to.
+- `convert` bails out via `withInfo` for a case that is fully knowable
+  from the expression (e.g. an unsupported child data type). Move the
+  check into `getSupportLevel` so the dispatcher handles it uniformly.
 
 ### Shims
 
@@ -237,7 +343,52 @@ Also review the Comet implementation (Step 3) against the Spark behavior (Step 1
 - Are there behavioral differences that are NOT marked `Incompatible` but should be?
 - Are there behavioral differences between Spark versions that the Comet implementation does not account for (missing shim)?
 - Does the Rust implementation match the Spark behavior for all edge cases?
-- If `getSupportLevel` returns `Incompatible` or `Unsupported` with a reason, are `getIncompatibleReasons()` / `getUnsupportedReasons()` also overridden so the reason is picked up by `GenerateDocs` and appears in the Compatibility Guide?
+
+### Support-level consistency audit
+
+Walk through this checklist against the serde. Each failed item is a
+finding for Step 6.
+
+1. **Support level matches behavior.** For each branch of
+   `getSupportLevel`, decide whether the user-observable effect is a wrong
+   answer (`Incompatible`) or a fallback / error (`Unsupported`). Flag any
+   branch where the label does not match the behavior.
+2. **Reasons cover every branch.** Every distinct reason that
+   `getSupportLevel` can return as `Incompatible(Some(r))` must appear in
+   `getIncompatibleReasons()`. Same for `Unsupported(Some(r))` and
+   `getUnsupportedReasons()`. Missing reasons silently drop from the
+   Compatibility Guide.
+3. **Reasons are not dead code.** If `getIncompatibleReasons()` is
+   overridden but `getSupportLevel` never returns `Incompatible(...)`,
+   either the reason is stale or `getSupportLevel` is missing a branch.
+   Same for `getUnsupportedReasons()`.
+4. **Reason strings are shared via `private val`.** If the same reason
+   appears as a string literal in two places, flag it: changes to one
+   will not propagate to the other.
+5. **Inline reason matches the constant.** If a `private val reason` is
+   declared but `getSupportLevel` uses a different string literal, flag
+   it as a drift bug.
+6. **`Incompatible(None)` has no docs-only reason.** If
+   `getSupportLevel` returns `Incompatible(None)` but
+   `getIncompatibleReasons()` is non-empty, flag it: the EXPLAIN message
+   will be generic while the docs show a specific reason. Switch to
+   `Incompatible(Some(reason))`.
+7. **`convert` fallbacks are documented.** If `convert` calls
+   `withInfo(expr, "...")` and returns `None` for cases not covered by
+   `getSupportLevel` (e.g. non-literal arguments, unsupported expression
+   shapes), confirm the reason is also listed in
+   `getUnsupportedReasons()`.
+8. **Compatibility decisions live in `getSupportLevel`.** If `convert`
+   reads a config flag and bails out, prefer moving the check into
+   `getSupportLevel` so the dispatcher handles `allowIncompatible`
+   uniformly. `CometCaseConversionBase` is an example of the in-`convert`
+   pattern that this skill recommends against.
+9. **Reason wording.** Each reason should describe the user-observable
+   effect, use sentence case with a period, backtick config keys and
+   types, and link a tracking issue when one exists. Flag reasons that
+   read like internal implementation notes ("DataFusion probes the longer
+   side") or that mismatch their support level (an "Incompatible" reason
+   that says "X is not supported").
 
 ---
 
@@ -259,9 +410,13 @@ Minor gaps, cosmetic improvements, or nice-to-have tests.
 
 ---
 
-## Step 7: Offer to Implement Missing Tests
+## Step 7: Offer to Implement Missing Tests and Fix Consistency Findings
+
+After presenting the gap analysis, ask the user separately about each
+category of finding. Keep the two asks distinct so the user can opt in to
+one without the other.
 
-After presenting the gap analysis, ask the user:
+**Test coverage:**
 
 > I found the following missing test cases. Would you like me to implement them?
 >
@@ -270,8 +425,23 @@ After presenting the gap analysis, ask the user:
 > I can add them as Comet SQL Tests in `spark/src/test/resources/sql-tests/expressions/<category>/$ARGUMENTS.sql`
 > (or as Comet Scala Tests in `CometExpressionSuite` for cases that require programmatic setup).
 
-If the user says yes, implement the missing tests following the Comet SQL Tests format described in
-`docs/source/contributor-guide/sql-file-tests.md`. Prefer Comet SQL Tests over Comet Scala Tests.
+**Support-level consistency:**
+
+> I also found the following alignment issues between `getSupportLevel`,
+> `getIncompatibleReasons`, `getUnsupportedReasons`, and `convert`. Would
+> you like me to fix them?
+>
+> - [list each finding from the Step 5 consistency audit, with the file
+>   and serde object name]
+>
+> Each fix is a small, mechanical edit (extract a `private val` for the
+> reason, switch `Incompatible(None)` to `Incompatible(Some(reason))`,
+> add a missing `get*Reasons()` override, or move a check from `convert`
+> into `getSupportLevel`).
+
+If the user says yes to tests, implement them following the Comet SQL Tests
+format described in `docs/source/contributor-guide/sql-file-tests.md`.
+Prefer Comet SQL Tests over Comet Scala Tests.
 
 ### Comet SQL Tests template
 

From 402d314c642a1cffa4b96ef08904bc8a2e6cf4c4 Mon Sep 17 00:00:00 2001
From: Andy Grove <agrove@apache.org>
Date: Wed, 27 May 2026 07:47:14 -0600
Subject: [PATCH 2/4] style: apply prettier

---
 .claude/skills/audit-comet-expression/SKILL.md | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/.claude/skills/audit-comet-expression/SKILL.md b/.claude/skills/audit-comet-expression/SKILL.md
index bfabfeaa4e..7cac6c252d 100644
--- a/.claude/skills/audit-comet-expression/SKILL.md
+++ b/.claude/skills/audit-comet-expression/SKILL.md
@@ -147,9 +147,9 @@ most common bugs in Comet serdes are misalignments between them.
   Note that any `Some(...)` on `Compatible` triggers a runtime
   `logWarning`, so reserve it for genuinely useful caveats.
 
-Decision rule: if the user would be surprised by a *wrong answer*, it is
-`Incompatible`. If the user would be surprised by a *runtime error or
-unsupported-type message*, it is `Unsupported`.
+Decision rule: if the user would be surprised by a _wrong answer_, it is
+`Incompatible`. If the user would be surprised by a _runtime error or
+unsupported-type message_, it is `Unsupported`.
 
 **Runtime vs docs split.**
 
@@ -211,8 +211,8 @@ and in EXPLAIN output, so they are user-facing.
 - Keep it concise. Single sentence is best.
 - Do not write "Incompatible reason: ..." or "Unsupported because ...".
   The doc generator adds the framing.
-- Phrase Incompatible reasons as *what differs from Spark*, not *what is
-  missing*. Phrase Unsupported reasons as *what does not run*. If you find
+- Phrase Incompatible reasons as _what differs from Spark_, not _what is
+  missing_. Phrase Unsupported reasons as _what does not run_. If you find
   yourself writing an "Incompatible" reason that says "Comet only supports
   X" or "Y is not supported", the support level is probably wrong: it
   should be `Unsupported`.
@@ -431,8 +431,7 @@ one without the other.
 > `getIncompatibleReasons`, `getUnsupportedReasons`, and `convert`. Would
 > you like me to fix them?
 >
-> - [list each finding from the Step 5 consistency audit, with the file
->   and serde object name]
+> - [list each finding from the Step 5 consistency audit, with the file and serde object name]
 >
 > Each fix is a small, mechanical edit (extract a `private val` for the
 > reason, switch `Incompatible(None)` to `Incompatible(Some(reason))`,

From d911a436a644d422053c4f976e0d87af08e6db63 Mon Sep 17 00:00:00 2001
From: Andy Grove <agrove@apache.org>
Date: Wed, 27 May 2026 08:03:09 -0600
Subject: [PATCH 3/4] docs: audit date/time expressions across Spark 3.4.3,
 3.5.8, 4.0.1

Add per-version audit sub-bullets to every implemented date/time
expression in spark_expressions_support.md using the
audit-comet-expression skill. Covers 38 SQL function names across the
33 backing Comet serde objects (some serdes back multiple SQL names,
e.g. `day`/`dayofmonth`, `date_add`/`dateadd`, `date_diff`/`datediff`).

For each function, the sub-bullets record:
- Whether the Spark class is identical across 3.4.3, 3.5.8, 4.0.1
- Spark 4.0 changes (universally the `NullIntolerant` trait /
  `nullIntolerant: Boolean` refactor, plus `StringTypeWithCollation`
  widening on string inputs and some error-helper renames)
- Known divergences between Comet and Spark, with tracking-issue links

The audit was driven by 8 parallel agents, each handling a related
group of expressions (codegen-dispatched, date field extractors,
Hour/Minute/Second, scalar function wrappers, timezone/unix,
truncation, format, Iceberg transforms).

Out of scope: `current_timezone`, `date_part`, `datepart`, `extract`,
`localtimestamp` route through Spark optimizer rewrites or evaluate to
constants and do not have dedicated Comet serdes; `days` and `hours`
are V2 partition transforms with no SQL function name and so do not
appear in this section.

No code changes in this PR; consistency findings flagged during the
audit will be addressed in follow-up PRs.
---
 .../spark_expressions_support.md              | 128 ++++++++++++++++++
 1 file changed, 128 insertions(+)

diff --git a/docs/source/contributor-guide/spark_expressions_support.md b/docs/source/contributor-guide/spark_expressions_support.md
index ccb816c668..c6337c0e41 100644
--- a/docs/source/contributor-guide/spark_expressions_support.md
+++ b/docs/source/contributor-guide/spark_expressions_support.md
@@ -215,86 +215,214 @@
 ### datetime_funcs
 
 - [x] add_months
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `inputTypes = Seq(DateType, IntegerType)`; returns `DateType`; codegen delegates to `DateTimeUtils.dateAddMonths`.
+  - Spark 4.0.1 (audited 2026-05-27): `NullIntolerant` trait replaced by `override def nullIntolerant: Boolean = true` on `AddMonthsBase`; behaviour and codegen unchanged.
 - [x] convert_timezone
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. Ternary `(sourceTz, targetTz, sourceTs)`; `inputTypes = Seq(StringType, StringType, TimestampNTZType)`; delegates to `DateTimeUtils.convertTimestampNtzToAnotherTz`.
+  - Spark 4.0.1 (audited 2026-05-27): timezone `inputTypes` widened to `StringTypeWithCollation(supportsTrimCollation = true)`; behaviour unchanged for ASCII timezone strings.
+  - Known divergence: Comet composes `to_utc_timestamp` then `from_utc_timestamp` and its native timezone parser only accepts IANA zone IDs and `+HH:MM` offsets, so legacy forms like `GMT+1`, `UTC+1`, or three-letter abbreviations throw a native parse error at execution (https://github.com/apache/datafusion-comet/issues/2013).
 - [ ] curdate
 - [ ] current_date
 - [ ] current_time
 - [ ] current_timestamp
 - [x] current_timezone
 - [x] date_add
+  - Spark 3.4.3 (audited 2026-05-27): baseline. `(DateType, IntegerType|ShortType|ByteType) -> DateType`; `nullSafeEval` returns `startDays + d.intValue()` with Java int wrap-around; no ANSI branch.
+  - Spark 3.5.8 (audited 2026-05-27): identical to 3.4.3.
+  - Spark 4.0.1 (audited 2026-05-27): semantics unchanged; `NullIntolerant` trait replaced by `override def nullIntolerant: Boolean = true`.
 - [x] date_diff
+  - Spark 3.4.3 (audited 2026-05-27): baseline. `(DateType, DateType) -> IntegerType`; `nullSafeEval` is `endDays - startDays` with Java int wrap-around.
+  - Spark 3.5.8 (audited 2026-05-27): identical to 3.4.3.
+  - Spark 4.0.1 (audited 2026-05-27): semantics unchanged; `NullIntolerant` trait replaced by `override def nullIntolerant: Boolean = true`.
+  - Known divergence: the native impl uses non-wrapping `i32 -`, which would panic in debug builds on extreme inputs (Spark wraps); practically unreachable for date inputs.
 - [x] date_format
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `(TimestampType, StringType) -> StringType`; format string is parsed via `TimestampFormatter` (`DateTimeFormatter` under `CORRECTED` policy, `SimpleDateFormat` under `LEGACY` policy).
+  - Spark 4.0.1 (audited 2026-05-27): trait set updated to use `DefaultStringProducingExpression`; `nullIntolerant` becomes a field; format `inputTypes` widened to `StringTypeWithCollation(supportsTrimCollation = true)`. Behaviour unchanged for ASCII format strings.
+  - Known divergence: only a curated allow-list of `SimpleDateFormat` patterns runs natively (via DataFusion `to_char`). Non-UTC session timezones with a whitelisted format require `spark.comet.expr.dateFormat.allowIncompatible=true`. Non-literal formats, non-whitelisted formats, and the default disabled-codegen path route through Spark's `DateFormatClass.doGenCode` only when `spark.comet.exec.scalaUDF.codegen.enabled=true`; otherwise the operator falls back to Spark. `spark.sql.legacy.timeParserPolicy=LEGACY` is honoured only on the codegen-dispatch / Spark-fallback paths; the native allow-list assumes corrected semantics.
 - [x] date_from_unix_date
+  - Spark 3.4.3 (audited 2026-05-27): baseline. `IntegerType -> DateType`; `nullSafeEval` is the identity on days-since-epoch.
+  - Spark 3.5.8 (audited 2026-05-27): identical to 3.4.3.
+  - Spark 4.0.1 (audited 2026-05-27): semantics unchanged; `NullIntolerant` trait replaced by `override def nullIntolerant: Boolean = true`.
 - [x] date_part
 - [x] date_sub
+  - Spark 3.4.3 (audited 2026-05-27): baseline. Mirror of `DateAdd` (`startDays - d.intValue()`); same input types and wrap-around behaviour; no ANSI branch.
+  - Spark 3.5.8 (audited 2026-05-27): identical to 3.4.3.
+  - Spark 4.0.1 (audited 2026-05-27): semantics unchanged; `NullIntolerant` trait replaced by `override def nullIntolerant: Boolean = true`.
 - [x] date_trunc
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `inputTypes = Seq(StringType, TimestampType)`; format parsed by `parseTruncLevel` (case-insensitive) and supports `YEAR`/`YYYY`/`YY`, `QUARTER`, `MONTH`/`MM`/`MON`, `WEEK`, `DAY`/`DD`, `HOUR`, `MINUTE`, `SECOND`, `MILLISECOND`, `MICROSECOND`. Unknown levels return NULL. Truncation is `TimeZoneAware` and uses `zoneId` for day-and-coarser units.
+  - Spark 4.0.1 (audited 2026-05-27): format `inputTypes` widened to `StringTypeWithCollation(supportsTrimCollation = true)`; truncation semantics unchanged for ASCII format strings.
+  - Known divergence: Comet returns incorrect results in non-UTC session timezones for day-and-coarser units (https://github.com/apache/datafusion-comet/issues/2649); marked `Incompatible` when the resolved zone is not `UTC` / `Etc/UTC`. Non-literal and unsupported format strings raise a native execution error instead of returning NULL.
 - [x] dateadd
+  - Spark 3.4.3 (audited 2026-05-27): SQL alias for `date_add`; see that entry.
+  - Spark 3.5.8 (audited 2026-05-27): SQL alias for `date_add`; see that entry.
+  - Spark 4.0.1 (audited 2026-05-27): SQL alias for `date_add`; see that entry.
 - [x] datediff
+  - Spark 3.4.3 (audited 2026-05-27): SQL alias for `date_diff`; see that entry.
+  - Spark 3.5.8 (audited 2026-05-27): SQL alias for `date_diff`; see that entry.
+  - Spark 4.0.1 (audited 2026-05-27): SQL alias for `date_diff`; see that entry.
 - [x] datepart
 - [x] day
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. SQL alias for `DayOfMonth`; delegates to `DateTimeUtils.getDayOfMonth` via `LocalDate.getDayOfMonth` (1..31).
+  - Spark 4.0.1 (audited 2026-05-27): identical semantics; `GetDateField` drops the `NullIntolerant` mixin in favour of `nullIntolerant: Boolean = true`.
 - [ ] dayname
 - [x] dayofmonth
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `DayOfMonth extends GetDateField`; delegates to `DateTimeUtils.getDayOfMonth` (1..31).
+  - Spark 4.0.1 (audited 2026-05-27): identical semantics; `GetDateField` drops the `NullIntolerant` mixin in favour of `nullIntolerant: Boolean = true`.
 - [x] dayofweek
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `DayOfWeek extends GetDateField`; returns 1..7 with Sunday=1 via `LocalDate.getDayOfWeek.plus(1).getValue`.
+  - Spark 4.0.1 (audited 2026-05-27): identical semantics; `GetDateField` drops the `NullIntolerant` mixin in favour of `nullIntolerant: Boolean = true`.
 - [x] dayofyear
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `DayOfYear extends GetDateField`; returns 1..366 via `LocalDate.getDayOfYear`.
+  - Spark 4.0.1 (audited 2026-05-27): identical semantics; `GetDateField` drops the `NullIntolerant` mixin in favour of `nullIntolerant: Boolean = true`.
 - [x] extract
 - [x] from_unixtime
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `BinaryExpression` with `inputTypes = Seq(LongType, StringType)`; returns `StringType`; uses session `zoneId` to format the resulting timestamp.
+  - Spark 4.0.1 (audited 2026-05-27): now `DefaultStringProducingExpression`; format `inputTypes` widened to `StringTypeWithCollation(supportsTrimCollation = true)`; `nullIntolerant` set via override instead of the `NullIntolerant` trait.
+  - Known divergence: Comet only honours the default format pattern `yyyy-MM-dd HH:mm:ss`; any other format falls back to Spark. Implemented by composing DataFusion's `from_unixtime` and `to_char`, so DataFusion's valid timestamp range differs from Spark (https://github.com/apache/datafusion/issues/16594) and Spark datetime patterns are not honoured even when supplied (https://github.com/apache/datafusion/issues/16577).
 - [x] from_utc_timestamp
   - Spark 3.4.3 (audited 2026-05-12): identical to 3.5.8.
   - Spark 3.5.8 (audited 2026-05-12): baseline.
   - Spark 4.0.1 (audited 2026-05-12): `inputTypes` widened to `StringTypeWithCollation`; behaviour unchanged for ASCII timezone strings.
   - Known divergence: Comet's native timezone parser does not accept Spark's legacy zone forms (`GMT+1`, `UTC+1`, three-letter abbreviations like `PST`). Such timezones throw a native parse error at execution.
 - [x] hour
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `case class Hour` extends `GetTimeField`.
+  - Spark 4.0.1 (audited 2026-05-27): `case class Hour` is unchanged; parent `GetTimeField` trait refactored to override `nullIntolerant: Boolean = true` instead of mixing in `NullIntolerant` (no behavioural change).
+  - Known divergence: for `TimestampNTZType` inputs Comet's native path applies session-timezone conversion (Spark treats `TIMESTAMP_NTZ` as wall-clock and ignores session timezone), so the returned hour can differ. Marked `Incompatible` and gated by `spark.comet.expr.allowIncompatible` (https://github.com/apache/datafusion-comet/issues/3180).
 - [x] last_day
+  - Spark 3.4.3 (audited 2026-05-27): baseline. `DateType -> DateType`; computes `DateTimeUtils.getLastDayOfMonth`; no ANSI branch.
+  - Spark 3.5.8 (audited 2026-05-27): identical to 3.4.3.
+  - Spark 4.0.1 (audited 2026-05-27): semantics unchanged; `NullIntolerant` trait replaced by `override def nullIntolerant: Boolean = true`.
 - [x] localtimestamp
 - [x] make_date
+  - Spark 3.4.3 (audited 2026-05-27): baseline. `(IntegerType, IntegerType, IntegerType) -> DateType`; under `spark.sql.ansi.enabled=true` invalid `(year, month, day)` throws `ansiDateTimeError`, else returns NULL. Documented valid year range is 1 to 9999.
+  - Spark 3.5.8 (audited 2026-05-27): identical to 3.4.3.
+  - Spark 4.0.1 (audited 2026-05-27): error helper renamed to `ansiDateTimeArgumentOutOfRange`; behaviour otherwise unchanged.
+  - Known divergence: `SparkMakeDate` in `native/spark-expr/src/datetime_funcs/make_date.rs` always returns NULL on invalid input and never raises, so Comet diverges from Spark when `spark.sql.ansi.enabled=true`. It also accepts year 0 and negative years (chrono's proleptic calendar) which Spark rejects.
 - [ ] make_dt_interval
 - [ ] make_interval
 - [ ] make_time
 - [x] make_timestamp
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. Septenary `(year, month, day, hour, min, sec[, timezone])` with `sec: DecimalType(16,6)`; honours `spark.sql.ansi.enabled` (throws on invalid input, else NULL); timezone input is `StringType`; result type follows `spark.sql.timestampType`.
+  - Spark 4.0.1 (audited 2026-05-27): timezone input widened to `StringTypeWithCollation(supportsTrimCollation = true)`; ANSI error helpers renamed (`ansiDateTimeArgumentOutOfRange`, `invalidFractionOfSecondError(value)`); `NullIntolerant` trait replaced by `nullIntolerant: Boolean = true`; new sibling `TryMakeTimestamp` added but routed through a separate expression.
 - [ ] make_timestamp_ltz
 - [ ] make_timestamp_ntz
 - [ ] make_ym_interval
 - [x] minute
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `case class Minute` extends `GetTimeField`.
+  - Spark 4.0.1 (audited 2026-05-27): `case class Minute` is unchanged; parent `GetTimeField` trait refactored to override `nullIntolerant: Boolean = true` instead of mixing in `NullIntolerant` (no behavioural change).
+  - Known divergence: for `TimestampNTZType` inputs Comet's native path applies session-timezone conversion (Spark treats `TIMESTAMP_NTZ` as wall-clock and ignores session timezone), so the returned minute can differ. Marked `Incompatible` and gated by `spark.comet.expr.allowIncompatible` (https://github.com/apache/datafusion-comet/issues/3180).
 - [x] month
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `Month extends GetDateField`; delegates to `DateTimeUtils.getMonth` via `LocalDate.getMonthValue` (1..12).
+  - Spark 4.0.1 (audited 2026-05-27): identical semantics; `GetDateField` drops the `NullIntolerant` mixin in favour of `nullIntolerant: Boolean = true`.
 - [ ] monthname
 - [x] months_between
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. Ternary `(date1: Timestamp, date2: Timestamp, roundOff: Boolean)` returning `DoubleType`; `TimeZoneAwareExpression`; codegen delegates to `DateTimeUtils.monthsBetween`.
+  - Spark 4.0.1 (audited 2026-05-27): `NullIntolerant` trait dropped in favour of `nullIntolerant: Boolean = true` override; signature and runtime behaviour unchanged.
 - [x] next_day
+  - Spark 3.4.3 (audited 2026-05-27): baseline. `(DateType, StringType) -> DateType`; under `spark.sql.ansi.enabled=true` an unrecognised `dayOfWeek` throws `ansiIllegalArgumentError`, else returns NULL. Allowed tokens come from `DateTimeUtils.getDayOfWeekFromString` (`SU/SUN/SUNDAY`, `MO/MON/MONDAY`, ...), case-insensitive via `Locale.ROOT`, no trimming.
+  - Spark 3.5.8 (audited 2026-05-27): identical to 3.4.3.
+  - Spark 4.0.1 (audited 2026-05-27): error type changed to `SparkIllegalArgumentException`; `inputTypes` now uses `StringTypeWithCollation(supportsTrimCollation = true)`.
+  - Known divergence: `datafusion-spark::SparkNextDay` returns NULL for malformed `dayOfWeek` regardless of `spark.sql.ansi.enabled`, so ANSI mode does not throw. It also `trim()`s the day-of-week argument before matching, so `' MO '` succeeds natively while Spark would treat it as invalid.
 - [ ] now
 - [x] quarter
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `Quarter extends GetDateField`; returns 1..4 via `IsoFields.QUARTER_OF_YEAR`.
+  - Spark 4.0.1 (audited 2026-05-27): identical semantics; `GetDateField` drops the `NullIntolerant` mixin in favour of `nullIntolerant: Boolean = true`.
 - [x] second
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `case class Second` extends `GetTimeField`.
+  - Spark 4.0.1 (audited 2026-05-27): `case class Second` is unchanged; parent `GetTimeField` trait refactored to override `nullIntolerant: Boolean = true` instead of mixing in `NullIntolerant` (no behavioural change).
+  - Known divergence: for `TimestampNTZType` inputs Comet's native path applies session-timezone conversion (Spark treats `TIMESTAMP_NTZ` as wall-clock and ignores session timezone), so the returned second can differ. Marked `Incompatible` and gated by `spark.comet.expr.allowIncompatible` (https://github.com/apache/datafusion-comet/issues/3180).
 - [ ] session_window
 - [ ] time_diff
 - [ ] time_trunc
 - [x] timestamp_micros
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `MicrosToTimestamp` extends `IntegralToTimestampBase` with `upScaleFactor = 1`; accepts `IntegralType`, returns `TimestampType`; codegen is identity.
+  - Spark 4.0.1 (audited 2026-05-27): `IntegralToTimestampBase` drops the `NullIntolerant` trait in favour of `nullIntolerant: Boolean = true`; behaviour unchanged.
 - [x] timestamp_millis
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `MillisToTimestamp` extends `IntegralToTimestampBase` with `upScaleFactor = MICROS_PER_MILLIS (1000)`; multiply overflow throws via `Math.multiplyExact`.
+  - Spark 4.0.1 (audited 2026-05-27): same as 3.5.8 modulo the `NullIntolerant` trait/method refactor in `IntegralToTimestampBase`.
 - [x] timestamp_seconds
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `inputTypes = Seq(NumericType)` accepting integral, decimal, float, and double; integral values use `Math.multiplyExact` (overflow throws); float and double return NULL on NaN or Infinity.
+  - Spark 4.0.1 (audited 2026-05-27): `nullIntolerant` set via override instead of the `NullIntolerant` trait; otherwise identical to 3.5.8.
+  - Known divergence: Comet's Rust impl supports only Int32, Int64, Float32, and Float64. `DecimalType`, `ByteType`, and `ShortType` fall back to Spark. Int64 overflow returns a `ComputeError` matching Spark's `ArithmeticException`. NaN and Infinity map to NULL on the float and double paths.
 - [ ] to_date
 - [ ] to_time
 - [ ] to_timestamp
 - [ ] to_timestamp_ltz
 - [ ] to_timestamp_ntz
 - [x] to_unix_timestamp
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `inputTypes = Seq(TypeCollection(StringType, DateType, TimestampType, TimestampNTZType), StringType)`; returns `LongType` seconds; honours `spark.sql.ansi.enabled` (throws on parse error, else NULL); `TimeZoneAwareExpression`.
+  - Spark 4.0.1 (audited 2026-05-27): both the value and the format argument become `StringTypeWithCollation(supportsTrimCollation = true)`; a new `suggestedFuncOnFail = "try_to_timestamp"` field is added on `ToTimestamp` (advisory).
+  - Known divergence: routed through the JVM codegen dispatcher rather than a native kernel, so behaviour is bit-identical to Spark only when `spark.comet.exec.scalaUDF.codegen.enabled=true`; when the flag is off the operator falls back to Spark.
 - [x] to_utc_timestamp
   - Spark 3.4.3 (audited 2026-05-12): identical to 3.5.8.
   - Spark 3.5.8 (audited 2026-05-12): baseline.
   - Spark 4.0.1 (audited 2026-05-12): `inputTypes` widened to `StringTypeWithCollation`; behaviour unchanged for ASCII timezone strings.
   - Known divergence: Comet's native timezone parser does not accept Spark's legacy zone forms (`GMT+1`, `UTC+1`, three-letter abbreviations like `PST`). Such timezones throw a native parse error at execution.
 - [x] trunc
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `inputTypes = Seq(DateType, StringType)`; `parseTruncLevel` is case-insensitive and accepts `YEAR`/`YYYY`/`YY`, `QUARTER`, `MONTH`/`MM`/`MON`, `WEEK`. Unknown or sub-week levels (`DAY`, `HOUR`, ...) return NULL because `MIN_LEVEL_OF_DATE_TRUNC` is `TRUNC_TO_WEEK`.
+  - Spark 4.0.1 (audited 2026-05-27): format `inputTypes` widened to `StringTypeWithCollation(supportsTrimCollation = true)`; truncation semantics unchanged for ASCII format strings.
+  - Known divergence: Comet's native kernel raises an execution error for unknown format strings instead of returning NULL, so non-literal formats are flagged `Incompatible`. Sub-week formats such as `DAY`/`DD` are rejected with `Unsupported` (Spark would return NULL) and fall back to Spark.
 - [ ] try_make_interval
 - [ ] try_make_timestamp
 - [ ] try_to_date
 - [ ] try_to_time
 - [ ] try_to_timestamp
 - [x] unix_date
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `UnixDate(child)`: `inputTypes = Seq(DateType)`, `dataType = IntegerType`; `nullSafeEval` returns the underlying days-since-epoch int unchanged.
+  - Spark 4.0.1 (audited 2026-05-27): semantics unchanged; `NullIntolerant` trait is replaced by `nullIntolerant: Boolean = true`.
 - [x] unix_micros
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `UnixMicros` extends `TimestampToLongBase` with `scaleFactor = 1`; codegen reduces to identity on the underlying micros.
+  - Spark 4.0.1 (audited 2026-05-27): same as 3.5.8 modulo the `NullIntolerant` trait/method refactor.
 - [x] unix_millis
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `UnixMillis` extends `TimestampToLongBase` with `scaleFactor = MICROS_PER_MILLIS`; floor-divides timestamp micros by 1000.
+  - Spark 4.0.1 (audited 2026-05-27): same as 3.5.8 modulo the `NullIntolerant` trait/method refactor.
 - [x] unix_seconds
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `UnixSeconds` extends `TimestampToLongBase` with `scaleFactor = MICROS_PER_SECOND`; accepts `TimestampType` only; returns `LongType`; floor-divides micros by the scale factor.
+  - Spark 4.0.1 (audited 2026-05-27): `TimestampToLongBase` swaps the `NullIntolerant` trait for `nullIntolerant: Boolean = true`; numerics unchanged.
 - [x] unix_timestamp
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. Inherits from `UnixTime` / `ToTimestamp`; `inputTypes = Seq(TypeCollection(StringType, DateType, TimestampType, TimestampNTZType), StringType)`; result is `LongType`; honours `failOnError` for ANSI parse errors on the string path.
+  - Spark 4.0.1 (audited 2026-05-27): `ToTimestamp.inputTypes` widens the string slot to `StringTypeWithCollation(supportsTrimCollation = true)`; behaviour unchanged for non-collated strings.
+  - Known divergence: Comet's native path only accepts `TimestampType`, `DateType`, and `TimestampNTZType` (string inputs fall back to Spark). For `TimestampType` and `DateType` the session timezone is applied via `array_with_timezone`; for `TimestampNTZType` the microsecond value is divided directly without timezone adjustment.
 - [x] weekday
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `WeekDay extends GetDateField`; returns 0..6 with Monday=0 via `LocalDate.getDayOfWeek.ordinal()`.
+  - Spark 4.0.1 (audited 2026-05-27): identical semantics; `GetDateField` drops the `NullIntolerant` mixin in favour of `nullIntolerant: Boolean = true`.
 - [x] weekofyear
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `WeekOfYear extends GetDateField`; returns the ISO-8601 week-of-week-based-year via `IsoFields.WEEK_OF_WEEK_BASED_YEAR` (Monday start, week 1 has more than 3 days). Comet maps this to DataFusion's `datepart('week', ...)` which uses Arrow's `iso_week().week()`, matching Spark.
+  - Spark 4.0.1 (audited 2026-05-27): identical semantics; `GetDateField` drops the `NullIntolerant` mixin in favour of `nullIntolerant: Boolean = true`.
 - [ ] window
 - [ ] window_time
 - [x] year
+  - Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
+  - Spark 3.5.8 (audited 2026-05-27): baseline. `Year extends GetDateField`; delegates to `DateTimeUtils.getYear` via `LocalDate.getYear`.
+  - Spark 4.0.1 (audited 2026-05-27): identical semantics; `GetDateField` drops the `NullIntolerant` mixin in favour of `nullIntolerant: Boolean = true`.
 
 ### generator_funcs
 

From 481e97e084899ba8704eebcef2131359ac9a3823 Mon Sep 17 00:00:00 2001
From: Andy Grove <agrove@apache.org>
Date: Wed, 27 May 2026 08:14:28 -0600
Subject: [PATCH 4/4] test: capture datetime audit findings as ignored SQL
 tests and update audit skill

Skill update (audit-comet-expression):

- Step 6 reorganises priorities: correctness divergences, missing
  coverage, and consistency issues are now three distinct buckets.
- Step 7 now requires correctness findings to land as captured tests
  in the same PR as the audit. Walks through the search-or-file issue
  workflow and the trivial-fix-vs-ignore decision, with a `query
  ignore(<url>)` example.

Captured tests (linked to follow-up issues filed against this PR):

- next_day.sql gains a divergence test for whitespace trimming
  (Comet trims `' MO '`; Spark does not). ignore(#4450).
- next_day_ansi.sql is new and asserts that `next_day` throws under
  `spark.sql.ansi.enabled=true` for malformed dayOfWeek. Comet
  currently returns NULL. ignore(#4449).
- make_date_ansi.sql is new and asserts that `make_date` throws under
  `spark.sql.ansi.enabled=true` for invalid `(year, month, day)`.
  Comet currently returns NULL. ignore(#4451).

The fourth audit finding (`make_date` year 0 / negative years) was
verified against Spark's own implementation and turned out to be a
non-divergence; the issue was closed and no test added.

None of the three remaining bugs are trivial to fix here: both
`SparkNextDay` and `SparkMakeDate` live upstream in the
`datafusion-spark` crate, so the fixes need to flow through that
project. The captured tests will switch from `ignore(...)` to their
intended assertion mode when the upstream changes land.
---
 .../skills/audit-comet-expression/SKILL.md    | 77 +++++++++++++++----
 .../expressions/datetime/make_date_ansi.sql   | 36 +++++++++
 .../expressions/datetime/next_day.sql         |  5 ++
 .../expressions/datetime/next_day_ansi.sql    | 30 ++++++++
 4 files changed, 133 insertions(+), 15 deletions(-)
 create mode 100644 spark/src/test/resources/sql-tests/expressions/datetime/make_date_ansi.sql
 create mode 100644 spark/src/test/resources/sql-tests/expressions/datetime/next_day_ansi.sql

diff --git a/.claude/skills/audit-comet-expression/SKILL.md b/.claude/skills/audit-comet-expression/SKILL.md
index 7cac6c252d..a1b48f00cf 100644
--- a/.claude/skills/audit-comet-expression/SKILL.md
+++ b/.claude/skills/audit-comet-expression/SKILL.md
@@ -396,29 +396,75 @@ finding for Step 6.
 
 Summarize findings as a prioritized list.
 
-### High priority
+### High priority: correctness divergences
 
-Issues where Comet may silently produce wrong results compared to Spark.
+Cases where Comet produces a different observable result from Spark
+(wrong value, missing exception, accepted-instead-of-rejected input,
+etc.). Each one becomes a captured test in Step 7.
 
-### Medium priority
+### Medium priority: missing test coverage
 
-Missing test coverage for edge cases that could expose bugs.
+Edge cases Spark exercises but Comet does not test, where the behaviour
+appears to match but is not verified.
 
-### Low priority
+### Low priority: cosmetic and consistency issues
 
-Minor gaps, cosmetic improvements, or nice-to-have tests.
+Reason-string drift, missing `get*Reasons()` overrides, dead branches,
+etc. These come from the Step 5 consistency audit.
 
 ---
 
-## Step 7: Offer to Implement Missing Tests and Fix Consistency Findings
+## Step 7: Capture Correctness Findings as Tests; Offer Consistency Fixes
 
-After presenting the gap analysis, ask the user separately about each
-category of finding. Keep the two asks distinct so the user can opt in to
-one without the other.
+Correctness divergences (Step 6 high-priority items) must not be left as
+prose. Every divergence the audit identified becomes a regression test
+in the same PR as the audit, so future readers can run it.
 
-**Test coverage:**
+For each correctness finding, do the following in this order:
 
-> I found the following missing test cases. Would you like me to implement them?
+1. **Search for an existing tracking issue.**
+
+   ```bash
+   gh issue list --search "<expression> <symptom> in:title,body" --state all --limit 5
+   ```
+
+   Match on both the expression name and a distinguishing keyword
+   (ANSI, timezone, NTZ, overflow, etc.).
+
+2. **If no issue exists, file one.** Use the `correctness` label plus
+   the relevant area label (e.g. `temporal expressions`). Keep the
+   title in the form "[Bug] <expression> <one-line symptom>". Include
+   the Spark version, a minimal repro, and the divergent result.
+
+3. **If the fix is trivial and you are confident, fix it inline.**
+   Then add the test in the default `query` mode so it locks in the
+   fix. "Trivial" means a few lines in one file with no native code
+   changes, no support-level reshuffling, and no semantics decisions
+   that the user should weigh in on.
+
+4. **Otherwise, add the test in `query ignore(<issue-url>)` mode** with
+   a one-line SQL comment above the query explaining the symptom. The
+   test file lives next to the existing tests for the expression. Do
+   not skip writing the test because the bug is unfixed: the captured
+   reproduction is the whole point of this step.
+
+   ```sql
+   -- Comet returns NULL where Spark throws under spark.sql.ansi.enabled=true
+   query ignore(https://github.com/apache/datafusion-comet/issues/NNNN)
+   SELECT next_day(date('2024-01-01'), 'NOT_A_DAY')
+   ```
+
+   When the underlying bug is fixed, the `ignore(...)` is removed and
+   the test starts running. This is also the contract documented in
+   `docs/source/contributor-guide/sql-file-tests.md`.
+
+After capturing correctness findings, ask the user about the remaining
+categories. Keep these asks distinct so the user can opt in to each.
+
+**Missing test coverage (non-bug):**
+
+> Spark exercises the following cases that have no Comet test. Would
+> you like me to add them?
 >
 > - [list each missing test case]
 >
@@ -438,9 +484,10 @@ one without the other.
 > add a missing `get*Reasons()` override, or move a check from `convert`
 > into `getSupportLevel`).
 
-If the user says yes to tests, implement them following the Comet SQL Tests
-format described in `docs/source/contributor-guide/sql-file-tests.md`.
-Prefer Comet SQL Tests over Comet Scala Tests.
+If the user says yes to test coverage, implement them following the Comet
+SQL Tests format described in
+`docs/source/contributor-guide/sql-file-tests.md`. Prefer Comet SQL Tests
+over Comet Scala Tests.
 
 ### Comet SQL Tests template
 
diff --git a/spark/src/test/resources/sql-tests/expressions/datetime/make_date_ansi.sql b/spark/src/test/resources/sql-tests/expressions/datetime/make_date_ansi.sql
new file mode 100644
index 0000000000..04a28e3e1c
--- /dev/null
+++ b/spark/src/test/resources/sql-tests/expressions/datetime/make_date_ansi.sql
@@ -0,0 +1,36 @@
+-- Licensed to the Apache Software Foundation (ASF) under one
+-- or more contributor license agreements.  See the NOTICE file
+-- distributed with this work for additional information
+-- regarding copyright ownership.  The ASF licenses this file
+-- to you under the Apache License, Version 2.0 (the
+-- "License"); you may not use this file except in compliance
+-- with the License.  You may obtain a copy of the License at
+--
+--   http://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing,
+-- software distributed under the License is distributed on an
+-- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+-- KIND, either express or implied.  See the License for the
+-- specific language governing permissions and limitations
+-- under the License.
+
+-- ANSI mode: Spark's MakeDate wraps the java.time.DateTimeException from LocalDate.of in
+-- ansiDateTimeArgumentOutOfRange (4.0) / ansiDateTimeError (3.4/3.5) when
+-- spark.sql.ansi.enabled=true. Comet's native SparkMakeDate always returns NULL on
+-- invalid input and never raises, so it does not throw under ANSI. The ignored queries
+-- below capture the divergence; remove ignore(...) when
+-- https://github.com/apache/datafusion-comet/issues/4451 is fixed.
+-- Config: spark.sql.ansi.enabled=true
+
+-- February 30 is not a valid date.
+query ignore(https://github.com/apache/datafusion-comet/issues/4451)
+SELECT make_date(2024, 2, 30)
+
+-- Month 13 is out of range.
+query ignore(https://github.com/apache/datafusion-comet/issues/4451)
+SELECT make_date(2024, 13, 1)
+
+-- Day 0 is out of range.
+query ignore(https://github.com/apache/datafusion-comet/issues/4451)
+SELECT make_date(2024, 6, 0)
diff --git a/spark/src/test/resources/sql-tests/expressions/datetime/next_day.sql b/spark/src/test/resources/sql-tests/expressions/datetime/next_day.sql
index 057c9daea6..65cccb9356 100644
--- a/spark/src/test/resources/sql-tests/expressions/datetime/next_day.sql
+++ b/spark/src/test/resources/sql-tests/expressions/datetime/next_day.sql
@@ -72,3 +72,8 @@ SELECT next_day(date('2023-01-01'), 'Monday'), next_day(date('2023-01-01'), 'Sun
 -- null handling
 query
 SELECT next_day(NULL, 'Monday'), next_day(date('2023-01-01'), NULL)
+
+-- Comet's native impl trims whitespace before matching the day name; Spark does not, so
+-- ' MO ' is invalid in Spark (NULL) but matches Monday in Comet.
+query ignore(https://github.com/apache/datafusion-comet/issues/4450)
+SELECT next_day(date('2024-01-01'), ' MO '), next_day(date('2024-01-01'), 'MO ')
diff --git a/spark/src/test/resources/sql-tests/expressions/datetime/next_day_ansi.sql b/spark/src/test/resources/sql-tests/expressions/datetime/next_day_ansi.sql
new file mode 100644
index 0000000000..9f8f8e435f
--- /dev/null
+++ b/spark/src/test/resources/sql-tests/expressions/datetime/next_day_ansi.sql
@@ -0,0 +1,30 @@
+-- Licensed to the Apache Software Foundation (ASF) under one
+-- or more contributor license agreements.  See the NOTICE file
+-- distributed with this work for additional information
+-- regarding copyright ownership.  The ASF licenses this file
+-- to you under the Apache License, Version 2.0 (the
+-- "License"); you may not use this file except in compliance
+-- with the License.  You may obtain a copy of the License at
+--
+--   http://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing,
+-- software distributed under the License is distributed on an
+-- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+-- KIND, either express or implied.  See the License for the
+-- specific language governing permissions and limitations
+-- under the License.
+
+-- ANSI mode: Spark's NextDay throws SparkIllegalArgumentException on a malformed
+-- dayOfWeek. Comet's native impl (datafusion-spark::SparkNextDay) always returns NULL,
+-- so it does not throw under ANSI. The expect_error pattern below will be the assertion
+-- once https://github.com/apache/datafusion-comet/issues/4449 is fixed; until then the
+-- queries are ignored so the suite stays green.
+-- Config: spark.sql.ansi.enabled=true
+
+-- Comet returns NULL where Spark throws.
+query ignore(https://github.com/apache/datafusion-comet/issues/4449)
+SELECT next_day(date('2024-01-01'), 'NOT_A_DAY')
+
+query ignore(https://github.com/apache/datafusion-comet/issues/4449)
+SELECT next_day(date('2024-01-01'), '')