From 795ae990fe480f09587156b4d20a2a5336ab5805 Mon Sep 17 00:00:00 2001
From: Andy Grove <agrove@apache.org>
Date: Wed, 27 May 2026 16:19:38 -0600
Subject: [PATCH 1/3] docs: require audit skill to file issues for prose-only
 findings

The audit-comet-expression skill previously permitted leaving
semantics-decision findings as prose recommendations in the PR
description, on the assumption the reviewer would pick them up.
In practice that note dies with the PR.

Tighten Step 6 and Step 7 so that every high-priority finding either
becomes an inline fix + test, or a filed GitHub issue + ignored
regression test, before the audit PR is opened. Add a dedicated
'Findings that need follow-up' subsection spelling out the workflow
(search, file, cross-reference from the support-doc sub-bullet and
the PR description).

Surfaced while re-running the string-expressions audit in #4461:
several higher-risk findings (CometCaseConversionBase compat gating,
StringRepeat negative-count divergence, translate grapheme semantics,
bit_length/octet_length BinaryType, decode legacy flags) were left as
prose only and had to be filed retroactively as #4462-#4467.
---
 .../skills/audit-comet-expression/SKILL.md    | 59 +++++++++++++++++--
 1 file changed, 54 insertions(+), 5 deletions(-)
diff --git a/.claude/skills/audit-comet-expression/SKILL.md b/.claude/skills/audit-comet-expression/SKILL.md
index 3e3b35fa88..1f6c8e5f7e 100644
--- a/.claude/skills/audit-comet-expression/SKILL.md
+++ b/.claude/skills/audit-comet-expression/SKILL.md
@@ -411,6 +411,11 @@ For an untested case, run it manually first to determine the current
 behaviour, then commit either a regression test (passes) or a
 `query ignore(<issue-url>)` test (fails).
 
+Every item in this bucket either becomes an inline fix + test, or a
+filed GitHub issue + ignored regression test, in the audit PR. Step 7
+spells out the workflow: never leave a high-priority finding as PR-body
+prose only.
+
 ### Medium priority: missing test coverage
 
 Low-risk coverage gaps: additional input permutations on already-tested
@@ -429,9 +434,13 @@ etc. These come from the Step 5 consistency audit.
 
 High-priority findings (correctness divergences and high-risk coverage
 gaps) and consistency issues from Step 5 / Step 6 must not be left as
-prose. Apply them in the same PR as the audit. Only low-risk missing
-coverage requires the user's go-ahead, because adding tests for cases
-that already work on well-exercised paths is incremental polish.
+prose. Apply them in the same PR as the audit. Anything you cannot fix
+inline (because it needs a semantics decision, native code change, or
+larger design work) must still be captured as a GitHub issue per the
+"Findings that need follow-up" section below: prose recommendations in
+the PR body alone are insufficient. Only low-risk missing coverage
+requires the user's go-ahead, because adding tests for cases that
+already work on well-exercised paths is incremental polish.
 
 ### High-priority findings: capture as tests
 
@@ -502,13 +511,53 @@ need user approval. The classes of fix are:
 
 Each fix is one of these patterns. If a finding requires a semantics
 decision (e.g. is a specific branch really `Unsupported`, or is it
-`Incompatible`?), do not guess: leave it as a prose recommendation in
-the PR description and call it out for the reviewer.
+`Incompatible`?), do not guess: **file a GitHub issue per the
+"Findings that need follow-up" section below** and link it from the PR
+description. Do not leave the recommendation as prose only: prose in a
+PR description gets buried as soon as the PR merges.
 
 After every fix, build the affected module to make sure the edit
 compiles. Do not run the full suite; targeted tests suffice if the
 fix could plausibly affect behaviour.
 
+### Findings that need follow-up: always file a tracking issue
+
+Any high-priority finding (correctness divergence, robustness gap,
+behavioural difference from Spark, missing-collation guard, etc.) that
+this PR does not fix inline **must** be filed as a GitHub issue before
+the PR is opened. This includes:
+
+- Semantics decisions the audit surfaces but should not unilaterally
+  resolve (e.g. promote `Compatible` to `Incompatible`, change a
+  default).
+- Architectural concerns that span multiple expressions (e.g. Spark 4.0
+  collation propagation across an entire family).
+- Bugs that are fixable in principle but need more design or native
+  changes than fit in the audit PR.
+- Documentation gaps surfaced by the audit (e.g. an expression that
+  doesn't appear in the auto-generated compatibility doc).
+
+For each follow-up:
+
+1. Search for an existing issue first
+   (`gh issue list --search "<expression> <symptom> in:title,body" --state all --limit 5`).
+   If one exists, link it from the PR description and the support-doc
+   sub-bullet, and stop.
+2. If no issue exists, file one with `gh issue create` using the
+   `correctness` label (or `documentation` for doc-only gaps) plus any
+   relevant area labels (e.g. `spark 4.0`). Title format:
+   `[Bug] <expression> <one-line symptom>`. Body includes: Spark version
+   range affected, a minimal repro, the divergent result, the relevant
+   Comet file/line, and a one-line note that the issue was surfaced by
+   this audit PR.
+3. Reference the issue number from both the support-doc sub-bullet and
+   the PR description "What changes are included" section so reviewers
+   can see what work the audit intentionally deferred.
+
+Prose-only "future work" notes in the PR description are not enough.
+The whole point of the audit is to leave behind durable artefacts; an
+unfiled finding evaporates after the PR merges.
+
 ### Low-risk missing test coverage: ask the user
 
 This is the only Step 7 category that pauses for user input. It only

From d518e6b01882ab4744216314889363605990d641 Mon Sep 17 00:00:00 2001
From: Andy Grove <agrove@apache.org>
Date: Wed, 27 May 2026 16:23:42 -0600
Subject: [PATCH 2/3] docs: add Spark 4.1.1 to audit-comet-expression skill
 version list

Spark 4.1.1 is now a tracked release in the project, so the audit
skill should pull and diff against it alongside 3.4.3, 3.5.8, and
4.0.1. Update every Step 1 / Step 2 / Step 5 reference and the
sub-bullet template at Step 8 accordingly.
---
 .claude/skills/audit-comet-expression/SKILL.md | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/.claude/skills/audit-comet-expression/SKILL.md b/.claude/skills/audit-comet-expression/SKILL.md
index 1f6c8e5f7e..07a8152d23 100644
--- a/.claude/skills/audit-comet-expression/SKILL.md
+++ b/.claude/skills/audit-comet-expression/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: audit-comet-expression
-description: Audit an existing Comet expression for correctness and test coverage. Studies the Spark implementation across versions 3.4.3, 3.5.8, and 4.0.1, reviews the Comet and DataFusion implementations, identifies missing test coverage, and offers to implement additional tests.
+description: Audit an existing Comet expression for correctness and test coverage. Studies the Spark implementation across versions 3.4.3, 3.5.8, 4.0.1, and 4.1.1, reviews the Comet and DataFusion implementations, identifies missing test coverage, and offers to implement additional tests.
 argument-hint: <expression-name>
 ---
 
@@ -10,7 +10,7 @@ Audit the Comet implementation of the `$ARGUMENTS` expression for correctness an
 
 This audit covers:
 
-1. Spark implementation across versions 3.4.3, 3.5.8, and 4.0.1
+1. Spark implementation across versions 3.4.3, 3.5.8, 4.0.1, and 4.1.1
 2. Comet Scala serde implementation
 3. Comet Rust / DataFusion implementation
 4. Existing test coverage (Comet SQL Tests and Comet Scala Tests)
@@ -24,7 +24,7 @@ Clone specific Spark version tags (use shallow clones to avoid polluting the wor
 
 ```bash
 set -eu -o pipefail
-for tag in v3.4.3 v3.5.8 v4.0.1; do
+for tag in v3.4.3 v3.5.8 v4.0.1 v4.1.1; do
   dir="/tmp/spark-${tag}"
   if [ ! -d "$dir" ]; then
     git clone --depth 1 --branch "$tag" https://github.com/apache/spark.git "$dir"
@@ -37,7 +37,7 @@ done
 Search the Catalyst SQL expressions source:
 
 ```bash
-for tag in v3.4.3 v3.5.8 v4.0.1; do
+for tag in v3.4.3 v3.5.8 v4.0.1 v4.1.1; do
   dir="/tmp/spark-${tag}"
   echo "=== $tag ==="
   find "$dir/sql/catalyst/src/main/scala" -name "*.scala" | \
@@ -48,7 +48,7 @@ done
 If the expression is not found in catalyst, also check core:
 
 ```bash
-for tag in v3.4.3 v3.5.8 v4.0.1; do
+for tag in v3.4.3 v3.5.8 v4.0.1 v4.1.1; do
   dir="/tmp/spark-${tag}"
   echo "=== $tag ==="
   find "$dir/sql" -name "*.scala" | \
@@ -73,6 +73,7 @@ Produce a concise diff summary of what changed between:
 
 - 3.4.3 → 3.5.8
 - 3.5.8 → 4.0.1
+- 4.0.1 → 4.1.1
 
 Pay attention to:
 
@@ -87,7 +88,7 @@ Pay attention to:
 ## Step 2: Locate the Spark Tests
 
 ```bash
-for tag in v3.4.3 v3.5.8 v4.0.1; do
+for tag in v3.4.3 v3.5.8 v4.0.1 v4.1.1; do
   dir="/tmp/spark-${tag}"
   echo "=== $tag ==="
   find "$dir/sql" -name "*.scala" -path "*/test/*" | \
@@ -635,7 +636,7 @@ entry in `docs/source/contributor-guide/spark_expressions_support.md`.
 
 Add one sub-bullet per Spark version checked, each including:
 
-- Spark version (e.g. 3.4.3, 3.5.8, 4.0.1)
+- Spark version (e.g. 3.4.3, 3.5.8, 4.0.1, 4.1.1)
 - Today's date
 - A brief note for any version-specific finding (behavioral difference, known incompatibility); omit if nothing notable
 
@@ -646,7 +647,7 @@ Add one sub-bullet per Spark version checked, each including:
 Present the audit as:
 
 1. **Expression Summary** - Brief description of what `$ARGUMENTS` does, its input/output types, and null behavior
-2. **Spark Version Differences** - Summary of any behavioral or API differences across Spark 3.4.3, 3.5.8, and 4.0.1
+2. **Spark Version Differences** - Summary of any behavioral or API differences across Spark 3.4.3, 3.5.8, 4.0.1, and 4.1.1
 3. **Comet Implementation Notes** - Summary of how Comet implements this expression and any concerns
 4. **Coverage Gap Analysis** - The gap table from Step 5, plus implementation gaps
 5. **Recommendations** - Prioritized list from Step 6

From 92adb14fb2741f537d933659b1fc4be661a99431 Mon Sep 17 00:00:00 2001
From: Andy Grove <agrove@apache.org>
Date: Thu, 28 May 2026 11:15:15 -0600
Subject: [PATCH 3/3] docs: address audit-skill failure modes surfaced by PR
 #4461

The string-expressions audit (PR #4461) revealed six recurring
failure modes where the skill documented findings rather than acting
on them. Strengthen the consistency checklist and auto-fix list to
close these loopholes:

- Add checklist item 10: expression-shape restrictions (literal-only
  argument, child data type, etc.) must be declared in
  `getSupportLevel`, not gated inside `convert` with `withInfo`. Cite
  `CometLeft` / `CometRight` / `CometSubstring` as the canonical
  example.
- Add checklist item 11: Spark 4.0+ collation routing through
  `CollationSupport.X.exec` and `StringTypeWithCollation` means the
  expression is `Incompatible` for non-default collations. Link
  #4496 as the umbrella issue and reject "behaviour unchanged for
  `UTF8_BINARY`" as a justification for `Compatible`.
- Add checklist item 12: a sub-bullet that says "Known divergence" or
  "Known limitation" on a `Compatible` branch is a smell. The skill
  must promote the support level rather than documenting the
  divergence in prose only. Cite the `replace` empty-search-string
  case.
- Add checklist item 13: unreachable serde registrations (e.g. the
  `btrim` mapping for `StringTrimBoth`, which is rewritten by
  `RuntimeReplaceable` before serde runs) must be deleted, not
  catalogued.
- Add an issue-verification step to the reason-wording guidance and
  the follow-up-issue workflow. Every cited issue must be opened
  with `gh issue view` to confirm it exists, is open, and matches
  the divergence before the URL ships in a reason string or
  support-doc sub-bullet.

Add the matching auto-fix patterns to Step 7's "apply fixes
automatically" list so future audits resolve these inline rather than
filing them as prose follow-ups.
---
 .../skills/audit-comet-expression/SKILL.md    | 74 ++++++++++++++++++-
 1 file changed, 73 insertions(+), 1 deletion(-)

diff --git a/.claude/skills/audit-comet-expression/SKILL.md b/.claude/skills/audit-comet-expression/SKILL.md
index 07a8152d23..13bfa54464 100644
--- a/.claude/skills/audit-comet-expression/SKILL.md
+++ b/.claude/skills/audit-comet-expression/SKILL.md
@@ -209,6 +209,11 @@ and in EXPLAIN output, so they are user-facing.
 - Use backticks around config keys, type names, and SQL identifiers.
 - Link to a tracking GitHub issue for known incompatibilities so users can
   follow progress: `(https://github.com/apache/datafusion-comet/issues/NNNN)`.
+  **Verify the issue exists and is open** before citing it
+  (`gh issue view <N> --repo apache/datafusion-comet`). Issue numbers
+  invented from context or recalled from memory are a recurring failure
+  mode: a stale link is worse than no link because the reader follows it
+  and finds nothing.
 - Keep it concise. Single sentence is best.
 - Do not write "Incompatible reason: ..." or "Unsupported because ...".
   The doc generator adds the framing.
@@ -390,6 +395,49 @@ finding for Step 6.
    read like internal implementation notes ("DataFusion probes the longer
    side") or that mismatch their support level (an "Incompatible" reason
    that says "X is not supported").
+10. **Expression-shape restrictions live in `getSupportLevel`.** Any
+    restriction that is knowable from the expression alone (literal-only
+    arguments, unsupported child data type, foldable-only options, a
+    specific operator shape) must be declared as an
+    `Unsupported(Some(reason))` branch in `getSupportLevel`, not gated
+    inside `convert` with a `withInfo` + `return None`. Putting the
+    check in `convert` means EXPLAIN surfaces the reason only at
+    conversion time, the doc generator never sees it, and the
+    dispatcher cannot route around it. The literal-only `len`
+    restrictions on `CometLeft`, `CometRight`, and `CometSubstring`
+    are the canonical example of the in-`convert` pattern that this
+    skill forbids: lift them into `getSupportLevel`.
+11. **Spark 4.0 collation divergences are flagged, not glossed over.**
+    If the Spark 4.0/4.1 implementation routes through
+    `CollationSupport.X.exec(..., collationId)` (or uses
+    `StringTypeWithCollation` / `StringTypeNonCSAICollation` for input
+    types) and the Comet path does not propagate collation, the
+    expression is `Incompatible` for non-default collations. Mark the
+    branch `Incompatible(Some(reason))` linking to the collation
+    umbrella issue
+    (https://github.com/apache/datafusion-comet/issues/4496) so the
+    follow-up sweep can find every site. "Behaviour unchanged for
+    `UTF8_BINARY`" alone is not a justification for leaving the
+    support level at `Compatible`: users running with non-default
+    collations get silently wrong answers.
+12. **Known divergences flip the support level.** If you find yourself
+    writing the words "Known divergence" or "Known limitation" in the
+    support-doc sub-bullet while leaving `getSupportLevel` returning
+    `Compatible`, the audit has skipped its job. A documented
+    divergence is by definition not `Compatible`. Promote the branch
+    to `Incompatible(Some(reason))` (or `Unsupported` if the native
+    path errors rather than producing a wrong answer) and link the
+    tracking issue. The `replace` empty-search-string divergence with
+    DataFusion is the canonical example of this anti-pattern.
+13. **Unreachable serde mappings are removed.** Expressions registered
+    as `RuntimeReplaceable` (or otherwise rewritten by an analyzer
+    rule before serde) never reach `QueryPlanSerde.exprToProtoInternal`
+    with their original class. If the audit finds that a registered
+    `CometScalarFunction("name")` or `CometExpressionSerde` entry can
+    never be hit (e.g. the `btrim` mapping for `StringTrimBoth`, which
+    is rewritten to `StringTrim` before serde runs), delete the
+    registration in the same audit PR. Documenting the dead code in
+    the support doc is not enough.
 
 ---
 
@@ -509,6 +557,24 @@ need user approval. The classes of fix are:
 - Hoist a reason shared by multiple serdes (e.g. a recurring
   TimestampNTZ caveat) into a small `private object` companion in the
   same file, mirroring `UTCTimestampSerde`.
+- Lift expression-shape restrictions (literal-only argument, foldable
+  child, unsupported child data type) out of `convert`'s `withInfo` +
+  `return None` and into an `Unsupported(Some(reason))` branch in
+  `getSupportLevel`. The `convert` body should then assume the
+  precondition holds and stop calling `withInfo` for that case.
+- Promote a documented "Known divergence" or "Known limitation" sub-
+  bullet from a `Compatible` branch to `Incompatible(Some(reason))`
+  (or `Unsupported` if the native path errors) and link the tracking
+  issue. The sub-bullet stays as user-facing documentation. The
+  support level catches up to match.
+- Mark expressions whose Spark 4.0+ path routes through
+  `CollationSupport.X.exec` (or accepts `StringTypeWithCollation` /
+  `StringTypeNonCSAICollation`) as `Incompatible(Some(reason))` for
+  non-default collations, linking
+  https://github.com/apache/datafusion-comet/issues/4496.
+- Delete unreachable serde registrations (`RuntimeReplaceable` rewrites
+  the expression before serde runs, an analyzer rule strips the case,
+  etc.) rather than documenting them as a curiosity.
 
 Each fix is one of these patterns. If a finding requires a semantics
 decision (e.g. is a specific branch really `Unsupported`, or is it
@@ -542,7 +608,13 @@ For each follow-up:
 
 1. Search for an existing issue first
    (`gh issue list --search "<expression> <symptom> in:title,body" --state all --limit 5`).
-   If one exists, link it from the PR description and the support-doc
+   If a candidate match comes back, **open it
+   (`gh issue view <N> --repo apache/datafusion-comet`) and confirm the
+   title and body actually describe the divergence you found, and that
+   the issue is still `OPEN`**. A closed-but-fixed issue cited as
+   "known divergence" is worse than no citation, because the reader
+   follows the link and finds a fix that was already shipped. If it
+   matches, link it from the PR description and the support-doc
    sub-bullet, and stop.
 2. If no issue exists, file one with `gh issue create` using the
    `correctness` label (or `documentation` for doc-only gaps) plus any