From e2b9356080cf080f743a51a972a82fa90f3f0a5c Mon Sep 17 00:00:00 2001
From: Andy Grove <agrove@apache.org>
Date: Wed, 27 May 2026 07:44:20 -0600
Subject: [PATCH 1/4] docs: clarify support-level and reason consistency in
 audit-comet-expression skill

Expand the audit skill so it produces consistent, accurate findings about
`getSupportLevel`, `getIncompatibleReasons`, `getUnsupportedReasons`, and
`convert`.

Step 3 gains a dedicated subsection covering:

- A decision rule for `Unsupported` vs `Incompatible` (wrong answer vs
  fallback/error).
- The runtime-vs-docs split: support-level `notes` flow into EXPLAIN via
  the dispatcher, while `get*Reasons()` is read only by `GenerateDocs`.
- Six consistency rules covering every branch, dead-reason detection,
  shared `private val` for reason strings, `Incompatible(Some(reason))`
  preference, and gating decisions in `getSupportLevel` rather than
  `convert`.
- Wording guidelines for reason strings (user-observable effect first,
  sentence case, backtick configs/types, tracking issue links).
- Real antipattern callouts naming specific serdes (`CometHour`,
  `CometMinute`, `CometSecond`, `CometInitCap`).

Step 5 gains a structured 9-item support-level consistency audit
checklist that produces findings against the four methods.

Step 7 splits into two asks: one for missing tests, one for consistency
fixes, so the user can opt in to either independently.
---
 .../skills/audit-comet-expression/SKILL.md    | 206 ++++++++++++++++--
 1 file changed, 188 insertions(+), 18 deletions(-)

diff --git a/.claude/skills/audit-comet-expression/SKILL.md b/.claude/skills/audit-comet-expression/SKILL.md
index e2704bb47d..bfabfeaa4e 100644
--- a/.claude/skills/audit-comet-expression/SKILL.md
+++ b/.claude/skills/audit-comet-expression/SKILL.md
@@ -119,19 +119,125 @@ grep -r "$ARGUMENTS" spark/src/main/scala/org/apache/comet/ --include="*.scala"
 Read the serde implementation and check:
 
 - Which Spark versions the serde handles
-- Whether `getSupportLevel` is implemented and accurate
-- Whether all input types are handled
-- Whether any types are explicitly marked `Unsupported`
-- Whether `getIncompatibleReasons()` and `getUnsupportedReasons()` are overridden.
-  `getSupportLevel` controls runtime fallback, but `GenerateDocs` reads these two
-  methods to build the Compatibility Guide. If `getSupportLevel` returns
-  `Incompatible(Some(...))` or `Unsupported(Some(...))` but the corresponding
-  `get*Reasons()` method is not overridden, the reason will not appear in the
-  published docs. Expect both to return a `Seq[String]` containing the same
-  reason text used in `getSupportLevel`. Follow the pattern in
-  `spark/src/main/scala/org/apache/comet/serde/structs.scala::CometStructsToJson`
-  or `spark/src/main/scala/org/apache/comet/serde/datetime.scala::CometHour`:
-  extract the reason as a `private val` and reference it from both places.
+- Whether all input types Spark accepts are handled
+- Whether `convert` validates expression shape (e.g. literal-only arguments) before serializing
+
+Then audit the support-level and reason methods as described below.
+
+#### Audit `getSupportLevel`, `getIncompatibleReasons`, `getUnsupportedReasons`, and `convert`
+
+These four methods must stay aligned. Each has a distinct purpose, and the
+most common bugs in Comet serdes are misalignments between them.
+
+**Pick the right support level.**
+
+- `Unsupported(Some(reason))`: Comet cannot run this case at all. The
+  dispatcher in `QueryPlanSerde.exprToProtoInternal` falls back to Spark
+  unconditionally. Use this when an input type, option, or expression shape
+  is not implemented, or when running the native path would crash or error.
+- `Incompatible(Some(reason))`: Comet can run this, but results may differ
+  from Spark. The dispatcher only allows it when
+  `spark.comet.expr.allowIncompatible=true` (or the per-expression
+  equivalent). Use this for known result differences such as locale
+  sensitivity, timezone handling, ordering ambiguity, or floating-point
+  precision.
+- `Compatible(None)`: full Spark compatibility for this combination of
+  inputs and options.
+- `Compatible(Some(note))`: fully compatible but with a docs-only caveat.
+  Note that any `Some(...)` on `Compatible` triggers a runtime
+  `logWarning`, so reserve it for genuinely useful caveats.
+
+Decision rule: if the user would be surprised by a *wrong answer*, it is
+`Incompatible`. If the user would be surprised by a *runtime error or
+unsupported-type message*, it is `Unsupported`.
+
+**Runtime vs docs split.**
+
+- The `notes` field on `Incompatible(Some(...))` and `Unsupported(Some(...))`
+  flows into EXPLAIN output via the dispatcher (see
+  `QueryPlanSerde.exprToProtoInternal`, around the `case Incompatible` /
+  `case Unsupported` branches). This is what users see when they ask why
+  Comet fell back.
+- `getIncompatibleReasons()` and `getUnsupportedReasons()` are read only by
+  `GenerateDocs` when building
+  `docs/source/user-guide/compatibility.md`. They are static (no `expr`
+  argument) and should enumerate every distinct reason the expression
+  could ever return at runtime.
+- `convert` may also call `withInfo(expr, "reason")` and return `None` for
+  cases that cannot be detected from the expression alone (for example,
+  non-literal arguments, or child conversion failures). Those reasons
+  belong in `getUnsupportedReasons()` too.
+
+**Consistency rules.**
+
+1. If `getSupportLevel` can return `Incompatible(...)` for any input,
+   override `getIncompatibleReasons()` and include a reason for every
+   incompatible branch.
+2. If `getSupportLevel` can return `Unsupported(...)` for any input,
+   override `getUnsupportedReasons()` and include a reason for every
+   unsupported branch.
+3. If `convert` has its own `withInfo(...); None` fallbacks (e.g. literal
+   checks), enumerate those reasons in `getUnsupportedReasons()` too.
+4. Extract each reason into a `private val` and reference it from both
+   `getSupportLevel` and `get*Reasons()`. Do not duplicate the string
+   inline. Canonical example:
+   `spark/src/main/scala/org/apache/comet/serde/arrays.scala::CometArrayIntersect`.
+   It declares `private val incompatReason` and
+   `private val unsupportedCollationReason`, and each is referenced from
+   `getSupportLevel` and the matching reasons method.
+5. Prefer `Incompatible(Some(reason))` over `Incompatible(None)`. The
+   `None` form drops the reason from EXPLAIN output, leaving users with a
+   generic "not fully compatible" message and forcing them to read the
+   docs to find out why.
+6. Gate compatibility decisions in `getSupportLevel`, not inside `convert`.
+   Putting the check inside `convert` (e.g. reading a config flag and
+   calling `withInfo`) bypasses the dispatcher's `allowIncompatible`
+   handling and the EXPLAIN message becomes inconsistent with what the
+   doc generator produces.
+
+**Wording guidelines for reason strings.**
+
+Reasons appear verbatim in the Compatibility Guide (rendered as Markdown)
+and in EXPLAIN output, so they are user-facing.
+
+- Lead with the user-observable effect, then the cause if helpful.
+  ✅ "Result array element order may differ from Spark when the right array
+  is longer than the left."
+  ❌ "DataFusion probes the longer side."
+- Use sentence case and end with a period.
+- Use backticks around config keys, type names, and SQL identifiers.
+- Link to a tracking GitHub issue for known incompatibilities so users can
+  follow progress: `(https://github.com/apache/datafusion-comet/issues/NNNN)`.
+- Keep it concise. Single sentence is best.
+- Do not write "Incompatible reason: ..." or "Unsupported because ...".
+  The doc generator adds the framing.
+- Phrase Incompatible reasons as *what differs from Spark*, not *what is
+  missing*. Phrase Unsupported reasons as *what does not run*. If you find
+  yourself writing an "Incompatible" reason that says "Comet only supports
+  X" or "Y is not supported", the support level is probably wrong: it
+  should be `Unsupported`.
+
+**Common antipatterns to flag during the audit.**
+
+- A `private val reason` constant declared near the top of the object, but
+  `getSupportLevel` hardcodes a different string inline. The doc and the
+  EXPLAIN message will drift.
+  Real example: `CometHour` declares `incompatReason` then hardcodes a
+  near-duplicate string in `getSupportLevel`.
+- Reason text duplicated in two places without a shared constant.
+  Real examples: `CometMinute`, `CometSecond`.
+- `Incompatible(None)` paired with a populated `getIncompatibleReasons()`.
+  The reason reaches the docs but not EXPLAIN output.
+  Real example: `CometInitCap`.
+- `getIncompatibleReasons()` overridden but `getSupportLevel` never returns
+  `Incompatible(...)`. Either the reasons method is dead code, or
+  `getSupportLevel` is missing a branch.
+- A reason string phrased as "X is not supported" attached to an
+  `Incompatible` branch (or vice versa). Re-read it and decide which
+  support level it really belongs to.
+- `convert` bails out via `withInfo` for a case that is fully knowable
+  from the expression (e.g. an unsupported child data type). Move the
+  check into `getSupportLevel` so the dispatcher handles it uniformly.
 
 ### Shims
 
@@ -237,7 +343,52 @@ Also review the Comet implementation (Step 3) against the Spark behavior (Step 1
 - Are there behavioral differences that are NOT marked `Incompatible` but should be?
 - Are there behavioral differences between Spark versions that the Comet implementation does not account for (missing shim)?
 - Does the Rust implementation match the Spark behavior for all edge cases?
-- If `getSupportLevel` returns `Incompatible` or `Unsupported` with a reason, are `getIncompatibleReasons()` / `getUnsupportedReasons()` also overridden so the reason is picked up by `GenerateDocs` and appears in the Compatibility Guide?
+
+### Support-level consistency audit
+
+Walk through this checklist against the serde. Each failed item is a
+finding for Step 6.
+
+1. **Support level matches behavior.** For each branch of
+   `getSupportLevel`, decide whether the user-observable effect is a wrong
+   answer (`Incompatible`) or a fallback / error (`Unsupported`). Flag any
+   branch where the label does not match the behavior.
+2. **Reasons cover every branch.** Every distinct reason that
+   `getSupportLevel` can return as `Incompatible(Some(r))` must appear in
+   `getIncompatibleReasons()`. Same for `Unsupported(Some(r))` and
+   `getUnsupportedReasons()`. Missing reasons silently drop from the
+   Compatibility Guide.
+3. **Reasons are not dead code.** If `getIncompatibleReasons()` is
+   overridden but `getSupportLevel` never returns `Incompatible(...)`,
+   either the reason is stale or `getSupportLevel` is missing a branch.
+   Same for `getUnsupportedReasons()`.
+4. **Reason strings are shared via `private val`.** If the same reason
+   appears as a string literal in two places, flag it: changes to one
+   will not propagate to the other.
+5. **Inline reason matches the constant.** If a `private val reason` is
+   declared but `getSupportLevel` uses a different string literal, flag
+   it as a drift bug.
+6. **`Incompatible(None)` has no docs-only reason.** If
+   `getSupportLevel` returns `Incompatible(None)` but
+   `getIncompatibleReasons()` is non-empty, flag it: the EXPLAIN message
+   will be generic while the docs show a specific reason. Switch to
+   `Incompatible(Some(reason))`.
+7. **`convert` fallbacks are documented.** If `convert` calls
+   `withInfo(expr, "...")` and returns `None` for cases not covered by
+   `getSupportLevel` (e.g. non-literal arguments, unsupported expression
+   shapes), confirm the reason is also listed in
+   `getUnsupportedReasons()`.
+8. **Compatibility decisions live in `getSupportLevel`.** If `convert`
+   reads a config flag and bails out, prefer moving the check into
+   `getSupportLevel` so the dispatcher handles `allowIncompatible`
+   uniformly. `CometCaseConversionBase` is an example of the in-`convert`
+   pattern that this skill recommends against.
+9. **Reason wording.** Each reason should describe the user-observable
+   effect, use sentence case with a period, backtick config keys and
+   types, and link a tracking issue when one exists. Flag reasons that
+   read like internal implementation notes ("DataFusion probes the longer
+   side") or that mismatch their support level (an "Incompatible" reason
+   that says "X is not supported").
 
 ---
 
@@ -259,9 +410,13 @@ Minor gaps, cosmetic improvements, or nice-to-have tests.
 
 ---
 
-## Step 7: Offer to Implement Missing Tests
+## Step 7: Offer to Implement Missing Tests and Fix Consistency Findings
+
+After presenting the gap analysis, ask the user separately about each
+category of finding. Keep the two asks distinct so the user can opt in to
+one without the other.
 
-After presenting the gap analysis, ask the user:
+**Test coverage:**
 
 > I found the following missing test cases. Would you like me to implement them?
 >
@@ -270,8 +425,23 @@ After presenting the gap analysis, ask the user:
 > I can add them as Comet SQL Tests in `spark/src/test/resources/sql-tests/expressions/<category>/$ARGUMENTS.sql`
 > (or as Comet Scala Tests in `CometExpressionSuite` for cases that require programmatic setup).
 
-If the user says yes, implement the missing tests following the Comet SQL Tests format described in
-`docs/source/contributor-guide/sql-file-tests.md`. Prefer Comet SQL Tests over Comet Scala Tests.
+**Support-level consistency:**
+
+> I also found the following alignment issues between `getSupportLevel`,
+> `getIncompatibleReasons`, `getUnsupportedReasons`, and `convert`. Would
+> you like me to fix them?
+>
+> - [list each finding from the Step 5 consistency audit, with the file
+>   and serde object name]
+>
+> Each fix is a small, mechanical edit (extract a `private val` for the
+> reason, switch `Incompatible(None)` to `Incompatible(Some(reason))`,
+> add a missing `get*Reasons()` override, or move a check from `convert`
+> into `getSupportLevel`).
+
+If the user says yes to tests, implement them following the Comet SQL Tests
+format described in `docs/source/contributor-guide/sql-file-tests.md`.
+Prefer Comet SQL Tests over Comet Scala Tests.
 
 ### Comet SQL Tests template
 

From 402d314c642a1cffa4b96ef08904bc8a2e6cf4c4 Mon Sep 17 00:00:00 2001
From: Andy Grove <agrove@apache.org>
Date: Wed, 27 May 2026 07:47:14 -0600
Subject: [PATCH 2/4] style: apply prettier

---
 .claude/skills/audit-comet-expression/SKILL.md | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/.claude/skills/audit-comet-expression/SKILL.md b/.claude/skills/audit-comet-expression/SKILL.md
index bfabfeaa4e..7cac6c252d 100644
--- a/.claude/skills/audit-comet-expression/SKILL.md
+++ b/.claude/skills/audit-comet-expression/SKILL.md
@@ -147,9 +147,9 @@ most common bugs in Comet serdes are misalignments between them.
   Note that any `Some(...)` on `Compatible` triggers a runtime
   `logWarning`, so reserve it for genuinely useful caveats.
 
-Decision rule: if the user would be surprised by a *wrong answer*, it is
-`Incompatible`. If the user would be surprised by a *runtime error or
-unsupported-type message*, it is `Unsupported`.
+Decision rule: if the user would be surprised by a _wrong answer_, it is
+`Incompatible`. If the user would be surprised by a _runtime error or
+unsupported-type message_, it is `Unsupported`.
 
 **Runtime vs docs split.**
 
@@ -211,8 +211,8 @@ and in EXPLAIN output, so they are user-facing.
 - Keep it concise. Single sentence is best.
 - Do not write "Incompatible reason: ..." or "Unsupported because ...".
   The doc generator adds the framing.
-- Phrase Incompatible reasons as *what differs from Spark*, not *what is
-  missing*. Phrase Unsupported reasons as *what does not run*. If you find
+- Phrase Incompatible reasons as _what differs from Spark_, not _what is
+  missing_. Phrase Unsupported reasons as _what does not run_. If you find
   yourself writing an "Incompatible" reason that says "Comet only supports
   X" or "Y is not supported", the support level is probably wrong: it
   should be `Unsupported`.
@@ -431,8 +431,7 @@ one without the other.
 > `getIncompatibleReasons`, `getUnsupportedReasons`, and `convert`. Would
 > you like me to fix them?
 >
-> - [list each finding from the Step 5 consistency audit, with the file
->   and serde object name]
+> - [list each finding from the Step 5 consistency audit, with the file and serde object name]
 >
 > Each fix is a small, mechanical edit (extract a `private val` for the
 > reason, switch `Incompatible(None)` to `Incompatible(Some(reason))`,

From 44182ad6ed02b54a5e3df6e17c5d5d1672cd9876 Mon Sep 17 00:00:00 2001
From: Andy Grove <agrove@apache.org>
Date: Wed, 27 May 2026 08:45:14 -0600
Subject: [PATCH 3/4] docs: capture audit findings as tests and apply
 consistency fixes automatically

Tightens Step 6 / Step 7 of the audit-comet-expression skill so the
workflow produces concrete output without pausing for user approval on
mechanical steps:

- Step 6 reorganises the priorities into three named buckets:
  correctness divergences, missing coverage, and consistency issues.
- Step 7 requires correctness findings to be captured as SQL file
  tests in the same PR as the audit. Walks through the
  "search for existing issue, file if missing" workflow and the
  trivial-fix-vs-`query ignore(<url>)` decision rule, with a concrete
  example.
- Step 7 also applies every Step 5 consistency finding automatically.
  These are mechanical edits (extract `private val`, switch
  `Incompatible(None)` to `Some(reason)`, add missing `get*Reasons`,
  move a check from `convert` into `getSupportLevel`, hoist a shared
  reason into a private companion) and do not need user approval.
- Only "missing test coverage" still pauses for user input, because
  adding tests for cases that already work is incremental polish.

Surfaced while applying the skill to a multi-expression audit in
#4448; both behaviours felt obviously right when the audit ran on a
larger surface and several recurring patterns made the asking step
feel like friction rather than safety.
---
 .../skills/audit-comet-expression/SKILL.md    | 124 ++++++++++++++----
 1 file changed, 96 insertions(+), 28 deletions(-)

diff --git a/.claude/skills/audit-comet-expression/SKILL.md b/.claude/skills/audit-comet-expression/SKILL.md
index 7cac6c252d..46b2398b30 100644
--- a/.claude/skills/audit-comet-expression/SKILL.md
+++ b/.claude/skills/audit-comet-expression/SKILL.md
@@ -396,51 +396,119 @@ finding for Step 6.
 
 Summarize findings as a prioritized list.
 
-### High priority
+### High priority: correctness divergences
 
-Issues where Comet may silently produce wrong results compared to Spark.
+Cases where Comet produces a different observable result from Spark
+(wrong value, missing exception, accepted-instead-of-rejected input,
+etc.). Each one becomes a captured test in Step 7.
 
-### Medium priority
+### Medium priority: missing test coverage
 
-Missing test coverage for edge cases that could expose bugs.
+Edge cases Spark exercises but Comet does not test, where the behaviour
+appears to match but is not verified.
 
-### Low priority
+### Low priority: cosmetic and consistency issues
 
-Minor gaps, cosmetic improvements, or nice-to-have tests.
+Reason-string drift, missing `get*Reasons()` overrides, dead branches,
+etc. These come from the Step 5 consistency audit.
 
 ---
 
-## Step 7: Offer to Implement Missing Tests and Fix Consistency Findings
+## Step 7: Apply Findings (Tests and Fixes), Then Offer the Rest
 
-After presenting the gap analysis, ask the user separately about each
-category of finding. Keep the two asks distinct so the user can opt in to
-one without the other.
+Correctness divergences and consistency issues from Step 5 / Step 6 must
+not be left as prose. Apply them in the same PR as the audit. Only the
+"missing test coverage" bucket requires the user's go-ahead, because
+adding new tests for cases that already work is incremental polish.
 
-**Test coverage:**
+### Correctness divergences: capture as tests
 
-> I found the following missing test cases. Would you like me to implement them?
+Every divergence the audit identified becomes a regression test in the
+same PR as the audit, so future readers can run it.
+
+For each correctness finding, do the following in this order:
+
+1. **Search for an existing tracking issue.**
+
+   ```bash
+   gh issue list --search "<expression> <symptom> in:title,body" --state all --limit 5
+   ```
+
+   Match on both the expression name and a distinguishing keyword
+   (ANSI, timezone, NTZ, overflow, etc.).
+
+2. **If no issue exists, file one.** Use the `correctness` label plus
+   the relevant area label (e.g. `temporal expressions`). Keep the
+   title in the form "[Bug] <expression> <one-line symptom>". Include
+   the Spark version, a minimal repro, and the divergent result.
+
+3. **If the fix is trivial and you are confident, fix it inline.**
+   Then add the test in the default `query` mode so it locks in the
+   fix. "Trivial" means a few lines in one file with no native code
+   changes, no support-level reshuffling, and no semantics decisions
+   that the user should weigh in on.
+
+4. **Otherwise, add the test in `query ignore(<issue-url>)` mode** with
+   a one-line SQL comment above the query explaining the symptom. The
+   test file lives next to the existing tests for the expression. Do
+   not skip writing the test because the bug is unfixed: the captured
+   reproduction is the whole point of this step.
+
+   ```sql
+   -- Comet returns NULL where Spark throws under spark.sql.ansi.enabled=true
+   query ignore(https://github.com/apache/datafusion-comet/issues/NNNN)
+   SELECT next_day(date('2024-01-01'), 'NOT_A_DAY')
+   ```
+
+   When the underlying bug is fixed, the `ignore(...)` is removed and
+   the test starts running. This is also the contract documented in
+   `docs/source/contributor-guide/sql-file-tests.md`.
+
+### Support-level consistency: apply fixes automatically
+
+Apply every finding from the Step 5 consistency audit in the same PR.
+These are mechanical edits with no behavioural impact, so they do not
+need user approval. The classes of fix are:
+
+- Extract a duplicated or drifting reason string into a `private val`
+  and reference it from both `getSupportLevel` and `get*Reasons()`.
+- Switch `Incompatible(None)` to `Incompatible(Some(reason))` so the
+  reason reaches EXPLAIN output, not only the docs.
+- Add a missing `get*Reasons()` override that enumerates every
+  reason the support level can return.
+- Move a compatibility check out of `convert` and into
+  `getSupportLevel` so the dispatcher handles `allowIncompatible`
+  uniformly (the `withInfo` call in `convert` should disappear).
+- Hoist a reason shared by multiple serdes (e.g. a recurring
+  TimestampNTZ caveat) into a small `private object` companion in the
+  same file, mirroring `UTCTimestampSerde`.
+
+Each fix is one of these patterns. If a finding requires a semantics
+decision (e.g. is a specific branch really `Unsupported`, or is it
+`Incompatible`?), do not guess: leave it as a prose recommendation in
+the PR description and call it out for the reviewer.
+
+After every fix, build the affected module to make sure the edit
+compiles. Do not run the full suite; targeted tests suffice if the
+fix could plausibly affect behaviour.
+
+### Missing test coverage (non-bug): ask the user
+
+This is the only Step 7 category that pauses for user input. Adding
+tests for cases that already work is incremental polish, not part of
+the audit's required output.
+
+> Spark exercises the following cases that have no Comet test. Would
+> you like me to add them?
 >
 > - [list each missing test case]
 >
 > I can add them as Comet SQL Tests in `spark/src/test/resources/sql-tests/expressions/<category>/$ARGUMENTS.sql`
 > (or as Comet Scala Tests in `CometExpressionSuite` for cases that require programmatic setup).
 
-**Support-level consistency:**
-
-> I also found the following alignment issues between `getSupportLevel`,
-> `getIncompatibleReasons`, `getUnsupportedReasons`, and `convert`. Would
-> you like me to fix them?
->
-> - [list each finding from the Step 5 consistency audit, with the file and serde object name]
->
-> Each fix is a small, mechanical edit (extract a `private val` for the
-> reason, switch `Incompatible(None)` to `Incompatible(Some(reason))`,
-> add a missing `get*Reasons()` override, or move a check from `convert`
-> into `getSupportLevel`).
-
-If the user says yes to tests, implement them following the Comet SQL Tests
-format described in `docs/source/contributor-guide/sql-file-tests.md`.
-Prefer Comet SQL Tests over Comet Scala Tests.
+If the user says yes, implement the tests following the format described
+in `docs/source/contributor-guide/sql-file-tests.md`. Prefer Comet SQL
+Tests over Comet Scala Tests.
 
 ### Comet SQL Tests template
 

From 5e9ccc34324bffcbc8aca5c3c3b9f038e9a52453 Mon Sep 17 00:00:00 2001
From: Andy Grove <agrove@apache.org>
Date: Wed, 27 May 2026 15:38:53 -0600
Subject: [PATCH 4/4] docs: promote high-risk untested cases to High priority
 in audit skill
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Address PR review feedback (#4447): missing tests can mask correctness
divergences. Split the Step 6 'missing test coverage' bucket — gaps on
known-fragile axes (overflow, NULL handling, timezones, ANSI mode, leap
years, version-specific semantics) are now High priority and captured
as tests automatically; only low-risk gaps on well-exercised paths stay
Medium priority and require user approval.

Step 7's 'Correctness divergences: capture as tests' is renamed to
'High-priority findings: capture as tests' and now covers both confirmed
divergences and high-risk untested cases (run the case manually first,
then commit either a regression test or a query ignore(<url>) test).
---
 .../skills/audit-comet-expression/SKILL.md    | 52 +++++++++++++------
 1 file changed, 36 insertions(+), 16 deletions(-)

diff --git a/.claude/skills/audit-comet-expression/SKILL.md b/.claude/skills/audit-comet-expression/SKILL.md
index 46b2398b30..3e3b35fa88 100644
--- a/.claude/skills/audit-comet-expression/SKILL.md
+++ b/.claude/skills/audit-comet-expression/SKILL.md
@@ -396,16 +396,27 @@ finding for Step 6.
 
 Summarize findings as a prioritized list.
 
-### High priority: correctness divergences
+### High priority: correctness divergences and high-risk coverage gaps
 
 Cases where Comet produces a different observable result from Spark
 (wrong value, missing exception, accepted-instead-of-rejected input,
-etc.). Each one becomes a captured test in Step 7.
+etc.), **and** untested cases on a known-fragile axis where a
+divergence is plausible: overflow / out-of-range inputs, NULL handling,
+timezones and DST transitions, ANSI mode, leap years, version-specific
+Spark semantics, and any edge case explicitly called out in Step 1.
+
+Treat an untested high-risk case as a likely divergence until proven
+otherwise. Each item in this bucket becomes a captured test in Step 7.
+For an untested case, run it manually first to determine the current
+behaviour, then commit either a regression test (passes) or a
+`query ignore(<issue-url>)` test (fails).
 
 ### Medium priority: missing test coverage
 
-Edge cases Spark exercises but Comet does not test, where the behaviour
-appears to match but is not verified.
+Low-risk coverage gaps: additional input permutations on already-tested
+code paths, redundant happy-path values, or cases where Comet and Spark
+share a well-exercised implementation. The behaviour appears to match
+and the axis is not on the high-risk list above.
 
 ### Low priority: cosmetic and consistency issues
 
@@ -416,17 +427,23 @@ etc. These come from the Step 5 consistency audit.
 
 ## Step 7: Apply Findings (Tests and Fixes), Then Offer the Rest
 
-Correctness divergences and consistency issues from Step 5 / Step 6 must
-not be left as prose. Apply them in the same PR as the audit. Only the
-"missing test coverage" bucket requires the user's go-ahead, because
-adding new tests for cases that already work is incremental polish.
+High-priority findings (correctness divergences and high-risk coverage
+gaps) and consistency issues from Step 5 / Step 6 must not be left as
+prose. Apply them in the same PR as the audit. Only low-risk missing
+coverage requires the user's go-ahead, because adding tests for cases
+that already work on well-exercised paths is incremental polish.
 
-### Correctness divergences: capture as tests
+### High-priority findings: capture as tests
 
-Every divergence the audit identified becomes a regression test in the
-same PR as the audit, so future readers can run it.
+Every high-priority finding becomes a regression test in the same PR as
+the audit, so future readers can run it. This covers both confirmed
+divergences and high-risk untested cases. For the latter, run the case
+manually first to determine whether it currently passes or fails, then
+follow the same workflow below: a passing case is committed as a plain
+`query` regression test; a failing case is committed as
+`query ignore(<issue-url>)` after filing the bug.
 
-For each correctness finding, do the following in this order:
+For each high-priority finding, do the following in this order:
 
 1. **Search for an existing tracking issue.**
 
@@ -492,11 +509,14 @@ After every fix, build the affected module to make sure the edit
 compiles. Do not run the full suite; targeted tests suffice if the
 fix could plausibly affect behaviour.
 
-### Missing test coverage (non-bug): ask the user
+### Low-risk missing test coverage: ask the user
 
-This is the only Step 7 category that pauses for user input. Adding
-tests for cases that already work is incremental polish, not part of
-the audit's required output.
+This is the only Step 7 category that pauses for user input. It only
+covers the Medium-priority bucket from Step 6: low-risk gaps on
+well-exercised code paths. High-risk untested cases belong above, in
+"High-priority findings: capture as tests". Adding tests for cases
+that already work on well-exercised paths is incremental polish, not
+part of the audit's required output.
 
 > Spark exercises the following cases that have no Comet test. Would
 > you like me to add them?