docs: require audit skill to file issues and add Spark 4.1.1 to version list #4468

Open
andygrove wants to merge 3 commits into apache:main from andygrove:audit-skill-require-issues

@andygrove
Member

Which issue does this PR close?

Closes #.

Rationale for this change

Two improvements to the audit-comet-expression skill surfaced by the string-expressions audit in #4461.

  1. The skill previously permitted leaving semantics-decision findings as prose recommendations in the PR description, on the assumption the reviewer would pick them up. In practice that note dies with the PR. Several higher-risk findings from the string audit (CometCaseConversionBase compat gating, StringRepeat negative-count divergence, translate grapheme semantics, bit_length/octet_length BinaryType native error, decode legacy flags) had to be filed retroactively as #4462 ("[Bug] repeat throws on negative count where Spark returns empty string") through #4467 ("[Bug] CometCaseConversionBase gates compat inside convert() instead of getSupportLevel") because the previous skill version did not enforce filing an issue.
  2. The skill's Spark version list still only covered 3.4.3, 3.5.8, and 4.0.1. Spark 4.1.1 is now a tracked release in the project and should be diffed against alongside the others.

What changes are included in this PR?

  • Tighten Step 6 and Step 7 so that every high-priority finding either becomes an inline fix + test, or a filed GitHub issue + ignored regression test, before the audit PR is opened. Add a dedicated "Findings that need follow-up" subsection spelling out the workflow (search, file with correctness / documentation label, cross-reference from the support-doc sub-bullet and the PR description).
  • Add Spark v4.1.1 to every `for tag in ...` loop in Step 1 / Step 2, add a 4.0.1 → 4.1.1 row to the cross-version diff list, and update the descriptions, Step 8 sub-bullet template, and Step 5 output-format section to enumerate four Spark versions.
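
The follow-up workflow in the first bullet (search, file with a label, cross-reference) could be scripted roughly as follows. This is a minimal sketch, not the skill's actual text: the helper names `label_for_finding`, `search_existing`, and `file_finding` are hypothetical, the repository slug `apache/datafusion-comet` is an assumption, and the label names are taken from the wording above rather than checked against the repository. Only `gh issue list` and `gh issue create` are real GitHub CLI commands.

```shell
# Assumption: repository slug for the Comet project.
REPO="apache/datafusion-comet"

# Hypothetical helper: accept only the two label kinds named in the PR description.
label_for_finding() {
  case "$1" in
    correctness|documentation) printf '%s\n' "$1" ;;
    *) echo "unknown finding kind: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: search open and closed issues before filing a new one.
search_existing() {
  gh issue list --repo "$REPO" --state all --search "$1"
}

# Hypothetical helper: file the finding with the chosen label.
# Arguments: kind, title, body.
file_finding() {
  label="$(label_for_finding "$1")" || return 1
  gh issue create --repo "$REPO" --title "$2" --body "$3" --label "$label"
}
```

In this sketch, `search_existing` would run first and `file_finding` only if nothing relevant turns up; the URL printed by `gh issue create` is what the support-doc sub-bullet and the PR description would then link to.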

How are these changes tested?

Skill-only documentation change. The next per-category audit will exercise both new behaviours: it will clone Spark 4.1.1 alongside the existing versions, and any high-priority finding it cannot fix inline will be filed as a tracking issue rather than left as prose.
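
As an illustration of the version handling described above, the sketch below lists the four Spark tags and derives the adjacent pairs a cross-version diff would walk. It is an assumption-laden sketch rather than the skill's own script: the `v`-prefixed tag names for the three older versions are inferred from "v4.1.1", and `adjacent_pairs`, `clone_spark_tag`, and the `/tmp/spark-<tag>` directory layout are hypothetical.

```shell
# Hypothetical helper: print each adjacent pair of tags as "old..new", one per line.
adjacent_pairs() {
  prev=""
  for tag in "$@"; do
    if [ -n "$prev" ]; then
      printf '%s..%s\n' "$prev" "$tag"
    fi
    prev="$tag"
  done
}

# Hypothetical helper: shallow-clone a single Spark tag into /tmp/spark-<tag>.
# Defined here for illustration only; it is not invoked below.
clone_spark_tag() {
  git clone --depth 1 --branch "$1" https://github.com/apache/spark.git "/tmp/spark-$1"
}

# Tags as listed in the PR description, including the newly added v4.1.1.
PAIRS="$(adjacent_pairs v3.4.3 v3.5.8 v4.0.1 v4.1.1)"
printf '%s\n' "$PAIRS"
```

The last pair printed is `v4.0.1..v4.1.1`, which corresponds to the new row in the cross-version diff list. If the tracked versions change, only the argument list needs updating.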

andygrove added 2 commits May 27, 2026 16:22
The audit-comet-expression skill previously permitted leaving
semantics-decision findings as prose recommendations in the PR
description, on the assumption the reviewer would pick them up.
In practice that note dies with the PR.

Tighten Step 6 and Step 7 so that every high-priority finding either
becomes an inline fix + test, or a filed GitHub issue + ignored
regression test, before the audit PR is opened. Add a dedicated
'Findings that need follow-up' subsection spelling out the workflow
(search, file, cross-reference from the support-doc sub-bullet and
the PR description).

Surfaced while re-running the string-expressions audit in apache#4461:
several higher-risk findings (CometCaseConversionBase compat gating,
StringRepeat negative-count divergence, translate grapheme semantics,
bit_length/octet_length BinaryType, decode legacy flags) were left as
prose only and had to be filed retroactively as apache#4462-apache#4467.

Spark 4.1.1 is now a tracked release in the project, so the audit
skill should pull and diff against it alongside 3.4.3, 3.5.8, and
4.0.1. Update every Step 1 / Step 2 / Step 5 reference and the
sub-bullet template at Step 8 accordingly.

The string-expressions audit (PR apache#4461) revealed six recurring
failure modes where the skill documented findings rather than acting
on them. Strengthen the consistency checklist and auto-fix list to
close these loopholes:

- Add checklist item 10: expression-shape restrictions (literal-only
  argument, child data type, etc.) must be declared in
  `getSupportLevel`, not gated inside `convert` with `withInfo`. Cite
  `CometLeft` / `CometRight` / `CometSubstring` as the canonical
  example.
- Add checklist item 11: Spark 4.0+ collation routing through
  `CollationSupport.X.exec` and `StringTypeWithCollation` means the
  expression is `Incompatible` for non-default collations. Link
  apache#4496 as the umbrella issue and reject "behaviour unchanged for
  `UTF8_BINARY`" as a justification for `Compatible`.
- Add checklist item 12: a sub-bullet that says "Known divergence" or
  "Known limitation" on a `Compatible` branch is a smell. The skill
  must promote the support level rather than documenting the
  divergence in prose only. Cite the `replace` empty-search-string
  case.
- Add checklist item 13: unreachable serde registrations (e.g. the
  `btrim` mapping for `StringTrimBoth`, which is rewritten by
  `RuntimeReplaceable` before serde runs) must be deleted, not
  catalogued.
- Add an issue-verification step to the reason-wording guidance and
  the follow-up-issue workflow. Every cited issue must be opened
  with `gh issue view` to confirm it exists, is open, and matches
  the divergence before the URL ships in a reason string or
  support-doc sub-bullet.
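
The issue-verification step in the last item could look roughly like the
sketch below. It assumes the GitHub CLI is installed and authenticated; the
repository slug and the helper names `is_open_state` and `verify_issue` are
hypothetical, while `gh issue view` with `--json` and `--jq` is the real
command. Whether the issue actually matches the divergence still needs a
human (or the skill) to read the printed title.

```shell
# Assumption: repository slug for the Comet project.
REPO="apache/datafusion-comet"

# Hypothetical helper: succeed only when the reported state is OPEN.
is_open_state() {
  [ "$1" = "OPEN" ]
}

# Hypothetical helper: confirm the issue exists and is open, then print its
# title and URL so they can be compared with the divergence being cited.
verify_issue() {
  number="$1"
  state="$(gh issue view "$number" --repo "$REPO" --json state --jq '.state')" || {
    echo "issue #$number could not be fetched" >&2
    return 1
  }
  if ! is_open_state "$state"; then
    echo "issue #$number is $state, not OPEN" >&2
    return 1
  fi
  gh issue view "$number" --repo "$REPO" --json title,url --jq '"\(.title) \(.url)"'
}
```

Running `verify_issue` for each cited issue number before a URL goes into a
reason string keeps closed or missing issues out of the support docs.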

Add the matching auto-fix patterns to Step 7's "apply fixes
automatically" list so future audits resolve these inline rather than
filing them as prose follow-ups.
---
name: audit-comet-expression
- description: Audit an existing Comet expression for correctness and test coverage. Studies the Spark implementation across versions 3.4.3, 3.5.8, and 4.0.1, reviews the Comet and DataFusion implementations, identifies missing test coverage, and offers to implement additional tests.
+ description: Audit an existing Comet expression for correctness and test coverage. Studies the Spark implementation across versions 3.4.3, 3.5.8, 4.0.1, and 4.1.1, reviews the Comet and DataFusion implementations, identifies missing test coverage, and offers to implement additional tests.
Contributor

I think it is 4.0.2 now?
