Triage pass for issues labeled requires-triage.
Date: 2026-06-01
Issues processed: 48 (42 triaged, 6 skipped, 0 failed)
Priority counts applied: priority:critical 11, priority:high 5, priority:medium 19, priority:low 7
Guide: docs/source/contributor-guide/bug_triage.md
Labels have already been applied and requires-triage removed from the triaged issues. Please spot-check the calls below and close this issue when satisfied. Correct any label directly on the affected issue.
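As a quick aid for the spot-check, the summary numbers above can be checked for internal consistency. The sketch below is only illustrative: the counts are copied from this issue, and the helper name `summary_is_consistent` is hypothetical, not part of any triage tooling.

```python
# Counts copied from the summary lines of this issue.
priority_counts = {
    "priority:critical": 11,
    "priority:high": 5,
    "priority:medium": 19,
    "priority:low": 7,
}
processed, triaged, skipped, failed = 48, 42, 6, 0


def summary_is_consistent(counts, processed, triaged, skipped, failed):
    """Return True when the per-priority counts add up to the triaged total
    and triaged + skipped + failed adds up to the processed total."""
    labels_match = sum(counts.values()) == triaged
    totals_match = triaged + skipped + failed == processed
    return labels_match and totals_match


print(summary_is_consistent(priority_counts, processed, triaged, skipped, failed))
```

If a label is corrected during review, updating the corresponding count keeps the two totals in agreement.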
Triaged
priority:critical
JVM codegen dispatcher miscompiles map-typed (MapType) output (#4539)
Area labels: area:expressions, area:ffi
Rationale: silent wrong result (map key corrupted, runs natively with no fallback); decision-tree step 1.
[Bug] replace returns wrong result for empty-string search (#4497)
Area labels: area:expressions
Rationale: silent wrong result vs Spark for empty search string.
[Bug] CAST(complex AS STRING) does not honour spark.sql.legacy.castComplexTypesToString.enabled (#4492)
Area labels: area:expressions
Rationale: ignores a config and produces wrong cast output (guide lists config-ignoring as critical).
[Bug] array_max and array_min disagree with Spark on NaN ordering (#4482)
Area labels: area:expressions
Rationale: silent wrong result for NaN-containing arrays.
[Bug] array_distinct / array_union / array_except do not canonicalize NaN like Spark (#4481)
Area labels: area:expressions
Rationale: silent wrong result for NaN / signed-zero elements.
[Bug] str_to_map does not honour Spark 4.1.1 legacy.truncateForEmptyRegexSplit (#4477)
Area labels: area:expressions
Rationale: ignores a Spark 4.1.1 config, silently diverging when it is set.
[Bug] decode ignores Spark 4.0 legacyCharsets and legacyErrorAction flags (#4465)
Area labels: area:expressions
Rationale: returns NULL where Spark substitutes or raises, a silent divergence in default and legacy modes.
[Bug] translate uses graphemes vs Spark code points and ignores U+0000 deletion (#4463)
Area labels: area:expressions
Rationale: silent wrong result for combining marks and NUL-deletion semantics.
[Bug] make_date does not throw under spark.sql.ansi.enabled=true (#4451)
Area labels: area:expressions
Rationale: returns NULL instead of the Spark ANSI error, a silent divergence when ANSI is on.
[Bug] next_day trims whitespace from dayOfWeek; Spark does not (#4450)
Area labels: area:expressions
Rationale: returns a date where Spark returns NULL, an unconditional silent wrong result.
[Bug] next_day does not throw under spark.sql.ansi.enabled=true (#4449)
Area labels: area:expressions
Rationale: returns NULL instead of the Spark ANSI error, a silent divergence when ANSI is on.
priority:high
CreateArray with nullability-divergent children panics in native make_array (#4528)
Area labels: area:expressions
Rationale: native panic (assertion failure in make_array); decision-tree step 2.
ConstantColumnVector inputs fail Comet export with "Comet execution only takes Arrow Arrays" (#4527)
Area labels: area:ffi
Rationale: unhandled exception on a supported path (partition / constant columns, e.g. OPTIMIZE).
native shuffle: get_string should not panic on non-UTF-8 bytes (use lossy decode) (#4521)
Area labels: area:shuffle
Rationale: native panic in shuffle on non-UTF-8 string bytes.
CometScanRule: decline native V1 scans on object_store-unsupported filesystem schemes (#4520)
Area labels: area:scan
Rationale: native scan crashes at execution on custom filesystem schemes instead of falling back.
[Bug] CAST(BinaryType AS StringType) uses unsafe from_utf8_unchecked (undefined behaviour) (#4488)
Area labels: area:expressions
Rationale: Rust undefined behaviour / memory-safety risk on the cast path (see escalation note).
priority:medium
Native scan file-read failures should surface as Spark's FAILED_READ_FILE.NO_HINT (#4529)
Area labels: area:scan
Rationale: error-compatibility gap (raw native message and missing path) with a fallback workaround.
Deep AND/OR predicate chains overflow protobuf recursion limit when the serialized plan is re-parsed (#4526)
Area labels: area:expressions
Rationale: query fails on deep chains, but the trigger (>100 operands) is uncommon and degrades to a clean error.
Revert transition-heavy stages to Spark row-based execution (#4518)
Area labels: none
Rationale: performance optimization for stages that accumulate many C2R/R2C transitions.
Native divide-by-zero in a dispatched ScalaUDF surfaces CometNativeException instead of SparkArithmeticException (#4517)
Area labels: area:expressions
Rationale: wrong exception class under ANSI (errors either way, only the surface differs).
CometProject and CometHashAggregate do not perform cross-sibling subexpression elimination over ScalaUDF (#4516)
Area labels: area:expressions, area:aggregation
Rationale: result correct but UDF invoked N times instead of once, a performance gap for UDF-heavy queries.
DataFusion / DataFusion-Spark functions whose Arrow return type drifts from Spark catalyst's declared type (#4515)
Area labels: area:ffi, area:expressions
Rationale: latent type-drift (masked by FFI re-stamping today) that errors when FFI hops are reduced.
map expression audit follow-ups (from chore(audit): audit map expressions across Spark 3.4.3, 3.5.8, 4.0.1, 4.1.1 #4478) (#4505)
Area labels: area:expressions
Rationale: deferred audit follow-up tracker, mostly support-level / serde consistency work (see escalation note).
collection expression audit follow-ups (from chore(audit): audit collection expressions across Spark 3.4.3, 3.5.8, 4.0.1, 4.1.1 #4473) (#4504)
Area labels: area:expressions
Rationale: deferred audit follow-up tracker, mostly support-level / serde consistency work (see escalation note).
array expression audit follow-ups (from chore(audit): audit remaining array expressions across Spark 3.4.3, 3.5.8, 4.0.1, 4.1.1 #4483) (#4503)
Area labels: area:expressions
Rationale: deferred audit follow-up tracker, mostly support-level / serde consistency work (see escalation note).
date/time expression audit follow-ups (from chore: audit date/time expressions #4448) (#4502)
Area labels: area:expressions
Rationale: deferred audit follow-up tracker, mostly support-level / serde consistency work (see escalation note).
cast expression audit follow-ups (from chore(audit): audit cast across Spark 3.4.3, 3.5.8, 4.0.1, 4.1.1 #4493) (#4501)
Area labels: area:expressions
Rationale: deferred audit follow-up tracker, mostly support-level / serde consistency work (see escalation note).
Math expression audit follow-ups (from chore(audit): audit math expressions across Spark 3.4.3, 3.5.8, 4.0.1, 4.1.1 #4486) (#4500)
Area labels: area:expressions
Rationale: deferred audit follow-up tracker, mostly support-level / serde consistency work (see escalation note).
[Feature] CAST(MapType AS MapType) falls back even though native cast_map_to_map exists (#4491)
Area labels: area:expressions
Rationale: missing cast support, falls back to Spark (correct but unaccelerated).
[Bug] try_mod falls back to Spark because CometRemainder rejects EvalMode.TRY (#4484)
Area labels: area:expressions
Rationale: feature gap, falls back to Spark; result correct via fallback.
[Feature] support size() for MapType inputs (#4472)
Area labels: area:expressions
Rationale: missing expression support with a Spark fallback.
[Feature] support concat() for BinaryType and ArrayType inputs (#4471)
Area labels: area:expressions
Rationale: missing expression support with a Spark fallback.
[Bug] CometCaseConversionBase gates compat inside convert() instead of getSupportLevel (#4467)
Area labels: area:expressions
Rationale: the allowIncompatible config is bypassed for upper/lower, a functional config bug.
[Bug] bit_length and octet_length error natively for BinaryType input instead of falling back (#4464)
Area labels: area:expressions
Rationale: native execution error on binary input instead of a clean fallback; uncommon input, workaround exists.
Bound CometS3CredentialDispatcher cache via refcounted handle lifecycle (#4456)
Area labels: area:scan
Rationale: unbounded cache growth on long-running JVMs (eventual OOM), a conditional degradation.
priority:low
CI lint check passed, but then later jobs failed with lint errors (#4545)
Area labels: area:ci
Rationale: CI/tooling lint inconsistency (see escalation note).
PlanDataInjector does N x M canInject calls per operator tree (#4530)
Area labels: none
Rationale: minor micro-optimization, explicitly no behavior change.
Do another audit sweep for string collation differences (#4496)
Area labels: area:expressions
Rationale: process / tooling task (audit sweep), no concrete defect identified.
[Doc] CAST has no explicit TimeType branch (Spark 4.1) (#4490)
Area labels: area:expressions
Rationale: documentation / support-level gap; the fallback itself is correct.
[Doc] CAST collated-string handling on Spark 4.0+ is implicit and untested (#4489)
Area labels: area:expressions
Rationale: documentation / test gap; current fallback behavior is correct.
[Bug] width_bucket bypasses CometExpressionSerde framework (#4485)
Area labels: area:expressions
Rationale: serde-framework consistency refactor; no wrong result or crash.
[Doc] decode does not appear in auto-generated compatibility docs (#4466)
Area labels: area:expressions
Rationale: documentation gap (decode wired via shim, not a serde).
Escalations to consider
[Bug] CAST(BinaryType AS StringType) uses unsafe from_utf8_unchecked (undefined behaviour) (#4488)
Labeled priority:high for memory safety. Per the guide's "high crash that also produces wrong results silently" trigger, undefined behaviour that could silently corrupt output may warrant priority:critical.
CI lint check passed, but then later jobs failed with lint errors (#4545)
Labeled priority:low. Per the guide, a CI issue that consistently blocks PR merges should escalate to priority:medium.
Audit follow-up trackers (#4505, #4504, #4503, #4502, #4501, #4500)
Each bundles many sub-items of mixed severity, including Spark 4.0+ non-default-collation correctness gaps that silently diverge. Labeled priority:medium as trackers; the reviewer may want to split the collation sub-items into standalone priority:critical issues.
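The two guide triggers quoted above can be written down as a small rule check. This is a rough sketch and not the guide's actual decision tree: the `Issue` fields, the `suggested_priority` helper, and the flag values in the usage lines are hypothetical reviewer judgments, and only the two escalation rules themselves come from the text above.

```python
from dataclasses import dataclass

# Priority labels in ascending order of severity.
ORDER = ["priority:low", "priority:medium", "priority:high", "priority:critical"]


@dataclass
class Issue:
    number: int
    priority: str
    crashes: bool = False              # hypothetical flag set by the reviewer
    silent_wrong_result: bool = False  # hypothetical flag set by the reviewer
    is_ci: bool = False
    blocks_pr_merges: bool = False


def suggested_priority(issue: Issue) -> str:
    """Apply the two escalation triggers quoted above; otherwise keep the label."""
    suggestion = issue.priority
    # A high-priority crash that also silently produces wrong results.
    if issue.priority == "priority:high" and issue.crashes and issue.silent_wrong_result:
        suggestion = "priority:critical"
    # A CI issue that consistently blocks PR merges is at least medium.
    below_medium = ORDER.index(issue.priority) < ORDER.index("priority:medium")
    if issue.is_ci and issue.blocks_pr_merges and below_medium:
        suggestion = "priority:medium"
    return suggestion


# Flag values here are assumptions for illustration, not findings.
print(suggested_priority(Issue(4488, "priority:high", crashes=True, silent_wrong_result=True)))
print(suggested_priority(Issue(4545, "priority:low", is_ci=True, blocks_pr_merges=True)))
```

Whether either flag actually holds for #4488 or #4545 is the judgment call left to the reviewer.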
Skipped — needs more info
[EPIC] Support Spark interval types (CalendarInterval / YearMonthInterval / DayTimeInterval) and interval expressions (#4540)
Open-ended EPIC umbrella; a single priority is a roadmap decision rather than a mechanical triage call.
[EPIC] Provide JVM/codegen-dispatch implementations for Incompatible expressions so they never fall back by default (#4506)
Open-ended EPIC umbrella; a single priority is a roadmap decision rather than a mechanical triage call.
Discussion: Should Comet add geospatial (ST_*) function support? (#4455)
Discussion / scope question needing community and maintainer input, not a triageable defect.
Bug triage results: 2026-05-26 (#4441)
Prior triage summary issue (auto-labeled requires-triage); meta, awaiting human review and closure, not a bug.
Bug triage results: 2026-05-18 (#4359)
Prior triage summary issue (auto-labeled requires-triage); meta, awaiting human review and closure, not a bug.
Bug triage results: 2026-05-11 (#4287)
Prior triage summary issue (auto-labeled requires-triage); meta, awaiting human review and closure, not a bug.