Bug triage results: 2026-06-01 #4548

@andygrove

Triage pass for issues labeled requires-triage.

  • Date: 2026-06-01
  • Issues processed: 48 (42 triaged, 6 skipped, 0 failed)
  • Priority counts applied: priority:critical 11, priority:high 5, priority:medium 19, priority:low 7
  • Guide: docs/source/contributor-guide/bug_triage.md

Labels have already been applied and requires-triage removed from the triaged issues. Please spot-check the calls below and close this issue when satisfied. Correct any label directly on the affected issue.

Triaged

priority:critical

  • JVM codegen dispatcher miscompiles map-typed (MapType) output (#4539)
    • Area labels: area:expressions, area:ffi
    • Rationale: silent wrong result (map key corrupted, runs natively with no fallback); decision-tree step 1.
  • [Bug] replace returns wrong result for empty-string search (#4497)
    • Area labels: area:expressions
    • Rationale: silent wrong result vs Spark for empty search string.
  • [Bug] CAST(complex AS STRING) does not honour spark.sql.legacy.castComplexTypesToString.enabled (#4492)
    • Area labels: area:expressions
    • Rationale: ignores a config and produces wrong cast output (guide lists config-ignoring as critical).
  • [Bug] array_max and array_min disagree with Spark on NaN ordering (#4482)
    • Area labels: area:expressions
    • Rationale: silent wrong result for NaN-containing arrays.
  • [Bug] array_distinct / array_union / array_except do not canonicalize NaN like Spark (#4481)
    • Area labels: area:expressions
    • Rationale: silent wrong result for NaN / signed-zero elements.
  • [Bug] str_to_map does not honour Spark 4.1.1 legacy.truncateForEmptyRegexSplit (#4477)
    • Area labels: area:expressions
    • Rationale: ignores a Spark 4.1.1 config, silently diverging when it is set.
  • [Bug] decode ignores Spark 4.0 legacyCharsets and legacyErrorAction flags (#4465)
    • Area labels: area:expressions
    • Rationale: returns NULL where Spark substitutes or raises, a silent divergence in default and legacy modes.
  • [Bug] translate uses graphemes vs Spark code points and ignores U+0000 deletion (#4463)
    • Area labels: area:expressions
    • Rationale: silent wrong result for combining marks and NUL-deletion semantics.
  • [Bug] make_date does not throw under spark.sql.ansi.enabled=true (#4451)
    • Area labels: area:expressions
    • Rationale: returns NULL instead of the Spark ANSI error, a silent divergence when ANSI is on.
  • [Bug] next_day trims whitespace from dayOfWeek; Spark does not (#4450)
    • Area labels: area:expressions
    • Rationale: returns a date where Spark returns NULL, an unconditional silent wrong result.
  • [Bug] next_day does not throw under spark.sql.ansi.enabled=true (#4449)
    • Area labels: area:expressions
    • Rationale: returns NULL instead of the Spark ANSI error, a silent divergence when ANSI is on.
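This Rust sketch can help reviewers spot-check the NaN call on #4482. It shows the ordering the Spark docs describe for array_max / array_min: NaN is greater than any non-NaN element, and NULL elements are skipped. It is illustrative only, not Comet's implementation, and the function names are hypothetical. It also shows one way such a divergence can arise. Rust's `f64::max` returns the non-NaN argument, so a naive fold silently drops NaN.

```rust
use std::cmp::Ordering;

/// Hypothetical comparator following the Spark-documented rule:
/// NaN is greater than every non-NaN value, and NaN equals NaN.
fn spark_nan_cmp(a: f64, b: f64) -> Ordering {
    match (a.is_nan(), b.is_nan()) {
        (true, true) => Ordering::Equal,
        (true, false) => Ordering::Greater,
        (false, true) => Ordering::Less,
        // Neither value is NaN, so partial_cmp always returns Some.
        (false, false) => a.partial_cmp(&b).unwrap(),
    }
}

/// Hypothetical array_max: NULLs are skipped; an empty or all-NULL array yields NULL.
fn array_max(values: &[Option<f64>]) -> Option<f64> {
    values.iter().flatten().copied().max_by(|a, b| spark_nan_cmp(*a, *b))
}

/// Hypothetical array_min with the same NaN and NULL rules.
fn array_min(values: &[Option<f64>]) -> Option<f64> {
    values.iter().flatten().copied().min_by(|a, b| spark_nan_cmp(*a, *b))
}
```

With `[1.0, NaN, 3.0]`, `array_max` returns NaN and `array_min` returns 1.0. A `fold(f64::NEG_INFINITY, f64::max)` over the same values returns 3.0.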

priority:high

  • CreateArray with nullability-divergent children panics in native make_array (#4528)
    • Area labels: area:expressions
    • Rationale: native panic (assertion failure in make_array); decision-tree step 2.
  • ConstantColumnVector inputs fail Comet export with "Comet execution only takes Arrow Arrays" (#4527)
    • Area labels: area:ffi
    • Rationale: unhandled exception on a supported path (partition / constant columns, e.g. OPTIMIZE).
  • native shuffle: get_string should not panic on non-UTF-8 bytes (use lossy decode) (#4521)
    • Area labels: area:shuffle
    • Rationale: native panic in shuffle on non-UTF-8 string bytes.
  • CometScanRule: decline native V1 scans on object_store-unsupported filesystem schemes (#4520)
    • Area labels: area:scan
    • Rationale: native scan crashes at execution on custom filesystem schemes instead of falling back.
  • [Bug] CAST(BinaryType AS StringType) uses unsafe from_utf8_unchecked (undefined behaviour) (#4488)
    • Area labels: area:expressions
    • Rationale: Rust undefined behaviour / memory-safety risk on the cast path (see escalation note).
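The two UTF-8 items (#4521, #4488) both turn on how raw bytes become a string. This sketch shows the safe standard-library alternatives to `str::from_utf8_unchecked`:

* `std::str::from_utf8` validates the bytes and returns an error on invalid input.
* `String::from_utf8_lossy` substitutes U+FFFD for each invalid sequence. This is the lossy decode #4521 asks for.

The helper names are hypothetical. Whether lossy replacement matches Spark's output for a given cast is an assumption to check against Spark, not something this report establishes.

```rust
use std::borrow::Cow;

/// Hypothetical checked decode: returns None instead of invoking
/// undefined behaviour when the bytes are not valid UTF-8.
fn decode_checked(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

/// Hypothetical lossy decode: each invalid sequence becomes U+FFFD.
/// Borrows the input when it is already valid, so the common case does not allocate.
fn decode_lossy(bytes: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(bytes)
}
```

Neither helper can panic or read invalid UTF-8 into a `&str`. The choice between them is whether invalid input should be rejected or replaced.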

priority:medium

  • Native scan file-read failures should surface as Spark's FAILED_READ_FILE.NO_HINT (#4529)
    • Area labels: area:scan
    • Rationale: error-compatibility gap (raw native message and missing path) with a fallback workaround.
  • Deep AND/OR predicate chains overflow protobuf recursion limit when the serialized plan is re-parsed (#4526)
    • Area labels: area:expressions
    • Rationale: query fails on deep chains, but the trigger (>100 operands) is uncommon and degrades to a clean error.
  • Revert transition-heavy stages to Spark row-based execution (#4518)
    • Area labels: none
    • Rationale: performance optimization for stages that accumulate many C2R/R2C transitions.
  • Native divide-by-zero in a dispatched ScalaUDF surfaces CometNativeException instead of SparkArithmeticException (#4517)
    • Area labels: area:expressions
    • Rationale: wrong exception class under ANSI (the query errors either way; only the exception type differs).
  • CometProject and CometHashAggregate do not perform cross-sibling subexpression elimination over ScalaUDF (#4516)
    • Area labels: area:expressions, area:aggregation
    • Rationale: result correct but UDF invoked N times instead of once, a performance gap for UDF-heavy queries.
  • DataFusion / DataFusion-Spark functions whose Arrow return type drifts from Spark catalyst's declared type (#4515)
    • Area labels: area:ffi, area:expressions
    • Rationale: latent type drift (masked today by FFI re-stamping) that would surface as errors once FFI hops are reduced.
  • map expression audit follow-ups (from chore(audit): audit map expressions across Spark 3.4.3, 3.5.8, 4.0.1, 4.1.1 #4478) (#4505)
    • Area labels: area:expressions
    • Rationale: deferred audit follow-up tracker, mostly support-level / serde consistency work (see escalation note).
  • collection expression audit follow-ups (from chore(audit): audit collection expressions across Spark 3.4.3, 3.5.8, 4.0.1, 4.1.1 #4473) (#4504)
    • Area labels: area:expressions
    • Rationale: deferred audit follow-up tracker, mostly support-level / serde consistency work (see escalation note).
  • array expression audit follow-ups (from chore(audit): audit remaining array expressions across Spark 3.4.3, 3.5.8, 4.0.1, 4.1.1 #4483) (#4503)
    • Area labels: area:expressions
    • Rationale: deferred audit follow-up tracker, mostly support-level / serde consistency work (see escalation note).
  • date/time expression audit follow-ups (from chore: audit date/time expressions #4448) (#4502)
    • Area labels: area:expressions
    • Rationale: deferred audit follow-up tracker, mostly support-level / serde consistency work (see escalation note).
  • cast expression audit follow-ups (from chore(audit): audit cast across Spark 3.4.3, 3.5.8, 4.0.1, 4.1.1 #4493) (#4501)
    • Area labels: area:expressions
    • Rationale: deferred audit follow-up tracker, mostly support-level / serde consistency work (see escalation note).
  • Math expression audit follow-ups (from chore(audit): audit math expressions across Spark 3.4.3, 3.5.8, 4.0.1, 4.1.1 #4486) (#4500)
    • Area labels: area:expressions
    • Rationale: deferred audit follow-up tracker, mostly support-level / serde consistency work (see escalation note).
  • [Feature] CAST(MapType AS MapType) falls back even though native cast_map_to_map exists (#4491)
    • Area labels: area:expressions
    • Rationale: missing cast support, falls back to Spark (correct but unaccelerated).
  • [Bug] try_mod falls back to Spark because CometRemainder rejects EvalMode.TRY (#4484)
    • Area labels: area:expressions
    • Rationale: feature gap, falls back to Spark; result correct via fallback.
  • [Feature] support size() for MapType inputs (#4472)
    • Area labels: area:expressions
    • Rationale: missing expression support with a Spark fallback.
  • [Feature] support concat() for BinaryType and ArrayType inputs (#4471)
    • Area labels: area:expressions
    • Rationale: missing expression support with a Spark fallback.
  • [Bug] CometCaseConversionBase gates compat inside convert() instead of getSupportLevel (#4467)
    • Area labels: area:expressions
    • Rationale: the allowIncompatible config is bypassed for upper/lower, a functional config bug.
  • [Bug] bit_length and octet_length error natively for BinaryType input instead of falling back (#4464)
    • Area labels: area:expressions
    • Rationale: native execution error on binary input instead of a clean fallback; uncommon input, workaround exists.
  • Bound CometS3CredentialDispatcher cache via refcounted handle lifecycle (#4456)
    • Area labels: area:scan
    • Rationale: unbounded cache growth on long-running JVMs (eventual OOM), a conditional degradation.
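The depth arithmetic behind #4526 works as follows. A left-deep chain of n AND operands nests n levels, so depth tracks the operand count, consistent with the >100-operand trigger in the rationale. A balanced regrouping of the same operands nests only ceil(log2 n) + 1 levels.

This Rust sketch uses a hypothetical `Expr` type. It counts tree levels, not protobuf message nesting, which may add more than one level per node. The issue does not state that rebalancing is the intended fix. Rebalancing relies on AND (and likewise OR) being associative, and it keeps operands in their original left-to-right order.

```rust
/// Hypothetical predicate tree; OR would be handled identically.
enum Expr {
    Leaf(usize),
    And(Box<Expr>, Box<Expr>),
}

/// ((((c0 AND c1) AND c2) AND c3) ...): the shape a naive fold produces.
fn left_deep(n: usize) -> Expr {
    assert!(n > 0, "need at least one operand");
    let mut e = Expr::Leaf(0);
    for i in 1..n {
        e = Expr::And(Box::new(e), Box::new(Expr::Leaf(i)));
    }
    e
}

/// Same operands c[lo..hi], split at the midpoint so depth grows logarithmically.
fn balanced(lo: usize, hi: usize) -> Expr {
    assert!(hi > lo, "need at least one operand");
    if hi - lo == 1 {
        return Expr::Leaf(lo);
    }
    let mid = lo + (hi - lo) / 2;
    Expr::And(Box::new(balanced(lo, mid)), Box::new(balanced(mid, hi)))
}

/// Number of tree levels; a lone leaf has depth 1.
fn depth(e: &Expr) -> usize {
    match e {
        Expr::Leaf(_) => 1,
        Expr::And(l, r) => 1 + depth(l).max(depth(r)),
    }
}

/// Collects leaves left to right, to check that regrouping keeps operand order.
fn leaves(e: &Expr, out: &mut Vec<usize>) {
    match e {
        Expr::Leaf(i) => out.push(*i),
        Expr::And(l, r) => {
            leaves(l, out);
            leaves(r, out);
        }
    }
}
```

With 150 operands, the left-deep form is 150 levels deep and the balanced form is 9.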

priority:low

  • CI lint check passed, but then later jobs failed with lint errors (#4545)
    • Area labels: area:ci
    • Rationale: CI/tooling lint inconsistency (see escalation note).
  • PlanDataInjector does N x M canInject calls per operator tree (#4530)
    • Area labels: none
    • Rationale: minor micro-optimization, explicitly no behavior change.
  • Do another audit sweep for string collation differences (#4496)
    • Area labels: area:expressions
    • Rationale: process / tooling task (audit sweep), no concrete defect identified.
  • [Doc] CAST has no explicit TimeType branch (Spark 4.1) (#4490)
    • Area labels: area:expressions
    • Rationale: documentation / support-level gap; the fallback itself is correct.
  • [Doc] CAST collated-string handling on Spark 4.0+ is implicit and untested (#4489)
    • Area labels: area:expressions
    • Rationale: documentation / test gap; current fallback behavior is correct.
  • [Bug] width_bucket bypasses CometExpressionSerde framework (#4485)
    • Area labels: area:expressions
    • Rationale: serde-framework consistency refactor; no wrong result or crash.
  • [Doc] decode does not appear in auto-generated compatibility docs (#4466)
    • Area labels: area:expressions
    • Rationale: documentation gap (decode wired via shim, not a serde).

Escalations to consider

  • [Bug] CAST(BinaryType AS StringType) uses unsafe from_utf8_unchecked (undefined behaviour) (#4488)
    • Labeled priority:high for memory safety. Per the guide's "high crash that also produces wrong results silently" trigger, undefined behaviour that could silently corrupt output may warrant priority:critical.
  • CI lint check passed, but then later jobs failed with lint errors (#4545)
    • Labeled priority:low. Per the guide, a CI issue that consistently blocks PR merges should escalate to priority:medium.
  • Audit follow-up trackers (#4505, #4504, #4503, #4502, #4501, #4500)
    • Each bundles many sub-items of mixed severity, including Spark 4.0+ non-default-collation correctness gaps that silently diverge. Labeled priority:medium as trackers; the reviewer may want to split the collation sub-items into standalone priority:critical issues.

Skipped — needs more info

  • [EPIC] Support Spark interval types (CalendarInterval / YearMonthInterval / DayTimeInterval) and interval expressions (#4540)
    • Open-ended EPIC umbrella; a single priority is a roadmap decision rather than a mechanical triage call.
  • [EPIC] Provide JVM/codegen-dispatch implementations for Incompatible expressions so they never fall back by default (#4506)
    • Open-ended EPIC umbrella; a single priority is a roadmap decision rather than a mechanical triage call.
  • Discussion: Should Comet add geospatial (ST_*) function support? (#4455)
    • Discussion / scope question needing community and maintainer input, not a triageable defect.
  • Bug triage results: 2026-05-26 (#4441)
    • Prior triage summary issue (auto-labeled requires-triage); meta, awaiting human review and closure, not a bug.
  • Bug triage results: 2026-05-18 (#4359)
    • Prior triage summary issue (auto-labeled requires-triage); meta, awaiting human review and closure, not a bug.
  • Bug triage results: 2026-05-11 (#4287)
    • Prior triage summary issue (auto-labeled requires-triage); meta, awaiting human review and closure, not a bug.
