
CreateArray with nullability-divergent children panics in native make_array #4528

@schenksj


Problem

DataFusion's make_array asserts strict element-type equality in MutableArrayData::with_capacities and panics on a mismatch. Spark's CreateArray is more permissive: its type coercion compares element types with sameType, which ignores nullability. Children that share a surface type but differ only in a nested struct field's nullability therefore get no unifying cast, and native execution panics inside make_array_inner:

native panic: assertion `left == right` failed: Arrays with inconsistent types passed to MutableArrayData

Reproducible with array(struct(id, ct = lit("a")), struct(id, ct = <nullable expr>)), where one arm's ct is non-nullable (a literal) and the other arm's ct is nullable.
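To illustrate why no cast is inserted for this reproduction, here is a minimal sketch in plain Python. It is a toy model, not Spark or DataFusion code: the classes `Atomic`, `Field`, `Struct` and the function `same_type_ignoring_nullability` are hypothetical stand-ins, and the column types (`long` for id, `string` for ct) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Tuple


# Toy type model (hypothetical; not Spark's org.apache.spark.sql.types classes).
@dataclass(frozen=True)
class Atomic:
    name: str


@dataclass(frozen=True)
class Field:
    name: str
    dtype: object
    nullable: bool


@dataclass(frozen=True)
class Struct:
    fields: Tuple[Field, ...]


def same_type_ignoring_nullability(a, b) -> bool:
    """Nullability-blind comparison, standing in for the sameType check."""
    if isinstance(a, Struct) and isinstance(b, Struct):
        return len(a.fields) == len(b.fields) and all(
            fa.name == fb.name
            and same_type_ignoring_nullability(fa.dtype, fb.dtype)
            for fa, fb in zip(a.fields, b.fields)
        )
    return a == b


# Assumed column types: id is a non-nullable long, ct is a string.
literal_arm = Struct((
    Field("id", Atomic("long"), False),
    Field("ct", Atomic("string"), False),  # lit("a") is non-nullable
))
nullable_arm = Struct((
    Field("id", Atomic("long"), False),
    Field("ct", Atomic("string"), True),   # <nullable expr>
))

# The coercion-style check sees no difference, so no cast is added ...
coercion_sees_same_type = same_type_ignoring_nullability(literal_arm, nullable_arm)
# ... but a strict equality check (as in the native assertion) does.
strictly_equal = literal_arm == nullable_arm

print(coercion_sees_same_type, strictly_equal)  # prints: True False
```

The gap between the two results is the window in which the children reach native execution with types that the strict assertion rejects.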

Scope (narrower than "any nullability difference")

DataFusion tolerates container nullability differences: an ArrayType.containsNull or MapType.valueContainsNull mismatch is coerced and runs natively (e.g. array(array(int) non-null-elem, array(int) nullable-elem)). Only a difference in a struct field's nullability actually panics. Any fallback guard must therefore not fire on the container-only case; otherwise it over-declines and loses native execution for legitimate arrays of arrays or maps.

Proposed fix

In CometCreateArray, decline serialization (withFallbackReason) only when the children still differ after normalizing container nullability: force ArrayType.containsNull and MapType.valueContainsNull to true, but keep struct field nullability significant. Expressions that are declined fall back to Spark's JVM evaluator, which has no such strictness.
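The normalization rule can be sketched as follows. This is a self-contained Python model under stated assumptions, not the actual CometCreateArray code: the type classes and the names `normalize_container_nullability` and `should_decline` are hypothetical, and the real change would operate on Spark's DataType classes in Scala.

```python
from dataclasses import dataclass
from typing import Tuple


# Toy type model (hypothetical; mirrors only the shape of Spark's types).
@dataclass(frozen=True)
class Atomic:
    name: str


@dataclass(frozen=True)
class Field:
    name: str
    dtype: object
    nullable: bool


@dataclass(frozen=True)
class Struct:
    fields: Tuple[Field, ...]


@dataclass(frozen=True)
class Array:
    element: object
    contains_null: bool


@dataclass(frozen=True)
class Map:
    key: object
    value: object
    value_contains_null: bool


def normalize_container_nullability(dt):
    """Force container nullability flags to True; keep struct field flags."""
    if isinstance(dt, Array):
        return Array(normalize_container_nullability(dt.element), True)
    if isinstance(dt, Map):
        return Map(
            normalize_container_nullability(dt.key),
            normalize_container_nullability(dt.value),
            True,
        )
    if isinstance(dt, Struct):
        # Field nullability stays significant: it is copied, not overwritten.
        return Struct(tuple(
            Field(f.name, normalize_container_nullability(f.dtype), f.nullable)
            for f in dt.fields
        ))
    return dt


def should_decline(child_types) -> bool:
    """Decline only if children still differ after normalization.

    Assumes earlier coercion already unified the children's surface types.
    """
    return len({normalize_container_nullability(t) for t in child_types}) > 1


INT = Atomic("int")

# Container-only difference: stays native.
container_only = [Array(INT, False), Array(INT, True)]

# Struct field difference, also when nested inside an array: declined.
ct_required = Struct((Field("ct", Atomic("string"), False),))
ct_nullable = Struct((Field("ct", Atomic("string"), True),))
struct_field = [ct_required, ct_nullable]
nested_struct_field = [Array(ct_required, True), Array(ct_nullable, True)]

print(should_decline(container_only))       # prints: False
print(should_decline(struct_field))         # prints: True
print(should_decline(nested_struct_field))  # prints: True
```

Recursing through containers matters here: a divergent struct field nested inside an array or map value would still reach the strict native check, so the sketch normalizes only the container flags at every level and compares what remains.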

This tracks upstream apache/datafusion#22366; the caller-side decline can be removed once that fix lands.

Relationship to the Delta integration

This is a standalone guard against a native panic. It was surfaced by the in-progress Delta Lake contrib integration (Delta's CDC write path builds one struct per change type, leaving _change_type nullability divergent across arms), so it would help to prioritize it alongside that work.
