Problem
DataFusion's make_array asserts strict element-type equality in MutableArrayData::with_capacities and panics on a mismatch. Spark's CreateArray is more permissive: its type coercion compares element types with sameType, which ignores nullability, so children that share a surface type but differ only in a nested struct field's nullability get no unifying cast. Native execution then panics inside make_array_inner:
native panic: assertion `left == right` failed: Arrays with inconsistent types passed to MutableArrayData
Reproducible with array(struct(id, ct = lit("a")), struct(id, ct = <nullable expr>)): ct is non-nullable in the first arm (a literal) and nullable in the second.
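The mismatch can be illustrated with a small, self-contained Python model. This is a sketch only, not Spark or DataFusion code: the classes `Atomic`, `Field`, and `Struct` and the function `same_type_ignoring_nullability` are hypothetical stand-ins, and the nullability chosen for `id` is an assumption. The lenient comparison stands in for the nullability-ignoring `sameType` check described above, and plain `==` stands in for the strict equality that the native side asserts.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Atomic:
    """Hypothetical stand-in for a primitive type such as long or string."""
    name: str


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a struct field, with its nullability flag."""
    name: str
    dtype: "DType"
    nullable: bool


@dataclass(frozen=True)
class Struct:
    """Hypothetical stand-in for a struct type."""
    fields: Tuple[Field, ...]


DType = Union[Atomic, Struct]


def same_type_ignoring_nullability(a: DType, b: DType) -> bool:
    """Compare two types while ignoring struct field nullability."""
    if isinstance(a, Struct) and isinstance(b, Struct):
        return len(a.fields) == len(b.fields) and all(
            fa.name == fb.name
            and same_type_ignoring_nullability(fa.dtype, fb.dtype)
            for fa, fb in zip(a.fields, b.fields)
        )
    return a == b


# The two arms of the reproducer: `ct` is non-nullable in one, nullable in the other.
# The nullability of `id` is an assumption made for this illustration.
literal_arm = Struct((
    Field("id", Atomic("long"), True),
    Field("ct", Atomic("string"), False),
))
nullable_arm = Struct((
    Field("id", Atomic("long"), True),
    Field("ct", Atomic("string"), True),
))

# The lenient check sees one type, so no unifying cast would be inserted.
lenient_match = same_type_ignoring_nullability(literal_arm, nullable_arm)
# The strict check sees two different types, which is what triggers the panic.
strict_match = literal_arm == nullable_arm
```

In this model `lenient_match` is true while `strict_match` is false, which is the gap between the two checks that the reproducer falls into.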
Scope (narrower than "any nullability difference")
DataFusion tolerates container nullability differences: an ArrayType.containsNull / MapType.valueContainsNull mismatch is coerced and runs natively (e.g. array(array(int) non-null-elem, array(int) nullable-elem)). Only a difference in a struct field's nullability actually panics. The decline must therefore not fire on the container-only case; otherwise it over-declines and loses native execution for legitimate arrays of arrays or maps.
Proposed fix
In CometCreateArray, decline serialization (withFallbackReason) only when the children's types still differ after normalizing container nullability (force ArrayType.containsNull / MapType.valueContainsNull to true, keep struct field nullability significant). Declined expressions fall back to Spark's JVM evaluator, which has no such strictness.
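The normalize-then-compare rule can be sketched as follows. This is a minimal Python model under stated assumptions, not the Scala code in CometCreateArray: the type classes and the functions `normalize_container_nullability` and `should_decline` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple, Union


@dataclass(frozen=True)
class Atomic:
    name: str


@dataclass(frozen=True)
class Field:
    name: str
    dtype: "DType"
    nullable: bool


@dataclass(frozen=True)
class Struct:
    fields: Tuple[Field, ...]


@dataclass(frozen=True)
class Array:
    element: "DType"
    contains_null: bool


@dataclass(frozen=True)
class Map:
    key: "DType"
    value: "DType"
    value_contains_null: bool


DType = Union[Atomic, Struct, Array, Map]


def normalize_container_nullability(t: DType) -> DType:
    """Force container nullability to True; leave struct field nullability as is."""
    if isinstance(t, Array):
        return Array(normalize_container_nullability(t.element), True)
    if isinstance(t, Map):
        return Map(
            normalize_container_nullability(t.key),
            normalize_container_nullability(t.value),
            True,
        )
    if isinstance(t, Struct):
        return Struct(tuple(
            Field(f.name, normalize_container_nullability(f.dtype), f.nullable)
            for f in t.fields
        ))
    return t


def should_decline(child_types: Iterable[DType]) -> bool:
    """Decline only if the children still differ after normalization."""
    return len({normalize_container_nullability(t) for t in child_types}) > 1


INT, STRING = Atomic("int"), Atomic("string")

# Container-only difference: stays native.
container_only = [Array(INT, False), Array(INT, True)]

# Struct field nullability difference: falls back.
struct_field = [
    Struct((Field("ct", STRING, False),)),
    Struct((Field("ct", STRING, True),)),
]

# Struct field difference nested inside arrays: still falls back.
nested = [
    Array(Struct((Field("ct", STRING, False),)), False),
    Array(Struct((Field("ct", STRING, True),)), True),
]

decisions = [should_decline(c) for c in (container_only, struct_field, nested)]
```

Normalizing before comparing, rather than comparing with full strictness, is what keeps the container-only case on the native path while still catching struct field differences at any nesting depth.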
This tracks upstream apache/datafusion#22366; the caller-side decline can be removed once that fix lands.
Relationship to the Delta integration
This is a standalone guard against a native panic. It is surfaced by the in-progress Delta Lake contrib integration (Delta's CDC write path builds one struct per change type, leaving _change_type nullability divergent across arms), so it would help to prioritize this fix alongside that work.