JVM codegen dispatcher miscompiles map-typed (MapType) output #4539

@andygrove

Describe the bug

Routing a map-typed expression through the JVM codegen dispatcher (CometScalaUDF.emitJvmCodegenDispatch / CometCodegenDispatch) produces incorrect results: map keys (and likely values) are corrupted in the output MapType array.

CometBatchKernelCodegen.canHandle accepts MapType (isSupportedDataType returns true for maps), so the dispatcher emits a kernel for the expression, but the kernel does not marshal map output correctly back through Arrow FFI.

To Reproduce

Routing map_concat through CometCodegenDispatch and running:

SELECT map_concat(map(1, 'a', 2, 'b'), map(3, 'c'))

produces:

Spark : Map(1 -> a, 2 -> b, 3 -> c)
Comet : Map(1 -> a, 2 -> b, 0 -> c)     -- key 3 corrupted to 0

The query is executed natively (a CometProject is produced, with no fallback), so this is a wrong-answer bug rather than a fallback. In the same test, a map built from a column (e.g. map_concat(map(id, 'a'), map(id + 10, 'b'))) came out correct, so the corruption appears tied to how literal (or certain other) map entries are marshaled.
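
As a rough illustration of the comparison that surfaces this bug, the sketch below diffs the two results reported above entry by entry. It is plain Python and does not touch Spark or Comet. The helper name diff_map_results is hypothetical, and the two dictionaries are copied by hand from the output shown in this issue.

```python
def diff_map_results(expected: dict, actual: dict) -> dict:
    """Compare two map results entry by entry (hypothetical helper)."""
    # Keys the reference engine produced but the engine under test did not.
    missing = {k: v for k, v in expected.items() if k not in actual}
    # Keys the engine under test produced that the reference never had.
    unexpected = {k: v for k, v in actual.items() if k not in expected}
    # Keys present on both sides whose values disagree.
    changed = {
        k: (expected[k], actual[k])
        for k in expected.keys() & actual.keys()
        if expected[k] != actual[k]
    }
    return {"missing": missing, "unexpected": unexpected, "changed": changed}


# Values transcribed from the issue text above.
spark_result = {1: "a", 2: "b", 3: "c"}
comet_result = {1: "a", 2: "b", 0: "c"}

report = diff_map_results(spark_result, comet_result)
print(report)
```

For the values above, the report lists key 3 as missing and key 0 as unexpected, both carrying the value 'c'. That matches the description of a key being corrupted while its value survives.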

Expected behavior

The codegen dispatcher should either evaluate map-typed output identically to Spark, or canHandle should reject MapType output so such expressions fall back cleanly instead of returning wrong results.
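
The second option (rejecting map output so the expression falls back) could look roughly like the sketch below. It uses a tiny stand-in type model in Python rather than Spark's DataType classes. Every name here (Atomic, ArrayT, MapT, StructT, contains_map, can_handle_output) is hypothetical and does not mirror Comet's actual API. Rejecting maps nested inside arrays or structs is a conservative assumption; the issue only reports top-level map output as broken.

```python
from dataclasses import dataclass


# Minimal stand-ins for a type system; not Spark's or Comet's real classes.
@dataclass(frozen=True)
class Atomic:
    name: str


@dataclass(frozen=True)
class ArrayT:
    element: object


@dataclass(frozen=True)
class MapT:
    key: object
    value: object


@dataclass(frozen=True)
class StructT:
    fields: tuple  # tuple of field types


def contains_map(dt) -> bool:
    """Return True if dt is a map or has a map anywhere inside it."""
    if isinstance(dt, MapT):
        return True
    if isinstance(dt, ArrayT):
        return contains_map(dt.element)
    if isinstance(dt, StructT):
        return any(contains_map(f) for f in dt.fields)
    return False


def can_handle_output(dt) -> bool:
    """Hypothetical guard: refuse any output type that involves a map."""
    return not contains_map(dt)


int_t, str_t = Atomic("int"), Atomic("string")
print(can_handle_output(int_t))                         # scalar output
print(can_handle_output(MapT(int_t, str_t)))            # top-level map
print(can_handle_output(ArrayT(MapT(int_t, str_t))))    # map nested in array
```

A guard of this shape trades coverage for safety: map-output expressions would take the fallback path until the marshaling itself is fixed.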

Impact

This blocks routing any map-output expression through the dispatcher. Affected expressions (currently kept on the fallback / native path to avoid the bug):

  • map_concat
  • map / create_map
  • map_from_entries (the Incompatible BinaryType key/value branch)

In #4538 these expressions opt out of codegen dispatch (allowIncompatCodegenDispatch = false for map_from_entries; map_concat / create_map are not registered), so the bug is not currently user-visible. It does, however, prevent arrow-native-via-dispatch coverage for map functions, and it is a latent correctness hazard for any future map-output dispatch.
