Describe the bug
Routing a map-typed expression through the JVM codegen dispatcher (CometScalaUDF.emitJvmCodegenDispatch / CometCodegenDispatch) produces incorrect results: map keys (and likely values) are corrupted in the output MapType array.
CometBatchKernelCodegen.canHandle accepts MapType (isSupportedDataType returns true for maps), so the dispatcher emits a kernel for the expression, but the kernel does not marshal map output correctly back through Arrow FFI.
To Reproduce
Routing map_concat through CometCodegenDispatch and running:
SELECT map_concat(map(1, 'a', 2, 'b'), map(3, 'c'))
produces:
Spark : Map(1 -> a, 2 -> b, 3 -> c)
Comet : Map(1 -> a, 2 -> b, 0 -> c) -- key 3 corrupted to 0
The query is executed natively (a CometProject is produced, with no fallback), so this is a wrong-answer bug rather than a fallback problem. A map built from a column (e.g. map_concat(map(id, 'a'), map(id + 10, 'b'))) came out correct in the same test, so the corruption appears to be tied to how literal map entries (or some subset of map entries) are marshaled.
Expected behavior
The codegen dispatcher should either evaluate map-typed output identically to Spark, or canHandle should reject MapType output so such expressions fall back cleanly instead of returning wrong results.
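The second option (rejecting map output so the expression falls back) could look roughly like the sketch below. It is an illustration only, not the actual Comet code: `DType` and its cases are a simplified stand-in for Spark's `DataType` hierarchy, and `OutputTypeGuard`, `containsMap`, and `canHandleOutput` are hypothetical names. Rejecting maps nested inside arrays or structs is also an assumption; the bug has only been observed for top-level map output.

```scala
// Simplified stand-in for Spark's DataType hierarchy (hypothetical, for illustration only).
sealed trait DType
case object IntT extends DType
case object StringT extends DType
final case class ArrayT(element: DType) extends DType
final case class StructT(fields: List[DType]) extends DType
final case class MapT(key: DType, value: DType) extends DType

object OutputTypeGuard {
  // True if the type is a map or nests a map anywhere inside it.
  def containsMap(dt: DType): Boolean = dt match {
    case MapT(_, _)      => true
    case ArrayT(element) => containsMap(element)
    case StructT(fields) => fields.exists(containsMap)
    case _               => false
  }

  // Hypothetical guard: reject map-bearing output types so the expression
  // falls back instead of going through the codegen dispatcher.
  def canHandleOutput(dt: DType): Boolean = !containsMap(dt)
}
```

With a guard of this shape, `OutputTypeGuard.canHandleOutput(MapT(IntT, StringT))` is false, so an expression such as map_concat would fall back, while non-map output types are still accepted.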
Impact
This blocks routing any map-output expression through the dispatcher. Affected expressions (currently kept on the fallback / native path to avoid the bug):
map_concat
map / create_map
map_from_entries (the Incompatible BinaryType key/value branch)
In #4538 these opt out of codegen dispatch (allowIncompatCodegenDispatch = false for map_from_entries; map_concat / create_map are not registered) so the bug is not currently user-visible, but it prevents arrow-native-via-dispatch coverage for map functions and is a latent correctness hazard for any future map-output dispatch.
Additional context
spark/src/main/scala/org/apache/comet/codegen/CometBatchKernelCodegen.scala (isSupportedDataType / canHandle) and the JvmScalarUdf execution path in native code.