[EPIC] Support Spark interval types (CalendarInterval / YearMonthInterval / DayTimeInterval) and interval expressions #4540

@andygrove

What is the problem the feature request solves?

Comet has no support for Spark's interval data types:

  • CalendarIntervalType (months + days + microseconds)
  • YearMonthIntervalType (ANSI INTERVAL YEAR TO MONTH)
  • DayTimeIntervalType (ANSI INTERVAL DAY TO SECOND)

Because these types are unsupported, every expression that produces or consumes an interval falls back to Spark, as does any query that carries an interval column through a Comet operator. CometBatchKernelCodegen.isSupportedDataType also rejects these types, so they cannot even be routed through the JVM codegen dispatcher (see #4506 / #4538). Interval expressions are therefore a genuine Arrow-native gap with no stopgap.

This issue tracks the foundational type support plus the dependent expression family. It is the prerequisite for the already-filed per-expression requests below.

Describe the potential solution

Type support (prerequisite)

  • Map the three Spark interval types to Arrow:
    • YearMonthIntervalType -> Arrow Interval(YearMonth)
    • DayTimeIntervalType -> Arrow Interval(MonthDayNano) or Duration (to be decided: the chosen representation must round-trip with Spark's microsecond storage)
    • CalendarIntervalType -> Arrow Interval(MonthDayNano) (Spark stores months/days/micros)
  • Wire the types through the CometVector hierarchy, FFI import/export (NativeUtil / scan.rs), and serializeDataType in QueryPlanSerde.
  • Allow these types in CometBatchKernelCodegen.isSupportedDataType once the FFI path is correct, so codegen dispatch can also cover interval expressions.
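
The round-trip concern in the mapping above can be illustrated with a minimal, dependency-free sketch. It assumes the Arrow Interval(MonthDayNano) value layout (32-bit months, 32-bit days, 64-bit nanoseconds) and Spark's months/days/microseconds storage. The struct and function names (SparkCalendarInterval, MonthDayNano, to_month_day_nano, from_month_day_nano) are hypothetical and are not Comet or arrow-rs APIs. The point is that the microsecond-to-nanosecond scaling can overflow in one direction and lose precision in the other, so both conversions are fallible here:

```rust
/// Hypothetical stand-in for Spark's CalendarInterval storage:
/// months, days, and microseconds.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SparkCalendarInterval {
    months: i32,
    days: i32,
    microseconds: i64,
}

/// Hypothetical stand-in mirroring the field layout of an Arrow
/// Interval(MonthDayNano) value.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MonthDayNano {
    months: i32,
    days: i32,
    nanoseconds: i64,
}

const NANOS_PER_MICRO: i64 = 1_000;

/// Spark -> Arrow. Returns None when microseconds * 1000 does not fit in i64.
fn to_month_day_nano(v: SparkCalendarInterval) -> Option<MonthDayNano> {
    let nanoseconds = v.microseconds.checked_mul(NANOS_PER_MICRO)?;
    Some(MonthDayNano {
        months: v.months,
        days: v.days,
        nanoseconds,
    })
}

/// Arrow -> Spark. Returns None when the value carries sub-microsecond
/// precision that Spark's storage cannot represent.
fn from_month_day_nano(v: MonthDayNano) -> Option<SparkCalendarInterval> {
    if v.nanoseconds % NANOS_PER_MICRO != 0 {
        return None;
    }
    Some(SparkCalendarInterval {
        months: v.months,
        days: v.days,
        microseconds: v.nanoseconds / NANOS_PER_MICRO,
    })
}
```

Under these assumptions, an interval of 14 months, -3 days, and 1,500,000 microseconds converts to 1,500,000,000 nanoseconds and back unchanged, while a microsecond field of i64::MAX is rejected rather than silently wrapped. How the real implementation should treat such values (error, null, or fallback) is left open by the issue. For DayTimeIntervalType, a microsecond-unit Duration would avoid the scaling step entirely, which is one argument for that option.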

Expressions (depend on the type work)

Constructors and arithmetic already tracked individually:

(The list of per-expression issues is derived from the // datetime functions section of FunctionRegistry; this umbrella should be linked from each.)

Additional context
