Problem
Comet serializes And/Or as a left-deep BinaryExpr tree (one nesting level per operand). A query with many conjuncts/disjuncts -- e.g. a wide WHERE a AND b AND c AND ..., or generated data-skipping predicates -- produces a proto nested deeper than protobuf's default recursion limit (100). The overflow fires when the serialized Operator is re-parsed: on the JVM via OperatorOuterClass.Operator.parseFrom (e.g. findShuffleScanIndices, explain) and in the Rust prost decoder. The query fails even though Comet can evaluate it.
Why it's safe to rebalance
Comet evaluates And/Or vectorially -- both sides are always evaluated, with no row-level short-circuit -- so the chain is fully associative. Rebalancing the flattened operands into a depth-O(log n) tree is semantically identical; it only changes the proto's shape.
Proposed fix
Add QueryPlanSerde.flattenAssociative (flatten an associative chain to its leaves) and createBalancedBinaryExpr (rebuild as a balanced tree), and route CometAnd/CometOr through them.
Relationship to the Delta integration
This is a standalone correctness/robustness fix for any wide boolean predicate. It is surfaced by the in-progress Delta Lake contrib integration (Delta data-skipping builds deep conjunctions), so it would help to prioritize it alongside that work. A PR will follow shortly.
Problem
Comet serializes
And/Oras a left-deepBinaryExprtree (one nesting level per operand). A query with many conjuncts/disjuncts -- e.g. a wideWHERE a AND b AND c AND ..., or generated data-skipping predicates -- produces a proto nested deeper than protobuf's default recursion limit (100). The overflow fires when the serializedOperatoris re-parsed: on the JVM viaOperatorOuterClass.Operator.parseFrom(e.g.findShuffleScanIndices, explain) and in the Rustprostdecoder. The query fails even though Comet can evaluate it.Why it's safe to rebalance
Comet evaluates
And/Orvectorially -- both sides are always evaluated, with no row-level short-circuit -- so the chain is fully associative. Rebalancing the flattened operands into a depth-O(log n)tree is semantically identical; it only changes the proto's shape.Proposed fix
Add
QueryPlanSerde.flattenAssociative(flatten an associative chain to its leaves) andcreateBalancedBinaryExpr(rebuild as a balanced tree), and routeCometAnd/CometOrthrough them.Relationship to the Delta integration
This is a standalone correctness/robustness fix for any wide boolean predicate. It is surfaced by the in-progress Delta Lake contrib integration (Delta data-skipping builds deep conjunctions), so it would help to prioritize it alongside that work. A PR will follow shortly.