[CALCITE-7628] Interpreter gives wrong result for query with MINUS or INTERSECT with 3 or more inputs #5055

Open
julianhyde wants to merge 8 commits into apache:main from julianhyde:7628-interpreter-minus


julianhyde and others added 8 commits June 28, 2026 13:50
…with 3 or more inputs return wrong result

Add test cases.
… inputs return wrong result

In the interpreter, a query where MINUS or INTERSECT has 3 or more
inputs previously returned the wrong result, because SetOpNode evaluated
only the first two inputs. It now evaluates all inputs.

We add tests in a new file, `interpreter.iq`, that runs SQL queries using
the interpreter.
SetOpNode evaluates EXCEPT ALL by counting occurrences in a
Map<Row, Integer>: each row from the first input increments the count
for its value, each row from a later input decrements it, and a value
whose count reaches zero is removed from the map. After all inputs have
been read, each value remaining in the map is emitted as many times as
its surviving count. The later inputs are streamed rather than buffered.

testInterpretMinusAll no longer asserts row order, which the map-based
output does not preserve.
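
The counting scheme described above can be sketched as follows. This is an illustration only, not the code in SetOpNode: the class `ExceptAllSketch`, its method, and the generic row type `R` (standing in for the interpreter's `Row`) are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of EXCEPT ALL over N inputs; not Calcite's SetOpNode. */
final class ExceptAllSketch {
  private ExceptAllSketch() {}

  static <R> List<R> exceptAll(List<? extends Iterable<R>> inputs) {
    if (inputs.size() < 2) {
      throw new IllegalArgumentException("need at least two inputs");
    }
    // Count occurrences of each value in the first input.
    Map<R, Integer> counts = new HashMap<>();
    Iterable<R> first = inputs.get(0);
    for (R row : first) {
      counts.merge(row, 1, Integer::sum);
    }
    // Stream each later input: decrement, and drop the entry at zero.
    for (int i = 1; i < inputs.size(); i++) {
      Iterable<R> later = inputs.get(i);
      for (R row : later) {
        // Returning null from the remapping function removes the mapping.
        counts.computeIfPresent(row, (k, n) -> n == 1 ? null : n - 1);
      }
    }
    // Emit each surviving value as many times as its remaining count.
    // HashMap iteration order is unspecified, so output order is too.
    List<R> out = new ArrayList<>();
    counts.forEach((row, n) -> {
      for (int j = 0; j < n; j++) {
        out.add(row);
      }
    });
    return out;
  }
}
```

Because the result is read back out of a hash map, a caller that compares results should sort them or compare them as multisets, which is consistent with the test no longer asserting row order.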

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Switch on setOp.kind, dispatching to one method per operation (union,
intersect, minus), with a shared helper that buffers the inputs for
intersect and minus.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…terpreter

Implement INTERSECT ALL in SetOpNode with a Map<Row, CountPair> that
tracks, per value, the running minimum multiplicity over the inputs read
so far and the value's count in the current input. Both are reduced in a
single pass per intermediate input, and each value present in the last
input is emitted min(min, current) times.
Use a mutable Count holder for the EXCEPT ALL occurrence counts.

testBindableIntersect no longer asserts row order, which INTERSECT ALL
does not define.
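
One possible reading of the min/current bookkeeping is sketched below. It is an assumption about the shape of the algorithm, not the code in SetOpNode: `IntersectAllSketch`, the field names on `CountPair`, and the generic row type `R` are hypothetical, and this version folds the counts in a separate sweep over the map after each input rather than reproducing the exact single-pass structure of the commit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of INTERSECT ALL over N inputs; not Calcite's SetOpNode. */
final class IntersectAllSketch {
  private IntersectAllSketch() {}

  /** Mutable pair: minimum multiplicity so far, and count in the current input. */
  static final class CountPair {
    int min;
    int current;
  }

  static <R> List<R> intersectAll(List<? extends Iterable<R>> inputs) {
    if (inputs.size() < 2) {
      throw new IllegalArgumentException("need at least two inputs");
    }
    // The first input seeds the minimum multiplicity of each value.
    Map<R, CountPair> counts = new HashMap<>();
    Iterable<R> first = inputs.get(0);
    for (R row : first) {
      counts.computeIfAbsent(row, k -> new CountPair()).min++;
    }
    for (int i = 1; i < inputs.size(); i++) {
      // Count occurrences in this input, ignoring values already ruled out.
      Iterable<R> later = inputs.get(i);
      for (R row : later) {
        CountPair pair = counts.get(row);
        if (pair != null) {
          pair.current++;
        }
      }
      // Fold the current count into the running minimum; drop dead values.
      Iterator<CountPair> it = counts.values().iterator();
      while (it.hasNext()) {
        CountPair pair = it.next();
        pair.min = Math.min(pair.min, pair.current);
        pair.current = 0;
        if (pair.min == 0) {
          it.remove();
        }
      }
    }
    // Emit each surviving value min-multiplicity times, in no defined order.
    List<R> out = new ArrayList<>();
    counts.forEach((row, pair) -> {
      for (int j = 0; j < pair.min; j++) {
        out.add(row);
      }
    });
    return out;
  }
}
```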

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Avoid materializing every input into a List<Set<Row>>. EXCEPT DISTINCT
streams each later input directly into Set.remove on the running result;
INTERSECT DISTINCT processes one input at a time, streaming it into a new
set of the rows it shares with the result so far. Peak memory is now
bounded by the result set plus one input rather than the sum of all
inputs. Replace readInputs() with a single-source read(Source) helper.

Require at least two inputs in the constructor.
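
A minimal sketch of the two streaming strategies, assuming each input can be consumed as an `Iterable`. The class `DistinctSetOpsSketch`, its methods, and the row type `R` are hypothetical and stand in for the real SetOpNode code and its `Source`-based reads.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of streaming EXCEPT/INTERSECT DISTINCT; not Calcite's code. */
final class DistinctSetOpsSketch {
  private DistinctSetOpsSketch() {}

  /** Reads a single input into a set, discarding duplicates. */
  private static <R> Set<R> read(Iterable<R> source) {
    Set<R> rows = new HashSet<>();
    for (R row : source) {
      rows.add(row);
    }
    return rows;
  }

  private static void requireTwo(List<?> inputs) {
    if (inputs.size() < 2) {
      throw new IllegalArgumentException("need at least two inputs");
    }
  }

  static <R> Set<R> exceptDistinct(List<? extends Iterable<R>> inputs) {
    requireTwo(inputs);
    Set<R> result = read(inputs.get(0));
    for (int i = 1; i < inputs.size(); i++) {
      // Each later input is streamed straight into Set.remove; never buffered.
      Iterable<R> later = inputs.get(i);
      for (R row : later) {
        result.remove(row);
      }
    }
    return result;
  }

  static <R> Set<R> intersectDistinct(List<? extends Iterable<R>> inputs) {
    requireTwo(inputs);
    Set<R> result = read(inputs.get(0));
    for (int i = 1; i < inputs.size(); i++) {
      // Keep only the rows of this input that are already in the result.
      Set<R> next = new HashSet<>();
      Iterable<R> later = inputs.get(i);
      for (R row : later) {
        if (result.contains(row)) {
          next.add(row);
        }
      }
      result = next;
    }
    return result;
  }
}
```

In this sketch, at most two sets are live at once (the running result and the one being built), and the new set can never be larger than the old one.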

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…empty

When the running result (for the distinct paths) or the count map (for
the ALL paths) becomes empty, the answer can no longer change, so stop
reading the remaining inputs.
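
The early exit can be illustrated with INTERSECT DISTINCT. The sketch below is hypothetical (`EarlyExitSketch` and its `inputsRead` counter exist only to make the behavior observable) and is not taken from SetOpNode.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of stopping early once the result is empty. */
final class EarlyExitSketch {
  /** Number of inputs read by the last call; for illustration only. */
  int inputsRead;

  <R> Set<R> intersectDistinct(List<? extends Iterable<R>> inputs) {
    inputsRead = 0;
    Set<R> result = new HashSet<>();
    Iterable<R> first = inputs.get(0);
    for (R row : first) {
      result.add(row);
    }
    inputsRead++;
    for (int i = 1; i < inputs.size(); i++) {
      if (result.isEmpty()) {
        // An empty intersection stays empty; skip the remaining inputs.
        break;
      }
      Set<R> next = new HashSet<>();
      Iterable<R> later = inputs.get(i);
      for (R row : later) {
        if (result.contains(row)) {
          next.add(row);
        }
      }
      inputsRead++;
      result = next;
    }
    return result;
  }
}
```

With inputs `[1, 2]`, `[3]`, `[1, 2]`, the result is already empty after the second input, so this sketch never opens the third.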

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2 participants