[CALCITE-7437] Type coercion for quantifier operators is incomplete#4877
[CALCITE-7437] Type coercion for quantifier operators is incomplete#4877mihaibudiu merged 1 commit intoapache:mainfrom
Conversation
|
This appears to be an AI-generated solution. We don't object to using AI to solve problems, but it's necessary to confirm that the code changes and comments meet review criteria before submission. For example, the title should be [CALCITE-7437], and the PR description should ideally follow the existing template. |
yeah. The initial draft of this description was generated with the assistance of Gemini to ensure clarity and professional phrasing, then reviewed and refined by me to accurately reflect the technical implementation. The core solution and logic were developed by myself. |
| collectionWidenType = | ||
| binding.getTypeFactory() | ||
| .enforceTypeWithNullability(collectionWidenType, type2.isNullable()); | ||
| if (coerceOperandType(scope, binding.getCall(), 1, collectionWidenType)) { |
There was a problem hiding this comment.
Can you please make sure there is test coverage for both cases (subquery and collections)?
If there are already tests, fine, but if not, please write some.
In particular, this will be tricky when the row type on the RHS contains nested collections or ROW, e.g., ROW(ROW()) or ROW(MAP<VARCHAR, INT ARRAY>>.
There was a problem hiding this comment.
I've added test cases for nested collections and complex ROW types. Please note that I've observed a behavior discrepancy between interpreted and compiled modes for ARRAY[ARRAY[1]] = SOME (ARRAY[ARRAY[1]]) (it currently evaluates to true in interpreted but false in compiled). I've added a test case to track this, but the primary focus of this PR remains the fix for the coercion logic.
There was a problem hiding this comment.
What is "interpreted" mode? You mean, at runtime?
There must be some incorrect optimization step if the compiler folds this to FALSE.
Please file an issue if you cannot diagnose it. If this behavior is introduced by this PR, I think it will have to be resolved before we can merge.
There was a problem hiding this comment.
To be more precise: when running the test directly in IntelliJ IDEA, the result is true, but when running via ./gradlew clean build, the result is false.
There was a problem hiding this comment.
That is really strange, I have never seen this before. If you have a deterministic reproduction using gradle please file an issue about it. If the issue is introduced by this PR, it sounds like something that should be investigated. The only solution I can imagine is by using printf debugging, by logging various results during compilation/evaluation and comparing them with the results you get with the debugger. At some point they have to differ.
There was a problem hiding this comment.
sorry, I found that f.checkBoolean("array[array[1]] = some (array[array[1]])", true) passes in both true and false cases during unit testing, which suggests the assertion may not be executing for this expression. The ./gradlew clean build result is false. I'll investigate why the assertion is being skipped in the unit test and whether the false result from the full build is correct or an optimization issue. I'll follow up with findings.
There was a problem hiding this comment.
What kind of test are you running?
There was a problem hiding this comment.
Try adding "-ea" to the JVM arguments in your IDEA setup. I think this is likely an issue with the IntelliJ IDEA environment, as running with "gradlew" works fine.
There was a problem hiding this comment.
Sorry,my mistake,I have updated the test cases for testQuantifyOperatorsWithTypeCoercion to be more accurate.
Previously, I mistakenly expected array[array[1]] = some (array[array[1]]) to evaluate to true, but it should actually trigger a validation error ('Values passed to = SOME operator must have compatible types') due to type incompatibility.
I also conducted a research on PostgreSQL's behavior for comparison operators with nested arrays. For instance, SELECT ARRAY[1] = ANY(ARRAY[ARRAY[1], ARRAY[2]]); throws an error: operator does not exist: integer[] = integer.
Therefore, I have decided to keep the type coercion logic restrictive and consistent with standard database behavior. The current design, which supports cases like array[1] = some (array[1, 2, 3]), is sufficient and follows the principle of not introducing ambiguous type coercions.
|
So are there tests for more complex nested data types? |
yeah. I have added several test cases covering more complex nested data types to ensure the robustness of the type coercion mechanism. For instance, I included scenarios like ARRAY[ARRAY[1]] IN SOME(SELECT ARRAY[ARRAY[1.0]]). These tests verify that the type system correctly handles nested array dimensions and implicit type promotion (e.g., INTEGER to DECIMAL) during the SOME/ANY quantification process. This helps in explicitly defining the supported boundaries and preventing unexpected behavior with multi-dimensional data structures. |
|
Thank you for your patience! |
Thank you very much for your time and all the detailed feedback throughout this PR process. I've learned a lot from this discussion. thanks everyone. |
|



Summary
Fix RuntimeException when using SOME/ANY/ALL with mismatched types and all quantifiers should work in the same way, like IN.
The issue:
Calcite currently crashes with a RuntimeException when a quantifier operator (like SOME, ANY, or ALL) compares columns with different types—for example, a VARCHAR on the left and a SMALLINT from the subquery.
The error (SELECT deptno, dname > SOME(SELECT empno FROM emp) AS b FROM dept):
'java.lang.RuntimeException: while resolving method 'gt[class java.lang.String, short]' in c org.apache.calcite.runtime.SqlFunctions This happens because the SqlValidator doesn't trigger TypeCoercion for these operators, so no CAST is inserted. When it gets to code generation, Linq4j can't find a way to compare a raw String with a primitive short.
The fix:
I updated TypeCoercionImpl to handle these quantifier operators. Now, it follows the same implicit cast rules we use for standard comparisons or the IN operator.
Testing:
Added a case in sub-query.iq. The plan now correctly shows a CAST (e.g., CAST($1):SMALLINT) before the join condition.
Note: If the data itself is a non-numeric string (like "ACCOUNTING"), you'll still get a NumberFormatException at runtime, but this is expected behavior (similar to how CAST('A' AS INT) behaves in other engines). The fix ensures the engine can at least generate the correct execution plan instead of crashing during the validation or code-gen phase.