[CALCITE-7505] RelToSqlConverter produces duplicate FROM aliases for correlated subqueries when hasImplicitTableAlias() is true#5036
Conversation
…correlated subqueries when hasImplicitTableAlias() is true
| /** Test case for | ||
| * <a href="https://issues.apache.org/jira/browse/CALCITE-7505">[CALCITE-7505] | ||
| * RelToSqlConverter produces duplicate FROM aliases for correlated subqueries</a>. */ | ||
| @Test void testExistsSubQueryAliasConflict() { |
There was a problem hiding this comment.
why is this using hasImplicitTableAlias()?
There was a problem hiding this comment.
Thanks for pointing this out — I’ve updated the PR description. The previous wording was inaccurate: this change is not really about hasImplicitTableAlias(). The actual issue is that Filter did not preserve the correlation variables it binds, so RelToSqlConverter could not correctly determine the binding scope for correlated references.
There was a problem hiding this comment.
Do you need to change the JIRA issue title and commit/PR message as well?
| } | ||
|
|
||
| @Override public Void visitCorrelVariable(RexCorrelVariable v) { | ||
| if (correlIds.contains(v.id) |
There was a problem hiding this comment.
The JavaDoc above says: "a correlated variable defined by the given input"
Where is this check validating this fact?
(I am not sure how "defined by an input" is defined)
There was a problem hiding this comment.
Sorry, that's my english fault, The wording is inaccurate. The method does not actually verify that the correlation variable is defined by the given input; it only checks for a correlated reference with a matching row type. I'll update the Javadoc to reflect what it actually does.
There was a problem hiding this comment.
There can be many variables with the same type, please explain why your approach is correct.
There was a problem hiding this comment.
Hi @mihaibudiu
Thanks for the question. During our test case execution, we observed that planner rules like FilterIntoJoinRule( test case with testFilterCorrelateMissingVariableCor) can reposition or push down a Filter. This optimization often causes the Filter's variablesSet to become stale, meaning the variable's rowType no longer matches the schema of the new underlying input.
Therefore, relying solely on correlIds.contains(v.id) is insufficient. We need the rowType equality check to ensure that this Filter remains the true binding point of the correlation variable post-optimization. If the types mismatch, it indicates the Filter has been relocated, and we should not trigger resetAliasForCorrelation.
Regarding the structure check, rather than a generic BiRel, I specifically restrict this validation to Join and LogicalCorrelate operators. These two operators represent the exact semantic boundaries where correlation aliases are defined and resolved. Checking for Join and LogicalCorrelate allows us to correctly determine whether the alias needs to be appended, ensuring correctness without interfering with other relational nodes.
Let me know if this clarifies the design choice!
…rrelated sub-queries in Filter
…rrelated sub-queries in Filter
|



Jira Link
CALCITE-7505
This PR fixes incorrect SQL generation for correlated sub-queries in a Filter's WHERE clause.
The root cause is that Filter did not preserve the correlation variables it binds, so RelToSqlConverter could not reliably determine when the current node was the binding point of a correlation variable. As a result, correlation alias handling could be applied at the wrong level, leading to incorrect correlated references in generated SQL.
The fix adds variablesSet to Filter / LogicalFilter and only resets the correlation alias when the current Filter / Project actually binds the variable, the input row type matches, and the input is not a BiRel.
A follow-up change will cover similar correlated cases in Join conditions.