Skip to content

[CALCITE-6104] Aggregate function that references outer column should be evaluated in outer query#5052

Open
xuzifu666 wants to merge 10 commits into
apache:mainfrom
xuzifu666:calcite-6104
Open

[CALCITE-6104] Aggregate function that references outer column should be evaluated in outer query#5052
xuzifu666 wants to merge 10 commits into
apache:mainfrom
xuzifu666:calcite-6104

Conversation

@xuzifu666

Copy link
Copy Markdown
Member


!ok

# [CALCITE-6104] Aggregate function that references outer column should be evaluated in outer query

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment that these are validated using an independent database.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also have a test with 2 outer aggregates?
Showing the plan could also help.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reminder, I had added comment about it and the test case had been added.
But agg.iq to verify the correctness of the numerical results calculated after the Calcite fix within PostgreSQL, the original SQL must be rewritten into standard SQL supported by PostgreSQL.
Validated these sql in https://onecompiler.com/postgresql/44tgmwq6p and result is as expected.


!ok

SELECT (SELECT sum(sal) FROM dept) AS sum_sal

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The blog post quoted by @julianhyde has more examples, please add all of the relevant ones

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I review the blog overall and add new case for it. The correspondence is as follows:

SELECT (SELECT sum(1) FROM xx LIMIT 1) FROM aa

Aggregate with no column references aggregates at the innermost level.

SELECT (SELECT sum(a) FROM xx LIMIT 1) FROM aa

Aggregate function that references outer column should be evaluated in outer query.

SELECT (SELECT sum(x) FROM xx LIMIT 1) FROM aa

Aggregate referencing only inner column aggregates at the inner level.

* <p>For example,
* <blockquote><pre>SELECT (SELECT sum(a) FROM t LIMIT 1) FROM aa</pre></blockquote>
* is rewritten to
* <blockquote><pre>SELECT sum(a) * (SELECT 1 FROM t LIMIT 1) FROM aa</pre></blockquote>

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From this example it's not obvious to me at all what the general rewrite rule is.
Can you describe the algorithm?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I’ve added the logic for the overall process; I hope that clarifies things.

* </ol>
*
* <p>For example,
* <blockquote><pre>SELECT (SELECT sum(a) FROM t LIMIT 1) FROM aa</pre></blockquote>

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot read this example because we don't know where column a is.
Can you also add tests where one or both tables are empty or contain only nulls in the aggregated column?

@xuzifu666 xuzifu666 Jun 27, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, The Javadoc examples for rewriteOuterAggregatesInSelectList have been improved, explicitly stating that a comes from aa and x comes from t.
Boundary tests have been added to agg.iq: Inner table empty → 1 row NULL
Outer table empty → 1 row NULL (because the outer aggregation has no input rows)
and postgresql test also updated (result is also keep the same with agg.iq, empty results for VALUE are not currently supported, so I am using the SQL statement SELECT 1 FROM emp WHERE 1 = 0 as a substitute.) https://onecompiler.com/postgresql/44tgmwq6p

Comment thread core/src/test/resources/sql/agg.iq Outdated
!ok

# [CALCITE-6104] Aggregate function that references outer column should be evaluated in outer query
# The expected results below have been verified in PostgreSQL by running

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused, I thought these queries are standard SQL, but, indeed, I cannot run them on postgres or other databases I tried. Isn't this PR supposed to implement a standard SQL feature? Why can't these queries use standard SQL then?

@xuzifu666 xuzifu666 Jun 27, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is indeed a confusing point. These correlated-aggregate queries are not standard SQL, and CALCITE-6104 should not implement a SQL standard feature. Instead, it makes Calcite support a non-standard extension that some databases already implement.

The SQL standard does not allow aggregate functions to reference outer columns
In standard SQL, an aggregate function like sum(a) can only reference columns from the SELECT level where it appears. So a query like:

SELECT (SELECT sum(a) FROM xx LIMIT 1) FROM aa;

will fail on PostgreSQL, Oracle, and MySQL/MariaDB because a belongs to the outer table aa, not the sub-query xx.

But PostgreSQL and others support this extension.
These databases allow an aggregate to "climb up" to the nearest SELECT that contains its free variables. Calcite previously handled this pattern incorrectly (it aggregated inside the sub-query, grouped by the outer rows). The fix makes Calcite behave consistently with those databases.
This test is provided in Jira (julianhyde@3f4cab5) is supported—at least with the current PR applied; these are not standard SQL either.
Additionally, I tested converting these SQL statements to standard SQL; they passed even without this PR, which is why I hold the view mentioned above.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the clarification. Maybe this SQL should not be accepted by default? There are these conformance modes, I think only modes like LENIENT or BABEL should accept this construct.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You say that postgres accepts these queries, then why do you have to modify the queries to run them on postgres? I couldn't run them as written in Postgres.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks for pointing this out. The current comment is incorrect, should state "These SQL queries are non-standard and their successful execution is only guaranteed within Calcite for now" makes perfect sense.

Regarding the limitation where correlated aggregate rewriting is only available under modes like LENIENT or BABEL, could log a separate jira to resolve this later(If ok I would create a new jira)? This way, we can track these changes independently and easily gather any additional feedback. From my perspective, this limitation bears little relevance to the current Jira ticket. I’d like to hear if you agree with my view on this.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will require 3 lines of code and should be done as part of this PR.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'll implement this restriction directly in this PR.

// Rewrite scalar sub-queries whose single select item is an aggregate
// over outer columns. The aggregate belongs to the outer query per SQL
// standard.
rewriteOuterAggregatesInSelectList(select);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call this handleOuterAggregate, and make the function reject it if not in a suitable conformance mode.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the latest commit implements controls in this area.

Comment thread core/src/test/resources/sql/agg.iq Outdated
!ok

# [CALCITE-6104] Aggregate function that references outer column should be evaluated in outer query
# The expected results below have been verified in PostgreSQL by running

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You say that postgres accepts these queries, then why do you have to modify the queries to run them on postgres? I couldn't run them as written in Postgres.

* <blockquote><pre>
* WITH aa(a) AS (VALUES 1, 2, 3),
* t(x) AS (VALUES 10, 20, 30)
* SELECT (SELECT sum(a) FROM t LIMIT 1) FROM aa

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the LIMIT 1 needed here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not need, I had updated it.

* <blockquote><pre>
* WITH aa(a) AS (VALUES 1, 2, 3),
* t(x) AS (VALUES 10, 20, 30)
* SELECT sum(a) * (SELECT 1 FROM t LIMIT 1) FROM aa

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rewrite will only work for numeric aggregates on types which have a multiplication operation

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added this message to the comments first.

@sonarqubecloud

Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants