[Enhancement] Push eventstats down by rewriting RexOver to Join + Aggregate (#5483) by RyanL1997 · Pull Request #5495 · opensearch-project/sql

RyanL1997 · 2026-06-02T00:55:31Z

Description

PPL eventstats lowers to LogicalProject(RexOver(...)) directly above the scan. No rule in OpenSearchIndexRules.OPEN_SEARCH_PUSHDOWN_RULES matches that shape — every AggregateIndexScanRule config requires LogicalAggregate at the operand root, and RareTopPushdownRule requires a ROW_NUMBER window with a LESS_THAN_OR_EQUAL filter above. The plan therefore reaches Volcano with RexOver intact, becomes EnumerableWindow, and the scan beneath it stays in _source + requestedTotalSize=MAX_INT mode. On 47B-doc indices the coordinator times out streaming every matching document just to count it. Same pathological behavior with BY (the production query in the issue).

This PR rewrites Window AST nodes in CalciteRelNodeVisitor.visitWindow into Project → Join → (input, Aggregate(input)). The right-side Aggregate sits directly over a re-pushed copy of the input, matching AggregateIndexScanRule.AGGREGATE_SCAN (no-BY) or DEFAULT / BUCKET_NON_NULL_AGG (BY). OpenSearch sees the same shape as stats count() — size:0 + track_total_hits or a terms aggregation — instead of an unsized row fetch. The left side returns rows as before; the join broadcasts the aggregate value(s) onto each row, preserving the [original cols, agg cols] row type so downstream consumers (limit, head, fields) see no change.

Design

Follows the precedent in buildStreamWindowSelfJoinPlan (CalciteRelNodeVisitor.java:2348-2362):

Join, not LogicalCorrelate, because LogicalCorrelate causes NPE in RelDecorrelator per the existing comment.
NULL bucket handling mirrors lines 2442-2449 of the streamstats self-join: bucketNullable=true → INNER join with IS NOT DISTINCT FROM ((left.k = right.k) OR (both NULL)); bucketNullable=false → LEFT join with simple equality, IS NOT NULL filter pushed below the right aggregate to match BUCKET_NON_NULL_AGG. NULL-keyed left rows are preserved with NULL aggregate values, matching the previous CASE-wrapped behavior.
aggregateWithTrimming is reused for the right-side aggregate construction, so agg-resolution semantics are identical to stats/streamstats.

Rewriteability predicate (canRewriteWindowAsAggregateJoin) rejects: non-aggregate window functions (ROW_NUMBER / LAG / etc.), non-empty sortList, non-default frame, non-bare-field partition keys. Anything outside the eventstats shape falls through to visitWindowAsRexOver, preserving existing behavior for any future Window producer.

Coverage against the issue's 7-requirement comment

Comment:

No-BY + BY both addressed in a single rewrite; hasGroup toggles cross-join vs equi-join. ✓
All aggregate functions — uses aggregateWithTrimming (same path stats uses), supports COUNT/SUM/AVG/MIN/MAX/STDDEV/VAR/etc. uniformly. ✓
Multiple aggregates per call packed into one right-side Aggregate with multiple agg calls, one broadcast join. ✓
NULL semantics on BY join-back — IS NOT DISTINCT FROM for bucketNullable=true; LEFT join + right-side IS NOT NULL filter for bucketNullable=false. Equivalent to the existing Window+CASE behavior. ✓
Tight predicate rejects ordered/framed/non-aggregate/non-bare-field cases. ✓
No decorrelation dependency — rewrite directly produces the operand chain AggregateIndexScanRule expects. ✓
No symptom-masking shortcuts — Predicate.not(LogicalProject::containsOver) guards in AggregateIndexScanRule are untouched. Real Window→Aggregate rewrite at the AST level. ✓

Verification — status

Draft because the following needs CI to finalize, blocked locally by a pre-existing core:compileJava failure on AnalyticsExecutionEngine.java:92 (unrelated analytics-api:3.7.0-SNAPSHOT symbol issue that also reproduces on main with no edits):

CalcitePPLEventstatsTest verifyLogical expectations have been updated to my pattern-derived prediction of the new lowered shape. May need 1-2 small adjustments (trim-project corner cases) based on actual output. verifyPPLToSparkSQL calls are temporarily removed pending observation of the SparkSqlDialect output for the join+aggregate form.
Existing eventstats EXPLAIN expected-output files (explain_eventstats_avg.json, _dc.json, _distinct_count.json, _earliest_latest.json, _earliest_latest_custom_time.json, _earliest_latest_no_group.json, _null_bucket.yaml) will diff against the new plan shape. Regeneration from CI logs incoming as a follow-up commit.
A new positive-pushdown EXPLAIN assertion (testEventstatsPushdownExplain mirroring stats count() — asserting AGGREGATION->...COUNT() in PushDownContext and track_total_hits in sourceBuilder) incoming.
NULL-bucket BY integration tests in CalcitePPLEventstatsIT against TEST_INDEX_STATE_COUNTRY_WITH_NULL (both bucketNullable=true and =false) incoming.

Adjacent unit suites (CalcitePPLAggregationTest, CalcitePPLStreamstatsTest, CalcitePPLRareTest, CalcitePPLTopTest) should all be unchanged — different visitor paths or StreamWindow/RareTopN AST nodes.

Related Issues

Resolves #5483

Check List

New functionality includes testing.
New functionality has been documented.
New PPL command checklist all confirmed.
API changes companion pull request created.
Commits are signed per the DCO using `--signoff` or `-s`.
Public documentation issue/PR created.

opensearch-project#5483) PPL eventstats lowers to LogicalProject(RexOver(...)) above the scan. No rule in OpenSearchIndexRules.OPEN_SEARCH_PUSHDOWN_RULES matches that shape: every AggregateIndexScanRule config requires LogicalAggregate at the operand root, and RareTopPushdownRule requires a ROW_NUMBER window with a LESS_THAN_OR_EQUAL filter above it. The plan therefore reaches Volcano with RexOver intact, gets converted to EnumerableWindow, and the scan beneath it stays in _source-includes + requestedTotalSize=MAX_INT mode, streaming every matching document to the coordinator just to count it. On 47B-doc indices this times out. This change rewrites Window AST nodes in CalciteRelNodeVisitor.visitWindow into a Join + Aggregate plan: the right side is an Aggregate over a re-pushed copy of the input, which matches AggregateIndexScanRule and pushes down to OpenSearch as size:0 + track_total_hits (no-BY) or a terms aggregation (BY). The left side returns rows as before. The join broadcasts the aggregate value(s) onto each row, preserving the row type [original cols, agg cols] that the legacy lowering produced so downstream consumers see no shape change. NULL-bucket semantics: - bucketNullable=true: INNER join with IS NOT DISTINCT FROM on each partition key, so the NULL bucket on each side matches and NULL-keyed left rows still receive the NULL-bucket aggregate value. - bucketNullable=false: LEFT join with simple equality, IS NOT NULL filter pushed below the right aggregate to match the BUCKET_NON_NULL_AGG pushdown shape stats already uses. NULL-keyed left rows survive with a NULL aggregate value, matching the previous CASE-wrapped behavior. The rewriteability predicate (canRewriteWindowAsAggregateJoin) rejects non-aggregate window functions (ROW_NUMBER / LAG / etc.), non-empty sort lists, non-default frames, and non-bare-field partition keys. Anything outside the eventstats shape falls through to visitWindowAsRexOver, preserving existing behavior for any future Window producer. Follows the precedent in buildStreamWindowSelfJoinPlan: uses Join (not LogicalCorrelate, which causes NPE in RelDecorrelator per the comment at CalciteRelNodeVisitor.java:2348-2352) and mirrors the canonical NULL bucket handling at lines 2442-2449. Reuses aggregateWithTrimming for the right-side aggregate construction so agg-resolution semantics are identical to stats and streamstats. CalcitePPLEventstatsTest verifyLogical expectations are updated to the new lowered shape. verifyPPLToSparkSQL assertions are temporarily removed pending observation of the SparkSqlDialect output for the join+aggregate form; the previous window-form expectations no longer apply. Draft: existing CalciteExplainIT eventstats expected-output files and new NULL-bucket BY integration tests in CalcitePPLEventstatsIT will be added in follow-up commits once CI confirms the lowered shape is exact. Resolves opensearch-project#5483 Signed-off-by: Jialiang Liang <ryanleeang@gmail.com> Signed-off-by: Jialiang Liang <jiallian@amazon.com>

github-actions · 2026-06-02T00:56:43Z

PR Reviewer Guide 🔍

(Review updated until commit `6f333fc`)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review Possible Issue In `rewriteWindowAsAggregateJoin`, when `hasGroup` is false, the code projects a literal-0 key onto both sides and then references these keys by absolute position (`leftFieldCount` on left, `aggCount` on right). However, `leftFieldCount` is the count of fields in `leftInput` before the literal-0 key is appended. After `projectPlus` adds the key, the left side has `leftFieldCount + 1` fields, so the key is at position `leftFieldCount`, not `leftFieldCount` itself. The subsequent `field(2, 0, leftFieldCount)` call at line 2204 will reference the wrong column (the last original column instead of the appended key), causing the join condition to be incorrect and potentially leading to a cross-join or wrong results. RelNode leftForJoin = leftInput; RelNode rightForJoin = rightAggregate; int leftJoinKeyOffset = 0; if (!hasGroup) { context.relBuilder.push(leftInput); context.relBuilder.projectPlus( context.relBuilder.alias(context.relBuilder.literal(0), EVENTSTATS_NOGROUP_JOIN_KEY)); leftForJoin = context.relBuilder.build(); context.relBuilder.push(rightAggregate); context.relBuilder.projectPlus( context.relBuilder.alias(context.relBuilder.literal(0), EVENTSTATS_NOGROUP_JOIN_KEY)); rightForJoin = context.relBuilder.build(); leftJoinKeyOffset = 1; } context.relBuilder.push(leftForJoin); context.relBuilder.push(rightForJoin); RexNode joinCondition; if (!hasGroup) { // Equi-join on the literal-0 keys we just projected (positions: leftFieldCount on the left // input post-projection, aggCount on the right after the agg outputs). RexNode leftKey = context.relBuilder.field(2, 0, leftFieldCount); RexNode rightKey = context.relBuilder.field(2, 1, aggCount); joinCondition = context.relBuilder.equals(leftKey, rightKey); Possible Issue The `extractFieldName` method at line 2399 calls `toString()` on `QualifiedName` and `Field.getField()` to produce the field name used in `relBuilder.field(2, side, name)` for the join condition. However, `QualifiedName.toString()` may include table qualifiers or other formatting that does not match the actual field name in the row type's field list. If the row type uses simple unqualified names but `toString()` returns a qualified name, the `field(...)` lookup will fail or reference the wrong column, breaking the join condition for BY-case eventstats. private static String extractFieldName(UnresolvedExpression expr) { if (expr instanceof Field f) { return f.getField().toString(); } if (expr instanceof QualifiedName qn) { return qn.toString(); } if (expr instanceof Alias a) { return extractFieldName(a.getDelegated()); } throw new IllegalArgumentException( "Cannot extract field name from non-field expression: " + expr); }

github-actions · 2026-06-02T00:57:12Z

PR Code Suggestions ✨

Latest suggestions up to 6f333fc

Explore these optional code suggestions:

Category	Suggestion	Impact
General	Validate partition key extraction for filtering When `bucketNullable=false`, the code filters `IS NOT NULL` on partition keys before aggregation. However, if `PlanUtils.getSelectColumns(groupRex)` returns an empty list or fails to extract columns correctly, the filter is silently skipped. Verify that `getSelectColumns` returns the expected columns and log a warning if the extraction fails to ensure the NULL-filtering logic is applied as intended. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2154-2163] List<RexNode> groupRex = groupList.stream().map(expr -> rexVisitor.analyze(expr, context)).toList(); +List<RexNode> selectColumns = PlanUtils.getSelectColumns(groupRex); +if (selectColumns.isEmpty() && !groupRex.isEmpty()) { + throw new IllegalStateException("Failed to extract partition key columns for NULL filtering"); +} List<RexNode> isNotNullList = - PlanUtils.getSelectColumns(groupRex).stream() + selectColumns.stream() .map(context.relBuilder::field) .map(context.relBuilder::isNotNull) .toList(); if (!isNotNullList.isEmpty()) { context.relBuilder.filter(isNotNullList); } Suggestion importance[1-10]: 5 __ Why: The suggestion validates that `PlanUtils.getSelectColumns()` successfully extracts columns before applying NULL filtering. This is a reasonable defensive check to ensure the NULL-filtering logic is applied correctly when `bucketNullable=false`. However, the suggestion throws an exception rather than logging a warning as described, and it's unclear if `getSelectColumns` can legitimately return empty for non-empty `groupRex`. The improvement adds validation but may be overly strict.	Low
General	Prevent unbounded recursion in extractFieldName The recursive call in `extractFieldName` when handling `Alias` can lead to unbounded recursion if there are circular or deeply nested alias chains. Consider adding a depth limit or cycle detection to prevent stack overflow in pathological cases. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2399-2411] private static String extractFieldName(UnresolvedExpression expr) { + return extractFieldNameWithDepth(expr, 0, 100); +} + +private static String extractFieldNameWithDepth(UnresolvedExpression expr, int depth, int maxDepth) { + if (depth > maxDepth) { + throw new IllegalArgumentException("Maximum alias nesting depth exceeded for expression: " + expr); + } if (expr instanceof Field f) { return f.getField().toString(); } if (expr instanceof QualifiedName qn) { return qn.toString(); } if (expr instanceof Alias a) { - return extractFieldName(a.getDelegated()); + return extractFieldNameWithDepth(a.getDelegated(), depth + 1, maxDepth); } throw new IllegalArgumentException( "Cannot extract field name from non-field expression: " + expr); } Suggestion importance[1-10]: 4 __ Why: The suggestion addresses a potential stack overflow from deeply nested or circular `Alias` chains by adding depth limiting. While this is a valid concern for robustness, the likelihood of encountering such pathological cases in practice is low given the controlled nature of the AST construction. The improvement adds safety but is not critical for typical usage.	Low
Possible issue	Add null-safety check for getSortList The method checks `wf.getSortList().isEmpty()` but does not verify that `getSortList()` itself is non-null before calling `isEmpty()`. If `getSortList()` returns null, this will throw a NullPointerException. Add a null-safety check before invoking `isEmpty()`. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2297-2299] private static boolean canRewriteWindowAsAggregateJoin(Window node) { if (node.getWindowFunctionList().isEmpty()) { return false; } for (UnresolvedExpression expr : node.getWindowFunctionList()) { UnresolvedExpression inner = (expr instanceof Alias a) ? a.getDelegated() : expr; if (!(inner instanceof WindowFunction wf)) { return false; } String funcName = extractAggregateFunctionName(wf.getFunction()); if (funcName == null) { return false; } Optional<BuiltinFunctionName> windowName = BuiltinFunctionName.ofWindowFunction(funcName); if (windowName.isEmpty()) { return false; } String canonical = canonicalAggregationName(windowName.get()); if (BuiltinFunctionName.ofAggregation(canonical).isEmpty()) { return false; } - if (!wf.getSortList().isEmpty()) { + if (wf.getSortList() != null && !wf.getSortList().isEmpty()) { return false; } if (wf.getWindowFrame() != null && !Objects.equals(wf.getWindowFrame(), WindowFrame.rowsUnbounded())) { return false; } } if (node.getGroupList() != null) { for (UnresolvedExpression expr : node.getGroupList()) { if (!isBareFieldReference(expr)) { return false; } } } return true; } Suggestion importance[1-10]: 3 __ Why: The suggestion adds a null check for `getSortList()` before calling `isEmpty()`. While this is a defensive programming practice, the existing code pattern `!wf.getSortList().isEmpty()` is commonly used in Java and would only fail if the API contract is violated. Without evidence that `getSortList()` can return null, this is a minor defensive improvement rather than a critical fix.	Low

Previous suggestions

Suggestions up to commit bb8d9b2

Category	Suggestion	Impact
General	Validate partition keys are non-dotted The join condition construction references partition keys by name via `relBuilder.field(2, side, keyName)`, which does not perform nested-field resolution. If a dotted path (e.g., `doc.user.city`) bypasses the `isBareFieldReference` check due to a future code change, the join will silently fail to match rows. Add a runtime assertion to verify that `keyName` contains no dots, catching any validation gaps early. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2181-2194] for (UnresolvedExpression groupExpr : groupList) { String keyName = extractFieldName(groupExpr); + if (keyName.contains(".")) { + throw new IllegalStateException("Dotted partition key bypassed validation: " + keyName); + } RexNode leftKey = context.relBuilder.field(2, 0, keyName); RexNode rightKey = context.relBuilder.field(2, 1, keyName); RexNode eq = context.relBuilder.equals(leftKey, rightKey); if (bucketNullable) { RexNode bothNull = context.relBuilder.and( context.relBuilder.isNull(leftKey), context.relBuilder.isNull(rightKey)); perKeyConditions.add(context.relBuilder.or(eq, bothNull)); } else { perKeyConditions.add(eq); } } Suggestion importance[1-10]: 6 __ Why: The suggestion adds a defensive runtime check to catch validation gaps if dotted paths bypass `isBareFieldReference`. This is a reasonable safeguard that could prevent silent failures, though the existing validation in `canRewriteWindowAsAggregateJoin` should already prevent this scenario. The check adds minimal overhead and improves robustness.	Low
	Deduplicate partition keys before filtering When `bucketNullable=false`, the code filters `IS NOT NULL` on each partition key before grouping. However, if `groupRex` contains duplicate columns (e.g., `BY field1, field1`), `PlanUtils.getSelectColumns` may return duplicates, causing redundant `IS NOT NULL` filters on the same field. Consider deduplicating the column list before building the filter to avoid unnecessary predicates. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2156-2163] List<RexNode> isNotNullList = PlanUtils.getSelectColumns(groupRex).stream() + .distinct() .map(context.relBuilder::field) .map(context.relBuilder::isNotNull) .toList(); if (!isNotNullList.isEmpty()) { context.relBuilder.filter(isNotNullList); } Suggestion importance[1-10]: 5 __ Why: The suggestion correctly identifies a potential issue where duplicate partition keys could lead to redundant `IS NOT NULL` filters. However, this is a minor optimization that would only affect edge cases with duplicate BY keys, which are unlikely in practice. The `.distinct()` call is a reasonable safeguard.	Low
	Use IllegalStateException for internal errors The method throws `IllegalArgumentException` for unsupported expression types, but `canRewriteWindowAsAggregateJoin` already validates that all partition keys are bare field references. If an invalid expression reaches this point, it indicates a logic error in the validation predicate. Consider using `IllegalStateException` to signal an internal inconsistency rather than invalid user input. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2367-2379] private static String extractFieldName(UnresolvedExpression expr) { if (expr instanceof Field f) { return f.getField().toString(); } if (expr instanceof QualifiedName qn) { return qn.toString(); } if (expr instanceof Alias a) { return extractFieldName(a.getDelegated()); } - throw new IllegalArgumentException( + throw new IllegalStateException( "Cannot extract field name from non-field expression: " + expr); } Suggestion importance[1-10]: 4 __ Why: The suggestion is technically correct that `IllegalStateException` better signals an internal inconsistency since `canRewriteWindowAsAggregateJoin` validates inputs. However, this is a minor semantic improvement that doesn't affect functionality. The choice between exception types is somewhat subjective.	Low

Suggestions up to commit 515f35e

Category	Suggestion	Impact
General	Add validation before field extraction The method throws `IllegalArgumentException` for unexpected expression types, but `canRewriteWindowAsAggregateJoin` only validates that partition keys are bare field references. If validation logic changes or is bypassed, this could cause runtime failures. Add a defensive check in `rewriteWindowAsAggregateJoin` before calling `extractFieldName` to ensure all partition keys pass `isBareFieldReference`. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2336-2348] private static String extractFieldName(UnresolvedExpression expr) { + if (!isBareFieldReference(expr)) { + throw new IllegalArgumentException( + "Expected bare field reference, got: " + expr); + } if (expr instanceof Field f) { return f.getField().toString(); } if (expr instanceof QualifiedName qn) { return qn.toString(); } if (expr instanceof Alias a) { return extractFieldName(a.getDelegated()); } throw new IllegalArgumentException( "Cannot extract field name from non-field expression: " + expr); } Suggestion importance[1-10]: 3 __ Why: The suggestion adds a redundant check since `extractFieldName` is only called after `canRewriteWindowAsAggregateJoin` validates all partition keys via `isBareFieldReference`. The existing validation logic already ensures this precondition, making the defensive check unnecessary.	Low

Suggestions up to commit f19e1d9

Category	Suggestion	Impact
General	Validate join output field count The final projection assumes the join output has exactly `leftFieldCount +` `rightGroupKeyCount + aggCount` fields, but if the right aggregate produces a different number of columns (e.g., due to a mismatch between `groupList.size()` and the actual group keys in the aggregate), `context.relBuilder.field(rightAggStart + i)` will reference an out-of-bounds field index. Add a bounds check or assertion to verify the join output row type matches the expected shape before projecting. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2212-2216] int rightAggStart = leftFieldCount + rightGroupKeyCount; +int joinFieldCount = context.relBuilder.peek().getRowType().getFieldCount(); +int expectedFieldCount = leftFieldCount + rightGroupKeyCount + aggCount; +if (joinFieldCount != expectedFieldCount) { + throw new IllegalStateException("Join output field count mismatch: expected " + expectedFieldCount + ", got " + joinFieldCount); +} for (int i = 0; i < aggCount; i++) { finalProjects.add(context.relBuilder.field(rightAggStart + i)); finalNames.add(extractAliasName(node.getWindowFunctionList().get(i))); } Suggestion importance[1-10]: 7 __ Why: This suggestion addresses a potential runtime error if the join output doesn't match the expected field count. Adding a bounds check before field projection would prevent out-of-bounds access and make the code more defensive against unexpected aggregate output shapes.	Medium
	Validate non-empty NULL filter generation The `bucketNullable=false` path filters `IS NOT NULL` on partition keys before aggregation, but if `PlanUtils.getSelectColumns(groupRex)` returns an empty list (e.g., if `groupRex` contains only literals or complex expressions), the filter is skipped silently. This could allow NULL-keyed rows into the aggregate when they should be excluded. Verify that `getSelectColumns` always returns at least one column reference for each partition key, or add a check to ensure the filter is non-empty. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2152-2163] if (hasGroup && !bucketNullable) { List<RexNode> groupRex = groupList.stream().map(expr -> rexVisitor.analyze(expr, context)).toList(); List<RexNode> isNotNullList = PlanUtils.getSelectColumns(groupRex).stream() .map(context.relBuilder::field) .map(context.relBuilder::isNotNull) .toList(); - if (!isNotNullList.isEmpty()) { - context.relBuilder.filter(isNotNullList); + if (isNotNullList.isEmpty()) { + throw new IllegalStateException("Expected non-empty IS NOT NULL filter for bucketNullable=false with groupList: " + groupList); } + context.relBuilder.filter(isNotNullList); } Suggestion importance[1-10]: 6 __ Why: The suggestion identifies a potential edge case where `PlanUtils.getSelectColumns` might return an empty list, which could lead to incorrect NULL handling. However, the existing code already has a check `if (!isNotNullList.isEmpty())`, so the concern is partially addressed. Adding an explicit validation would improve robustness.	Low
	Handle unexpected expression types defensively The `extractFieldName` method can throw an `IllegalArgumentException` for unexpected expression types, but `rewriteWindowAsAggregateJoin` calls it on every `groupList` entry without handling this exception. If `canRewriteWindowAsAggregateJoin` incorrectly allows a non-field expression through (e.g., due to a future code change), the rewrite will fail at runtime. Add defensive validation or document the precondition that `canRewriteWindowAsAggregateJoin` guarantees all group expressions pass `isBareFieldReference`. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2336-2348] private static String extractFieldName(UnresolvedExpression expr) { if (expr instanceof Field f) { return f.getField().toString(); } if (expr instanceof QualifiedName qn) { return qn.toString(); } if (expr instanceof Alias a) { return extractFieldName(a.getDelegated()); } - throw new IllegalArgumentException( - "Cannot extract field name from non-field expression: " + expr); + throw new IllegalStateException( + "extractFieldName called on non-field expression (should be prevented by canRewriteWindowAsAggregateJoin): " + expr); } Suggestion importance[1-10]: 4 __ Why: The suggestion to change `IllegalArgumentException` to `IllegalStateException` is a minor improvement in exception semantics, but the method is already protected by `canRewriteWindowAsAggregateJoin` which validates all group expressions. The impact is low since the precondition is already enforced.	Low

Suggestions up to commit 15c52f7

Category	Suggestion	Impact
Possible issue	Validate filter count matches group key count When `bucketNullable=false`, the code filters out NULL partition keys before aggregation. However, `PlanUtils.getSelectColumns(groupRex)` may return an empty list if `groupRex` contains only literals or complex expressions that don't resolve to column references. In such cases, no filter is applied, and NULL rows may incorrectly remain in the aggregate. Verify that `getSelectColumns` always returns the expected columns for the partition keys, or add validation to ensure the filter list matches the group key count. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2153-2163] List<RexNode> isNotNullList = PlanUtils.getSelectColumns(groupRex).stream() .map(context.relBuilder::field) .map(context.relBuilder::isNotNull) .toList(); +if (isNotNullList.size() != groupList.size()) { + throw new IllegalStateException( + "Mismatch between group keys and IS NOT NULL filters: expected " + groupList.size() + " but got " + isNotNullList.size()); +} if (!isNotNullList.isEmpty()) { context.relBuilder.filter(isNotNullList); } Suggestion importance[1-10]: 5 __ Why: The suggestion identifies a potential edge case where `PlanUtils.getSelectColumns(groupRex)` might not return the expected number of columns. Adding validation would catch mismatches early, but the concern is somewhat speculative since `groupRex` is derived from validated `groupList` entries that pass `isBareFieldReference`.	Low
General	Validate aggregate function argument requirements When converting a `Function` to `AggregateFunction`, the code sets `field` to `null` if `args` is empty. However, `AggregateFunction` may not handle a `null` field correctly for all aggregate types (e.g., `COUNT()` expects no field, but other aggregates like `SUM` require one). If an aggregate function is called without arguments when it requires them, this could lead to incorrect results or runtime errors downstream. Validate that the function name is compatible with a null field before constructing the `AggregateFunction`. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2323-2330] if (fn instanceof Function f) { List<UnresolvedExpression> args = f.getFuncArgs(); + String funcName = f.getFuncName(); UnresolvedExpression field = args.isEmpty() ? null : args.get(0); + // COUNT() is the only common aggregate that accepts no field argument + if (field == null && !funcName.equalsIgnoreCase("count")) { + throw new IllegalArgumentException( + "Aggregate function " + funcName + " requires at least one argument"); + } List<UnresolvedExpression> argList = args.size() <= 1 ? List.of() : args.subList(1, args.size()); - AggregateFunction agg = new AggregateFunction(f.getFuncName(), field, argList); + AggregateFunction agg = new AggregateFunction(funcName, field, argList); return agg; } Suggestion importance[1-10]: 4 __ Why: The suggestion adds validation for aggregate functions that require arguments, which could prevent runtime errors. However, `canRewriteWindowAsAggregateJoin` already filters for functions present in both window and aggregation maps, reducing the likelihood of invalid configurations. The check is defensive but not critical.	Low
General	Add defensive error handling for unexpected types The `extractFieldName` method can throw an `IllegalArgumentException` if the expression type is unexpected, but this exception is not caught in `rewriteWindowAsAggregateJoin`. Since `canRewriteWindowAsAggregateJoin` already validates that all partition keys are bare field references via `isBareFieldReference`, this exception should never occur in practice. However, if the validation logic has a gap or the expression structure changes unexpectedly, the uncaught exception will propagate and potentially crash the query. Add defensive error handling or document the precondition clearly. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2336-2348] private static String extractFieldName(UnresolvedExpression expr) { if (expr instanceof Field f) { return f.getField().toString(); } if (expr instanceof QualifiedName qn) { return qn.toString(); } if (expr instanceof Alias a) { return extractFieldName(a.getDelegated()); } - throw new IllegalArgumentException( - "Cannot extract field name from non-field expression: " + expr); + // This should never happen if canRewriteWindowAsAggregateJoin validated correctly + throw new IllegalStateException( + "Unexpected expression type in extractFieldName (validation gap): " + expr.getClass().getName()); } Suggestion importance[1-10]: 3 __ Why: The suggestion to change `IllegalArgumentException` to `IllegalStateException` is a minor improvement in semantic clarity, but the existing code is already correct. Since `canRewriteWindowAsAggregateJoin` validates inputs, this exception should never occur in practice, making the change low-impact.	Low

Suggestions up to commit 1a043bc

Category	Suggestion	Impact
General	Deduplicate column references before filtering The `getSelectColumns` call may return duplicate column references when the same field appears multiple times in `groupRex`, causing redundant `IS NOT NULL` filters. This could lead to incorrect query plans or performance degradation. Consider deduplicating the column references before building the filter list. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2155-2162] List<RexNode> isNotNullList = PlanUtils.getSelectColumns(groupRex).stream() + .distinct() .map(context.relBuilder::field) .map(context.relBuilder::isNotNull) .toList(); if (!isNotNullList.isEmpty()) { context.relBuilder.filter(isNotNullList); } Suggestion importance[1-10]: 4 __ Why: The suggestion addresses a potential edge case where duplicate column references could create redundant filters. However, `PlanUtils.getSelectColumns` likely already handles deduplication or the input `groupRex` is unlikely to contain duplicates in typical usage. The impact is minor and the issue may not occur in practice.	Low
	Deduplicate partition keys in join condition If `groupList` contains duplicate field names, the join condition will include redundant equality checks for the same column pair. This could lead to suboptimal query plans. Consider deduplicating `groupList` by field name before constructing the join condition. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2180-2193] +Set<String> seenKeys = new HashSet<>(); for (UnresolvedExpression groupExpr : groupList) { String keyName = extractFieldName(groupExpr); + if (!seenKeys.add(keyName)) { + continue; + } RexNode leftKey = context.relBuilder.field(2, 0, keyName); RexNode rightKey = context.relBuilder.field(2, 1, keyName); RexNode eq = context.relBuilder.equals(leftKey, rightKey); if (bucketNullable) { RexNode bothNull = context.relBuilder.and( context.relBuilder.isNull(leftKey), context.relBuilder.isNull(rightKey)); perKeyConditions.add(context.relBuilder.or(eq, bothNull)); } else { perKeyConditions.add(eq); } } Suggestion importance[1-10]: 4 __ Why: The suggestion prevents redundant join conditions if `groupList` contains duplicate field names. While this could theoretically improve query plans, the PPL parser likely ensures `groupList` contains unique fields, making this edge case unlikely. The improvement is defensive but has limited practical impact.	Low
	Clarify precondition violation with IllegalStateException The method throws `IllegalArgumentException` for unexpected expression types, but `canRewriteWindowAsAggregateJoin` only validates that expressions are bare field references. If validation logic changes or has gaps, this could cause runtime failures. Add a defensive check in the caller or document the precondition more explicitly. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [2336-2348] private static String extractFieldName(UnresolvedExpression expr) { if (expr instanceof Field f) { return f.getField().toString(); } if (expr instanceof QualifiedName qn) { return qn.toString(); } if (expr instanceof Alias a) { return extractFieldName(a.getDelegated()); } - throw new IllegalArgumentException( - "Cannot extract field name from non-field expression: " + expr); + throw new IllegalStateException( + "extractFieldName called with invalid expression type (should be validated by canRewriteWindowAsAggregateJoin): " + expr); } Suggestion importance[1-10]: 3 __ Why: Changing `IllegalArgumentException` to `IllegalStateException` with a more descriptive message improves code clarity by indicating a precondition violation. However, this is a minor stylistic improvement with limited practical impact since `canRewriteWindowAsAggregateJoin` already validates inputs.	Low

…tions PPL AstExpressionBuilder.visitWindowFunction wraps the parsed function in a WindowFunction whose inner is a Function, not an AggregateFunction (SQL emits AggregateFunction). The original predicate required AggregateFunction, so it returned false for every eventstats case and the rewrite never fired. Use BuiltinFunctionName.ofAggregation(funcName) so the predicate accepts both inner types, and convert Function to AggregateFunction in stripWindowFunctionForAggregate so aggVisitor resolves it the same way stats does. Test expectation adjustments observed from actual planner output: - IS NOT DISTINCT FROM: Calcite canonicalizes OR(=, AND(IS NULL, IS NULL)) to IS NOT DISTINCT FROM on nullable partition keys (DEPTNO in EMP). - Plain =: on non-nullable partition keys (server in POST.LOGS), RexSimplify drops the IS NULL conjuncts and leaves equality. - Outer Project folded for no-BY cases: the final passthrough projection is a no-op identity in the no-BY case and Calcite folds it; the BY case keeps the project because it drops the right-side group-key column. verifyPPLToSparkSQL calls in CalcitePPLEventstatsEarliestLatestTest are removed pending stabilization of SparkSqlDialect emission for the join+aggregate form. Signed-off-by: Jialiang Liang <jiallian@amazon.com>

github-actions · 2026-06-02T16:56:26Z

Persistent review updated to latest commit a444b56

…Y keys CI integration failures revealed two cases the rewrite shouldn't fire on: 1. testUnsupportedWindowFunctions — percentile / percentile_approx are in AGGREGATION_FUNC_MAPPING but not WINDOW_FUNC_MAPPING. The legacy rex visitor throws "Unexpected window function: ..." for them, and the test pins that error. My predicate used only ofAggregation so it accepted percentile and the rewrite ran instead of throwing. Now require presence in both maps — percentile (and take/first/last/median, all aggregation- only) fall through to the legacy throw; dc/distinct_count/row_number (all window-only) also fall through unchanged. 2. testEventstatsOnMapPath — `eventstats count() by doc.user.city`. The join condition uses relBuilder.field(2, side, name), which doesn't resolve nested paths; my predicate accepted any QualifiedName so the rewrite produced a plan that failed at field lookup. Now isSimpleQualifiedName requires parts.size() == 1; dotted paths fall through to the legacy RexOver lowering, which handles nested fields via the existing rexVisitor. Plus a spotless reformat to the earliest/latest test that wasn't picked up before push. Signed-off-by: Jialiang Liang <jiallian@amazon.com>

github-actions · 2026-06-02T18:06:30Z

Persistent review updated to latest commit 1a043bc

The eventstats rewrite changes the lowered plan from `Project(RexOver)` to `Project + Join + Aggregate`. The four affected EXPLAIN tests run in both modes: - CalciteExplainIT (pushdown enabled) - CalciteNoPushdownIT (pushdown disabled) Both modes share the same logical plan (the rewrite is AST-level, not pushdown-gated) but the physical plan diverges: - Pushdown ON: right-side LogicalAggregate gets pushed into the inner scan as `PushDownContext=[[AGGREGATION->...]]` with `size:0` + `composite`+`terms` (BY) or `track_total_hits` (no-BY) source builder. - Pushdown OFF: right side stays as a coordinator-side EnumerableAggregate over a raw CalciteEnumerableIndexScan, no AGGREGATION in PushDownContext. The existing convention (loadExpectedPlan in PPLIntegTestCase) loads from `expectedOutput/calcite/` when pushdown is on and `expectedOutput/calcite_no_pushdown/` when it's off, so each test keeps its NoPushdownIT coverage — both variants of the same expected file get regenerated against a local node running the rewrite. Files regenerated (4 per mode, 8 total): - explain_eventstats_earliest_latest.{json} - explain_eventstats_earliest_latest_custom_time.{json} - explain_eventstats_earliest_latest_no_group.{json} - explain_eventstats_null_bucket.{yaml} Captured against: - logs_index_mapping.json + logs.json fixtures (5 docs) - account_index_mapping.json + accounts.json fixtures (1000 docs) - same fixtures CI uses; rel#N / RelSubset#N IDs are normalized at compare time by assertJsonEqualsIgnoreId / assertYamlEqualsIgnoreId. Signed-off-by: Jialiang Liang <jiallian@amazon.com>

github-actions · 2026-06-02T18:32:51Z

Persistent review updated to latest commit 15c52f7

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

github-actions · 2026-06-02T19:57:54Z

Persistent review updated to latest commit f19e1d9

My local OS captured composite size as 10000; CI's integ-test cluster sets it to 1000 (matches every other expected file under expectedOutput/calcite/agg_composite_*.{json,yaml}). The number doesn't affect correctness — it's just the per-bucket page size for the composite aggregation request. Align with the test-framework convention. Signed-off-by: Jialiang Liang <jiallian@amazon.com>

github-actions · 2026-06-02T20:02:01Z

Persistent review updated to latest commit 515f35e

PPL eventstats accepts three aliases for the cardinality aggregation — dc, distinct_count, distinct_count_approx — all resolving to BuiltinFunctionName.DISTINCT_COUNT_APPROX. The stats command only accepts distinct_count_approx, so AGGREGATION_FUNC_MAPPING registers only that name; the other two are window-only aliases in WINDOW_FUNC_MAPPING. The previous predicate required intersection of both maps, which rejected dc and distinct_count. They fell through to the legacy RexOver lowering — which is the exact buggy "EnumerableWindow over a row-fetching scan" shape opensearch-project#5483 was filed against. Fix was incomplete. Replace the intersection check with: name is in ofWindowFunction AND its canonical aggregation name (BuiltinFunctionName.name().toLowerCase, e.g. "distinct_count_approx") is in ofAggregation. Translate the same way in stripWindowFunctionForAggregate so aggVisitor sees the registered name. For names already in both maps (count/sum/avg/etc.) the canonical name equals the user-typed name, so the lookup is a no-op — no behavior change for the cases that already worked. ROW_NUMBER still falls through because its canonical name "row_number" isn't in the aggregation map. Same for percentile / take / first / last / median / list / values — all rejected by the canonical-name lookup. Verified locally: - eventstats dc(state) → cardinality agg, size:0 - eventstats distinct_count(state) by gender → composite over gender + nested cardinality on state, size:0 Regenerated explain_eventstats_dc.json and explain_eventstats_distinct_count.json with the new shape (composite size 1000 to match CI). Both tests are pushdown-only (enabledOnlyWhenPushdownIsEnabled() + loadFromFile hardcoded to calcite/), so no calcite_no_pushdown/ variants needed. Signed-off-by: Jialiang Liang <jiallian@amazon.com>

github-actions · 2026-06-02T22:16:28Z

Persistent review updated to latest commit bb8d9b2

…owup A perf A/B on a local 20k-doc index uncovered a real problem in the no-BY rewrite that was hidden by tests with bounded result sets. Before this fix, the no-BY case emitted: EnumerableNestedLoopJoin(condition=[true], joinType=[inner]) leftScan (returns N rows) rightScan (returns 1 row — the COUNT() scalar) Calcite's NestedLoopJoin contract calls Enumerable.enumerator() on the right side once per left tuple. Each enumerator open on a CalciteEnumerableIndexScan triggers a fresh OpenSearch _search request. For a 10k-row left side that means 10k OpenSearch calls. On a remote cluster (1-10ms RTT per call), the head-less query would take tens of seconds. Measured on the local node, 20k docs, single shard, no head: before: 10,004 OS calls per PPL query, ~1055ms wall after: 4 OS calls per PPL query, ~174ms wall That's ~6x faster wall and ~2500x fewer OS round-trips, with no correctness change (results identical). Fix: in the no-BY branch of rewriteWindowAsAggregateJoin, project a literal-0 key column onto both sides (left: append after orig cols; right: append after agg outputs) and join on equality. The equi-join condition makes the planner pick EnumerableHashJoin, which drains the single-row right side once into a hash table and probes per left row in O(1). Pushdown still fires on the right side — verified via EXPLAIN that the right scan still carries `AGGREGATION->...COUNT()` and `size:0` in PushDownContext; the literal-0 projection is a top-level wrapper that doesn't disrupt the Aggregate→Scan operand chain AggregateIndexScanRule.AGGREGATE_SCAN matches. The BY case is unchanged — it already has an equi-join condition (or IS NOT DISTINCT FROM for bucketNullable=true) which Calcite handles correctly via EnumerableMergeJoin. Test expectation updates: - CalcitePPLEventstatsTest.testEventstatsCount / testEventstatsAvg - CalcitePPLEventstatsEarliestLatestTest no-BY variants (4 tests) - explain_eventstats_dc.json (no-BY, pushdown) - explain_eventstats_earliest_latest_no_group.json (no-BY, both modes) Outer LogicalProject now appears in the no-BY case because we must strip the literal-0 key columns from the join output — it's no longer a no-op passthrough that Calcite folds. Signed-off-by: Jialiang Liang <jiallian@amazon.com>

RyanL1997 · 2026-06-03T00:23:29Z

Perf finding & fix on the no-BY rewrite

An internal review flagged a concern about the cost of the two-scan + coordinator-join shape vs the original window form. Ran a local A/B against a 20k-doc accounts index to put numbers on it.

Initial measurement uncovered a real problem

Scenario	Before rewrite (legacy window)	With rewrite (commit before this fix)
`...	head 10` (10-row result)	~234ms, ~13 OS calls
no `head` (10k-row result)	~169ms, ~1 OS call	~1055ms, ~10,004 OS calls ❌

For the unbounded result-set case, the no-BY rewrite was triggering one OpenSearch search request per left-side row.

Root cause

The no-BY case emitted:

EnumerableNestedLoopJoin(condition=[true], joinType=[inner])
  leftScan   ← returns N rows
  rightScan  ← returns 1 row (the COUNT() scalar)

EnumerableNestedLoopJoin calls Enumerable.enumerator() on the right side per left tuple. Each enumerator open on a CalciteEnumerableIndexScan triggers a fresh _search request. For a 10k-row left side that means 10k OS calls. On a remote cluster (1–10ms RTT per call) that would blow up to tens of seconds.

The BY case never hit this — it already has an equi-join condition (or IS NOT DISTINCT FROM for bucketNullable=true) which lets Calcite pick EnumerableMergeJoin and drain one side once.

Fix (commit `6f333fc22`)

For the no-BY case, project a literal-0 key column onto both sides and join on equality:

Left: append the key after the original columns.
Right: append the key after the aggregate outputs.
Join condition: left.__eventstats_join_key__ = right.__eventstats_join_key__.

The equi-join condition makes the planner pick EnumerableHashJoin, which drains the single-row right side once into a hash table and probes per left row in O(1). Final projection strips the two key columns so the output row type is unchanged.

After-fix measurement (same data, same node, same queries)

Scenario	After fix
`...	head 10`
no `head` (10k-row result)	~174ms warm, 4 OS calls across 5 runs

That's ~6× faster wall and ~2,500× fewer OS round-trips for the unbounded case. Latency for the head-bounded case ticks up slightly (~30ms) because of the extra projection + hash-join overhead at small scale, but the absolute numbers are well under one round-trip.

Pushdown verification

Confirmed via EXPLAIN that the right-side aggregate still pushes down with the new shape:

EnumerableHashJoin(condition=[=($1, $4)], joinType=[inner])
  EnumerableCalc(... LITERAL(0) as join_key)
    CalciteEnumerableIndexScan(
      PushDownContext=[[AGGREGATION->...COUNT()]],
      sourceBuilder={"size":0,"track_total_hits":2147483647})
  EnumerableCalc(... LITERAL(0) as join_key)
    CalciteEnumerableIndexScan(... full _source path ...)

The literal-0 projection wraps the aggregate but doesn't disrupt the Aggregate → Scan operand chain that AggregateIndexScanRule.AGGREGATE_SCAN matches — pushdown fires normally and the right side still issues exactly one size:0 + track_total_hits (or cardinality for dc(), or composite + top_hits for earliest/latest) request.

Test impact

Unit-test expectations updated for 2 tests in CalcitePPLEventstatsTest and 4 in CalcitePPLEventstatsEarliestLatestTest (the no-BY variants now show an outer LogicalProject because the literal-0 key columns must be stripped). 3 EXPLAIN expected files regenerated (explain_eventstats_dc.json pushdown, explain_eventstats_earliest_latest_no_group.json in both modes).

BY-case tests are unchanged — they already use equi-join conditions.

github-actions · 2026-06-03T00:24:27Z

Persistent review updated to latest commit 6f333fc

RyanL1997 added the bugFix label Jun 2, 2026

spotless: rewrap predicate JavaDoc

f19e1d9

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

RyanL1997 added the enhancement New feature or request label Jun 2, 2026

RyanL1997 changed the title ~~[BugFix] Push eventstats down by rewriting RexOver to Join + Aggregate (#5483)~~ [Enhancement] Push eventstats down by rewriting RexOver to Join + Aggregate (#5483) Jun 2, 2026

RyanL1997 added the PPL Piped processing language label Jun 2, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Enhancement] Push eventstats down by rewriting RexOver to Join + Aggregate (#5483)#5495

[Enhancement] Push eventstats down by rewriting RexOver to Join + Aggregate (#5483)#5495
RyanL1997 wants to merge 8 commits into
opensearch-project:mainfrom
RyanL1997:worktree-eventstats-pushdown-5483

RyanL1997 commented Jun 2, 2026

Uh oh!

github-actions Bot commented Jun 2, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 2, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 2, 2026

Uh oh!

github-actions Bot commented Jun 2, 2026

Uh oh!

github-actions Bot commented Jun 2, 2026

Uh oh!

github-actions Bot commented Jun 2, 2026

Uh oh!

github-actions Bot commented Jun 2, 2026

Uh oh!

github-actions Bot commented Jun 2, 2026

Uh oh!

RyanL1997 commented Jun 3, 2026

Uh oh!

github-actions Bot commented Jun 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

RyanL1997 commented Jun 2, 2026

Description

Design

Coverage against the issue's 7-requirement comment

Verification — status

Related Issues

Check List

Uh oh!

github-actions Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit 6f333fc)

Uh oh!

github-actions Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

github-actions Bot commented Jun 2, 2026

Uh oh!

github-actions Bot commented Jun 2, 2026

Uh oh!

github-actions Bot commented Jun 2, 2026

Uh oh!

github-actions Bot commented Jun 2, 2026

Uh oh!

github-actions Bot commented Jun 2, 2026

Uh oh!

github-actions Bot commented Jun 2, 2026

Uh oh!

RyanL1997 commented Jun 3, 2026

Perf finding & fix on the no-BY rewrite

Initial measurement uncovered a real problem

Root cause

Fix (commit 6f333fc22)

After-fix measurement (same data, same node, same queries)

Pushdown verification

Test impact

Uh oh!

github-actions Bot commented Jun 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

github-actions Bot commented Jun 2, 2026 •

edited

Loading

(Review updated until commit `6f333fc`)

github-actions Bot commented Jun 2, 2026 •

edited

Loading

Fix (commit `6f333fc22`)