Skip to content

[Enhancement] Support bare-field join criteria join on <field> shorthand#5517

Open
Hanyu-W wants to merge 1 commit into
opensearch-project:mainfrom
Hanyu-W:ppl-join-shorthand
Open

[Enhancement] Support bare-field join criteria join on <field> shorthand#5517
Hanyu-W wants to merge 1 commit into
opensearch-project:mainfrom
Hanyu-W:ppl-join-shorthand

Conversation

@Hanyu-W
Copy link
Copy Markdown

@Hanyu-W Hanyu-W commented Jun 4, 2026

Description

Lets join on/where criteria accept a bare field name (or an AND chain of bare fields) as shorthand for the qualified equality on that common field. Before this change, the only way to join on a shared column was to spell out both qualifiers:

source=t1 | inner join on a t2          == on t1.a = t2.a
source=t1 | inner join on a AND b t2     == on t1.a = t2.a AND t1.b = t2.b
source=t1 | join on a t2                 == on t1.a = t2.a   (no prefix/alias)

The shorthand behaves exactly like the explicit on l.f = r.f it sugars: it keeps both key columns (the right key surfaced as <rightAlias>.f, or <rightTable>.f when no alias is given). This is intentionally different from the field-list join join f t2, which merges the duplicate key into a single column.

Two things made this more than a one-line grammar tweak, and shaped the design:

  1. Grammar ambiguity. on is a keyword that can also be an identifier, so the no-prefix form join on a t2 was greedily consumed by the field-list alternative — on parsed as a field name, failing with field [on] not found. Reordering the joinCommand alternatives so the criteria alternative is listed first makes ANTLR ALL(*) prefer reading on as the criteria keyword on genuine ambiguity, with zero change to the field-list form's behavior.
  2. Qualifier resolution. A bare on a carries no table qualifier, but the keep-both planner branch needs left.a = right.a. The left table name isn't known at parse time — it's only known once both inputs are attached. So the rewrite is deferred to Join.attach, where the qualifier is taken from the side alias (left=/right=/AS) when present, and otherwise from the underlying table name (walking the single-child plan chain down to the Relation). An un-aliased subquery side with no reachable table name throws a SemanticCheckException telling the user to add an alias.

This keeps the planner 100% unchanged — the rewritten criteria is an ordinary Compare("=", l.f, r.f) that flows through the existing keep-both join branch; all shorthand-specific logic lives in the AST node and grammar.

Resolves #5484.

Changes

Area File Change
Grammar ppl/src/main/antlr/OpenSearchPPLParser.g4 Reorder joinCommand so the criteria alternative precedes the field-list alternative, so join on a reads on as the criteria keyword instead of a field
Parser ppl/.../ppl/parser/AstBuilder.java Leave a bare single-part field (or AND chain) as the join condition rather than treating it as a field list
Rewrite core/.../ast/tree/Join.java attach() rewrites the bare-field condition into the qualified equality l.f = r.f using side aliases or table names; joinCondition made non-final; new rewriteBareFieldCriteria/collectBareFields/buildQualifiedEqualityConjunction/qualifierOf/findRelation helpers; SemanticCheckException for an unresolvable un-aliased side
Docs docs/user/ppl/cmd/join.md Shorthand examples in both syntax sections, <joinCriteria> parameter descriptions, and the field-dedup limitation note
Tests ppl/.../calcite/CalcitePPLJoinTest.java, ppl/.../parser/AstBuilderTest.java, ppl/.../utils/PPLQueryDataAnonymizerTest.java Calcite-plan, AST-rewrite, and anonymizer coverage
QA (IT) integ-test/.../calcite/remote/CalcitePPLJoinIT.java End-to-end coverage on a live cluster

Testing

CalcitePPLJoinIT on a live cluster, exercising the new shorthand against pre-change behavior:

Test Exercises Before After
testJoinWithImplicitField inner join on name — keep-both; asserts shorthand output is identical to explicit on l.name = r.name; co-named columns resolve to the left
testJoinNoPrefixImplicitField join on name — no prefix; on parsed as the criteria keyword, identical to the explicit form (not the field-list merge)
testLeftJoinWithImplicitField left join on name — unmatched-left rows keep the left key (no NULL key; keep-both does not coalesce)
CalcitePPLJoinIT (new) total 0/3 3/3

Before = the same query run against pre-change code (parse/plan error — the bare field was read as a field name, e.g. field [on] not found).

Unit suites (all green, run locally):

  • CalcitePPLJoinTest: 50/50 (10 new) — plan shape for single/multi-field, alias, where, no-prefix, left/semi, self-join tautology, and not-on-both-sides cases.
  • AstBuilderTest: 158/158 (11 new) — rewrite-vs-passthrough decisions (bare field rewritten; qualified/comparison criteria untouched; field-list still parses as a field list; prefix-without-criteria is a syntax error).
  • PPLQueryDataAnonymizerTest: 119/119 (1 new) — shorthand anonymizes correctly.

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 4, 2026

PR Reviewer Guide 🔍

(Review updated until commit 19e6693)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Possible Issue

The collectBareFields method does not prevent duplicate field names from being added to the out list. If the join condition is on a AND a, the list will contain ["a", "a"], which may lead to duplicate join conditions like a = a AND a = a. This could cause unexpected behavior or inefficiency in the generated query plan.

static boolean collectBareFields(UnresolvedExpression expr, List<String> out) {
  if (expr instanceof And and) {
    return collectBareFields(and.getLeft(), out) && collectBareFields(and.getRight(), out);
  }
  if (expr instanceof Field field
      && field.getField() instanceof QualifiedName qn
      && qn.getParts().size() == 1) {
    out.add(qn.getParts().get(0));
    return true;
  }
  return false;
}
Possible Issue

When bareFields is empty after collectBareFields returns true, the reduce operation produces Optional.empty(), causing orElse(context.relBuilder.literal(true)) to generate a tautology join condition. This can occur if the input expression is a degenerate AND chain with no actual fields. The result is a cross join instead of the intended equi-join, potentially causing a massive performance issue or incorrect results.

bareFields.stream()
    .map(f -> buildJoinConditionByFieldName(context, f))
    .reduce(context.rexBuilder::and)
    .orElse(context.relBuilder.literal(true));

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 4, 2026

PR Code Suggestions ✨

Latest suggestions up to 19e6693
Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Prevent partial list modification on failure

The method modifies the out list even when returning false, which can leave it in an
inconsistent state. If the left side of an And succeeds but the right side fails,
the list will contain partial results. Clear the list or use a temporary collection
when the method returns false.

core/src/main/java/org/opensearch/sql/calcite/utils/JoinAndLookupUtils.java [49-60]

 static boolean collectBareFields(UnresolvedExpression expr, List<String> out) {
   if (expr instanceof And and) {
-    return collectBareFields(and.getLeft(), out) && collectBareFields(and.getRight(), out);
+    int sizeBefore = out.size();
+    if (!collectBareFields(and.getLeft(), out)) {
+      out.subList(sizeBefore, out.size()).clear();
+      return false;
+    }
+    if (!collectBareFields(and.getRight(), out)) {
+      out.subList(sizeBefore, out.size()).clear();
+      return false;
+    }
+    return true;
   }
   if (expr instanceof Field field
       && field.getField() instanceof QualifiedName qn
       && qn.getParts().size() == 1) {
     out.add(qn.getParts().get(0));
     return true;
   }
   return false;
 }
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a critical issue where the out list can be left in an inconsistent state if collectBareFields returns false after partially populating it. The improved code properly cleans up partial modifications by tracking the size before recursion and clearing added elements on failure.

Medium
Prevent empty field list cross join

The orElse(context.relBuilder.literal(true)) fallback creates a tautology when
bareFields is empty, which would produce a cross join instead of failing. Since the
grammar requires at least one field in the join criteria, an empty list should be
treated as an error rather than silently creating an unintended cross join.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [1893-1901]

 List<String> bareFields = new ArrayList<>();
 if (JoinAndLookupUtils.collectBareFields(node.getJoinCondition().get(), bareFields)) {
   ...
+  if (bareFields.isEmpty()) {
+    throw new IllegalStateException("Join criteria must contain at least one field");
+  }
   joinCondition =
       bareFields.stream()
           .map(f -> buildJoinConditionByFieldName(context, f))
           .reduce(context.rexBuilder::and)
-          .orElse(context.relBuilder.literal(true));
+          .get();
Suggestion importance[1-10]: 7

__

Why: The suggestion identifies a potential issue where an empty bareFields list would create a tautology condition. However, the grammar and collectBareFields logic should prevent this scenario from occurring in practice, making this more of a defensive check than a critical bug fix.

Medium

Previous suggestions

Suggestions up to commit 374d412
CategorySuggestion                                                                                                                                    Impact
Possible issue
Clear partial results on validation failure

The method returns false when encountering non-bare-field expressions (like
comparisons or OR), but doesn't validate that the accumulated out list is non-empty
before returning. If a mixed expression like a AND (b = c) is encountered, the
method will return false after partially populating out, leaving it in an
inconsistent state that could cause incorrect behavior in the caller.

core/src/main/java/org/opensearch/sql/ast/tree/Join.java [108-120]

 private static boolean collectBareFields(UnresolvedExpression expr, List<String> out) {
+  int initialSize = out.size();
   // AND is left-recursive (a AND b == And(a,b)); recurse both sides to flatten.
   if (expr instanceof And and) {
-    return collectBareFields(and.getLeft(), out) && collectBareFields(and.getRight(), out);
+    boolean result = collectBareFields(and.getLeft(), out) && collectBareFields(and.getRight(), out);
+    if (!result) {
+      out.subList(initialSize, out.size()).clear();
+    }
+    return result;
   }
   if (expr instanceof Field field
       && field.getField() instanceof QualifiedName qn
       && qn.getParts().size() == 1) {
     out.add(qn.getParts().get(0));
     return true;
   }
   return false;
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a potential issue where out could be left in an inconsistent state if a mixed expression is encountered. Clearing partial results on validation failure ensures the list remains consistent, improving robustness.

Medium
General
Prevent stack overflow in recursion

The recursive traversal lacks depth protection, which could cause a stack overflow
if the plan tree is deeply nested or contains cycles. Although cycles are unlikely
in a well-formed AST, adding a depth limit provides a safety guard against malformed
input or implementation bugs.

core/src/main/java/org/opensearch/sql/ast/tree/Join.java [155-164]

 private static Relation findRelation(UnresolvedPlan plan) {
+  return findRelation(plan, 0);
+}
+
+private static Relation findRelation(UnresolvedPlan plan, int depth) {
+  if (depth > 100) {
+    throw new SemanticCheckException("join plan tree exceeds maximum depth");
+  }
   if (plan instanceof Relation relation) {
     return relation;
   }
   List<? extends Node> children = plan.getChild();
   if (children.size() == 1 && children.get(0) instanceof UnresolvedPlan child) {
-    return findRelation(child);
+    return findRelation(child, depth + 1);
   }
   return null;
 }
Suggestion importance[1-10]: 6

__

Why: Adding a depth limit to the recursive findRelation method is a reasonable safeguard against stack overflow from deeply nested or malformed plan trees. While unlikely in practice, this defensive measure improves robustness with minimal overhead.

Low
Add explicit empty list validation

The orElseThrow will never be reached because rewriteBareFieldCriteria calls this
method only after collectBareFields returns true, which guarantees at least one
field in the list. However, if the list is empty due to a logic error elsewhere, the
exception message won't be thrown, making debugging harder. Add an explicit
precondition check at the method entry.

core/src/main/java/org/opensearch/sql/ast/tree/Join.java [123-136]

 private static UnresolvedExpression buildQualifiedEqualityConjunction(
     String leftQual, String rightQual, List<String> fieldNames) {
+  if (fieldNames.isEmpty()) {
+    throw new SemanticCheckException("join 'on' shorthand requires at least one field");
+  }
   return fieldNames.stream()
       .map(
           f ->
               (UnresolvedExpression)
                   new Compare(
                       "=",
                       new Field(QualifiedName.of(leftQual, f)),
                       new Field(QualifiedName.of(rightQual, f))))
       .reduce((l, r) -> new And(l, r))
-      .orElseThrow(
-          () -> new SemanticCheckException("join 'on' shorthand requires at least one field"));
+      .get();
Suggestion importance[1-10]: 5

__

Why: While the orElseThrow is indeed unreachable given the current call path, adding an explicit precondition check at method entry improves defensive programming and makes the code more maintainable. The suggestion correctly replaces .orElseThrow() with .get() after the check.

Low

@RyanL1997 RyanL1997 added bugFix enhancement New feature or request and removed bugFix labels Jun 4, 2026
@RyanL1997 RyanL1997 changed the title feat(ppl): support bare-field join criteria join on <field> shorthand [Enhancement] Support bare-field join criteria join on <field> shorthand Jun 4, 2026
new Field(QualifiedName.of(rightQual, f))))
.reduce((l, r) -> new And(l, r))
.orElseThrow(
() -> new SemanticCheckException("join 'on' shorthand requires at least one field"));
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this the only error case that can throw here?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were two (this one, and an empty-list guard above it). Both are removed in the redesign, so neither throws anymore.

private static String qualifierOf(UnresolvedPlan plan, String side) {
Relation relation = findRelation(plan);
if (relation == null) {
throw new SemanticCheckException(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice detail on the error message!

consider using the ReportBuilder to split out suggestions from the problem

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That error path is going away — once matching is done by position in the planner, there's no "must be a named table" restriction, so the exception it was attached to is deleted.

if (rightAlias.isEmpty() && this.right instanceof SubqueryAlias alias) {
rightAlias = Optional.of(alias.getAlias());
}
rewriteBareFieldCriteria();
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is putting it here the right boundary over resolving these fields in the constructor (or even frontloading it to the AST builder)?

Putting the rewrite here introduces a weird boundary, where now Join plays the role of dynamically parsing itself during attachment based directly on AST grammar choices. That multiple places comment "we actually resolve these fields in attach()" also hints that this is unintuitive.

It might be clearer to put this in the constructor. Alternatively, it's a little more work, but we could consider moving all of leftAlias, rightAlias, joinFields and similar into a dedicated JoinClause data type that supports operations like clause.expandBareFields() and clause.resolveAliases()

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, same concern I called out here

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes — that's the fix I'm making. I'm going to move the expansion into visitJoin instead of attach(), so the AST keeps query how wrote it and the analysis happens in the planner where it belongs. Explain output will now match the original query too.

Yeah, I'll make the change Ryan suggested to fix this.

Comment thread docs/user/ppl/cmd/join.md Outdated
| `source=table1 \| join right=tt on table1.id=t2.id [ source=table2 as t2 \| eval b = id ] \| eval a = 1` | `table1.id, tt.id, tt.b, a` |

* **Field deduplication in extended syntax** – When using the extended syntax with a field list, duplicate field names in the output are deduplicated according to the `overwrite` option.
* **Field deduplication in extended syntax** – When using the extended syntax with a field list (`<join-field-list>`), duplicate field names are deduplicated according to the `overwrite` option. The bare-field `on`/`where` shorthand keeps both key columns, unlike the field-list form.
Copy link
Copy Markdown
Collaborator

@Swiddis Swiddis Jun 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the query grammar at the top of the doc wasn't updated to show what join-field-list is

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I'll cut it because it isn't adding anything valuable to the docs.

Copy link
Copy Markdown
Collaborator

@RyanL1997 RyanL1997 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Hanyu-W , thanks for the change, and just left some reviews.

* are known; qualifier is the side alias else the table name. Non-bare criteria (qualified, OR,
* comparisons) are left untouched.
*/
private void rewriteBareFieldCriteria() {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method resolves the left and right qualifiers independently, with no check that they differ. When both sides are the same unaliased table (e.g. source=EMP | inner join on DEPTNO EMP), qualifierOf returns "EMP" for both, and the conjunction it builds becomes EMP.DEPTNO = EMP.DEPTNO — a tautology that Calcite simplifies to IS NOT NULL(...), not an equi-join. The user gets a near-cross-product with no error or warning.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be fixed now I'm moving it into the planner, as the planner already knows which input is left and which is right by position, so doing the match there produces a correct equi-join even for join on DEPTNO EMP.

if (rightAlias.isEmpty() && this.right instanceof SubqueryAlias alias) {
rightAlias = Optional.of(alias.getAlias());
}
rewriteBareFieldCriteria();
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performing semantic rewriting inside the AST node's attach() mixes tree construction with analysis, which usually lives in the analyzer layer here. The unresolved tree no longer reflects what the user wrote, which can make explain output and debugging confusing. Was an analyzer pass considered?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes — that's the fix I'm making. I'm going to move the expansion into visitJoin instead of attach(), so the AST
keeps query how wrote it and the analysis happens in the planner where it belongs. Explain output will now match the original query too.

new Field(QualifiedName.of(leftQual, f)),
new Field(QualifiedName.of(rightQual, f))))
.reduce((l, r) -> new And(l, r))
.orElseThrow(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This orElseThrow looks unreachable: rewriteBareFieldCriteria early-returns when no bare fields are collected, and collectBareFields only returns true after adding at least one. Can you double check?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, it's unreachable. The method will be deleted in the redesign

}

/** Walk the single-child chain down to the underlying Relation. */
private static Relation findRelation(UnresolvedPlan plan) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bails as soon as it hits a node with !=1 child. Please add a test for an aliased subquery with a bare field (source=t1 | join on a [ source=t2 ] as s) to confirm it uses the alias and never reaches qualifierOf.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method will get deleted too, but I'll still add the aliased-subquery test to confirm.

private final JoinType joinType;
private final Optional<UnresolvedExpression> joinCondition;
// Not final: bare-field `on <field>` shorthand is rewritten to qualified equality in attach().
private Optional<UnresolvedExpression> joinCondition;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropping final to allow mutation in attach() weakens this AST node's immutability. If attach() ever runs twice or the node is shared, the rewrite re-runs against already-rewritten state. It's idempotent in practice (the qualified result fails collectBareFields), but the invariant is fragile — consider a guard flag or doing this rewrite in an analyzer pass instead of mutating here.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I'll restore final as the redesign should no longer mutate the AST here

+ " COMM=[$6], DEPTNO=[$7], EMP.EMPNO=[$8], EMP.ENAME=[$9], EMP.JOB=[$10],"
+ " EMP.MGR=[$11], EMP.HIREDATE=[$12], EMP.SAL=[$13], EMP.COMM=[$14],"
+ " EMP.DEPTNO=[$15])\n"
+ " LogicalJoin(condition=[AND(IS NOT NULL($15), IS NOT NULL($13))],"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This asserts AND(IS NOT NULL($15), IS NOT NULL($13)) as expected output, which locks in the self-join tautology as if intended. Should flip to an expected-exception test once the guard is added.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd actually go the other way: instead of making the self-join an error, the redesign makes it work correctly. So I'll update these to assert the real join condition rather than the tautology — the query is valid, it was just being built wrong

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree.

+ " COMM=[$6], DEPTNO=[$7], EMP.EMPNO=[$8], EMP.ENAME=[$9], EMP.JOB=[$10],"
+ " EMP.MGR=[$11], EMP.HIREDATE=[$12], EMP.SAL=[$13], EMP.COMM=[$14],"
+ " EMP.DEPTNO=[$15])\n"
+ " LogicalJoin(condition=[IS NOT NULL($15)], joinType=[inner])\n"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same concern as the above— this enshrines IS NOT NULL($15) as correct. The test name even admits it's a tautology. Convert to expected-exception, or revisit once self-join handling is decided.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd actually go the other way: instead of making the self-join an error, the redesign makes it work correctly. So I'll update these to assert the real join condition rather than the tautology — the query is valid, it was just being built wrong

same response as above

Let `on`/`where` join criteria accept a bare field name (or `AND` of bare
fields) as shorthand for the qualified equality on that common field:

  source=t1 | inner join on a t2          == on t1.a = t2.a
  source=t1 | inner join on a AND b t2    == on t1.a = t2.a AND t1.b = t2.b
  source=t1 | join on a t2                == on t1.a = t2.a   (no prefix/alias)

The shorthand behaves exactly like the explicit `on l.f = r.f` criteria: it
KEEPS BOTH key columns (the right key renamed `<rightAlias>.f`, or
`<rightTable>.f` when no alias is given). This differs from the field-list
join (`join f t2`), which merges the duplicate key to a single column.

Implementation:
- Grammar: reorder the joinCommand alternatives so the criteria alternative
  is listed first. This lets the no-prefix form `join on a <right>` parse `on`
  as the criteria keyword instead of the field-list alternative greedily
  consuming `on`/`where` as a field name. ANTLR ALL(*) picks the first-listed
  alternative on genuine ambiguity.
- AstBuilder: a bare single-part field (or AND-chain of them) is left as the
  join condition verbatim -- the unresolved AST reflects what the user wrote.
- Planner (CalciteRelNodeVisitor.visitJoin): the criteria branch detects a
  bare-field condition and builds the equi-join from it, resolving each field
  by stack position (LEFT.f = RIGHT.f) via the existing
  buildJoinConditionByFieldName helper. No new planner code, and no AST
  mutation. Because resolution is positional rather than by qualifier, a
  self-join `join on f t` is a genuine cross-scan equi-join rather than the
  `f = f` tautology a name-based rewrite would collapse to.

Adds AstBuilder, Calcite plan, anonymizer, and integration tests; updates the
join command user manual.

Signed-off-by: Hanyu Wei <weihanyu@amazon.com>
@Hanyu-W Hanyu-W force-pushed the ppl-join-shorthand branch from 374d412 to 19e6693 Compare June 5, 2026 16:36
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 5, 2026

Persistent review updated to latest commit 19e6693

@anasalkouz anasalkouz added the PPL Piped processing language label Jun 5, 2026
@anasalkouz anasalkouz added the calcite calcite migration releated label Jun 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

calcite calcite migration releated enhancement New feature or request PPL Piped processing language

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Support implicit field joins when fields match across indices

4 participants