Centralize relation-scope binding and enforce duplicate alias validation in SQL planner #21617

Open
kosiew wants to merge 4 commits into apache:main from
kosiew:alias-conflict-01-21616

@kosiew kosiew commented Apr 14, 2026

Which issue does this PR close?


Rationale for this change

The SQL planner previously performed duplicate relation alias validation inconsistently across different code paths. In particular:

  • Explicit joins could bypass duplicate alias checks entirely
  • Unaliased tables could collide due to simplistic key generation
  • Error diagnostics lacked source location information, making debugging difficult

This led to confusing or missing errors for users and inconsistent behavior depending on query structure.

This PR introduces a centralized relation-scope binding mechanism within the planner to ensure consistent validation and improved diagnostics.


What changes are included in this PR?

  • Introduced RelationScope and RelationBinding to track relation names and aliases within scoped FROM clauses

  • Added relation scope management to PlannerContext, including:

    • with_new_relation_scope
    • clear_relation_scopes
    • insert_relation_binding
  • Implemented duplicate relation detection using scoped bindings with span-aware diagnostics

  • Registered relation bindings during planning for:

    • Base tables
    • Aliases
    • Joins (including explicit joins)
  • Ensured nested joins and subqueries do not leak relation scopes

  • Updated query planning to reset relation scopes for subqueries

  • Refactored nested join handling to respect scope boundaries

  • Added detailed diagnostics with source span references for duplicate alias errors
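
The duplicate detection described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Span` and `DuplicateRelation` types and the `insert` signature are assumptions made for the example. Only the names `RelationScope` and `RelationBinding`, the use of `HashMap::entry`, the span-only binding, and the error text come from the PR notes.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::fmt;

/// Hypothetical stand-in for a source location (not the planner's real span type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub line: u32,
    pub column: u32,
}

/// A binding stores only the span where the name was introduced.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RelationBinding {
    pub span: Option<Span>,
}

/// Relation names and aliases bound within one FROM clause.
#[derive(Debug, Default)]
pub struct RelationScope {
    bindings: HashMap<String, RelationBinding>,
}

/// Hypothetical error type carrying both locations for the diagnostic.
#[derive(Debug, PartialEq, Eq)]
pub struct DuplicateRelation {
    pub name: String,
    pub first: Option<Span>,
    pub second: Option<Span>,
}

impl fmt::Display for DuplicateRelation {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "duplicate relation alias or name '{}'", self.name)
    }
}

impl RelationScope {
    /// Bind `name`, or report the earlier binding if the name is already taken.
    pub fn insert(&mut self, name: &str, span: Option<Span>) -> Result<(), DuplicateRelation> {
        match self.bindings.entry(name.to_string()) {
            // The occupied key supplies the name of the prior binding.
            Entry::Occupied(prior) => Err(DuplicateRelation {
                name: prior.key().clone(),
                first: prior.get().span,
                second: span,
            }),
            Entry::Vacant(slot) => {
                slot.insert(RelationBinding { span });
                Ok(())
            }
        }
    }
}
```

In this sketch, binding `t1` twice in the same scope returns an error whose message is `duplicate relation alias or name 't1'`, with `first` pointing at the earlier binding so a diagnostic can reference both locations.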


Are these changes tested?

Yes. This PR includes comprehensive tests covering:

  • Duplicate aliases in explicit joins
  • Duplicate aliases in comma joins
  • Conflicts between unaliased table names and aliases
  • Valid queries with distinct aliases
  • Fully-qualified table names across schemas (ensuring no false positives)
  • Scope isolation for subqueries and nested joins

These tests ensure both correctness and regression coverage for the new behavior.
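
The scope-isolation behavior exercised by these tests can be modeled with a small stack of name sets. The sketch below is a simplified illustration under stated assumptions: `ScopeStack`, `with_isolated_scopes`, and `depth` are hypothetical names, and the closure-based signatures are guesses, not the `PlannerContext` API in this PR. It only demonstrates the idea that each `FROM` list gets a fresh scope and that a subquery does not see or pollute its parent's scopes.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified planner state: one name set per FROM clause
/// currently being planned.
#[derive(Debug, Default)]
pub struct ScopeStack {
    scopes: Vec<HashSet<String>>,
}

impl ScopeStack {
    /// Run `f` with a fresh innermost scope and drop that scope afterwards.
    pub fn with_new_relation_scope<T>(&mut self, f: impl FnOnce(&mut Self) -> T) -> T {
        self.scopes.push(HashSet::new());
        let out = f(self);
        self.scopes.pop();
        out
    }

    /// Run `f` with no inherited scopes (as for a subquery), then restore them.
    pub fn with_isolated_scopes<T>(&mut self, f: impl FnOnce(&mut Self) -> T) -> T {
        let saved = std::mem::take(&mut self.scopes);
        let out = f(self);
        self.scopes = saved;
        out
    }

    /// Bind `name` in the innermost scope; fail if it is already bound there.
    pub fn insert_relation_binding(&mut self, name: &str) -> Result<(), String> {
        let Some(scope) = self.scopes.last_mut() else {
            return Err("no active relation scope".to_string());
        };
        if scope.insert(name.to_string()) {
            Ok(())
        } else {
            Err(format!("duplicate relation alias or name '{name}'"))
        }
    }

    /// Number of scopes currently on the stack.
    pub fn depth(&self) -> usize {
        self.scopes.len()
    }
}
```

With this model, binding `t` in an outer `FROM`, then binding `t` again inside a subquery wrapped in `with_isolated_scopes` and its own `with_new_relation_scope`, succeeds. A second `t` in the outer scope still fails, and the stack is empty again once the outer scope closes.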


Are there any user-facing changes?

Yes.

  • Queries with duplicate relation aliases (previously allowed in some cases) will now correctly return planning errors
  • Error messages are improved and now include more precise diagnostics, including source locations where available

There are no breaking API changes, but stricter validation may cause previously accepted invalid queries to fail.


LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.


Additional Notes

This change lays the groundwork for further centralization of name resolution and validation logic in the SQL planner, potentially extending to CTEs and other relation sources in future work.

kosiew added 3 commits April 14, 2026 20:09
… management

- Added a private relation binding scope in `planner.rs` to catch duplicate relation aliases/names at SQL planning time with a planner diagnostic.
- Wrapped each `FROM` list in a fresh relation scope in `select.rs`.
- Registered relation bindings for base and joined relations in `join.rs`.
- Extracted alias/table bindings while preserving full `TableReference` display for unaliased tables in `relation/mod.rs`.
- Cleared inherited relation scopes for nested query planning in `query.rs`.
- Added regression tests for various scenarios, including duplicate explicit join aliases, distinct aliases, and nested subquery scope isolation in `sql_integration.rs`.
- Used `HashMap::entry` and `Default` for relation-scope insertion in `planner.rs`.
- Simplified relation alias binding with a private helper and removed repetitive `display_name.clone()` calls in `relation/mod.rs`.
- Replaced the nested-join planning branch with a private helper in `relation/mod.rs`.
- Consolidated duplicate alias tests with a small assertion helper in `sql_integration.rs`.
- Trimmed RelationBinding to store only the span.
- Updated duplicate diagnostic to use the occupied map key for prior binding name.
- Renamed nested-join helper flag from `has_alias` to `is_aliased_nested_join`.
- Added regression tests for:
  - Unaliased relation colliding with an alias: `person JOIN orders person`.
  - Qualified same-leaf relation names not colliding: `public.orders JOIN other.orders`.
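
The two regression cases above depend on how the binding key is derived. A rough sketch of one possible rule follows, assuming an alias takes precedence and an unaliased table is keyed by its full qualified display. `TableRef` and `binding_key` are hypothetical names for illustration; the real code works with `TableReference` and may differ in detail.

```rust
/// Hypothetical, simplified table reference: optional schema plus table name.
pub struct TableRef<'a> {
    pub schema: Option<&'a str>,
    pub table: &'a str,
}

/// Key under which a relation is registered in its scope (assumed rule):
/// the alias if one is given, otherwise the full qualified name.
pub fn binding_key(table: &TableRef<'_>, alias: Option<&str>) -> String {
    match (alias, table.schema) {
        (Some(alias), _) => alias.to_string(),
        (None, Some(schema)) => format!("{schema}.{}", table.table),
        (None, None) => table.table.to_string(),
    }
}
```

Under this rule, `person JOIN orders person` produces the key `person` twice and is rejected, while `public.orders JOIN other.orders` produces `public.orders` and `other.orders`, which do not collide.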
@github-actions bot added the `sql` (SQL Planner) label Apr 14, 2026
…nner-level error for DataFusion

- Updated expectations to reflect the new error message: "DataFusion error: Error during planning: duplicate relation alias or name 't1'"
@github-actions bot added the `sqllogictest` (SQL Logic Tests (.slt)) label Apr 14, 2026
@kosiew kosiew marked this pull request as ready for review April 14, 2026 13:55