fix(optimizer)!: normalization of synthesized output aliases #7816

Open
fivetran-kwoodbeck wants to merge 13 commits into main from optimizer/bug-normalizealiases

fivetran-kwoodbeck commented Jun 29, 2026
Collaborator

When the optimizer generates an output alias for an unaliased projection (in qualify_outputs), the synthesized alias does not conform to the dialect's NORMALIZATION_STRATEGY.

The patch calls normalize_identifier so that the alias follows the dialect's casing rules (e.g. upper-folding in Snowflake). Star expansion (_expand_stars) also needed an update so that columns expanded from quoted source projections carry the quoted flag, which keeps star output consistent with an explicit reference to the same column.

Quoted columns keep their exact spelling, since folding them would change which column they resolve to.

Example: Snowflake (upper-folding)

SELECT OBJECT_CONSTRUCT(*) FROM (SELECT a, b FROM x) AS t;

Before

SELECT OBJECT_CONSTRUCT(*) AS _col_0 FROM (SELECT a AS a, b AS b FROM x AS x) AS t;

After

SELECT OBJECT_CONSTRUCT(*) AS _COL_0 FROM (SELECT a AS A, b AS B FROM x AS x) AS t;
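
The folding rule described above can be modelled in a few lines. This is a toy sketch, not sqlglot's implementation: the `Identifier` class and the `normalize` function are hypothetical stand-ins for an identifier node and for normalize_identifier, and the normalization strategy is reduced to a plain string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identifier:
    """Hypothetical stand-in for an identifier node: a name plus a quoted flag."""

    name: str
    quoted: bool = False


def normalize(ident: Identifier, strategy: str) -> Identifier:
    """Fold an unquoted identifier per the strategy; leave quoted ones untouched."""
    if ident.quoted:
        # Folding a quoted identifier would change which column it resolves to.
        return ident
    if strategy == "uppercase":
        return Identifier(ident.name.upper())
    if strategy == "lowercase":
        return Identifier(ident.name.lower())
    return ident  # any other strategy: keep the spelling as written


# A synthesized alias is unquoted, so it gets folded under an upper-folding strategy.
synthesized = normalize(Identifier("_col_0"), "uppercase")
# A quoted column keeps its exact spelling.
quoted = normalize(Identifier("name", quoted=True), "uppercase")
print(synthesized.name, quoted.name)
```

Under these assumptions the synthesized alias is folded while the quoted column is returned unchanged, which mirrors the Before/After difference in the Snowflake example.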

Comment thread sqlglot/optimizer/qualify_columns.py
georgesittas commented Jun 30, 2026
Collaborator

@fivetran-kwoodbeck can you remind me what the motivation behind this was? Was this a bug, like changing the semantics of the output compared to the input? Or is it just an aesthetic/consistency improvement?

Comment thread sqlglot/optimizer/qualify_columns.py Outdated
@georgesittas georgesittas changed the title fix(optimizer): normalization of synthesized output aliases fix(optimizer)!: normalization of synthesized output aliases Jun 30, 2026
@fivetran-kwoodbeck fivetran-kwoodbeck changed the title fix(optimizer)!: normalization of synthesized output aliases fix(optimizer): normalization of synthesized output aliases Jun 30, 2026
fivetran-kwoodbeck commented Jun 30, 2026
Collaborator Author

@fivetran-kwoodbeck can you remind me what the motivation behind this was? Was this a bug, like changing the semantics of the output compared to the input? Or is it just an aesthetic/consistency improvement?

@georgesittas The motivation is to make the optimizer idempotent and to ensure that the generated aliases are consistent with each dialect's NORMALIZATION_STRATEGY. If idempotency is a requirement for the optimizer, then yes, it's a bug.
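
To make the idempotency point concrete, here is a minimal sketch of how an un-normalized synthesized alias could make a second optimizer run differ from the first. It is a toy model under stated assumptions: the `optimize` function is hypothetical, it assumes that existing aliases are normalized before new ones are synthesized, and it ignores quoting entirely.

```python
def fold(alias: str) -> str:
    """Model upper-folding of an unquoted identifier."""
    return alias.upper()


def optimize(projections, normalize_synthesized):
    """Toy optimizer pass over a list of (expression, alias-or-None) pairs."""
    # Step 1 (assumed order): normalize aliases that already exist in the input.
    normalized = [(expr, fold(alias) if alias else None) for expr, alias in projections]
    # Step 2: synthesize an alias for every projection that still lacks one.
    out = []
    for i, (expr, alias) in enumerate(normalized):
        if alias is None:
            alias = f"_col_{i}"
            if normalize_synthesized:
                alias = fold(alias)
        out.append((expr, alias))
    return out


query = [("OBJECT_CONSTRUCT(*)", None)]

# Without normalizing the synthesized alias, a second run changes the output.
once_old = optimize(query, normalize_synthesized=False)
twice_old = optimize(once_old, normalize_synthesized=False)

# With normalization, running the pass again is a no-op.
once_new = optimize(query, normalize_synthesized=True)
twice_new = optimize(once_new, normalize_synthesized=True)

print(once_old == twice_old, once_new == twice_new)
```

In this model the first run without normalization emits a lowercase alias that the next run folds, so the two outputs differ; normalizing at synthesis time removes the difference.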

@fivetran-kwoodbeck fivetran-kwoodbeck force-pushed the optimizer/bug-normalizealiases branch from 99348c9 to 9654789 Compare June 30, 2026 16:38
Comment thread tests/test_optimizer.py
Comment thread tests/test_optimizer.py Outdated
Repository owner deleted a comment from github-actions Bot Jun 30, 2026

Comment on lines +855 to +859
    quoted_columns = (
        {s.output_name: _output_identifier_quoted(s) for s in source_expression.selects}
        if isinstance(source_expression, exp.Query)
        else {}
    )

Collaborator

Nit: I'd make this a set.

Suggested change
- quoted_columns = (
-     {s.output_name: _output_identifier_quoted(s) for s in source_expression.selects}
-     if isinstance(source_expression, exp.Query)
-     else {}
- )
+ quoted_columns = {
+     s.output_name
+     for s in source_expression.selects
+     if isinstance(source_expression, exp.Query) and _output_identifier_quoted(s)
+ }

Collaborator

By the way, is this costly to construct within the for table in tables loop instead of once? Doesn't this repeat work? Is there a better way to implement this?

Collaborator Author

I'm not sure there's a better way. For example, I don't see whether (or how) this could be accessed reliably in the for name in columns: loop.

Collaborator

Yeah feel free to ignore my last comment above. It's fine.

Collaborator Author

That snippet crashes the test suite. The if isinstance(source_expression, exp.Query) needs to run outside of the loop, not inside.
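
A small sketch of why the placement of the check matters in a comprehension: the iterable is evaluated before the filter runs, so a guard inside the comprehension cannot protect the attribute access. The `Query` and `Table` classes here are hypothetical stand-ins, and the assumption that the failure is an AttributeError on a source without selects is mine; the thread only says that the suggestion crashes the test suite.

```python
class Query:
    """Hypothetical source that exposes its projections as (name, quoted) pairs."""

    def __init__(self, selects):
        self.selects = selects


class Table:
    """Hypothetical source with no `selects` attribute."""


def quoted_guard_inside(source):
    # `source.selects` is evaluated first, so the isinstance filter comes too late.
    return {
        name
        for name, quoted in source.selects
        if isinstance(source, Query) and quoted
    }


def quoted_guard_outside(source):
    # Checking the type first means `selects` is only touched when it exists.
    if not isinstance(source, Query):
        return set()
    return {name for name, quoted in source.selects if quoted}


query = Query([("a", False), ("b", True)])
print(quoted_guard_outside(query), quoted_guard_outside(Table()))

try:
    quoted_guard_inside(Table())
except AttributeError as error:
    print("guard inside the comprehension failed:", error)
```

Both functions agree on a `Query` source; they differ only when the source has no projections to iterate over.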

Comment thread sqlglot/optimizer/qualify_columns.py Outdated
Comment on lines +939 to +963
- def qualify_outputs(scope_or_expression: Scope | exp.Expr) -> None:
+ def qualify_outputs(scope_or_expression: Scope | exp.Expr, dialect: Dialect | None = None) -> None:

georgesittas Jul 1, 2026
Collaborator

@fivetran-kwoodbeck let's pass TSQL() to the qualify_outputs call in qualify_derived_table_outputs. I don't think the dialect should be optional here; the normalize_identifier call in L998 should be unconditional.

Comment thread tests/test_optimizer.py Outdated
Comment on lines +2390 to +2391
"source"."name" AS "NAME",
"source"."payload" AS "PAYLOAD"

Collaborator

This is incorrect, @fivetran-kwoodbeck. The schema contains the case-sensitive "name" and "payload". The test was explicitly asserting that the projected columns, after expanding stars, retain their case sensitivity.

Observe:

(sqlglot) ➜  sqlglot git:(optimizer/bug-normalizealiases) ✗ runsf "create table test as select 1 as \"c\"; select * from test; select \"C\" from test"
create table test as select 1 as "c";
+----------------------------------+
| status                           |
|----------------------------------|
| Table TEST successfully created. |
+----------------------------------+

select * from test;
+---+
| c |
|---|
| 1 |
+---+

select "C" from test
╭─ Error ──────────────────────────────────────────────────────────────────────────────────────────────────╮
│ 000904 (42000): 01c56ad2-0009-d3dd-0001-b9fe095fc206: SQL compilation error: error line 1 at position 7  │
│ invalid identifier 'C'                                                                                   │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Comment thread tests/test_optimizer.py
Comment thread tests/test_optimizer.py Outdated

Comment thread sqlglot/optimizer/qualify_columns.py Outdated
Comment on lines 985 to 987
@@ -962,11 +986,16 @@ def qualify_outputs(scope_or_expression: Scope | exp.Expr) -> None:
if not selection.output_name:
selection.set("alias", exp.TableAlias(this=exp.to_identifier(f"_col_{i}")))

Collaborator

What about this branch? This will still emit a lowercase _col_i in Snowflake. Is it normalized before quote_identifiers kicks in?

Collaborator

Lol, copilot made the same comment.

Collaborator Author

You're (both) right, it's been fixed and a test was added.

@georgesittas georgesittas changed the title fix(optimizer): normalization of synthesized output aliases fix(optimizer)!: normalization of synthesized output aliases Jul 1, 2026
@fivetran-kwoodbeck fivetran-kwoodbeck force-pushed the optimizer/bug-normalizealiases branch from 478351a to eb6c093 Compare July 3, 2026 15:15
3 participants