fix(optimizer)!: normalization of synthesized output aliases#7816
fix(optimizer)!: normalization of synthesized output aliases#7816fivetran-kwoodbeck wants to merge 13 commits into
Conversation
|
@fivetran-kwoodbeck can you remind me what the motivation behind this was? Was this a bug, like changing the semantics of the output compared to the input? Or is it just an aesthetic/consistency improvement? |
@georgesittas The motivation is to ensure the optimizer both idempotent and to make sure the generated aliases are consistent with each dialect's |
99348c9 to
9654789
Compare
This comment was marked as outdated.
This comment was marked as outdated.
| quoted_columns = ( | ||
| {s.output_name: _output_identifier_quoted(s) for s in source_expression.selects} | ||
| if isinstance(source_expression, exp.Query) | ||
| else {} | ||
| ) |
There was a problem hiding this comment.
Nit: I'd make this a set.
| quoted_columns = ( | |
| {s.output_name: _output_identifier_quoted(s) for s in source_expression.selects} | |
| if isinstance(source_expression, exp.Query) | |
| else {} | |
| ) | |
| quoted_columns = { | |
| s.output_name | |
| for s in source_expression.selects | |
| if isinstance(source_expression, exp.Query) and _output_identifier_quoted(s) | |
| } |
There was a problem hiding this comment.
By the way, is this costly to construct within the for table in tables loop instead of once? Doesn't this repeat work? Is there a better way to implement this?
There was a problem hiding this comment.
I'm not sure there's a better way, for example, I don't see if (or how) to reliably access this in the for name in columns: loop.
There was a problem hiding this comment.
Yeah feel free to ignore my last comment above. It's fine.
There was a problem hiding this comment.
That snippet crashes the test suite. The if isinstance(source_expression, exp.Query) needs to run outside of the loop, not inside.
| def qualify_outputs(scope_or_expression: Scope | exp.Expr) -> None: | ||
| def qualify_outputs(scope_or_expression: Scope | exp.Expr, dialect: Dialect | None = None) -> None: |
There was a problem hiding this comment.
@fivetran-kwoodbeck let's pass TSQL() to the qualify_outputs call in qualify_derived_table_outputs. I don't think the dialect should be optional here; the normalize_identifier call in L998 should be unconditional.
| "source"."name" AS "NAME", | ||
| "source"."payload" AS "PAYLOAD" |
There was a problem hiding this comment.
This is incorrect @fivetran-kwoodbeck. The schema contains the case-sensitive "name" and "payload". The test was explicitly asserting that the projected columns, after expanding stars, retains the case-sensitivity.
Observe:
(sqlglot) ➜ sqlglot git:(optimizer/bug-normalizealiases) ✗ runsf "create table test as select 1 as \"c\"; select * from test; select \"C\" from test"
create table test as select 1 as "c";
+----------------------------------+
| status |
|----------------------------------|
| Table TEST successfully created. |
+----------------------------------+
select * from test;
+---+
| c |
|---|
| 1 |
+---+
select "C" from test
╭─ Error ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ 000904 (42000): 01c56ad2-0009-d3dd-0001-b9fe095fc206: SQL compilation error: error line 1 at position 7 │
│ invalid identifier 'C' │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
| @@ -962,11 +986,16 @@ def qualify_outputs(scope_or_expression: Scope | exp.Expr) -> None: | |||
| if not selection.output_name: | |||
| selection.set("alias", exp.TableAlias(this=exp.to_identifier(f"_col_{i}"))) | |||
There was a problem hiding this comment.
What about this branch? This will still emit a lowercase _col_i in Snowflake. Is it normalized before quote_identifiers kicks in?
There was a problem hiding this comment.
You're (both) right, it's been fixed and a test was added.
478351a to
eb6c093
Compare
When the
optimizergenerates an output alias for an unaliased projection (inqualify_outputs), the synthesized alias does not conform to the dialect'sNORMALIZATION_STRATEGYThe patch calls
normalize_identifierto ensure the alias follow the dialect's casing rules (e.g. upper-folding in Snowflake). An update to star expansion (_expand_stars) was needed so that columns expanded from quoted source projections carry thequotedflag, keeping star output consistent with an explicit reference to the same column.Quoted columns keep their exact spelling, since folding them would change which column they resolve to.
Example: Snowflake (upper-folding)
Before
After