Skip to content

feat(annotate_types): annotate ranking window functions with constant return types [CLAUDE]#7651

Closed
RichardHughes-amp wants to merge 7 commits into
tobymao:mainfrom
RichardHughes-amp:main
Closed

feat(annotate_types): annotate ranking window functions with constant return types [CLAUDE]#7651
RichardHughes-amp wants to merge 7 commits into
tobymao:mainfrom
RichardHughes-amp:main

Conversation

@RichardHughes-amp
Copy link
Copy Markdown
Contributor

@RichardHughes-amp RichardHughes-amp commented May 15, 2026

The window functions DenseRank, Ntile, Rank, and RowNumber are always BIGINT.
The window functions CumeDist and PercentRank are always DOUBLE.


I checked eight dialects - PostgreSQL, SQL Server, Databricks, Snowflake, BigQuery, Trino, ClickHouse, and Oracle - to verify this was correct. It approximately was. Three exceptions:

  • In PostgreSQL, NTILE returns integer (32-bit), not bigint (64-bit).
  • In ClickHouse, integer functions return UInt64 (unsigned), not signed bigint. There's no precedent for checking for that value in the annotate_functions.sql test, and it renders as Nullable(UInt64); I'm leery of implementing it.
  • In Trino, the CUME_DIST documentation said it returns a BIGINT. I don't think that's true. I think the docs are wrong.

I'm relatively certain these deviations from BIGINT and DOUBLE can be ignored.

RichardHughes-amp and others added 3 commits May 15, 2026 16:23
…type propagation

IGNORE NULLS and RESPECT NULLS wrappers should propagate their inner
expression's type, but the TypeAnnotator has no handler for them so
they return UNKNOWN even when the inner expression is correctly typed.

[CLAUDE]

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…wrappers

Introduce NullTreatment as a common base class for IgnoreNulls and
RespectNulls, then include it alongside Unary/Alias in the unary
annotator registration. Both wrappers carry a single `this` child and
never change the expression's type — only its null-handling behaviour —
so propagating `.this.type` is the correct semantic.

[CLAUDE]

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
… return types

RANK, DENSE_RANK, ROW_NUMBER, NTILE → BIGINT
PERCENT_RANK, CUME_DIST → DOUBLE

Registered in base EXPRESSION_METADATA so the annotation applies across
all dialects rather than requiring per-dialect overrides. [CLAUDE]
@RichardHughes-amp RichardHughes-amp marked this pull request as ready for review May 15, 2026 23:55
@geooo109
Copy link
Copy Markdown
Collaborator

Hello @RichardHughes-amp thanks for the pr.

Did you check Databricks and confirm that it returns BIGINT? Because Hive, Spark, and Databricks return INT for DENSE_RANK, NTILE, RANK, ROW_NUMBER.

@RichardHughes-amp
Copy link
Copy Markdown
Contributor Author

Did you check Databricks and confirm that it returns BIGINT? Because Hive, Spark, and Databricks return INT for DENSE_RANK, NTILE, RANK, ROW_NUMBER.

You're correct, it does return that, and I did check that. I thought that BIGINT was 'close enough'. I'll refine.

@RichardHughes-amp
Copy link
Copy Markdown
Contributor Author

RichardHughes-amp commented May 18, 2026

Replacing this with #7658.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants