feat(annotate_types): annotate ranking window functions with constant return types [CLAUDE]#7658

Merged
georgesittas merged 6 commits into
tobymao:mainfrom
RichardHughes-amp:register-ranking-window-typing
May 20, 2026

@RichardHughes-amp
Contributor

@RichardHughes-amp RichardHughes-amp commented May 18, 2026

Replacing #7651.

The window functions DENSE_RANK, NTILE, RANK, and ROW_NUMBER always return an integer, and no argument influences the return type. PERCENT_RANK and CUME_DIST always return a floating-point value in [0, 1].

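Default (base dialect): BIGINT for the rank functions, DOUBLE for the distribution functions. This matches SQL Server, Trino, Presto, Athena, and DuckDB, and PostgreSQL for every function except NTILE, which returns integer there (handled by a later commit in this PR).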

Hive, Spark, Databricks, and Snowflake return INTEGER (32-bit) for the four rank functions. I verified this against the official documentation for each engine.
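
As a rough illustration of the mapping described above, here is a minimal, self-contained sketch. The names `BASE_TYPES`, `INT32_DIALECTS`, and `return_type` are hypothetical and exist only for this example; they are not sqlglot's actual data structures or API.

```python
# Hypothetical sketch of the return-type rules described in this PR.
# None of these names come from sqlglot; they only model the idea.

# Base (default) return types: BIGINT for rank functions, DOUBLE for
# distribution functions.
BASE_TYPES = {
    "DENSE_RANK": "BIGINT",
    "NTILE": "BIGINT",
    "RANK": "BIGINT",
    "ROW_NUMBER": "BIGINT",
    "CUME_DIST": "DOUBLE",
    "PERCENT_RANK": "DOUBLE",
}

RANK_FUNCTIONS = ("DENSE_RANK", "NTILE", "RANK", "ROW_NUMBER")

# Dialects that, per the description, return a 32-bit INTEGER for the
# four rank functions.
INT32_DIALECTS = {"hive", "spark", "databricks", "snowflake"}


def return_type(dialect: str, function: str) -> str:
    """Return the constant type name for a ranking window function."""
    name = function.upper()
    if dialect.lower() in INT32_DIALECTS and name in RANK_FUNCTIONS:
        return "INT"
    # Raises KeyError for functions outside the six covered here.
    return BASE_TYPES[name]
```

Because the type is constant per function, the lookup needs only the dialect and the function name; the arguments and the window specification never enter into it.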

Comment thread sqlglot/typing/__init__.py
Comment thread sqlglot/typing/hive.py
Comment thread sqlglot/typing/hive.py
RichardHughes-amp and others added 3 commits May 19, 2026 15:45
…strations [CLAUDE]

Ranking window functions (DenseRank/Ntile/Rank/RowNumber → BIGINT,
CumeDist/PercentRank → DOUBLE) are now provided by the base mapping in
sqlglot/typing/__init__.py and match BigQuery's documented return types
exactly. Removing them from the BigQuery override keeps the dialect file
focused on entries that actually diverge from the base.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…LAUDE]

PostgreSQL's NTILE returns integer while its other ranking window
functions (RANK, DENSE_RANK, ROW_NUMBER) return bigint. Empirically
verified via pg_typeof against PostgreSQL — matches the documented
behavior in https://www.postgresql.org/docs/current/functions-window.html.

Adds a new sqlglot/typing/postgres.py with the per-function override and
wires it into the Postgres dialect. Redshift is unaffected — its NTILE
returns BIGINT and it imports typing from the base, not from postgres.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per AWS Redshift docs, RANK() returns INTEGER while DENSE_RANK, NTILE,
and ROW_NUMBER return BIGINT (the base default). CUME_DIST and
PERCENT_RANK return FLOAT8 (DOUBLE, also base default).

https://docs.aws.amazon.com/redshift/latest/dg/r_WF_RANK.html
https://docs.aws.amazon.com/redshift/latest/dg/r_WF_DENSE_RANK.html

Documentation-only — not empirically verified against a Redshift cluster.
The asymmetry (RANK alone differing from the other three rank functions)
is unusual; if a future verification shows the docs are wrong, the
override is trivial to revisit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
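
The PostgreSQL and Redshift commits above each layer a single per-function override on top of the base mapping, and Redshift builds on the base rather than on Postgres. A minimal sketch of that layering, using plain dictionaries with hypothetical names (`BASE`, `POSTGRES`, `REDSHIFT`, `overrides`) rather than sqlglot's real typing modules:

```python
# Hypothetical model of per-dialect overrides; not sqlglot's actual code.

BASE = {
    "DENSE_RANK": "BIGINT",
    "NTILE": "BIGINT",
    "RANK": "BIGINT",
    "ROW_NUMBER": "BIGINT",
    "CUME_DIST": "DOUBLE",
    "PERCENT_RANK": "DOUBLE",
}

# PostgreSQL: only NTILE diverges from the base.
POSTGRES = {**BASE, "NTILE": "INT"}

# Redshift: built from BASE, not from POSTGRES, so it does not pick up
# the NTILE override. Only RANK diverges (per the AWS docs cited above).
REDSHIFT = {**BASE, "RANK": "INT"}


def overrides(dialect_types: dict) -> dict:
    """Return only the entries that differ from the base mapping."""
    return {
        name: type_name
        for name, type_name in dialect_types.items()
        if BASE[name] != type_name
    }
```

Under these assumptions, `overrides(POSTGRES)` yields only the NTILE entry and `overrides(REDSHIFT)` only the RANK entry, which mirrors the goal of keeping each dialect file limited to what actually diverges from the base.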
Collaborator

@georgesittas georgesittas left a comment

LGTM, thanks.

@georgesittas georgesittas merged commit bcc595b into tobymao:main May 20, 2026
8 checks passed
georgesittas added a commit that referenced this pull request May 20, 2026
…t return types [CLAUDE] (#7658)

* test(annotate_types): add failing tests for ranking window function return types [CLAUDE]

* feat(annotate_types): annotate ranking window functions with constant return types [CLAUDE]

* refactor(annotate_types): drop redundant BigQuery ranking window registrations [CLAUDE]

Ranking window functions (DenseRank/Ntile/Rank/RowNumber → BIGINT,
CumeDist/PercentRank → DOUBLE) are now provided by the base mapping in
sqlglot/typing/__init__.py and match BigQuery's documented return types
exactly. Removing them from the BigQuery override keeps the dialect file
focused on entries that actually diverge from the base.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(annotate_types): override PostgreSQL NTILE return type to INT [CLAUDE]

PostgreSQL's NTILE returns integer while its other ranking window
functions (RANK, DENSE_RANK, ROW_NUMBER) return bigint. Empirically
verified via pg_typeof against PostgreSQL — matches the documented
behavior in https://www.postgresql.org/docs/current/functions-window.html.

Adds a new sqlglot/typing/postgres.py with the per-function override and
wires it into the Postgres dialect. Redshift is unaffected — its NTILE
returns BIGINT and it imports typing from the base, not from postgres.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(annotate_types): override Redshift RANK return type to INT [CLAUDE]

Per AWS Redshift docs, RANK() returns INTEGER while DENSE_RANK, NTILE,
and ROW_NUMBER return BIGINT (the base default). CUME_DIST and
PERCENT_RANK return FLOAT8 (DOUBLE, also base default).

https://docs.aws.amazon.com/redshift/latest/dg/r_WF_RANK.html
https://docs.aws.amazon.com/redshift/latest/dg/r_WF_DENSE_RANK.html

Documentation-only — not empirically verified against a Redshift cluster.
The asymmetry (RANK alone differing from the other three rank functions)
is unusual; if a future verification shows the docs are wrong, the
override is trivial to revisit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Jo <46752250+georgesittas@users.noreply.github.com>
@RichardHughes-amp
Contributor Author

> LGTM, thanks.

Wonderful!
