
[branch-54] Fix TopK DISTINCT aggregation preserving NULLs (#22571) #22634

Open
alamb wants to merge 1 commit into branch-54 from alamb/backport_22571

alamb commented May 30, 2026

## Which issue does this PR close?

- Closes #22554.

## Rationale for this change

TopK aggregation dropped NULL group keys for ordered DISTINCT queries.

For example, `SELECT DISTINCT v FROM t ORDER BY v ASC NULLS FIRST LIMIT
1` could return an empty string instead of NULL when TopK aggregation
was enabled.

## What changes are included in this PR?

This PR preserves NULL group keys for DISTINCT TopK aggregation by
tracking whether a NULL group key was seen separately from the heap.

The heap still only stores non-NULL values. This avoids making the TopK
heap implementations handle NULL values directly.

The stream also now marks itself done after emitting, so NULL-only
DISTINCT results are emitted once and do not repeat.
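
The approach described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `DistinctTopK`, `push`, and `emit` are hypothetical names, a `BTreeSet` stands in for the TopK heap, and the key is assumed to be a single nullable `i64` ordered `ASC NULLS FIRST`.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a DISTINCT TopK stream over one nullable
/// i64 group key, ordered ASC NULLS FIRST. Not a DataFusion type.
pub struct DistinctTopK {
    limit: usize,
    /// Smallest distinct non-NULL keys seen so far (plays the role of the heap).
    heap: BTreeSet<i64>,
    /// Tracked outside the heap so the heap never has to store NULL.
    saw_null: bool,
    /// Set once results have been emitted so they are not produced again.
    done: bool,
}

impl DistinctTopK {
    pub fn new(limit: usize) -> Self {
        Self { limit, heap: BTreeSet::new(), saw_null: false, done: false }
    }

    /// Record one group key. NULL only flips the flag; non-NULL values go
    /// into the bounded set, evicting the largest when over the limit.
    pub fn push(&mut self, key: Option<i64>) {
        match key {
            None => self.saw_null = true,
            Some(v) => {
                self.heap.insert(v);
                if self.heap.len() > self.limit {
                    self.heap.pop_last();
                }
            }
        }
    }

    /// Emit the result once: NULL first (if seen), then the non-NULL keys
    /// in ascending order, truncated to the limit. Later calls return None.
    pub fn emit(&mut self) -> Option<Vec<Option<i64>>> {
        if self.done {
            return None;
        }
        self.done = true;
        let mut out = Vec::with_capacity(self.limit);
        if self.saw_null {
            out.push(None);
        }
        out.extend(self.heap.iter().copied().map(Some));
        out.truncate(self.limit);
        Some(out)
    }
}
```

In this sketch, pushing only NULL keys and calling `emit` twice yields a single NULL row on the first call and `None` on the second, which mirrors the "emitted once and do not repeat" behavior described above.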

## Are these changes tested?

Yes

## Are there any user-facing changes?

No API Change

github-actions bot added the `core`, `sqllogictest`, and `physical-plan` labels May 30, 2026