Skip to content

[flink] Push down lake filters for non-partitioned scans#3169

Merged
luoyuxia merged 6 commits intoapache:mainfrom
luoyuxia:flink-fix-lake-filter-pushdown
Apr 23, 2026
Merged

[flink] Push down lake filters for non-partitioned scans#3169
luoyuxia merged 6 commits intoapache:mainfrom
luoyuxia:flink-fix-lake-filter-pushdown

Conversation

@luoyuxia
Copy link
Copy Markdown
Contributor

@luoyuxia luoyuxia commented Apr 23, 2026

Purpose

Linked issue: close #3168

Allow FlinkTableSource to push filters into LakeSource for non-partitioned table scans. Before this change, lake filter pushdown was nested under isPartitioned(), so non-partitioned scans in lake-backed FULL reads always fell back to Flink-side filtering for historical data.

Brief change log

  • keep lake filter pushdown enabled for partitioned tables and extend the same path to non-partitioned tables
  • make lake pushdown only contribute accepted filter metadata so it does not bypass Fluss-side partition pruning or record-batch filtering
  • document why pushdownLakeFilters(...) updates acceptedFilters but intentionally does not rewrite remainingFilters
  • existing union-read ITs already exercise both partitioned and non-partitioned log tables through FlinkUnionReadLogTableITCase#testReadLogTableFullType(boolean isPartitioned) in both Iceberg and Paimon modules

Tests

  • ./mvnw -pl fluss-flink/fluss-flink-common -DskipITs test
  • ./mvnw -pl fluss-flink/fluss-flink-common spotless:check -Dspotless.check.skip=false
  • Existing IT note: FlinkUnionReadLogTableITCase#testReadLogTableFullType runs with isPartitioned = false and true; the partitioned branch additionally asserts partition filter pushdown in the plan

API and Format

No API or storage format changes.

Documentation

No documentation changes.

@luoyuxia luoyuxia requested a review from Copilot April 23, 2026 06:31
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Enables LakeSource filter pushdown for non-partitioned lake-backed scans by moving lake predicate handling out of the isPartitioned() branch while keeping Fluss-side pushdowns (partition pruning / record-batch filtering) authoritative and leaving Flink as a correctness backstop (FLINK-38635).

Changes:

  • Move lake filter pushdown to run regardless of table partitioning, after Fluss-side pushdowns are computed.
  • Add pushdownLakeFilters(...) helper that records accepted lake predicates as accepted Flink expressions without rewriting remaining filters.
  • Add in-code documentation clarifying why lake pushdown only updates acceptedFilters.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +661 to +667
List<Predicate> acceptedLakePredicates = filterPushDownResult.acceptedPredicates();
for (int i = 0; i < lakePredicates.size(); i++) {
if (acceptedLakePredicates.contains(lakePredicates.get(i))
&& !acceptedFilters.contains(convertedFilters.get(i))) {
acceptedFilters.add(convertedFilters.get(i));
}
}
@beryllw
Copy link
Copy Markdown
Contributor

beryllw commented Apr 23, 2026

LGTM!

@luoyuxia luoyuxia merged commit 42d0d84 into apache:main Apr 23, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[flink] Lake filters shoud be pushed down for non-partitioned scans

3 participants