Skip to content

[flink] Expose scan.bucket for single-bucket manifest pruning#8117

Open
wwj6591812 wants to merge 1 commit into
apache:masterfrom
wwj6591812:add_scan_bucket_0604
Open

[flink] Expose scan.bucket for single-bucket manifest pruning#8117
wwj6591812 wants to merge 1 commit into
apache:masterfrom
wwj6591812:add_scan_bucket_0604

Conversation

@wwj6591812
Copy link
Copy Markdown
Contributor

Background

ReadBuilder.withBucket(int) and manifest scanning already support reading a single bucket, but Flink SQL had no connector option to expose it. Operators often need to debug or scan one bucket of a fixed-bucket primary-key table without reading all buckets.

Why this PR

Expose scan.bucket in Flink so users can run:

SELECT * FROM t /*+ OPTIONS('scan.bucket' = '0') */

and plan splits only for that bucket.

What changes

  • Add FlinkConnectorOptions.SCAN_BUCKET (scan.bucket).
  • ScanBucketUtils.applyScanBucket() reads the option and calls ReadBuilder.withBucket().
  • Wire into FlinkSourceBuilder and FlinkTableSource (batch and split inference).
  • Validate in ReadBuilderImpl.withBucket() (canonical read path): non-negative bucket id, FileStoreTable only, not postpone-bucket mode, bucket < table.bucket when table bucket > 0.

Stage optimized: scan / manifest planning — fewer manifest entries and splits before read. No change to merge or per-record logic.

Tests

  • ScanBucketUtilsTest — invalid bucket id fails fast.
  • ScanBucketITCase — SQL with scan.bucket matches reading that bucket via the table API.

Test plan

  • mvn test -pl paimon-flink/paimon-flink-common -am -Dtest=ScanBucketUtilsTest,ScanBucketITCase

@wwj6591812 wwj6591812 force-pushed the add_scan_bucket_0604 branch from 17c2722 to 7e8c5d8 Compare June 4, 2026 09:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant