[core] chain table support special partition expire. #7643

Open
Stephen0421 wants to merge 1 commit into apache:master from Stephen0421:chain_table_partition_expire

@Stephen0421
Contributor

Purpose

This PR implements partition expiration for chain tables. Chain tables store data across snapshot
and delta branches, where delta partitions depend on their nearest earlier snapshot partition as an
anchor for merge-on-read. Standard partition expiration cannot be applied directly because dropping
a snapshot partition without considering its dependent deltas would break the chain integrity.

Changes

New: Segment-based partition expiration (ChainTablePartitionExpire)

Introduces a segment-based expiration algorithm that preserves chain integrity. A segment consists
of one snapshot partition and all delta partitions whose time falls between that snapshot and the
next snapshot. The segment is the atomic unit of expiration.

Algorithm per group:

  1. Sort snapshot partitions by chain partition time.
  2. Filter to those before the cutoff (now - partition.expiration-time).
  3. If fewer than 2 snapshots fall before the cutoff, nothing is expired: a lone snapshot before
    the cutoff must be kept as the anchor for its dependent deltas.
  4. The most recent snapshot before the cutoff is the anchor (kept). All earlier snapshots form
    expirable segments together with their associated delta partitions.
  5. Orphan deltas before the earliest expired snapshot are also expired.
  6. Delta partitions are dropped before snapshot partitions so that the commit pre-check passes.
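
The steps above can be sketched for a single group as follows. This is a minimal illustration, not the code in this PR: the class and method names are hypothetical, partition times are modeled as plain `long` values, and `maxExpireNum` and the check interval are left out. It also assumes a delta whose time equals a snapshot's time belongs to that snapshot's segment, which the description does not specify.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch of segment-based expiration for one group. */
final class SegmentExpireSketch {

    /** Partitions to drop; deltas are meant to be dropped before snapshots (step 6). */
    static final class Plan {
        final List<Long> deltasToDrop;
        final List<Long> snapshotsToDrop;

        Plan(List<Long> deltasToDrop, List<Long> snapshotsToDrop) {
            this.deltasToDrop = deltasToDrop;
            this.snapshotsToDrop = snapshotsToDrop;
        }
    }

    static Plan plan(List<Long> snapshotTimes, List<Long> deltaTimes, long cutoff) {
        // Steps 1-2: sort snapshots by time and keep those strictly before the cutoff.
        TreeSet<Long> beforeCutoff = new TreeSet<>();
        for (long t : snapshotTimes) {
            if (t < cutoff) {
                beforeCutoff.add(t);
            }
        }
        // Step 3: with fewer than two candidates, nothing is expired.
        if (beforeCutoff.size() < 2) {
            return new Plan(Collections.<Long>emptyList(), Collections.<Long>emptyList());
        }
        // Step 4: the most recent snapshot before the cutoff is the anchor and is kept.
        long anchor = beforeCutoff.last();
        List<Long> snapshotsToDrop = new ArrayList<>(beforeCutoff.headSet(anchor));
        // Steps 4-5: every delta earlier than the anchor either belongs to an
        // expired segment or is an orphan before the earliest expired snapshot.
        List<Long> deltasToDrop = new ArrayList<>();
        for (long t : deltaTimes) {
            if (t < anchor) {
                deltasToDrop.add(t);
            }
        }
        deltasToDrop.sort(Comparator.naturalOrder());
        return new Plan(deltasToDrop, snapshotsToDrop);
    }
}
```

For example, with snapshots at 10, 20, 30, 50, deltas at 5, 12, 25, 35, 55, and a cutoff of 40, the snapshot at 30 is the anchor. The plan drops deltas 5, 12, 25 first and then snapshots 10 and 20; the delta at 35 stays because it depends on the anchor.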

Refactored: PartitionExpire interface extraction

Extracted PartitionExpire from a concrete class into an interface with three methods:
expire(long), isValueExpiration(), and isValueAllExpired(Collection<BinaryRow>).
The original implementation is preserved as NormalPartitionExpire. All existing consumers
(Flink/Spark procedures, actions, TableCommitImpl, ConflictDetection) use only the interface
methods, so there is no compatibility impact.
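
A rough sketch of what such an extraction looks like, to show why consumers are unaffected. Only the three method names and their parameter types come from the description above. The return types, the parameter name `commitIdentifier`, the generic type `P` (standing in for `BinaryRow` so the example compiles on its own), and every class name ending in `Sketch` are assumptions made for illustration.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the extracted interface; return types are assumed.
interface PartitionExpireSketch<P> {
    List<P> expire(long commitIdentifier);

    boolean isValueExpiration();

    boolean isValueAllExpired(Collection<P> partitions);
}

// Hypothetical consumer: it calls only interface methods, so switching between
// a normal and a chain-table implementation needs no change here.
final class ExpireConsumerSketch {
    static <P> int expireAndCount(PartitionExpireSketch<P> expire, long commitIdentifier) {
        return expire.expire(commitIdentifier).size();
    }
}

// Trivial implementation used only to exercise the consumer.
final class NothingExpiresSketch<P> implements PartitionExpireSketch<P> {
    @Override
    public List<P> expire(long commitIdentifier) {
        return Collections.emptyList();
    }

    @Override
    public boolean isValueExpiration() {
        return false;
    }

    @Override
    public boolean isValueAllExpired(Collection<P> partitions) {
        return false;
    }
}
```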

Fixed: ChainTableCommitPreCallback group partition awareness

The commit pre-callback that validates snapshot partition drops was not group-partition aware.
It used full partition comparators and triangular predicates, which could match partitions across
different groups and produce incorrect pre/next snapshot lookups. Refactored to:

  • Filter snapshot partitions to the same group before finding pre/next.
  • Use ChainPartitionProjector to extract group and chain dimensions.
  • Apply group-scoped predicates for delta partition filtering.
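
The effect of group scoping on the pre/next lookup can be illustrated with a small sketch. It does not use the ChainPartitionProjector API; instead it assumes each partition has already been split into a group key and a chain time, and all names below are hypothetical.

```java
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch of a group-scoped previous/next snapshot lookup. */
final class GroupScopedLookupSketch {

    /** A partition reduced to its group dimension and chain dimension. */
    static final class Part {
        final String group;
        final long chainTime;

        Part(String group, long chainTime) {
            this.group = group;
            this.chainTime = chainTime;
        }
    }

    /**
     * Returns {previous, next} chain times among snapshots in the target's group.
     * An entry is null when no such neighbor exists.
     */
    static Long[] neighbors(List<Part> snapshots, Part target) {
        TreeSet<Long> sameGroup = new TreeSet<>();
        for (Part p : snapshots) {
            // Restrict to the same group first, so other groups cannot
            // supply a closer but unrelated neighbor.
            if (p.group.equals(target.group)) {
                sameGroup.add(p.chainTime);
            }
        }
        return new Long[] {
            sameGroup.lower(target.chainTime), sameGroup.higher(target.chainTime)
        };
    }
}
```

With snapshots (A, 10), (B, 15), (A, 20), (B, 25), (A, 30) and target (A, 20), the group-scoped neighbors are 10 and 30. A lookup that ignored the group would pick 15 and 25 from group B instead, which is the kind of cross-group match the fix avoids.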

Tests

  • Single partition key: expire segments with correct anchor retention
  • No expiration when < 2 snapshots before cutoff
  • Multiple segment expiration
  • No expiration when no snapshots before cutoff
  • Check interval prevents premature expiration
  • maxExpireNum limits number of expired segments
  • Group partition: independent expiration per group
  • isValueAllExpired: anchor partitions not reported as expired
  • isValueAllExpired: groups with < 2 snapshots retain all
  • isValueAllExpired: cross-group mixed scenarios
  • isValueAllExpired: partitions after cutoff not expired
