Skip to content

feat(transaction): add RewriteManifestsAction for manifest compaction#2543

Open
SreeramGarlapati wants to merge 4 commits into
apache:mainfrom
SreeramGarlapati:feat/rewrite-manifests
Open

feat(transaction): add RewriteManifestsAction for manifest compaction#2543
SreeramGarlapati wants to merge 4 commits into
apache:mainfrom
SreeramGarlapati:feat/rewrite-manifests

Conversation

@SreeramGarlapati
Copy link
Copy Markdown
Contributor

@SreeramGarlapati SreeramGarlapati commented May 31, 2026

Which issue does this PR close?

What changes are included in this PR?

Adds RewriteManifestsAction, exposed as Transaction::rewrite_manifests(), mirroring Java's BaseRewriteManifests:

  • groups live DataFile entries by partition tuple within the default partition spec
  • rolls new manifests by target_size_bytes (default 8 MiB, mirrors commit.manifest.target-size-bytes)
  • preserves sequence_number, file_sequence_number, and (v3) first_row_id via ManifestWriter::add_existing_file
  • carries forward total-* summary keys; emits manifests-created, manifests-replaced, manifests-kept, entries-processed
  • commits a Replace snapshot

Why a separate path from SnapshotProducer? The producer cannot emit a Replace snapshot today: update_snapshot_summaries (spec/snapshot_summary.rs:337-346) rejects Operation::Replace, summary keys are driven by added_data_files only (transaction/snapshot.rs:373-379), and ManifestProcess::process_manifests is sync (blocks async manifest writes). Once those land, this action will migrate onto the unified path — tracked in #2556.

Scope (Java-parity ceiling): v1/v2/v3 format versions; knob set is target_size_bytes plus inherited snapshot_properties / commit_uuid / key_metadata. Out of scope: rewrite_if predicate, cluster_by, custom spec_id / staging_location, iceberg-datafusion SQL-procedure layer. Manifests bound to non-default specs and DELETE manifests pass through verbatim.

Are these changes tested?

8 inline unit tests covering: missing current snapshot, single-small-manifest no-op, multi-manifest merge preserves sequence numbers (v2), target-size rolls multiple manifests, v3 row lineage + per-entry (first_row_id, sequence_number, file_sequence_number) preservation, summary + Replace operation, partition grouping + catalog commit, idempotency-after-consolidation. Tested on v2 and v3; the rewrite path is format-version-agnostic by construction (v1 uses the same ManifestListWriter::v1 arm).

  • cargo test -p iceberg (1304 passed, 0 failed)
  • cargo clippy -p iceberg --all-targets -- -D warnings
  • cargo fmt --check
  • make check-public-api

Mirrors Java BaseRewriteManifests semantics: groups DataFile entries by
partition tuple within a single (default) partition spec, rolls new
manifests by target size, and commits a Replace snapshot that preserves
sequence numbers, file_sequence_numbers, and (v3) row lineage via
ManifestWriter::add_existing_file.

Knob set matches Java parity ceiling exactly:
- target_size_bytes (default 8 MiB, per commit.manifest.target-size-bytes)
- snapshot_properties, commit_uuid, key_metadata via builder
@SreeramGarlapati SreeramGarlapati force-pushed the feat/rewrite-manifests branch from 23b9089 to 3f62ce2 Compare May 31, 2026 03:20
…rflow

- Read target size from `commit.manifest.target-size-bytes` table property
  (default 8 MiB), matching Java's default.
- Add Spark-action-style no-op short-circuit: skip when input would already
  fit in a single target-sized manifest (target_num_manifests == 1 && len == 1).
- Promote `generate_unique_snapshot_id` to a free fn in `transaction::snapshot`
  and fix `i64::MIN` overflow by masking with `i64::MAX` instead of negating.
- Trim builder surface (drop `key_metadata`, rename `with_*` -> `set_*`),
  drop redundant defensive invariants, ensure user snapshot_properties go in
  before internal counter keys so the latter can't be clobbered.

cargo test -p iceberg --lib (1302 tests pass), clippy clean, public-api regen.
Apply review-farm pass-2 findings:

- Fix size-rolling proxy (perf BLOCKER): switch from `data_file.file_size_in_bytes`
  (parquet bytes) to a constant `APPROX_BYTES_PER_ENTRY = 256` (manifest entry
  serialization estimate). Java rolls on `ManifestWriter.length()`; using the data
  file size was off by 2-3 orders of magnitude and made target-size meaningless.
- Concurrent manifest loads via `buffer_unordered(16)` + `try_collect` instead of
  sequential `for-await`. Big win on tables with many manifests.
- Drop full `ManifestEntry` clone — store only `(DataFile, snap_id, seq, file_seq)`
  per group to halve the allocation churn during regrouping.
- Drop unused `set_commit_uuid` public API + `commit_uuid` field; uuid is generated
  inline now.
- Drop dead `SnapshotProducer::generate_unique_snapshot_id` pass-through.
- Use `u64::try_from` (not `as u64`) for the manifest_length sum to avoid silent
  wrap on negative values.
- Doc the explicit-subset scope (no rewriteIf, no clusterBy, no specId, no branch
  override; only default-spec data manifests rewritten).
- Doc the `total-*` carry-forward invariant (rewrite-only -> identity).
- Drop `Vec::with_capacity` micro-opts where the input size is small.
- Replace `RangeFrom`-with-`.next().unwrap()` counter with a plain `u64 += 1`
  closure helper.
- Add test_partition_grouping_and_catalog_commit: verifies parent_snapshot_id,
  sequence_number = prev+1, and per-manifest partition tuple homogeneity.
- Adjust test_target_size_rolls_multiple_manifests target to match the new
  per-entry proxy.

cargo test -p iceberg --lib (1303 tests pass), clippy clean.
- trim doc-comment blocks (one-line per item); fold memory-ceiling caveat into struct doc
- inline next_writer closure; capture manifests_replaced before stream; move into stream::iter
- avoid partition-tuple clone on existing-key lookup (get-then-insert)
- lift file_io.new_output() out of FormatVersion match to remove triple clone of manifest_list_path
- inline META_ROOT_PATH constant (used twice)
- let-else for AddSnapshot pattern in test_summary_and_replace_operation
- add test_idempotent_after_consolidation (multi-manifest -> 1 -> no-op on rerun)
- extend test_v3_row_lineage_preserved to assert (sequence_number, file_sequence_number, first_row_id) per-entry
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant