
Update columnar, timely, and differential dependencies #36804

Open: antiguru wants to merge 6 commits into main from claude/brave-mayer-IuLPD

@antiguru · Member

Motivation

Bump the dataflow dependency family to their latest releases:

| crate | old | new |
| --- | --- | --- |
| columnar | 0.12.1 | 0.13.0 |
| timely | 0.29.0 | 0.30.0 |
| differential-dataflow | 0.23.0 | 0.24.0 |
| differential-dogs3 | 0.23.0 | 0.24.0 |

These releases carry breaking changes that ripple through the batcher/arrange machinery and the timely communication layer.

What changed

columnar 0.13: `AsBytes` now requires `SLICE_COUNT` and `get_byte_slice`, and the `chain` helper was removed. Updated the manual `AsBytes` impls for `Overflows`, `Timestamps`, and `Rows` (the `Rows` impl now relies on the default `as_bytes` built from `get_byte_slice`).
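
As a rough illustration of the reworked contract, here is a hypothetical `AsBytesSketch` trait (not columnar's real `AsBytes`; its exact signatures are assumptions) in which an implementor declares `SLICE_COUNT`, exposes each slice through `get_byte_slice`, and inherits a default `as_bytes` built from those two items, the same pattern the `Rows` impl now relies on. `RowsSketch` is a made-up container storing one-byte row lengths next to the concatenated row bytes.

```rust
/// Hypothetical stand-in for columnar's `AsBytes`; NOT the real trait or its signatures.
trait AsBytesSketch {
    /// Number of (alignment, bytes) slices this container serializes to.
    const SLICE_COUNT: usize;

    /// Returns slice `index` (which must be `< SLICE_COUNT`) as (alignment, payload).
    fn get_byte_slice(&self, index: usize) -> (u64, &[u8]);

    /// Default implementation: enumerate every slice in order via `get_byte_slice`.
    fn as_bytes(&self) -> Vec<(u64, &[u8])> {
        (0..Self::SLICE_COUNT).map(|i| self.get_byte_slice(i)).collect()
    }
}

/// Hypothetical row container: `lengths[i]` is the byte length of row `i` in `data`.
struct RowsSketch {
    lengths: Vec<u8>,
    data: Vec<u8>,
}

impl RowsSketch {
    fn new() -> Self {
        RowsSketch { lengths: Vec::new(), data: Vec::new() }
    }

    fn push(&mut self, row: &[u8]) {
        // One-byte lengths keep the sketch free of unsafe casts (rows are limited to 255 bytes).
        let len = u8::try_from(row.len()).expect("sketch only supports rows up to 255 bytes");
        self.lengths.push(len);
        self.data.extend_from_slice(row);
    }
}

impl AsBytesSketch for RowsSketch {
    const SLICE_COUNT: usize = 2;

    fn get_byte_slice(&self, index: usize) -> (u64, &[u8]) {
        match index {
            0 => (1, self.lengths.as_slice()),
            1 => (1, self.data.as_slice()),
            _ => panic!("slice index {index} out of range (SLICE_COUNT = 2)"),
        }
    }
    // `as_bytes` is inherited from the default above.
}

fn main() {
    let mut rows = RowsSketch::new();
    rows.push(b"ab");
    rows.push(b"cde");
    let slices = rows.as_bytes();
    assert_eq!(slices.len(), RowsSketch::SLICE_COUNT);
    assert_eq!(slices[0], (1, &[2u8, 3][..]));
    assert_eq!(slices[1], (1, &b"abcde"[..]));
}
```

Only the per-slice accessor is written by hand; adding a slice means bumping `SLICE_COUNT` and extending the `match`, while `as_bytes` stays untouched.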

differential 0.24: removes the `Batcher::Input` / `push_container` API and the `merge_batcher::container` compatibility layer (`InternalMerge`, `InternalMerger`). The batcher now consumes already-chunked input via `PushInto<Output>`, `seal` returns `(chain, description)`, and chunking moves into the operator via a separate `ContainerBuilder`/chunker. `MergeBatcher` collapses to a single `Merger` type parameter.

  • Reimplemented `ColInternalMerger` as a chunk-list `Merger` over `ColumnationStack` chunks (the old `InternalMerge`-based impl is gone).
  • Migrated the pageable `ColumnMergeBatcher` to the new `Batcher` trait.
  • Introduced an `ArrangeChunker<C>` trait (a subtrait of `Batcher`) that maps a (batcher, input-container) pair to its chunker. This lets `mz_arrange` / `consolidate_pact` recover the chunker from the batcher type, so the ~30 arrange call sites keep working without threading an extra chunker type parameter everywhere.
  • Updated `consolidate_pact`, `consolidate_and_pack`, `MergeBatcherWrapper` (temporal bucketing), the upsert-v2 source stash, and an interchange test to drive the chunker explicitly.
  • `TraceBox.trace` is now read via `trace()`; `ShutdownButton::press_on_drop` was removed and replaced by a small local press-on-drop guard.
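
A simplified model of the new division of labor, using hypothetical `ChunkerSketch` / `BatcherSketch` types rather than differential's real `ContainerBuilder`, `PushInto`, or `Batcher` traits: the operator-side chunker sorts and consolidates raw `(key, time, diff)` updates into chunks, the batcher only accepts those chunks, and `seal` hands back the updates strictly before `upper` together with a `[lower, upper)` description. Unlike a real `Merger`, this sketch does not merge chunks into a single sorted chain.

```rust
// Hypothetical types: they model the *shape* of the differential 0.24 split
// (chunking in the operator, batcher consumes chunks, `seal` returns chain + description),
// not differential's real traits or signatures.

/// One update: (key, time, diff).
type Update = (u64, u64, i64);

/// Operator-side chunker: sorts and consolidates raw input, then cuts it into chunks.
struct ChunkerSketch {
    chunk_size: usize, // must be > 0
}

impl ChunkerSketch {
    fn chunk(&self, mut input: Vec<Update>) -> Vec<Vec<Update>> {
        // Sort by (key, time) so updates that may cancel are adjacent.
        input.sort_unstable_by_key(|&(k, t, _)| (k, t));
        // Sum the diffs of equal (key, time) pairs.
        let mut consolidated: Vec<Update> = Vec::with_capacity(input.len());
        for (k, t, d) in input {
            if let Some(last) = consolidated.last_mut() {
                if last.0 == k && last.1 == t {
                    last.2 += d;
                    continue;
                }
            }
            consolidated.push((k, t, d));
        }
        // Drop updates whose diffs cancelled out.
        consolidated.retain(|&(_, _, d)| d != 0);
        consolidated.chunks(self.chunk_size).map(|c| c.to_vec()).collect()
    }
}

/// Stand-in for a batch description: the half-open time interval [lower, upper).
#[derive(Debug, PartialEq)]
struct DescriptionSketch {
    lower: u64,
    upper: u64,
}

/// Batcher that only ever sees pre-chunked input.
struct BatcherSketch {
    chunks: Vec<Vec<Update>>,
    lower: u64,
}

impl BatcherSketch {
    fn push_into(&mut self, chunk: Vec<Update>) {
        self.chunks.push(chunk);
    }

    /// Extract updates with `time < upper`; keep the rest for a later seal.
    fn seal(&mut self, upper: u64) -> (Vec<Vec<Update>>, DescriptionSketch) {
        let mut ready = Vec::new();
        let mut kept = Vec::new();
        for chunk in self.chunks.drain(..) {
            let (now, later): (Vec<Update>, Vec<Update>) =
                chunk.into_iter().partition(|&(_, t, _)| t < upper);
            if !now.is_empty() {
                ready.push(now);
            }
            if !later.is_empty() {
                kept.push(later);
            }
        }
        self.chunks = kept;
        let description = DescriptionSketch { lower: self.lower, upper };
        self.lower = upper;
        (ready, description)
    }
}

fn main() {
    let chunker = ChunkerSketch { chunk_size: 2 };
    let mut batcher = BatcherSketch { chunks: Vec::new(), lower: 0 };
    let input = vec![(1, 5, 1), (0, 3, 1), (1, 5, 1), (2, 7, -1), (2, 7, 1), (3, 9, 1)];
    for chunk in chunker.chunk(input) {
        batcher.push_into(chunk);
    }
    let (chain, description) = batcher.seal(6);
    assert_eq!(chain, vec![vec![(0, 3, 1), (1, 5, 2)]]);
    assert_eq!(description, DescriptionSketch { lower: 0, upper: 6 });
    let (chain, description) = batcher.seal(10);
    assert_eq!(chain, vec![vec![(3, 9, 1)]]);
    assert_eq!(description, DescriptionSketch { lower: 6, upper: 10 });
}
```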

timely 0.30: adds a spill-policy parameter to the process allocators and bundles the refill/spill/log hooks into `Hooks` for `initialize_networking_from_sockets`. `BytesRefill` now yields `Send` buffers, so `LgallocHandle` gains an `unsafe impl Send` (it exclusively owns its allocation).
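
The reasoning behind the `unsafe impl Send` can be sketched with a hypothetical `OwnedBufferHandle` that uses a leaked `Box<[u8]>` in place of an lgalloc allocation (an assumption for illustration only). Holding a raw pointer makes the type `!Send` by default; because the handle is the sole owner of the memory, moving it to another thread is as sound as moving the `Box` itself, which is what the `SAFETY` comment asserts.

```rust
use std::thread;

/// Hypothetical handle that exclusively owns a heap allocation through a raw pointer.
/// Raw pointers are `!Send`, so without the impl below this type cannot cross threads.
struct OwnedBufferHandle {
    ptr: *mut u8,
    len: usize,
}

impl OwnedBufferHandle {
    fn new(len: usize) -> Self {
        // A leaked boxed slice stands in for an lgalloc-backed region.
        let boxed: Box<[u8]> = vec![0u8; len].into_boxed_slice();
        let ptr = Box::into_raw(boxed) as *mut u8;
        OwnedBufferHandle { ptr, len }
    }

    fn as_mut_slice(&mut self) -> &mut [u8] {
        // SAFETY: `ptr` came from a live `Box<[u8]>` of length `len` owned only by this handle.
        unsafe { std::slice::from_raw_parts_mut(self.ptr, self.len) }
    }
}

impl Drop for OwnedBufferHandle {
    fn drop(&mut self) {
        // SAFETY: rebuild the exact `Box<[u8]>` leaked in `new`, exactly once.
        unsafe {
            drop(Box::from_raw(std::ptr::slice_from_raw_parts_mut(self.ptr, self.len)));
        }
    }
}

// SAFETY: the handle is the only owner of its allocation and hands out nothing but
// borrows tied to `&mut self`, so moving it to another thread transfers ownership
// wholesale, just like moving a `Box<[u8]>`.
unsafe impl Send for OwnedBufferHandle {}

fn main() {
    let mut handle = OwnedBufferHandle::new(4);
    handle.as_mut_slice()[0] = 7;
    // `thread::spawn` requires the closure (and thus `handle`) to be `Send`.
    let sum = thread::spawn(move || {
        let mut handle = handle;
        handle.as_mut_slice().iter().map(|&b| b as u32).sum::<u32>()
    })
    .join()
    .unwrap();
    assert_eq!(sum, 7);
}
```

Note the impl asserts only `Send`, not `Sync`: shared references to the handle still cannot be used from several threads at once.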

Testing

  • `cargo check --workspace --all-targets` is clean (libs, tests, benches, examples).
  • The only build failure in this environment is the pre-existing, unrelated `mz-fivetran-destination` build script, which fetches a CA bundle over the network and fails with a TLS error caused by container clock skew; it is not affected by this change.

🤖 Draft PR — opened for review of the dependency migration. CI will exercise the full test suite.

https://claude.ai/code/session_01CrNqefrHBNbfJetgjRHsmH


Generated by Claude Code

Bump the dataflow dependency family to their latest releases:

  * columnar 0.12.1 -> 0.13.0
  * timely 0.29.0 -> 0.30.0
  * differential-dataflow 0.23.0 -> 0.24.0
  * differential-dogs3 0.23.0 -> 0.24.0

These releases carry several breaking changes that ripple through the
batcher/arrange machinery and the timely communication layer:

* columnar 0.13 reworks `AsBytes` to require `SLICE_COUNT` and
  `get_byte_slice`, and removes the `chain` helper. Updated the manual
  `AsBytes` impls for `Overflows`, `Timestamps`, and `Rows`.

* differential 0.24 removes the `Batcher::Input`/`push_container` API and
  the `merge_batcher::container` compatibility layer (`InternalMerge`,
  `InternalMerger`). The `Batcher` now consumes already-chunked input via
  `PushInto<Output>`, `seal` returns `(chain, description)`, and chunking
  moves into the operator via a separate `ContainerBuilder`/chunker.
  `MergeBatcher` collapses to a single `Merger` type parameter.

  - Reimplemented `ColInternalMerger` as a chunk-list `Merger` over
    `ColumnationStack` chunks.
  - Migrated `ColumnMergeBatcher` to the new `Batcher` trait.
  - Introduced an `ArrangeChunker<C>` trait that maps a (batcher, input
    container) pair to its chunker, so `mz_arrange`/`consolidate_pact`
    call sites keep working without threading an extra chunker type
    parameter everywhere.
  - Updated `consolidate_pact`, `consolidate_and_pack`, the
    `MergeBatcherWrapper`, the upsert v2 source stash, and the
    interchange test to drive the chunker explicitly.
  - `TraceBox.trace` is now accessed via `trace()`, and
    `ShutdownButton::press_on_drop` is replaced by a local press-on-drop
    guard.

* timely 0.30 adds a spill-policy parameter to the process allocators and
  bundles refill/spill/log hooks into `Hooks` for
  `initialize_networking_from_sockets`; `BytesRefill` now yields
  `Send` buffers, so `LgallocHandle` gains an `unsafe impl Send`.
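
A minimal sketch of a local press-on-drop guard, assuming a hypothetical `ShutdownButtonSketch` that flips a shared `Rc<Cell<bool>>` flag (timely's real `ShutdownButton` internals are not reproduced here). The guard's `Drop` impl presses the button when the guard leaves scope, including during a panic unwind.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for a shutdown button: pressing it flips a shared flag.
struct ShutdownButtonSketch {
    pressed: Rc<Cell<bool>>,
}

impl ShutdownButtonSketch {
    fn press(&self) {
        self.pressed.set(true);
    }
}

/// Local guard that presses the wrapped button when it goes out of scope.
struct PressOnDrop(ShutdownButtonSketch);

impl Drop for PressOnDrop {
    fn drop(&mut self) {
        self.0.press();
    }
}

fn main() {
    let flag = Rc::new(Cell::new(false));
    let button = ShutdownButtonSketch { pressed: Rc::clone(&flag) };
    {
        let _guard = PressOnDrop(button);
        assert!(!flag.get()); // not pressed while the guard is alive
    } // `_guard` is dropped here, pressing the button
    assert!(flag.get());
}
```

Binding the guard to a named variable such as `_guard` matters: `let _ = PressOnDrop(...)` drops the guard, and presses the button, at the end of that statement.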

Comment thread src/timely-util/src/columnar/merge_batcher.rs
Comment thread src/timely-util/src/columnar/merge_batcher.rs
Comment thread src/cluster/src/communication.rs
antiguru and others added 5 commits May 29, 2026 21:02
* Fix the broken intra-doc link to `ColumnChunker` in the
  `columnar::merge_batcher` module docs (it now lives in the operator-level
  chunker, not this module), and reword the doc to match the
  differential 0.24 contract.
* Document on `ColumnMergeBatcher::push_into` / `seal` that sorting,
  consolidation, and staging-buffer draining now happen in the operator's
  chunker, answering the review questions.
* Leave the timely 0.30 communication spill policy unset with a
  `TODO(CLU-99)` to wire up the zero-copy pager.

Thread the chunker through the arrange/consolidate wrappers as an explicit
`Chu` type parameter (mirroring differential's `arrange_core`) instead of
recovering it from the batcher via the `ArrangeChunker` associated-type trait.

* Remove the `ArrangeChunker` trait and its impls from `columnar`.
* Add a `Chu` parameter to `mz_arrange`, `mz_arrange_core`, `consolidate_pact`,
  and `consolidate_and_pack`, bounded exactly as `arrange_core` requires
  (`ContainerBuilder<Container = Ba::Output> + for<'a> PushInto<&'a mut C>`).
* `consolidate_named`/`consolidate_named_if` stay single-parameter: they only
  ever consolidate `Vec` input, so they pass `ColumnationChunker` explicitly.
* Update call sites to name the chunker: `ColumnationChunker<_>` for `Vec`
  input, `batcher::Chunker<_>` for `Column` input, and `batcher::ColumnChunker<_>`
  for the paged batcher.
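
The switch from the `ArrangeChunker` associated type to an explicit `Chu` parameter can be illustrated with a hypothetical `ChunkerLike<C>` trait standing in for the `ContainerBuilder<Container = Ba::Output> + for<'a> PushInto<&'a mut C>` bound. The generic helper `consolidate_sketch` (also hypothetical) never derives the chunker from another type; each call site names it with a turbofish and leaves the input container to inference, much like `ColumnationChunker<_>` at the real call sites.

```rust
/// Hypothetical chunker contract; not timely's `ContainerBuilder`/`PushInto` traits.
trait ChunkerLike<C>: Default {
    type Output;
    /// Move the contents of `input` into the chunker, leaving `input` empty.
    fn push_into(&mut self, input: &mut C);
    /// Flush whatever is buffered as finished output chunks.
    fn finish(&mut self) -> Vec<Self::Output>;
}

/// Generic wrapper: the chunker is an explicit type parameter chosen by the caller.
fn consolidate_sketch<C, Chu: ChunkerLike<C>>(inputs: Vec<C>) -> Vec<Chu::Output> {
    let mut chunker = Chu::default();
    for mut input in inputs {
        chunker.push_into(&mut input);
    }
    chunker.finish()
}

/// Sorts by key and sums diffs, dropping keys whose diffs cancel.
#[derive(Default)]
struct ConsolidatingChunker {
    buffer: Vec<(u64, i64)>,
}

impl ChunkerLike<Vec<(u64, i64)>> for ConsolidatingChunker {
    type Output = Vec<(u64, i64)>;
    fn push_into(&mut self, input: &mut Vec<(u64, i64)>) {
        self.buffer.append(input);
    }
    fn finish(&mut self) -> Vec<Self::Output> {
        let mut buf = std::mem::take(&mut self.buffer);
        buf.sort_unstable_by_key(|&(k, _)| k);
        let mut out: Vec<(u64, i64)> = Vec::new();
        for (k, d) in buf {
            if let Some(last) = out.last_mut() {
                if last.0 == k {
                    last.1 += d;
                    continue;
                }
            }
            out.push((k, d));
        }
        out.retain(|&(_, d)| d != 0);
        if out.is_empty() { Vec::new() } else { vec![out] }
    }
}

/// Forwards updates in arrival order without consolidating.
#[derive(Default)]
struct PassThroughChunker {
    buffer: Vec<(u64, i64)>,
}

impl ChunkerLike<Vec<(u64, i64)>> for PassThroughChunker {
    type Output = Vec<(u64, i64)>;
    fn push_into(&mut self, input: &mut Vec<(u64, i64)>) {
        self.buffer.append(input);
    }
    fn finish(&mut self) -> Vec<Self::Output> {
        let buf = std::mem::take(&mut self.buffer);
        if buf.is_empty() { Vec::new() } else { vec![buf] }
    }
}

fn main() {
    let inputs: Vec<Vec<(u64, i64)>> = vec![vec![(2, 1), (1, 1)], vec![(2, -1), (1, 1)]];
    // Same input type, two chunkers, selected purely at the call site.
    let consolidated = consolidate_sketch::<_, ConsolidatingChunker>(inputs.clone());
    assert_eq!(consolidated, vec![vec![(1, 2)]]);
    let raw = consolidate_sketch::<_, PassThroughChunker>(inputs);
    assert_eq!(raw, vec![vec![(2, 1), (1, 1), (2, -1), (1, 1)]]);
}
```

The cost is a longer turbofish at each call; the benefit is that one input type can be paired with different chunkers without a trait impl tying the batcher to a specific chunker.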

* Remove the now-redundant `timely::container::ContainerBuilder` import in
  `consolidate_pact`; the explicit `Chu: ContainerBuilder` bound brings the
  trait methods into scope.
* Wrap the `mz_arrange` turbofish call sites that exceeded the 100-column
  limit onto multiple lines (rustfmt does not break method-call turbofish
  itself).
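
The first point rests on a Rust method-resolution rule: inside a generic function, methods of a trait named in a type parameter's bound can be called on values of that parameter without a `use` of the trait. The sketch uses a hypothetical `container::BuilderLike` trait as a stand-in for `timely::container::ContainerBuilder` (not its real definition) and also shows a turbofish call wrapped by hand across lines.

```rust
mod container {
    /// Hypothetical builder trait in another module; not timely's `ContainerBuilder`.
    pub trait BuilderLike: Default {
        fn push(&mut self, item: u64);
        fn finish(&mut self) -> Vec<u64>;
    }

    /// Toy builder that sums everything pushed into it.
    #[derive(Default)]
    pub struct SumBuilder {
        total: u64,
    }

    impl BuilderLike for SumBuilder {
        fn push(&mut self, item: u64) {
            self.total += item;
        }
        fn finish(&mut self) -> Vec<u64> {
            vec![std::mem::take(&mut self.total)]
        }
    }
}

// No `use container::BuilderLike;` anywhere: because `B` is bounded by the trait,
// `builder.push(..)` and `builder.finish()` resolve through that bound.
fn build_all<B: container::BuilderLike>(items: &[u64]) -> Vec<u64> {
    let mut builder = B::default();
    for &item in items {
        builder.push(item);
    }
    builder.finish()
}

fn main() {
    // Turbofish arguments wrapped by hand across lines, as done for the long
    // `mz_arrange` call sites.
    let out = build_all::<
        container::SumBuilder,
    >(&[1, 2, 3]);
    assert_eq!(out, vec![6]);
}
```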

@antiguru antiguru marked this pull request as ready for review May 30, 2026 02:35
@antiguru antiguru requested review from a team as code owners May 30, 2026 02:35
@antiguru · Member, Author

Marking as ready for review, but needs another round of reviews, and nightly tests.



2 participants