Intra-process progress exchange medium #807
Draft
frankmcsherry wants to merge 6 commits into
…ll-reduce
A Chain<T> is a single-writer, multi-reader chain of atoms intended to eventually replace the Progcaster's intra-process leg; a Mesh<T> bundles one chain per writer, with readers that sweep all chains. Atoms merge through T: Chainable, a commutative monoid, and may be compacted (merged with adjacent atoms, never split) before a reader folds them. Each reader folds every atom sent after its registration exactly once; live state is bounded by O(#readers) independent of the send count.
Design highlights:
- Forward links; node payload (value, next) behind one RwLock so walkers snapshot consistently and compaction mutates atomically; per-node atomic holders counts with RAII pins.
- Writer fast path merges in place into an unpinned newest node (zero allocation in the common case), re-checking holders under the payload write lock to exclude concurrent pinning.
- Readers walk pin..=newest hand-over-hand, folding forward; recv_with hands individual atoms to the caller.
- Compaction absorbs a successor only when both nodes are unpinned and the successor is not newest; bypassing pinned nodes is unsound (their frozen next pointers are side entrances that later absorptions would move values behind). Boundedness instead comes from an oldest pointer and a full sweep at every recv and reader drop; the prefix behind all pins is reclaimed by Arc refcounts.
- Lock order: oldest mutex, then newest mutex, then payload locks older before newer.
Tests cover single/multiple/late readers, repeated and empty recvs, laggard compaction bounds, reader-drop healing, the in-place fast path (chain length exactly 2 after catch-up plus N sends), mesh delivery, and a randomized multi-threaded stress test asserting totals and per-chain length <= #readers + 2 at quiescent checkpoints.
See chain-design-notes.md for the contract, the compaction safety argument, and deviations from the original sketch.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
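To make the "commutative monoid" requirement concrete, here is a minimal sketch of why compaction by merging adjacent atoms cannot change what a reader folds. The `Chainable` trait shape, the `Counts` atom type, and the `fold`, `compact_pairs`, and `atom` helpers are all assumptions for illustration; the PR's actual trait and types may differ.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the PR's `Chainable` bound: a commutative monoid
/// whose identity is `Default::default()`.
pub trait Chainable: Default {
    /// Merges `other` into `self`; assumed associative and commutative.
    fn merge(&mut self, other: Self);
}

/// Illustrative atom: net counts per timestamp, dropping entries that reach zero.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Counts(pub BTreeMap<u64, i64>);

impl Chainable for Counts {
    fn merge(&mut self, other: Self) {
        for (time, diff) in other.0 {
            let sum = self.0.get(&time).copied().unwrap_or(0) + diff;
            if sum == 0 {
                self.0.remove(&time);
            } else {
                self.0.insert(time, sum);
            }
        }
    }
}

/// Folds a sequence of atoms into one value, as a reader would.
pub fn fold<T: Chainable>(atoms: Vec<T>) -> T {
    let mut acc = T::default();
    for atom in atoms {
        acc.merge(atom);
    }
    acc
}

/// Compacts by merging each atom with its right neighbor; atoms are never split.
pub fn compact_pairs<T: Chainable>(atoms: Vec<T>) -> Vec<T> {
    let mut out = Vec::new();
    let mut iter = atoms.into_iter();
    while let Some(mut first) = iter.next() {
        if let Some(second) = iter.next() {
            first.merge(second);
        }
        out.push(first);
    }
    out
}

/// Convenience constructor for the demo.
pub fn atom(pairs: &[(u64, i64)]) -> Counts {
    Counts(pairs.iter().copied().collect())
}

fn main() {
    let atoms = vec![
        atom(&[(1, 1)]),
        atom(&[(1, -1), (2, 1)]),
        atom(&[(2, -1), (3, 1)]),
    ];
    let direct = fold(atoms.clone());
    let compacted = compact_pairs(atoms);
    // Fewer atoms after compaction, but the same folded result.
    assert_eq!(compacted.len(), 2);
    assert_eq!(fold(compacted), direct);
    println!("folded: {:?}", direct);
}
```

Because merging is associative, a reader that folds the compacted sequence sees the same total as one that folded the original atoms one by one.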
Per-writer chains fail the primary goal of cross-writer cancellation:
two workers sending {(T,+1)} and {(T,-1)} at distinct increasing T build
internally-incompressible per-chain content whose union cancels to
nothing, leaving laggards O(elapsed) fold work. A single shared chain
merges concurrent sends at the head, so accumulated nodes hold the net.
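The pathology described above can be reproduced with plain maps. The sketch below is a simplified model, not the PR's code: it assumes fully compacted accumulators with no reader pins, and the `merge` and `backlog_sizes` names are hypothetical.

```rust
use std::collections::BTreeMap;

type Counts = BTreeMap<u64, i64>;

/// Adds `diff` at `time`, removing the entry if the net count reaches zero.
fn merge(into: &mut Counts, time: u64, diff: i64) {
    let sum = into.get(&time).copied().unwrap_or(0) + diff;
    if sum == 0 {
        into.remove(&time);
    } else {
        into.insert(time, sum);
    }
}

/// Returns (entries held across two per-writer accumulators,
/// entries held by one shared accumulator) after `rounds` rounds in which
/// writer A sends (t, +1) and writer B sends (t, -1) at increasing t.
pub fn backlog_sizes(rounds: u64) -> (usize, usize) {
    let mut writer_a = Counts::new();
    let mut writer_b = Counts::new();
    let mut shared = Counts::new();
    for t in 0..rounds {
        // Per-writer: each accumulator only sees distinct times, so nothing cancels.
        merge(&mut writer_a, t, 1);
        merge(&mut writer_b, t, -1);
        // Shared: the +1 and -1 at the same time meet and cancel.
        merge(&mut shared, t, 1);
        merge(&mut shared, t, -1);
    }
    (writer_a.len() + writer_b.len(), shared.len())
}

fn main() {
    let (per_writer, shared) = backlog_sizes(1000);
    println!("per-writer entries: {}, shared entries: {}", per_writer, shared);
}
```

In this model the per-writer state grows linearly with the number of rounds while the shared state stays empty, which is the gap a laggard reader would have to fold through.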
- Chain<T> is now a cloneable multi-writer handle; send(&self, value)
locks the newest mutex (serializing writers), then the newest node's
payload write lock (consistent with the documented lock ordering), and
merges in place when holders == 0, allocating only when a reader has
just caught up and pinned the head.
- Mesh<T> retained as a per-writer comparison structure, documented as
benchmark-only with its known cancellation pathology.
- Tests: suite adapted to the new API; added multi-writer tests
(sequential two-handle, cross-writer cancellation state bound,
concurrent writers with one and many readers); stress test now shares
one chain across writer threads. 15 chain tests pass; clippy clean.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Three scenarios (all keep up; laggard; unread backlog) across N workers and cancellation fractions, with results and analysis in chain-bench-results.md. Headline: the chain bounds backlog state and laggard work by the live cancellation window (orders of magnitude below the MPSC baseline, flat in N), and pays 2-7x on the tight-loop send path where the head mutex contends. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
A per-send version counter enables cheap change detection: recv returns without taking any chain lock when no atom has been committed since the reader last caught up, and the compaction sweep runs only after a productive walk (when pins actually moved). This keeps spinning readers off the chain's shared locks. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
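The change-detection idea can be illustrated with a small model. The sketch below assumes an append-only log behind a single mutex as a stand-in for the chain, and the names `Shared`, `Reader`, `recv_locks`, and `cursor` are hypothetical. It covers only the lock-free early return, not the deferred compaction sweep.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

pub struct Shared {
    version: AtomicU64,      // bumped once per committed send
    log: Mutex<Vec<i64>>,    // stand-in for the chain and its locks
    recv_locks: AtomicUsize, // instrumentation: lock acquisitions made by recv
}

impl Shared {
    pub fn new() -> Arc<Shared> {
        Arc::new(Shared {
            version: AtomicU64::new(0),
            log: Mutex::new(Vec::new()),
            recv_locks: AtomicUsize::new(0),
        })
    }

    pub fn send(&self, value: i64) {
        let mut log = self.log.lock().unwrap();
        log.push(value);
        // Publish the new version while still holding the lock, so a version
        // read under the lock always matches the log contents.
        self.version.fetch_add(1, Ordering::Release);
    }

    pub fn recv_locks(&self) -> usize {
        self.recv_locks.load(Ordering::Relaxed)
    }
}

pub struct Reader {
    shared: Arc<Shared>,
    seen: u64,     // version at which this reader last caught up
    cursor: usize, // index of the first unread log entry
}

impl Reader {
    /// Registers at the current end of the log.
    pub fn new(shared: Arc<Shared>) -> Reader {
        let (cursor, seen) = {
            let log = shared.log.lock().unwrap();
            let len = log.len();
            let version = shared.version.load(Ordering::Acquire);
            (len, version)
        };
        Reader { shared, seen, cursor }
    }

    /// Returns the sum of unread values, or None if nothing was committed.
    pub fn recv(&mut self) -> Option<i64> {
        // Cheap check: no lock is taken when the version has not moved.
        if self.shared.version.load(Ordering::Acquire) == self.seen {
            return None;
        }
        self.shared.recv_locks.fetch_add(1, Ordering::Relaxed);
        let log = self.shared.log.lock().unwrap();
        let sum: i64 = log[self.cursor..].iter().sum();
        self.cursor = log.len();
        self.seen = self.shared.version.load(Ordering::Acquire);
        Some(sum)
    }
}

fn main() {
    let shared = Shared::new();
    let mut reader = Reader::new(Arc::clone(&shared));
    shared.send(2);
    shared.send(3);
    assert_eq!(reader.recv(), Some(5));
    // A spinning reader with nothing new never touches the lock.
    for _ in 0..1000 {
        assert_eq!(reader.recv(), None);
    }
    assert_eq!(shared.recv_locks(), 1);
}
```

A spinning reader in this model costs one atomic load per poll, which is the property the commit relies on to keep idle readers off the shared locks.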
…ing-key axis Two more points on the merge-eagerness spectrum (ledger: total merge via a shared consolidated map; cells: per-reader in-place accumulators) and a key-type axis (u64 vs Box<[u64;3]>) modeling allocating timestamps, whose clone/drop traffic is the scaling pathology the chain targets. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Synthetic three-scenario sweep plus an allocating-key matrix and a record of real-workload measurements from a separate experimental Progcaster wiring (not in this branch). Bottom line: the chain wins laggard work and backlog memory decisively, loses single-socket throughput, and allocation narrows but does not close that gap at 10 cores. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
An experimental, but not yet better, progress-tracking medium for multiple worker threads. The design is a concurrent data structure that supports more in-place aggregation and favors laggards: the writers committing updates consolidate them as they go, leaving behind less of a mess than MPSCs do.