# Docs
Dagflow is a single-run DAG executor with a minimal work-stealing runtime underneath.
It’s intentionally not a general-purpose task arena like TBB — instead, it’s designed for predictable, bounded, and explicit graph execution.
How it works:
```mermaid
sequenceDiagram
    participant User
    participant TaskScope
    participant TaskGraph
    participant Pool
    participant Worker
    participant ChaseLevDeque as Worker Deque (Chase–Lev)
    participant CentralShard as Central Shard (ring_mpmc)

    Note over TaskGraph: token == one node execution<br/>max_concurrency limits parallel workers per node

    User->>TaskScope: submit(task, options)
    TaskScope->>TaskGraph: add_node(task, NodeOptions)
    TaskGraph->>TaskGraph: store node in nodes_

    User->>TaskScope: run()
    TaskScope->>TaskGraph: run()
    TaskGraph->>TaskGraph: seal() (cycle check, skipped if no edges)
    TaskGraph->>TaskGraph: reset preds_remain + prime tokens (single pass)

    loop For each ready node token
        TaskGraph->>Pool: submit(node fn, SubmitOptions)
        alt submit called from worker thread
            Pool->>ChaseLevDeque: push_bottom(task) on that worker
            alt inflight < W/2
                Pool->>Worker: notify_worker(neighbor)
            end
        else submit called from external thread
            alt affinity set
                Pool->>CentralShard: push(task) to shard (affinity % shards)
            else no affinity
                Pool->>CentralShard: push(task) round-robin shard
            end
            Note over Pool: rate-limited notifications:<br/>cold-start OR every 64 external submits
        end
    end

    par Worker scheduling loop
        Worker->>ChaseLevDeque: pop_bottom()
        alt Got local task
            Worker->>Worker: execute task (node token)
        else No local task
            Worker->>ChaseLevDeque: steal() from other worker
            alt Steal successful
                Worker->>Worker: execute stolen task
            else Steal failed
                Worker->>CentralShard: pop()
                alt Got task from central
                    Worker->>Worker: execute task
                else No tasks
                    Worker->>Worker: backoff (sleep/yield)
                end
            end
        end
    and Node execution
        Worker->>TaskGraph: run node fn for one token
        alt fn throws
            TaskGraph->>TaskGraph: capture first exception & set cancel=true
        else fn ok
            TaskGraph->>TaskGraph: normal completion
        end
        TaskGraph->>TaskGraph: queued--, inflight-- (on finish)
        Worker->>TaskGraph: for each successor: preds_remain--
        alt successor became ready (preds_remain==0)
            TaskGraph->>TaskGraph: prime successor tokens (CAS on queued counter)
            TaskGraph->>Pool: submit(...) for successor tokens
        end
        TaskGraph->>Pool: complete_counter() (per token)
    end

    rect rgb(250,250,250)
        note over TaskGraph: Overflow policy on enqueue:<br/>Block = wait until space<br/>Drop = best-effort enqueue<br/>Fail = set error & cancel
    end

    Pool->>TaskScope: Handle::Counter reaches zero
    TaskScope->>User: wait() completes (via Pool::wait(handle))
```
This runtime is not a full general-purpose task scheduling framework like Intel oneTBB. It's a focused, lightweight, single-run DAG executor designed for cases where you build a small or medium dependency graph and run it once (or a few times), rather than keeping a long-lived task arena alive.
Main differences vs TBB-style runtimes:
- **Single-run DAG model**
  - You explicitly build a graph (`TaskGraph`, or via the higher-level `TaskScope`) with known dependencies.
  - Once you call `run()`, the graph is sealed (cycle check, static indegrees calculated) and executed to completion.
  - No implicit dynamic spawning of tasks during execution (although each node can submit more tokens of itself via `parallel_for` or rescheduling).
- **Token-based execution model**
  - Each node's unit of work is a token (one execution of `fn`).
  - Nodes can have `max_concurrency` limits and back-pressure via inbox `capacity` + `Overflow` policy (`Block`, `Drop`, `Fail`).
  - This makes per-node load control and memory bounds deterministic — something TBB does not directly expose.
- **Integrated back-pressure and overflow control**
  - In TBB, tasks pile up in queues if producers outpace consumers; here you can choose to block, drop, or fail when inbox capacity is reached.
  - This is critical in real-time-ish or bounded-memory pipelines.
- **Explicit `ScheduleOptions` with affinity and priorities**
  - You can pin a node's tokens to a specific worker (or shard) and give them `High` vs `Normal` priority.
  - Affinity is respected even for fan-out reschedules, unlike TBB, which mostly relies on work-stealing locality heuristics.
- **Work-stealing pool with zero-allocation hot path**
  - Per-worker Chase–Lev deques for local work; central Vyukov ring-buffer shards (bounded, power-of-two capacity) for external submissions — no heap allocation per push/pop.
  - Per-worker `Task` freelist: pool threads recycle task objects without hitting the allocator on the hot path.
  - Notification rate-limiting: workers wake on cold start and periodically (every 64 external submits), not on every submission — this eliminates mutex overhead at high task rates.
- **Allocation-free central queues**
  - Central shards use a bounded Vyukov ring buffer (`ring_mpmc`, capacity 16384 by default) — fixed memory footprint, no per-operation heap traffic, no epoch-based reclamation needed.
  - DAG node dispatch uses an atomic CAS on a `queued` counter rather than a separate per-node queue, cutting allocations per DAG task to zero on the pool-thread path.
- **Small-function and small-vector for a zero-dependency, allocation-friendly core**
  - No `std::function` allocations for tiny callables.
  - Inline storage for small vectors without heap hits.
- **Minimal ABI & zero global singletons**
  - You own the `Pool` and `TaskScope`. No hidden thread arenas or global TLS registries.
  - Easy to embed in game engines, simulation loops, or tools where you need predictable teardown.