Dagflow

Dagflow is a single-run DAG executor with a minimal work-stealing runtime underneath.

It’s intentionally not a general-purpose task arena like TBB — instead, it’s designed for predictable, bounded, and explicit graph execution.
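A minimal usage sketch of that model is below. The header name, the dagflow namespace, and the after() edge helper are illustrative assumptions rather than the exact API; the overall flow (TaskScope::submit, run(), and a wait() that completes through Pool::wait(handle)) is the one shown in the diagram under Architecture.

```cpp
#include "dagflow.hpp"   // assumed header name

int main() {
    dagflow::Pool pool;              // you own the pool: no global arena, predictable teardown
    dagflow::TaskScope scope{pool};

    // Build the graph up front; each submit() adds a node (TaskGraph::add_node under the hood).
    auto produce = scope.submit([] { /* load input */ });
    auto consume = scope.submit([] { /* use it */ },
                                dagflow::NodeOptions{}.after(produce));  // hypothetical edge helper

    scope.run();    // seal(): cycle check, static indegrees, prime ready tokens, dispatch to the pool
    scope.wait();   // completes once every submitted token's counter reaches zero
}
```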

Architecture

How it works:

```mermaid
sequenceDiagram
    participant User
    participant TaskScope
    participant TaskGraph
    participant Pool
    participant Worker
    participant ChaseLevDeque as Worker Deque (Chase–Lev)
    participant CentralShard as Central Shard (ring_mpmc)

    Note over TaskGraph: token == one node execution<br/>max_concurrency limits parallel workers per node

    User->>TaskScope: submit(task, options)
    TaskScope->>TaskGraph: add_node(task, NodeOptions)
    TaskGraph->>TaskGraph: store node in nodes_

    User->>TaskScope: run()
    TaskScope->>TaskGraph: run()
    TaskGraph->>TaskGraph: seal() (cycle check, skipped if no edges)
    TaskGraph->>TaskGraph: reset preds_remain + prime tokens (single pass)

    loop For each ready node token
        TaskGraph->>Pool: submit(node fun, SubmitOptions)
        alt submit called from worker thread
            Pool->>ChaseLevDeque: push_bottom(task) on that worker
            alt inflight < W/2
                Pool->>Worker: notify_worker(neighbor)
            end
        else submit called from external thread
            alt affinity set
                Pool->>CentralShard: push(task) to shard (affinity % shards)
            else no affinity
                Pool->>CentralShard: push(task) round-robin shard
            end
            Note over Pool: rate-limited notifications:<br/>cold-start OR every 64 external submits
        end
    end

    par Worker scheduling loop
        Worker->>ChaseLevDeque: pop_bottom()
        alt Got local task
            Worker->>Worker: execute task (node token)
        else No local task
            Worker->>ChaseLevDeque: steal() from other worker
            alt Steal successful
                Worker->>Worker: execute stolen task
            else Steal failed
                Worker->>CentralShard: pop()
                alt Got task from central
                    Worker->>Worker: execute task
                else No tasks
                    Worker->>Worker: backoff (sleep/yield)
                end
            end
        end
    and Node execution
        Worker->>TaskGraph: run node fn for one token
        alt fn throws
            TaskGraph->>TaskGraph: capture first exception & set cancel=true
        else fn ok
            TaskGraph->>TaskGraph: normal completion
        end
        TaskGraph->>TaskGraph: queued--, inflight-- (on finish)
        Worker->>TaskGraph: for each successor: preds_remain--
        alt successor became ready (preds_remain==0)
            TaskGraph->>TaskGraph: prime successor tokens (CAS on queued counter)
            TaskGraph->>Pool: submit(...) for successor tokens
        end
        TaskGraph->>Pool: complete_counter() (per token)
    end

    rect rgb(250,250,250)
    note over TaskGraph: Overflow policy on enqueue:<br/>Block = wait until space<br/>Drop = best-effort enqueue<br/>Fail = set error & cancel
    end

    Pool->>TaskScope: Handle::Counter reaches zero
    TaskScope->>User: wait() completes (via Pool::wait(handle))

```
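The "Node execution" branch above is, at its core, a dependency countdown: when a token finishes, the worker decrements preds_remain on every successor, and the thread that drives a count to zero primes that successor's tokens and submits them to the pool. Below is a self-contained sketch of just that step, with hypothetical Node and callback types; the real code additionally handles per-node tokens, max_concurrency, the queued-counter CAS, and exception/cancel propagation.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <vector>

struct Node {
    std::function<void()> fn;             // the node's work for one token
    std::vector<std::size_t> successors;  // indices of dependent nodes
    std::atomic<int> preds_remain{0};     // reset from the static indegrees at run()
};

// Called by a worker after it has finished executing nodes[done] for one token.
void on_node_finished(std::vector<Node>& nodes, std::size_t done,
                      const std::function<void(std::size_t)>& submit_to_pool) {
    for (std::size_t s : nodes[done].successors) {
        // fetch_sub returns the previous value, so exactly one thread observes the 1 -> 0 edge.
        if (nodes[s].preds_remain.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            submit_to_pool(s);  // successor became ready: prime its tokens and hand them to the pool
        }
    }
}
```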

Why not TBB

This runtime is not a full general-purpose task scheduling framework like Intel oneTBB. It's a focused, lightweight, single-run DAG executor designed for cases where you build a small or medium dependency graph and run it once (or a few times), rather than keeping a long-lived task arena alive.

Main differences vs TBB-style runtimes:

  1. Single-run DAG model

    • You explicitly build a graph (TaskGraph or via the higher-level TaskScope) with known dependencies.
    • Once you call run(), the graph is sealed (cycle check, static indegrees calculated) and executed to completion.
    • No implicit dynamic spawning of tasks during execution (although each node can submit more tokens of itself via parallel_for or rescheduling).
  2. Token-based execution model

    • Each node's unit of work is a token (one execution of fn).
    • Nodes can have max_concurrency limits and back-pressure via inbox capacity + Overflow policy (Block, Drop, Fail); see the options sketch after this list.
    • This makes per-node load control and memory bounds deterministic — something TBB does not directly expose.
  3. Integrated back-pressure and overflow control

    • In TBB, tasks pile up in queues if producers outpace consumers; here you can choose to block, drop, or fail when inbox capacity is reached.
    • This is critical in real-time-ish or bounded-memory pipelines.
  4. Explicit ScheduleOptions with affinity and priorities

    • You can pin a node's tokens to a specific worker (or shard) and give them High vs Normal priority.
    • Affinity is respected even for fan-out reschedules, unlike TBB, which mostly relies on work-stealing locality heuristics.
  5. Work-stealing pool with zero-allocation hot path

    • Per-worker Chase–Lev deques for local work; central Vyukov ring-buffer shards (bounded, power-of-two capacity) for external submissions — no heap allocation per push/pop.
    • Per-worker Task freelist: pool threads recycle task objects without hitting the allocator on the hot path.
    • Notification rate-limiting: workers wake on cold-start and periodically (every 64 external submits), not on every submission — eliminates mutex overhead at high task rates.
  6. Allocation-free central queues

    • Central shards use a bounded Vyukov ring-buffer (ring_mpmc, capacity 16,384 by default) — fixed memory footprint, no per-operation heap traffic, no epoch-based reclamation needed; see the ring sketch after this list.
    • DAG node dispatch uses atomic CAS on a queued counter rather than a separate per-node queue, cutting allocations per DAG task to zero on the pool-thread path.
  7. Small-function and small-vector types for a zero-dependency, allocation-friendly core

    • No std::function allocations for tiny callables.
    • Inline storage for small vectors without heap hits.
  8. Minimal ABI & zero global singletons

    • You own the Pool and TaskScope. No hidden thread arenas or global TLS registries.
    • Easy to embed in game engines, simulation loops, or tools where you need predictable teardown.
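As referenced from points 2–4 above, here is how the per-node and per-schedule knobs might look at a call site. The field and enum spellings, and heavy_stage_fn, are illustrative assumptions; the concepts (max_concurrency, a bounded inbox with an Overflow policy, worker/shard affinity, High vs Normal priority) are the ones described in the list.

```cpp
// Illustrative spellings only; check the headers for the real option fields.
dagflow::NodeOptions opts;
opts.max_concurrency = 2;                          // at most 2 workers run this node's tokens in parallel
opts.inbox_capacity  = 256;                        // bounded inbox => deterministic memory footprint
opts.overflow        = dagflow::Overflow::Block;   // Block, Drop, or Fail when the inbox is full

dagflow::ScheduleOptions sched;
sched.affinity = 3;                                // pin this node's tokens to a worker/shard (affinity % shards)
sched.priority = dagflow::Priority::High;          // High vs Normal

auto node = scope.submit(heavy_stage_fn, opts, sched);
```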
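The central shards in point 6 are the classic bounded Vyukov MPMC ring: one sequence counter per cell, power-of-two capacity, and no heap traffic on push or pop. The sketch below is a generic illustration of that algorithm, not Dagflow's ring_mpmc; a failed push is exactly the point where the Overflow policy (Block, Drop, Fail) applies.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Generic Vyukov-style bounded MPMC ring: a fixed array of cells, each with its own
// sequence counter; producers and consumers only do loads, stores, and CAS, never allocate.
template <typename T>
class bounded_mpmc_ring {
public:
    explicit bounded_mpmc_ring(std::size_t capacity_pow2)   // capacity must be a power of two
        : cells_(capacity_pow2), mask_(capacity_pow2 - 1) {
        for (std::size_t i = 0; i < capacity_pow2; ++i)
            cells_[i].seq.store(i, std::memory_order_relaxed);
    }

    bool push(T v) {
        std::size_t pos = tail_.load(std::memory_order_relaxed);
        for (;;) {
            Cell& c = cells_[pos & mask_];
            std::size_t seq = c.seq.load(std::memory_order_acquire);
            auto dif = static_cast<std::intptr_t>(seq) - static_cast<std::intptr_t>(pos);
            if (dif == 0) {                       // cell is free for this position
                if (tail_.compare_exchange_weak(pos, pos + 1, std::memory_order_relaxed)) {
                    c.value = std::move(v);
                    c.seq.store(pos + 1, std::memory_order_release);
                    return true;
                }
            } else if (dif < 0) {
                return false;                     // ring full: caller applies its overflow policy
            } else {
                pos = tail_.load(std::memory_order_relaxed);
            }
        }
    }

    bool pop(T& out) {
        std::size_t pos = head_.load(std::memory_order_relaxed);
        for (;;) {
            Cell& c = cells_[pos & mask_];
            std::size_t seq = c.seq.load(std::memory_order_acquire);
            auto dif = static_cast<std::intptr_t>(seq) - static_cast<std::intptr_t>(pos + 1);
            if (dif == 0) {                       // cell holds a value for this position
                if (head_.compare_exchange_weak(pos, pos + 1, std::memory_order_relaxed)) {
                    out = std::move(c.value);
                    c.seq.store(pos + mask_ + 1, std::memory_order_release);
                    return true;
                }
            } else if (dif < 0) {
                return false;                     // ring empty
            } else {
                pos = head_.load(std::memory_order_relaxed);
            }
        }
    }

private:
    struct Cell {
        std::atomic<std::size_t> seq{0};
        T value{};
    };
    std::vector<Cell> cells_;
    std::size_t mask_;
    std::atomic<std::size_t> tail_{0};   // enqueue position
    std::atomic<std::size_t> head_{0};   // dequeue position
};
```

The only memory is the fixed cell array allocated at construction, which is what gives the central shards their fixed footprint and removes any need for epoch-based reclamation.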
