Skip to content

Mathdee/Lock-Free-RingBuffer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SPSC Ring Buffer (C++)

A lock-free single-producer / single-consumer (SPSC) ring buffer written in modern C++, with a benchmark that measures throughput and latency.

The idea is simple: one thread pushes items in, one thread pops them out, and neither ever takes a lock. Coordination happens purely through two atomic counters (head and tail) using acquire/release memory ordering.

Features

  • Lock-free Push / Pop — single items, no mutexes, no blocking.
  • Batch operationsPushBatch / PopBatch move whole chunks at once, which massively reduces the number of atomic operations and cache traffic. They return how many items were actually processed, so partial batches are handled safely.
  • Power-of-2 sizing — the buffer size must be a power of 2, so wrapping around is a cheap bitmask (index & mask) instead of a modulo.
  • Cache-line paddinghead and tail are padded onto separate cache lines to avoid false sharing (see below).
  • Spin-wait, not sleep — when the buffer is full/empty, threads spin with _mm_pause() (x86) or std::this_thread::yield() instead of blocking, keeping latency low.
  • Safe constructionnewRingBuffer(size) validates that the size is a power of 2 and returns nullptr otherwise.

Padded vs. unpadded

The header contains two versions of the same buffer:

  • RingBufferhead and tail are each followed by 56 bytes of padding, so each counter sits alone on its own 64-byte cache line.
  • UnpaddedRingBuffer — identical logic, no padding, so both counters share one cache line.

Why it matters: the producer constantly writes tail and the consumer constantly writes head. If both live on the same cache line, every write by one thread invalidates that line in the other thread's core, even though they never touch the same variable. That's false sharing, and the CPUs end up ping-ponging the cache line back and forth.

In this benchmark the unpadded version has comparable average throughput on a single run, but its tail latency is clearly worse — p99.9 spiked to ~1245 us versus ~90–400 us for the padded version. Padding costs 112 bytes and buys you predictability.

Files

File What it is
ringbuffer.hpp The ring buffer (padded + unpadded versions)
benchmark.cpp Throughput + latency benchmark

Build & Run

g++ -O2 -std=c++17 -pthread benchmark.cpp -o benchmark
./benchmark

Results

10 million uint64_t items, buffer size 4096, one producer and one consumer thread:

Mode Throughput p99.9 latency
Single-element Push/Pop (padded) ~70 M ops/sec
Batched, chunks of 128 (padded) ~800–1000 M ops/sec ~90–400 us
Batched, chunks of 128 (unpadded) ~625 M ops/sec ~1245 us

Batching is roughly 10x faster because the atomic head/tail updates are amortized over 128 items instead of paid per item.

Latency per cluster of 10,000 items (batched mode):

p50 (Median)   : ~10 us
p95 (Tail)     : ~14 us
p99 (Jitter)   : ~18 us
p99.9 (Spikes) : ~90-400 us

Buffer size matters too: a sweep from 16 up to 65536 showed tiny buffers choke throughput (~9 M ops/sec at size 16, since threads mostly wait on each other), with the sweet spot around 1024–4096.

A checksum of all popped values is verified at the end (49999995000000), which proves no items were lost, duplicated, or corrupted along the way.

Lessons learned

Two bugs found while building the benchmark, both good reminders:

  1. Trust the return value. The batch loops originally assumed a full chunk was always pushed/popped. When the buffer returned a partial chunk, the code read uninitialized memory and produced impossible throughput numbers. Fix: only process the count the buffer actually reports.
  2. Save your measurements. Latency percentiles all read 0 because the per-cluster durations were computed but never stored in the vector before sorting. One push_back fixed it.

About

A formally verified (TLA+), hardware-optimized SPSC ring buffer in C++ and GO

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors