SPSC Ring Buffer (C++)

A lock-free single-producer / single-consumer (SPSC) ring buffer written in modern C++, with a benchmark that measures throughput and latency.

The idea is simple: one thread pushes items in, one thread pops them out, and neither ever takes a lock. Coordination happens purely through two atomic counters (head and tail) using acquire/release memory ordering.

Features

Lock-free Push / Pop — single items, no mutexes, no blocking.
Batch operations — PushBatch / PopBatch move whole chunks at once, which massively reduces the number of atomic operations and cache traffic. They return how many items were actually processed, so partial batches are handled safely.
Power-of-2 sizing — the buffer size must be a power of 2, so wrapping around is a cheap bitmask (index & mask) instead of a modulo.
Cache-line padding — head and tail are padded onto separate cache lines to avoid false sharing (see below).
Spin-wait, not sleep — when the buffer is full/empty, threads spin with _mm_pause() (x86) or std::this_thread::yield() instead of blocking, keeping latency low.
Safe construction — newRingBuffer(size) validates that the size is a power of 2 and returns nullptr otherwise.

Padded vs. unpadded

The header contains two versions of the same buffer:

RingBuffer — head and tail are each followed by 56 bytes of padding, so each counter sits alone on its own 64-byte cache line.
UnpaddedRingBuffer — identical logic, no padding, so both counters share one cache line.

Why it matters: the producer constantly writes tail and the consumer constantly writes head. If both live on the same cache line, every write by one thread invalidates that line in the other thread's core, even though they never touch the same variable. That's false sharing, and the CPUs end up ping-ponging the cache line back and forth.

In this benchmark the unpadded version has comparable average throughput on a single run, but its tail latency is clearly worse — p99.9 spiked to ~1245 us versus ~90–400 us for the padded version. Padding costs 112 bytes and buys you predictability.

Files

File	What it is
`ringbuffer.hpp`	The ring buffer (padded + unpadded versions)
`benchmark.cpp`	Throughput + latency benchmark

Build & Run

g++ -O2 -std=c++17 -pthread benchmark.cpp -o benchmark
./benchmark

Results

10 million uint64_t items, buffer size 4096, one producer and one consumer thread:

Mode	Throughput	p99.9 latency
Single-element Push/Pop (padded)	~70 M ops/sec	—
Batched, chunks of 128 (padded)	~800–1000 M ops/sec	~90–400 us
Batched, chunks of 128 (unpadded)	~625 M ops/sec	~1245 us

Batching is roughly 10x faster because the atomic head/tail updates are amortized over 128 items instead of paid per item.

Latency per cluster of 10,000 items (batched mode):

p50 (Median)   : ~10 us
p95 (Tail)     : ~14 us
p99 (Jitter)   : ~18 us
p99.9 (Spikes) : ~90-400 us

Buffer size matters too: a sweep from 16 up to 65536 showed tiny buffers choke throughput (~9 M ops/sec at size 16, since threads mostly wait on each other), with the sweet spot around 1024–4096.

A checksum of all popped values is verified at the end (49999995000000), which proves no items were lost, duplicated, or corrupted along the way.

Lessons learned

Two bugs found while building the benchmark, both good reminders:

Trust the return value. The batch loops originally assumed a full chunk was always pushed/popped. When the buffer returned a partial chunk, the code read uninitialized memory and produced impossible throughput numbers. Fix: only process the count the buffer actually reports.
Save your measurements. Latency percentiles all read 0 because the per-cluster durations were computed but never stored in the vector before sorting. One push_back fixed it.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
C++_code		C++_code
GO_code		GO_code
images		images
.gitignore		.gitignore
Design.md		Design.md
README.md		README.md
RingBuffer.tla		RingBuffer.tla
go.mod		go.mod

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SPSC Ring Buffer (C++)

Features

Padded vs. unpadded

Files

Build & Run

Results

Lessons learned

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

SPSC Ring Buffer (C++)

Features

Padded vs. unpadded

Files

Build & Run

Results

Lessons learned

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages