Benchmark: auto-pin threads when CPU affinity is restricted #171

Open
HFTrader wants to merge 1 commit into efficient:master from HFTrader:fix-benchmark-cpu-affinity
@HFTrader

Summary

When the benchmark process has restricted CPU affinity (via taskset, cpuset, or container --cpuset-cpus), automatically pin each thread to its own physical core.

  • Detects restricted affinity by comparing the number of CPUs in the sched_getaffinity mask against the online CPU count
  • Reads hyper-threading (HT) sibling topology from sysfs thread_siblings_list, so two benchmark threads never share a physical core
  • Errors out with a clear message if fewer physical cores are available than --num-threads
  • When affinity is unrestricted, behavior is unchanged (no pinning)
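
The detection step in the first bullet could look roughly like the sketch below. It is not the code from this PR: the function names (`allowed_cpus`, `is_restricted`, `affinity_is_restricted`) are hypothetical, and it assumes Linux with glibc and no more than `CPU_SETSIZE` logical CPUs.

```cpp
#include <sched.h>   // sched_getaffinity, cpu_set_t, CPU_* macros (Linux, glibc)
#include <unistd.h>  // sysconf
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the CPU IDs the calling thread may run on. These are real host
// IDs, so a container restricted to CPUs 9-11 yields {9, 10, 11}.
// Assumption: the machine has at most CPU_SETSIZE logical CPUs.
std::vector<int> allowed_cpus() {
  cpu_set_t mask;
  CPU_ZERO(&mask);
  std::vector<int> cpus;
  if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
    return cpus;  // affinity unknown: the caller should skip pinning
  }
  for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
    if (CPU_ISSET(cpu, &mask)) cpus.push_back(cpu);
  }
  // Sanity check: the loop and CPU_COUNT must agree.
  assert(cpus.size() == static_cast<std::size_t>(CPU_COUNT(&mask)));
  return cpus;
}

// Pure comparison, split out so it can be checked without touching the OS.
bool is_restricted(std::size_t allowed, long online) {
  return allowed > 0 && online > 0 &&
         allowed < static_cast<std::size_t>(online);
}

// True when the affinity mask covers fewer CPUs than are online.
bool affinity_is_restricted() {
  return is_restricted(allowed_cpus().size(), sysconf(_SC_NPROCESSORS_ONLN));
}
```

Treating a failed `sched_getaffinity` call as "not restricted" keeps the unrestricted path unchanged, which matches the last bullet above.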

Motivation

Container runtimes (podman/docker) with --cpuset-cpus do not remap CPU numbers: inside a container restricted to CPUs 9-11, the kernel still uses host CPU IDs. Code that assumes numbering starts at 0 and tries to pin a thread to CPU 0 with sched_setaffinity gets EINVAL, and the failure goes unnoticed if the return value isn't checked. All threads then run unpinned, potentially timesharing a single core and producing bogus benchmark results.

Reading the actual affinity mask and pinning to those real CPU IDs avoids this issue. Usage:

# Threads auto-pin to one physical core each:
taskset -c 9-11 ./universal_benchmark --num-threads 3 --reads 100 ...

# No taskset = no pinning (current behavior preserved):
./universal_benchmark --num-threads 8 --reads 100 ...
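
To illustrate the failure mode described under Motivation, here is a minimal sketch of pinning with a checked return value. The helper names (`first_allowed_cpu`, `pin_current_thread`) are hypothetical and are not taken from this PR. The sketch assumes Linux with glibc.

```cpp
#include <sched.h>  // sched_getaffinity, sched_setaffinity (Linux, glibc)
#include <cassert>
#include <cerrno>

// Returns the lowest CPU ID in the current affinity mask, or -1 on failure.
int first_allowed_cpu() {
  cpu_set_t mask;
  CPU_ZERO(&mask);
  if (sched_getaffinity(0, sizeof(mask), &mask) != 0) return -1;
  for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
    if (CPU_ISSET(cpu, &mask)) return cpu;
  }
  return -1;
}

// Pins the calling thread to `cpu`.
// Returns 0 on success, otherwise the errno value of the failed call.
int pin_current_thread(int cpu) {
  if (cpu < 0 || cpu >= CPU_SETSIZE) return EINVAL;
  cpu_set_t mask;
  CPU_ZERO(&mask);
  CPU_SET(cpu, &mask);
  assert(CPU_COUNT(&mask) == 1);  // exactly one CPU requested
  // A pid of 0 means "the calling thread".
  if (sched_setaffinity(0, sizeof(mask), &mask) != 0) return errno;
  return 0;
}
```

A worker thread would call `pin_current_thread` with an ID taken from the affinity mask and treat any nonzero result as fatal, rather than continuing unpinned.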

Test plan

  • Unrestricted affinity: no pinning, current behavior
  • taskset -c 0-2: auto-pins 2 threads to CPUs 0, 1
  • taskset -c 0,4 (HT siblings on test machine): errors with "Need 2 physical cores but only 1 available"
  • All unit tests pass
  • Linux only; non-Linux platforms skip pinning gracefully
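
The HT-sibling case in the test plan can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: all function names are hypothetical, a core is identified by its lowest-numbered sibling, and the sibling lookup is passed in as a callable so the selection logic can be exercised without real hardware. The sysfs path and the error text are the ones named in this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Parses a kernel CPU list such as "0,4" or "0-3,8-11" into CPU IDs.
std::vector<int> parse_cpu_list(const std::string& text) {
  std::vector<int> cpus;
  std::stringstream ss(text);
  std::string item;
  while (std::getline(ss, item, ',')) {
    if (item.empty()) continue;
    std::size_t dash = item.find('-');
    int lo = std::stoi(item.substr(0, dash));
    int hi = (dash == std::string::npos) ? lo : std::stoi(item.substr(dash + 1));
    assert(lo <= hi);  // kernel CPU lists use ascending ranges
    for (int cpu = lo; cpu <= hi; ++cpu) cpus.push_back(cpu);
  }
  return cpus;
}

// Reads one CPU's sibling list from sysfs; returns "" if it is unavailable.
std::string read_siblings(int cpu) {
  std::ifstream in("/sys/devices/system/cpu/cpu" + std::to_string(cpu) +
                   "/topology/thread_siblings_list");
  std::string line;
  std::getline(in, line);
  return line;
}

// Keeps the first allowed CPU of each physical core and skips its siblings.
// `siblings_of` maps a CPU ID to its sibling-list text (e.g. read_siblings).
template <typename SiblingsFn>
std::vector<int> one_cpu_per_core(const std::vector<int>& allowed,
                                  SiblingsFn siblings_of) {
  std::set<int> used_cores;
  std::vector<int> chosen;
  for (int cpu : allowed) {
    std::vector<int> sibs = parse_cpu_list(siblings_of(cpu));
    // Assumption: the lowest sibling ID serves as the core's key.
    int core = sibs.empty() ? cpu : *std::min_element(sibs.begin(), sibs.end());
    if (used_cores.insert(core).second) chosen.push_back(cpu);
  }
  return chosen;
}

// Throws if fewer physical cores are available than threads requested.
void require_physical_cores(std::size_t needed, std::size_t available) {
  if (available < needed) {
    throw std::runtime_error("Need " + std::to_string(needed) +
                             " physical cores but only " +
                             std::to_string(available) + " available");
  }
}
```

On a real machine the caller would pass `read_siblings` as `siblings_of`. With CPUs 0 and 4 as siblings, an allowed set of {0, 4} yields a single usable core, which is the error case listed above.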

When the benchmark detects that its CPU affinity has been restricted
(e.g. via taskset, cpuset, or container --cpuset-cpus), it
automatically pins each thread to its own physical core. HT siblings
are detected via sysfs thread_siblings_list and skipped -- two
benchmark threads will never share a physical core. If there are
fewer physical cores than requested threads, the benchmark exits
with a clear error.

When affinity is unrestricted (all online CPUs available), behavior
is unchanged -- no pinning is performed.

This prevents a subtle benchmarking pitfall: containers (podman/docker)
with --cpuset-cpus do NOT remap CPU numbers. Code that assumes CPUs
start at 0 will call sched_setaffinity with invalid CPU IDs; the call
fails with EINVAL, which goes unnoticed if the return value is not
checked, and all threads are left unpinned, potentially timesharing a
single core. Reading the actual affinity mask from sched_getaffinity
and pinning to those real CPU numbers avoids this issue entirely.

Linux only; non-Linux platforms skip pinning.