vDSO (virtual dynamic shared object) is a small shared library that the Linux kernel maps into the address space of every userspace program. It is designed to speed up certain system calls by letting them run without entering the kernel. On Linux/x86_64, getcpu() can be called via the vDSO, which makes getcpu() much faster. A faster getcpu() invocation is beneficial when retrieving NUMA node information.

Benchmarking [1] on an AMD Ryzen Threadripper 2990WX 32-Core Processor:

getcpu: syscall: 103 nsec/call
getcpu: vdso: 18 nsec/call

We cannot use dlsym() to resolve the vDSO symbol "__vdso_getcpu" directly because it would cause recursive malloc calls when MI_DEBUG_FULL is enabled.

[1] https://github.com/nathanlynch/vdsotest

Co-authored-by: Chin-Hao Lo <hankluo6@gmail.com>
Signed-off-by: Jim Huang <jserv@biilabs.io>
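One way to reach "__vdso_getcpu" without dlsym() (and so without the dynamic loader allocating) is to walk the vDSO's ELF image directly, in the spirit of the kernel's tools/testing/selftests/vDSO/parse_vdso.c helper. This is a minimal sketch, not necessarily what the PR does. It assumes glibc (for getauxval() and ElfW()), a vDSO that carries a SysV DT_HASH table, and the x86_64 signature `long __vdso_getcpu(unsigned *, unsigned *, void *)`. The names vdso_lookup, init_fast_getcpu and fast_getcpu are hypothetical.

```c
#define _GNU_SOURCE
#include <elf.h>
#include <link.h>         /* ElfW() */
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>     /* getauxval(), AT_SYSINFO_EHDR */
#include <sys/syscall.h>  /* SYS_getcpu */
#include <unistd.h>       /* syscall() */

/* Assumed x86_64 vDSO signature; the third argument is unused. */
typedef long (*getcpu_fn)(unsigned *cpu, unsigned *node, void *unused);

/* Hypothetical helper: find a dynamic symbol in the vDSO image the kernel
 * mapped for this process, without dlsym(). Returns NULL if not found. */
static void *vdso_lookup(const char *name)
{
    uintptr_t base = (uintptr_t)getauxval(AT_SYSINFO_EHDR);
    if (base == 0)
        return NULL; /* no vDSO mapped */

    const ElfW(Ehdr) *ehdr = (const ElfW(Ehdr) *)base;
    const ElfW(Phdr) *phdr = (const ElfW(Phdr) *)(base + ehdr->e_phoff);
    const ElfW(Dyn) *dyn = NULL;
    uintptr_t bias = 0;
    int found_load = 0;

    for (int i = 0; i < ehdr->e_phnum; i++) {
        if (phdr[i].p_type == PT_LOAD && !found_load) {
            /* Offset between link-time addresses and the actual mapping. */
            bias = base + phdr[i].p_offset - phdr[i].p_vaddr;
            found_load = 1;
        } else if (phdr[i].p_type == PT_DYNAMIC) {
            dyn = (const ElfW(Dyn) *)(base + phdr[i].p_offset);
        }
    }
    if (!found_load || !dyn)
        return NULL;

    const ElfW(Sym) *symtab = NULL;
    const char *strtab = NULL;
    const ElfW(Word) *hash = NULL;
    for (; dyn->d_tag != DT_NULL; dyn++) {
        if (dyn->d_tag == DT_SYMTAB)
            symtab = (const ElfW(Sym) *)(bias + dyn->d_un.d_ptr);
        else if (dyn->d_tag == DT_STRTAB)
            strtab = (const char *)(bias + dyn->d_un.d_ptr);
        else if (dyn->d_tag == DT_HASH)
            hash = (const ElfW(Word) *)(bias + dyn->d_un.d_ptr);
    }
    if (!symtab || !strtab || !hash)
        return NULL;

    /* SysV hash layout: hash[0] = nbucket, hash[1] = nchain = symbol count. */
    for (ElfW(Word) i = 0; i < hash[1]; i++) {
        if (symtab[i].st_shndx == SHN_UNDEF)
            continue;
        if (strcmp(strtab + symtab[i].st_name, name) == 0)
            return (void *)(bias + symtab[i].st_value);
    }
    return NULL;
}

static getcpu_fn vdso_getcpu; /* set once by init_fast_getcpu() */

/* Resolve once, e.g. during allocator setup, before other threads start. */
static void init_fast_getcpu(void)
{
    vdso_getcpu = (getcpu_fn)vdso_lookup("__vdso_getcpu");
}

/* Use the vDSO entry when it was found, otherwise fall back to the syscall. */
static int fast_getcpu(unsigned *cpu, unsigned *node)
{
    if (vdso_getcpu)
        return (int)vdso_getcpu(cpu, node, NULL);
    return (int)syscall(SYS_getcpu, cpu, node, NULL);
}
```

Call init_fast_getcpu() once before other threads start; afterwards fast_getcpu(&cpu, &node) reports the current CPU and NUMA node, falling back to the plain system call on kernels or architectures whose vDSO exports no getcpu entry. Resolving eagerly keeps the hot path to a single pointer test and avoids racing on the cached pointer.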
@jserv noob question: why doesn't the default glibc getcpu() leverage the vDSO?
In glibc development,
Thanks for your technical excellence :)
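To check on a given machine whether the libc already takes a fast path, one could time the raw system call against glibc's sched_getcpu(). This is a rough sketch only: the function names and iteration count are hypothetical, it is not the vdsotest tool cited in the commit message, and the mechanism sched_getcpu() uses (the vDSO, the rseq area on newer glibc, or a plain syscall) depends on the glibc version and architecture.

```c
#define _GNU_SOURCE
#include <sched.h>        /* sched_getcpu() */
#include <sys/syscall.h>  /* SYS_getcpu */
#include <time.h>         /* clock_gettime() */
#include <unistd.h>       /* syscall() */

#define BENCH_ITERS 200000 /* hypothetical; raise it for steadier numbers */

static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Average nanoseconds per getcpu call that always enters the kernel. */
static double bench_syscall_getcpu(void)
{
    unsigned cpu, node;
    double start = now_ns();
    for (int i = 0; i < BENCH_ITERS; i++)
        syscall(SYS_getcpu, &cpu, &node, NULL);
    return (now_ns() - start) / BENCH_ITERS;
}

/* Average nanoseconds per sched_getcpu(), whatever mechanism glibc uses. */
static double bench_libc_sched_getcpu(void)
{
    volatile int sink = 0; /* keeps the results observable */
    double start = now_ns();
    for (int i = 0; i < BENCH_ITERS; i++)
        sink += sched_getcpu();
    (void)sink;
    return (now_ns() - start) / BENCH_ITERS;
}
```

If both functions report similar figures on a system, that would suggest the libc path there is not using a faster mechanism than the raw system call. The absolute numbers depend on the CPU and kernel, so no expected values are given.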
Did you consider the alternative of using the libc wrapper when available at compile time (as detected via say
Pros:
Cons:
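The alternative raised in this comment could look roughly like the sketch below: use the libc wrapper when the headers advertise it, otherwise issue the raw system call. The comment's detection method is cut off, so checking `__GLIBC_PREREQ(2, 29)` (glibc added the getcpu() wrapper in 2.29) is an assumption, as is the hypothetical name portable_getcpu. Whether the wrapper itself goes through the vDSO depends on the libc build and architecture.

```c
#define _GNU_SOURCE
#include <sched.h>        /* getcpu() on glibc >= 2.29 */
#include <sys/syscall.h>  /* SYS_getcpu */
#include <unistd.h>       /* syscall() */

/* __GLIBC_PREREQ is only defined by glibc, so test for it before using it. */
#if defined(__GLIBC__) && defined(__GLIBC_PREREQ)
#if __GLIBC_PREREQ(2, 29)
#define HAVE_LIBC_GETCPU 1
#endif
#endif

/* Returns 0 on success and -1 on failure, like getcpu(2). */
static int portable_getcpu(unsigned *cpu, unsigned *node)
{
#ifdef HAVE_LIBC_GETCPU
    return getcpu(cpu, node); /* libc decides how to reach the kernel */
#else
    return (int)syscall(SYS_getcpu, cpu, node, NULL);
#endif
}
```

The trade-off this illustrates: the wrapper keeps the allocator free of ELF parsing and lets libc pick the fastest path, but the fast path is then only available when building against a new enough libc.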