I've noticed that many lockless algorithms implemented using OS-specific primitives, such as the spin locks described here (which use Linux-specific atomic primitives), often make use of a "cpu relax" instruction. With GCC, this can be achieved with:
asm volatile("pause\n": : :"memory");
Specifically, this instruction is often used in the body of while-loop spin locks, while waiting for a variable to be set to a certain value.
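For illustration, here is a minimal sketch of the kind of wait loop meant here. The names cpu_relax, wait_until_set and handoff_demo are hypothetical, not part of any standard or library, and the sketch assumes GCC or Clang on x86; on any other target the helper simply compiles to nothing.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: emits PAUSE on x86 with GCC/Clang, otherwise a no-op.
inline void cpu_relax() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    asm volatile("pause" ::: "memory");
#endif
}

// Spin until another thread sets the flag, relaxing the CPU on each pass.
inline void wait_until_set(const std::atomic<bool>& flag) {
    while (!flag.load(std::memory_order_acquire)) {
        cpu_relax();
    }
}

// Hypothetical demo: one thread publishes a value, the caller spins for it.
int handoff_demo() {
    std::atomic<bool> ready{false};
    int payload = 0;

    std::thread producer([&] {
        payload = 42;                                  // plain write...
        ready.store(true, std::memory_order_release);  // ...published by the release store
    });

    wait_until_set(ready);  // acquire load pairs with the release store
    assert(ready.load(std::memory_order_relaxed));
    const int seen = payload;

    producer.join();
    return seen;
}
```

The release store paired with the acquire load is what makes the plain write to payload visible to the waiter; the pause itself plays no part in correctness.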
C++11 doesn't seem to provide any kind of portable "cpu_relax" type instruction. Is there some reason for this? And does the "pause" statement actually accomplish anything useful?
Edit:
Also, I'd ask: why did the C++11 standards committee not decide to include a generic std::cpu_relax() or whatever? Is it too difficult to guarantee portability?
The PAUSE instruction is x86 specific. Its sole use is in spin-lock wait loops, where it:
Also:
Where you put this instruction in a spin-lock loop is also x86_64 specific. I cannot speak for the C++11 standards folk, but I think it is reasonable for them to conclude that the right place for this magic is in the relevant library... along with all the other magic required to implement atomics, mutexes etc.
NB: the PAUSE does not release the processor to allow another thread to run. It is not a "low-level" pthread_yield(). (Although on Intel Hyperthreaded cores, it does prevent the spin-lock thread from hogging the core.) The essential function of the PAUSE appears to be to turn off the usual instruction execution optimisations and pipelining, which slows the thread down (a bit). Once the waiter has discovered that the lock is busy, this reduces the rate at which the lock variable is touched, so that the cache system is not being pounded by the waiter while the current owner of the lock is trying to get on with real work.
Note that the primitives being used to "hand roll" spin-locks, mutexes etc. are not OS specific, but processor-specific.
I'm not sure I would describe a "hand rolled" spin-lock as "lockless"!
FWIW, the Intel recommendation for a spin-lock ("Intel® 64 and IA-32 Architectures Optimization Reference Manual") is:
Clearly one can write something which compiles to this, using a std::atomic_flag ... or use pthread_spin_lock(), which on my machine is:
which is hard to fault, really.
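As a rough sketch of the "write it yourself" route, the following is a test-and-test-and-set spin lock: spin on a plain read with a pause, and only attempt the atomic exchange once the lock looks free. It is not a reproduction of the manual's listing or of the pthread_spin_lock() code. It uses std::atomic<bool> rather than std::atomic_flag, because atomic_flag has no read-only test before C++20. SpinLock, cpu_relax and count_with_spinlock are hypothetical names, and the pause assumes GCC or Clang on x86.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: emits PAUSE on x86 with GCC/Clang, otherwise a no-op.
inline void cpu_relax() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    asm volatile("pause" ::: "memory");
#endif
}

// Hypothetical test-and-test-and-set spin lock.
class SpinLock {
    std::atomic<bool> locked_{false};

public:
    void lock() {
        for (;;) {
            // Read-only spin: keeps the cache line shared while the lock is busy.
            while (locked_.load(std::memory_order_relaxed)) {
                cpu_relax();
            }
            // The lock looked free, so try to take it with an atomic exchange.
            // A previous value of false means this thread now owns the lock.
            if (!locked_.exchange(true, std::memory_order_acquire)) {
                return;
            }
        }
    }

    void unlock() { locked_.store(false, std::memory_order_release); }
};

// Hypothetical demo: several threads increment a plain counter under the lock.
long count_with_spinlock(int threads, int iters) {
    assert(threads > 0 && iters >= 0);
    SpinLock lock;
    long counter = 0;

    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;  // protected by the spin lock
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

Because lock() and unlock() have the standard names, the class also works with std::lock_guard<SpinLock>. Whether a compiler turns this into exactly the instruction sequence Intel recommends is something to check in the generated assembly rather than assume.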