Does pthread_mutex_lock contain memory fence inst

Posted 2019-01-11 11:14

Do pthread_mutex_lock and pthread_mutex_unlock functions call memory fence/barrier instructions? Or do the lower-level instructions like compare_and_swap implicitly have memory barriers?

2 answers
萌系小妹纸
#2 · 2019-01-11 11:48

Please take a look at section 4.11 of the POSIX specification.

Applications shall ensure that access to any memory location by more than one thread of control (threads or processes) is restricted such that no thread of control can read or modify a memory location while another thread of control may be modifying it. Such access is restricted using functions that synchronize thread execution and also synchronize memory with respect to other threads. [emphasis mine]

Then a list of functions is given which synchronize memory, plus a few additional notes.

If that requires memory barrier instructions on some architecture, then those must be used.
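As a minimal sketch of what that guarantee buys you (the names counter_lock, shared_counter and run_mutex_counter_demo are made up for illustration): a plain, non-atomic variable is touched only between pthread_mutex_lock and pthread_mutex_unlock, and the program contains no explicit fence. pthread_create and pthread_join are also on the POSIX list of functions that synchronize memory, so the final read needs nothing extra.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4
#define INCREMENTS_PER_THREAD 100000

/* Hypothetical names, chosen only for this sketch. */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter; /* plain variable: not atomic, not volatile */

static void *mutex_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS_PER_THREAD; i++) {
        pthread_mutex_lock(&counter_lock);   /* synchronizes memory (POSIX) */
        shared_counter++;                    /* only accessed under the lock */
        pthread_mutex_unlock(&counter_lock); /* synchronizes memory (POSIX) */
    }
    return NULL;
}

/* Starts the workers, waits for them, and returns the final count. */
long run_mutex_counter_demo(void)
{
    pthread_t threads[NUM_THREADS];

    shared_counter = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, mutex_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NUM_THREADS; i++) {
        int rc = pthread_join(threads[i], NULL);
        assert(rc == 0);
        (void)rc;
    }
    /* pthread_join also synchronizes memory, so this read is safe. */
    return shared_counter;
}
```

Calling run_mutex_counter_demo() from main (built with -pthread) should return NUM_THREADS * INCREMENTS_PER_THREAD. Whether the library uses fence instructions to get there depends on the architecture.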

About compare_and_swap: that isn't in POSIX; check the documentation for whatever you are using. For instance, IBM defines a compare_and_swap function for AIX 5.3 which doesn't have full memory barrier semantics. The documentation note says:

If compare_and_swap is used as a locking primitive, insert an isync at the start of any critical sections.

From this documentation we can guess that IBM's compare_and_swap has release semantics, since the documentation does not require a barrier at the end of the critical section. The acquiring processor needs to issue an isync to make sure it is not reading stale data, but the publishing processor doesn't have to do anything.

At the instruction level, some processors have compare and swap with certain synchronizing guarantees, and some don't.
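For comparison, here is a sketch using C11's <stdatomic.h>, which is neither POSIX nor the AIX function quoted above. There the ordering of a compare-and-swap is not implicit: the caller passes it as an argument. The helper names (cas_increment, run_cas_counter_demo) are invented for this example, and memory_order_relaxed is chosen deliberately to show a CAS that is atomic but orders nothing around it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define CAS_THREADS 4
#define CAS_INCREMENTS 50000

static atomic_long cas_counter; /* hypothetical shared counter */

/* Increment *target with a CAS loop. The two memory_order arguments
 * (success, failure) are where the caller states which barrier
 * semantics the operation should carry. */
static void cas_increment(atomic_long *target)
{
    long observed = atomic_load_explicit(target, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(
               target, &observed, observed + 1,
               memory_order_relaxed,    /* on success: atomicity only */
               memory_order_relaxed)) { /* on failure: atomicity only */
        /* observed now holds the value another thread wrote; retry. */
    }
}

static void *cas_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < CAS_INCREMENTS; i++)
        cas_increment(&cas_counter);
    return NULL;
}

/* Starts the workers, waits for them, and returns the final count. */
long run_cas_counter_demo(void)
{
    pthread_t threads[CAS_THREADS];

    atomic_store(&cas_counter, 0);
    for (int i = 0; i < CAS_THREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, cas_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < CAS_THREADS; i++) {
        int rc = pthread_join(threads[i], NULL);
        assert(rc == 0);
        (void)rc;
    }
    return atomic_load(&cas_counter);
}
```

Relaxed ordering is enough for a bare counter. A CAS used to take a lock would want memory_order_acquire on success instead, which is roughly the role the isync plays in the IBM note.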

The star\"
#3 · 2019-01-11 12:06

Do pthread_mutex_lock and pthread_mutex_unlock functions call memory fence/barrier instructions?

They do, and so does thread creation.

Note, however, there are two types of memory barriers: compiler and hardware.

Compiler barriers only prevent the compiler from reordering reads and writes and speculating variable values, but don't prevent the CPU from reordering.
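A compiler-only barrier can be sketched like this, assuming GCC or Clang (the empty asm statement with a "memory" clobber is a compiler extension, not standard C). COMPILER_BARRIER, payload, ready and publish_with_compiler_barrier are names made up for the example.

```c
#include <assert.h>

/* GCC/Clang extension (an assumption of this sketch): emits no machine
 * instruction, but the compiler may not move memory accesses across it. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

static int payload; /* hypothetical data being published */
static int ready;   /* hypothetical "data is valid" flag  */

int publish_with_compiler_barrier(int value)
{
    payload = value;
    COMPILER_BARRIER(); /* the compiler keeps the payload store above
                           and the ready store below this point */
    ready = 1;

    /* Single-threaded sanity check only. Nothing here stops the CPU
     * from making the two stores visible to other cores out of order. */
    assert(payload == value && ready == 1);
    return payload + ready;
}
```

This only constrains the order of the stores in the generated code. On a weakly ordered CPU another thread could still see ready == 1 before payload is visible, which is why a hardware barrier is needed as well.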

Hardware barriers prevent the CPU from reordering reads and writes. A full memory fence is usually the slowest instruction; most of the time you only need operations with acquire and release semantics (to implement spinlocks and mutexes).
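To illustrate acquire and release without a full fence, here is a rough spinlock sketch using C11 atomic_flag. This is not how any particular pthread implementation does it; demo_spinlock and the other names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

#define SPIN_THREADS 4
#define SPIN_INCREMENTS 50000

/* Hypothetical spinlock type for this sketch. */
typedef struct {
    atomic_flag locked;
} demo_spinlock;

static demo_spinlock spin = { ATOMIC_FLAG_INIT };
static long guarded_counter; /* plain variable protected by spin */

static void demo_spin_lock(demo_spinlock *l)
{
    /* Acquire: accesses in the critical section cannot be reordered
     * before this successful test-and-set. */
    while (atomic_flag_test_and_set_explicit(&l->locked,
                                             memory_order_acquire)) {
        sched_yield(); /* be polite while another thread holds the lock */
    }
}

static void demo_spin_unlock(demo_spinlock *l)
{
    /* Release: accesses in the critical section cannot be reordered
     * after this clear. */
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

static void *spin_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < SPIN_INCREMENTS; i++) {
        demo_spin_lock(&spin);
        guarded_counter++;
        demo_spin_unlock(&spin);
    }
    return NULL;
}

/* Starts the workers, waits for them, and returns the final count. */
long run_spinlock_demo(void)
{
    pthread_t threads[SPIN_THREADS];

    guarded_counter = 0;
    for (int i = 0; i < SPIN_THREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, spin_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < SPIN_THREADS; i++) {
        int rc = pthread_join(threads[i], NULL);
        assert(rc == 0);
        (void)rc;
    }
    return guarded_counter;
}
```

The atomic operations here act as both kinds of barrier: the compiler may not reorder across them, and it emits whatever instructions the target CPU needs for acquire and release ordering.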

With multi-threading you need both barriers most of the time.

Any function whose definition is not available in this translation unit (and is not intrinsic) is a compiler memory barrier. pthread_mutex_lock, pthread_mutex_unlock, pthread_create also issue a hardware memory barrier to prevent the CPU from reordering reads and writes.

See C++ and Beyond 2012: Herb Sutter - atomic<> Weapons for more details.
