Do the Linux glibc pthread functions on x86_64 act as fences for weakly-ordered memory accesses? (pthread_mutex_lock/unlock are the exact functions I'm interested in).
SSE2 provides some instructions with weak memory ordering, in particular non-temporal stores such as movntps. If you use these instructions and want to guarantee that another thread/core observes the stores in a particular order, then as I understand it you need an explicit fence, e.g. an sfence instruction.
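To make the case concrete, here is a minimal sketch of the kind of code I have in mind, assuming GCC or Clang on x86_64. The helper name fill_nontemporal is hypothetical; _mm_stream_ps and _mm_sfence are the intrinsics from xmmintrin.h that correspond to movntps and sfence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* _mm_set1_ps, _mm_stream_ps, _mm_sfence */

/* Hypothetical helper: fill dst[0..n) with value using non-temporal stores.
   Assumes dst is 16-byte aligned and n is a multiple of 4. */
static void fill_nontemporal(float *dst, size_t n, float value)
{
    assert(((uintptr_t)dst % 16) == 0);  /* movntps requires 16-byte alignment */
    assert(n % 4 == 0);

    __m128 v = _mm_set1_ps(value);
    for (size_t i = 0; i < n; i += 4)
        _mm_stream_ps(dst + i, v);       /* weakly-ordered store (movntps) */

    /* Explicit store fence: orders the non-temporal stores above before
       any store that follows, e.g. a flag telling another thread that
       the buffer is ready. */
    _mm_sfence();
}
```

The question is whether the _mm_sfence() call is still needed if the very next thing the thread does is call pthread_mutex_unlock.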
Normally you would expect the pthread API to act as a fence where appropriate. However, I suspect that ordinary C code on x86 never generates weakly-ordered memory accesses, so I'm not confident that the pthread implementation needs to act as a fence for them.
Reading through the glibc pthread source code, a mutex is ultimately implemented using "lock cmpxchgl", at least on the uncontended path. So I'm guessing that what I really need to know is whether that instruction acts as a fence for SSE2 weakly-ordered accesses.
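For reference, the uncontended path I am describing looks roughly like the sketch below. This is not glibc's actual code; sketch_lock, sketch_trylock and sketch_unlock are hypothetical names, and I am assuming GCC or Clang, which on x86_64 typically compile the __atomic_compare_exchange_n builtin to a lock cmpxchg.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock word: 0 = unlocked, 1 = locked. */
typedef struct { int state; } sketch_lock;

/* Try to take the lock with a single compare-and-swap (0 -> 1).
   Returns true on success, false if the lock was already held. */
static bool sketch_trylock(sketch_lock *l)
{
    int expected = 0;
    return __atomic_compare_exchange_n(&l->state, &expected, 1,
                                       false, /* strong CAS */
                                       __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

/* Release the lock. A real mutex would also wake waiters here;
   this sketch only covers the uncontended case. */
static void sketch_unlock(sketch_lock *l)
{
    assert(__atomic_load_n(&l->state, __ATOMIC_RELAXED) == 1);
    __atomic_store_n(&l->state, 0, __ATOMIC_RELEASE);
}
```

So the question comes down to whether the locked instruction behind sketch_trylock (and whatever the real unlock path uses) also orders earlier non-temporal stores.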