I'm interested in how a sequentially consistent load is implemented on x86. Judging by the assembly listing generated by the compiler, it is implemented as a plain load. However, as far as I know, plain loads on x86 are only guaranteed to have acquire semantics, and plain stores are only guaranteed to have release semantics. So a sequentially consistent store is implemented as a locked xchg, while the load is just a plain load. That seems strange to me; could you please explain this in detail?
Added:
I just found a claim online that a sequentially consistent atomic load can be done as a simple mov as long as the store is done with a locked xchg, but there was no proof and no link to documentation. Do you know where I can read about that?
Thanks in advance.
A plain MOV on x86 is sufficient for an atomic sequentially consistent load, as long as SC stores are done with LOCKed instructions, the value is correctly aligned, and "normal" WB cache mode is used.
See my blog post at http://www.justsoftwaresolutions.co.uk/threading/intel-memory-ordering-and-c++-memory-model.html for the full mapping, and the Intel processor docs at http://developer.intel.com/products/processor/manuals/index.htm for the details of the allowed orderings.
If you use "WC" cache mode or "non-temporal" instructions such as MOVNTI then all bets are off, as the processor doesn't necessarily write the data back to main memory in a timely manner.
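To make the guarantee concrete, here is a minimal C++ sketch of the classic store-buffering test, assuming a C++11 compiler with std::thread available. The function name count_both_zero is made up for illustration, and the instruction mapping in the comments is what mainstream x86 compilers typically emit, not something the code itself checks.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs the store-buffering test `iterations` times and
// returns how many runs observed r1 == 0 && r2 == 0.
// With seq_cst on all four operations, the C++ memory model forbids that
// outcome, so the expected result is 0.
int count_both_zero(int iterations) {
    assert(iterations >= 0);
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;
        std::thread t1([&] {
            x.store(1, std::memory_order_seq_cst);   // typically XCHG (or MOV + MFENCE) on x86
            r1 = y.load(std::memory_order_seq_cst);  // typically a plain MOV on x86
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_seq_cst);
        });
        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

If both stores are weakened to memory_order_release (plain MOV stores on x86), the both-zero outcome becomes permitted, because a store can still sit in the store buffer when the following load executes. That is why the cost of sequential consistency is paid on the store side and the load can stay a plain MOV.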
Register-to-memory transfers and vice versa are, as far as I know, not atomic in a multiprocessor environment.
READING
XOR EAX, EAX
LOCK XADD [address], EAX
The first instruction zeroes the EAX register. The second exchanges the contents of EAX and [address] and then stores the sum of both back to [address]. Since EAX was zero beforehand, the value in memory is unchanged.
WRITING
XCHG [address], EAX
EAX must hold the value to be stored at the specified address; after the exchange it holds the previous contents of that address.
EDIT: LOCK ADD EAX, [address] will cause an "Invalid Opcode Exception" because the destination operand is not a memory operand.
"An invalid-opcode exception (#UD) is generated when the LOCK prefix is used with any other instruction or when no write operation is made to memory." (Section 8.1.2.2, Software Controlled Bus Locking)
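For comparison, the same two operations can be written portably with std::atomic. This is only a sketch: locked_read and locked_write are hypothetical names, and the compiler is free to pick a different instruction sequence than XADD and XCHG for them.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: read by adding zero. This is an atomic
// read-modify-write that returns the current value and leaves it unchanged.
int locked_read(std::atomic<int>& cell) {
    assert(cell.is_lock_free());  // the mapping to a single instruction assumes a lock-free atomic
    return cell.fetch_add(0, std::memory_order_seq_cst);
}

// Hypothetical helper: write by exchange. Stores `value` and returns the
// previous contents, which is what XCHG leaves in the register.
int locked_write(std::atomic<int>& cell, int value) {
    return cell.exchange(value, std::memory_order_seq_cst);
}
```

For example, with std::atomic<int> cell{7}, locked_read(cell) returns 7, locked_write(cell, 9) also returns 7 (the old value), and a second locked_read(cell) returns 9.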
Reads on x86 are atomic by nature, as long as they are aligned. The section on the MOV instruction in the Intel assembly manuals, vol. 2A, should mention this, as should the section on the LOCK prefix. Other volumes may also mention it.
However, if you want an explicitly atomic read, you can use _InterlockedExchangeAdd((LONG*)&var,0), aka LOCK XADD, which yields the old value without changing it. The same can be done with InterlockedCompareExchange((LONG*)&var,var,var), aka LOCK CMPXCHG, but IMO there is no need for this.
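A portable way to see the same idea is the sketch below, which uses std::atomic instead of the Windows intrinsics. The function names are invented for illustration, and the compare-exchange variant is my rough equivalent of the CMPXCHG trick, not a literal translation of the call above.

```cpp
#include <atomic>
#include <cassert>

// Plain atomic load; on x86 this is typically just an aligned MOV.
int read_plain(std::atomic<int>& v) {
    return v.load(std::memory_order_seq_cst);
}

// Read through compare-exchange: a failed comparison copies the current
// value into `expected`; a successful one means the value was 0 and 0 is
// written back, so memory is unchanged either way.
int read_via_cas(std::atomic<int>& v) {
    assert(v.is_lock_free());  // assumption: int atomics are lock-free on the target
    int expected = 0;
    v.compare_exchange_strong(expected, expected);
    return expected;
}
```

Both functions return the same value when no other thread is writing; the difference is that the compare-exchange version performs a locked read-modify-write on the cache line, which is considerably more expensive than the plain load and is rarely worth it.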