Sequentially consistent atomic load on x86

Posted 2019-07-14 07:51

Question:

I'm interested in the sequentially consistent load operation on x86. Judging by the assembly listing generated by the compiler, it is implemented as a plain load. As far as I know, however, plain loads on x86 are only guaranteed to have acquire semantics, and plain stores release semantics. Yet a sequentially consistent store is implemented as a locked XCHG, while the load is just a plain load. That seems strange to me; could you please explain this in detail?

Added:

I just found a claim online that a sequentially consistent atomic load can be done with a simple MOV as long as the store is done with a locked XCHG, but there was no proof and no links to documentation. Do you know where I can read about that?

Thanks in advance.

Answer 1:

A plain MOV on x86 is sufficient for an atomic sequentially consistent load, as long as SC stores are done with LOCKed instructions, the value is correctly aligned, and "normal" WB cache mode is used.
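In C++ terms, the mapping described above could be sketched as follows. The variable and function names are hypothetical, and the instruction choices noted in the comments are what mainstream x86-64 compilers typically emit, not something the standard guarantees.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared variable, for illustration only.
std::atomic<int> shared_value{0};

// Sequentially consistent store. On x86-64 this typically becomes XCHG
// with a memory operand (implicitly locked); some compilers emit
// MOV followed by MFENCE instead.
void sc_store(int v) {
    shared_value.store(v, std::memory_order_seq_cst);
}

// Sequentially consistent load. On x86-64 this typically becomes a plain
// MOV from memory, because the expensive barrier is paid on the store side.
int sc_load() {
    return shared_value.load(std::memory_order_seq_cst);
}

// Single-threaded sanity check of the two helpers above.
void round_trip_check() {
    sc_store(42);
    assert(sc_load() == 42);
}
```

Compiling this with optimizations and inspecting the assembly output is an easy way to confirm which instructions a particular compiler picks.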

See my blog post at http://www.justsoftwaresolutions.co.uk/threading/intel-memory-ordering-and-c++-memory-model.html for the full mapping, and the Intel processor docs at http://developer.intel.com/products/processor/manuals/index.htm for the details of the allowed orderings.

If you use "WC" cache mode or "non-temporal" instructions such as MOVNTI then all bets are off, as the processor doesn't necessarily write the data back to main memory in a timely manner.



Answer 2:

As far as I know, register-to-memory transfers and vice versa are not atomic in a multiprocessor environment.

READING

XOR EAX, EAX
LOCK XADD [address], EAX

The first instruction zeroes the EAX register. The second exchanges the contents of EAX and [address], then stores the sum of the two in [address]. Since EAX was zero beforehand, the memory location is left unchanged and EAX ends up holding its value.
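The add-zero read above has a portable counterpart in C++: fetch_add with an operand of 0. The sketch below uses a hypothetical helper name. On x86 this is typically lowered to LOCK XADD, although a compiler is free to choose a different instruction sequence with the same effect.

```cpp
#include <atomic>
#include <cassert>

// Returns the current value by atomically adding zero to it.
// The stored value is left unchanged.
int read_via_fetch_add(std::atomic<int>& target) {
    return target.fetch_add(0, std::memory_order_seq_cst);
}

// Single-threaded sanity check: the read returns the value and
// does not modify it.
void fetch_add_read_check() {
    std::atomic<int> counter{5};
    assert(read_via_fetch_add(counter) == 5);
    assert(counter.load() == 5);
}
```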

WRITING

XCHG [address], EAX

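Load EAX with the value to store beforehand; the exchange writes it to the specified address and leaves the previous contents in EAX. XCHG with a memory operand is locked implicitly, so no LOCK prefix is needed.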
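The XCHG-based write corresponds to std::atomic's exchange in C++. This is a minimal sketch with a hypothetical helper name; the previous value comes back as the return value, just as it comes back in EAX in the assembly version.

```cpp
#include <atomic>
#include <cassert>

// Stores new_value and returns the value that was there before, as one
// indivisible read-modify-write (typically XCHG on x86).
int store_via_exchange(std::atomic<int>& target, int new_value) {
    return target.exchange(new_value, std::memory_order_seq_cst);
}

// Single-threaded sanity check: the old value is returned and the
// new value is stored.
void exchange_store_check() {
    std::atomic<int> slot{1};
    assert(store_via_exchange(slot, 2) == 1);
    assert(slot.load() == 2);
}
```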

EDIT: LOCK ADD EAX, [address] will cause an "Invalid Opcode Exception" because the destination operand is not a memory address.

"An invalid-opcode exception (#UD) is generated when the LOCK prefix is used with any other instruction or when no write operation is made to memory." (Section 8.1.2.2, Software Controlled Bus Locking)



Answer 3:

Reads on x86 are atomic by nature, as long as they are aligned. The section on the MOV instruction in the Intel assembly manuals, vol. 2A, should mention this, as should the section on the LOCK prefix. Other volumes may also mention it.
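Since the guarantee depends on alignment, a quick check can be useful when the address comes from a raw buffer. The helper below is hypothetical and only tests whether an address is a multiple of the access size. The last assertion assumes a mainstream x86 ABI, where std::atomic<int> is at least as strictly aligned as its size.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if p is a multiple of size bytes.
bool is_naturally_aligned(const void* p, std::size_t size) {
    return reinterpret_cast<std::uintptr_t>(p) % size == 0;
}

void alignment_check() {
    alignas(8) unsigned char buffer[16] = {};
    assert(is_naturally_aligned(buffer, 4));       // offset 0: aligned
    assert(!is_naturally_aligned(buffer + 1, 4));  // offset 1: misaligned

    // Assumption: on mainstream x86 ABIs std::atomic<int> is naturally aligned.
    std::atomic<int> a{0};
    assert(is_naturally_aligned(&a, sizeof(int)));
}
```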

However, if you want an explicitly locked atomic read, you can use _InterlockedExchangeAdd((LONG*)&var, 0), i.e. LOCK XADD, which returns the old value without changing it. The same can be done with InterlockedCompareExchange((LONG*)&var, var, var), i.e. LOCK CMPXCHG. In my opinion, though, there is no need for this.
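For comparison, here is a portable C++ sketch of the compare-exchange read. It is a variant of the call above rather than a direct translation: it compares against 0 and swaps in 0, so the stored value never changes whether or not the comparison succeeds. The helper name is hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Reads the value with a compare-exchange (typically LOCK CMPXCHG on x86).
// If the value is 0, the exchange succeeds and writes 0 back (no change).
// Otherwise it fails and copies the current value into `expected`.
int read_via_cas(std::atomic<int>& target) {
    int expected = 0;
    target.compare_exchange_strong(expected, 0, std::memory_order_seq_cst);
    return expected;
}

// Single-threaded sanity check covering both the failing and the
// succeeding comparison.
void cas_read_check() {
    std::atomic<int> v{7};
    assert(read_via_cas(v) == 7);  // comparison fails, value reported
    assert(v.load() == 7);         // value untouched
    v.store(0);
    assert(read_via_cas(v) == 0);  // comparison succeeds, 0 written back
    assert(v.load() == 0);
}
```

As the answer says, a plain aligned load already gives the same result without the cost of a locked instruction.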