The x86 instructions lfence/sfence/mfence are used to implement the rmb()/wmb()/mb() mechanisms in the Linux kernel. It is easy to understand that these serialize memory accesses. It is much harder to decide when and where to use them while writing the code, before a bug shows up at run time.
I would like to know whether there are known caveats that can be checked while writing or reviewing code, and that help determine where barriers must be inserted. I understand this is a complex topic, but is there a rule of thumb or a checklist that helps identify the places in the code where they are needed?
My experience (not in the Linux kernel) is that two patterns cover the vast majority of the need for fencing.
Pattern "Send/receive": Thread 1 sends data to thread 2, and there is a memory location that somehow indicates "data is ready". Thread 1 needs at least an sfence between the store of the data and the store into "data is ready". Thread 2 needs an lfence between the load of the "data is ready" location and the load of the data.
If only regular loads and stores are involved in the transfer (no non-temporal stores, DMA devices, etc.), then only compiler fences are necessary, because x86 does not reorder ordinary loads with other loads or ordinary stores with other stores. Also, LOCK-prefixed instructions imply fences. For example, sometimes the "data is ready" location is not simply a flag but an atomic counter, and the LOCK-prefixed increment/decrement used to manipulate it can serve as the fence.
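Here is a minimal sketch of the send/receive pattern in portable C++11 atomics rather than raw fence instructions. The names payload, ready, sender, receiver and run_send_receive are made up for illustration. The release store plays the role of the fence on the sending side and the acquire load plays it on the receiving side; on x86 with ordinary memory, compilers typically emit plain loads and stores for these, which matches the "only compiler fences are necessary" remark above.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names for illustration only.
int payload = 0;                 // the data being sent (a plain, non-atomic int)
std::atomic<bool> ready{false};  // the "data is ready" location

void sender() {
    payload = 42;                                  // store the data
    ready.store(true, std::memory_order_release);  // then publish "data is ready"
}

int receiver() {
    // Spin until the "data is ready" location is observed as set.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;  // ordered after the acquire load, so the sender's store is visible
}

// Runs one sender and one receiver and returns the value the receiver saw.
int run_send_receive() {
    payload = 0;
    ready.store(false);
    int seen = -1;
    std::thread t2([&] { seen = receiver(); });
    std::thread t1(sender);
    t1.join();
    t2.join();
    assert(seen != -1);  // the receiver has returned by the time join() completes
    return seen;
}
```

If the data were written with non-temporal stores, the release store alone would not be enough on x86 and an explicit sfence would be needed before setting the flag.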
This pattern also covers spin locks. Releasing a lock is a "send". Acquiring a lock is a "receive".
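As a sketch of how a spin lock maps onto send/receive, here is a toy lock built on std::atomic_flag. The class SpinLock and the helper run_locked_counter are hypothetical names, and this is not meant as a production lock (no backoff, no fairness). Acquiring uses an atomic read-modify-write with acquire ordering (the "receive"), and releasing is a store with release ordering (the "send").

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Toy spin lock for illustration; SpinLock is a hypothetical name.
class SpinLock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;

public:
    void lock() {
        // "Receive": an atomic read-modify-write with acquire ordering.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
    }

    void unlock() {
        // "Send": a release store makes the critical section's writes visible
        // to whichever thread acquires the lock next.
        flag_.clear(std::memory_order_release);
    }
};

// Two threads each increment a plain int per_thread times under the lock.
int run_locked_counter(int per_thread) {
    assert(per_thread >= 0);
    SpinLock lock;
    int counter = 0;  // not atomic; protected only by the lock
    auto work = [&] {
        for (int i = 0; i < per_thread; ++i) {
            lock.lock();
            ++counter;
            lock.unlock();
        }
    };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    return counter;
}
```

Because every increment happens between a "receive" and a "send", no updates are lost and the final count is twice per_thread.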
Pattern "Consensus": Two threads have to reach consensus about something. There must be an mfence (or one implied by a LOCK-prefixed instruction). The fence must sit between the "I published my vote" step and the "I read the other thread's vote" step. Dekker's protocol is an example. The hard part is spotting this pattern. We once missed one deep in the internals of TBB, where the consensus problem was "has an exception been thrown?" Eventually we realized that it was a consensus problem and consequently needed an mfence.
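The core of the consensus pattern can be sketched as follows, again in portable C++11 rather than with a raw mfence. The names vote_a, vote_b, publish_then_read and count_both_missed are made up for illustration, and this shows only the publish/fence/read step, not a full Dekker lock. The sequentially consistent fence is what a compiler typically turns into an mfence or a LOCK-prefixed instruction on x86. With it in place, at least one of the two threads must see the other's vote; without it, both could read 0.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names for illustration only.
std::atomic<int> vote_a{0};
std::atomic<int> vote_b{0};

// Publish my vote, fence, then read the other thread's vote.
int publish_then_read(std::atomic<int>& mine, std::atomic<int>& theirs) {
    mine.store(1, std::memory_order_relaxed);             // "I published my vote"
    std::atomic_thread_fence(std::memory_order_seq_cst);  // full fence between the two steps
    return theirs.load(std::memory_order_relaxed);        // "I read the other thread's vote"
}

// Repeats the exchange and counts rounds in which neither thread saw the
// other's vote. With the full fence in place this count stays at zero.
int count_both_missed(int rounds) {
    assert(rounds >= 0);
    int both_missed = 0;
    for (int i = 0; i < rounds; ++i) {
        vote_a.store(0);
        vote_b.store(0);
        int a_saw = -1;  // what thread A read from vote_b
        int b_saw = -1;  // what thread B read from vote_a
        std::thread ta([&] { a_saw = publish_then_read(vote_a, vote_b); });
        std::thread tb([&] { b_saw = publish_then_read(vote_b, vote_a); });
        ta.join();
        tb.join();
        if (a_saw == 0 && b_saw == 0) {
            ++both_missed;
        }
    }
    return both_missed;
}
```

Note that release/acquire ordering alone, which is enough for send/receive, would not be enough here: it does not stop a store from being reordered after a later load, and that store-to-load reordering is exactly what the full fence prevents.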
The two patterns above are rules of thumb that do not cover all cases, but I find that they cover 99% of cases.