I understand that `volatile` informs the compiler that the value may be changed, but in order to accomplish this functionality, does the compiler need to introduce a memory fence to make it work?
From my understanding, the sequence of operations on volatile objects cannot be reordered and must be preserved. This seems to imply some memory fences are necessary and that there isn't really a way around this. Am I correct in saying this?
There is an interesting discussion at this related question:
... Accesses to distinct volatile variables cannot be reordered by the compiler as long as they occur in separate full expressions ... right that volatile is useless for thread-safety, but not for the reasons he gives. It's not because the compiler might reorder accesses to volatile objects, but because the CPU might reorder them. Atomic operations and memory barriers prevent the compiler and the CPU from reordering them.
To which David Schwartz replies in the comments:
... There's no difference, from the point of view of the C++ standard, between the compiler doing something and the compiler emitting instructions that cause the hardware to do something. If the CPU may reorder accesses to volatiles, then the standard doesn't require that their order be preserved. ...
... The C++ standard doesn't make any distinction about what does the reordering. And you can't argue that the CPU can reorder them with no observable effect so that's okay -- the C++ standard defines their order as observable. A compiler is compliant with the C++ standard on a platform if it generates code that makes the platform do what the standard requires. If the standard requires accesses to volatiles not be reordered, then a platform that reorders them isn't compliant. ...
My point is that if the C++ standard prohibits the compiler from reordering accesses to distinct volatiles, on the theory that the order of such accesses is part of the program's observable behavior, then it also requires the compiler to emit code that prohibits the CPU from doing so. The standard does not differentiate between what the compiler does and what the compiler's generated code makes the CPU do.
Which yields two questions: is either of them "right"? And what do actual implementations really do?
I always use volatile in interrupt service routines, e.g. the ISR (often assembly code) modifies some memory location and the higher level code that runs outside of the interrupt context accesses the memory location through a pointer to volatile.
I do this for RAM as well as memory-mapped IO.
Based on the discussion here it seems this is still a valid use of volatile, but it doesn't have anything to do with multiple threads or CPUs. If the compiler for a microcontroller "knows" that there can't be any other accesses (e.g. everything is on-chip, there's no cache and there's only one core), I would think that a memory fence isn't implied at all; the compiler just needs to prevent certain optimisations.
As we pile more stuff into the "system" that executes the object code almost all bets are off, at least that's how I read this discussion. How could a compiler ever cover all bases?
The keyword `volatile` essentially means that reads and writes to an object should be performed exactly as written by the program, and not optimized in any way. Binary code should follow the C or C++ code: a load where there is a read, a store where there is a write.

It also means that no read should be expected to produce a predictable value: the compiler shouldn't assume anything about a read, even one immediately following a write to the same volatile object.

`volatile` may be the most important tool in the "C is a high-level assembly language" toolbox.

Whether declaring an object volatile is sufficient to ensure the behavior of code that deals with asynchronous changes depends on the platform: different CPUs give different levels of guaranteed synchronization for normal memory reads and writes. You probably shouldn't try to write such low-level multithreading code unless you are an expert in the area.
Atomic primitives provide a nice higher-level view of objects for multithreading that makes it easy to reason about code. Almost all programmers should use either atomic primitives or primitives that provide mutual exclusion, like mutexes, read-write locks, semaphores, or other blocking primitives.
It depends on which compiler "the compiler" is. Visual C++ does, since 2005. But the Standard does not require it, so some other compilers do not.
A C++ compiler which conforms to the specification is not required to introduce a memory fence. Your particular compiler might; direct your question to the authors of your compiler.
The function of "volatile" in C++ has nothing to do with threading. Remember, the purpose of "volatile" is to disable compiler optimizations so that reading from a register that is changing due to exogenous conditions is not optimized away. Is a memory address that is being written to by a different thread on a different CPU a register that is changing due to exogenous conditions? No.

Again, if some compiler authors have chosen to treat memory addresses being written to by different threads on different CPUs as though they were registers changing due to exogenous conditions, that's their business; they are not required to do so. Nor are they required -- even if they do introduce a memory fence -- to, for instance, ensure that every thread sees a consistent ordering of volatile reads and writes.
In fact, volatile is pretty much useless for threading in C/C++. Best practice is to avoid it.
Moreover: memory fences are an implementation detail of particular processor architectures. In C#, where volatile explicitly is designed for multithreading, the specification does not say that half fences will be introduced, because the program might be running on an architecture that doesn't have fences in the first place. Rather, again, the specification makes certain (extremely weak) guarantees about what optimizations will be eschewed by the compiler, runtime and CPU to put certain (extremely weak) constraints on how some side effects will be ordered. In practice these optimizations are eliminated by use of half fences, but that's an implementation detail subject to change in the future.
The fact that you care about the semantics of volatile in any language as they pertain to multithreading indicates that you're thinking about sharing memory across threads. Consider simply not doing that. It makes your program far harder to understand and far more likely to contain subtle, impossible-to-reproduce bugs.
While working through an online downloadable video tutorial for 3D graphics and game engine development with modern OpenGL, we did use `volatile` within one of our classes. The tutorial website can be found here, and the video working with the `volatile` keyword is found in the Shader Engine series, video 98. These works are not my own but are accredited to Marek A. Krzeminski, MASc, and this is an excerpt from the video download page.

If you are subscribed to his website and have access to his videos, within this video he references this article concerning the use of `volatile` with multithreaded programming: http://www.drdobbs.com/cpp/volatile-the-multithreaded-programmers-b/184403766
This article might be a little dated, but it gives good insight into an excellent use of the volatile modifier in multithreaded programming, helping keep events asynchronous while having the compiler check for race conditions for us. This may not directly answer the OP's original question about creating a memory fence, but I chose to post this as an answer for others, as an excellent reference toward a good use of volatile when working with multithreaded applications.
The compiler only inserts a memory fence on the Itanium architecture, as far as I know.
The `volatile` keyword is really best used for asynchronous changes, e.g., signal handlers and memory-mapped registers; it is usually the wrong tool to use for multithreaded programming.