After looking at a bunch of other questions and their answers, I get the impression that there is no widespread agreement on what the "volatile" keyword in C means exactly.
Even the standard itself does not seem to be clear enough for everyone to agree on what it means.
Among other problems:
- It seems to provide different guarantees depending on your hardware and depending on your compiler.
- It affects compiler optimizations but not hardware optimizations, so on an advanced processor that does its own run-time optimizations, it is not even clear whether the compiler can prevent whatever optimization you want to prevent. (Some compilers do generate instructions to prevent some hardware optimizations on some systems, but this does not appear to be standardized in any way.)
To summarize the problem, it appears (after reading a lot) that "volatile" guarantees something like: The value will be read/written not just from/to a register, but at least to the core's L1 cache, in the same order that the reads/writes appear in the code. But this seems useless, since reading/writing from/to a register is already sufficient within the same thread, while coordinating with L1 cache doesn't guarantee anything further regarding coordination with other threads. I can't imagine when it could ever be important to sync just with L1 cache.
USE 1
The only widely-agreed-upon use of volatile seems to be for old or embedded systems where certain memory locations are hardware-mapped to I/O functions, like a bit in memory that controls (directly, in the hardware) a light, or a bit in memory that tells you whether a keyboard key is down or not (because it is connected by the hardware directly to the key).
It seems that "use 1" does not occur in portable code whose targets include multi-core systems.
USE 2
Not too different from "use 1" is memory that could be read or written at any time by an interrupt handler (which might control a light or store info from a key). But already for this we have the problem that depending on the system, the interrupt handler might run on a different core with its own memory cache, and "volatile" does not guarantee cache coherency on all systems.
So "use 2" seems to be beyond what "volatile" can deliver.
USE 3
The only other undisputed use I see is to prevent mis-optimization of accesses via different variables pointing to the same memory that the compiler doesn't realize is the same memory. But this is probably only undisputed because people aren't talking about it -- I only saw one mention of it. And I thought the C standard already recognized that "different" pointers (like different args to a function) might point to the same item or nearby items, and already specified that the compiler must produce code that works even in such cases. However, I couldn't quickly find this topic in the latest (500 page!) standard.
So "use 3" maybe doesn't exist at all?
Hence my question:
Does "volatile" guarantee anything at all in portable C code for multi-core systems?
EDIT -- update
After browsing the latest standard, it is looking like the answer is at least a very limited yes:
1. The standard repeatedly specifies special treatment for the specific type "volatile sig_atomic_t". However the standard also says that use of the signal function in a multi-threaded program results in undefined behavior. So this use case seems limited to communication between a single-threaded program and its signal handler.
2. The standard also specifies a clear meaning for "volatile" in relation to setjmp/longjmp. (Example code where it matters is given in other questions and answers.)
So the more precise question becomes:
Does "volatile" guarantee anything at all in portable C code for multi-core systems, apart from (1) allowing a single-threaded program to receive information from its signal handler, or (2) allowing setjmp code to see variables modified between setjmp and longjmp?
This is still a yes/no question.
If "yes", it would be great if you could show an example of bug-free portable code which becomes buggy if "volatile" is omitted. If "no", then I suppose a compiler is free to ignore "volatile" outside of these two very specific cases, for multi-core targets.
To summarize the problem, it appears (after reading a lot) that
"volatile" guarantees something like: The value will be read/written
not just from/to a register, but at least to the core's L1 cache, in
the same order that the reads/writes appear in the code.
No, it absolutely does not. And that makes volatile almost useless for the purpose of MT safe code.
If it did, then volatile would be quite good for variables shared by multiple thread as ordering the events in the L1 cache is all you need to do in typical CPU (that is either multi-core or multi-CPU on motherboard) capable of cooperating in a way that makes a normal implementation of either C/C++ or Java multithreading possible with typical expected costs (that is, not a huge cost on most atomic or non-contented mutex operations).
But volatile does not provide any guaranteed ordering (or "memory visibility") in the cache either in theory or in practice.
(Note: the following is based on sound interpretation of the standard documents, the standard's intent, historical practice, and a deep understand of the expectations of compiler writers. This approach based on history, actual practices, and expectations and understanding of real persons in the real world, which is much stronger and more reliable than parsing the words of a document that is not known to be stellar specification writing and which has been revised many times.)
In practice, volatile does guarantees ptrace-ability that is the ability to use debug information for the running program, at any level of optimization, and the fact the debug information makes sense for these volatile objects:
- you may use
ptrace
(a ptrace-like mechanism) to set meaningful break points at the sequence points after operations involving volatile objects: you can really break at exactly these points (note that this works only if you are willing to set many break points as any C/C++ statement may be compiled to many different assembly start and end points, as in a massively unrolled loop);
- while a thread of execution of stopped, you may read the value of all volatile objects, as they have their canonical representation (following the ABI for their respective type); a non volatile local variable could have an atypical representation, f.ex. a shifted representation: a variable used for indexing an array might be multiplied by the size of individual objects, for easier indexing; or it might be replaced by a pointer to an array element (as long as all uses of the variable as similarly converted) (think changing dx to du in an integral);
- you can also modify those objects (as long as the memory mappings allow that, as volatile object with static lifetime that are const qualified might be in a memory range mapped read only).
Volatile guarantee in practice a little more than the strict ptrace interpretation: it also guarantees that volatile automatic variables have an address on the stack, as they aren't allocated to a register, a register allocation which would make ptrace manipulations more delicate (compiler can output debug information to explain how variables are allocated to registers, but reading and changing register state is slightly more involved than accessing memory addresses).
Note that full program debug-ability, that is considering all variables volatile at least at sequence points, is provided by the "zero optimization" mode of the compiler, a mode which still performs trivial optimizations like arithmetic simplifications (there is usually no guaranteed no optimization at all mode). But volatile is stronger than non optimization: x-x
can be simplified for a non volatile integer x
but not of a volatile object.
So volatile means guaranteed to be compiled as is, like the translation from source to binary/assembly by the compiler of a system call isn't a reinterpretation, changed, or optimized in any way by a compiler. Note that library calls may or may not be system calls. Many official system functions are actually library function that offer a thin layer of interposition and generally defer to the kernel at the end. (In particular getpid
doesn't need to go to the kernel and could well read a memory location provided by the OS containing the information.)
Volatile interactions are interactions with the outside world of the real machine, which must follow the "abstract machine". They aren't internal interactions of program parts with other program parts. The compiler can only reason about what it knows, that is the internal program parts.
The code generation for a volatile access should follow the most natural interaction with that memory location: it should be unsurprising. That means that some volatile accesses are expected to be atomic: if the natural way to read or write the representation of a long
on the architecture is atomic, then it's expected that a read or write of a volatile long
will be atomic, as the compiler should not generate silly inefficient code to access volatile objects byte by byte, for example.
You should be able to determine that by knowing the architecture. You don't have to know anything about the compiler, as volatile means that the compiler should be transparent.
But volatile does no more than force the emission of expected assembly for the least optimized for particular cases to do a memory operation: volatile semantics means general case semantic.
The general case is what the compiler does when it doesn't have any information about a construct: f.ex. calling a virtual function on an lvalue via dynamic dispatch is a general case, making a direct call to the overrider after determining at compile time the type of the object designated by the expression is a particular case. The compiler always have a general case handling of all constructs, and it follows the ABI.
Volatile does nothing special to synchronize threads or provide "memory visibility": volatile only provides guarantees at the abstract level seen from inside a thread executing or stopped, that is the inside of a CPU core:
- volatile says nothing about which memory operations reach main RAM (you may set specific memory caching types with assembly instructions or system calls to obtain these guarantees);
- volatile doesn't provide any guarantee about when memory operations will be committed to any level of cache (not even L1).
Only the second point means volatile is not useful in most inter threads communication problems; the first point is essentially irrelevant in any programming problem that doesn't involve communication with hardware components outside the CPU(s) but still on the memory bus.
The property of volatile providing guaranteed behavior from the point of the view of the core running the thread means that asynchronous signals delivered to that thread, which are run from the point of view of the execution ordering of that thread, see operations in source code order.
Unless you plan to send signals to your threads (an extremely useful approach to consolidation of information about currently running threads with no previously agreed point of stopping), volatile is not for you.
I'm no expert, but cppreference.com has what appears to me to be some pretty good information on volatile
. Here's the gist of it:
Every access (both read and write) made through an lvalue expression
of volatile-qualified type is considered an observable side effect for
the purpose of optimization and is evaluated strictly according to the
rules of the abstract machine (that is, all writes are completed at
some time before the next sequence point). This means that within a
single thread of execution, a volatile access cannot be optimized out
or reordered relative to another visible side effect that is separated
by a sequence point from the volatile access.
It also gives some uses:
Uses of volatile
1) static volatile objects model memory-mapped I/O ports, and static
const volatile objects model memory-mapped input ports, such as a
real-time clock
2) static volatile objects of type sig_atomic_t are used for
communication with signal handlers.
3) volatile variables that are local to a function that contains an
invocation of the setjmp macro are the only local variables guaranteed
to retain their values after longjmp returns.
4) In addition, volatile variables can be used to disable certain
forms of optimization, e.g. to disable dead store elimination or
constant folding for microbenchmarks.
And of course, it mentions that volatile
is not useful for thread synchronization:
Note that volatile variables are not suitable for communication
between threads; they do not offer atomicity, synchronization, or
memory ordering. A read from a volatile variable that is modified by
another thread without synchronization or concurrent modification from
two unsynchronized threads is undefined behavior due to a data race.
First of all, there's historically been various hiccups regarding different intepretations of the meaning of volatile
access and similar. See this study: Volatiles Are Miscompiled, and What to Do about It.
Apart from the various issues mentioned in that study, the behavior of volatile
is portable, save for one aspect of them: when they act as memory barriers. A memory barrier is some mechanism which is there to prevent concurrent unsequenced execution of your code. Using volatile
as a memory barrier is certainly not portable.
Whether the C language guarantees memory behavior or not from volatile
is apparently arguable, though personally I think the language is clear. First we have the formal definition of side effects, C17 5.1.2.3:
Accessing a volatile
object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment.
The standard defines the term sequencing, as a way of determining order of evaluation (execution). The definition is formal and cumbersome:
Sequenced before is an asymmetric, transitive, pair-wise relation between evaluations
executed by a single thread, which induces a partial order among those evaluations.
Given any two evaluations A and B, if A is sequenced before B, then the execution of A
shall precede the execution of B. (Conversely, if A is sequenced before B, then B is
sequenced after A.) If A is not sequenced before or after B, then A and B are
unsequenced. Evaluations A and B are indeterminately sequenced when A is sequenced
either before or after B, but it is unspecified which.13) The presence of a sequence point
between the evaluation of expressions A and B implies that every value computation and
side effect associated with A is sequenced before every value computation and side effect
associated with B. (A summary of the sequence points is given in annex C.)
The TL;DR of the above is basically that in case we have an expression A
which contains side-effects, it must be done executing before another expression B
, in case B
is sequenced after A
.
Optimizations of C code are made possible through this part:
In the abstract machine, all expressions are evaluated as specified by the semantics. An actual
implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a
volatile object).
This means that the program may evaluate (execute) expressions in the order that the standard mandates elsewhere (order of evaluation etc). But it need not evaluate (execute) a value if it can deduce that it is not used. For example, the operation 0 * x
doesn't need to evaluate x
and simply replace the expression with 0
.
Unless accessing a variable is a side-effect. Meaning that in case x
is volatile
, it must evaluate (execute) 0 * x
even though the result will always be 0. Optimization is not allowed.
Furthermore, the standard speaks of observable behavior:
The least requirements on a conforming implementation are:
- Accesses to volatile objects are evaluated strictly according to the rules of the abstract machine.
/--/
This is the observable behavior of the program.
Given all of the above, a conforming implementation (compiler + underlying system) may not execute the access of volatile
objects in an unsequenced order, in case the semantics of the written C source says otherwise.
This means that in this example
volatile int x;
volatile int y;
z = x;
z = y;
Both assignment expressions must be evaluated and z = x;
must be evaluated before z = y;
. A multi-processor implementation that outsource these two operations to two different unsequences cores is not conforming!
The dilemma is that compilers can't do much about things like pre-fetch caching and instruction pipelining etc, particularly not when running on top of an OS. And so compilers hand that problem over to the programmers, telling them that memory barriers is now the programmer's responsibility. While the C standard clearly states that the problem needs to be solved by the compiler.
The compiler doesn't necessarily care to solve the problem though, and so volatile
for the sake of acting as a memory barrier is non-portable. It has become a quality of implementation issue.