Does the C++ volatile keyword introduce a memory fence?

Posted 2020-01-25 15:32

I understand that volatile informs the compiler that the value may be changed, but in order to provide this functionality, does the compiler need to introduce a memory fence?

From my understanding, the sequence of operations on volatile objects cannot be reordered and must be preserved. This seems to imply some memory fences are necessary and that there isn't really a way around this. Am I correct in saying this?


There is an interesting discussion in this related question.

Jonathan Wakely writes:

... Accesses to distinct volatile variables cannot be reordered by the compiler as long as they occur in separate full expressions ... right that volatile is useless for thread-safety, but not for the reasons he gives. It's not because the compiler might reorder accesses to volatile objects, but because the CPU might reorder them. Atomic operations and memory barriers prevent the compiler and the CPU from reordering

To which David Schwartz replies in the comments:

... There's no difference, from the point of view of the C++ standard, between the compiler doing something and the compiler emitting instructions that cause the hardware to do something. If the CPU may reorder accesses to volatiles, then the standard doesn't require that their order be preserved. ...

... The C++ standard doesn't make any distinction about what does the reordering. And you can't argue that the CPU can reorder them with no observable effect so that's okay -- the C++ standard defines their order as observable. A compiler is compliant with the C++ standard on a platform if it generates code that makes the platform do what the standard requires. If the standard requires accesses to volatiles not be reordered, then a platform that reorders them isn't compliant. ...

My point is that if the C++ standard prohibits the compiler from reordering accesses to distinct volatiles, on the theory that the order of such accesses is part of the program's observable behavior, then it also requires the compiler to emit code that prohibits the CPU from doing so. The standard does not differentiate between what the compiler does and what the compiler's generated code makes the CPU do.

This raises two questions: Is either of them "right"? What do actual implementations really do?

13 answers
乱世女痞
#2 · 2020-01-25 16:06

First of all, the C++ standard does not guarantee the memory barriers needed to properly order non-atomic reads and writes. Volatile variables are recommended for use with MMIO, signal handling, etc. On most implementations volatile is not useful for multithreading, and it is not generally recommended for that.

How volatile accesses are implemented is the compiler's choice.

This article, describing gcc behavior, shows that you cannot use a volatile object as a memory barrier to order a sequence of writes to non-volatile memory.

Regarding icc, I found this source, which also says that volatile does not guarantee the ordering of memory accesses.

The Microsoft VS2013 compiler behaves differently. This documentation explains how volatile enforces release/acquire semantics, which allows volatile objects to be used for acquiring and releasing locks in multithreaded applications.

Another consideration is that the same compiler may treat volatile differently depending on the target hardware architecture. This post about the MSVS 2013 compiler clearly states the specifics of compiling volatile code for ARM platforms.

So my answer to:

Does the C++ volatile keyword introduce a memory fence?

would be: not guaranteed, and probably not, but some compilers might do it. You should not rely on it.

走好不送
#3 · 2020-01-25 16:10

What David is overlooking is that the C++ standard specifies the behavior of interacting threads only in specific situations; everything else results in undefined behavior. A data race (two unsynchronized accesses to the same memory location from different threads, at least one of them a write) is undefined behavior unless both accesses are atomic.

Consequently, the compiler is perfectly within its rights to forgo any synchronization instructions, since the CPU's reordering could only be noticed in a program that already exhibits undefined behavior due to missing synchronization.

看我几分像从前
#4 · 2020-01-25 16:17

The compiler needs to introduce a memory fence around volatile accesses if, and only if, that is necessary to make the uses of volatile specified in the standard (setjmp, signal handlers, and so on) work on that particular platform.

Note that some compilers do go way beyond what's required by the C++ standard in order to make volatile more powerful or useful on those platforms. Portable code shouldn't rely on volatile to do anything beyond what's specified in the C++ standard.

Lonely孤独者°
#5 · 2020-01-25 16:17

I think the confusion around volatile and instruction reordering stems from the two kinds of reordering CPUs do:

  1. Out-of-order execution.
  2. The sequence of memory reads/writes as seen by other CPUs (reordering in the sense that each CPU might see a different sequence).

Volatile affects how a compiler generates code assuming single-threaded execution (this includes interrupts). It doesn't imply anything about memory barrier instructions; rather, it precludes the compiler from performing certain kinds of optimizations related to memory accesses. A typical example is re-fetching a value from memory instead of using one cached in a register.

Out-of-order execution

CPUs can execute instructions out of order or speculatively, provided that the end result is one the original code could have produced. CPUs can perform transformations that are disallowed for compilers, because compilers can only perform transformations that are correct in all circumstances. In contrast, CPUs can check the validity of these optimizations and back out of them if they turn out to be incorrect.

Sequence of memory read/writes as seen by other CPUs

The end result of a sequence of instructions, the effective order, must agree with the semantics of the code generated by the compiler. However, the actual execution order chosen by the CPU can differ. The effective order as seen by other CPUs (each CPU can have a different view) can be constrained by memory barriers.
I'm not sure how much the effective and actual orders can differ, because I don't know to what extent memory barriers prevent CPUs from performing out-of-order execution.


Anthone
#6 · 2020-01-25 16:18

It doesn't have to. Volatile is not a synchronization primitive. It just disables certain optimisations, i.e. you get a predictable sequence of reads and writes within a thread, in the same order as prescribed by the abstract machine. But reads and writes in different threads have no order in the first place, so it makes no sense to speak of preserving or not preserving their order. Order between threads can be established by synchronization primitives; without them you get UB.

A bit of explanation regarding memory barriers. A typical CPU has several levels of memory access. There is a memory pipeline, several levels of cache, then RAM etc.

Membar (memory barrier) instructions flush the pipeline. They don't change the order in which reads and writes are executed; they just force outstanding ones to complete at a given point. This is useful for multithreaded programs, but not much otherwise.

Caches are normally kept coherent between CPUs automatically. If you want to make sure the cache is in sync with RAM, a cache flush is needed, which is very different from a membar.

地球回转人心会变
#7 · 2020-01-25 16:21

Rather than explaining what volatile does, allow me to explain when you should use volatile.

  • When inside a signal handler, because assigning to a volatile std::sig_atomic_t object is pretty much the only thing the standard allows you to do from within a signal handler. Since C++11 you can also use std::atomic for that purpose, but only if the atomic is lock-free.
  • When dealing with setjmp, as Intel's documentation notes.
  • When dealing directly with hardware and you want to ensure that the compiler does not optimize your reads or writes away.

For example, when polling a memory-mapped device register:

volatile int *foo = some_memory_mapped_device;
while (*foo)
    ; // wait until *foo turns false

Without the volatile specifier, the compiler is allowed to optimize the loop away completely. The volatile specifier tells the compiler that it may not assume that two consecutive reads return the same value.

Note that volatile has nothing to do with threads. The example above would not work if a different thread were writing to *foo, because no acquire operation is involved.

In all other cases, using volatile should be considered non-portable and should no longer pass code review, except when dealing with pre-C++11 compilers or compiler extensions (such as MSVC's /volatile:ms switch, which is enabled by default when targeting x86/x64).
