Suppose we have a volatile int a
. One thread does
while (true) {
a = 1;
a = 0;
}
and another thread does
while (true) {
System.out.println(a+a);
}
Now, would it be illegal for a JIT compiler to emit assembly corresponding to 2*a
instead of a+a
?
On one hand the very purpose of a volatile read is that it should always be fresh from memory.
On the other hand, there's no synchronization point between the two reads, so I can't see that it would be illegal to treat a+a
atomically, in which case I don't see how an optimization such as 2*a
would break the spec.
References to JLS would be appreciated.
Short answer:
Yes, this optimization is allowed. Collapsing two sequential read operations produes the observable behavior of the sequence being atomic, but does not appear as a reordering of operations. Any sequence of actions performed on a single thread of execution can be executed as an atomic unit. In general, it is difficult to ensure a sequence of operations executes atomically, and it rarely results in a performance gain because most execution environments introduce overhead to execute items atomically.
In the example given by the original question, the sequence of operations in question is the following:
Performing these operations atomically guarantees that the value read on the first line is equal to the value read on the second line. Furthermore, it means the value read on the second line is the value contained in
a
at the time the first read was executed (and vice versa, because atomic both read operations occurred at the same time according to the observable execution state of the program). The optimization in question, which is reusing the value of the first read for the second read, is equivalent to the compiler and/or JIT executing the sequence atomically, and is thus valid.Original longer answer:
The Java Memory Model describes operations using a happens-before partial ordering. In order to express the restriction that the first read
r1
and second readr2
ofa
cannot be collapsed, you need to show that some operation is semantically required to appear between them.The operations on the thread with
r1
andr2
is the following:To express the requirement that something (say
y
) lie betweenr1
andr2
, you need to require thatr1
happens-beforey
andy
happens-beforer2
. As it happens, there is no rule where a read operation appears on the left side of a happens-before relationship. The closest you could get is sayingy
happens-beforer2
, but the partial order would allowy
to also occur beforer1
, thus collapsing the read operations.If no scenario exists which requires an operation to fall between
r1
andr2
, then you can declare that no operation ever appears betweenr1
andr2
and not violate the required semantics of the language. Using a single read operation would be equivalent to this claim.Edit My answer is getting voted down, so I'm going to go into additional details.
Here are some related questions:
Is the Java compiler or JVM required to collapse these read operations?
No. The expressions
a
anda
used in the add expression are not constant expressions, so there is no requirement that they be collapsed.Does the JVM collapse these read operations?
To this, I'm not sure of the answer. By compiling a program and using
javap -c
, it's easy to see that the Java compiler does not collapse these read operations. Unfortunately it's not as easy to prove the JVM does not collapse the operations (or even tougher, the processor itself).Should the JVM collapse these read operations?
Probably not. Each optimization takes time to execute, so there is a balance between the time it takes to analyze the code and the benefit you expect to gain. Some optimizations, such as array bounds check elimination or checking for null references, have proven to have extensive benefits for real-world applications. The only case where this particular optimization has the possibility of improving performance is cases where two identical read operations appear sequentially.
Furthermore, as shown by the response to this answer along with the other answers, this particular change would result in an unexpected behavior change for certain applications which users may not desire.
Edit 2: Regarding Rafael's description of a claim that two read operations that cannot be reordered. This statement is designed to highlight the fact that caching the read operation of
a
in the following sequence could produce an incorrect result:Suppose initially
a
andb
have their default value 0. Then you execute just the firstread(a)
.Now suppose another thread executes the following sequence:
Finally, suppose the first thread executes the line
read(b)
. If you were to cache the originally read value ofa
, you would end up with the following call:This is not correct. Since the updated value of
a
was stored before writing tob
, there is no way to read the valueb1 = 1
and then read the valuea2 = 0
. Without caching, the correct sequence of events leads to the following call.However, if you were to ask the question "Is there any way to allow the read of
a
to be cached?", the answer is yes. If you can execute all three read operations in the first thread sequence as an atomic unit, then caching the value is allowed. While synchronizing across multiple variables is difficult and rarely provides an opportunistic optimization advantage, it is certainly conceivable to encounter an exception. For example, supposea
andb
are each 4 bytes, and they appear sequentially in memory witha
aligned on an 8-byte boundary. A 64-bit process could implement the sequenceread(a) read(b)
as an atomic 64-bit load operation, which would allow the value ofa
to be cached (effectively treating all three read operations as an atomic operation instead of just the first two).Modified the OP Problem a little
If the above code have been executed Sequentially, then there exist a valid Sequential Consistent Execution which will break the thread2 while loop.
Whereas if compiler optimizes a+a to 2a then thread2 while loop will never exist.
So the above optimization will prohibit one particular execution if it had been Sequentially Executed Code.
Main Question, is this optimization a Problem ?
Ans. A program is correctly synchronized if, when it is executed in a sequentially consistent manner, there are no data races. Refer Example 17.4.8-1 from JLS chapter 17
Sequential Consistency is a strong guarantee. The Execution Path where compiler optimizes a+a as 2a is also a valid Sequentially Consistent Execution. So the Answer is Yes.
Ans. Sequential Consistency implies that happens before guarantee is valid here . So the Answer is Yes. JLS ref
So i don't think optimization is invalid legally at least in the OP case. The case where the Thread 2 while loops stucks into an infinte is also quite possible without compiler transformation.
That is not how the Java Language Specification defines volatile. The JLS simply says:
Therefore, a write to a volatile variable happens-before (and is visible to) any subsequent reads of that same variable.
This constraint is trivially satisfied for a read that is not subsequent. That is, volatile only ensures visibility of a write if the read is known to occur after the write.
This is not the case in your program. For every well formed execution that observes a to be 1, I can construct another well formed execution where a is observed to be 0, simply be moving the read after the write. This is possible because the happens-before relation looks as follows:
That is, all the JMM guarantees for your program is that a+a will yield 0, 1 or 2. That is satisfied if a+a always yields 0. Just as the operating system is permitted to execute this program on a single core, and always interrupt thread 1 before the same instruction of the loop, the JVM is permitted to reuse the value - after all, the observable behavior remains the same.
In general, moving the read across the write violates happens-before consistency, because some other synchronization action is "in the way". In the absence of such intermediary synchronization actions, a volatile read can be satisfied from a cache.
In my original answer, I argued against the legality of the suggested optimization. I backed this mainly from information of the JSR-133 cookbook where it states that a volatile read must not be reordered with another volatile read and where it further states that a cached read is to be treated as a reordering. The latter statement is however formulated with some ambiguouity which is why I went through the formal definition of the JMM where I did not find such indication. Therefore, I would now argue that the optimization is allowed. However, the JMM is quite complex and the discussion on this page indicates that this corner case might be decided differently by someone with a more thorough understanding of the formalism.
Denoting thread 1 to execute
and thread 2 to execute:
The two reads
r_i
and two writesw_i
ofa
are synchronization actions asa
isvolatile
(JSR 17.4.2). They are external actions as variablea
is used in several threads. These actions are contained in the set of all actionsA
. There exists a total order of all synchronization actions, the synchronization order which is consistent with program order for thread 1 and thread 2 (JSR 17.4.4). From the definition of the synchronizes-with partial order, there is no edge defined for this order in the above code. As a consequence, the happens-before order only reflects the intra-thread semantics of each thread (JSR 17.4.5).With this, we define
W
as a write-seen function whereW(r_i) = w_2
and a value-written functionV(w_i) = w_2
(JLS 17.4.6). I took some freedom and eliminatedw_1
as it makes this outline of a formal proof even simpler. The question is of this proposed executionE
is well-formed (JLS 17.5.7). The proposed executionE
obeys intra-thread semantics, is happens-before consistent, obeys the synchronized-with order and each read observes a consistent write. Checking the causality requirements is trivial (JSR 17.4.8). I do neither see why the rules for non-terminating executions would be relevant as the loop covers the entire discussed code (JLS 17.4.9) and we do not need to distinguish observable actions.For all this, I cannot find any indication of why this optimization would be forbidden. Nevertheless, it is not applied for
volatile
reads by the HotSpot VM as one can observe using-XX:+PrintAssembly
. I assume that the performance benefits are however minor and this pattern is not normally observed.Remark: After watching the Java memory model pragmatics (multiple times), I am pretty sure, this reasoning is correct.
As laid out in other answers there are two reads and two writes. Imagine the following execution (T1 and T2 denote two threads), using annotations that match the JLS statement below:
a = 0 //W(r)
read temp1 = a //r_initial
a = 1 //w
read temp2 = a //r
print temp1+temp2
In a concurrrent environment this is definitely a possible thread interleaving. Your question is then: would the JVM be allowed to make
r
observeW(r)
and read 0 instead of 1?JLS #17.4.5 states:
The optimisation you propose (
temp = a; print (2 * temp);
) would violate that requirement. So your optimisation can only work if there is no intervening write betweenr_initial
andr
, which can't be guaranteed in a typical multi threaded framework.As a side comment, note however that there is no guarantee as to how long it will take for the writes to become visible from the reading thread. See for example: Detailed semantics of volatile regarding timeliness of visibility.