Note: this question isn't related to volatile, AtomicLong, or any perceived deficiency in the described use case.
The property I am trying to prove or rule out is as follows:
Given the following:
- a recent 64-bit OpenJDK 7/8 (preferably 7, but 8 also helpful)
- a multiprocessor Intel-based system
- a non-volatile long primitive variable
- multiple unsynchronized mutator threads
- an unsynchronized observer thread
Is the observer always guaranteed to encounter intact values as written by a mutator thread, or is word tearing a danger?
JLS: Inconclusive
This property exists for 32-bit primitives and 64-bit object references, but isn't guaranteed by the JLS for longs and doubles:
17.7. Non-atomic Treatment of double and long:
For the purposes of the Java programming language memory model, a single write to a non-volatile long or double value is treated as two separate writes: one to each 32-bit half. This can result in a situation where a thread sees the first 32 bits of a 64-bit value from one write, and the second 32 bits from another write.
But hold your horses:
[...] For efficiency's sake, this behavior is implementation-specific; an implementation of the Java Virtual Machine is free to perform writes to long and double values atomically or in two parts. Implementations of the Java Virtual Machine are encouraged to avoid splitting 64-bit values where possible. [...]
So, the JLS allows JVM implementations to split 64-bit writes, and encourages developers to adjust accordingly, but also encourages JVM implementors to stick with 64-bit writes. We don't have an answer for recent versions of HotSpot yet.
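To make the failure mode concrete, here is a minimal sketch of what a torn read would look like under the two-halves model of JLS 17.7. It only does the bit arithmetic by hand and does not provoke tearing on a real JVM. The names TornValueDemo and tear are made up for illustration.

```java
// Hypothetical illustration of JLS 17.7: a 64-bit write modeled as two 32-bit writes.
class TornValueDemo {
    // Combine the high 32 bits of one write with the low 32 bits of another,
    // which is what an observer could see if the halves were written separately.
    static long tear(long highFrom, long lowFrom) {
        return (highFrom & 0xFFFFFFFF00000000L) | (lowFrom & 0x00000000FFFFFFFFL);
    }

    public static void main(String[] args) {
        long a = 0x1111111122222222L; // value written by mutator A
        long b = 0x3333333344444444L; // value written by mutator B
        long torn = tear(a, b);
        // The observer ends up with a value that neither mutator ever wrote.
        System.out.println(Long.toHexString(torn)); // prints 1111111144444444
    }
}
```

This is the outcome I am trying to rule out: a value assembled from two different writes.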
HotSpot JIT: Careful Optimism
Since word tearing is most likely to occur within the confines of tight loops and other hotspots, I've tried to analyze actual assembly output from JIT compilation. To make a long story short: further testing is needed, but I can only see atomic 64-bit operations on longs.
I used hsdis, the HotSpot disassembler plugin for OpenJDK. Having built and installed the plugin in my aging OpenJDK 7u25 build, I proceeded to write a short program:
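public class Counter {
    // Deliberately non-volatile and unsynchronized.
    static long counter = 0;

    public static void main(String[] args) {
        // Every value needs more than 32 bits; 1e5 iterations should trigger JIT.
        for (long i = (long)1e12; i < (long)1e12 + 1e5; i++)
            put(i);
        System.out.println(counter);
    }

    static void put(long v) {
        counter += v;
    }
}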
I made sure to always use values larger than Integer.MAX_VALUE (1e12 to 1e12+1e5), and to repeat the operation enough times (1e5) to trigger JIT compilation.
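As a sanity check on those bounds, the following sketch confirms that the loop values do not fit in 32 bits and computes the final counter value one would expect from a single-threaded run. RangeCheck and its members are hypothetical names. The bounds are copied from the loop above, with the upper bound written as a long rather than a double.

```java
// Hypothetical helper: checks the value range used by the Counter loop.
class RangeCheck {
    static final long START = (long) 1e12;
    static final long END = (long) 1e12 + 100000L; // exclusive, same bound as the loop

    // Upper 32 bits of a long, returned as an int.
    static int highHalf(long v) {
        return (int) (v >>> 32);
    }

    // Sum of every value the loop passes to put(), i.e. the expected final counter.
    static long expectedCounter() {
        long sum = 0;
        for (long i = START; i < END; i++) sum += i;
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(START > Integer.MAX_VALUE); // prints true
        System.out.println(highHalf(START));           // prints 232
        System.out.println(expectedCounter());         // prints 100000004999950000
    }
}
```

Since the upper half is non-zero throughout, both 32-bit halves of the operands carry data.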
After compilation, I executed Counter.main() with hsdis installed, like so:
java -XX:+UnlockDiagnosticVMOptions \
-XX:PrintAssemblyOptions=intel \
-XX:CompileCommand=print,Counter.put \
Counter
The assembly generated for Counter.put() by the JIT was as follows (decimal line numbers added for convenience):
01 # {method} 'put' '(J)V' in 'Counter'
02 ⇒ # parm0: rsi:rsi = long
03 # [sp+0x20] (sp of caller)
04 0x00007fdf61061800: sub rsp,0x18
05 0x00007fdf61061807: mov QWORD PTR [rsp+0x10],rbp ;*synchronization entry
06 ; - Counter::put@-1 (line 15)
07 0x00007fdf6106180c: movabs r10,0x7d6655660 ; {oop(a 'java/lang/Class' = 'Counter')}
08 ⇒ 0x00007fdf61061816: add QWORD PTR [r10+0x70],rsi ;*putstatic counter
09 ; - Counter::put@5 (line 15)
10 0x00007fdf6106181a: add rsp,0x10
11 0x00007fdf6106181e: pop rbp
12 0x00007fdf6106181f: test DWORD PTR [rip+0xbc297db],eax # 0x00007fdf6cc8b000
13 ; {poll_return}
The interesting lines are marked with '⇒'. As you can see, the add operation is performed on a quad-word (64 bits) in memory, using a 64-bit register (rsi) as the source operand.
I also tried to see if byte alignment is an issue by adding a byte-typed padding variable just prior to 'long counter'. The only difference in assembly output was:
Before:
0x00007fdf6106180c: movabs r10,0x7d6655660 ; {oop(a 'java/lang/Class' = 'Counter')}
After:
0x00007fdf6106180c: movabs r10,0x7d6655668 ; {oop(a 'java/lang/Class' = 'Counter')}
Both addresses are 64-bit (8-byte) aligned, and those 'movabs r10, ...' instructions load them into a 64-bit register.
So far, I've only tested addition. I assume subtraction behaves similarly.
Other operations, such as bitwise operations, assignment, multiplication, etc., remain to be tested (or confirmed by somebody sufficiently familiar with HotSpot internals).
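For plain assignment, an empirical probe along the following lines could complement the assembly inspection. This is only a sketch: TearingProbe and everything in it are hypothetical names, the volatile stop flag exists purely to shut the writers down, and the JIT is free to optimize the unsynchronized accesses. A clean run is therefore evidence for one JVM build on one machine, not proof.

```java
// Hypothetical stress probe for torn reads of a non-volatile long.
class TearingProbe {
    static long shared = pattern(1);  // non-volatile and unsynchronized on purpose
    static volatile boolean stop;     // only used to stop the writer threads

    // Build a value whose upper and lower 32-bit halves are identical.
    static long pattern(int half) {
        long h = half & 0xFFFFFFFFL;
        return (h << 32) | h;
    }

    // Every written value has matching halves, so a mismatch means a torn read.
    static boolean isIntact(long v) {
        return (int) (v >>> 32) == (int) v;
    }

    // Starts the writers, observes from the calling thread, returns the torn-read count.
    static long probe(int writers, long reads) throws InterruptedException {
        stop = false;
        Thread[] threads = new Thread[writers];
        for (int w = 0; w < writers; w++) {
            final int start = w + 1;
            threads[w] = new Thread(new Runnable() {
                public void run() {
                    int half = start;
                    while (!stop) {
                        shared = pattern(half); // plain 64-bit assignment
                        half += 0x10001;        // change all parts of the half over time
                    }
                }
            });
            threads[w].start();
        }
        long torn = 0;
        for (long i = 0; i < reads; i++) {
            if (!isIntact(shared)) torn++;
        }
        stop = true;
        for (Thread t : threads) t.join();
        return torn;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("torn reads: " + probe(4, 100000000L));
    }
}
```

Running it once normally and once with -Xint (interpreter only) would exercise both the compiled and the interpreted paths.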
Interpreter: Inconclusive
This leaves us with the non-JIT scenario. Let's disassemble Counter.class:
$ javap -c Counter
[...]
static void put(long);
Code:
0: getstatic #8 // Field counter:J
3: lload_0
4: ladd
5: putstatic #8 // Field counter:J
8: return
[...]
...and we will be interested in the 'ladd' bytecode instruction at offset 4. However, I've been unable to trace it through to its platform-specific implementation so far.
Your help appreciated!