The Java specification guarantees that assignments to primitive variables are always atomic (except for the long and double types).
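JLS §17.7 spells out the long/double exception: a single write to a non-volatile long or double may be treated as two separate 32-bit writes, while reads and writes of volatile long and double values are always atomic. A minimal sketch of the usual fix, using a hypothetical VolatileLongHolder class (the name is illustrative only):

```java
// Sketch, assuming JLS §17.7 semantics: a non-volatile long/double write may be
// split into two 32-bit halves; declaring the field volatile makes each single
// read and each single write atomic.
public class VolatileLongHolder {
    // Without 'volatile', another thread could in principle observe a "torn" value
    // (high half from one write, low half from another) on some JVMs.
    private volatile long value;

    public void set(long newValue) {
        value = newValue; // one atomic write, because the field is volatile
    }

    public long get() {
        return value; // one atomic read, because the field is volatile
    }

    public static void main(String[] args) {
        VolatileLongHolder holder = new VolatileLongHolder();
        holder.set(0x1234_5678_9ABC_DEF0L);
        System.out.println(Long.toHexString(holder.get())); // prints 123456789abcdef0
    }
}
```

Note that volatile only makes each individual read or write atomic; it does not make a compound operation such as value++ atomic.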
On the other hand, the Fetch-and-Add operation corresponding to the famous i++ increment would be non-atomic, because it results in a read-modify-write operation.
Assuming this code:
public void assign(int b) {
    int a = b;
}
The generated bytecode is:
public void assign(int);
  Code:
     0: iload_1
     1: istore_2
     2: return
Thus, we see that the assignment consists of two steps (loading and storing).
Assuming this code:
public void assign(int b) {
    int i = b++;
}
Bytecode:
public void assign(int);
  Code:
     0: iload_1
     1: iinc 1, 1 // extra step compared with the previous sample
     4: istore_2
     5: return
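The split is easier to see when the incremented variable is a shared field rather than a local (locals live on each thread's own stack, so other threads cannot race on them). As a sketch, assuming a hypothetical static field counter, javac typically compiles the statement counter++; to getstatic, iconst_1, iadd, putstatic, which behaves like the explicit three-step version below:

```java
public class ExplicitIncrement {
    // Hypothetical shared field, used only for illustration.
    static int counter = 0;

    // "counter++;" written out step by step. The comments show the bytecode
    // that javac typically emits for the one-line form of the statement.
    static void incrementExplicitly() {
        int tmp = counter; // 1. read   (getstatic)
        tmp = tmp + 1;     // 2. modify (iconst_1, iadd)
        counter = tmp;     // 3. write  (putstatic)
    }

    public static void main(String[] args) {
        for (int n = 0; n < 3; n++) {
            incrementExplicitly();
        }
        System.out.println(counter); // prints 3 (single thread, so nothing can interleave)
    }
}
```

Each numbered step is atomic on its own; another thread can still run between steps 1 and 3.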
Knowing that x86 processors (at least modern ones) can perform an increment atomically, as described here:
In computer science, the fetch-and-add CPU instruction is a special
instruction that atomically modifies the contents of a memory
location. It is used to implement mutual exclusion and concurrent
algorithms in multiprocessor systems, a generalization of semaphores.
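At the Java level, the fetch-and-add primitive is exposed through classes such as java.util.concurrent.atomic.AtomicInteger rather than through i++. HotSpot on x86 commonly compiles getAndAdd/getAndIncrement to a LOCK XADD instruction, but that is a JVM implementation detail, not something the specification promises. A minimal sketch of the semantics:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class FetchAndAddDemo {
    public static void main(String[] args) {
        AtomicInteger counter = new AtomicInteger(5);

        // Fetch-and-add: atomically adds 3 and returns the value *before* the add.
        int before = counter.getAndAdd(3);             // returns 5, counter is now 8

        // getAndIncrement() is the atomic counterpart of "counter++".
        int beforeIncrement = counter.getAndIncrement(); // returns 8, counter is now 9

        System.out.println(before + " " + beforeIncrement + " " + counter.get()); // prints "5 8 9"
    }
}
```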
Thus, my first question: despite the fact that the bytecode requires two steps (loading and storing), does Java rely on the assignment operation always being carried out atomically, whatever the processor architecture, so that its specification can guarantee atomicity for primitive assignments?
Second question: is it wrong to assume that, on a very modern x86 processor and without sharing compiled code across different architectures, there is no need at all to synchronize the i++ operation (or to use AtomicInteger), since it is already atomic?
Regarding the second question:
You imply that i++ will be translated into the x86 Fetch-And-Add instruction, which is not true. If the code is compiled and optimized by the JVM, it may be true (one would have to check the JVM source code to confirm that), but the same code can also run in interpreted mode, where the fetch and the add are separate and not synchronized.
Out of curiosity, I checked what assembly code is generated for this Java code:
public class Main {
    volatile int a;

    public static void main(String[] args) throws Exception {
        new Main().run();
    }

    private void run() {
        for (int i = 0; i < 1000000; i++) {
            increase();
        }
    }

    private void increase() {
        a++;
    }
}
I used Java HotSpot(TM) Server VM (17.0-b12-fastdebug) for windows-x86 JRE (1.6.0_20-ea-fastdebug-b02), built on Apr 1 2010 03:25:33 (a JVM version I happened to have on my drive).
This is the crucial output of running it (java -server -XX:+PrintAssembly -cp . Main):
At first it is compiled into this:
00c PUSHL EBP
SUB ESP,8 # Create frame
013 MOV EBX,[ECX + #8] # int ! Field VolatileMain.a
016 MEMBAR-acquire ! (empty encoding)
016 MEMBAR-release ! (empty encoding)
016 INC EBX
017 MOV [ECX + #8],EBX ! Field VolatileMain.a
01a MEMBAR-volatile (unnecessary so empty encoding)
01a LOCK ADDL [ESP + #0], 0 ! membar_volatile
01f ADD ESP,8 # Destroy frame
POPL EBP
TEST PollPage,EAX ! Poll Safepoint
029 RET
Then it is inlined and compiled into this:
0a8 B11: # B11 B12 <- B10 B11 Loop: B11-B11 inner stride: not constant post of N161 Freq: 0.999997
0a8 MOV EBX,[ESI] # int ! Field VolatileMain.a
0aa MEMBAR-acquire ! (empty encoding)
0aa MEMBAR-release ! (empty encoding)
0aa INC EDI
0ab INC EBX
0ac MOV [ESI],EBX ! Field VolatileMain.a
0ae MEMBAR-volatile (unnecessary so empty encoding)
0ae LOCK ADDL [ESP + #0], 0 ! membar_volatile
0b3 CMP EDI,#1000000
0b9 Jl,s B11 # Loop end P=0.500000 C=126282.000000
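As you can see, it does not use a Fetch-And-Add instruction for a++: the field is loaded into a register (MOV EBX,...), incremented in the register (INC EBX) and stored back (MOV ...,EBX) as separate instructions. The LOCK ADDL [ESP + #0], 0 that follows is only the memory barrier required for the volatile write, not the increment itself.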
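Because the load and the store are separate instructions, another thread can write to a in between, even though a is volatile. A sketch that makes the lost updates observable, assuming a hypothetical VolatileIncrementRace class with a few threads hammering a volatile field and an AtomicInteger side by side (the exact volatile total varies from run to run and machine to machine):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class VolatileIncrementRace {
    static volatile int volatileCounter = 0;                         // visible, but ++ is not atomic
    static final AtomicInteger atomicCounter = new AtomicInteger(); // atomic read-modify-write

    static final int THREADS = 4;
    static final int INCREMENTS = 100_000;

    public static void main(String[] args) throws InterruptedException {
        Thread[] workers = new Thread[THREADS];
        for (int t = 0; t < THREADS; t++) {
            workers[t] = new Thread(() -> {
                for (int n = 0; n < INCREMENTS; n++) {
                    volatileCounter++;               // load, add, store: updates can be lost
                    atomicCounter.incrementAndGet(); // single atomic fetch-and-add
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join(); // wait for every worker before reading the totals
        }
        int expected = THREADS * INCREMENTS;
        System.out.println("expected = " + expected);
        System.out.println("volatile = " + volatileCounter + " (often less than expected)");
        System.out.println("atomic   = " + atomicCounter.get() + " (always equal to expected)");
    }
}
```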
Even if i++ were translated into an x86 Fetch-And-Add instruction, it would change nothing, because the memory referred to by the Fetch-And-Add instruction is the CPU's own local memory (its registers) and not the general memory of the device/application. On a modern CPU, this property extends to the CPU's local caches, and can even extend to the caches used by the different cores of a multicore CPU, but in a multithreaded application there is absolutely no guarantee that it will extend to the copy of the memory used by the threads themselves.
In short, in a multithreaded application, if a variable can be modified by different threads running at the same time, you must use a synchronization mechanism provided by the platform; you cannot rely on i++ being atomic just because it occupies a single line of Java code.
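Three standard-library mechanisms that do make a shared increment safe are sketched below, using a hypothetical SafeCounters class: an intrinsic lock (synchronized), a lock-free AtomicInteger, and a LongAdder. Which one fits best depends on contention and on how often the running total is read; the comments describe typical trade-offs, not guarantees.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.LongAdder;

public class SafeCounters {
    // Option 1: intrinsic lock; the whole read-modify-write runs while holding the monitor.
    private int lockedCount = 0;

    public synchronized void incrementLocked() {
        lockedCount++;
    }

    public synchronized int getLocked() {
        return lockedCount;
    }

    // Option 2: lock-free atomic read-modify-write.
    private final AtomicInteger atomicCount = new AtomicInteger();

    public void incrementAtomic() {
        atomicCount.incrementAndGet();
    }

    public int getAtomic() {
        return atomicCount.get();
    }

    // Option 3: striped counter; tends to scale better under heavy contention
    // when the total is read rarely.
    private final LongAdder adderCount = new LongAdder();

    public void incrementAdder() {
        adderCount.increment();
    }

    public long getAdder() {
        return adderCount.sum();
    }

    public static void main(String[] args) throws InterruptedException {
        SafeCounters counters = new SafeCounters();
        Runnable work = () -> {
            for (int n = 0; n < 10_000; n++) {
                counters.incrementLocked();
                counters.incrementAtomic();
                counters.incrementAdder();
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        a.join();
        b.join();
        // After both threads have finished, all three totals are 20000.
        System.out.println(counters.getLocked() + " " + counters.getAtomic() + " " + counters.getAdder());
    }
}
```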
Regarding your first question: each individual read and each individual write is atomic, but the read/write sequence as a whole is not. I could not find a specific reference on primitives, but JLS §17.7 says something similar regarding references:
Writes to and reads of references are always atomic, regardless of whether they are implemented as 32-bit or 64-bit values.
So in your case, both the iload and istore are atomic, but the whole (iload, istore) operation is not.
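The difference between "each step is atomic" and "the sequence is atomic" can be shown deterministically on a single thread by writing out one bad interleaving by hand. In this sketch (hypothetical LostUpdateInterleaving class), regA and regB stand in for the registers of two threads that both execute shared++:

```java
public class LostUpdateInterleaving {
    // Replays one possible interleaving of two "shared++" operations step by step.
    static int interleavedIncrements() {
        int shared = 0;

        int regA = shared; // thread A: read  -> 0
        int regB = shared; // thread B: read  -> 0 (A has not written yet)
        regA = regA + 1;   // thread A: add   -> 1
        regB = regB + 1;   // thread B: add   -> 1
        shared = regA;     // thread A: write -> shared == 1
        shared = regB;     // thread B: write -> shared == 1, A's increment is lost

        return shared;
    }

    public static void main(String[] args) {
        System.out.println(interleavedIncrements()); // prints 1, not 2
    }
}
```

Every read and every write above is atomic, yet one of the two increments disappears.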
Is it wrong to [consider that] there's no need at all to synchronize the i++ operation?
Regarding your second question, the code below prints 982 on my x86 machine (and not 1,000), which shows that some increments got lost: you need to properly synchronize a ++ operation, even on a processor architecture that supports a fetch-and-add instruction.
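import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class Test1 {
    private static int i = 0; // plain, unsynchronized shared counter

    public static void main(String args[]) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(10);
        final CountDownLatch start = new CountDownLatch(1);
        final Set<Integer> set = new ConcurrentSkipListSet<>();
        Runnable r = new Runnable() {
            @Override
            public void run() {
                try {
                    start.await(); // release all tasks at the same moment
                } catch (InterruptedException ignore) {}
                for (int j = 0; j < 100; j++) {
                    set.add(i++); // racy read-modify-write: duplicate values collapse in the set
                }
            }
        };
        for (int j = 0; j < 10; j++) {
            executor.submit(r);
        }
        start.countDown();
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.SECONDS);
        System.out.println(set.size()); // 1000 only if no increment was lost
    }
}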
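For comparison, a sketch of the same test with the counter made atomic. It assumes a hypothetical Test1Atomic class, replaces the plain static int with an AtomicInteger, and raises the termination timeout to 10 seconds to leave a comfortable margin. Because getAndIncrement() hands every caller a distinct value, the set should always end up with exactly 1000 entries:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class Test1Atomic {
    // AtomicInteger replaces the plain static int; getAndIncrement() is an atomic i++.
    private static final AtomicInteger i = new AtomicInteger();

    static int distinctValues() throws InterruptedException {
        i.set(0);
        ExecutorService executor = Executors.newFixedThreadPool(10);
        CountDownLatch start = new CountDownLatch(1);
        Set<Integer> set = new ConcurrentSkipListSet<>();
        Runnable r = () -> {
            try {
                start.await(); // release all tasks at the same moment
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            for (int j = 0; j < 100; j++) {
                set.add(i.getAndIncrement()); // each call returns a unique value
            }
        };
        for (int j = 0; j < 10; j++) {
            executor.submit(r);
        }
        start.countDown();
        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.SECONDS);
        return set.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(distinctValues()); // prints 1000
    }
}
```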