Relation between bytecode instructions and process

Posted 2020-05-27 04:24

The Java specification guarantees that assignments to primitive variables are always atomic (except for the long and double types).

In contrast, the fetch-and-add operation behind the well-known i++ increment is not atomic, because it is a read-modify-write operation.

Assuming this code:

public void assign(int b) {
    int a = b;
}

The generated bytecode is:

public void assign(int);
    Code:
       0: iload_1       
       1: istore_2      
       2: return 

Thus, we see the assignment is composed of two steps (loading and storing).

Assuming this code:

public void assign(int b) {
        int i = b++;
}

Bytecode:

public void assign(int);
    Code:
       0: iload_1       
       1: iinc          1, 1    // extra step compared with the previous sample
       4: istore_2      
       5: return 

Modern x86 processors, at least, can perform the increment operation atomically, as described here:

In computer science, the fetch-and-add CPU instruction is a special instruction that atomically modifies the contents of a memory location. It is used to implement mutual exclusion and concurrent algorithms in multiprocessor systems, a generalization of semaphores.

First question: although the bytecode requires two steps (a load and a store), does Java rely on assignment always being carried out atomically whatever the processor architecture, so that its specification can guarantee atomicity for primitive assignments?

Second question: is it wrong to say that, on a very modern x86 processor and without sharing compiled code across different architectures, there is no need to synchronize the i++ operation (or to use AtomicInteger), since it is already atomic?

3 answers
Melony?
#2 · 2020-05-27 04:31

Regarding your first question: the read and the write are each atomic, but the combined read-then-write operation is not. I could not find a specific reference for primitives, but JLS §17.7 says something similar about references:

Writes to and reads of references are always atomic, regardless of whether they are implemented as 32-bit or 64-bit values.

So in your case, both the iload and istore are atomic, but the whole (iload, istore) operation is not.

Is it wrong to [consider that] there's no need at all to synchronize the i++ operation?

Regarding your second question: the code below prints 982 on my x86 machine (not 1,000), which shows that some increments were lost. You therefore need to synchronize a ++ operation properly, even on a processor architecture that supports a fetch-and-add instruction.

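import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class Test1 {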

    private static int i = 0;

    public static void main(String args[]) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(10);
        final CountDownLatch start = new CountDownLatch(1);
        final Set<Integer> set = new ConcurrentSkipListSet<>();
        Runnable r = new Runnable() {
            @Override
            public void run() {
                try {
                    start.await();
                } catch (InterruptedException ignore) {}
                for (int j = 0; j < 100; j++) {
                    set.add(i++);
                }
            }
        };

        for (int j = 0; j < 10; j++) {
            executor.submit(r);
        }
        start.countDown();
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.SECONDS);
        System.out.println(set.size());
    }
}
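
For comparison, here is a minimal sketch of the same experiment with the shared counter replaced by java.util.concurrent.atomic.AtomicInteger. The class name AtomicTest1 and the helper countDistinct are made up for illustration and are not part of the original answer. Because getAndIncrement hands out each value exactly once, the set should end up with all 1,000 distinct values.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical variant of Test1: only the counter type changes.
class AtomicTest1 {

    // Runs 10 workers that each take 100 values from a shared atomic counter
    // and returns how many distinct values were observed.
    static int countDistinct() {
        final AtomicInteger i = new AtomicInteger();
        final Set<Integer> set = new ConcurrentSkipListSet<>();
        final CountDownLatch start = new CountDownLatch(1);
        ExecutorService executor = Executors.newFixedThreadPool(10);

        Runnable r = () -> {
            try {
                start.await(); // release all workers at the same moment
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            for (int j = 0; j < 100; j++) {
                set.add(i.getAndIncrement()); // atomic read-modify-write
            }
        };

        for (int j = 0; j < 10; j++) {
            executor.submit(r);
        }
        start.countDown();
        executor.shutdown();
        try {
            if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(countDistinct()); // prints 1000
    }
}
```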
贼婆χ
#3 · 2020-05-27 04:32

Even if i++ were translated into an x86 fetch-and-add instruction, it would change nothing, because the memory mentioned in the fetch-and-add instruction refers to the local memory registers of the CPU and not to the general memory of the device or application. On a modern CPU this property extends to the CPU's local memory caches, and can even extend to the caches used by the different cores of a multicore CPU; but in a multithreaded application there is absolutely no guarantee that it extends to the copy of the memory used by the threads themselves.

Put plainly: in a multithreaded application, if a variable can be modified by different threads running at the same time, you must use a synchronization mechanism provided by the system. You cannot rely on i++ being atomic just because it occupies a single line of Java code.
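
One such mechanism is Java's built-in synchronized keyword. The sketch below is only an illustration: the class SynchronizedCounter and its runDemo helper are hypothetical names, not something from this thread. It guards the increment with the object's monitor so that the load, add, and store happen as one unit with respect to other threads.

```java
// Hypothetical counter whose increment is guarded by the object's monitor.
class SynchronizedCounter {
    private int value;

    // Only one thread at a time can run the read-modify-write below.
    synchronized void increment() {
        value++;
    }

    synchronized int get() {
        return value;
    }

    // Starts `threads` workers that each increment `perThread` times
    // and returns the final count.
    static int runDemo(int threads, int perThread) {
        SynchronizedCounter counter = new SynchronizedCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    counter.increment();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // wait for every worker before reading the result
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runDemo(4, 10_000)); // prints 40000
    }
}
```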

ら.Afraid
#4 · 2020-05-27 04:39

Regarding the second question.

You imply that i++ will translate into the x86 fetch-and-add instruction, which is not true. If the code is compiled and optimized by the JVM it may be (I would have to check the JVM source code to confirm that), but the code can also run in interpreted mode, where the fetch and the add are separate and not synchronized.

Out of curiosity I checked what assembly code is generated for this Java code:

public class Main {
    volatile int a;

  static public final void main (String[] args) throws Exception {
    new Main ().run ();
  }

  private void run () {
      for (int i = 0; i < 1000000; i++) {
        increase ();
      }  
  } 

  private void increase () {
    a++;
  }
}

I used the Java HotSpot(TM) Server VM (17.0-b12-fastdebug) for windows-x86 JRE (1.6.0_20-ea-fastdebug-b02), built on Apr 1 2010 03:25:33 (the one I had somewhere on my drive).

This is the crucial part of the output from running it (java -server -XX:+PrintAssembly -cp . Main):

At first it is compiled into this:

00c     PUSHL  EBP
    SUB    ESP,8    # Create frame
013     MOV    EBX,[ECX + #8]   # int ! Field  VolatileMain.a
016     MEMBAR-acquire ! (empty encoding)
016     MEMBAR-release ! (empty encoding)
016     INC    EBX
017     MOV    [ECX + #8],EBX ! Field  VolatileMain.a
01a     MEMBAR-volatile (unnecessary so empty encoding)
01a     LOCK ADDL [ESP + #0], 0 ! membar_volatile
01f     ADD    ESP,8    # Destroy frame
    POPL   EBP
    TEST   PollPage,EAX ! Poll Safepoint

029     RET

Then it is inlined and compiled into this:

0a8   B11: #    B11 B12 <- B10 B11   Loop: B11-B11 inner stride: not constant post of N161 Freq: 0.999997
0a8     MOV    EBX,[ESI]    # int ! Field  VolatileMain.a
0aa     MEMBAR-acquire ! (empty encoding)
0aa     MEMBAR-release ! (empty encoding)
0aa     INC    EDI
0ab     INC    EBX
0ac     MOV    [ESI],EBX ! Field  VolatileMain.a
0ae     MEMBAR-volatile (unnecessary so empty encoding)
0ae     LOCK ADDL [ESP + #0], 0 ! membar_volatile
0b3     CMP    EDI,#1000000
0b9     Jl,s  B11   # Loop end  P=0.500000 C=126282.000000

As you can see, it does not use a fetch-and-add instruction for a++.
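
The MOV / INC / MOV sequence in the listings is a plain load, increment, and store, so another thread can write between the load and the store. As a rough sketch of how those same three steps can be made safe, the code below retries them with AtomicInteger.compareAndSet until no other thread has interfered. The class CasIncrement and its methods are hypothetical names used for illustration. What machine instructions a particular JVM emits for these calls is not shown here and would need to be checked in the same way as above.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: an increment built from an explicit load / add / store,
// where the store only succeeds if the value has not changed since the load.
class CasIncrement {

    static int incrementWithCas(AtomicInteger a) {
        while (true) {
            int current = a.get();        // load
            int next = current + 1;       // increment
            if (a.compareAndSet(current, next)) { // conditional store
                return next;
            }
            // Another thread changed the value in between: retry.
        }
    }

    // Starts `threads` workers that each increment `perThread` times
    // and returns the final value.
    static int runDemo(int threads, int perThread) {
        AtomicInteger a = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    incrementWithCas(a);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return a.get();
    }

    public static void main(String[] args) {
        System.out.println(runDemo(4, 100_000)); // prints 400000
    }
}
```

In everyday code AtomicInteger.incrementAndGet does the same job in one call; the explicit loop is spelled out here only to mirror the three steps in the assembly.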
