What is the synchronization cost of calling a sync

2019-02-08 06:33发布

站内文章 / Java

20 0

祖国的老花朵

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

Is there any difference in performance between this

synchronized void x() {
    y();
}

synchronized void y() {
}

and this

synchronized void x() {
    y();
}

void y() {
}

回答1:

Yes, there is an additional performance cost, unless and until the JVM inlines the call to y(), which a modern JIT compiler will do in fairly short order. First, consider the case you've presented in which y() is visible outside the class. In this case, the JVM must check on entering y() to ensure that it can enter the monitor on the object; this check will always succeed when the call is coming from x(), but it can't be skipped, because the call could be coming from a client outside the class. This additional check incurs a small cost.

Additionally, consider the case in which y() is private. In this case, the compiler still does not optimize away the synchronization; see the following disassembly of an empty y():

private synchronized void y();
  flags: ACC_PRIVATE, ACC_SYNCHRONIZED
  Code:
    stack=0, locals=1, args_size=1
       0: return

According to the spec's definition of synchronized, each entrance into a synchronized block or method performs lock action on the object, and leaving performs an unlock action. No other thread can acquire that object's monitor until the lock counter goes down to zero. Presumably some sort of static analysis could demonstrate that a private synchronized method is only ever called from within other synchronized methods, but Java's multi-source-file support would make that fragile at best, even ignoring reflection. This means that the JVM must still increment the counter on entering y():

Monitor entry on invocation of a synchronized method, and monitor exit on its return, are handled implicitly by the Java Virtual Machine's method invocation and return instructions, as if monitorenter and monitorexit were used.

@AmolSonawane correctly notes that the JVM may optimize this code at runtime by performing lock coarsening, essentially inlining the y() method. In this case, after the JVM has decided to perform a JIT optimization, calls from x() to y() will not incur any additional performance overhead, but of course calls directly to y() from any other location will still need to acquire the monitor separately.

回答2:

Results of a micro benchmark run with jmh

Benchmark                      Mean     Mean error    Units
c.a.p.SO18996783.syncOnce      21.003        0.091  nsec/op
c.a.p.SO18996783.syncTwice     20.937        0.108  nsec/op

=> no statistical difference.

Looking at the generated assembly shows that lock coarsening has been performed and y_sync has been inlined in x_sync although it is synchronized.

Full results:

Benchmarks: 
# Running: com.assylias.performance.SO18996783.syncOnce
Iteration   1 (5000ms in 1 thread): 21.049 nsec/op
Iteration   2 (5000ms in 1 thread): 21.052 nsec/op
Iteration   3 (5000ms in 1 thread): 20.959 nsec/op
Iteration   4 (5000ms in 1 thread): 20.977 nsec/op
Iteration   5 (5000ms in 1 thread): 20.977 nsec/op

Run result "syncOnce": 21.003 ±(95%) 0.055 ±(99%) 0.091 nsec/op
Run statistics "syncOnce": min = 20.959, avg = 21.003, max = 21.052, stdev = 0.044
Run confidence intervals "syncOnce": 95% [20.948, 21.058], 99% [20.912, 21.094]

Benchmarks: 
com.assylias.performance.SO18996783.syncTwice
Iteration   1 (5000ms in 1 thread): 21.006 nsec/op
Iteration   2 (5000ms in 1 thread): 20.954 nsec/op
Iteration   3 (5000ms in 1 thread): 20.953 nsec/op
Iteration   4 (5000ms in 1 thread): 20.869 nsec/op
Iteration   5 (5000ms in 1 thread): 20.903 nsec/op

Run result "syncTwice": 20.937 ±(95%) 0.065 ±(99%) 0.108 nsec/op
Run statistics "syncTwice": min = 20.869, avg = 20.937, max = 21.006, stdev = 0.052
Run confidence intervals "syncTwice": 95% [20.872, 21.002], 99% [20.829, 21.045]

回答3:

Why not test it!? I ran a quick benchmark. The benchmark() method is called in a loop for warm-up. This may not be super accurate but it does show some consistent interesting pattern.

public class Test {
    public static void main(String[] args) {

        for (int i = 0; i < 100; i++) {
            System.out.println("+++++++++");
            benchMark();
        }
    }

    static void benchMark() {
        Test t = new Test();
        long start = System.nanoTime();
        for (int i = 0; i < 100; i++) {
            t.x();
        }
        System.out.println("Double sync:" + (System.nanoTime() - start) / 1e6);

        start = System.nanoTime();
        for (int i = 0; i < 100; i++) {
            t.x1();
        }
        System.out.println("Single sync:" + (System.nanoTime() - start) / 1e6);
    }
    synchronized void x() {
        y();
    }
    synchronized void y() {
    }
    synchronized void x1() {
        y1();
    }
    void y1() {
    }
}

Results (last 10)

+++++++++
Double sync:0.021686
Single sync:0.017861
+++++++++
Double sync:0.021447
Single sync:0.017929
+++++++++
Double sync:0.021608
Single sync:0.016563
+++++++++
Double sync:0.022007
Single sync:0.017681
+++++++++
Double sync:0.021454
Single sync:0.017684
+++++++++
Double sync:0.020821
Single sync:0.017776
+++++++++
Double sync:0.021107
Single sync:0.017662
+++++++++
Double sync:0.020832
Single sync:0.017982
+++++++++
Double sync:0.021001
Single sync:0.017615
+++++++++
Double sync:0.042347
Single sync:0.023859

Looks like the second variation is indeed slightly faster.

回答4:

Test can be found below ( You have to guess what some methods do but nothing complicated ) :

It tests them with 100 threads each and starts counting the averages after 70% of them has completed ( as warmup ).

It prints it out once at the end.

public static final class Test {
        final int                      iterations     =     100;
        final int                      jiterations    = 1000000;
        final int                      count          = (int) (0.7 * iterations);
        final AtomicInteger            finishedSingle = new AtomicInteger(iterations);
        final AtomicInteger            finishedZynced = new AtomicInteger(iterations);
        final MovingAverage.Cumulative singleCum      = new MovingAverage.Cumulative();
        final MovingAverage.Cumulative zyncedCum      = new MovingAverage.Cumulative();
        final MovingAverage            singleConv     = new MovingAverage.Converging(0.5);
        final MovingAverage            zyncedConv     = new MovingAverage.Converging(0.5);

        // -----------------------------------------------------------
        // -----------------------------------------------------------
        public static void main(String[] args) {
                final Test test = new Test();

                for (int i = 0; i < test.iterations; i++) {
                        test.benchmark(i);
                }

                Threads.sleep(1000000);
        }
        // -----------------------------------------------------------
        // -----------------------------------------------------------

        void benchmark(int i) {

                Threads.async(()->{
                        long start = System.nanoTime();

                        for (int j = 0; j < jiterations; j++) {
                                a();
                        }

                        long elapsed = System.nanoTime() - start;
                        int v = this.finishedSingle.decrementAndGet();
                        if ( v <= count ) {
                                singleCum.add (elapsed);
                                singleConv.add(elapsed);
                        }

                        if ( v == 0 ) {
                                System.out.println(elapsed);
                                System.out.println("Single Cum:\t\t" + singleCum.val());
                                System.out.println("Single Conv:\t" + singleConv.val());
                                System.out.println();

                        }
                });

                Threads.async(()->{

                        long start = System.nanoTime();
                        for (int j = 0; j < jiterations; j++) {
                                az();
                        }

                        long elapsed = System.nanoTime() - start;

                        int v = this.finishedZynced.decrementAndGet();
                        if ( v <= count ) {
                                zyncedCum.add(elapsed);
                                zyncedConv.add(elapsed);
                        }

                        if ( v == 0 ) {
                                // Just to avoid the output not overlapping with the one above 
                                Threads.sleep(500);
                                System.out.println();
                                System.out.println("Zynced Cum: \t"  + zyncedCum.val());
                                System.out.println("Zynced Conv:\t" + zyncedConv.val());
                                System.out.println();
                        }
                });

        }                       

        synchronized void a() { b();  }
                     void b() { c();  }
                     void c() { d();  }
                     void d() { e();  }
                     void e() { f();  }
                     void f() { g();  }
                     void g() { h();  }
                     void h() { i();  }
                     void i() { }

        synchronized void az() { bz(); }
        synchronized void bz() { cz(); }
        synchronized void cz() { dz(); }
        synchronized void dz() { ez(); }
        synchronized void ez() { fz(); }
        synchronized void fz() { gz(); }
        synchronized void gz() { hz(); }
        synchronized void hz() { iz(); }
        synchronized void iz() {}
}

MovingAverage.Cumulative add is basically ( performed atomically ): average = (average * (n) + number) / (++n);

MovingAverage.Converging you can look up but uses another formula.

The results after a 50 second warmup:

With: jiterations -> 1000000

Zynced Cum:     3.2017985649516254E11
Zynced Conv:    8.11945143126507E10

Single Cum:     4.747368153507841E11
Single Conv:    8.277793176290959E10

That's nano seconds averages. That's really nothing and even shows that the zynced one takes less time.

With: jiterations -> original * 10 (takes much longer time)

Zynced Cum:     7.462005651190714E11
Zynced Conv:    9.03751742946726E11

Single Cum:     9.088230941676143E11
Single Conv:    9.09877020004914E11

As you can see the results show it's really not a big difference. The zynced one actually has lower average time for the last 30% completions.

With one thread each (iterations = 1) and jiterations = original * 100;

Zynced Cum:     6.9167088486E10
Zynced Conv:    6.9167088486E10

Single Cum:     6.9814404337E10
Single Conv:    6.9814404337E10

In a same thread environment ( removing Threads.async calls )

With: jiterations -> original * 10

Single Cum:     2.940499529542545E8
Single Conv:    5.0342450600964054E7


Zynced Cum:     1.1930525617915475E9
Zynced Conv:    6.672312498662484E8

The zynced one here seems to be slower. On an order of ~10. The reason for this could be due to the zynced one running after each time, who knows. No energy to try the reverse.

Last test run with:

public static final class Test {
        final int                      iterations     =     100;
        final int                      jiterations    = 10000000;
        final int                      count          = (int) (0.7 * iterations);
        final AtomicInteger            finishedSingle = new AtomicInteger(iterations);
        final AtomicInteger            finishedZynced = new AtomicInteger(iterations);
        final MovingAverage.Cumulative singleCum      = new MovingAverage.Cumulative();
        final MovingAverage.Cumulative zyncedCum      = new MovingAverage.Cumulative();
        final MovingAverage            singleConv     = new MovingAverage.Converging(0.5);
        final MovingAverage            zyncedConv     = new MovingAverage.Converging(0.5);

        // -----------------------------------------------------------
        // -----------------------------------------------------------
        public static void main(String[] args) {
                final Test test = new Test();

                for (int i = 0; i < test.iterations; i++) {
                        test.benchmark(i);
                }

                Threads.sleep(1000000);
        }
        // -----------------------------------------------------------
        // -----------------------------------------------------------

        void benchmark(int i) {

                        long start = System.nanoTime();

                        for (int j = 0; j < jiterations; j++) {
                                a();
                        }

                        long elapsed = System.nanoTime() - start;
                        int s = this.finishedSingle.decrementAndGet();
                        if ( s <= count ) {
                                singleCum.add (elapsed);
                                singleConv.add(elapsed);
                        }

                        if ( s == 0 ) {
                                System.out.println(elapsed);
                                System.out.println("Single Cum:\t\t" + singleCum.val());
                                System.out.println("Single Conv:\t" + singleConv.val());
                                System.out.println();

                        }


                        long zstart = System.nanoTime();
                        for (int j = 0; j < jiterations; j++) {
                                az();
                        }

                        long elapzed = System.nanoTime() - zstart;

                        int z = this.finishedZynced.decrementAndGet();
                        if ( z <= count ) {
                                zyncedCum.add(elapzed);
                                zyncedConv.add(elapzed);
                        }

                        if ( z == 0 ) {
                                // Just to avoid the output not overlapping with the one above 
                                Threads.sleep(500);
                                System.out.println();
                                System.out.println("Zynced Cum: \t"  + zyncedCum.val());
                                System.out.println("Zynced Conv:\t" + zyncedConv.val());
                                System.out.println();
                        }

        }                       

        synchronized void a() { b();  }
                     void b() { c();  }
                     void c() { d();  }
                     void d() { e();  }
                     void e() { f();  }
                     void f() { g();  }
                     void g() { h();  }
                     void h() { i();  }
                     void i() { }

        synchronized void az() { bz(); }
        synchronized void bz() { cz(); }
        synchronized void cz() { dz(); }
        synchronized void dz() { ez(); }
        synchronized void ez() { fz(); }
        synchronized void fz() { gz(); }
        synchronized void gz() { hz(); }
        synchronized void hz() { iz(); }
        synchronized void iz() {}
}

Conclusion, there really is no difference.

回答5:

In the case where both methods are synchronized, you would be locking monitor twice. So first approach would have additional overhead of lock again. But your JVM can reduce the cost of locking by lock coarsening and may in-line call to y().

回答6:

No difference will be there. Since threads content only to acquire lock at x(). Thread that acquired lock at x() can acquire lock at y() without any contention(Because that is only thread that can reach that point at one particular time). So placing synchronized over there has no effect.