可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I was trying to cut thread contention in my code by replacing some synchronized blocks with AtomicBoolean.

Here's an example with synchronized:

public void toggleCondition() {
    synchronized (this.mutex) {
        if (this.toggled) {
            return;
        }

        this.toggled = true;
        // do other stuff
    }
}

And the alternative with AtomicBoolean:

public void toggleCondition() {
    if (!this.condition.getAndSet(true)) {
        // do other stuff
    }
}

Taking advantage of AtomicBoolean's CAS property should be way faster than relying on synchronization so I ran a little micro-benchmark.

For 10 concurrent threads and 1000000 iterations, AtomicBoolean comes in only slightly faster than synchronized block.

Average time (per thread) spent on toggleCondition() with AtomicBoolean: 0.0338

Average time (per thread) spent on toggleCondition() with synchronized: 0.0357

I know micro-benchmarks are worth what they're worth but shouldn't the difference be higher?

回答1:

I know micro-benchmarks are worth what they're worth but shouldn't the difference be higher?

I think the problem is in your benchmark. It looks like each thread is going to toggle the condition just once. The benchmark will spend most of its time creating and destroying threads. The chance that any given thread will be toggling a condition at the same time as any other thread is toggling it will be close to zero.

An AtomicBoolean has a performance advantage over primitive locking when there is significant contention for the condition. For an uncontended condition, I'd expect to see little difference.

Change your benchmark so that each thread toggles the condition a few million times. That will guarantee lots of lock contention, and I expect you will see a performance difference.

EDIT

If the scenario you intended to test only involved one toggle per thread (and 10 threads), then it is unlikely that your application would experience contention, and therefore it is unlikely that using AtomicBoolean will make any difference.

At this point, I should ask why you are focusing your attention on this particular aspect. Have you profiled your application and determined that really you have a lock contention problem? Or are you just guessing? Have you been given the standard lecture on the evils of premature optimization yet??

回答2:

Looking at the actual implementation, I mean looking at the code is way better than some microbenchmark ( which are less than useless in Java or any other GC runtime ), I am not surprised it isn't "significantly faster". It is basically doing an implicit synchronized section.

/**
 * Atomically sets to the given value and returns the previous value.
 *
 * @param newValue the new value
 * @return the previous value
 */
public final boolean getAndSet(boolean newValue) {
    for (;;) {
        boolean current = get();
        if (compareAndSet(current, newValue))
            return current;
    }
}

/**
 * Atomically sets the value to the given updated value
 * if the current value {@code ==} the expected value.
 *
 * @param expect the expected value
 * @param update the new value
 * @return true if successful. False return indicates that
 * the actual value was not equal to the expected value.
 */
public final boolean compareAndSet(boolean expect, boolean update) {
    int e = expect ? 1 : 0;
    int u = update ? 1 : 0;
    return unsafe.compareAndSwapInt(this, valueOffset, e, u);
}

And then this from com.sun.Unsafe.java

/**
 * Atomically update Java variable to <tt>x</tt> if it is currently
 * holding <tt>expected</tt>.
 * @return <tt>true</tt> if successful
 */
public final native boolean compareAndSwapInt(Object o, long offset,
                                              int expected,
                                              int x);

there is no magic in this, resource contention is a bitch and very complex. That is why using final variables and working with immutable data is so prevalent in real concurrent languages like Erlang. All this complexity that eats CPU time is by passed, or at least shifted somewhere less complex.