Performance of Interlocked.Increment

Posted 2020-02-09 06:15

Question:

Is Interlocked.Increment(ref x) faster or slower than x++ for ints and longs on various platforms?

Answer 1:

It is slower since it forces the action to occur atomically and it acts as a memory barrier, eliminating the processor's ability to re-order memory accesses around the instruction.

You should be using Interlocked.Increment when you want the action to be atomic on state that can be shared between threads - it's not intended to be a full replacement for x++.
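To illustrate the difference in behavior (not speed), here is a minimal sketch that increments two shared counters from several threads, one with ++ and one with Interlocked.Increment. The class name IncrementDemo, the method Run, and the worker counts are hypothetical choices made for this example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class IncrementDemo
{
    // Runs `workers` parallel loops that each increment two shared counters
    // `perWorker` times: one with ++ and one with Interlocked.Increment.
    public static (int Plain, int Atomic) Run(int workers, int perWorker)
    {
        int plain = 0;
        int atomic = 0;

        Parallel.For(0, workers, _ =>
        {
            for (int i = 0; i < perWorker; i++)
            {
                plain++;                            // read-modify-write, not atomic: updates can be lost
                Interlocked.Increment(ref atomic);  // atomic: no update is lost
            }
        });

        return (plain, atomic);
    }

    public static void Main()
    {
        var (plain, atomic) = Run(workers: 8, perWorker: 100_000);
        Console.WriteLine($"x++:                   {plain}");
        Console.WriteLine($"Interlocked.Increment: {atomic}");
    }
}
```

The interlocked counter always ends at workers * perWorker. The plain counter can end lower when threads overlap, although any single run may happen to lose nothing.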



Answer 2:

In our experience, InterlockedIncrement() and related functions on Windows have a significant performance cost. In one sample case we were able to eliminate the interlock and use ++/-- instead, which alone reduced run time from 140 seconds to 110 seconds. My analysis is that the interlock forces a memory round trip (otherwise how could other cores see it?). An L1 cache read/write takes around 10 clock cycles, but a memory read/write takes more like 100.

In this sample case, I estimated the number of increment/decrement operations at about 1 billion. On a 2 GHz CPU, that is roughly 5 seconds for the ++/-- and 50 seconds for the interlock. Spread the difference across several threads, and it's close to 30 seconds.
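The back-of-the-envelope estimate above can be reproduced with a small sketch. The cycle counts (10 and 100), the operation count, and the clock speed are the answer's own rough figures. The class name CostEstimate and the method Seconds are hypothetical.

```csharp
using System;

public static class CostEstimate
{
    // Seconds needed for `operations` operations at `cyclesPerOp` cycles each
    // on a core running at `clockHz` cycles per second.
    public static double Seconds(double operations, double cyclesPerOp, double clockHz)
        => operations * cyclesPerOp / clockHz;

    public static void Main()
    {
        const double ops = 1e9;    // about 1 billion increments/decrements (the answer's estimate)
        const double clock = 2e9;  // 2 GHz

        Console.WriteLine(Seconds(ops, 10, clock));   // cost at roughly 10 cycles per operation
        Console.WriteLine(Seconds(ops, 100, clock));  // cost at roughly 100 cycles per operation
    }
}
```

This only restates the arithmetic. It says nothing about what a particular CPU actually spends on an interlocked operation.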



Answer 3:

Think about it for a moment, and you'll realize an Increment call cannot be any faster than a simple application of the increment operator. If it were, then the compiler's implementation of the increment operator would call Increment internally, and they'd perform the same.

But, as you can see by testing it for yourself, they don't perform the same.

The two options have different purposes. Use the increment operator generally. Use Increment when you need the operation to be atomic and you're sure all other users of that variable are also using interlocked operations. (If they're not all cooperating, then it doesn't really help.)



Answer 4:

It's slower. However, it's the most performant general way I know of for achieving thread safety on scalar variables.



Answer 5:

It will always be slower because it has to perform a CPU bus lock rather than just updating a register. However, modern CPUs achieve near-register performance, so the cost is negligible even in real-time processing.



Answer 6:

My performance test (the figures appear to be iterations completed in the 5-second window, so higher is better):

volatile: 65,174,400

lock: 62,428,600

interlocked: 113,248,900

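// Requires: using System; using System.Threading;
// The Do helper is not shown in this answer.
TimeSpan span = TimeSpan.FromSeconds(5);

object syncRoot = new object();
long test = long.MinValue;

// Volatile read, increment, volatile write. The three steps together are
// not atomic: another thread could write between the read and the write.
Do(span, "volatile", () =>
{
    long r = Thread.VolatileRead(ref test);
    r++;
    Thread.VolatileWrite(ref test, r);
});

// Increment under a monitor lock.
Do(span, "lock", () =>
{
    lock (syncRoot)
    {
        test++;
    }
});

// Single atomic increment.
Do(span, "interlocked", () =>
{
    Interlocked.Increment(ref test);
});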
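
The answer does not include its Do helper. The following is a hypothetical stand-in, assuming Do simply runs the action on the calling thread until the time span has elapsed and reports how many iterations completed. The class name Bench and the returned count are assumptions added for this sketch.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class Bench
{
    // Hypothetical stand-in for the answer's Do helper: runs `action` repeatedly
    // on the calling thread until `span` has elapsed, then prints and returns
    // the number of completed iterations.
    public static long Do(TimeSpan span, string name, Action action)
    {
        long iterations = 0;
        var watch = Stopwatch.StartNew();
        while (watch.Elapsed < span)
        {
            action();
            iterations++;
        }
        Console.WriteLine($"{name}: {iterations:N0}");
        return iterations;
    }

    public static void Main()
    {
        long test = long.MinValue;
        Do(TimeSpan.FromMilliseconds(200), "interlocked",
           () => Interlocked.Increment(ref test));
    }
}
```

Because this sketch checks the stopwatch on every iteration, the loop overhead is part of each measurement, so the counts are only meaningful relative to one another. It is also single-threaded, so it measures uncontended cost only.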