Is it a sensible optimization to check whether a variable holds a specific value before writing that value?

Posted 2020-02-27 04:19

if (var != X)
  var = X;

Is it sensible or not? Will the compiler always optimize-out the if statement? Are there any use cases that would benefit from the if statement?

What if var is a volatile variable?

I'm interested in both C++ and Java answers, as volatile variables have different semantics in the two languages. Also, Java's JIT compilation can make a difference.

The if statement introduces branching and an additional read that wouldn't happen if we always overwrote var with X, so it's bad. On the other hand, if var == X then with this optimization we perform only a read and no write, which could have some effect on the cache. Clearly, there are trade-offs here. I'd like to know how this looks in practice. Has anyone done any testing on this?

EDIT:

I'm mostly interested in how this looks in a multi-processor environment. In a trivial situation there doesn't seem to be much sense in checking the variable first. But when cache coherency has to be kept between processors/cores, the extra check might actually be beneficial. I just wonder how big an impact it can have. Also, shouldn't the processor do such an optimization itself? If var == X, assigning the value X once more should not 'dirty' the cache. But can we rely on this?

8 answers
smile是对你的礼貌
#2 · 2020-02-27 04:22

In general the answer is no. If you have a simple data type, the compiler is able to perform any necessary optimizations. And in the case of types with a heavy operator=, it is the responsibility of operator= to choose the optimal way to assign the new value.

三岁会撩人
#3 · 2020-02-27 04:24

In C++, assigning a SIMPLE variable (that is, a normal integer or float variable) is definitely and always faster than checking whether it already has that value and then setting it only if it doesn't. I would be very surprised if this weren't true in Java too, but I don't know how complicated or simple things are in Java - I've written a few hundred lines of it, and haven't actually studied how the bytecode and the JITed code work.

Clearly, if the variable is very easy to check but complicated to set, which could be the case for classes and other such things, then there may be value in checking first. The typical case where you'd find this is code where the "value" is some sort of index or hash, and if it's not a match, a whole lot of work is required. One example would be a task switch:

if (current_process != new_process_to_run)
     current_process = new_process_to_run;

Because here, a "process" is a complex object to alter, but the != can be done on the ID of the process.

Whether the object is simple or complex, the compiler will almost certainly not understand what you are trying to do here, so it will probably not optimize it away - but compilers are more clever than you think SOMETIMES, and more stupid at other times, so I wouldn't bet either way.

volatile should always force the compiler to read and write the variable, whether it "thinks" that is necessary or not, so it will definitely READ the variable and WRITE the variable. Of course, if the variable is volatile it probably means that it can change behind your back or represents some hardware, so you should be EXTRA careful with how you treat it yourself too... An extra read of a PCI-X card could incur several bus cycles (bus cycles being an order of magnitude slower than the processor speed!), which is likely to affect the performance much more. But then writing to a hardware register may (for example) cause the hardware to do something unexpected, and checking whether the register already holds that value MAY make things faster, because a redundant write could make "some operation start over", or something like that.

三岁会撩人
#4 · 2020-02-27 04:27

In Objective-C you have the situation where assigning an object address to a pointer variable may require that the object be "retained" (reference count incremented). In such a case it makes sense to see if the value being assigned is the same as the value currently in the pointer variable, to avoid having to do the relatively expensive increment/decrement operations.

Other languages that use reference counting likely have similar scenarios.
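In C++ terms, a hedged sketch of the same idea (the intrusive RefCounted type and the assign helper below are hypothetical, not from any particular framework) might look like this:

    // Pointer comparison is trivially cheap; the retain/release pair is not,
    // so re-assigning the object already held skips both counter updates.
    struct RefCounted {
        int refs = 1;
        void retain()  { ++refs; }
        void release() { if (--refs == 0) delete this; }
    };

    void assign(RefCounted*& slot, RefCounted* newValue) {
        if (slot == newValue)
            return;                        // already holds this object: nothing to do
        if (newValue) newValue->retain();  // take a reference on the new object first
        if (slot) slot->release();         // then drop the reference on the old one
        slot = newValue;
    }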

But when assigning, say, an int or a boolean to a simple variable (outside of the multiprocessor cache scenario mentioned elsewhere), the test is rarely merited. A store on most processors is at least as fast as the load/test/branch.

成全新的幸福
#5 · 2020-02-27 04:30

Is it a sensible optimization to check whether a variable holds a specific value before writing that value?

Are there any use cases that would benefit from the if statement?

It is when assignment is significantly more costly than an inequality comparison returning false.

An example would be a large* std::set, which may require many heap allocations to duplicate.

* for some definition of "large"
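A hedged sketch of that scenario (the function and variable names are made up for illustration): comparing the sets allocates nothing and can bail out early, say on a size mismatch, while copy-assignment duplicates every node on the heap.

    #include <set>
    #include <string>

    // Only pay for the deep copy when the contents actually differ.
    void update_tags(std::set<std::string>& current,
                     const std::set<std::string>& desired) {
        if (current != desired)   // comparison: no allocations, early-out on size
            current = desired;    // assignment: one heap allocation per element
    }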

Will the compiler always optimize-out the if statement?

That's a fairly safe "no", as are most questions that contain both "optimize" and "always".

The C++ standard makes rare mention of optimizations, but never demands one.

What if var is a volatile variable?

Then it may perform the if, although volatile doesn't achieve what most people assume.

一夜七次
#6 · 2020-02-27 04:30

In Java the answer is always no, since all assignments you can do in Java are primitive. In C++, the answer is still pretty much always no - if copying is so much more expensive than an equality check, the class in question should do that equality check itself.
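A minimal sketch of that suggestion, using a hypothetical class whose assignment operator embeds the check so callers never need an external guard:

    #include <string>
    #include <vector>

    class Config {
    public:
        Config& operator=(const Config& other) {
            // Skip the expensive deep copy when the contents already match.
            if (this != &other && names_ != other.names_)
                names_ = other.names_;
            return *this;
        }
    private:
        std::vector<std::string> names_;   // large enough that copying is costly
    };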

别忘想泡老子
#7 · 2020-02-27 04:34

Yes, there are definitely cases where this is sensible, and as you suggest, volatile variables are one of those cases - even for single threaded access!

Volatile writes are expensive, both from a hardware and a compiler/JIT perspective. At the hardware level, these writes might be 10x-100x more expensive than a normal write, since write buffers have to be flushed (at least on x86; the details will vary by platform). At the compiler/JIT level, volatile writes inhibit many common optimizations.

Speculation, however, can only get you so far - the proof is always in the benchmarking. Here's a microbenchmark that tries your two strategies. The basic idea is to copy values from one array to another (pretty much System.arraycopy), with two variants - one which copies unconditionally, and one that checks to see if the values are different first.

Here are the copy routines for the simple, non-volatile case (full source here):

        // no check
        for (int i=0; i < ARRAY_LENGTH; i++) {
            target[i] = source[i];
        }

        // check, then set if unequal
        for (int i=0; i < ARRAY_LENGTH; i++) {
            int x = source[i];
            if (target[i] != x) {
                target[i] = x;
            }
        }

The results using the above code to copy an array length of 1000, using Caliper as my microbenchmark harness, are:

    benchmark arrayType    ns linear runtime
  CopyNoCheck      SAME   470 =
  CopyNoCheck DIFFERENT   460 =
    CopyCheck      SAME  1378 ===
    CopyCheck DIFFERENT  1856 ====

This also includes about 150ns of overhead per run to reset the target array each time. Skipping the check is much faster - about 0.47 ns per element (or around 0.32 ns per element after we remove the setup overhead, so pretty much exactly 1 cycle on my box).

Checking is about 3x slower when the arrays are the same, and 4x slower when they are different. I'm surprised at how bad the check is, given that it is perfectly predicted. I suspect that the culprit is largely the JIT - with a much more complex loop body, it may be unrolled fewer times, and other optimizations may not apply.

Let's switch to the volatile case. Here, I've used AtomicIntegerArray as my arrays of volatile elements, since Java doesn't have any native array types with volatile elements. Internally, this class is just writing straight through to the array using sun.misc.Unsafe, which allows volatile writes. The assembly generated is substantially similar to normal array access, other than the volatile aspect (and possibly range check elimination, which may not be effective in the AIA case).

Here's the code:

        // no check
        for (int i=0; i < ARRAY_LENGTH; i++) {
            target.set(i, source[i]);
        }

        // check, then set if unequal
        for (int i=0; i < ARRAY_LENGTH; i++) {
            int x = source[i];
            if (target.get(i) != x) {
                target.set(i, x);
            }
        }

And here are the results:

arrayType     benchmark    us linear runtime
     SAME   CopyCheckAI  2.85 =======
     SAME CopyNoCheckAI 10.21 ===========================
DIFFERENT   CopyCheckAI 11.33 ==============================
DIFFERENT CopyNoCheckAI 11.19 =============================

The tables have turned. Checking first is ~3.5x faster than the usual method. Everything is much slower overall - in the check case, we are paying ~3 ns per loop, and in the worst cases ~10 ns (the times above are in us, and cover the copy of the whole 1000 element array). Volatile writes really are more expensive. There is about 1 ns of overhead included in the DIFFERENT case to reset the array on each iteration (which is why even the simple method is slightly slower for DIFFERENT). I suspect a lot of the overhead in the "check" case is actually bounds checking.

This is all single threaded. If you actually had cross-core contention over a volatile, the results would be much, much worse for the simple method, and just about as good as the above for the check case (the cache line would just sit in the shared state - no coherency traffic needed).

I've also only tested the extremes of "every element equal" vs "every element different". This means the branch in the "check" algorithm is always perfectly predicted. If you had a mix of equal and different, you wouldn't get just a weighted combination of the times for the SAME and DIFFERENT cases - you'd do worse, due to misprediction (both at the hardware level, and perhaps also at the JIT level, which can no longer optimize for the always-taken branch).

So whether it is sensible, even for volatile, depends on the specific context - the mix of equal and unequal values, the surrounding code and so on. I'd usually not do it for volatile alone in a single-threaded scenario, unless I suspected a large number of the sets were redundant. In heavily multi-threaded structures, however, reading and then doing a volatile write (or other expensive operation, like a CAS) is a best practice, and you'll see it in quality code such as the java.util.concurrent structures.
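To make that pattern concrete, here is a small hedged sketch of the same idea in C++ using std::atomic (an analogue of the Java volatile case; the flag and function names are invented for illustration):

    #include <atomic>

    std::atomic<bool> done{false};

    // Check, then write only if needed: the cheap relaxed read lets every
    // caller after the first skip the store, so the cache line holding
    // `done` can stay shared between cores instead of ping-ponging.
    void mark_done() {
        if (!done.load(std::memory_order_relaxed)) {
            done.store(true, std::memory_order_release);
        }
    }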
