Boolean expressions optimizations in Java

2020-03-25 06:44发布

问题:

Consider the following method in Java:

public static boolean expensiveComputation() {
    for (int i = 0; i < Integer.MAX_VALUE; ++i);
    return false;
}

And the following main method:

public static void main(String[] args) {
    boolean b = false;
    if (expensiveComputation() && b) {
    }
}

Logical conjunction (same as &&) is a commutative operation. So why the compiler doesn't optimize the if-statement code to the equivalent:

if (b && expensiveComputation()) {
}

which has the benefits of using short-circuit evaluation?

Moreover, does the compiler try to make other logic simplifications or permutation of booleans in order to generate faster code? If not, why? Surely some optimizations would be very difficult, but my example isn't simple? Calling a method should always be slower than reading a boolean, right?

Thank you in advance.

回答1:

It doesn't do that because expensiveComputation() may have side effects which change the state of the program. This means that the order in which the expressions in the boolean statements are evaluated (expensiveComputation() and b) matters. You wouldn't want the compiler optimizing a bug into your compiled program, would you?

For example, what if the code was like this

public static boolean expensiveComputation() {
        for (int i = 0; i < Integer.MAX_VALUE; ++i);
        b = false;
        return false;
}

public static boolean b = true;
public static void main(String[] args) {
        if (expensiveComputation() || b) {
        // do stuff
        }
}

Here, if the compiler performed your optimization, then the //do stuff would run when you wouldn't expect it to by looking at the code (because the b, which is originally true, is evaluated first).



回答2:

Because expensiveComputation() may have side-effects.

Since Java doesn't aim to be a functionally pure language, it doesn't inhibit programmers from writing methods that have side-effects. Thus there probably isn't a lot of value in the compiler analyzing for functional purity. And then, optimizations like you posit are unlikely to be very valuable in practice, as expensiveComputation() would usually be required to executed anyway, to get the side effects.

Of course, for a programmer, it's easy to put the b first if they expect it to be false and explicitly want to avoid the expensive computation.



回答3:

Actually, some compilers can optimise programs like the one you suggested, it just has to make sure that the function has no side-effects. GCC has a compiler directive you can annotate a function with to show that it has no side-effects, which the compiler may then use when optimizing. Java may have something similar.

A classic example is

for(ii = 0; strlen(s) > ii; ii++) < do something >

which gets optimized to

n = strlen(s); for(ii = 0; n > ii; ii++) < do something >

by GCC with optimization level 2, at least on my machine.



回答4:

The compiler will optimize this if you run the code often enough, probably by inlining the method and simplifying the resulting boolean expression (but most likely not by reordering the arguments of &&).

You can benchmark this by timing a loop of say a million iterations of this code repeatedly. The first iteration or two are much slower than the following.



回答5:

The version of java I am using optimises a in an expression a && b but not with b.

i.e. If a is false, b does not get evaluated but if b was false it did not do this.

I found this out when I was implementing validation in a website form: I created messages to display on the web-page in a series of boolean methods. I expected of the fields in the page which were incorrectly entered to become highlighted but, because of Java's speed-hack, the code was only executed until the first incorrect field was discovered. After that, Java must have thought something like "false && anything is always false" and skipped the remaining validation methods.

I suppose, as a direct answer to your question, if you make optimisations like this, your program may run slower than it could. However, someone else's program will completely break because they have assumed the non-optimised behaviour like the side effect thing mentioned in other answers.

Unfortunately, it's difficult to automate intelligent decisions, especially with imperative languages (C, C++, Java, Python,... i.e the normal languages).