I am looking for the fastest way to square a double (double d). So far I came up with two approaches:
1. d*d
2. Math.pow(d, 2)
To test the performance I set up three test cases. In each one I generate random numbers, using the same seed for all three cases, and calculate the square in a loop 100 000 000 times.
In the first test case the numbers are generated using random.nextDouble(), in the second using random.nextDouble()*Double.MAX_VALUE, and in the third using random.nextDouble()*Double.MIN_VALUE.
The results of a couple of runs (approximate; there is always some variation; run using Java 1.8, compiled for Java 1.6, on Mac OS X Mavericks):
Approach | Case 1 | Case 2 | Case 3
---------|--------|--------|-------
1        | ~2.16s | ~2.16s | ~2.16s
2        | ~9s    | ~30s   | ~60s
The conclusion seems to be that approach 1 is way faster, but also that Math.pow seems to behave kind of weird.
So I have two questions:
1. Why is Math.pow so slow, and why does it cope badly with > 1 and even worse with < -1 numbers?
2. Is there a way to improve the performance over what I suggested as approach 1? I was thinking about something like:
long l = Double.doubleToRawLongBits(d);
long sign = (l & (1 << 63));
Double.longBitsToDouble((l<<1)&sign);
But that is a) wrong, and b) about the same speed as approach 1.
The fastest way to square a number is to multiply it by itself.
Math.pow is not really slow, but it performs general exponentiation instead of a simple multiplication.
First, because it actually has to compute a general power. Second, according to the Javadoc, it also contains tests for many corner cases. Finally, I would not rely too much on your micro-benchmark.
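To make the caution about micro-benchmarks concrete, here is a rough sketch of a slightly more careful hand-rolled timing loop. The class name SquareBench, the seed, and the iteration counts are all made up for illustration. It warms up the JIT before timing and keeps the results in a running sum so the work cannot be discarded. The timings still include the cost of random.nextDouble(), and a proper harness would be more trustworthy than this.

```java
import java.util.Random;

public class SquareBench {
    // Sums d*d over n pseudo-random doubles. Returning the sum keeps the
    // JIT from treating the loop body as dead code.
    static double sumMultiply(long seed, int n) {
        Random random = new Random(seed);
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            double d = random.nextDouble();
            sum += d * d;
        }
        return sum;
    }

    // Same loop, but squaring through Math.pow.
    static double sumPow(long seed, int n) {
        Random random = new Random(seed);
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            double d = random.nextDouble();
            sum += Math.pow(d, 2);
        }
        return sum;
    }

    public static void main(String[] args) {
        final long seed = 42L;   // arbitrary seed, assumed for illustration
        final int n = 1000000;   // arbitrary iteration count
        double sink = 0.0;

        // Warm-up rounds so both methods are JIT-compiled before timing.
        for (int i = 0; i < 10; i++) {
            sink += sumMultiply(seed, n) + sumPow(seed, n);
        }

        long t0 = System.nanoTime();
        sink += sumMultiply(seed, n);
        long t1 = System.nanoTime();
        sink += sumPow(seed, n);
        long t2 = System.nanoTime();

        System.out.println("d*d:      " + (t1 - t0) / 1e6 + " ms");
        System.out.println("Math.pow: " + (t2 - t1) / 1e6 + " ms");
        System.out.println("checksum: " + sink); // printed so the sums are actually used
    }
}
```

Results from a loop like this will vary with the JVM version and flags, so treat the numbers as a rough indication only.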
Squaring by multiplying the number by itself is the fastest, because that approach can be translated directly into simple, non-branching bytecode (and thus, indirectly, machine code).
Math.pow() is a quite complex function that comes with various guarantees for edge cases. And it needs to be called instead of being inlined.
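As a small illustration of the point above, the following sketch puts the multiplication in a helper and prints it next to Math.pow(d, 2) for a few ordinary and edge-case inputs. The class name SquareDemo and the sample values are chosen for this example only.

```java
public class SquareDemo {
    // Plain multiplication: javac emits a single dmul instruction for this,
    // with no branches and no method call.
    static double square(double d) {
        return d * d;
    }

    public static void main(String[] args) {
        // A mix of ordinary values and edge cases (sample values are illustrative).
        double[] samples = {
            3.0, -1.5, 0.0, Double.NaN,
            Double.POSITIVE_INFINITY, Double.MAX_VALUE, Double.MIN_VALUE
        };
        for (double d : samples) {
            System.out.println(d + ": d*d = " + square(d)
                    + ", Math.pow(d, 2) = " + Math.pow(d, 2));
        }
    }
}
```

For squaring, IEEE 754 multiplication already handles NaN, infinities, overflow and underflow on its own, so the extra checks in Math.pow add nothing for this particular use.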
Math.pow() is slow because it has to deal with the generic case of raising a number to any given power. As for why it is slower with negative numbers, it is because it has to test whether the power is positive or negative in order to determine the sign, so it is one more operation to do.