I'm looking for the fastest way to determine if a long
value is a perfect square (i.e. its square root is another integer):
- I've done it the easy way, by using the built-in `Math.sqrt()` function, but I'm wondering if there is a way to do it faster by restricting yourself to an integer-only domain.
- Maintaining a lookup table is impractical (since there are about 2^31.5 integers whose square is less than 2^63).
Here is the very simple and straightforward way I'm doing it now:
```java
public final static boolean isPerfectSquare(long n)
{
    if (n < 0)
        return false;

    long tst = (long)(Math.sqrt(n) + 0.5);
    return tst * tst == n;
}
```
Note: I'm using this function in many Project Euler problems. So no one else will ever have to maintain this code. And this kind of micro-optimization could actually make a difference, since part of the challenge is to do every algorithm in less than a minute, and this function will need to be called millions of times in some problems.
I've tried several different solutions to the problem:
- After exhaustive testing, I found that adding `0.5` to the result of `Math.sqrt()` is not necessary, at least not on my machine.
- The fast inverse square root was faster, but it gave incorrect results for n >= 410881. However, as suggested by BobbyShaftoe, we can use the FISR hack for n < 410881.
- Newton's method was a good bit slower than `Math.sqrt()`. This is probably because `Math.sqrt()` uses something similar to Newton's method, but implemented in the hardware so it's much faster than in Java. Also, Newton's method still required use of doubles.
- A modified Newton's method, which used a few tricks so that only integer math was involved, required some hacks to avoid overflow (I want this function to work with all positive 64-bit signed integers), and it was still slower than `Math.sqrt()`.
- Binary chop was even slower. This makes sense because the binary chop requires about 32 passes to find the square root of a 64-bit number.
- According to John's tests, using `or` statements is faster in C++ than using a `switch`, but in Java and C# there appears to be no difference between `or` and `switch`.
- I also tried making a lookup table (as a private static array of 64 boolean values). Then instead of either a switch or an `or` statement, I would just say `if(lookup[(int)(n&0x3F)]) { test } else return false;` (see the sketch after this list). To my surprise, this was (just slightly) slower. This is because array bounds are checked in Java.
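A minimal sketch of building that table (a reconstruction; the names are mine): the 64 entries mark which values of `n & 0x3F` can be the low six bits of a perfect square.

```java
// Build the 64-entry table once: lookup[r] is true iff some perfect
// square has residue r mod 64.
private static final boolean[] lookup = new boolean[64];
static {
    for (int i = 0; i < 64; i++)
        lookup[(i * i) & 0x3F] = true;
}
```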
You'll have to do some benchmarking. The best algorithm will depend on the distribution of your inputs.
Your algorithm may be nearly optimal, but you might want to do a quick check to rule out some possibilities before calling your square root routine. For example, look at the last digit of your number in hex by doing a bit-wise "and". Perfect squares can only end in 0, 1, 4, or 9 in base 16, so for 75% of your inputs (assuming they are uniformly distributed) you can avoid a call to the square root in exchange for some very fast bit twiddling.
Kip benchmarked the following code implementing the hex trick. When testing numbers 1 through 100,000,000, this code ran twice as fast as the original.
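A minimal sketch of that precheck (a reconstruction; the benchmarked version may have differed in details):

```java
// Hex trick: in base 16 a perfect square can only end in 0, 1, 4, or 9.
public final static boolean isPerfectSquareHex(long n)
{
    if (n < 0)
        return false;

    switch ((int) (n & 0xF))  // last hex digit of n
    {
    case 0: case 1: case 4: case 9:
        // Only ~25% of uniform inputs reach the expensive sqrt test.
        long tst = (long) Math.sqrt(n);
        return tst * tst == n;
    default:
        return false;
    }
}
```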
When I tested the analogous code in C++, it actually ran slower than the original. However, when I eliminated the switch statement, the hex trick once again made the code twice as fast.
Eliminating the switch statement had little effect on the C# code.
I was thinking about the horrible times I've spent in my Numerical Analysis course. And then I remembered this function circulating around the 'net from the Quake source code:
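```c
float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = * ( long * ) &y;                       // evil floating point bit level hacking
    i  = 0x5f3759df - ( i >> 1 );               // what the ...?
    y  = * ( float * ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//  y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

    return y;
}
```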
It basically calculates an inverse square root, using Newton's approximation method (can't remember the exact name). It should be usable and might even be faster; it's from one of id Software's phenomenal games! It's written in C++, but it should not be too hard to reuse the same technique in Java once you get the idea.
I originally found it at: http://www.codemaestro.com/reviews/9
Newton's method explained at Wikipedia: http://en.wikipedia.org/wiki/Newton%27s_method
You can follow the link for more explanation of how it works, but if you don't care much, then this is roughly what I remember from reading the blog and from taking the Numerical Analysis course:
- `* (long*) &y` is basically a fast convert-to-long function so integer operations can be applied on the raw bytes.
- The `0x5f3759df - (i >> 1);` line computes a pre-calculated seed value for the approximation function.
- `* (float*) &i` converts the value back to floating point.
- The `y = y * ( threehalfs - ( x2 * y * y ) )` line basically iterates the value over the function again.

The approximation function gives more precise values the more you iterate it over the result. In Quake's case, one iteration is "good enough", but if it isn't for you... then you can add as many iterations as you need.
This should be faster because it reduces the number of division operations done in naive square rooting down to a simple divide by 2 (actually a `* 0.5F` multiply operation) and replaces the rest with a fixed number of multiplication operations instead.

I'm pretty late to the party, but I hope to provide a better answer; shorter and (assuming my benchmark is correct) also much faster.
The first test catches most non-squares quickly. It uses a 64-item table packed in a `long`, so there's no array access cost (indirection and bounds checks). For a uniformly random `long`, there's an 81.25% probability of ending here.

The second test catches all numbers having an odd number of twos in their factorization. The method `Long.numberOfTrailingZeros` is very fast, as it gets JIT-ed into a single x86 instruction.

After dropping the trailing zeros, the third test handles numbers ending with 011, 101, or 111 in binary, which are not perfect squares. It also takes care of negative numbers and handles 0.

The final test falls back to `double` arithmetic. As `double` has only a 53-bit mantissa, the conversion from `long` to `double` includes rounding for big values. Nonetheless, the test is correct (unless the proof is wrong).

Trying to incorporate the mod255 idea wasn't successful.
I'm not sure if it would be faster, or even accurate, but you could use John Carmack's magical square root algorithm to solve the square root faster. You could probably easily test this for all possible 32-bit integers and validate that you actually got correct results, as it's only an approximation. However, now that I think about it, using doubles is approximating also, so I'm not sure how that would come into play.
I like the idea of using an almost-correct method on some of the input. Here is a version with a higher "offset". The code seems to work and passes my simple test case.
Just replace your code with this one:
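(What follows is only an illustrative reconstruction of the shape, a cutoff dispatch between the approximate inverse-square-root path and plain `Math.sqrt()`; the cutoff value and the names here are assumptions of mine, not the original's.)

```java
// Illustrative sketch: CUTOFF and the names are assumptions. Two Newton
// iterations keep the approximation within rounding distance of the true
// root for squares below the cutoff.
private static final long CUTOFF = 410881L;

private static float invSqrt(float x) {
    // Quake-style fast inverse square root via Java's float bit methods.
    int i = Float.floatToRawIntBits(x);
    i = 0x5f3759df - (i >> 1);
    float y = Float.intBitsToFloat(i);
    y = y * (1.5F - 0.5F * x * y * y);  // 1st Newton-Raphson iteration
    y = y * (1.5F - 0.5F * x * y * y);  // 2nd iteration for extra accuracy
    return y;
}

public static boolean isPerfectSquareFast(long n) {
    if (n < 0)
        return false;
    long tst = (n < CUTOFF)
            ? (long) (n * invSqrt(n) + 0.5F)  // n * 1/sqrt(n) == sqrt(n)
            : (long) Math.sqrt(n);
    return tst * tst == n;
}
```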
This is the fastest Java implementation I could come up with, using a combination of techniques suggested by others in this thread.
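From the descriptions in this thread, the combination is presumably the mod-64 precheck, the Carmack hack below the 410881 cutoff, and `Math.sqrt()` above it; the sketch below is my reconstruction along those lines, not necessarily the exact original:

```java
public final static boolean isPerfectSquare(long n)
{
    if (n < 0)
        return false;

    // Mod-64 precheck: squares can only take these 12 residues mod 64.
    switch ((int) (n & 0x3F))
    {
    case 0x00: case 0x01: case 0x04: case 0x09: case 0x10: case 0x11:
    case 0x19: case 0x21: case 0x24: case 0x29: case 0x31: case 0x39:
        long sqrt;
        if (n < 410881L)
        {
            // John Carmack hack, converted to Java.
            int i;
            float x2, y;
            x2 = n * 0.5F;
            y  = n;
            i  = Float.floatToRawIntBits(y);
            i  = 0x5f3759df - (i >> 1);
            y  = Float.intBitsToFloat(i);
            y  = y * (1.5F - (x2 * y * y));
            sqrt = (long) (1.0F / y);
        }
        else
        {
            // The hack gives incorrect answers for n >= 410881.
            sqrt = (long) Math.sqrt(n);
        }
        return sqrt * sqrt == n;

    default:
        return false;
    }
}
```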
I also experimented with these modifications but they did not help performance: