I'm looking for the fastest way to determine if a long
value is a perfect square (i.e. its square root is another integer):
- I've done it the easy way, by using the built-in `Math.sqrt()` function, but I'm wondering if there is a way to do it faster by restricting yourself to an integer-only domain.
- Maintaining a lookup table is impractical (since there are about 2^31.5 integers whose square is less than 2^63).
Here is the very simple and straightforward way I'm doing it now:
```java
public final static boolean isPerfectSquare(long n)
{
    if (n < 0)
        return false;

    long tst = (long)(Math.sqrt(n) + 0.5);
    return tst*tst == n;
}
```
Note: I'm using this function in many Project Euler problems. So no one else will ever have to maintain this code. And this kind of micro-optimization could actually make a difference, since part of the challenge is to do every algorithm in less than a minute, and this function will need to be called millions of times in some problems.
I've tried several different solutions to the problem:
- After exhaustive testing, I found that adding `0.5` to the result of `Math.sqrt()` is not necessary, at least not on my machine.
- The fast inverse square root was faster, but it gave incorrect results for n >= 410881. However, as suggested by BobbyShaftoe, we can use the FISR hack for n < 410881.
- Newton's method was a good bit slower than `Math.sqrt()`. This is probably because `Math.sqrt()` uses something similar to Newton's method, but implemented in the hardware, so it's much faster than in Java. Also, Newton's method still required use of doubles.
- A modified Newton's method, which used a few tricks so that only integer math was involved, required some hacks to avoid overflow (I want this function to work with all positive 64-bit signed integers), and it was still slower than `Math.sqrt()`.
- Binary chop was even slower. This makes sense, because the binary chop will on average require 16 passes to find the square root of a 64-bit number.
- According to John's tests, using `or` statements is faster in C++ than using a `switch`, but in Java and C# there appears to be no difference between `or` and `switch`.
- I also tried making a lookup table (as a private static array of 64 boolean values). Then instead of either a switch or an `or` statement, I would just say `if(lookup[(int)(n&0x3F)]) { test } else return false;`. To my surprise, this was (just slightly) slower. This is because array bounds are checked in Java.
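For reference, a minimal sketch of that 64-entry table approach (my illustration, not necessarily the exact code used above): only 12 of the 64 possible residues mod 64 can belong to a square, so the filter rejects roughly 81% of random inputs before the `sqrt` call.

```java
// A sketch of the 64-entry lookup filter described in the last bullet; names are mine.
static final boolean[] LOOKUP = new boolean[64];
static {
    for (int i = 0; i < 64; i++) {
        LOOKUP[(i * i) & 0x3F] = true;   // mark the 12 residues mod 64 that squares can have
    }
}

static boolean isPerfectSquareWithTable(long n) {
    if (n < 0) return false;
    if (!LOOKUP[(int) (n & 0x3F)]) return false;       // cheap rejection (~81% of random longs)
    long tst = (long) (Math.sqrt((double) n) + 0.5);   // same final check as the code above
    return tst * tst == n;
}
```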
Considering the general bit length (though I have used a specific type here), I tried to design a simplistic algorithm as below. A simple and obvious check for 0, 1, 2, or < 0 is required initially. The approach is simple in the sense that it doesn't try to use any existing math functions; most of the operators can be replaced with bitwise operators. I haven't tested it against any benchmark data, though. I'm not an expert in math or computer algorithm design in particular, so I would love to see you point out any problems. I know there are lots of chances for improvement.
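A minimal sketch of such an integer-only approach (my illustration, not the original poster's code) is the classic digit-by-digit square root, which uses only shifts, additions, and comparisons:

```java
// Digit-by-digit (bitwise) square root; a sketch of an integer-only check,
// not the original poster's code.
static boolean isPerfectSquareIntegerOnly(long n) {
    if (n < 0) return false;
    if (n < 2) return true;                // 0 and 1 are squares
    long op = n;
    long res = 0;
    long bit = 1L << 62;                   // highest power of 4 that fits in a long
    while (bit > op) bit >>= 2;            // start at the largest power of 4 <= n
    while (bit != 0) {
        if (op >= res + bit) {
            op -= res + bit;
            res = (res >> 1) + bit;        // append a 1 bit to the root
        } else {
            res >>= 1;                     // append a 0 bit to the root
        }
        bit >>= 2;
    }
    return op == 0;                        // n is a perfect square iff nothing is left over
}
```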
I figured out a method that works ~35% faster than your 6bits+Carmack+sqrt code, at least with my CPU (x86) and programming language (C/C++). Your results may vary, especially because I don't know how the Java factor will play out.
My approach is threefold. (In the code below, the input is an `int64 x`.) I compute `z = r - x * x`, and set t to be the largest power of 2 dividing z with a bit trick. This allows me to skip t values that wouldn't have affected the value of r anyway. The precomputed start value in my case picks out the "smallest positive" square root modulo 8192.

Even if this code doesn't work faster for you, I hope you enjoy some of the ideas it contains. Complete, tested code follows, including the precomputed tables.
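As a small illustration of the "largest power of 2 dividing z" bit trick mentioned above (a sketch in Java rather than the answer's C/C++, and not its actual code):

```java
// For nonzero z, (z & -z) isolates the lowest set bit, which is exactly the
// largest power of 2 dividing z. The names z and t follow the description above.
static long largestPowerOfTwoDividing(long z) {
    long t = z & -z;      // e.g. z = 12 -> t = 4, z = 40 -> t = 8 (undefined for z == 0)
    return t;
}
```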
If speed is a concern, why not partition off the most commonly used set of inputs and their values to a lookup table and then do whatever optimized magic algorithm you have come up with for the exceptional cases?
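A minimal sketch of that suggestion (the 2^20 cutoff, the table, and the fallback name are arbitrary assumptions, not taken from any answer here):

```java
// Precomputed answers for "common" small inputs; everything else falls through
// to whatever optimized routine you prefer.
static final boolean[] SMALL_SQUARES = new boolean[1 << 20];
static {
    for (long i = 0; i * i < SMALL_SQUARES.length; i++) {
        SMALL_SQUARES[(int) (i * i)] = true;     // mark every perfect square below 2^20
    }
}

static boolean isPerfectSquareCached(long n) {
    if (n >= 0 && n < SMALL_SQUARES.length) {
        return SMALL_SQUARES[(int) n];           // common case: a single array read
    }
    return isPerfectSquareFallback(n);           // placeholder for the "magic" general case
}

// Placeholder so the sketch is self-contained; substitute the optimized algorithm here.
static boolean isPerfectSquareFallback(long n) {
    if (n < 0) return false;
    long tst = (long) (Math.sqrt((double) n) + 0.5);
    return tst * tst == n;
}
```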
I ran my own analysis of several of the algorithms in this thread and came up with some new results. You can see my old results in the edit history of this answer, but they're not accurate: I made a mistake and wasted time analyzing several algorithms that aren't close. However, pulling lessons from several different answers, I now have two algorithms that crush the "winner" of this thread. Here's the core thing I do differently than everyone else:
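Judging from the description that follows, the core difference amounts to dividing out factors of four before any other test; roughly (a sketch, not the original line):

```java
// Dividing out factors of 4 never changes whether x is a perfect square,
// so they can be stripped before the residue test (a sketch of the idea).
static long stripPowersOfFour(long x) {
    while (x != 0 && (x & 3) == 0) {
        x >>= 2;
    }
    return x;
}
```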
This simple line, which most of the time adds only one or two very fast instructions, greatly simplifies the `switch-case` statement into a single if statement. However, it can add to the runtime if many of the tested numbers have significant power-of-two factors.

The algorithms below are as follows:
Here is a sample runtime if the numbers are generated using `Math.abs(java.util.Random.nextLong())`:
And here is a sample runtime if it's run on the first million longs only:
As you can see, `DurronTwo` does better for large inputs, because it gets to use the magic trick very often, but gets clobbered compared to the first algorithm and `Math.sqrt` because the numbers are so much smaller. Meanwhile, the simpler `Durron` is a huge winner because it never has to divide by 4 many times in the first million numbers.

Here's `Durron`:
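A rough sketch only of the approach described above (not the original `Durron` listing): strip factors of four, apply a cheap residue filter, then finish with `Math.sqrt`.

```java
// A sketch pieced together from the descriptions in this answer, not the original code.
static boolean isPerfectSquareDurronStyle(long n) {
    if (n < 0) return false;
    if (n == 0) return true;
    while ((n & 3) == 0) n >>= 2;           // strip factors of 4; square-ness is unchanged
    if ((n & 7) != 1) return false;         // odd squares are 1 mod 8; remaining evens fail too
    long tst = (long) (Math.sqrt((double) n) + 0.5);
    return tst * tst == n;                  // final floating-point check
}
```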
And here's `DurronTwo`:

And my benchmark harness (requires Google Caliper 0.1-rc5):
UPDATE: I've made a new algorithm that is faster in some scenarios and slower in others; I've gotten different benchmarks based on different inputs. If we calculate modulo `0xFFFFFF = 3 x 3 x 5 x 7 x 13 x 17 x 241`, we can eliminate 97.82% of numbers that cannot be squares. This can be (sort of) done in one line, with 5 bitwise operations:
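A sketch of the kind of computation being described (an assumption on my part, not necessarily the exact original line): since 2^24 ≡ 1 (mod 0xFFFFFF), summing the three 24-bit chunks of x gives a value congruent to x modulo 0xFFFFFF.

```java
// Chunk-sum residue index: valid for x >= 0; the result is congruent to x mod 0xFFFFFF.
static int residueIndex(long x) {
    return (int) (x & 0xFFFFFF)              // bits 0..23
         + ((int) (x >>> 24) & 0xFFFFFF)     // bits 24..47
         + (int) (x >>> 48);                 // bits 48..62
}
```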
The resulting index is either 1) the residue, 2) the residue `+ 0xFFFFFF`, or 3) the residue `+ 0x1FFFFFE`. Of course, we need a lookup table for residues modulo `0xFFFFFF`, which is about a 3 MB file (in this case stored as ASCII text decimal numbers; not optimal, but clearly improvable with a `ByteBuffer` and so forth). Since that is precalculation, it doesn't matter so much. You can find the file here (or generate it yourself). I load it into a `boolean` array like this:
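A sketch of such a loader, assuming the file lists one good residue per decimal line (the file name and array size are my choices, not the original answer's):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Marks each good residue at all three offsets the chunk-sum index can produce,
// so the lookup needs no reduction step (roughly 48 MB of boolean[] in memory).
static boolean[] loadGoodResidues() throws IOException {
    boolean[] good = new boolean[3 * 0xFFFFFF];
    try (BufferedReader in = new BufferedReader(new FileReader("goodResidues.txt"))) {
        for (String line; (line = in.readLine()) != null; ) {
            int r = Integer.parseInt(line.trim());
            good[r] = true;
            good[r + 0xFFFFFF] = true;
            good[r + 0x1FFFFFE] = true;
        }
    }
    return good;
}
```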
Example runtime: it beat `Durron` (version one) in every trial I ran.

`Math.sqrt()` works with doubles as input parameters, so you won't get accurate results for integers bigger than 2^53.

You should get rid of the 2-power part of N right from the start.
2nd Edit: The magical expression for m below should be
and not as written
End of 2nd edit
1st Edit:
Minor improvement:
End of 1st edit
Now continue as usual. This way, by the time you get to the floating-point part, you have already gotten rid of all the numbers whose 2-power part is odd (about half), and then you only consider 1/8 of what's left. I.e., you run the floating-point part on about 6% of the numbers.
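A minimal sketch of that strategy (the details are my assumptions, not the answer's code): write N = 2^k * odd; N can only be a square when k is even and the odd part is itself a square, and odd squares are always 1 mod 8.

```java
// Strip the 2-power part first, then filter on the odd part before the float check.
static boolean isPerfectSquareOddPartFirst(long n) {
    if (n < 0) return false;
    if (n == 0) return true;
    int k = Long.numberOfTrailingZeros(n);
    if ((k & 1) != 0) return false;          // odd 2-power part: cannot be a square
    long odd = n >>> k;                      // the odd part of n
    if ((odd & 7) != 1) return false;        // odd squares are 1 mod 8; keeps 1/8 of what's left
    long r = (long) (Math.sqrt((double) odd) + 0.5);
    return r * r == odd;                     // the floating-point part runs on ~6% of inputs
}
```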