I'm looking for the fastest way to determine if a long
value is a perfect square (i.e. its square root is another integer):
- I've done it the easy way, by using the built-in
Math.sqrt()
function, but I'm wondering if there is a way to do it faster by restricting yourself to integer-only domain. - Maintaining a lookup table is impractical (since there are about 231.5 integers whose square is less than 263).
Here is the very simple and straightforward way I'm doing it now:
public final static boolean isPerfectSquare(long n)
{
if (n < 0)
return false;
long tst = (long)(Math.sqrt(n) + 0.5);
return tst*tst == n;
}
Note: I'm using this function in many Project Euler problems. So no one else will ever have to maintain this code. And this kind of micro-optimization could actually make a difference, since part of the challenge is to do every algorithm in less than a minute, and this function will need to be called millions of times in some problems.
I've tried the different solutions to the problem:
- After exhaustive testing, I found that adding
0.5
to the result of Math.sqrt() is not necessary, at least not on my machine. - The fast inverse square root was faster, but it gave incorrect results for n >= 410881. However, as suggested by BobbyShaftoe, we can use the FISR hack for n < 410881.
- Newton's method was a good bit slower than
Math.sqrt()
. This is probably becauseMath.sqrt()
uses something similar to Newton's Method, but implemented in the hardware so it's much faster than in Java. Also, Newton's Method still required use of doubles. - A modified Newton's method, which used a few tricks so that only integer math was involved, required some hacks to avoid overflow (I want this function to work with all positive 64-bit signed integers), and it was still slower than
Math.sqrt()
. - Binary chop was even slower. This makes sense because the binary chop will on average require 16 passes to find the square root of a 64-bit number.
- According to John's tests, using
or
statements is faster in C++ than using aswitch
, but in Java and C# there appears to be no difference betweenor
andswitch
. - I also tried making a lookup table (as a private static array of 64 boolean values). Then instead of either switch or
or
statement, I would just sayif(lookup[(int)(n&0x3F)]) { test } else return false;
. To my surprise, this was (just slightly) slower. This is because array bounds are checked in Java.
For performance, you very often have to do some compromsies. Others have expressed various methods, however, you noted Carmack's hack was faster up to certain values of N. Then, you should check the "n" and if it is less than that number N, use Carmack's hack, else use some other method described in the answers here.
Here is the simplest and most concise way, although I do not know how it compares in terms of CPU cycles. This works great if you only wish to know if the root is a whole number. If you really care if it is an integer, you can also figure that out. Here is a simple (and pure) function:
If you do not need micro-optimization, this answer is better in terms of simplicity and maintainability. If you will be getting negative numbers, perhaps you will want to use Math.abs() on the number argument as the Math.sqrt() argument.
On my 3.6Ghz Intel i7-4790 CPU, a run of this algorithm on 0 - 10,000,000 took an average of 35 - 37 nanoseconds per calculation. I did 10 sequential runs, printing the average time spent on each of the ten million sqrt calculations. Each total run took just a little over 600 ms to complete.
If you are performing a lesser number of calculations, the earlier calculations take a bit longer.
This a rework from decimal to binary of the old Marchant calculator algorithm (sorry, I don't have a reference), in Ruby, adapted specifically for this question:
Here's a workup of something similar (please don't vote me down for coding style/smells or clunky O/O - it's the algorithm that counts, and C++ is not my home language). In this case, we're looking for residue == 0:
Not sure if this is the fastest way, but this is something I stumbled upon (long time ago in high-school) when I was bored and playing with my calculator during math class. At that time, I was really amazed this was working...
It's been pointed out that the last
d
digits of a perfect square can only take on certain values. The lastd
digits (in baseb
) of a numbern
is the same as the remainder whenn
is divided byb
d
, ie. in C notationn % pow(b, d)
.This can be generalized to any modulus
m
, ie.n % m
can be used to rule out some percentage of numbers from being perfect squares. The modulus you are currently using is 64, which allows 12, ie. 19% of remainders, as possible squares. With a little coding I found the modulus 110880, which allows only 2016, ie. 1.8% of remainders as possible squares. So depending on the cost of a modulus operation (ie. division) and a table lookup versus a square root on your machine, using this modulus might be faster.By the way if Java has a way to store a packed array of bits for the lookup table, don't use it. 110880 32-bit words is not much RAM these days and fetching a machine word is going to be faster than fetching a single bit.
I checked all of the possible results when the last n bits of a square is observed. By successively examining more bits, up to 5/6th of inputs can be eliminated. I actually designed this to implement Fermat's Factorization algorithm, and it is very fast there.
The last bit of pseudocode can be used to extend the tests to eliminate more values. The tests above are for k = 0, 1, 2, 3
It first tests whether it has a square residual with moduli of power of two, then it tests based on a final modulus, then it uses the Math.sqrt to do a final test. I came up with the idea from the top post, and attempted to extend upon it. I appreciate any comments or suggestions.
Update: Using the test by a modulus, (modSq) and a modulus base of 44352, my test runs in 96% of the time of the one in the OP's update for numbers up to 1,000,000,000.