I'm looking for the fastest way to determine if a long
value is a perfect square (i.e. its square root is another integer):
- I've done it the easy way, by using the built-in Math.sqrt() function, but I'm wondering if there is a way to do it faster by restricting yourself to an integer-only domain.
- Maintaining a lookup table is impractical (since there are about 2^31.5 integers whose square is less than 2^63).
Here is the very simple and straightforward way I'm doing it now:
    public final static boolean isPerfectSquare(long n)
    {
        if (n < 0)
            return false;
        long tst = (long)(Math.sqrt(n) + 0.5);
        return tst*tst == n;
    }
Note: I'm using this function in many Project Euler problems. So no one else will ever have to maintain this code. And this kind of micro-optimization could actually make a difference, since part of the challenge is to do every algorithm in less than a minute, and this function will need to be called millions of times in some problems.
I've tried the different solutions to the problem:

- After exhaustive testing, I found that adding 0.5 to the result of Math.sqrt() is not necessary, at least not on my machine.
- The fast inverse square root was faster, but it gave incorrect results for n >= 410881. However, as suggested by BobbyShaftoe, we can use the FISR hack for n < 410881 (a sketch of the hack appears after this list).
- Newton's Method was a good bit slower than Math.sqrt(). This is probably because Math.sqrt() uses something similar to Newton's Method, but implemented in the hardware, so it's much faster than in Java. Also, Newton's Method still required use of doubles.
- A modified Newton's Method, which used a few tricks so that only integer math was involved, required some hacks to avoid overflow (I want this function to work with all positive 64-bit signed integers), and it was still slower than Math.sqrt().
- Binary chop was even slower. This makes sense because the binary chop will on average require 16 passes to find the square root of a 64-bit number.
- According to John's tests, using or statements is faster in C++ than using a switch, but in Java and C# there appears to be no difference between or and switch.
- I also tried making a lookup table (as a private static array of 64 boolean values). Then instead of either a switch or an or statement, I would just say if (lookup[(int)(n & 0x3F)]) { test } else return false;. To my surprise, this was (just slightly) slower. This is because array bounds are checked in Java.
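For reference, here is a sketch of what the FISR hack might look like in Java. The original code isn't shown here, so this is the classic Quake III form with the magic constant 0x5f3759df; it uses two Newton refinement steps (rather than the usual one) so the rounding stays reliable across the whole trusted range:

    static boolean isPerfectSquareFISR(long n) {
        if (n < 0) return false;
        if (n >= 410881) return isPerfectSquare(n);  // hack is only trusted below 410881
        if (n == 0) return true;
        float x = n;
        float xhalf = 0.5f * x;
        int i = Float.floatToIntBits(x);
        i = 0x5f3759df - (i >> 1);                   // the famous magic constant
        float inv = Float.intBitsToFloat(i);
        inv *= (1.5f - xhalf * inv * inv);           // Newton refinement step 1
        inv *= (1.5f - xhalf * inv * inv);           // step 2, for extra accuracy
        long tst = (long)(n * inv + 0.5f);           // n * (1/sqrt(n)) = sqrt(n)
        return tst * tst == n;
    }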
If you want speed, given that your integers are of finite size, I suspect that the quickest way would involve (a) partitioning the parameters by size (e.g. into categories by largest bit set), then (b) checking the value against an array of perfect squares within that range.
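A sketch of that idea for inputs below 2^32 (a full 64-bit table would be impractical, as the question notes; the names and bounds here are my own, not the answer's):

    static final long[] SQUARES = new long[1 << 16];    // squares of 0..65535
    static {
        for (int i = 0; i < SQUARES.length; i++) SQUARES[i] = (long) i * i;
    }

    static boolean isPerfectSquareBucketed(long n) {
        if (n < 0 || n >= (1L << 32)) return false;     // sketch covers < 2^32 only
        if (n == 0) return true;
        int bits = 64 - Long.numberOfLeadingZeros(n);   // position of largest set bit
        int lo = 1 << ((bits - 1) / 2);                 // smallest possible root for this bucket
        int hi = Math.min(SQUARES.length - 1, 1 << (bits / 2 + 1)); // largest possible root
        return java.util.Arrays.binarySearch(SQUARES, lo, hi + 1, n) >= 0;
    }

Bucketing by the largest set bit means each binary search only scans the few table entries whose squares have that bit length.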
An integer problem deserves an integer solution. Thus:

Do binary search on the (non-negative) integers to find the greatest integer t such that t**2 <= n. Then test whether t**2 = n exactly. This takes time O(log n).

If you don't know how to binary search the positive integers because the set is unbounded, it's easy: you start by computing your increasing function f (above, f(t) = t**2 - n) on powers of two. When you see it turn positive, you've found an upper bound. Then you can do standard binary search.

It should be much faster to use Newton's method to calculate the integer square root, then square this number and check, as you do in your current solution. Newton's method is the basis for the Carmack solution mentioned in some other answers. You should be able to get a faster answer since you're only interested in the integer part of the root, allowing you to stop the approximation algorithm sooner.
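A minimal sketch of such an integer-only Newton loop (the initial guess and stopping rule here are my own choices, not from the answer): start from a power of two at or above the root and iterate x = (x + n/x) / 2 until the estimate stops decreasing, which lands on floor(sqrt(n)):

    static boolean isPerfectSquareNewton(long n) {
        if (n < 0) return false;
        if (n < 2) return true;                  // 0 and 1 are perfect squares
        // Initial guess 2^ceil(bitLength/2) >= sqrt(n), so the iteration converges downward.
        int bitLength = 64 - Long.numberOfLeadingZeros(n);
        long x = 1L << ((bitLength + 1) / 2);
        long next = (x + n / x) >> 1;
        while (next < x) {                       // strictly decreasing until floor(sqrt(n))
            x = next;
            next = (x + n / x) >> 1;
        }
        return x * x == n;                       // x is now floor(sqrt(n))
    }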
Another optimization that you can try: if the digital root of a number isn't 1, 4, 7, or 9, the number is not a perfect square. This can be used as a quick way to eliminate roughly 60% of your inputs before applying the slower square root algorithm.
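As a sketch (one detail assumed here: the digital root of n > 0 can be computed directly as 1 + (n - 1) % 9 rather than by summing digits):

    static boolean mightBeSquareByDigitalRoot(long n) {
        if (n < 0) return false;
        if (n == 0) return true;                 // 0 is a perfect square
        int dr = (int)(1 + (n - 1) % 9);         // digital root of n
        return dr == 1 || dr == 4 || dr == 7 || dr == 9;
    }

If this returns false, n cannot be a perfect square; if it returns true, you still have to run the exact test.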
Just for the record, another approach is to use the prime decomposition. If every factor of the decomposition is even, then the number is a perfect square. So what you want is to see if a number can be decomposed as a product of squares of prime numbers. Of course, you don't need to obtain such a decomposition, just to see if it exists.
First build a table of squares of prime numbers which are lower than 2^32. This is far smaller than a table of all integers up to this limit.
A solution would then be along these lines (a sketch, since the original listing isn't shown here; it assumes an ascending array primes whose squares cover the input range):
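    // Assumes `primes` is an ascending array of primes whose squares cover the
    // input range (primes below 2^32 suffice for 64-bit inputs).
    static boolean isPerfectSquareByFactoring(long n) {
        if (n < 0) return false;
        if (n < 2) return true;               // 0 and 1 are perfect squares
        for (long p : primes) {
            long sq = p * p;
            if (sq > n) break;                // no remaining square can divide n
            while (n % sq == 0) n /= sq;      // strip this squared prime
        }
        return n == 1;                        // true iff n was a product of prime squares
    }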
I guess it's a bit cryptic. What it does is check at every step whether the square of a prime number divides the input number. If it does, it divides the number by that square as long as possible, to remove the square from the prime decomposition. If this process eventually brings us to 1, then the input number was a product of squares of prime numbers. If the square becomes larger than the number itself, then there is no way this square, or any larger square, can divide it, so the number cannot be a product of squares of prime numbers.
Given that sqrt is done in hardware nowadays, and that prime numbers need to be computed here, I guess this solution is way slower. But it should give better results than the sqrt-based solution, which won't work over 2^54, as mrzl says in his answer.
If you do a binary chop to try to find the "right" square root, you can fairly easily detect if the value you've got is close enough to tell:
So having calculated n^2, the options are:

- n^2 = target: done, return true
- n^2 + 2n + 1 > target > n^2: you're close, but it's not perfect: return false
- n^2 - 2n + 1 < target < n^2: ditto
- target < n^2 - 2n + 1: binary chop on a lower n
- target > n^2 + 2n + 1: binary chop on a higher n

(Sorry, this uses n as your current guess, and target for the parameter. Apologies for the confusion!)

I don't know whether this will be faster or not, but it's worth a try.
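A minimal sketch of that chop (the variable names and the overflow cap are my choices, not from the answer; an exact search makes the "close but not perfect" cases fall out automatically):

    static boolean isPerfectSquareChop(long target) {
        if (target < 0) return false;
        long low = 0;
        long high = Math.min(target, 3037000499L);   // floor(sqrt(Long.MAX_VALUE)) avoids overflow
        while (low <= high) {
            long guess = (low + high) >>> 1;
            long sq = guess * guess;
            if (sq == target) return true;
            if (sq < target) low = guess + 1;        // chop on a higher guess
            else high = guess - 1;                   // chop on a lower guess
        }
        return false;                                // target lies strictly between two squares
    }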
EDIT: The binary chop doesn't have to take in the whole range of integers, either: (2^x)^2 = 2^(2x), so once you've found the top set bit in your target (which can be done with a bit-twiddling trick; I forget exactly how) you can quickly get a range of potential answers. Mind you, a naive binary chop is still only going to take up to 31 or 32 iterations.

The sqrt call is not perfectly accurate, as has been mentioned, but it's interesting and instructive that it doesn't blow away the other answers in terms of speed. After all, the sequence of assembly language instructions for a sqrt is tiny. Intel has a hardware instruction, which isn't used by Java, I believe, because it doesn't conform to IEEE.
So why is it slow? Because Java is actually calling a C routine through JNI, and it's actually slower to do so than to call a Java subroutine, which itself is slower than doing it inline. This is very annoying, and Java should have come up with a better solution, i.e. building in floating point library calls if necessary. Oh well.
In C++, I suspect all the complex alternatives would lose on speed, but I haven't checked them all. What I did, and what Java people will find useful, is a simple hack, an extension of the special-case testing suggested by A. Rex: use a single long value as a bit array, which isn't bounds-checked. That way, you have a 64-bit boolean lookup.
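The original listing isn't reproduced here; the following is a reconstruction of the described approach, assuming pp1 and pp2 hold bitmasks of the quadratic residues mod 64 and mod 63 (the actual masks used in the original may differ):

    static long pp1, pp2;              // bitmasks of squares mod 64 and mod 63

    static void init2() {              // called once to fill the static masks
        for (int k = 0; k < 64; k++) {
            pp1 |= 1L << ((k * k) & 63);
            pp2 |= 1L << ((k * k) % 63);
        }
    }

    static boolean isPerfectSquare5(long n) {
        if (n < 0) return false;
        // Use >>> (unsigned shift) since Java longs are signed.
        if (((pp1 >>> (int)(n & 63)) & 1) == 0) return false;
        if (((pp2 >>> (int)(n % 63)) & 1) == 0) return false;
        return isPerfectSquare(n);     // fall back to the exact (inlined) test
    }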
The routine isPerfectSquare5 runs in about 1/3 the time on my Core 2 Duo machine. I suspect that further tweaks along the same lines could reduce the time further on average, but every time you check, you are trading off more testing for more eliminating, so you can't go too much farther on that road.
Certainly, rather than having a separate test for negative, you could check the high 6 bits the same way.
Note that all I'm doing is eliminating possible squares, but when I have a potential case I have to call the original, inlined isPerfectSquare.
The init2 routine is called once to initialize the static values of pp1 and pp2. Note that in my C++ implementation I'm using unsigned long long, so since Java's long is signed, you'd have to use the >>> operator.
There is no intrinsic need to bounds check the array, but Java's optimizer has to figure this stuff out pretty quickly, so I don't blame them for that.