Fastest way to determine if an integer's square root is an integer

Published 2018-12-31 05:12

I'm looking for the fastest way to determine if a long value is a perfect square (i.e. its square root is another integer):

  1. I've done it the easy way, by using the built-in Math.sqrt() function, but I'm wondering if there is a way to do it faster by restricting yourself to an integer-only domain.
  2. Maintaining a lookup table is impractical (since there are about 2^31.5 integers whose square is less than 2^63).

Here is the very simple and straightforward way I'm doing it now:

public final static boolean isPerfectSquare(long n)
{
  if (n < 0)
    return false;

  long tst = (long)(Math.sqrt(n) + 0.5);
  return tst*tst == n;
}

Note: I'm using this function in many Project Euler problems. So no one else will ever have to maintain this code. And this kind of micro-optimization could actually make a difference, since part of the challenge is to do every algorithm in less than a minute, and this function will need to be called millions of times in some problems.


I've tried the different solutions to the problem:

  • After exhaustive testing, I found that adding 0.5 to the result of Math.sqrt() is not necessary, at least not on my machine.
  • The fast inverse square root was faster, but it gave incorrect results for n >= 410881. However, as suggested by BobbyShaftoe, we can use the FISR hack for n < 410881.
  • Newton's method was a good bit slower than Math.sqrt(). This is probably because Math.sqrt() uses something similar to Newton's method, but implemented in hardware, so it's much faster than in Java. Also, Newton's method still required use of doubles.
  • A modified Newton's method, which used a few tricks so that only integer math was involved, required some hacks to avoid overflow (I want this function to work with all positive 64-bit signed integers), and it was still slower than Math.sqrt().
  • Binary chop was even slower. This makes sense because the binary chop will on average require 16 passes to find the square root of a 64-bit number.
  • According to John's tests, using or statements is faster in C++ than using a switch, but in Java and C# there appears to be no difference between or and switch.
  • I also tried making a lookup table (as a private static array of 64 boolean values). Then instead of either the switch or the or statement, I would just say if(lookup[(int)(n&0x3F)]) { test } else return false;. To my surprise, this was (just slightly) slower, because array bounds are checked in Java. (A sketch of this variant follows the list.)
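
For reference, here is a minimal sketch of that boolean-array variant (the static-initializer form and the Math.sqrt fallback are illustrative, not the exact benchmarked code); only 12 of the 64 possible residues mod 64 belong to perfect squares, so the filter rejects roughly 81% of random inputs cheaply:

// The 64-entry table described in the last bullet: true exactly for the
// residues mod 64 that a perfect square can have (12 of 64).
private static final boolean[] lookup = new boolean[64];
static {
    for (int i = 0; i < 64; i++) lookup[(i * i) & 0x3F] = true;
}

public static boolean isPerfectSquare(long n)
{
    if (n < 0) return false;
    if (!lookup[(int) (n & 0x3F)]) return false;  // cheap rejection of most inputs
    long tst = (long) Math.sqrt(n);               // fall back to the full check
    return tst * tst == n;
}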

30 Answers
残风、尘缘若梦
#2 · 2018-12-31 05:38

If you want speed, given that your integers are of finite size, I suspect that the quickest way would involve partitioning the parameters by size (e.g. into categories by largest bit set), then checking the value against an array of perfect squares within that range.

有味是清欢
#3 · 2018-12-31 05:41

An integer problem deserves an integer solution. Thus:

Do binary search on the (non-negative) integers to find the greatest integer t such that t**2 <= n, then test whether t**2 = n exactly. This takes time O(log n).

If you don't know how to binary search the positive integers because the set is unbounded, it's easy: you start by computing your increasing function f (above, f(t) = t**2 - n) on powers of two. When you see it turn positive, you've found an upper bound; then you can do a standard binary search.
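
A minimal Java sketch of this approach for 64-bit signed inputs (the MAX_ROOT clamp is an added safeguard so that mid*mid can never overflow; the names are illustrative):

public static boolean isPerfectSquare(long n)
{
    if (n < 0) return false;
    if (n < 2) return true;

    // Largest root whose square still fits in a long: floor(sqrt(2^63 - 1)).
    final long MAX_ROOT = 3_037_000_499L;

    // Exponential search on powers of two to find an upper bound for the root.
    long hi = 1;
    while (hi < MAX_ROOT && hi * hi < n)
        hi <<= 1;
    if (hi > MAX_ROOT) hi = MAX_ROOT;

    // Standard binary search for the greatest t with t*t <= n, then test exactness.
    long lo = hi >> 1;
    while (lo <= hi)
    {
        long mid = lo + ((hi - lo) >> 1);
        long sq = mid * mid;
        if (sq == n) return true;
        if (sq < n) lo = mid + 1; else hi = mid - 1;
    }
    return false;
}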

残风、尘缘若梦
#4 · 2018-12-31 05:43

It should be much faster to use Newton's method to calculate the Integer Square Root, then square this number and check, as you do in your current solution. Newton's method is the basis for the Carmack solution mentioned in some other answers. You should be able to get a faster answer since you're only interested in the integer part of the root, allowing you to stop the approximation algorithm sooner.

Another optimization that you can try: if the digital root of a number is not 1, 4, 7, or 9, the number is not a perfect square. This is a quick way to eliminate roughly 56% of inputs (5 of the 9 possible digital roots) before applying the slower square root algorithm.
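
A sketch of both ideas in Java: an integer-only Newton (Heron) iteration for the floor of the square root, plus the digital-root filter expressed as a mod-9 test (the helper names are illustrative):

// Integer Newton iteration for floor(sqrt(n)); overflow-safe for any non-negative long.
static long isqrtNewton(long n)
{
    if (n < 2) return n;
    // Initial overestimate: 2^ceil(bitLength/2) is always >= sqrt(n).
    int bits = 64 - Long.numberOfLeadingZeros(n);
    long x = 1L << ((bits + 1) / 2);
    long y = (x + n / x) >> 1;
    while (y < x)               // the sequence decreases until it hits floor(sqrt(n))
    {
        x = y;
        y = (x + n / x) >> 1;   // Newton step: x <- (x + n/x) / 2 in integer arithmetic
    }
    return x;
}

static boolean isPerfectSquare(long n)
{
    if (n < 0) return false;
    // Digital-root filter: a perfect square's digital root is 1, 4, 7 or 9,
    // which is the same as n % 9 being 0, 1, 4 or 7.
    int r = (int) (n % 9);
    if (r != 0 && r != 1 && r != 4 && r != 7) return false;
    long s = isqrtNewton(n);
    return s * s == n;          // exact check: stop the approximation at the integer part
}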

不再属于我。
#5 · 2018-12-31 05:44

Just for the record, another approach is to use the prime decomposition. If every prime in the decomposition appears with an even exponent, then the number is a perfect square. So what you want is to see if a number can be decomposed as a product of squares of prime numbers. Of course, you don't need to obtain such a decomposition, just to see if it exists.

First build a table of squares of prime numbers which are lower than 2^32. This is far smaller than a table of all integers up to this limit.
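
One possible way to build such a table in Java is a plain sieve of Eratosthenes over the primes below 2^16, whose squares all stay below 2^32 (the method name is illustrative):

// Sketch: sieve the primes below 2^16 and store their squares in increasing order.
static long[] buildSquareTable()
{
    final int limit = 1 << 16;                 // primes below 2^16 have squares below 2^32
    boolean[] composite = new boolean[limit];
    long[] squares = new long[limit];
    int count = 0;
    for (int p = 2; p < limit; p++)
    {
        if (!composite[p])
        {
            squares[count++] = (long) p * p;   // p is prime: record p^2
            for (int m = p + p; m < limit; m += p)
                composite[m] = true;           // mark the multiples of p as composite
        }
    }
    long[] table = new long[count];            // about 6542 entries
    System.arraycopy(squares, 0, table, 0, count);
    return table;
}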

A solution would then be like this:

boolean isPerfectSquare(long number)
{
    if (number < 0) return false;
    if (number < 2) return true;

    // squareTable holds the squares of the primes (4, 9, 25, 49, ...) in increasing order.
    for (int i = 0; ; i++)
    {
        long square = squareTable[i];
        if (square > number) return false;   // no remaining square can divide number
        while (number % square == 0)
        {
            number /= square;                // strip this prime square from the factorization
        }
        if (number == 1) return true;        // the number was a product of prime squares
    }
}

I guess it's a bit cryptic. At every step it checks whether the square of a prime divides the input number. If it does, it divides the number by that square as many times as possible, removing that square from the prime decomposition. If this process reduces the number to 1, the input number was a product of squares of primes. If the square becomes larger than the remaining number, then neither it nor any larger square can divide it, so the number cannot be a product of squares of primes.

Given that sqrt is done in hardware nowadays and that prime numbers need to be computed here, I guess this solution is way slower. But it should give better results than the sqrt-based solution, which won't work above 2^54, as mrzl says in his answer.

琉璃瓶的回忆
#6 · 2018-12-31 05:45

If you do a binary chop to try to find the "right" square root, you can fairly easily detect if the value you've got is close enough to tell:

(n+1)^2 = n^2 + 2n + 1
(n-1)^2 = n^2 - 2n + 1

So having calculated n^2, the options are:

  • n^2 = target: done, return true
  • n^2 + 2n + 1 > target > n^2 : you're close, but it's not perfect: return false
  • n^2 - 2n + 1 < target < n^2 : ditto
  • target < n^2 - 2n + 1 : binary chop on a lower n
  • target > n^2 + 2n + 1 : binary chop on a higher n

(Sorry, this uses n as your current guess, and target for the parameter. Apologies for the confusion!)

I don't know whether this will be faster or not, but it's worth a try.

EDIT: The binary chop doesn't have to cover the whole range of integers, either: (2^x)^2 = 2^(2x), so once you've found the top set bit in your target (which can be done with a bit-twiddling trick; I forget exactly how) you can quickly get a range of potential answers. Mind you, a naive binary chop is still only going to take up to 31 or 32 iterations.
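
For reference, a Java sketch of the bounded chop: Long.numberOfLeadingZeros plays the role of the bit-twiddling trick, the closeness test folds into the ordinary comparisons, and the clamp constant (floor(sqrt(Long.MAX_VALUE))) is an added safeguard against overflow when squaring the guess:

static boolean isPerfectSquareChop(long n)
{
    if (n < 0) return false;
    if (n < 2) return true;

    // If n has b bits, its square root has about b/2 bits,
    // so the root lies between 2^((b-1)/2) and 2^((b+1)/2).
    int bits = 64 - Long.numberOfLeadingZeros(n);
    long lo = 1L << ((bits - 1) / 2);
    long hi = Math.min(1L << ((bits + 1) / 2), 3_037_000_499L);

    while (lo <= hi)
    {
        long guess = lo + ((hi - lo) >> 1);
        long sq = guess * guess;
        if (sq == n) return true;                        // exact hit
        if (sq < n) lo = guess + 1; else hi = guess - 1; // chop higher or lower
    }
    return false;   // no integer in the range squares to n
}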

泪湿衣
#7 · 2018-12-31 05:46

The sqrt call is not perfectly accurate, as has been mentioned, but it's interesting and instructive that it doesn't blow away the other answers in terms of speed. After all, the sequence of assembly-language instructions for a sqrt is tiny. Intel has a hardware instruction, which I believe Java doesn't use because it doesn't conform to IEEE 754.

So why is it slow? Because Java is actually calling a C routine through JNI, and it's slower to do so than to call a Java subroutine, which itself is slower than doing it inline. This is very annoying, and Java should have come up with a better solution, i.e. building in floating-point library calls if necessary. Oh well.

In C++, I suspect all the complex alternatives would lose on speed, but I haven't checked them all. What I did, and what Java people will find useful, is a simple hack, an extension of the special-case testing suggested by A. Rex: use a single long value as a bit array, which isn't bounds-checked. That way, you have a 64-bit boolean lookup.

#include <iostream>
using namespace std;

typedef unsigned long long UVLONG;

UVLONG pp1 = 0, pp2 = 0;

bool isPerfectSquare(UVLONG x);   // the original (inlined) full check, defined elsewhere

void init2() {
  for (int i = 0; i < 64; i++) {
    for (int j = 0; j < 64; j++)
      if (isPerfectSquare(i * 64 + j)) {
        pp1 |= (1ULL << j);   // 1ULL so the shift is done in 64 bits
        pp2 |= (1ULL << i);
        break;
      }
  }
  cout << "pp1=" << pp1 << "," << pp2 << "\n";
}

inline bool isPerfectSquare5(UVLONG x) {
  return pp1 & (1ULL << (x & 0x3F)) ? isPerfectSquare(x) : false;
}

The routine isPerfectSquare5 runs in about 1/3 the time on my Core 2 Duo machine. I suspect that further tweaks along the same lines could reduce the average time further, but every time you check, you are trading off more testing for more eliminating, so you can't go too much farther down that road.

Certainly, rather than having a separate test for negative, you could check the high 6 bits the same way.

Note that all I'm doing is eliminating possible squares, but when I have a potential case I have to call the original, inlined isPerfectSquare.

The init2 routine is called once to initialize the static values of pp1 and pp2. Note that in my implementation in C++, I'm using unsigned long long, so since your values are signed, you'd have to use the >>> operator.
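
A rough Java equivalent of the mask idea (this computes the residues of squares mod 64 directly rather than reproducing the pp1/pp2 construction; isPerfectSquareMasked and the Math.sqrt fallback are illustrative stand-ins, not the original code):

// Bit r of SQUARE_MASK is set exactly when r can be the low 6 bits of a perfect square.
private static final long SQUARE_MASK = buildMask();

private static long buildMask()
{
    long mask = 0;
    for (int i = 0; i < 64; i++)
        mask |= 1L << ((i * i) & 0x3F);   // mark the residues of i^2 mod 64
    return mask;
}

static boolean isPerfectSquareMasked(long n)
{
    if (n < 0) return false;
    if ((SQUARE_MASK & (1L << (n & 0x3F))) == 0) return false;  // most inputs rejected here
    long tst = (long) Math.sqrt(n);   // stand-in for the full check
    return tst * tst == n;
}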

There is no intrinsic need to bounds check the array, but Java's optimizer has to figure this stuff out pretty quickly, so I don't blame them for that.
