Arcane isPrime method in Java

2020-02-08 18:54发布

问题:

Consider the following method:

public static boolean isPrime(int n) {
    return ! (new String(new char[n])).matches(".?|(..+?)\\1+");
}

I've never been a regular expression guru, so can anyone fully explain how this method actually works? Furthermore, is it efficient compared to other possible methods for determining whether an integer is prime?

回答1:

First, note that this regex applies to numbers represented in a unary counting system, i.e.

1       is 1
11      is 2
111     is 3
1111    is 4
11111   is 5
111111  is 6
1111111 is 7

and so on. Really, any character can be used (hence the .s in the expression), but I'll use "1".

Second, note that this regex matches composite (non-prime) numbers; thus negation detects primality.


Explanation:

The first half of the expression,

.?

says that the strings "" (0) and "1" (1) are matches, i.e. not prime (by definition, though arguable.)

The second half, in simple English, says:

Match the shortest string whose length is at least 2, for example, "11" (2). Now, see if we can match the entire string by repeating it. Does "1111" (4) match? Does "111111" (6) match? Does "11111111" (8) match? And so on. If not, then try it again for the next shortest string, "111" (3). Etc.

You can now see how, if the original string can't be matched as a multiple of its substrings, then by definition, it's prime!

BTW, the non-greedy operator ? is what makes the "algorithm" start from the shortest and count up.


Efficiency:

It's interesting, but certainly not efficient, by various arguments, some of which I'll consolidate below:

  • As @TeddHopp notes, the well-known sieve-of-Eratosthenes approach would not bother to check multiples of integers such as 4, 6, and 9, having been "visited" already while checking multiples of 2 and 3. Alas, this regex approach exhaustively checks every smaller integer.

  • As @PetarMinchev notes, we can "short-circuit" the multiples-checking scheme once we reach the square root of the number. We should be able to because a factor greater than the square root must partner with a factor lesser than the square root (since otherwise two factors greater than the square root would produce a product greater than the number), and if this greater factor exists, then we should have already encountered (and thus, matched) the lesser factor.

  • As @Jesper and @Brian note with concision, from a non-algorithmic perspective, consider how a regular expression would begin by allocating memory to store the string, e.g. char[9000] for 9000. Well, that was easy, wasn't it? ;)

  • As @Foon notes, there exist probabilistic methods which may be more efficient for larger numbers, though they may not always be correct (turning up pseudoprimes instead). But also there are deterministic tests that are 100% accurate and far more efficient than sieve-based methods. Wolfram's has a nice summary.



回答2:

The unary characteristics of primes and why this works has already been covered. So here's a test using conventional approaches and this approach:

public class Main {

    public static void main(String[] args) {
        long time = System.nanoTime();
        for (int i = 2; i < 10000; i++) {
            isPrimeOld(i);
        }
        time = System.nanoTime() - time;
        System.out.println(time + " ns (" + time / 1000000 + " ms)");
        time = System.nanoTime();
        for (int i = 2; i < 10000; i++) {
            isPrimeRegex(i);
        }
        time = System.nanoTime() - time;
        System.out.println(time + " ns (" + time / 1000000 + " ms)");
        System.out.println("Done");
    }

    public static boolean isPrimeRegex(int n) {
        return !(new String(new char[n])).matches(".?|(..+?)\\1+");
    }

    public static boolean isPrimeOld(int n) {
        if (n == 2)
            return true;
        if (n < 2)
            return false;
        if ((n & 1) == 0)
            return false;
        int limit = (int) Math.round(Math.sqrt(n));
        for (int i = 3; i <= limit; i += 2) {
            if (n % i == 0)
                return false;
        }
        return true;
    }
}

This test computes whether or not the number is prime up to 9,999, starting from 2. And here's its output on a relatively powerful server:

8537795 ns (8 ms)
30842526146 ns (30842 ms)
Done

So it is grossly inefficient once the numbers get large enough. (For up to 999, the regex runs in about 400 ms.) For small numbers, it's fast, but it's still faster to generate the primes up to 9,999 the conventional way than it is to even generate primes up to 99 the old way (23 ms).



回答3:

This is not a really efficient way to check if a number is prime(it checks every divisor).

An efficient way is to check for divisors up to sqrt(number). This is if you want to be certain if a number is prime. Otherwise there are probabilistic primality checks which are faster, but not 100% correct.