Is this an optimal prime generator?

2019-08-11 00:02发布

问题:

Is this in any way an optimal solution for finding primes? I am not trying to add every optimization under the sun, but is the principal good?

def primesUpto(self, x):
    primes = [2]
    sieve = [2]
    i = 3
    while i <= x:
        composite = False
        j = 0
        while j < len(sieve):
            sieve[j] = sieve[j] - 1
            if sieve[j] == 0:
                composite = True
                sieve[j] = primes[j]
            j += 1
        if not composite:
            primes.append(i)
            sieve.append(i*i-i)
        i += 1
    return primes

回答1:

Hmm, very interesting. Your code is actual honest to goodness genuine sieve of Eratosthenes IMHO, counting its way along the ascending natural numbers by decrementing each counter that it sets up for each prime encountered, by 1 on each step.

And it is very inefficient. Tested on Ideone it runs at the same empirical order of growth ~ n^2.2 (at the tested range of few thousand primes produced) as the famously inefficient Turner's trial division sieve (in Haskell).

Why? Several reasons. First, no early bailout in your test: when you detect it's a composite, you continue processing the array of counters, sieve. You have to, because of the second reason: you count the difference by decrementing each counter by 1 on each step, with 0 representing your current position. This is the most faithful expression of the original sieve IMHO, and it is very inefficient: today our CPUs know how to add numbers in O(1) time (if those numbers belong to a certain range, 0 .. 2^32, or 0 .. 2^64, of course).

Moreover, our computers also have direct access memory now, and having calculated the far-off number we can mark it in a random access array. Which is the foundation of the efficiency of the sieve of Eratosthenes on modern computers - both the direct calculation, and the direct marking of multiples.

And third, perhaps the most immediate reason for inefficiency, is the premature handling of the multiples: when you encounter 5 as a prime, you add its first multiple (not yet encountered) i.e. 25, right away into the array of counters, sieve (i.e. the distance between the current point and the multiple, i*i-i). That is much too soon. The addition of 25 must be postponed until ... well, until we encounter 25 among the ascending natural numbers. Starting to handle the multiples of each prime prematurely (at p instead of p*p) leads to having way too many counters to maintain - O(n) of them (where n is the number of primes produced), instead of just O(π(sqrt(n log n))) = O(sqrt(n / log n)).

The postponement optimization when applied on a similar "counting" sieve in Haskell brought its empirical orders of growth from ~ n^2.3 .. 2.6 for n = 1000 .. 6000 primes down to just above ~ n^1.5 (with obviously enormous gains in speed). When counting was further replaced by direct addition, the resulting measured empirical orders of growth were ~ n^1.2 .. 1.3 in producing up to hlaf a million primes (although in all probability it would gain on ~ n^1.5 for bigger ranges).