A Fast Prime Number Sieve in Python

2019-01-19 04:48发布

I have been going through prime number generation in python using the sieve of Eratosthenes and the solutions which people tout as a relatively fast option such as those in a few of the answers to a question on optimising prime number generation in python are not straightforward and the simple implementation which I have here rivals them in efficiency. My implementation is given below

def sieve_for_primes_to(n):
    size = n//2
    sieve = [1]*size
    limit = int(n**0.5)
    for i in range(1,limit):
        if sieve[i]:
            val = 2*i+1
            tmp = ((size-1) - i)//val 
            sieve[i+val::val] = [0]*tmp
    return sieve


print [2] + [i*2+1 for i, v in enumerate(sieve_for_primes_to(10000000)) if v and i>0]

Timing the execution returns

python -m timeit -n10 -s "import euler" "euler.sieve_for_primes_to(1000000)"
10 loops, best of 3: 19.5 msec per loop

While the method described in the answer to the above linked question as being the fastest from the python cookbook is given below

import itertools
def erat2( ):
    D = {  }
    yield 2
    for q in itertools.islice(itertools.count(3), 0, None, 2):
        p = D.pop(q, None)
        if p is None:
            D[q*q] = q
            yield q
        else:
            x = p + q
            while x in D or not (x&1):
                x += p
            D[x] = p

def get_primes_erat(n):
  return list(itertools.takewhile(lambda p: p<n, erat2()))

When run it gives

python -m timeit -n10 -s "import euler" "euler.get_primes_erat(1000000)"
10 loops, best of 3: 697 msec per loop

My question is why do people tout the above from the cook book which is relatively complex as the ideal prime generator?

2条回答
在下西门庆
2楼-- · 2019-01-19 05:44

I transformed your code to fit into the prime sieve comparison script of @unutbu at Fastest way to list all primes below N as follows:

def sieve_for_primes_to(n):
    size = n//2
    sieve = [1]*size
    limit = int(n**0.5)
    for i in range(1,limit):
        if sieve[i]:
            val = 2*i+1
            tmp = ((size-1) - i)//val 
            sieve[i+val::val] = [0]*tmp
    return [2] + [i*2+1 for i, v in enumerate(sieve) if v and i>0]

On my MBPro i7 the script is fast calculating all primes < 1000000 but actually 1.5 times slower than rwh_primes2, rwh_primes1 (1.2), rwh_primes (1.19) and primeSieveSeq (1.12) (@andreasbriese at the page end).

查看更多
欢心
3楼-- · 2019-01-19 05:49

You should only use the "postponed" variant of that algorithm. Comparing your code test run up to 10 and 20 mln upper limit, as

...
print(len( [2] + [i*2+1 for i, v in 
  enumerate(sieve_for_primes_to(10000000)) if v and i>0]))

with the other one, run at the corresponding figures of 664579 and 1270607 primes to produce, as

...
print( list( islice( (p for p in postponed_sieve() ), n-1, n+1))) 

shows your code running "only" 3.1x...3.3x times faster. :) Not 36x times faster, as your timings show for some reason.

I don't think anyone ever claimed it's an "ideal" prime generator, just that it is a conceptually clean and clear one. All these prime generation functions are toys really, the real stuff is working with the very big numbers, using completely different algorithms anyway.

Here in the low range, what matters is the time complexity of the algorithm, which should be around ~ n^(1+a), a < 0.1...0.2 empirical orders of growth, which both of them seem to be indeed. Having a toy generator with ~ n^1.5 or even ~ n^2 orders of growth is just no fun to play with.

查看更多
登录 后发表回答