Just to clarify, this is not a homework problem :)
I wanted to find primes for a math application I am building & came across Sieve of Eratosthenes approach.
I have written an implementation of it in Python. But it's terribly slow. For say, if I want to find all primes less than 2 million. It takes > 20 mins. (I stopped it at this point). How can I speed this up?
def primes_sieve(limit):
limitn = limit+1
primes = range(2, limitn)
for i in primes:
factors = range(i, limitn, i)
for f in factors[1:]:
if f in primes:
primes.remove(f)
return primes
print primes_sieve(2000)
UPDATE: I ended up doing profiling on this code & found that quite a lot of time was spent on removing an element from the list. Quite understandable considering it has to traverse the entire list (worst-case) to find the element & then remove it and then readjust the list (maybe some copy goes on?). Anyway, I chucked out list for dictionary. My new implementation -
def primes_sieve1(limit):
limitn = limit+1
primes = dict()
for i in range(2, limitn): primes[i] = True
for i in primes:
factors = range(i,limitn, i)
for f in factors[1:]:
primes[f] = False
return [i for i in primes if primes[i]==True]
print primes_sieve1(2000000)
You're not quite implementing the correct algorithm:
In your first example,
primes_sieve
doesn't maintain a list of primality flags to strike/unset (as in the algorithm), but instead resizes a list of integers continuously, which is very expensive: removing an item from a list requires shifting all subsequent items down by one.In the second example,
primes_sieve1
maintains a dictionary of primality flags, which is a step in the right direction, but it iterates over the dictionary in undefined order, and redundantly strikes out factors of factors (instead of only factors of primes, as in the algorithm). You could fix this by sorting the keys, and skipping non-primes (which already makes it an order of magnitude faster), but it's still much more efficient to just use a list directly.The correct algorithm (with a list instead of a dictionary) looks something like:
(Note that this also includes the algorithmic optimization of starting the non-prime marking at the prime's square (
i*i
) instead of its double.)I prefer NumPy because of speed.
Check the output:
Compare the speed of Sieve of Eratosthenes and brute-force on Jupyter Notebook. Sieve of Eratosthenes in 539 times faster than brute-force for million elements.
I realise this isn't really answering the question of how to generate primes quickly, but perhaps some will find this alternative interesting: because python provides lazy evaluation via generators, eratosthenes' sieve can be implemented exactly as stated:
The try block is there because the algorithm runs until it blows the stack and without the try block the backtrace is displayed pushing the actual output you want to see off screen.
Much faster:
Here's a version that's a bit more memory-efficient (and: a proper sieve, not trial divisions). Basically, instead of keeping an array of all the numbers, and crossing out those that aren't prime, this keeps an array of counters - one for each prime it's discovered - and leap-frogging them ahead of the putative prime. That way, it uses storage proportional to the number of primes, not up to to the highest prime.
You'll note that
primes()
is a generator, so you can keep the results in a list or you can use them directly. Here's the firstn
primes:And, for completeness, here's a timer to measure the performance:
Just in case you're wondering, I also wrote
primes()
as a simple iterator (using__iter__
and__next__
), and it ran at almost the same speed. Surprised me too!My implementation: