I was just experimenting a bit with (for me) a new programming language: clojure. And I wrote a quite naive 'sieve' implementation, which I then tried to optimise a bit.
Strangely enough though (for me at least), the new implementation wasn't faster, but much slower...
Can anybody provide some insight in why this is so much slower?
I'm also interested in other tips in how to improve this algorithm...
Best regards,
Arnaud Gouder
; naive sieve.
(defn sieve
([max] (sieve max (range 2 max) 2))
([max candidates n]
(if (> (* n n) max)
candidates
(recur max (filter #(or (= % n) (not (= (mod % n) 0))) candidates) (inc n)))))
; Instead of just passing the 'candidates' list, from which I sieve-out the non-primes,
; I also pass a 'primes' list, with the already found primes
; I hoped that this would increase the speed, because:
; - Instead of sieving-out multiples of 'all' numbers, I now only sieve-out the multiples of primes.
; - The filter predicate now becomes simpler.
; However, this code seems to be approx 20x as slow.
; Note: the primes in 'primes' end up reversed, but I don't care (much). Adding a 'reverse' call makes it even slower :-(
(defn sieve2
([max] (sieve2 max () (range 2 max)))
([max primes candidates]
(let [n (first candidates)]
(if (> (* n n) max)
(concat primes candidates)
(recur max (conj primes n) (filter #(not (= (mod % n) 0)) (rest candidates)))))))
; Another attempt to speed things up. Instead of sieving-out multiples of all numbers in the range,
; I want to sieve-out only multiples of primes.. I don't like the '(first (filter ' construct very much...
; It doesn't seem to be faster than 'sieve'.
(defn sieve3
([max] (sieve max (range 2 max) 2))
([max candidates n]
(if (> (* n n) max)
candidates
(let [new_candidates (filter #(or (= % n) (not (= (mod % n) 0))) candidates)]
(recur max new_candidates (first (filter #(> % n) new_candidates)))))))
(time (sieve 10000000))
(time (sieve 10000000))
(time (sieve2 10000000))
(time (sieve2 10000000))
(time (sieve2 10000000))
(time (sieve 10000000)) ; Strange, speeds are very different now... Must be some memory allocation thing caused by running sieve2
(time (sieve 10000000))
(time (sieve3 10000000))
(time (sieve3 10000000))
(time (sieve 10000000))
I have good news and bad news. The good news is that your intuitions are correct.
The bad news is that both are much slower than you think
What's happening is that because filter is lazy, the filtering isn't getting done until the answers need to be printed. All the first expression is counting is the time to wrap the sequence in a load of filters. Putting the count in means that the sequence actually has to be calculated within the timing expression, and then you see how long it really takes.
I think in the case without the count, sieve2 is taking longer because it is doing a bit of the work whilst constructing the filtered sequence.
When you put the count in, sieve2 is faster because it's the better algorithm.
P.S. When I try (time (sieve 10000000)), my machine crashes with a stack overflow, presumably because of the vast stack of nested filter calls it's building up. How come it ran for you?
Some optimization tips for this kind of Primative number heavy math:
clonjure 1.3 allows un-boxed-checked-arithmetic so you wont be casting everything to Integer.
filter
will require boxing of all the numbers.You're likely loosing enough time in boxing/unboxing Integers/ints that it will make it hard to really judge the different optimizations. If you type hint (and use clojure 1.3) then you will likely get better numbers to judge your optimizations.