I have trouble understanding this chunk of code:
let
sieve (p:xs) = p : sieve (filter (\ x -> x `mod` p /= 0) xs)
in sieve [2 .. ]
Can someone break it down for me? I understand there is recursion in it, but thats the problem I can't understand how the recursion in this example works.
Contrary to what others have stated here, this function does not implement the true sieve of Eratosthenes. It does returns an initial sequence of the prime numbers though, and in a similar manner, so it's okay to think of it as the sieve of Eratosthenes.
I was about done explaining the code when mipadi posted his answer; I could delete it, but since I spent some time on it, and because our answers are not completely identical, I'll leave it here.
Firs of all, note that there are some syntax errors in the code you posted. The correct code is,
let x in y
definesx
and allows its definition to be used iny
. The result of this expression is the result ofy
. So in this case we define a functionsieve
and return the result of applying[2..]
tosieve
.Now let us have a closer look at the
let
part of this expression:sieve
which takes a list as its first argument.(p:xs)
is a pattern which matchesp
with the head of said list andxs
with the tail (everything but the head).p : xs
is a list whose first element isp
.xs
is a list containing the remaining elements. Thus,sieve
returns the first element of the list it receives.Not look at the remainder of the list:
sieve
is being called recursively. Thus, thefilter
expression will return a list.filter
takes a filter function and a list. It returns only those elements in the list for which the filter function returns true.In this case
xs
is the list being filtered andis the filter function.
x
and returns true iff it is not a multiple ofp
.Now that
sieve
is defined, we pass it[2 .. ]
, the list of all natural numbers starting at 2. Thus,It says "the sieve of some list is the first element of the list (which we'll call p) and the sieve of the rest of the list, filtered such that only elements not divisible by p are allowed through". It then gets things started by by returning the sieve of all integers from 2 to infinity (which is 2 followed by the sieve of all integers not divisible by 2, etc.).
I recommend The Little Schemer before you attack Haskell.
It's implementing the Sieve of Eratosthenes
Basically, start with a prime (2), and filter out from the rest of the integers, all multiples of two. The next number in that filtered list must also be a prime, and therefore filter all of its multiples from the remaining, and so on.
It defines a generator - a stream transformer called "sieve",
which uses a curried form of an anonymous function equivalent to
Both
Sieve
andFilter
are data-constructing operations with internal state and by-value argument passing semantics.Here we can see that the most glaring problem of this code is not, repeat not that it uses trial division to filter out the multiples from the working sequence, whereas it could find them out directly, by counting up in increments of
p
. If we were to replace the former with the latter, the resulting code would still have abysmal run-time complexity.No, its most glaring problem is that it puts a
Filter
on top of its working sequence too soon, when it should really do that only after the prime's square is seen in the input. As a result it creates a quadratic number ofFilter
s compared to what's really needed. The chain ofFilter
s it creates is too long, and most of them aren't even needed at all.The corrected version, with the filter creation postponed until the proper moment, is
or in Haskell,
rem
is used here instead ofmod
as it can be much faster in some interpreters, and the numbers are all positive here anyway.Measuring the local orders of growth of an algorithm by taking its run times
t1,t2
at problem-size pointsn1,n2
, aslogBase (n2/n1) (t2/t1)
, we getO(n^2)
for the first one, and just aboveO(n^1.4)
for the second (inn
primes produced).Just to clarify it, the missing parts could be defined in this (imaginary) language simply as
see also.
update: Curiously, the first instance of this code in David Turner's 1976 SASL manual according to A.J.T. Davie's 1992 Haskell book,
actually admits two pairs of implementations for
remove
andmultsof
going together -- one pair for the trial division sieve as in this question, and the other for the ordered removal of each prime's multiples directly generated by counting, aka the genuine sieve of Eratosthenes (both would be non-postponed, of course). In Haskell,If only he'd postponed picking the actual implementation...
As for the postponed code, in a pseudocode with "list patterns" it could be
It's actually pretty elegant.
First, we define a function
sieve
that takes a list of elements:In the body of
sieve
, we take the head of the list (because we're passing the infinite list[2..]
, and 2 is defined to be prime) and append it (lazily!) to the result of applyingsieve
to the rest of the list:So let's look at the code that does the work on the rest of the list:
We're applying
sieve
to the filtered list. Let's break down what the filter part does:filter
takes a function and a list on which we apply that function, and retains elements that meet the criteria given by the function. In this case,filter
takes an anonymous function:This anonymous function takes one argument,
x
. It checks the modulus ofx
againstp
(the head of the list, every timesieve
is called):If the modulus is not equal to 0:
Then the element
x
is kept in the list. If it is equal to 0, it's filtered out. This makes sense: ifx
is divisible byp
, thanx
is divisible by more than just 1 and itself, and thus it is not prime.