When Generating Primes in F#, why is the “Sieve of

IE,

What am I doing wrong here? Does it have to to with lists, sequences and arrays and the way the limitations work?

So here is the setup: I'm trying to generate some primes. I see that there are a billion text files of a billion primes. The question isn't why...the question is how are the guys using python calculating all of the primes below 1,000,000 in milliseconds on this post...and what am I doing wrong with the following F# code?

let sieve_primes2 top_number = 
    let numbers = [ for i in 2 .. top_number do yield i ]
    let sieve (n:int list) = 
        match n with
        | [x] -> x,[]
        | hd :: tl -> hd, List.choose(fun x -> if x%hd = 0 then None else Some(x)) tl
        | _ -> failwith "Pernicious list error."
    let rec sieve_prime (p:int list) (n:int list) =  
        match (sieve n) with
        | i,[] -> i::p
        | i,n'  -> sieve_prime (i::p) n'
    sieve_prime [1;0] numbers

With the timer on in FSI, I get 4.33 seconds worth of CPU for 100000... after that, it all just blows up.

标签： f#

4条回答

Rolldiameter

2楼-- · 2019-01-20 02:28

There have been several good answers both as to general trial division algorithm using lists (@pad) and in choice of an array for a sieving data structure using the Sieve of Eratosthenes (SoE) (@John Palmer and @Jon Harrop). However, @pad's list algorithm isn't particularly fast and will "blow up" for larger sieving ranges and @John Palmer's array solution is somewhat more complex, uses more memory than necessary, and uses external mutable state so is not different than if the program were written in an imperative language such as C#.

EDIT_ADD: I've edited the below code (old code with line comments) modifying the sequence expression to avoid some function calls so as to reflect more of an "iterator style" and while it saved 20% of the speed it still doesn't come close to that of a true C# iterator which is about the same speed as the "roll your own enumerator" final F# code. I've modified the timing information below accordingly. END_EDIT

The following true SoE program only uses 64 KBytes of memory to sieve primes up to a million (due to only considering odd numbers and using the packed bit BitArray) and still is almost as fast as @John Palmer's program at about 40 milliseconds to sieve to one million on a i7 2700K (3.5 GHz), with only a few lines of code:

open System.Collections
let primesSoE top_number=
  let BFLMT = int((top_number-3u)/2u) in let buf = BitArray(BFLMT+1,true)
  let SQRTLMT = (int(sqrt (double top_number))-3)/2
  let rec cullp i p = if i <= BFLMT then (buf.[i] <- false; cullp (i+p) p)
  for i = 0 to SQRTLMT do if buf.[i] then let p = i+i+3 in cullp (p*(i+1)+i) p
  seq { for i = -1 to BFLMT do if i<0 then yield 2u
                               elif buf.[i] then yield uint32(3+i+i) }
//  seq { yield 2u; yield! seq { 0..BFLMT } |> Seq.filter (fun i->buf.[i])
//                                          |> Seq.map (fun i->uint32 (i+i+3)) }
primesSOE 1000000u |> Seq.length;;

Almost all of the elapsed time is spent in the last two lines enumerating the found primes due to the inefficiency of the sequence run time library as well as the cost of enumerating itself at about 28 clock cycles per function call and return with about 16 function calls per iteration. This could be reduced to only a few function calls per iteration by rolling our own iterator, but the code is not as concise; note that in the following code there is no mutable state exposed other than the contents of the sieving array and the reference variable necessary for the iterator implementation using object expressions:

open System
open System.Collections
open System.Collections.Generic
let primesSoE top_number=
  let BFLMT = int((top_number-3u)/2u) in let buf = BitArray(BFLMT+1,true)
  let SQRTLMT = (int(sqrt (double top_number))-3)/2
  let rec cullp i p = if i <= BFLMT then (buf.[i] <- false; cullp (i+p) p)
  for i = 0 to SQRTLMT do if buf.[i] then let p = i+i+3 in cullp (p*(i+1)+i) p
  let nmrtr() =
    let i = ref -2
    let rec nxti() = i:=!i+1;if !i<=BFLMT && not buf.[!i] then nxti() else !i<=BFLMT
    let inline curr() = if !i<0 then (if !i= -1 then 2u else failwith "Enumeration not started!!!")
                        else let v = uint32 !i in v+v+3u
    { new IEnumerator<_> with
        member this.Current = curr()
      interface IEnumerator with
        member this.Current = box (curr())
        member this.MoveNext() = if !i< -1 then i:=!i+1;true else nxti()
        member this.Reset() = failwith "IEnumerator.Reset() not implemented!!!"
      interface IDisposable with
        member this.Dispose() = () }
  { new IEnumerable<_> with
      member this.GetEnumerator() = nmrtr()
    interface IEnumerable with
      member this.GetEnumerator() = nmrtr() :> IEnumerator }
primesSOE 1000000u |> Seq.length;;

The above code takes about 8.5 milliseconds to sieve the primes to a million on the same machine due to greatly reducing the number of function calls per iteration to about three from about 16. This is about the same speed as C# code written in the same style. It's too bad that F#'s iterator style as I used in the first example doesn't automatically generate the IEnumerable boiler plate code as C# iterators do, but I guess that is the intention of sequences - just that they are so damned inefficient as to speed performance due to being implemented as sequence computation expressions.

Now, less than half of the time is expended in enumerating the prime results for a much better use of CPU time.

0人赞添加讨论(0) 举报

Bombasti

3楼-- · 2019-01-20 02:37

Your sieve function is slow because you tried to filter out composite numbers up to top_number. With Sieve of Eratosthenes, you only need to do so until sqrt(top_number) and remaining numbers are inherently prime. Suppose we havetop_number = 1,000,000, your function does 78498 rounds of filtering (the number of primes until 1,000,000) while the original sieve only does so 168 times (the number of primes until 1,000).

You can avoid generating even numbers except 2 which cannot be prime from the beginning. Moreover, sieve and sieve_prime can be merged into a recursive function. And you could use lightweight List.filter instead of List.choose.

Incorporating above suggestions:

let sieve_primes top_number = 
    let numbers = [ yield 2
                    for i in 3..2..top_number -> i ]
    let rec sieve ns = 
        match ns with
        | [] -> []
        | x::xs when x*x > top_number -> ns
        | x::xs -> x::sieve (List.filter(fun y -> y%x <> 0) xs)
    sieve numbers

In my machine, the updated version is very fast and it completes within 0.6s for top_number = 1,000,000.

0人赞添加讨论(0) 举报

萌系小妹纸

4楼-- · 2019-01-20 02:43

What am I doing wrong here?

You've implemented a different algorithm that goes through each possible value and uses % to determine if it needs to be removed. What you're supposed to be doing is stepping through with a fixed increment removing multiples. That would be asymptotically.

You cannot step through lists efficiently because they don't support random access so use arrays.

0人赞添加讨论(0) 举报

祖国的老花朵

5楼-- · 2019-01-20 02:49

Based on my code here: stackoverflow.com/a/8371684/124259

Gets the first 1 million primes in 22 milliseconds in fsi - a significant part is probably compiling the code at this point.

#time "on"

let limit = 1000000
//returns an array of all the primes up to limit
let table =
    let table = Array.create limit true //use bools in the table to save on memory
    let tlimit = int (sqrt (float limit)) //max test no for table, ints should be fine
    let mutable curfactor = 1;
    while curfactor < tlimit-2 do
        curfactor <- curfactor+2
        if table.[curfactor]  then //simple optimisation
            let mutable v = curfactor*2
            while v < limit do
                table.[v] <- false
                v <- v + curfactor
    let out = Array.create (100000) 0 //this needs to be greater than pi(limit)
    let mutable idx = 1
    out.[0]<-2
    let mutable curx=1
    while curx < limit-2 do
        curx <- curx + 2
        if table.[curx] then
            out.[idx]<-curx
            idx <- idx+1
    out

0人赞添加讨论(0) 举报

When Generating Primes in F#, why is the “Sieve of

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间