I am trying to optimize a small program which calculates perfect numbers from a given exponent.
The program runs (almost) perfectly, but when I open the task manager, it still runs on a single thread. That means that I must be doing something wrong, but my knowledge of F# is still in a 'beginning' phase.
I will try to put this question as clear as I possibly can, but if I fail in doing so, please let me know.
A perfect number is a number where the sum of all it's divisors (except for the number itself) is equal to the number itself (e.g. 6 is perfect, since the sum of it's divisors 1, 2 and 3 are 6).
I use prime numbers to speed up the calculation, that is I am not interested in (huge) lists where all the divisors are stored. To do so, I use the formula that Euclid proved to be correct: (2*(power of num - 1)) * ( 2* (power of num - 1)) where the latter is a Mersenne prime. I used a very fast algorithm from stackoverflow (by @Juliet) to determine whether a given number is a prime.
As I have been reading through several articles (I have not yet purchased a good book, so shame on me) on the Internet, I found out that sequences perform better than lists. So that is why I first started to create a function which generates a sequence of perfect numbers:
let perfectNumbersTwo (n : int) =
seq { for i in 1..n do
if (PowShift i) - 1I |> isPrime
then yield PowShift (i-1) * ((PowShift i)-1I)
}
The helperfunction PowShift is implemented as following:
let inline PowShift (exp:int32) = 1I <<< exp ;;
I use a bit shift operator, since the base of all power calculations is from 2, hence this could be an easy way. Of course I am still grateful for the contributions on the question I asked about this on: F# Power issues which accepts both arguments to be bigints>F# Power issues which accepts both arguments to be bigints
The function Juliet created (borrowed here) is as following:
let isPrime ( n : bigint) =
let maxFactor = bigint(sqrt(float n))
let rec loop testPrime tog =
if testPrime > maxFactor then true
elif n % testPrime = 0I then false
else loop (testPrime + tog) (6I - tog)
if n = 2I || n = 3I || n = 5I then true
elif n <= 1I || n % 2I = 0I || n % 3I = 0I || n % 5I = 0I then false
else loop 7I 4I;;
Using this code, without parallel, it takes about 9 minutes on my laptop to find the 9th perfect number (which consists of 37 digits, and can be found with value 31 for the exponent). Since my laptop has a CPU with two cores, and only one is running at 50 percent (full load for one core) I though that I could speed up the calculations by calculating the results parallel.
So I changed my perfectnumber function as following:
//Now the function again, but async for parallel computing
let perfectNumbersAsync ( n : int) =
async {
try
for x in 1.. n do
if PowShift x - 1I |> isPrime then
let result = PowShift (x-1) * ((PowShift x)-1I)
printfn "Found %A as a perfect number" result
with
| ex -> printfn "Error%s" (ex.Message);
}
To call this function, I use a small helper function to run it:
let runPerfects n =
[n]
|> Seq.map perfectNumbersAsync
|> Async.Parallel
|> Async.RunSynchronously
|> ignore
Where the result of async calculation is ignored, since I am displaying it within the perfectNumbersAsync function.
The code above compiles and it runs, however it still uses only one core (although it runs 10 seconds faster when calculating the 9th perfect number). I am afraid that it has to do something with the helper functions PowShift and isPrime, but I am not certain. Do I have to put the code of these helper functions within the async block of perfectNumbersAsync? It does not improve readability...
The more I play with F#, the more I learn to appreciate this language, but as with this case, I am in need of some experts sometimes :).
Thanks in advance for reading this, I only hope that I made myself a bit clear...
Robert.
One quick comment on speed and parallelisability,
Your
isPrime
is O(sqrt(n)), and each succesive n is about 2 x as big as the last one, so will take approx 1.5 x as long to calculate, which means that calculating the last numbers will take much longerI have done some hacking with testing for primality and some things I have found which are useful are:
For big N, (you are testing numbers with 20 digits), the prime density is actually quite low, so you will be doing alot of divisions by composite numbers. A better approach is to precalculate a table of primes (using a sieve) up to some maximum limit (probably determined by amount of memory). Note that you are most likely to find factors with small numbers. Once you run out of memory for your table, you can test the rest of the numbers with your existing function, with a larger starting point.
Another approach is to use multiple threads in the checking. For example, you currently check
x,x+4,x+6...
as factors. By being slightly cleverer, you can do the number congruent to 1 mod 3 in 1 thread and the numbers congruent to 2 mod 3 in another thread.No. 2 is simplest, but No. 1 is more effective, and provides potential for doing control flow with OutOfMemoryExceptions which can always be interesting
EDIT: So I implemented both of these ideas, it finds 2305843008139952128 almost instantly, finding 2658455991569831744654692615953842176 takes 7 minutes on my computer (quad core AMD 3200). Most of the time is spent checking 2^61 is prime, so a better algorithm would probably be better for checking the prime numbers: Code here
@Jeffrey Sax's comment is definitely interesting, so I took some time to do a small experiment. The Lucas-Lehmer test is written as follows:
With the Lucas-Lehmer test, I can get first few perfect numbers very fast:
The Lucas-Lehmer test helps to reduce the time checking prime numbers. Instead of testing divisibility of 2^p-1 which takes
O(sqrt(2^p-1))
, we use the primality test which is at mostO(p^3)
. Withn = 2048
, I am able to find first 15 Mersenne numbers in 7.83 seconds. The 15th Mersenne number is the one withi = 1279
and it consists of 770 digits.I tried to parallelize
runPerfects
using PSeq module in F# Powerpack. PSeq doesn't preserve the order of the original sequence, so to be fair I have sorted the output sequence. Since the primality test is quite balance among indices, the result is quite encouraging:With the same input, the parallel version took 2.28 seconds which is equivalent to 3.4x speedup on my quad-core machine. I believe the result could be improved further if you use
Parallel.For
construct and partition the input range sensibly.