I have taken Problem #12 from Project Euler as a programming exercise and to compare my (surely not optimal) implementations in C, Python, Erlang and Haskell. In order to get some higher execution times, I search for the first triangle number with more than 1000 divisors instead of 500 as stated in the original problem.
The result is the following:
C:
lorenzo@enzo:~/erlang$ gcc -lm -o euler12.bin euler12.c
lorenzo@enzo:~/erlang$ time ./euler12.bin
842161320
real 0m11.074s
user 0m11.070s
sys 0m0.000s
Python:
lorenzo@enzo:~/erlang$ time ./euler12.py
842161320
real 1m16.632s
user 1m16.370s
sys 0m0.250s
Python with PyPy:
lorenzo@enzo:~/Downloads/pypy-c-jit-43780-b590cf6de419-linux64/bin$ time ./pypy /home/lorenzo/erlang/euler12.py
842161320
real 0m13.082s
user 0m13.050s
sys 0m0.020s
Erlang:
lorenzo@enzo:~/erlang$ erlc euler12.erl
lorenzo@enzo:~/erlang$ time erl -s euler12 solve
Erlang R13B03 (erts-5.7.4) [source] [64-bit] [smp:4:4] [rq:4] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.7.4 (abort with ^G)
1> 842161320
real 0m48.259s
user 0m48.070s
sys 0m0.020s
Haskell:
lorenzo@enzo:~/erlang$ ghc euler12.hs -o euler12.hsx
[1 of 1] Compiling Main ( euler12.hs, euler12.o )
Linking euler12.hsx ...
lorenzo@enzo:~/erlang$ time ./euler12.hsx
842161320
real 2m37.326s
user 2m37.240s
sys 0m0.080s
Summary:
- C: 100%
- Python: 692% (118% with PyPy)
- Erlang: 436% (135% thanks to RichardC)
- Haskell: 1421%
I suppose that C has a big advantage as it uses long for the calculations and not arbitrary length integers as the other three. Also it doesn't need to load a runtime first (Do the others?).
Question 1:
Do Erlang, Python and Haskell lose speed due to using arbitrary length integers or don't they as long as the values are less than MAXINT
?
Question 2: Why is Haskell so slow? Is there a compiler flag that turns off the brakes or is it my implementation? (The latter is quite probable as Haskell is a book with seven seals to me.)
Question 3: Can you offer me some hints how to optimize these implementations without changing the way I determine the factors? Optimization in any way: nicer, faster, more "native" to the language.
EDIT:
Question 4: Do my functional implementations permit LCO (last call optimization, a.k.a tail recursion elimination) and hence avoid adding unnecessary frames onto the call stack?
I really tried to implement the same algorithm as similar as possible in the four languages, although I have to admit that my Haskell and Erlang knowledge is very limited.
Source codes used:
#include <stdio.h>
#include <math.h>
int factorCount (long n)
{
double square = sqrt (n);
int isquare = (int) square;
int count = isquare == square ? -1 : 0;
long candidate;
for (candidate = 1; candidate <= isquare; candidate ++)
if (0 == n % candidate) count += 2;
return count;
}
int main ()
{
long triangle = 1;
int index = 1;
while (factorCount (triangle) < 1001)
{
index ++;
triangle += index;
}
printf ("%ld\n", triangle);
}
#! /usr/bin/env python3.2
import math
def factorCount (n):
square = math.sqrt (n)
isquare = int (square)
count = -1 if isquare == square else 0
for candidate in range (1, isquare + 1):
if not n % candidate: count += 2
return count
triangle = 1
index = 1
while factorCount (triangle) < 1001:
index += 1
triangle += index
print (triangle)
-module (euler12).
-compile (export_all).
factorCount (Number) -> factorCount (Number, math:sqrt (Number), 1, 0).
factorCount (_, Sqrt, Candidate, Count) when Candidate > Sqrt -> Count;
factorCount (_, Sqrt, Candidate, Count) when Candidate == Sqrt -> Count + 1;
factorCount (Number, Sqrt, Candidate, Count) ->
case Number rem Candidate of
0 -> factorCount (Number, Sqrt, Candidate + 1, Count + 2);
_ -> factorCount (Number, Sqrt, Candidate + 1, Count)
end.
nextTriangle (Index, Triangle) ->
Count = factorCount (Triangle),
if
Count > 1000 -> Triangle;
true -> nextTriangle (Index + 1, Triangle + Index + 1)
end.
solve () ->
io:format ("~p~n", [nextTriangle (1, 1) ] ),
halt (0).
factorCount number = factorCount' number isquare 1 0 - (fromEnum $ square == fromIntegral isquare)
where square = sqrt $ fromIntegral number
isquare = floor square
factorCount' number sqrt candidate count
| fromIntegral candidate > sqrt = count
| number `mod` candidate == 0 = factorCount' number sqrt (candidate + 1) (count + 2)
| otherwise = factorCount' number sqrt (candidate + 1) count
nextTriangle index triangle
| factorCount triangle > 1000 = triangle
| otherwise = nextTriangle (index + 1) (triangle + index + 1)
main = print $ nextTriangle 1 1
Your Haskell implementation could be greatly sped up by using some functions from Haskell packages. In this case I used primes, which is just installed with 'cabal install primes' ;)
Timings:
Your original program:
Improved implementation
As you can see, this one runs in 38 milliseconds on the same machine where yours ran in 16 seconds :)
Compilation commands:
Question one can be answered in the negative for Erlang. The last question is answered by using Erlang appropriately, as in:
http://bredsaal.dk/learning-erlang-using-projecteuler-net
Since it's faster than your initial C example, I would guess there are numerous problems as others have already covered in detail.
This Erlang module executes on a cheap netbook in about 5 seconds ... It uses the network threads model in erlang and, as such demonstrates how to take advantage of the event model. It could be distributed over many nodes. And it's fast. Not my code.
The test below took place on an: Intel(R) Atom(TM) CPU N270 @ 1.60GHz
There are some problems with the Erlang implementation. As baseline for the following, my measured execution time for your unmodified Erlang program was 47.6 seconds, compared to 12.7 seconds for the C code.
The first thing you should do if you want to run computationally intensive Erlang code is to use native code. Compiling with
erlc +native euler12
got the time down to 41.3 seconds. This is however a much lower speedup (just 15%) than expected from native compilation on this kind of code, and the problem is your use of-compile(export_all)
. This is useful for experimentation, but the fact that all functions are potentially reachable from the outside causes the native compiler to be very conservative. (The normal BEAM emulator is not that much affected.) Replacing this declaration with-export([solve/0]).
gives a much better speedup: 31.5 seconds (almost 35% from the baseline).But the code itself has a problem: for each iteration in the factorCount loop, you perform this test:
The C code doesn't do this. In general, it can be tricky to make a fair comparison between different implementations of the same code, and in particular if the algorithm is numerical, because you need to be sure that they are actually doing the same thing. A slight rounding error in one implementation due to some typecast somewhere may cause it to do many more iterations than the other even though both eventually reach the same result.
To eliminate this possible error source (and get rid of the extra test in each iteration), I rewrote the factorCount function as follows, closely modelled on the C code:
This rewrite, no
export_all
, and native compilation, gave me the following run time:which is not too bad compared to the C code:
considering that Erlang is not at all geared towards writing numerical code, being only 50% slower than C on a program like this is pretty good.
Finally, regarding your questions:
Question 1: Do erlang, python and haskell loose speed due to using arbitrary length integers or don't they as long as the values are less than MAXINT?
Yes, somewhat. In Erlang, there is no way of saying "use 32/64-bit arithmetic with wrap-around", so unless the compiler can prove some bounds on your integers (and it usually can't), it must check all computations to see if they can fit in a single tagged word or if it has to turn them into heap-allocated bignums. Even if no bignums are ever used in practice at runtime, these checks will have to be performed. On the other hand, that means you know that the algorithm will never fail because of an unexpected integer wraparound if you suddenly give it larger inputs than before.
Question 4: Do my functional implementations permit LCO and hence avoid adding unnecessary frames onto the call stack?
Yes, your Erlang code is correct with respect to last call optimization.
With Haskell, you really don't need to think in recursions explicitly.
In the above code, I have replaced explicit recursions in @Thomas' answer with common list operations. The code still does exactly the same thing without us worrying about tail recursion. It runs (~ 7.49s) about 6% slower than the version in @Thomas' answer (~ 7.04s) on my machine with GHC 7.6.2, while the C version from @Raedwulf runs ~ 3.15s. It seems GHC has improved over the year.
PS. I know it is an old question, and I stumble upon it from google searches (I forgot what I was searching, now...). Just wanted to comment on the question about LCO and express my feelings about Haskell in general. I wanted to comment on the top answer, but comments do not allow code blocks.
I made the assumption that the number of factors is only large if the numbers involved have many small factors. So I used thaumkid's excellent algorithm, but first used an approximation to the factor count that is never too small. It's quite simple: Check for prime factors up to 29, then check the remaining number and calculate an upper bound for the nmber of factors. Use this to calculate an upper bound for the number of factors, and if that number is high enough, calculate the exact number of factors.
The code below doesn't need this assumption for correctness, but to be fast. It seems to work; only about one in 100,000 numbers gives an estimate that is high enough to require a full check.
Here's the code:
This finds the 14,753,024th triangular with 13824 factors in about 0.7 seconds, the 879,207,615th triangular number with 61,440 factors in 34 seconds, the 12,524,486,975th triangular number with 138,240 factors in 10 minutes 5 seconds, and the 26,467,792,064th triangular number with 172,032 factors in 21 minutes 25 seconds (2.4GHz Core2 Duo), so this code takes only 116 processor cycles per number on average. The last triangular number itself is larger than 2^68, so
I modified "Jannich Brendle" version to 1000 instead 500. And list the result of euler12.bin, euler12.erl, p12dist.erl. Both erl codes use '+native' to compile.