I try computing Ackermann(4,1)
and there's a big difference in performance between different languages/compilers. Below are results on my Core i7 3820QM, 16G, Ubuntu 12.10 64bit,
C: 1.6s, gcc -O3
(with gcc 4.7.2)
int ack(int m, int n) {
if (m == 0) return n+1;
if (n == 0) return ack(m-1, 1);
return ack(m-1, ack(m, n-1));
}
int main() {
printf("%d\n", ack(4,1));
return 0;
}
OCaml: 3.6s, ocamlopt
(with ocaml 3.12.1)
let rec ack = function
| 0,n -> n+1
| m,0 -> ack (m-1, 1)
| m,n -> ack (m-1, ack (m, n-1))
in print_int (ack (4, 1))
Standard ML: 5.1s mlton -codegen c -cc-opt -O3
(with mlton 20100608)
fun ack 0 n = n+1
| ack m 0 = ack (m-1) 1
| ack m n = ack (m-1) (ack m (n-1));
print (Int.toString (ack 4 1));
Racket: 11.5s racket
(with racket v5.3.3)
(require racket/unsafe/ops)
(define + unsafe-fx+)
(define - unsafe-fx-)
(define (ack m n)
(cond
[(zero? m) (+ n 1)]
[(zero? n) (ack (- m 1) 1)]
[else (ack (- m 1) (ack m (- n 1)))]))
(time (ack 4 1))
Haskell: unfinished, killed by system after 22s ghc -O2
(with ghc 7.4.2)
Haskell: 1.8s ajhc
(with ajhc 0.8.0.4)
main = print $ ack 4 1
where ack :: Int -> Int -> Int
ack 0 n = n+1
ack m 0 = ack (m-1) 1
ack m n = ack (m-1) (ack m (n-1))
The Haskell version is the only one that fails to terminate properly because it takes too much memory. It freezes my machine and fills the swap space before getting killed. What can I do to improve it without heavily fuglifying the code?
EDIT: I appreciate some of the asymptotically smarter solutions, but they are not exactly what I am asking for. This is more about seeing whether the compiler handles certain patterns in a reasonably efficient way (stack, tail calls, unboxing, etc.) than computing the ackermann function.
EDIT 2: As pointed out by several responses, this seems to be a bug in recent versions of GHC. I try the same code with AJHC and get much better performance.
Thank you very much :)
The following is an idiomatic version that takes advantage of Haskell's lazyness and GHC's optimisation of constant top-level expressions.
Here, we're lazily building a matrix for all the values of the Ackermann function. As a result, subsequent calls to
acks
will not recompute anything (i.e. evaluatingacks !! 4 !! 1
again will not double the running time).Although this is not the fastest solution, it looks a lot like the naïve implementation, it is very efficient in terms of memory use, and it recasts one of Haskell's weirder features (lazyness) as a strength.
This version uses some properties of the ackermann function. It's not equivalent to the other versions, but it's fast :
Edit : And this is a version with memoization , we see that it's easy to memoize a function in haskell, the only change is in the call site :
It seems that there is some kind of bug involved. What GHC version are you using?
With GHC 7, I get the same behavior as you do. The program consumes all available memory without producing any output.
However if I compile it with GHC 6.12.1 just with
ghc --make -O2 Ack.hs
, it works perfectly. It computes the result in 10.8s on my computer, while plain C version takes 7.8s.I suggest you to report this bug on GHC web site.
NB: The high memory usage issue is a bug in the GHC RTS, where upon stack overflow and allocation of new stacks on the heap it was not checked whether garbage collection is due. It has been already fixed in GHC HEAD.
I was able to get much better performance by CPS-converting
ack
:Your original function consumes all available memory on my machine, while this one runs in constant space.
Ocaml is still faster, however:
Edit: When compiled with JHC, your original program is about as fast as the Ocaml version:
Edit 2: Something else I've discovered: running your original program with a larger stack chunk size (
+RTS -kc1M
) makes it run in constant space. The CPS version is still a bit faster, though.Edit 3: I managed to produce a version that runs nearly as fast as the Ocaml one by manually unrolling the main loop. However, it only works when run with
+RTS -kc1M
(Dan Doel has filed a bug about this behaviour):Testing:
Edit 4: Apparently, the space leak is fixed in GHC HEAD, so
+RTS -kc1M
won't be required in the future.I don't see that this is a bug at all,
ghc
just isn't taking advantage of the fact that it knows that 4 and 1 are the only arguments the function will ever be called with -- that is, to put it bluntly, it doesn't cheat. It also doesn't do constant math for you, so if you had writtenmain = print $ ack (2+2) 1
, it wouldn't have calculated that 2+2 = 4 till runtime. Theghc
has much more important things to think about. Help for the latter difficulty is available if you care for it http://hackage.haskell.org/package/const-math-ghc-plugin.So
ghc
is helped if you do a little math e.g. this is at least a hundered times as fast as your C program with 4 and 1 as arguments. But try it with 4 & 2:This will give the right answer, all ~20,000 digits, in under a tenth of a second, whereas the gcc, with your algorithm, will take forever to give the wrong answer.
This performance issue (except for GHC RTS bug obviously) seems to be fixed now on OS X 10.8 after Apple
XCode
update to4.6.2
. I can still reproduce it on Linux (I have been testing with GHC LLVM backend though), but not any more on OS X. After I updated the XCode to 4.6.2, the new version seems to have affected the GHC backend code generation for Ackermann substantially (from what I remember from looking at object dumps pre-update). I was able to reproduce the performance issue on Mac before XCode update - I don't have the numbers but they were surely quite bad. So, it seems that XCode update improved the GHC code generation for Ackermann.Now, both C and GHC versions are quite close. C code:
Time to execute ack(4,1):
Haskell code:
Time to execute ack 4 1 (with +RTS -kc1M):
All were compiled with
-O2
flag (and-rtsopts
flag for GHC for RTS bug workaround). It is quite a head scratcher though. Updating XCode seems to have made a big difference with optimization of Ackermann in GHC.