The following blog article shows how in F# foldBack
can be made tail recursive using continuation passing style.
In Scala this would mean that:
def foldBack[T,U](l: List[T], acc: U)(f: (T, U) => U): U = {
l match {
case x :: xs => f(x, foldBack(xs, acc)(f))
case Nil => acc
}
}
can be made tail recursive by doing this:
def foldCont[T,U](list: List[T], acc: U)(f: (T, U) => U): U = {
@annotation.tailrec
def loop(l: List[T], k: (U) => U): U = {
l match {
case x :: xs => loop(xs, (racc => k(f(x, racc))))
case Nil => k(acc)
}
}
loop(list, u => u)
}
Unfortunately, I still get a stack overflow for long lists. loop is tail recursive and optimized but I guess the stack accumulation is just moved into the continuation calls.
Why is this not a problem with F#? And is there any way to work around this with Scala?
Edit: here some code that shows depth of stack:
def showDepth(s: Any) {
println(s.toString + ": " + (new Exception).getStackTrace.size)
}
def foldCont[T,U](list: List[T], acc: U)(f: (T, U) => U): U = {
@annotation.tailrec
def loop(l: List[T], k: (U) => U): U = {
showDepth("loop")
l match {
case x :: xs => loop(xs, (racc => { showDepth("k"); k(f(x, racc)) }))
case Nil => k(acc)
}
}
loop(list, u => u)
}
foldCont(List.fill(10)(1), 0)(_ + _)
This prints:
loop: 50
loop: 50
loop: 50
loop: 50
loop: 50
loop: 50
loop: 50
loop: 50
loop: 50
loop: 50
loop: 50
k: 51
k: 52
k: 53
k: 54
k: 55
k: 56
k: 57
k: 58
k: 59
k: 60
res2: Int = 10
Jon, n.m., thank you for your answers. Based on your comments I thought I'd give a try and use trampoline. A bit of research shows Scala has library support for trampolines in
TailCalls
. Here is what I came up with after a bit of fiddling around:I was interested to see how this compares to the solution without the trampoline as well as the default
foldLeft
andfoldRight
implementations. Here is the benchmark code and some results:The timings are:
Based on this, it would seem that trampolining is actually yielding pretty good performance. I suspect that the penalty on top of the boxing/unboxing is relatively not that bad.
Edit: as suggested by Jon's comments, here are the timings on 1M items which confirm that performance degrades with larger lists. Also I found out that library List.foldLeft implementation is not overriden, so I timed with the following foldLeft2:
yields:
So foldLeft2.reverse is the winner...
F# has all tail calls optimized.
You can do TCO using other techniques like trampolines but you lose interop because it changes the calling convention and it is ~10× slower. This is one of the three reasons I don't use Scala.
EDIT
Your benchmark results indicate that Scala's trampolines are a lot faster than they were the last time I tested them. Also, it is interesting to add equivalent benchmarks using F# and for larger lists (because there's no point in doing CPS on small lists!).
For 1,000x on a 1,000-element list on my netbook with a 1.67GHz N570 Intel Atom, I get:
For 1x 1,000,000-element list, I get:
You may also be interested in the old discussions about this on the caml-list in the context of replacing OCaml's non-tail-recursive list functions with optimized tail recursive ones.
The problem is the continuation function
(racc => k(f(x, racc)))
itself. It should be tailcall optimized for this whole business to work, but isn't.Scala cannot make tailcall optimizations for arbitrary tail calls, only for those it can transform into loops (i.e. when the function calls itself, not some other function).
I'm late to this question, but I wanted to show how you can write a tail-recursive FoldRight without using a full trampoline; by accumulating a list of continuations (instead of having them call each other when done, which leads to a stack overflow) and folding over them at the end, kind of like keeping a stack, but on the heap:
The fold that happens at the end is itself tail-recursive. Try it out in ScalaFiddle
In terms of performance, it performs slightly worse than the tail call version.