Disclosure: this came up in FsCheck, an F# random testing framework I maintain. I have a solution, but I do not like it. Moreover, I do not understand the problem - it was merely circumvented.
A fairly standard implementation of (monadic, if we're going to use big words) sequence is:
let sequence l =
let k m m' = gen { let! x = m
let! xs = m'
return (x::xs) }
List.foldBack k l (gen { return [] })
Where gen can be replaced by a computation builder of choice. Unfortunately, that implementation consumes stack space, and so eventually stack overflows if the list is long enough.The question is: why? I know in principle foldBack is not tail recursive, but the clever bunnies of the F# team have circumvented that in the foldBack implementation. Is there a problem in the computation builder implementation?
If I change the implementation to the below, everything is fine:
let sequence l =
let rec go gs acc size r0 =
match gs with
| [] -> List.rev acc
| (Gen g)::gs' ->
let r1,r2 = split r0
let y = g size r1
go gs' (y::acc) size r2
Gen(fun n r -> go l [] n r)
For completeness, the Gen type and computation builder can be found in the FsCheck source
You're correct - the reason why you're getting a stack overflow is that the
bind
operation of the monad needs to be tail-recursive (because it is used to aggregate values during folding).The monad used in FsCheck is essentially a state monad (it keeps the current generator and some number). I simplified it a bit and got something like:
Here, the
bind
function is not tail-recursive because it callsk
and then does some more work. You can change the monad to be a continuation monad. It is implemented as a function that takes the state and a continuation - a function that is called with the result as an argument. For this monad, you can makebind
tail recursive:The following example will not stack overflow (and it did with the original implementation):
Building on Tomas's answer, let's define two modules:
To simplify things a bit, let's rewrite your original sequence function as
Now,
sequence [for i in 1 .. 100000 -> unit i]
will run to completion regardless of whethersequence
is defined in terms ofKurt.gen
orTomas.gen
. The issue is not thatsequence
causes a stack overflow when using your definitions, it's that the function returned from the call tosequence
causes a stack overflow when it is called.To see why this is so, let's expand the definition of
sequence
in terms of the underlying monadic operations:Inlining the
Kurt.unit
andKurt.bind
values and simplifying like crazy, we getNow it's hopefully clear why calling
let (Kurt.Gen f) = sequence [for i in 1 .. 1000000 -> unit i] in f 0
overflows the stack:f
requires a non-tail-recursive call to sequence and evaluation of the resulting function, so there will be one stack frame for each recursive call.Inlining
Tomas.unit
andTomas.bind
into the definition ofsequence
instead, we get the following simplified version:Reasoning about this variant is tricky. You can empirically verify that it won't blow the stack for some arbitrarily large inputs (as Tomas shows in his answer), and you can step through the evaluation to convince yourself of this fact. However, the stack consumption depends on the
Gen
instances in the list that's passed in, and it is possible to blow the stack for inputs that aren't themselves tail recursive: