Generating a list causes a stack overflow

2019-08-11 19:39发布

问题:

I'm generating a list of random numbers:

let gen n =
    let rec pom l n =
        match n with
        | 0 -> l
        | _ ->
            let el = Random.int 20000000
            in pom (el::l) (n-1)
    in
        pom [] n

let lo = gen 1000000

What I get is

Fatal error: exception Stack_overflow

Why? I'm using tail recursion (and an accumulator)

EDIT:

You're right, the stack overflows on both sorts.

But if my code had a zillion lines, it would be a pain to debug it this way. I'd like to use ocamldebug here, just as a learning experience. I ran ocamldebug this way:

(ocd) r
Loading program... done.
Time: 88089944
Program end.
Uncaught exception: Stack_overflow
(ocd) b
Time: 88089943 - pc: 52 - module Pervasives
214   | hd :: tl -> <|b|>hd :: (tl @ l2)
(ocd) bt
#0  Pc: 52  Pervasives char 7700
#1  Pc: 64  Pervasives char 7715
#2  Pc: 64  Pervasives char 7715
#3  Pc: 64  Pervasives char 7715
#4  Pc: 64  Pervasives char 7715
#5  Pc: 64  Pervasives char 7715
#6  Pc: 64  Pervasives char 7715
// and so it goes on forever

This tells me nothing why my program has crashed. How could I debug it with ocamldebug?

(meta: should I post a separate thread for it or should I stay here)

回答1:

Since the error was in a different function, here is how you can debug this kind of thing quicker in the future.

  1. You can turn on backtraces. Either call Printexc.record_backtrace true at the beginning of your program, or run it with the environment variable OCAMLRUNPARAM set to b, as in OCAMLRUNPARAM=b ./a.out. This should tell you where the error occurred, although sometimes it skips parts of what you would expect to be the call stack – I believe this is due to optimizations such as inlining. It is generally useful, though. For this, the program must have been compiled with the flag -g.

  2. You can still find the source of the exception, even in a bunch of function calls as in your program, by doing a binary search. First wrap half the function calls in a handler. If the exception occurs there, unwrap those, then wrap half of them in a handler again, and so on, until you drill down to the source. It is still somewhat labor intensive, but you can debug this way in O(log n) time, and the log of a zillion is not that much :)



回答2:

Printing a backtrace of Stack_overflow exception usually somewhat useless, because the number of calls that leads to the overflow exceeds the size of backtrace buffer. For example if you take the following program (backtrace.ml):

let init n =
  let rec loop xs x =
    if x >= 0 then loop (x::xs) (x-1) else xs in
  loop [] (n-1)

let sum = function
  | [] -> 0
  | x :: xs -> List.fold_right (+) xs x

let () =
  let xs = init 10000000 in
  let y = sum xs in
  print_int y

and execute it with

 OCAMLRUNPARAM=b ocamlbuild backtrace.d.byte -- 

you will get a useless backtrace of a form:

Fatal error: exception Stack_overflow
Raised by primitive operation at file "list.ml", line 89, characters 16-37
Called from file "list.ml", line 89, characters 16-37
Called from file "list.ml", line 89, characters 16-37
...

We can't increase the internal backtrace buffer, but we can decrease the stack size, thus limiting the size of backtrace, so it can fit into the buffer. So, if we run the program, within a limited stack, we will get a much better backtrace:

OCAMLRUNPARAM="b,l=100" ocamlbuild backtrace.d.byte --
Fatal error: exception Stack_overflow
Raised by primitive operation at file "list.ml", line 89, characters 16-37
Called from file "list.ml", line 89, characters 16-37
Called from file "list.ml", line 89, characters 16-37
...
Called from file "backtrace.ml", line 18, characters 10-16

Bingo. The source of a problem is pinpointed up to a callsite.

Note: the l option of OCAMLRUNPARAM is understood only by the bytecode runtime. In order to repeat the same trick for a native code, one should limit the stack using mechanisms provided by an operating system. For unices it is usually ulimit -s shell primitive.



回答3:

I ran your code on my Mac Pro and don't get a stack overflow. As you say your code looks perfectly fine.

Possible explanations:

  1. You're running in an environment with limited memory.

  2. You had some old definitions of parts of your code. Maybe try re-running in a new OCaml toplevel.

Update

I think @antron has an excellent point. If you're running compiled code you most likely don't know exactly where the problem is. Add some tracing and you will very likely find the stack overflow is elsewhere.