How to tell whether Haskell will cache a result or

2020-05-17 00:37发布

问题:

I noticed that sometimes Haskell pure functions are somehow cached: if I call the function twice with the same parameters, the second time the result is computed in no time.

  1. Why does this happen? Is it a GHCI feature or what?
  2. Can I rely on this (ie: can I deterministically know if a function value will be cached)?
  3. Can I force or disable this feature for some function calls?

As required by comments, here is an example I found on the web:

isPrime a = isPrimeHelper a primes
isPrimeHelper a (p:ps)
    | p*p > a = True
    | a `mod` p == 0 = False
    | otherwise = isPrimeHelper a ps
primes = 2 : filter isPrime [3,5..]

I was expecting, before running it, to be quite slow, since it keeps accessing elements of primes without explicitly caching them (thus, unless these values are cached somewhere, they would need to be recomputed plenty times). But I was wrong.

If I set +s in GHCI (to print timing/memory stats after each evaluation) and evaluate the expression primes!!10000 twice, this is what I get:

*Main> :set +s
*Main> primes!!10000
104743
(2.10 secs, 169800904 bytes)
*Main> primes!!10000
104743
(0.00 secs, 0 bytes)

This means that at least primes !! 10000 (or better: the whole primes list, since also primes!!9999 will take no time) must be cached.

回答1:

primes, in your code, is not a function, but a constant, in haskellspeak known as a CAF. If it took a parameter (say, ()), you would get two different versions of the same list back if calling it twice, but as it is a CAF, you get the exact same list back both times;

As a ghci top-level definition, primes never becomes unreachable, thus the head of the list it points to (and thus its tail/the rest of the computation) is never garbage collected. Adding a parameter prevents retaining that reference, the list would then be garbage collected as (!!) iterates down it to find the right element, and your second call to (!!) would force repetition of the whole computation instead of just traversing the already-computed list.

Note that in compiled programs, there is no top-level scope like in ghci and things get garbage collected when the last reference to them is gone, quite likely before the whole program exits, CAF or not, meaning that your first call would take long, the second one not, and after that, "the future of your program" not referencing the CAF anymore, the memory the CAF takes up is recycled.

The primes package provides a function that takes an argument for (primarily, I'd claim) this very reason, as carrying around half a terabyte of prime numbers might not be what one wants to do.

If you want to really get down to the bottom of this, I recommend reading the STG paper. It doesn't include newer developments in GHC, but does a great job of explaining how Haskell maps onto assembly, and thus how thunks get eaten by strictness, in general.