What do the thunks for the following value/expression/function look like in the Haskell heap?
val = 5 -- is `val` a pointer to a box containing 5?
add x y = x + y
result = add 2 val
main = print $ result
Would be nice to have a picture of how these are represented in Haskell, given its lazy evaluation mode.
It would be absolutely correct to wrap every value in a thunk. But since Haskell is non-strict, compilers can choose when to evaluate thunks/expressions. In particular, compilers can choose to evaluate an expression earlier than strictly necessary, if it results in better code.
Optimizing Haskell compilers (GHC) perform Strictness analysis to figure out, which values will always be computed.
In the beginning, the compiler has to assume, that none of a function's arguments are ever used. Then it goes over the body of the function and tries to find functions applications that 1) are known to be strict in (at least some of) their arguments and 2) always have to be evaluated to compute the function's result.
In your example, we have the function
(+)
that is strict in both it's arguments. Thus the compiler knows that bothx
andy
are always required to be evaluated at this point. Now it just so happens, that the expressionx+y
is always necessary to compute the function's result, therefore the compiler can store the information that the functionadd
is strict in bothx
andy
.The generated code for
add
* will thus expect integer values as parameters and not thunks. The algorithm becomes much more complicated when recursion is involved (a fixed point problem), but the basic idea remains the same.Another example:
This function will take
x
in evaluated form (as a boolean) andy
as a thunk. The expressionx
needs to be evaluated in every possible execution path throughmkList
, thus we can have the caller evaluate it. The expressiony
, on the other hand, is never used in any function application that is strict in it's arguments. The cons-function:
never looks aty
it just stores it in a list. Thusy
needs to be passed as a thunk in order to satisfy the lazy Haskell semantics.*:
add
is of course polymorphic and the exact type ofx
andy
depends on the instantiation.In general, even primitive values in Haskell (e.g. of type Int and Float) are represented by thunks. This is indeed required by the non-strict semantics; consider the following fragment:
This definition will generate a div-by-zero exception only if the value of bottom is inspected, but not if the value is never used.
Consider now the add function:
A naive implementation of add must force the thunk for x, force the thunk for y, add the values and create an (evaluated) thunk for the result. This is a huge overhead for arithmetic compared to strict functional languages (not to mention imperative ones).
However, an optimizing compiler such as GHC can mostly avoid this overhead; this is a simplified view of how GHC translates the add function:
Internally, basic types like Int is seen as datatype with a single constructor. The type Int# is the "raw" machine type for integers and +# is the primitive addition on raw types. Operations on raw types are implemented directly on bit-patterns (e.g. registers) --- not thunks. Boxing and unboxing are then translated as constructor application and pattern matching.
The advantage of this approach (not visible in this simple example) is that the compiler is often capable of inlining such definitions and removing intermediate boxing/unboxing operations, leaving only the outermost ones.
Official answer
It's none of your business. Strictly implementation detail of your compiler.
Short answer
Yes.
Longer answer
To the Haskell program itself, the answer is always yes, but the compiler can and will do things differently if it finds out that it can get away with it, for performance reasons.
For example, for '''add x y = x + y''', a compiler might generate code that works with thunks for x and y and constructs a thunk as a result. But consider the following:
Here, an optimizing compiler will generate code that first takes x and y out of their boxes, then does all the arithmetic, and then stores the result in a box.
Advanced answer
This paper describes how GHC switched from one way of implementing thunks to another that was actually both simpler and faster: http://research.microsoft.com/en-us/um/people/simonpj/papers/eval-apply/
I think the others answered your question nicely, but just for completeness's sake let me add that GHC offers you the possibility of using unboxed values directly as well. This is what Haskell Wiki says about it:
As mentioned, it's non-portable, so you need a GHC language extension. See here for their docs.
Short answer: Yes.
Long answer:
This has to be stored in a thunk, because imagine if we wrote this anywhere in our code (like, in a library we imported or something):
If this has to be evaluated when our program starts, it would crash, right? If we actually use that value for something, that would be what we want, but if we don't use it, it shouldn't be able to influence our program so catastrophically.
For your second example, let me change it a little:
This value has to be stored in a thunk as well, because imagine some code like this:
If
div
was strict here, it would be evaluated even when the list isnull
(aka. empty), meaning that writing the function like this wouldn't work, because it wouldn't have a chance to return0
when given the empty list, even though this is what we would want in this case.Your final example is just a variation of example 1, and it has to be lazy for the same reasons.
All this being said, it is possible to force the compiler to make some values strict, but that goes beyond the scope of this question.