I was just curious about some exact implementation details of lists in Haskell (GHC-specific answers are fine)--are they naive linked lists, or do they have any special optimizations? More specifically:
- Do `length` and `(!!)` (for instance) have to iterate through the list?
- If so, are their values cached in any way (i.e., if I call `length` twice, will it have to iterate both times)?
- Does access to the back of the list involve iterating through the whole list?
- Are infinite lists and list comprehensions memoized? (i.e., for `fib = 1:1:zipWith (+) fib (tail fib)`, will each value be computed recursively, or will it rely on the previous computed value?)
Any other interesting implementation details would be much appreciated. Thanks in advance!
Lists have no special operational treatment in Haskell. They are defined just like:
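The definition itself did not survive in this copy. Going by the names `List`, `Nil` and `Cons` used in the answer, it is presumably the usual two-constructor type. The sketch below shows that type; `range`, `lengthList` and `sumList` are hypothetical helpers added here to illustrate "redefining the operations" and are not part of the original answer.

```haskell
-- A hand-rolled list mirroring the built-in one:
-- List a ~ [a], Nil ~ [], Cons ~ (:).
data List a = Nil | Cons a (List a)

-- Lazily builds lo..hi; each tail is only constructed when demanded.
-- (Hypothetical helper, standing in for [lo..hi].)
range :: Int -> Int -> List Int
range lo hi
  | lo > hi   = Nil
  | otherwise = Cons lo (range (lo + 1) hi)

-- Walks every cell once; there is no cached length field.
lengthList :: List a -> Int
lengthList = go 0
  where
    go n Nil         = n
    go n (Cons _ xs) = let n' = n + 1 in n' `seq` go n' xs

-- Strict accumulator, so no chain of suspended additions builds up.
sumList :: List Int -> Int
sumList = go 0
  where
    go acc Nil         = acc
    go acc (Cons x xs) = let acc' = acc + x in acc' `seq` go acc' xs
```

Loaded into GHCi, `sumList (range 1 100)` gives `5050`, the same as `sum [1..100]` on the built-in list.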
Just with some special notation: `[a]` for `List a`, `[]` for `Nil` and `(:)` for `Cons`. If you defined the same type and redefined all the operations, you would get the exact same performance.

Thus, Haskell lists are singly-linked. Because of laziness, they are often used as iterators.
`sum [1..n]` runs in constant space, because the unused prefixes of this list are garbage collected as the sum progresses, and the tails aren't generated until they are needed.

As for #4: all values in Haskell are memoized, with the exception that functions do not keep a memo table for their arguments. So when you define `fib` like you did, the results will be cached and the nth Fibonacci number will be accessed in O(n) time. However, if you defined it in an apparently equivalent way using functions instead of a list (one that looks very similar to your definition), then the results are not shared and the nth Fibonacci number will be accessed in O(fib n) (which is exponential) time. You can convince functions to be shared with a memoization library like data-memocombinators.
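The "apparently equivalent" definition is not reproduced in this copy. One way to sketch the contrast, assuming the alternative models an infinite list as a function from an index to an element: the names `FList`, `consF`, `tailF` and `fibF` are hypothetical and chosen here for illustration.

```haskell
-- Shared version from the question: a lazy list, each cell computed at most once.
fib :: [Integer]
fib = 1 : 1 : zipWith (+) fib (tail fib)

-- Hypothetical encoding: an "infinite list" as a function from index to element.
type FList a = Int -> a

-- Put x at index 0 and shift everything else up by one.
consF :: a -> FList a -> FList a
consF x xs n
  | n == 0    = x
  | otherwise = xs (n - 1)

-- Drop index 0 by shifting every lookup down by one.
tailF :: FList a -> FList a
tailF xs n = xs (n + 1)

-- Same shape as fib, but nothing is stored: every lookup at index >= 2
-- re-runs both recursive calls.
fibF :: FList Integer
fibF = 1 `consF` (1 `consF` (\n -> fibF n + tailF fibF n))
```

Both agree on their values (`fibF 10` and `fib !! 10` are each `89`), but `fibF` has no cells in which to keep results, so its cost grows exponentially with the index.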
GHC does not perform full Common Subexpression Elimination. For example:
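The example source is missing from this copy. A plausible reconstruction is sketched below, under the assumption that `aaaaaaaaa` writes out `length x` twice while `bbbbbbbbb` names the result once. Only the function names and the argument name `x` come from the text; the bodies are a guess.

```haskell
-- Assumed shape: `length x` is written out twice.
aaaaaaaaa :: [a] -> Int
aaaaaaaaa x = length x + length x

-- Assumed shape: the length is computed once and reused.
bbbbbbbbb :: [a] -> Int
bbbbbbbbb x = let l = length x in l + l
```

Both functions return the same number; they differ only in how many times the list is walked.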
Compiling with `-ddump-simpl` gives Core in which `aaaaaaaaa` calls `GHC.List.$wlen` twice. (In fact, because `x` needs to be retained in `aaaaaaaaa`, it is more than 2x slower than `bbbbbbbbb`.)

As far as I know (I don't know how much of this is GHC-specific), `length` and `(!!)` DO have to iterate through the list.

I don't think there are any special optimisations for lists, but there is a technique that applies to all datatypes.
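If you have something like a function body that mentions `length xs` in two separate places, then `length xs` will be computed twice. But if instead you bind the result to a name once and use that name in both places, then it will only be computed once.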
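The two snippets are missing from this copy. A minimal sketch of the technique, assuming a hypothetical two-argument function `bar`; the names `fooTwice` and `fooOnce` are likewise made up for illustration.

```haskell
-- Hypothetical consumer of two numbers; stands in for any function.
bar :: Int -> Int -> Int
bar a b = a * b

-- `length xs` appears twice, so the list may be walked twice.
fooTwice :: [a] -> Int
fooTwice xs = bar (length xs) (length xs)

-- The length is bound once and the same value is passed twice.
fooOnce :: [a] -> Int
fooOnce xs = let len = length xs in bar len len
```

The same sharing works with a `where` clause, and it applies to any expensive expression, not only to `length`.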
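Does access to the back of the list involve iterating through the whole list? Yes.

Are infinite lists and list comprehensions memoized? Yes, once part of a named value is computed, it is retained until the name goes out of scope. (The language doesn't require this, but this is how I understand the implementations behave.)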
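One way to observe this retention is to attach a trace message to each element of a named list. This is only a sketch: the name `squares` is hypothetical, and the behaviour described assumes GHCi with its default settings.

```haskell
import Debug.Trace (trace)

-- A named, top-level infinite list. Each element reports (on stderr)
-- the first time it is actually evaluated.
squares :: [Int]
squares = map (\n -> trace ("computing " ++ show n) (n * n)) [1 ..]
```

After loading this in GHCi, evaluating `squares !! 3` should print the trace message for that element the first time only. Repeating the same expression should return `16` without a message, because the evaluated cell is still reachable through the name `squares`.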