I just started Haskell 2 days ago so I'm not yet sure about how to optimise my code.
As an exercise, I have rewritten foldl
and foldr
( I will give foldl
here but foldr
is the same, replacing last
with head
and init
with tail
).
The code is:
module Main where
myFoldl :: ( a -> ( b -> a ) ) -> a -> ( [b] -> a )
myFoldl func = ( \x -> (\theList
-> if (length theList == 0) then
x
else
myFoldl func (func x (last theList) ) (init theList)
) )
My only concern is that I suspect Haskell can't apply tail call optimisation here because the recursive call is not made at the end of the function.
How can I make this tail call optimised? Is Haskell's built-in implementation of foldl
implemented differently to mine?
Your use of parentheses in your code sample and your emphasis on tail recursion suggests you're coming to Haskell from Lisp or Scheme. If you're coming to Haskell from an eager language like Scheme, be warned: tail calls are not nearly as predictive of performance in Haskell as they are in an eager language. You can have tail-recursive functions that execute in linear space because of laziness, and you can have non-tail recursive functions that execute in constant space because of laziness. (Confused already?)
First flaw in your definition is the use of the length theList == 0
. This forces evaluation of the whole spine of the list, and is O(n) time. It's better to use pattern matching, like in this naïve foldl
definition in Haskell:
foldl :: (b -> a -> b) -> b -> [a] -> b
foldl f z [] = z
foldl f z (x:xs) = foldl f (f z x) xs
This, however, performs infamously badly in Haskell, because we don't actually compute the f z x
part until the caller of foldl
demands the result; so this foldl
accumulates unevaluated thunks in memory for each list element, and gains no benefit from being tail recursive. In fact, despite being tail-recursive, this naïve foldl
over a long list can lead to a stack overflow! (The Data.List
module has a foldl'
function that doesn't have this problem.)
As a converse to this, many Haskell non-tail recursive functions perform very well. For example, take this definition of find
, based on the accompanying non-tail recursive definition of foldr
:
find :: (a -> Boolean) -> [a] -> Maybe a
find pred xs = foldr find' Nothing xs
where find' elem rest = if pred elem then Just elem else rest
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f z [] = z
foldr f z (x:xs) = f x (subfold xs)
where subfold = foldr f z
This actually executes in linear time and constant space, thanks to laziness. Why?
- Once you find an element that satisfies the predicate, there is no need to traverse the rest of the list to compute the value of
rest
.
- Once you look at an element and decide that it doesn't match, there's no need to keep any data about that element.
The lesson I'd impart right now is: don't bring in your performance assumptions from eager languages into Haskell. You're just two days in; concentrate first on understanding the syntax and semantics of the language, and don't contort yourself into writing optimized versions of functions just yet. You're going to get hit with the foldl
-style stack overflow from time to time at first, but you'll master it in time.
EDIT [9/5/2012]: Simpler demonstration that lazy find
runs in constant space despite not being tail recursive. First, simplified definitions:
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f z [] = z
foldr f z (x:xs) = f x (foldr f z xs)
find :: (a -> Bool) -> [a] -> Maybe a
find p xs = let step x rest = if p x then Just x else rest
in foldr step Nothing xs
Now, using equational reasoning (i.e., substituting equals with equals, based on the definition above), and evaluating in a lazy order (outermost first), let's calculate find (==400) [1..]
:
find (==400) [1..]
-- Definition of `find`:
=> let step x rest = if x == 400 then Just x else rest
in foldr step Nothing [1..]
-- `[x, y, ...]` is the same as `x:[y, ...]`:
=> let step x rest = if x == 400 then Just x else rest
in foldr step Nothing (1:[2..])
-- Using the second equation in the definition of `foldr`:
=> let step x rest = if x == 400 then Just x else rest
in step 1 (foldr step Nothing [2..])
-- Applying `step`:
=> let step x rest = if x == 400 then Just x else rest
in if 1 == 400 then Just 1 else foldr step Nothing [2..]
-- `1 == 400` is `False`
=> let step x rest = if x == 400 then Just x else rest
in if False then Just 1 else foldr step Nothing [2..]
-- `if False then a else b` is the same as `b`
=> let step x rest = if x == 400 then Just x else rest
in foldr step Nothing [2..]
-- Repeat the same reasoning steps as above
=> let step x rest = if x == 400 then Just x else rest
in foldr step Nothing (2:[3..])
=> let step x rest = if x == 400 then Just x else rest
in step 2 (foldr step Nothing [3..])
=> let step x rest = if x == 400 then Just x else rest
in if 2 == 400 then Just 2 else foldr step Nothing [3..]
=> let step x rest = if x == 400 then Just x else rest
in if False then Just 2 else foldr step Nothing [3..]
=> let step x rest = if x == 400 then Just x else rest
in foldr step Nothing [3..]
.
.
.
=> let step x rest = if x == 400 then Just x else rest
in foldr step Nothing [400..]
=> let step x rest = if x == 400 then Just x else rest
in foldr step Nothing (400:[401..])
=> let step x rest = if x == 400 then Just x else rest
in step 400 (foldr step Nothing [401..])
=> let step x rest = if x == 400 then Just x else rest
in if 400 == 400 then Just 400 else foldr step Nothing [401..]
=> let step x rest = if x == 400 then Just x else rest
in if True then Just 400 else foldr step Nothing [401..]
-- `if True then a else b` is the same as `a`
=> let step x rest = if x == 400 then Just x else rest
in Just 400
-- We can eliminate the `let ... in ...` here:
=> Just 400
Note that the expressions in the successive evaluation steps don't get progressively more complex or longer as we proceed through the list; the length or depth of the expression at step n is not proportional to n, it's basically fixed. This in fact demonstrates how find (==400) [1..]
can be lazily executed in constant space.
Idiomatic Haskell looks very different to this, eschewing if-then-else, nested lambdas, parentheses, and destructuring functions (head, tail). Instead, you'd write it something like:
foldl :: (a -> b -> a) -> a -> [b] -> a
foldl f z0 xs0 = go z0 xs0
where
go z [] = z
go z (x:xs) = go (f z x) xs
Relying instead on pattern matching, a where clause, tail recursion, guarded declarations.