Is my rewritten foldl function optimised?

I just started Haskell 2 days ago so I'm not yet sure about how to optimise my code.

As an exercise, I have rewritten foldl and foldr ( I will give foldl here but foldr is the same, replacing last with head and init with tail).

The code is:

module Main where

    myFoldl :: ( a -> ( b -> a ) ) -> a -> ( [b] -> a )

    myFoldl func = ( \x -> (\theList
        -> if (length theList == 0) then
            x
        else
            myFoldl func (func x (last theList) ) (init theList)
        ) )

My only concern is that I suspect Haskell can't apply tail call optimisation here because the recursive call is not made at the end of the function.

How can I make this tail call optimised? Is Haskell's built-in implementation of foldl implemented differently to mine?

回答1:

Your use of parentheses in your code sample and your emphasis on tail recursion suggests you're coming to Haskell from Lisp or Scheme. If you're coming to Haskell from an eager language like Scheme, be warned: tail calls are not nearly as predictive of performance in Haskell as they are in an eager language. You can have tail-recursive functions that execute in linear space because of laziness, and you can have non-tail recursive functions that execute in constant space because of laziness. (Confused already?)

First flaw in your definition is the use of the length theList == 0. This forces evaluation of the whole spine of the list, and is O(n) time. It's better to use pattern matching, like in this naïve foldl definition in Haskell:

foldl :: (b -> a -> b) -> b -> [a] -> b
foldl f z [] = z
foldl f z (x:xs) = foldl f (f z x) xs

This, however, performs infamously badly in Haskell, because we don't actually compute the f z x part until the caller of foldl demands the result; so this foldl accumulates unevaluated thunks in memory for each list element, and gains no benefit from being tail recursive. In fact, despite being tail-recursive, this naïve foldl over a long list can lead to a stack overflow! (The Data.List module has a foldl' function that doesn't have this problem.)

As a converse to this, many Haskell non-tail recursive functions perform very well. For example, take this definition of find, based on the accompanying non-tail recursive definition of foldr:

find :: (a -> Boolean) -> [a] -> Maybe a
find pred xs = foldr find' Nothing xs
    where find' elem rest = if pred elem then Just elem else rest

foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f z [] = z
foldr f z (x:xs) = f x (subfold xs)
    where subfold = foldr f z

This actually executes in linear time and constant space, thanks to laziness. Why?

Once you find an element that satisfies the predicate, there is no need to traverse the rest of the list to compute the value of rest.
Once you look at an element and decide that it doesn't match, there's no need to keep any data about that element.

The lesson I'd impart right now is: don't bring in your performance assumptions from eager languages into Haskell. You're just two days in; concentrate first on understanding the syntax and semantics of the language, and don't contort yourself into writing optimized versions of functions just yet. You're going to get hit with the foldl-style stack overflow from time to time at first, but you'll master it in time.

EDIT [9/5/2012]: Simpler demonstration that lazy find runs in constant space despite not being tail recursive. First, simplified definitions:

foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f z [] = z
foldr f z (x:xs) = f x (foldr f z xs)

find :: (a -> Bool) -> [a] -> Maybe a
find p xs = let step x rest = if p x then Just x else rest
            in foldr step Nothing xs

Now, using equational reasoning (i.e., substituting equals with equals, based on the definition above), and evaluating in a lazy order (outermost first), let's calculate find (==400) [1..]:

find (==400) [1..]
    -- Definition of `find`:
    => let step x rest = if x == 400 then Just x else rest
       in foldr step Nothing [1..]

    -- `[x, y, ...]` is the same as `x:[y, ...]`:
    => let step x rest = if x == 400 then Just x else rest
       in foldr step Nothing (1:[2..])

    -- Using the second equation in the definition of `foldr`:
    => let step x rest = if x == 400 then Just x else rest
       in step 1 (foldr step Nothing [2..])

    -- Applying `step`:
    => let step x rest = if x == 400 then Just x else rest
       in if 1 == 400 then Just 1 else foldr step Nothing [2..]

    -- `1 == 400` is `False`
    => let step x rest = if x == 400 then Just x else rest
       in if False then Just 1 else foldr step Nothing [2..]

    -- `if False then a else b` is the same as `b`
    => let step x rest = if x == 400 then Just x else rest
       in foldr step Nothing [2..]

    -- Repeat the same reasoning steps as above 
    => let step x rest = if x == 400 then Just x else rest
       in foldr step Nothing (2:[3..])
    => let step x rest = if x == 400 then Just x else rest
       in step 2 (foldr step Nothing [3..])
    => let step x rest = if x == 400 then Just x else rest
       in if 2 == 400 then Just 2 else foldr step Nothing [3..]
    => let step x rest = if x == 400 then Just x else rest
       in if False then Just 2 else foldr step Nothing [3..]
    => let step x rest = if x == 400 then Just x else rest
       in foldr step Nothing [3..]
        .
        .
        .

    => let step x rest = if x == 400 then Just x else rest
       in foldr step Nothing [400..]
    => let step x rest = if x == 400 then Just x else rest
       in foldr step Nothing (400:[401..])
    => let step x rest = if x == 400 then Just x else rest
       in step 400 (foldr step Nothing [401..])
    => let step x rest = if x == 400 then Just x else rest
       in if 400 == 400 then Just 400 else foldr step Nothing [401..]
    => let step x rest = if x == 400 then Just x else rest
       in if True then Just 400 else foldr step Nothing [401..]

    -- `if True then a else b` is the same as `a`
    => let step x rest = if x == 400 then Just x else rest
       in Just 400

    -- We can eliminate the `let ... in ...` here:
    => Just 400

Note that the expressions in the successive evaluation steps don't get progressively more complex or longer as we proceed through the list; the length or depth of the expression at step n is not proportional to n, it's basically fixed. This in fact demonstrates how find (==400) [1..] can be lazily executed in constant space.

回答2:

Idiomatic Haskell looks very different to this, eschewing if-then-else, nested lambdas, parentheses, and destructuring functions (head, tail). Instead, you'd write it something like:

foldl :: (a -> b -> a) -> a -> [b] -> a
foldl f z0 xs0 = go z0 xs0
   where
       go z []     = z
       go z (x:xs) = go (f z x) xs

Relying instead on pattern matching, a where clause, tail recursion, guarded declarations.