Efficiently merge two lists

2019-07-15 05:53发布

问题:

I'm currently learning Haskell, and working with List.

According to the HaskellWiki, if I wanted to merge two lists together, I would write:

list1 ++ list2 

However, according to this answer, using ++ on a large list would be inefficient.

From researching, I came across this SO page, but that question requires a specific requirement for the output of the List.

What I tried:

Suppose if I had two lists of numbers (for this example, assume that both lists are large enough that using ++ would be inefficient as described in the SO answer):

 oldNumbers = [1,5,14,22,37]
 newNumbers = [3,10,17,27,34,69]
 allNumbers = oldNumbers:newNumbers

As you can see I attempted to add oldNumbers to the head of newNumbers with the intention of reversing it afterward (disregard that allNumbers would be unorder for now, that's an exercise for another day).

As you probably guessed, it produced an error

error:
    * Non type-variable argument in the constraint: Num [a]
      (Use FlexibleContexts to permit this)
    * When checking the inferred type
        allNumbers :: forall a. (Num a, Num [a]) => [[a]]

So, as stated in the title, how would I efficiently merge two lists?

回答1:

According to the HaskellWiki, if I wanted to merge two lists together, I would write:

list1 ++ list2 

However, according to this answer, using ++ on a large list would be inefficient.

I think you need to take the context into account in this answer. If we append two lists a and b with (++) then it appends the two lists in O(m) (with m the length of the list). So this is, in terms of time complexity efficient. One can not construct such singly linked list more efficient than that.

@melpomene actually points to a popular mistake people new to Haskell make: they append a single element at the end of the list. Again if you only want to append a single element to the list, that is not a problem. This is a problem if you want to append n elements that way to a list, so if you each time append a single element to the list, and you do that n times, then the algorithm will be O(n2).

So in short, from a time complexity point of view, (++) is not slow when you append two lists together, nor is it slow when you append a singleton list to it. There are however, in terms of asymptotic behavior, usually better ways than repeatedly appending a list with one element to a list, since then the first operand will typically become larger, and each iteration it takes O(n) time only to append a single element to a list. Typically one can use recursion on the tail element of a list in that case, or for example build the list in reverse.

(++) is thus not "intrinsically" inefficient, by by using it in ways for which it was not design, you can obtain inefficient behavior.

An example where ++ will be quite slow is the following function that performs a "mapping":

badmap :: (a -> b) -> [a] -> [b]
badmap f = go []
    where go temp [] = temp
          go temp (x:xs) = go (temp ++ [f x]) xs

There are two problems here:

  1. it is not lazy, it requires to evaluate the entire list before this can emit the first element; and
  2. it will run in quadratic time, since for each element in the input list, it will take more and more time to append this element.

A more effective way to implement a map is:

goodmap :: (a -> b) -> [a] -> [b]
goodmap f = go
    where go (x:xs) = f x : go xs
          go [] = []


回答2:

There's absolutely nothing wrong with appending two lists with one-time ++, however long they are.

The answer you cite, (rightfully) talks about repeated appending of one-element lists with ++ [x] being "bad".



回答3:

If you would like to efficiently append arbitrary lists, use a different data structure! Data.Seq is optimized for efficient appends, for example.

https://www.stackage.org/haddock/lts-12.19/containers-0.5.11.0/Data-Sequence.html#v:-62--60-