Haskell: converting a list of (a, b) key-value pai

2020-06-08 11:40发布

I'm a Haskell beginner. Let's suppose I want to write a function convertKVList that takes a flat list of key-value pairs, where some of the keys might be repeated, and turns it into a mapping from keys to lists of values where all of the keys are unique. For instance, on a list of pairs of Ints, I want this behavior:

> convertKVList [(1, 2), (1, 4), (1, 3), (2, 3)]
[(1,[3,4,2]),(2,[3])]

This seems like a common enough task that there ought to be a library function available to do what I want, but I couldn't find anything when I looked. Finally, someone suggested that I compose Map.toList with Map.fromListWith (++), and I ended up with this:

import Data.Map as Map (toList, fromListWith)

convertKVList :: (Ord a) => [(a, b)] -> [(a, [b])]
convertKVList ls =
  (Map.toList . Map.fromListWith (++) . map (\(x,y) -> (x,[y]))) ls

My question is for more experienced Haskellers and is in two parts: First, is this how you would go about it, or is there a "better" (easier to read, or more efficient, or both) way?

Second, how could I have come up with this on my own? I knew that I wanted the type to be [(a, b)] -> [(a, [b])], but putting that into Hoogle didn't turn up anything useful. And I had looked at the Data.Map docs, but neither fromListWith nor toList had jumped out as particularly helpful. So: how would you go about thinking about this problem? (I realize that both of these questions are subjective, especially the second one.)

Thanks!

6条回答
我命由我不由天
2楼-- · 2020-06-08 11:51

while this is by no means canonical:

import Data.List
import Data.Ord
import Data.Function (on)

convertKVList :: Ord a => [(a,b)] -> [(a,[b])]
convertKVList = map (\x -> (fst $ head x,  map snd x)) . groupBy ((==) `on` fst) . sortBy (comparing fst)

it does have the advantage of not pulling in Data.Map. should be asymptotically the same, haven't benchmarked. I think you could clean up the first chunk with Control.Arrow (something like (fst . head &&& map snd)) but it's not obviously cleaner.

Not sure how you'd come to it except by either knowing it or asking in #haskell, though.

查看更多
我命由我不由天
3楼-- · 2020-06-08 11:51

That looks like an understandable solution and you can clean it up slightly more:

import Data.Map (toList, fromListWith)
import Control.Arrow (second)

convertKVList :: Ord a => [(a, b)] -> [(a, [b])]
convertKVList = toList . fromListWith (++) . map (second (:[]))

Regarding how you might come up with this on your own: assuming you started with Data.Map, then you want to use the map to combine values with equal keys. The documentation for Data.Map on Hackage says a is the type for values and k for keys.

Knowing this, you can search for a -> a -> a to find functions that might combine two values in a Map k a to produce a new a value. This narrows the API down to a handful of functions like insertWith, fromListWith, and fromAscListWith.

Similarly, to convert your Map k a to [(k, a)], you can search the documentation for Map k a -> [(k, a)] and find only a few functions like assocs, toList, toAscList, and toDescList. Note that in your case, [(k, a)] is instantiated to [(Int, [Int])].

One thing I've found helpful in understanding the standard Haskell libraries is to view the source on Hackage. Seeing which functions are implemented in terms of others helps make the API feel smaller, and I can see which functions are the basic building blocks.

查看更多
▲ chillily
4楼-- · 2020-06-08 11:52

Hoogle is not the only search engine able to search Haskell libraries by type signatures and it definitely and unfortunately covers only a small part of Hackage. Searching with Hayoo for a type signature [(a,b)]->[(a,[b])] brought up these two implementations:

Concerning your take on the problem, since in your function you already bring up a higher level datastructure (Map), it doesn't make sense to downgrade to a more primitive associative list in the output, because:

  1. Most of the algorithms you can possibly utilize such data in will only benefit from getting a Map input, because it's much more effective for dealing with key-value stores, and if you ever find yourself still needing a list you can always utilize the toList in place.
  2. Map implies the absence of duplicate keys on type level, which is not any less important, since in Haskell you should always do the maximum proofs using the type-system. This principle is essentially what makes the statement "If it compiles, it works" closest to truth.

In other words this is the correct definition of your function:

convertKVList :: (Ord a) => [(a, b)] -> Map a [b]
convertKVList ls =
  Map.fromListWith (++) . map (\(x,y) -> (x,[y])) $ ls

Hayooing for that type signature brings a couple of already implemented results too.

Concerning the approaching the problem, it's classic: "Divide and conquer!". Chris has some good points too in his answer.

查看更多
唯我独甜
5楼-- · 2020-06-08 11:58

I suspect that without dipping into mutation and the ST monad, you are unlikely to improve on the Map.fromListWith solution (or substantially equivalent alternatives like using HashMap.fromListWith). I'd just go with that.

Basically, with mutation you can do this grouping in near-linear time by using a mutable hash table with a as the keys and mutable lists of b as the values. Without mutation, however, it's going to be worse, because each insert into a balanced search tree is O(log n); this is because "inserting" means constructing a new copy of each tree node that leads to the one your inserted element goes in. And you need to do n inserts—which gives you exactly the O(n * log n) bounds that the Map.fromListWith function has. Sorting the association list ahead of time doesn't fundamentally improve this, because sorting is also O(n * log n).

So to improve on O(n * log n), you need data structures with mutation. I just did a quick Google and the best bet would be to implement the standard imperative algorithm using something like the hashtables library (which I've never tried, so I can't vouch for it). To use this you're going to need to understand Control.Monad.ST and Data.STRef. The ST monad is a technique that GHC provides for using mutation "internally" in a pure function—it uses some type system extensions to guarantee that the side effects cannot be observed outside of the functions in question. HaskellWiki has some examples, but it might take some studying and practice to feel comfortable with this one.

The other thing I would recommend, if you feel like you want to understand Data.Map or similar libraries better, is to look at Chris Okasaki's Purely Functional Data Structures book (or his dissertation (PDF) that the book is based on). It's based on Standard ML instead of Haskell, the data structures are not the same, and it can be a bit of a difficult read, but it's a foundational book.

查看更多
ら.Afraid
6楼-- · 2020-06-08 12:00

So, my solution overuses pattern-matching because I don't actually know what functions are in the standard library.

The idea was that if the list is sorted by the keys, then you can collect your key-values as you go. To do the logic of checking whether to add to the first key-value list or create a new entry, I used patterns and guards to define the conditionals. And liberal use of cons to prepend values to a list.

And in case the original list isn't sorted, there's a sortBy.

import Data.List
import Data.Ord

ls = [(2, 1), (1, 2), (1, 4), (1, 3), (2, 3)]

addval [] (k, v)= [(k, [v])]
addval ((k1, vals) : xs) (k2, v) | k1 == k2
  = ((k1, (v : vals)) : xs)
addval ls (k, v) = ((k, [v]) : ls)

convert ls = foldl addval [] (sortBy (comparing fst) ls)

Ugly code, but it avoids using Map.

查看更多
▲ chillily
7楼-- · 2020-06-08 12:01

One of the most important points when writing a function, is trying to split what it should do into separate sub-tasks (which are often put together by function composition in the end). E.g., in the definition you came up with, there are three tasks (in order of application, i.e., from right to left in the definition):

  1. map the 2nd component of each pair to a singleton list (thereby enabling the use of Map.fromListWith)
  2. create a map (which takes care of merging entries with equal keys)
  3. turn it into a list

I wanted to post a different solution (which was an exact copy of the code Mark posted in the meanwhile ;)). Just to make clear that most of the time there are different routes to the same goal. In his definition you have the separate tasks:

  1. sort the list by keys
  2. group the result by keys
  3. turn it into a list of the desired type

Once again, separation of concerns (modularity) is an important principle. Just try to apply it to small problems and once you gained some experience you will be able to come up with surprisingly simple solutions to seemingly difficult problems.

查看更多
登录 后发表回答