I'm a Haskell beginner. Let's suppose I want to write a function convertKVList
that takes a flat list of key-value pairs, where some of the keys might be repeated, and turns it into a mapping from keys to lists of values where all of the keys are unique. For instance, on a list of pairs of Int
s, I want this behavior:
> convertKVList [(1, 2), (1, 4), (1, 3), (2, 3)]
[(1,[3,4,2]),(2,[3])]
This seems like a common enough task that there ought to be a library function available to do what I want, but I couldn't find anything when I looked. Finally, someone suggested that I compose Map.toList
with Map.fromListWith (++)
, and I ended up with this:
import Data.Map as Map (toList, fromListWith)
convertKVList :: (Ord a) => [(a, b)] -> [(a, [b])]
convertKVList ls =
(Map.toList . Map.fromListWith (++) . map (\(x,y) -> (x,[y]))) ls
My question is for more experienced Haskellers and is in two parts: First, is this how you would go about it, or is there a "better" (easier to read, or more efficient, or both) way?
Second, how could I have come up with this on my own? I knew that I wanted the type to be [(a, b)] -> [(a, [b])]
, but putting that into Hoogle didn't turn up anything useful. And I had looked at the Data.Map
docs, but neither fromListWith
nor toList
had jumped out as particularly helpful. So: how would you go about thinking about this problem? (I realize that both of these questions are subjective, especially the second one.)
Thanks!
while this is by no means canonical:
it does have the advantage of not pulling in Data.Map. should be asymptotically the same, haven't benchmarked. I think you could clean up the first chunk with Control.Arrow (something like (fst . head &&& map snd)) but it's not obviously cleaner.
Not sure how you'd come to it except by either knowing it or asking in #haskell, though.
That looks like an understandable solution and you can clean it up slightly more:
Regarding how you might come up with this on your own: assuming you started with
Data.Map
, then you want to use the map to combine values with equal keys. The documentation forData.Map
on Hackage saysa
is the type for values andk
for keys.Knowing this, you can search for
a -> a -> a
to find functions that might combine two values in aMap k a
to produce a newa
value. This narrows the API down to a handful of functions likeinsertWith
,fromListWith
, andfromAscListWith
.Similarly, to convert your
Map k a
to[(k, a)]
, you can search the documentation forMap k a -> [(k, a)]
and find only a few functions likeassocs
,toList
,toAscList
, andtoDescList
. Note that in your case,[(k, a)]
is instantiated to[(Int, [Int])]
.One thing I've found helpful in understanding the standard Haskell libraries is to view the source on Hackage. Seeing which functions are implemented in terms of others helps make the API feel smaller, and I can see which functions are the basic building blocks.
Hoogle is not the only search engine able to search Haskell libraries by type signatures and it definitely and unfortunately covers only a small part of Hackage. Searching with Hayoo for a type signature
[(a,b)]->[(a,[b])]
brought up these two implementations:aggregateAL
alistCollect
Concerning your take on the problem, since in your function you already bring up a higher level datastructure (
Map
), it doesn't make sense to downgrade to a more primitive associative list in the output, because:Map
input, because it's much more effective for dealing with key-value stores, and if you ever find yourself still needing a list you can always utilize thetoList
in place.Map
implies the absence of duplicate keys on type level, which is not any less important, since in Haskell you should always do the maximum proofs using the type-system. This principle is essentially what makes the statement "If it compiles, it works" closest to truth.In other words this is the correct definition of your function:
Hayooing for that type signature brings a couple of already implemented results too.
Concerning the approaching the problem, it's classic: "Divide and conquer!". Chris has some good points too in his answer.
I suspect that without dipping into mutation and the
ST
monad, you are unlikely to improve on theMap.fromListWith
solution (or substantially equivalent alternatives like usingHashMap.fromListWith
). I'd just go with that.Basically, with mutation you can do this grouping in near-linear time by using a mutable hash table with
a
as the keys and mutable lists ofb
as the values. Without mutation, however, it's going to be worse, because each insert into a balanced search tree is O(log n); this is because "inserting" means constructing a new copy of each tree node that leads to the one your inserted element goes in. And you need to do n inserts—which gives you exactly the O(n * log n) bounds that theMap.fromListWith
function has. Sorting the association list ahead of time doesn't fundamentally improve this, because sorting is also O(n * log n).So to improve on O(n * log n), you need data structures with mutation. I just did a quick Google and the best bet would be to implement the standard imperative algorithm using something like the
hashtables
library (which I've never tried, so I can't vouch for it). To use this you're going to need to understandControl.Monad.ST
andData.STRef
. TheST
monad is a technique that GHC provides for using mutation "internally" in a pure function—it uses some type system extensions to guarantee that the side effects cannot be observed outside of the functions in question. HaskellWiki has some examples, but it might take some studying and practice to feel comfortable with this one.The other thing I would recommend, if you feel like you want to understand
Data.Map
or similar libraries better, is to look at Chris Okasaki's Purely Functional Data Structures book (or his dissertation (PDF) that the book is based on). It's based on Standard ML instead of Haskell, the data structures are not the same, and it can be a bit of a difficult read, but it's a foundational book.So, my solution overuses pattern-matching because I don't actually know what functions are in the standard library.
The idea was that if the list is sorted by the keys, then you can collect your key-values as you go. To do the logic of checking whether to add to the first key-value list or create a new entry, I used patterns and guards to define the conditionals. And liberal use of cons to prepend values to a list.
And in case the original list isn't sorted, there's a
sortBy
.Ugly code, but it avoids using Map.
One of the most important points when writing a function, is trying to split what it should do into separate sub-tasks (which are often put together by function composition in the end). E.g., in the definition you came up with, there are three tasks (in order of application, i.e., from right to left in the definition):
Map.fromListWith
)I wanted to post a different solution (which was an exact copy of the code Mark posted in the meanwhile ;)). Just to make clear that most of the time there are different routes to the same goal. In his definition you have the separate tasks:
Once again, separation of concerns (modularity) is an important principle. Just try to apply it to small problems and once you gained some experience you will be able to come up with surprisingly simple solutions to seemingly difficult problems.