I was thinking about a nice way to convert a List of tuples with duplicate keys, [("a","b"),("c","d"),("a","f")], into a map ("a" -> ["b", "f"], "c" -> ["d"]). Normally (in Python), I'd create an empty map, for-loop over the list, and check for duplicate keys. But I am looking for a more Scala-ish and clever solution here.
By the way, the actual key-value type I use here is (Int, Node), and I want to turn it into a map of (Int -> NodeSeq).
Group and then project:
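For the example list from the question, grouping and then projecting could look like the minimal sketch below. The names groupByProject and groupedByProject are only illustrative, not from the original post.

```scala
// Group the pairs by their key, then keep only the values inside each group.
def groupByProject[K, V](pairs: List[(K, V)]): Map[K, List[V]] =
  pairs.groupBy(_._1).map { case (key, group) => key -> group.map(_._2) }

val groupedByProject = groupByProject(List(("a", "b"), ("c", "d"), ("a", "f")))
// "a" maps to List("b", "f") and "c" maps to List("d")
```

The values within each group keep the order they had in the original list.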
A more Scala-ish way is to use a fold, in the way shown there (skipping the map f step).
Starting with Scala 2.13, most collections provide the groupMap method, which is (as its name suggests) a more efficient equivalent of a groupBy followed by mapValues.
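A minimal sketch of the groupMap approach, assuming Scala 2.13 or later and the example list from the question (the helper name groupWithGroupMap is made up for illustration):

```scala
// Requires Scala 2.13+.
// The first argument list picks the grouping key, the second picks the value to keep.
def groupWithGroupMap[K, V](pairs: List[(K, V)]): Map[K, List[V]] =
  pairs.groupMap(_._1)(_._2)

val groupedWithGroupMap = groupWithGroupMap(List(("a", "b"), ("c", "d"), ("a", "f")))
// "a" maps to List("b", "f") and "c" maps to List("d")
```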
Called as list.groupMap(_._1)(_._2), it groups elements based on the first part of the tuples (the group part of groupMap) and maps the grouped values by taking their second tuple part (the map part of groupMap).
This is equivalent to list.groupBy(_._1).mapValues(_.map(_._2)), but performed in one pass through the List.
You can try this
Here's another alternative:
For Googlers that don't expect duplicates or are fine with the default duplicate handling policy:
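Assuming the conversion meant here is the standard toMap, a minimal sketch looks like this (the value names are illustrative). On a List, a later duplicate key overwrites an earlier one:

```scala
val withDuplicates = List("a" -> 1, "b" -> 2, "a" -> 3)

// toMap keeps one value per key; for a List, the later "a" -> 3 replaces "a" -> 1.
val lastWins: Map[String, Int] = withDuplicates.toMap
```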
As of 2.12, the default policy reads:
Below you can find a few solutions. (GroupBy, FoldLeft, Aggregate, Spark)
GroupBy variation
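One way the groupBy variation might be written is sketched below. It uses Map.transform for the projection step, which is an assumption on my part and may differ from the original snippet. The name groupWithTransform is hypothetical.

```scala
// groupBy builds Map[K, List[(K, V)]]; transform then rewrites each group
// so that only the values remain.
def groupWithTransform[K, V](pairs: List[(K, V)]): Map[K, List[V]] =
  pairs.groupBy(_._1).transform((_, group) => group.map(_._2))

val groupedWithTransform = groupWithTransform(List(("a", "b"), ("c", "d"), ("a", "f")))
```

Unlike mapValues, transform produces a strict Map rather than a lazy view.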
Fold Left variation
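A possible foldLeft version, as a sketch rather than the original code (groupWithFoldLeft is an illustrative name). It walks the list once and accumulates the values per key in an immutable Map:

```scala
def groupWithFoldLeft[K, V](pairs: List[(K, V)]): Map[K, List[V]] =
  pairs.foldLeft(Map.empty[K, List[V]]) { case (acc, (key, value)) =>
    // Append to the values already collected for this key, or start a new list.
    acc.updated(key, acc.getOrElse(key, Nil) :+ value)
  }

val groupedWithFoldLeft = groupWithFoldLeft(List(("a", "b"), ("c", "d"), ("a", "f")))
```

Appending with :+ keeps the original order but is linear in the length of each value list, so for keys with many values it may be cheaper to prepend and reverse at the end.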
Aggregate Variation - Similar to fold Left
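A sketch of what the aggregate version could look like (groupWithAggregate is a made-up name). Note that aggregate is deprecated for sequential collections as of Scala 2.13 in favor of foldLeft, so this is mostly of interest on older versions or parallel collections:

```scala
def groupWithAggregate[K, V](pairs: List[(K, V)]): Map[K, List[V]] =
  pairs.aggregate(Map.empty[K, List[V]])(
    // seqop: add a single pair to a partial result
    (acc, pair) => acc.updated(pair._1, acc.getOrElse(pair._1, Nil) :+ pair._2),
    // combop: merge two partial results (not invoked for a sequential List)
    (left, right) =>
      right.foldLeft(left) { case (acc, (key, values)) =>
        acc.updated(key, acc.getOrElse(key, Nil) ++ values)
      }
  )

val groupedWithAggregate = groupWithAggregate(List(("a", "b"), ("c", "d"), ("a", "f")))
```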
Spark Variation - For big data sets (conversion to an RDD, and from the RDD to a plain Map)