Scala FlatMap provides wrong results

2019-07-16 14:54发布

问题:

given a list of documents, I want to obtain the pairs that shares at least one token. To do this I wrote the code below, that do that through an inverted index.

object TestFlatMap {
 case class Document(id : Int, tokens : List[String])

 def main(args: Array[String]): Unit = {

   val documents = List(
     Document(1, List("A", "B", "C", "D")),
     Document(2, List("A", "B", "E", "F", "G")),
     Document(3, List("E", "G", "H")),
     Document(4, List("A", "L", "M", "N"))
   )

   val expectedTokensIds = List(("A",1), ("A",2), ("A",4), ("B",1), ("B",2), ("C",1), ("D",1), ("E",2), ("E",3), ("F",2), ("G",2), ("G",3), ("H",3), ("L",4), ("M",4), ("N",4)) //Expected tokens - id tuples
   val expectedCouples = Set((1, 2), (1, 4), (2, 3), (2, 4)) //Expected resulting pairs


   /**
     * For each token returns the id of the documents that contains it
     * */
   val tokensIds = documents.flatMap{ document =>
     document.tokens.map{ token =>
       (token, document.id)
     }
   }

   //Check if the tuples are right
   assert(tokensIds.length == expectedTokensIds.length && tokensIds.intersect(expectedTokensIds).length == expectedTokensIds.length, "Error: tokens-ids not matches")

   //Group the documents by the token
   val docIdsByToken = tokensIds.groupBy(_._1).filter(_._2.size > 1)

   /**
     * For each group of documents generate the pairs
     * */
   val couples = docIdsByToken.map{ case (token, docs) =>
     docs.combinations(2).map{ c =>
       val d1 = c.head._2
       val d2 = c.last._2

       if(d1 < d2){
         (d1, d2)
       }
       else{
         (d2, d1)
       }
     }
   }.flatten.toSet


   /**
     * Same operation, but with flatMap
     * For each group of documents generate the pairs
     * */
   val couples1 = docIdsByToken.flatMap{ case (token, docs) =>
     docs.combinations(2).map{ c =>
       val d1 = c.head._2
       val d2 = c.last._2

       if(d1 < d2){
         (d1, d2)
       }
       else{
         (d2, d1)
       }
     }
   }.toSet

   //The results obtained with flatten pass the test
   assert(couples.size == expectedCouples.size && couples.intersect(expectedCouples).size == expectedCouples.size, "Error: couples not matches")
   //The results obtained with flatMap do not pass the test: they are wrong
   assert(couples1.size == expectedCouples.size && couples1.intersect(expectedCouples).size == expectedCouples.size, "Error: couples1 not matches")
}

The problem is that the flatMap that should generates the final results does not works properly, it only returns two couples: (2,3) and (1,2). I do not understand why it does not works, moreover IntelliJ suggests me to use flatMap instead of use map an then flatten.

Someone is able to explain me where the problem is? Because I cannot figure out, I also had this problem in past.

Thanks

Luca

回答1:

That's an excellent example that demonstrates that all the nice monad laws do not necessarily hold if you switch between different types of collections during the map/flatMap/flatten.


You must convert the Map to List, so that keys are not overridden repeatedly while you are constructing another Map as an intermediate result, because a Map will override keys, instead of collecting all pairs:

val couples1 = docIdsByToken.toList.flatMap{ case (token, docs) =>
  docs.combinations(2).map{ c =>
    val d1 = c.head._2
    val d2 = c.last._2

    if(d1 < d2){
      (d1, d2)
    }
    else{
      (d2, d1)
    }
  }
}.toSet

Here is a much shorter version that demonstrates the same problem:

val m = Map("A" -> (2, 1), "B" -> (2, 3))
val s = m.flatMap{ case (k, v) => List(v) }.toSet
println(s)

Instead of Set((2, 1), (2, 3)), it will produce Set((2, 3)), because after the flatMap and before the toSet the intermediate result is a new Map, and this map can hold only one value for key = 2.

The difference to the first version is that after map, you obtain something like an Iterable[List[(Int, Int)]], which is not a Map, and therefore cannot lose/override any keys.