LinkedHashMap variable is not accessible outside

Posted 2020-05-03 12:13

Question:

Here is my code.

var link = scala.collection.mutable.LinkedHashMap[String, String]()
var fieldTypeMapRDD = fixedRDD.mapPartitionsWithIndex((idx, itr) => itr.map(s => (s(8), s(9))))

fieldTypeMapRDD.foreach { i =>
  println(i)
  link.put(i._1, i._2)

}
println(link.size) // here size is zero

I want to access link outside the loop. Please help.

Answer 1:

Why your code cannot work as written:

  1. Before the foreach task starts, the closure of the function passed to foreach is serialized on the driver and shipped to each executor on the workers. This means each of them gets its own instance of mutable.LinkedHashMap as a copy of link.
  2. During the foreach, each executor puts its items into its own copy of link.
  3. After the task is done, the local link on the driver is still empty, and the only non-empty maps are the copies that lived on the worker nodes.
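
The three steps above can be imitated without a cluster. The following is only a sketch: it uses plain Java serialization as a stand-in for what Spark does when it ships a closure, and the names ClosureCopyDemo, roundTrip and executorCopy are made up for illustration.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import scala.collection.mutable

object ClosureCopyDemo {
  // Serialize a value to bytes and read it back, which yields an independent copy.
  // This stands in for a closure (and the variables it captured) being sent to an executor.
  def roundTrip[A <: AnyRef](value: A): A = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[A] finally in.close()
  }

  // Returns (size of the driver-side map, size of the simulated executor copy).
  def run(): (Int, Int) = {
    val link = mutable.LinkedHashMap[String, String]()
    val executorCopy = roundTrip(link)   // step 1: the executor receives a copy
    executorCopy.put("id", "int")        // step 2: only the copy is filled
    executorCopy.put("name", "string")
    (link.size, executorCopy.size)       // step 3: the original is still empty
  }
}
```

Calling ClosureCopyDemo.run() shows that writes to the copy never reach the original map, which is the same reason println(link.size) prints zero in the question.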

The moral is clear: don't mutate local collections from inside RDD operations. It's just not going to work.

One way to bring the whole collection to the local machine is the collect method (the data then has to fit in the driver's memory). You can use it like this:

val link = fieldTypeMapRDD.collect.toMap

or, if you need to preserve the order:

import scala.collection.immutable.ListMap
val link = ListMap(fieldTypeMapRDD.collect:_*)
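
To see the difference between the two variants without running Spark, here is a small sketch in which a hand-written array stands in for the result of fieldTypeMapRDD.collect. The object name OrderedCollectDemo and the sample field names are assumptions, not data from the question.

```scala
import scala.collection.immutable.ListMap

object OrderedCollectDemo {
  // Hypothetical stand-in for the Array[(String, String)] that collect would return.
  val collected: Array[(String, String)] =
    Array("id" -> "int", "name" -> "string", "city" -> "string", "zip" -> "int", "age" -> "int")

  // A plain Map: same entries, but no guarantee about iteration order.
  val unordered: Map[String, String] = collected.toMap

  // A ListMap iterates in insertion order, so the order of the collected array is kept.
  val ordered: ListMap[String, String] = ListMap(collected: _*)
}
```

OrderedCollectDemo.ordered.keys yields the keys in the order they appear in the array, while OrderedCollectDemo.unordered holds the same entries in whatever order the underlying map implementation chooses.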

But if you really want a mutable collection, you can modify your code slightly. Just change

fieldTypeMapRDD.foreach {

to

fieldTypeMapRDD.toLocalIterator.foreach {
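
Put together, the modified code might look roughly like the sketch below. It assumes Spark is on the classpath and runs in local mode; the object name LocalIteratorDemo and the sample rows standing in for fixedRDD are invented for illustration, and a plain map replaces the mapPartitionsWithIndex call from the question because the partition index was unused there.

```scala
import org.apache.spark.sql.SparkSession
import scala.collection.mutable

object LocalIteratorDemo {
  def run(): mutable.LinkedHashMap[String, String] = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("local-iterator-demo")
      .getOrCreate()
    try {
      // Hypothetical stand-in for fixedRDD: rows with at least 10 string columns.
      val fixedRDD = spark.sparkContext.parallelize(Seq(
        Array("a", "b", "c", "d", "e", "f", "g", "h", "id", "int"),
        Array("a", "b", "c", "d", "e", "f", "g", "h", "name", "string")
      ))
      val fieldTypeMapRDD = fixedRDD.map(s => (s(8), s(9)))

      val link = mutable.LinkedHashMap[String, String]()
      // toLocalIterator brings the elements back to the driver partition by partition,
      // so this foreach runs on the driver and mutates the driver's own map.
      fieldTypeMapRDD.toLocalIterator.foreach { case (k, v) => link.put(k, v) }
      link
    } finally {
      spark.stop()
    }
  }
}
```

Because the loop now runs on the driver, LocalIteratorDemo.run() returns a map that actually contains the collected pairs. As with collect, every element ends up in driver memory, although toLocalIterator only needs to hold one partition at a time while iterating.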

See also this question.