Here is my code.
var link = scala.collection.mutable.LinkedHashMap[String, String]()
var fieldTypeMapRDD = fixedRDD.mapPartitionsWithIndex((idx, itr) => itr.map(s => (s(8), s(9))))
fieldTypeMapRDD.foreach { i =>
println(i)
link.put(i._1, i._2)
}
println(link.size) // here size is zero
I want to access link outside the loop. Please help.
Why your code is not supposed to work:
- Before your foreach task starts, the whole closure of the function passed to foreach is serialized on the driver and sent to each of the workers. This means each of them gets its own instance of mutable.LinkedHashMap as a copy of link.
- During the foreach, each worker puts its items into its own copy of link.
- After the task is done, your local link is still empty, while the copies that were filled live (or lived) on the worker nodes.
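The copy semantics described in the list above can be imitated without Spark. The sketch below is only an illustration: shipToWorker is a hypothetical helper that round-trips a value through plain Java serialization, standing in for what happens to a captured variable when a task closure is shipped to a worker. It is not Spark's actual mechanism in detail, just the same effect.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import scala.collection.mutable

object ClosureCopyDemo {
  // Hypothetical helper: serialize and deserialize a value,
  // standing in for shipping a captured variable to a worker.
  def shipToWorker[T <: AnyRef](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val link = mutable.LinkedHashMap[String, String]()

    // The "worker" receives a deserialized copy, not the original object.
    val workerCopy = shipToWorker(link)
    workerCopy.put("name", "string")

    println(workerCopy.size) // the copy was modified
    println(link.size)       // the original on the "driver" is still empty
  }
}
```

Whatever the worker does to its copy never travels back, which is why link.size stays at zero in the question.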
The moral is clear: don't use local mutable collections with RDDs. It's just not going to work.
One way to get the whole collection onto the local machine is the collect method.
You can use it as:
val link = fieldTypeMapRDD.collect.toMap
or, if you need to preserve the order:
import scala.collection.immutable.ListMap
val link = ListMap(fieldTypeMapRDD.collect:_*)
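To see why ListMap matters here, the following sketch builds both kinds of map from the same array. The collected array is a made-up stand-in for the result of fieldTypeMapRDD.collect, and the field names in it are assumptions for illustration only.

```scala
import scala.collection.immutable.ListMap

object OrderPreservingMapDemo {
  // Hypothetical stand-in for the result of fieldTypeMapRDD.collect
  val collected: Array[(String, String)] = Array(
    "id" -> "int",
    "name" -> "string",
    "price" -> "double",
    "qty" -> "int",
    "created" -> "timestamp",
    "updated" -> "timestamp"
  )

  // ListMap iterates its entries in insertion order.
  val ordered: ListMap[String, String] = ListMap(collected: _*)

  // A plain immutable Map makes no promise about iteration order.
  val unordered: Map[String, String] = collected.toMap

  def main(args: Array[String]): Unit = {
    println(ordered.keys.mkString(", "))   // same order as in collected
    println(unordered.keys.mkString(", ")) // order is unspecified
  }
}
```

Both maps hold the same entries; only the iteration order differs, so toMap is fine whenever the order of fields is irrelevant.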
But if you are really into mutable collections, you can modify your code a bit. Just change
fieldTypeMapRDD.foreach {
to
fieldTypeMapRDD.toLocalIterator.foreach {
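This works because toLocalIterator pulls the data to the driver, so the foreach that follows is an ordinary Scala iteration running locally rather than a distributed task. A minimal sketch, assuming spark-core is on the classpath and a local master is acceptable; the sample data, app name, and object name are made up for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import scala.collection.mutable

object ToLocalIteratorSketch {
  def collectFieldTypes(): mutable.LinkedHashMap[String, String] = {
    // Assumption: a local two-thread master, for demonstration only.
    val conf = new SparkConf().setMaster("local[2]").setAppName("to-local-iterator-sketch")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical stand-in for fieldTypeMapRDD from the question.
      val fieldTypeMapRDD = sc.parallelize(
        Seq("id" -> "int", "name" -> "string", "price" -> "double"),
        numSlices = 2
      )

      val link = mutable.LinkedHashMap[String, String]()

      // The iterator is consumed on the driver, so link is the local
      // instance and the puts are visible after the loop.
      fieldTypeMapRDD.toLocalIterator.foreach { case (field, tpe) =>
        link.put(field, tpe)
      }
      link
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    println(collectFieldTypes().size) // no longer zero
  }
}
```

Compared with collect, toLocalIterator fetches one partition at a time, so the driver only needs enough memory for the largest partition plus whatever you accumulate yourself.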
See also this question.