How to write to a global list with an RDD?
Li = []

def fn(value):
    if value == 4:
        Li.append(1)

rdd.mapValues(lambda x: fn(x))
When I try to print Li, the result is: [].
What I'm actually trying to do is to update another global list, Li1, while transforming the RDD. However, Li1 always ends up empty - it is never modified.
The reason Li is still set to [] after executing mapValues is that Spark serializes the fn function (together with all the global variables it references - this is called its closure) and ships it to another machine - a worker. But there is no corresponding mechanism for shipping results from the closure back from the worker to the driver.
To receive results, you need to return them from your function and use an action like take() or collect(). But be careful - you don't want to send back more data than fits in the driver's memory, otherwise the Spark application will throw an out-of-memory exception. Also, you never executed an action on your RDD: mapValues is only a transformation, so in your example no tasks were run on the workers at all.
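A minimal sketch of the return-and-collect approach (the RDD contents and the value 4 are assumed from the question's code; `sc` stands for an already-created SparkContext):

```python
# Instead of appending to a global list inside the closure,
# return a marker per record and collect the results on the driver.
def fn(value):
    # 1 for matching records, None otherwise (mirrors the question's check)
    return 1 if value == 4 else None

# Driver side, assuming an existing SparkContext `sc`:
# rdd = sc.parallelize([("a", 4), ("b", 3), ("c", 4)])
# li = rdd.mapValues(fn).values().filter(lambda v: v is not None).collect()
# li now holds the results on the driver instead of an untouched global list
```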
Edit:
Following your problem description (based on my understanding of what you want to do):
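If all you need is a count of matching records, Spark accumulators are the built-in mechanism for write-only shared variables updated inside closures. A sketch, with the target value 4 and the RDD shape assumed from the question:

```python
# Keep the matching logic in a pure function so it is testable outside Spark.
def is_match(value, target=4):
    return value == target

# Driver side, assuming an existing SparkContext `sc` and a pair RDD `rdd`:
# acc = sc.accumulator(0)
# rdd.foreach(lambda kv: acc.add(1) if is_match(kv[1]) else None)
# print(acc.value)  # number of records whose value equals 4
```

Note that accumulators only flow one way (worker to driver), which is exactly the direction missing from the closure mechanism described above.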