pickle.PicklingError: Cannot pickle files that are

Posted 2019-05-05 19:33

I'm getting this error while running a PySpark job on Dataproc. What could be the reason?

This is the stack trace of the error:

  File "/usr/lib/python2.7/pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 553, in save_reduce
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 649, in save_dict
    self._batch_setitems(obj.iteritems())
  File "/usr/lib/python2.7/pickle.py", line 681, in _batch_setitems
    save(v)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 582, in save_file
pickle.PicklingError: Cannot pickle files that are not opened for reading

Tags: pyspark pickle google-cloud-dataproc

1 Answer

何必那么认真

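#2 · 2019-05-05 19:52

I found the issue. I was using a dictionary inside the function passed to map. It failed because the worker nodes could not access that dictionary: Spark has to pickle the function, together with everything it references, to ship it to the workers, and that is the step where the traceback fails.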
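To narrow down which entry of such a dictionary cannot be serialized, you could try pickling each value on the driver before the job starts. The sketch below is only an approximation: `find_unpicklable` is a hypothetical helper, and it uses the standard `pickle` module, which is stricter than the cloudpickle serializer PySpark uses (for example, it also rejects lambdas), so it may flag values that Spark would accept.

```python
import pickle


def find_unpicklable(mapping):
    """Return (key, exception name) pairs for values the standard pickler rejects.

    Hypothetical diagnostic helper, not part of PySpark.
    """
    bad = []
    for key, value in mapping.items():
        try:
            pickle.dumps(value)
        except Exception as exc:  # pickle can raise several exception types
            bad.append((key, type(exc).__name__))
    return bad
```

Running it on the dictionary used in the map function should point at open file handles and similar objects hiding among the values.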

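Solution:

I broadcast the dictionary and then read it through the broadcast variable inside the map function:

from pyspark import SparkContext

sc = SparkContext()
# lookup_dict is the dictionary the map function needs
lookup_bc = sc.broadcast(lookup_dict)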

Then, inside the function, I read the value like this:

data = lookup_bc.value.get(key)
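Putting the pieces together, a minimal end-to-end sketch might look like the following. The names `make_lookup` and `run_job` and the sample data are assumptions for illustration and are not from the original job. The `pyspark` import is kept inside `run_job` so the lookup helper can be exercised without a Spark installation.

```python
def make_lookup(lookup_bc, default=None):
    """Build a function for rdd.map that reads from a broadcast dictionary."""
    def lookup(key):
        # Only the broadcast handle is captured in the closure;
        # .value is resolved where the function runs.
        return (key, lookup_bc.value.get(key, default))
    return lookup


def run_job():
    # Imported here so make_lookup can be used without Spark installed.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    lookup_dict = {"a": 1, "b": 2}  # hypothetical sample data
    lookup_bc = sc.broadcast(lookup_dict)
    result = sc.parallelize(["a", "b", "c"]).map(make_lookup(lookup_bc)).collect()
    sc.stop()
    return result
```

Call `run_job()` on the cluster (or against a local Spark install) to try it. Keys missing from the dictionary come back paired with `default`.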

Hope it helps!
