I want to read an RDD into an array. For that I could use the collect method, but in my case it keeps throwing Kryo buffer overflow errors, and if I set the Kryo buffer size too large, that causes problems of its own. On the other hand, I have noticed that if I just save the RDD to a file with saveAsTextFile, I get no errors. So I was wondering: is there a better way to read an RDD into an array that isn't as problematic as collect?
Answer 1:
No. collect is the only method for reading an RDD into an array. saveAsTextFile never has to collect all the data to one machine, so it is not limited by the available memory on a single machine in the same way that collect is.
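This suggests a workaround: saveAsTextFile writes one part-* file per partition into an output directory, and those files can then be read back locally without ever holding the whole dataset in the driver at once during the Spark job. A minimal sketch of reading such a directory back into an array (pure Python; the directory layout is simulated here, since actually producing it would require a running Spark cluster):

```python
import os
import tempfile

# Simulate a saveAsTextFile output directory: one part-* file per partition.
out_dir = tempfile.mkdtemp()
partitions = [["a", "b"], ["c"], ["d", "e", "f"]]
for i, part in enumerate(partitions):
    with open(os.path.join(out_dir, f"part-{i:05d}"), "w") as f:
        f.write("\n".join(part) + "\n")

# Read the part files back into a single array, in partition order.
elements = []
for name in sorted(os.listdir(out_dir)):
    if name.startswith("part-"):
        with open(os.path.join(out_dir, name)) as f:
            elements.extend(line.rstrip("\n") for line in f)

print(elements)  # ['a', 'b', 'c', 'd', 'e', 'f']
```

Note that this only moves the memory pressure from the Spark driver to whatever process reads the files back; if the full dataset does not fit in that process's memory either, you still need to process the lines incrementally.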
Answer 2:
toLocalIterator()
This method returns an iterator over all of the elements in the RDD. The iterator consumes only as much memory as the largest partition, because it runs a separate job to evaluate one partition at a time.
>>> x = rdd.toLocalIterator()
>>> x
<generator object toLocalIterator at 0x283cf00>
Then you can access the elements of the RDD like this:
empty_array = []
for each_element in x:
    empty_array.append(each_element)
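The memory behavior can be pictured with a plain-Python generator (a sketch of the idea, not Spark's actual implementation; local_iterator and the partition list are made up for illustration):

```python
def local_iterator(partitions):
    """Yield elements partition by partition, like RDD.toLocalIterator().

    In real Spark, each partition is fetched by a separate job, so only one
    partition's worth of data is in driver memory at a time.
    """
    for part in partitions:
        for element in part:
            yield element

parts = [[1, 2], [3, 4, 5], [6]]
it = local_iterator(parts)
result = list(it)  # materializing defeats the memory savings, shown for clarity
print(result)  # [1, 2, 3, 4, 5, 6]
```

The trade-off: if you append every element into a list as above, you end up holding the whole dataset in memory anyway, just as collect would; toLocalIterator only helps when you process elements as they stream past instead of accumulating them.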
https://spark.apache.org/docs/1.0.2/api/java/org/apache/spark/rdd/RDD.html#toLocalIterator()