Spark: Saving an RDD to an already existing path in HDFS

Published 2019-03-01 08:49

I am able to save RDD output to HDFS with the saveAsTextFile method. This method throws an exception if the target path already exists.

I have a use case where I need to save RDDs to an already existing path in HDFS. Is there a way to simply append the new RDD data to the data that already exists at that path?
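For context, a minimal sketch of the behavior being described, with illustrative paths (assuming an active SparkContext `sc`):

// Minimal reproduction; the second call fails because the output directory now exists
val rdd = sc.parallelize(Seq("a", "b", "c"))
rdd.saveAsTextFile("hdfs:///data/output")   // succeeds while the path does not exist
rdd.saveAsTextFile("hdfs:///data/output")   // throws org.apache.hadoop.mapred.FileAlreadyExistsException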

1 Answer
Emotional °昔 · 2019-03-01 09:29

One possible solution, available since Spark 1.6, is to convert the RDD to a DataFrame and write it with the text format in append mode, which adds new files to the existing output directory instead of throwing:

val outputPath: String = ???

import spark.implicits._  // needed for toDF on an RDD (`spark` is the active SparkSession)
// "append" adds new part files alongside any existing ones at the path
rdd.map(_.toString).toDF.write.mode("append").text(outputPath)
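If you prefer to stay at the RDD level, a common workaround (a sketch, not part of the answer above; the base path and naming scheme are illustrative assumptions) is to write each batch to a fresh subdirectory under the existing base path, since saveAsTextFile only rejects a path that already exists:

// Sketch: "append" by giving every batch its own subdirectory under the existing base path
val basePath = "hdfs:///data/output"                        // illustrative existing directory
val batchDir = s"$basePath/batch-${System.currentTimeMillis()}"
rdd.saveAsTextFile(batchDir)                                // new path, so no exception
// Downstream jobs can still read everything at once, e.g. sc.textFile(s"$basePath/*")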