How can I overwrite an existing output path when saving an RDD?
Input file (test1):
975078|56691|2.000|20171001_926_570_1322
975078|42993|1.690|20171001_926_570_1322
975078|46462|2.000|20171001_926_570_1322
975078|87815|1.000|20171001_926_570_1322
from pyspark.sql import Row
rdd = sc.textFile('/home/administrator/work/test1').map(lambda x: x.split("|")[:4]).map(lambda r: Row(user_code=r[0], item_code=r[1], qty=float(r[2])))
rdd.coalesce(1).saveAsPickleFile("/home/administrator/work/foobar_seq1")
The first save works. I then removed one line from the input file and saved the RDD to the same location again, and it fails with an error saying the output path already exists.
rdd.coalesce(1).saveAsPickleFile("/home/administrator/work/foobar_seq1")
With a DataFrame, we can overwrite an existing path:
df.coalesce(1).write.mode("overwrite").save(path)
If I try the same on an RDD, I get an error:
rdd.coalesce(1).write().overwrite().saveAsPickleFile(path)
Is there a way to get the same overwrite behaviour when saving an RDD? Any help is appreciated.
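One possible workaround is to delete the existing output directory before saving. The sketch below assumes the path is on the local filesystem, as /home/administrator/work/... suggests, and that Spark runs in local mode so the driver sees the same files. save_with_overwrite is a hypothetical helper, not part of the Spark API, and it would not work as-is for HDFS or other remote paths.

```python
import os
import shutil


def save_with_overwrite(save_fn, path):
    """Remove `path` if it exists locally, then call save_fn(path).

    Hypothetical helper (not a Spark API). Assumes `path` is a local
    filesystem path visible to the process calling this function.
    """
    if os.path.isdir(path):
        # Spark writes its output as a directory of part files.
        shutil.rmtree(path)
    elif os.path.exists(path):
        os.remove(path)
    save_fn(path)


# Intended usage with the RDD from the question (needs a running SparkContext):
# save_with_overwrite(rdd.coalesce(1).saveAsPickleFile,
#                     "/home/administrator/work/foobar_seq1")
```

Passing the bound method rdd.coalesce(1).saveAsPickleFile keeps the helper independent of Spark, so the same function could wrap saveAsTextFile as well. Note that the old output is deleted before the new save starts, so a failed save leaves no output at that path.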