Specifying the filename when saving a DataFrame as a CSV

Posted 2020-01-24 12:53

Question:

Say I have a Spark DataFrame that I want to save to disk as a CSV file. In Spark 2.0.0+, one can obtain a DataFrameWriter from a DataFrame (Dataset[Row]) and use its .csv method to write the file.

The function is defined as

def csv(path: String): Unit
    path: the output location (a folder name), not the file name.

Spark stores the CSV output at the specified location as files named part-*.csv.

Is there a way to save the CSV with a specified filename instead of part-*.csv? Or is it possible to specify a prefix other than part-r?

Code :

df.coalesce(1).write.csv("sample_path")

Current Output :

sample_path
|
+-- part-r-00000.csv

Desired Output :

sample_path
|
+-- my_file.csv

Note: coalesce(1) is used to produce a single output file, and the executor has enough memory to hold the DataFrame without running into memory errors.

Answer 1:

It's not possible to do this directly with Spark's save.

Spark uses the Hadoop file format, which requires data to be partitioned - that's why you have part- files. You can easily rename the file after processing, just like in this question.

In Scala it will look like:

import org.apache.hadoop.fs._

val fs = FileSystem.get(sc.hadoopConfiguration)
// Spark wrote the CSV into a directory, e.g. df.coalesce(1).write.csv("mydata.csv-temp");
// pick up the single part-*.csv file it produced there
val file = fs.globStatus(new Path("mydata.csv-temp/part*"))(0).getPath.getName

fs.rename(new Path("mydata.csv-temp/" + file), new Path("mydata.csv")) // move it out under the desired name
fs.delete(new Path("mydata.csv-temp"), true)                           // remove the leftover directory (_SUCCESS, CRC files, ...)
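Note that this rename runs on the driver through the Hadoop FileSystem API, so the same code works on HDFS, the local filesystem, or any other Hadoop-compatible store. For a path that lives on a non-default filesystem, obtaining the filesystem from the path itself (new Path(...).getFileSystem(sc.hadoopConfiguration)) is a safer choice than FileSystem.get(sc.hadoopConfiguration).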

or just:

import org.apache.hadoop.fs._

val fs = FileSystem.get(sc.hadoopConfiguration)
// works only if the exact part-file name produced by Spark is already known
fs.rename(new Path("csvDirectory/data.csv/part-00000"), new Path("csvDirectory/newData.csv"))
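Keep in mind that Spark 2.x usually names its output files something like part-00000-<uuid>-c000.csv rather than a fixed part-00000, so hard-coding the part-file name is fragile; the globStatus lookup shown above is the more robust option.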

Edit: As mentioned in the comments, you can also write your own OutputFormat; see the documentation for information on how to set the file name with that approach.
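For illustration, here is a minimal sketch of that idea using the old Hadoop mapred API: a MultipleTextOutputFormat subclass whose generateFileNameForKeyValue override replaces the default part-* name. The class name SingleFileOutputFormat, the target name my_file.csv, the use of an RDD instead of the DataFrame writer, and the row.mkString(",") conversion (which, unlike Spark's CSV writer, does no quoting or escaping) are all assumptions made for this example.

import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat

// Hypothetical output format that swaps the default part-* name for a fixed file name
class SingleFileOutputFormat extends MultipleTextOutputFormat[NullWritable, String] {
  // `name` is the default "part-00000"-style name generated by Hadoop
  override def generateFileNameForKeyValue(key: NullWritable, value: String, name: String): String =
    "my_file.csv"
}

// Naive CSV conversion (no quoting/escaping), written out through the custom format
df.coalesce(1)
  .rdd
  .map(row => (NullWritable.get(), row.mkString(",")))
  .saveAsHadoopFile("sample_path", classOf[NullWritable], classOf[String], classOf[SingleFileOutputFormat])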