I am currently using Spark together with Pandas. How can I convert a Pandas DataFrame into something that can conveniently be written to S3?
I tried the option below, but I get an error because df is a Pandas DataFrame and has no write method.
df.write()
.format("com.databricks.spark.csv")
.option("header", "true")
.save("123.csv");
As you are running this in Spark, one approach would be to convert the Pandas DataFrame into a Spark DataFrame and then save this to S3.
First create the pdf Pandas DataFrame and convert it into the df Spark DataFrame.
To validate the conversion, you can also print out the schema of the Spark DataFrame.
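The original snippet is not preserved here, so the following is only a minimal sketch of the conversion, assuming PySpark 2.x or later with a SparkSession (on Spark 1.x, sqlContext.createDataFrame plays the same role). The column names, the values, and the helper names build_pandas_frame, to_spark, and main are made up for illustration.

```python
def build_pandas_frame():
    """Create a small Pandas DataFrame (column names and values are made up)."""
    import pandas as pd  # imported lazily; only this helper needs Pandas

    return pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})


def to_spark(spark, pdf):
    """Convert a Pandas DataFrame into a Spark DataFrame.

    `spark` is expected to be a SparkSession (or a Spark 1.x SQLContext);
    both expose createDataFrame().
    """
    return spark.createDataFrame(pdf)


def main():
    # Imported here so the helpers above do not require pyspark at import time.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("pandas-to-spark").getOrCreate()
    pdf = build_pandas_frame()
    df = to_spark(spark, pdf)
    df.printSchema()  # prints the column names and the types Spark inferred
    spark.stop()
```

Calling main() in an environment where both pandas and pyspark are installed runs the whole flow. Inside a notebook or the pyspark shell, where a session already exists, calling to_spark(spark, pdf) directly should be enough.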
Now that it is a Spark DataFrame, you can use the spark-csv package to save the file.
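As a hedged sketch of the save step: in PySpark, write is a property on the DataFrame rather than a method, so it is used as df.write and not df.write(). The helper name write_csv and the example S3 path are hypothetical. The format name "com.databricks.spark.csv" is the one from the question and applies when the spark-csv package is on the classpath; since Spark 2.0 a CSV writer is built in and "csv" can be passed instead, which is the default assumed here.

```python
def write_csv(df, path, fmt="csv"):
    """Write a Spark DataFrame as CSV with a header row.

    `path` is a placeholder destination, for example
    "s3a://my-bucket/output/123.csv" (hypothetical bucket name).
    Spark writes a directory of part files at that path, not a single file.
    """
    writer = df.write  # a property, not a method call
    writer = writer.format(fmt)  # "com.databricks.spark.csv" with spark-csv on Spark 1.x
    writer = writer.option("header", "true")  # emit column names as the first row
    writer.save(path)
```

The URI scheme (s3://, s3n://, or s3a://) and the credentials setup depend on the cluster and Hadoop version, so treat the path above purely as an illustration.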