I tested writing with:
df.write.partitionBy("id", "name")
.mode(SaveMode.Append)
.parquet(filePath)
However if I leave out the partitioning:
df.write
.mode(SaveMode.Append)
.parquet(filePath)
It executes 100x(!) faster.
Is it normal for the same amount of data to take 100x longer to write when partitioning?
There are 10 and 3000 unique id
and name
column values respectively.
The DataFrame
has 10 additional integer columns.