I want to access the first 100 rows of a Spark DataFrame and write the result back to a CSV file.
Why is take(100) basically instant, whereas
df.limit(100)
.repartition(1)
.write
.mode(SaveMode.Overwrite)
.option("header", true)
.option("delimiter", ";")
.csv("myPath")
takes forever? I do not want to obtain the first 100 records per partition, but rather just any 100 records.
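
For context, the kind of workaround I am considering (an untested sketch, assuming df is the same DataFrame as above and spark is the active SparkSession) is to pull the 100 rows onto the driver with take and then write them back out as a single-partition DataFrame:

import org.apache.spark.sql.SaveMode

// take(100) returns quickly, so materialize those rows on the driver first...
val first100 = df.take(100)                          // Array[Row]

// ...then wrap them in a one-partition DataFrame and write that instead.
val smallDf = spark.createDataFrame(
  spark.sparkContext.parallelize(first100.toSeq, 1), // single partition => single output file
  df.schema
)

smallDf.write
  .mode(SaveMode.Overwrite)
  .option("header", true)
  .option("delimiter", ";")
  .csv("myPath")

Is there a better way, or an explanation of why the limit + repartition + write pipeline behaves so differently from take?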