I use PySpark to write a parquet file. I would like to change the HDFS block size of that file. I set the block size like this, but it doesn't work:
sc._jsc.hadoopConfiguration().set("dfs.block.size", "128m")
Does this have to be set before starting the PySpark job? If so, how can it be done?
I had a similar issue, but I figured out the cause. The setting needs a plain number, not "128m". Therefore this should work (it worked for me, at least):
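For 128 MB that number is 134217728 bytes, so presumably something along these lines, using the same dfs.block.size key as in the question:

sc._jsc.hadoopConfiguration().set("dfs.block.size", "134217728")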
Try setting it through sc._jsc.hadoopConfiguration() on the SparkContext before you write the file. (In Scala you can use sc.hadoopConfiguration directly.)
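A minimal PySpark sketch of that idea; the app name, sample data, and output path below are just placeholders for illustration:

from pyspark.sql import SparkSession

# placeholder app name; any existing SparkSession/SparkContext works the same way
spark = SparkSession.builder.appName("blocksize-example").getOrCreate()
sc = spark.sparkContext

# set the HDFS block size (128 MB in bytes) on the underlying Hadoop configuration
# before writing; "dfs.blocksize" is the non-deprecated name of "dfs.block.size"
sc._jsc.hadoopConfiguration().set("dfs.blocksize", "134217728")

# placeholder data and output path; the parquet file is written with the block size set above
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.write.parquet("hdfs:///tmp/parquet-with-128mb-blocks")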