Example: assume we have an input RDD, inputRDD, which is filtered in a second step. I want to estimate the data size of the filtered RDD and, assuming a block size of 128 MB, calculate how many partitions are needed when repartitioning.
That number would then be passed to the repartition method.
inputRDD = sc.textFile("sample.txt")
filteredRDD = inputRDD.filter(<some filter condition>)
filteredRDD.repartition(XX)
Q1. How do I calculate the value of XX?
Q2. What is the equivalent approach for Spark SQL / DataFrames?
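To make the question concrete, here is a minimal sketch of the arithmetic I have in mind, assuming the filtered data size in bytes is already known (Spark does not expose this directly from Python; a common workaround is to estimate bytes per record from a small sample and multiply by `count()`, but that part is an assumption, not shown here):

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # target partition size: 128 MB

def num_partitions(total_bytes: int) -> int:
    # Round up so a trailing partial block still gets its own partition,
    # and never return fewer than one partition.
    return max(1, math.ceil(total_bytes / BLOCK_SIZE))

# e.g. a filtered dataset estimated at ~1 GB:
print(num_partitions(1_000_000_000))  # → 8
```

The result would then be passed as XX, e.g. `filteredRDD.repartition(num_partitions(estimated_bytes))`. The open question is how to obtain `total_bytes` reliably for an RDD or DataFrame.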