Get current number of partitions of a DataFrame

Posted 2019-01-23 11:19

Question:

Is there any way to get the current number of partitions of a DataFrame? I checked the DataFrame javadoc (Spark 1.6) and didn't find a method for that, or did I just miss it? (In the case of JavaRDD there's a getNumPartitions() method.)

Answer 1:

You need to call getNumPartitions() on the DataFrame's underlying RDD, e.g., df.rdd().getNumPartitions() in Java. In Scala this is a parameterless method: df.rdd.getNumPartitions.
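A minimal, self-contained sketch of this approach, assuming Spark 2.x or later (it uses SparkSession, which Spark 1.6 lacks) and a local master with 4 threads. The object name and the repartition count of 8 are illustrative choices, not part of the original answer:

```scala
import org.apache.spark.sql.SparkSession

// Sketch assuming Spark 2.x+ (SparkSession). On Spark 1.6, build a SQLContext
// and import sqlContext.implicits._ instead; df.rdd.getNumPartitions is the same.
object NumPartitionsExample {
  def main(args: Array[String]): Unit = {
    // local[4] is an assumed local master with 4 worker threads.
    val spark = SparkSession.builder()
      .appName("num-partitions")
      .master("local[4]")
      .getOrCreate()
    import spark.implicits._

    val df = (1 to 100).toDF("n")

    // DataFrame has no partition-count method of its own, so ask its underlying RDD.
    println(s"initial partitions: ${df.rdd.getNumPartitions}")

    // repartition(n) shuffles the rows into exactly n partitions.
    val repartitioned = df.repartition(8)
    println(s"after repartition(8): ${repartitioned.rdd.getNumPartitions}")

    spark.stop()
  }
}
```

The repartition call is only there to show that the reported count follows changes to the layout; the initial count depends on how the DataFrame was created and on the default parallelism.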



Answer 2:

Convert to an RDD, then get the length of its partitions array:

DF.rdd.partitions.length
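In Scala, RDD.getNumPartitions is defined as partitions.length, so both forms return the same number. A small sketch under the same assumptions as before (Spark 2.x+, a local[4] master; the object name and the counts 6 and 2 are illustrative) that checks this and also shows coalesce lowering the count:

```scala
import org.apache.spark.sql.SparkSession

// Sketch assuming Spark 2.x+ and a local[4] master (both assumptions).
object PartitionsLengthExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[4]")
      .appName("partitions-length")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("A", 1), ("B", 2), ("C", 3)).toDF("k", "v").repartition(6)

    // partitions is an Array[Partition]; its length is the partition count.
    val viaLength = df.rdd.partitions.length
    val viaMethod = df.rdd.getNumPartitions
    println(s"partitions.length = $viaLength, getNumPartitions = $viaMethod")

    // coalesce(n) reduces the number of partitions without a full shuffle.
    println(s"after coalesce(2): ${df.coalesce(2).rdd.partitions.length}")

    spark.stop()
  }
}
```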


Answer 3:

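import spark.implicits._  // required for toDF (Spark 2.x+; on Spark 1.6 use sqlContext.implicits._)

val df = Seq(
  ("A", 1), ("B", 2), ("A", 3), ("C", 1)
).toDF("k", "v")

df.rdd.getNumPartitions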


Answer 4:

partitions.size is another alternative.

Let me explain this with a full example:

import spark.implicits._  // required for toDF

val x = (1 to 10).toList
val numberDF = x.toDF("number")
numberDF.rdd.partitions.size // => 4 here; the count depends on the default parallelism
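The 4 presumably comes from Spark's default parallelism (for example, 4 local cores on the author's machine). For a DataFrame built from a local collection, Spark appears to cut the rows into contiguous slices whose boundaries are computed with integer division, the way ParallelCollectionRDD slices data passed to parallelize. This plain-Scala sketch reimplements that rule as a hypothetical helper, slices, so the split can be predicted without running Spark; it approximates Spark's behavior and is not Spark's own code:

```scala
// Hypothetical helper mirroring how Spark is assumed to slice a local
// collection into numSlices contiguous chunks (ParallelCollectionRDD-style).
object SliceSketch {
  def slices[T](data: Seq[T], numSlices: Int): Seq[Seq[T]] = {
    require(numSlices >= 1, "numSlices must be positive")
    val length = data.length.toLong
    (0 until numSlices).map { i =>
      // Integer division on the boundaries keeps chunk sizes within 1 of each other.
      val start = ((i * length) / numSlices).toInt
      val end = (((i + 1) * length) / numSlices).toInt
      data.slice(start, end)
    }
  }

  def main(args: Array[String]): Unit = {
    slices(1 to 10, 4).zipWithIndex.foreach { case (part, idx) =>
      println(f"Partition $idx%05d: ${part.mkString(", ")}")
    }
  }
}
```

With 10 elements and 4 slices the boundaries are 0, 2, 5, 7 and 10, so main prints Partition 00000: 1, 2 through Partition 00003: 8, 9, 10.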

To verify how many partitions we got above, save the DataFrame as CSV:

numberDF.write.csv("/Users/Ram.Ghadiyaram/output/numbers")

Here is how the data is split across the partitions (each partition is written as its own part file):

Partition 00000: 1, 2
Partition 00001: 3, 4, 5
Partition 00002: 6, 7
Partition 00003: 8, 9, 10
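Writing CSV files works, but the same per-partition view can also be obtained in memory. A sketch assuming Spark 2.x+ and a local[4] master, using RDD.mapPartitionsWithIndex to tag each partition's rows with its index; partitionContents is a hypothetical helper name, and it assumes the first column is a non-null Int:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Sketch assuming Spark 2.x+ and a local[4] master (both assumptions); with a
// different parallelism the numbers land in different partitions.
object PartitionContentsExample {
  // Returns (partitionIndex, first-column values) for every partition, including
  // empty ones. Collects everything to the driver, so use it only on small data.
  def partitionContents(df: DataFrame): Array[(Int, List[Int])] =
    df.rdd
      .mapPartitionsWithIndex { (idx, rows) =>
        Iterator((idx, rows.map(_.getInt(0)).toList))
      }
      .collect()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[4]")
      .appName("partition-contents")
      .getOrCreate()
    import spark.implicits._

    val numberDF = (1 to 10).toList.toDF("number")
    partitionContents(numberDF).foreach { case (idx, values) =>
      println(f"Partition $idx%05d: ${values.mkString(", ")}")
    }

    spark.stop()
  }
}
```

Because the helper collects every row to the driver, it is meant for small, exploratory DataFrames rather than production data.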