Question:

Is there any way to get the current number of partitions of a DataFrame? I checked the DataFrame javadoc (Spark 1.6) and didn't find a method for that, or did I just miss it? (In the case of JavaRDD, there's a getNumPartitions() method.)
Answer 1:
You need to call getNumPartitions() on the DataFrame's underlying RDD, e.g. df.rdd.getNumPartitions(). In the case of Scala, this is a parameterless method: df.rdd.getNumPartitions.
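For example, a minimal sketch against the Spark 1.6 API mentioned in the question (the SparkContext setup, app name, and column name are illustrative assumptions, not part of the original answer):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Assumption: local mode with 4 cores; the app name is hypothetical.
val sc = new SparkContext(new SparkConf().setAppName("partitions-demo").setMaster("local[4]"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

val df = sc.parallelize(1 to 100).toDF("n") // "n" is an illustrative column name
println(df.rdd.getNumPartitions)            // e.g. 4: parallelize defaults to the number of cores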
Answer 2:
Convert to an RDD, then get the length of its partitions array:

df.rdd.partitions.length
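As a short sketch of how this count behaves (assuming the df from the example above; the repartition(8) call is an illustrative addition, not from the original answer):

println(df.rdd.partitions.length)                // current partition count
println(df.repartition(8).rdd.partitions.length) // => 8 after an explicit repartition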
Answer 3:
// In spark-shell the needed implicits are already in scope;
// otherwise: import sqlContext.implicits._ (Spark 1.6) or import spark.implicits._ (2.x)
val df = Seq(
  ("A", 1), ("B", 2), ("A", 3), ("C", 1)
).toDF("k", "v")

df.rdd.getNumPartitions // current number of partitions
Answer 4:
df.rdd.partitions.size is another alternative. Let me explain this with a full example:
val x = (1 to 10).toList
val numberDF = x.toDF("number")
numberDF.rdd.partitions.size // => 4 (the exact count depends on spark.default.parallelism)
To prove how many partitions we got above, save that DataFrame as CSV:

numberDF.write.csv("/Users/Ram.Ghadiyaram/output/numbers")
Here is how the data is separated across the different partitions:
Partition 00000: 1, 2
Partition 00001: 3, 4, 5
Partition 00002: 6, 7
Partition 00003: 8, 9, 10
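Alternatively, to inspect the distribution without writing files, a minimal sketch (assuming the numberDF from above) that tags each value with its partition index via mapPartitionsWithIndex:

numberDF.rdd
  .mapPartitionsWithIndex { (idx, iter) =>
    // Tag every row with the index of the partition it lives in.
    iter.map(row => s"Partition $idx: ${row.getInt(0)}")
  }
  .collect()
  .foreach(println)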