I have a set of key/value pairs that I have sorted into a new JavaPairRDD.
Now I need to select the top 5 elements from it, that is, to obtain a new JavaPairRDD containing only those top 5 elements.
How would I do that?
Is there a simpler way than using flatMap, since that seems like unnecessary extra work?
Thanks!
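If the RDD is already sorted, or you don't care about order, you can use RDD.take(5)
to get the first 5 elements of the RDD. Note that take returns a local collection on the driver (a List in the Java API), not a new RDD; to get a JavaPairRDD again, you can pass that list to JavaSparkContext.parallelizePairs.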
To get the top (or bottom) items (and answer the question you asked), you could use:
.takeOrdered(num: Int)(implicit ord: Ordering[T]): Array[T]
Syntax for getting the x smallest values:
assuming resultRdd: RDD[Double] and x: Int
resultRdd.takeOrdered(x)(Ordering[Double])
Syntax for getting the x largest values:
assuming resultRdd: RDD[Double] and x: Int
resultRdd.top(x)(Ordering[Double])
Note: top reverses the ordering and internally invokes takeOrdered:
def top(num: Int)(implicit ord: Ordering[T]): Array[T] = takeOrdered(num)(ord.reverse)
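To make the relationship between the two calls concrete without a Spark cluster, here is a minimal plain-Java sketch that mimics their semantics on an ordinary List. The class name TopVsTakeOrdered and the two static helpers are hypothetical names chosen for illustration; this is not Spark's implementation, only an approximation that keeps at most num elements in a priority queue and defines top as takeOrdered with the reversed comparator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for the RDD methods, operating on a local List.
class TopVsTakeOrdered {

    // Returns the num smallest elements according to ord, sorted by ord.
    static <T> List<T> takeOrdered(List<T> data, int num, Comparator<T> ord) {
        if (num <= 0) {
            return new ArrayList<>();
        }
        // The heap is ordered by the reverse of ord, so its head is always
        // the largest element (under ord) that we are currently keeping.
        PriorityQueue<T> heap = new PriorityQueue<>(ord.reversed());
        for (T item : data) {
            heap.offer(item);
            if (heap.size() > num) {
                heap.poll(); // evict the largest, keeping the num smallest
            }
        }
        List<T> result = new ArrayList<>(heap);
        result.sort(ord);
        return result;
    }

    // Mirrors the note above: top is takeOrdered with the ordering reversed.
    static <T> List<T> top(List<T> data, int num, Comparator<T> ord) {
        return takeOrdered(data, num, ord.reversed());
    }

    public static void main(String[] args) {
        List<Double> values = Arrays.asList(3.5, 1.0, 9.25, 4.0, 7.5, 2.0, 8.0);
        Comparator<Double> natural = Comparator.naturalOrder();
        System.out.println(takeOrdered(values, 5, natural)); // [1.0, 2.0, 3.5, 4.0, 7.5]
        System.out.println(top(values, 5, natural));         // [9.25, 8.0, 7.5, 4.0, 3.5]
    }
}
```

With a real RDD the same idea applies per partition and the partial results are merged, so only a small number of elements ever reach the driver.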