I want to sort the Double values in an RDD, and I want the sort to ignore the Double.NaN values.
The Double.NaN values should appear either at the top or at the bottom of the sorted RDD.
I was not able to achieve this with sortBy; NaN ends up in the middle of the output in both directions:
scala> res13.sortBy(r => r, ascending = true)
res21: org.apache.spark.rdd.RDD[Double] = MapPartitionsRDD[10] at sortBy at <console>:26
scala> res21.collect.foreach(println)
0.656
0.99
0.998
1.0
NaN
5.6
7.0
scala> res13.sortBy(r => r, ascending = false)
res23: org.apache.spark.rdd.RDD[Double] = MapPartitionsRDD[15] at sortBy at <console>:26
scala> res23.collect.foreach(println)
7.0
5.6
NaN
1.0
0.998
0.99
0.656
My expected result is either
scala> res23.collect.foreach(println)
7.0
5.6
1.0
0.998
0.99
0.656
NaN
or
scala> res21.collect.foreach(println)
NaN
0.656
0.99
0.998
1.0
5.6
7.0
To add to @user3685285's answer:
Building on what I said in the comment, you can try this:
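One way this could look, as a minimal sketch rather than the exact code from that comment: sort by a composite key whose first component says whether the value is NaN, so NaN never gets compared against a real number. The helper name `nanFirstKey` and the sample values are my own assumptions for illustration, and a local Seq stands in for the RDD so the snippet runs without a Spark context.

```scala
// Hypothetical helper (not from the original post): maps each value to a
// composite key. NaN gets (0, 0.0), every other value gets (1, value), so in
// ascending key order all NaN entries come before every ordinary number.
def nanFirstKey(d: Double): (Int, Double) =
  if (d.isNaN) (0, 0.0) else (1, d)

// Sample data, assumed for illustration; a local Seq stands in for the RDD.
val values = Seq(0.998, Double.NaN, 7.0, 0.656, 5.6, 1.0, 0.99)

// Ascending order with NaN at the top.
// On an RDD the corresponding call would be: rdd.sortBy(nanFirstKey)
val nanAtTop = values.sortBy(nanFirstKey)

// Descending order with NaN at the bottom (the ascending result reversed).
// On an RDD the corresponding call would be: rdd.sortBy(nanFirstKey, ascending = false)
val nanAtBottom = values.sortBy(nanFirstKey).reverse
```

Since the NaN entries are replaced by a fixed 0.0 inside the key, the comparison only ever sees ordinary numbers, while the values in the sorted result are still the original ones. If you need NaN at the bottom of an ascending sort instead, swapping the 0 and 1 in the key should do it.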