I'm a newbie in Spark/Scala. This is what I'm doing to calculate the first quartile of a CSV file:
val column = sc.textFile("test.txt").map(_.split(",")(2)).map(_.toDouble) // third comma-separated field of each row, as Double
val total = column.count.toDouble
val upper = (total + 1) / 4                  // position of the first quartile
val upper2 = scala.math.ceil(upper).toInt    // round up to the next whole index
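(Just to check my understanding of the position formula, here it is with a made-up count rather than my real file:)

// e.g. 11 values: upper = (11 + 1) / 4 = 3.0, so upper2 = 3, i.e. the third smallest value
val exampleTotal = 11.0
val exampleUpper = (exampleTotal + 1) / 4                  // 3.0
val exampleUpper2 = scala.math.ceil(exampleUpper).toInt    // 3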
I'm not really sure how to sort the column other than turning it into a key/value pair. All I need is to take the last two values for the quartile after they are sorted, but I'm forced to create a key/value pair:
val quartiles = column.map((_, 1)).sortByKey(true).take(upper2) // Array[(Double, Int)] holding the smallest upper2 values
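(I did wonder whether the RDD could be sorted directly, without the dummy value, along these lines, but I wasn't sure it works that way, so I stuck with the key/value approach above:)

// Just a guess on my part -- if sortBy can sort the Doubles themselves:
val sortedDirectly = column.sortBy(identity).take(upper2)   // would be a plain Array[Double]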
var first_quartile = 0.0
if (upper % upper.toInt > 0) {
  // fractional position: take the value at the truncated index
  first_quartile = quartiles(upper.toInt - 1)._1
} else {
  // whole position: average the two neighbouring values
  first_quartile = (quartiles(upper2 - 1)._1 + quartiles(upper2 - 2)._1) / 2
}
This works, but it leaves me with an annoying key/value pair. How do I get back to just one column instead of two (i.e. the key/value pair)?
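Is mapping the key away afterwards, something like this, the intended way, or is there a cleaner approach that avoids the pair in the first place?

val firstQuartileValues = quartiles.map(_._1)   // back to plain Doubles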