How to sort a DataFrame in Spark without using Spark SQL

Posted 2019-08-08 21:57

Question:

I'm working with Spark and I've found that using ORDER BY in Spark SQL to sort a DataFrame is very slow. How can I sort a DataFrame without Spark SQL?

Answer 1:

I'm not sure I've fully understood what you need.

Anyway, if you want to sort a DF, you could go through its underlying RDD (df.rdd) and use sortBy (or sortByKey if you have an RDD of (K, V) pairs).

For example, assuming we have a DF (in this case coming from Spark SQL), we can sort it like this:

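val sqlResult = sqlContext.sql("select first_column, second_column from logs")
// A DataFrame has no sortBy, so sort its underlying RDD[Row] by the first column.
// getString(0) assumes first_column is a string column; use the getter matching your type.
val result = sqlResult.rdd.sortBy(row => row.getString(0))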

As said before, you can sort any DF this way; I just wanted to show that you can still "access" the data with Spark SQL and then sort it with Spark core (RDD) functionality.

Hope this helps!
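
To make the idea above concrete, here is a minimal self-contained sketch of both variants, sortBy and sortByKey, applied to the rows of a DataFrame. It is only an illustration under some assumptions that are not in the original post: it uses SparkSession (Spark 2.x or later) instead of sqlContext, builds a small in-memory DataFrame in place of the logs table, and assumes first_column is a string and second_column is an integer. The object and function names are hypothetical.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

object SortWithoutSql {
  // Convert each Row to a (first_column, second_column) tuple.
  // Assumption: column 0 is a String and column 1 is an Int.
  private def toPairs(df: DataFrame): RDD[(String, Int)] =
    df.rdd.map(row => (row.getString(0), row.getInt(1)))

  // Variant 1: RDD.sortBy with an explicit key function (the first column).
  def sortByFirstColumn(df: DataFrame): RDD[(String, Int)] =
    toPairs(df).sortBy(_._1)

  // Variant 2: sortByKey, available because the RDD holds (K, V) pairs
  // and K (String) has an implicit Ordering.
  def sortByKeyColumn(df: DataFrame): RDD[(String, Int)] =
    toPairs(df).sortByKey()

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("sort-without-sql-sketch")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the "logs" table from the answer (made-up sample rows).
    val logs = Seq(("c", 3), ("a", 1), ("b", 2))
      .toDF("first_column", "second_column")

    sortByFirstColumn(logs).collect().foreach(println)
    sortByKeyColumn(logs).collect().foreach(println)

    spark.stop()
  }
}
```

Both variants still shuffle the data across partitions to produce a global order, so whether this is actually faster than ORDER BY depends on your data and cluster; it is worth measuring before switching.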

FF