I'm working with Spark now, but I've found that using ORDER BY in Spark SQL to sort a DataFrame is very slow. How can I sort a DataFrame without Spark SQL?
Answer 1:
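I'm not sure I've fully understood what you need.
Anyway, if you want to sort a DataFrame without writing ORDER BY, you can convert it to an RDD with .rdd and use sortBy (or sortByKey for an RDD of (K, V) pairs). Both methods are defined on RDDs, not on DataFrames.
For example, assuming we have a DataFrame (here, the result of a Spark SQL query), we can sort it like this: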
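val sqlResult = sqlContext.sql("select first_column, second_column from logs")
// sortBy lives on RDDs, so go through the DataFrame's underlying RDD[Row]
// getString(0) assumes first_column is a string; use the getter matching its real type
val result = sqlResult.rdd.sortBy(row => row.getString(0)) // sort by first column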
As noted above, this works for any DataFrame; I just wanted to show another way to access the data with Spark SQL and then sort it with Spark core functionality.
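For reference, here is a self-contained sketch of two ways to sort without an ORDER BY clause in a SQL string. It assumes Spark 2.x or later (SparkSession instead of sqlContext) and uses a small made-up dataset in place of the logs table; the object name, app name, and sample rows are placeholders, not anything from the question. Note that DataFrame.sort is planned by the same engine as ORDER BY, so it is unlikely to be faster on its own; whether the RDD route helps will depend on the data and cluster.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.col

object SortWithoutSqlString {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("sort-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the "logs" table from the answer.
    val logs = Seq(("b", 2), ("a", 1), ("c", 3))
      .toDF("first_column", "second_column")

    // Option 1: DataFrame API, no SQL string involved.
    val sortedDf = logs.sort(col("first_column"))

    // Option 2: drop to the underlying RDD[Row] and sort with Spark core.
    // getString(0) assumes the first column holds strings.
    val sortedRdd = logs.rdd.sortBy((row: Row) => row.getString(0))

    sortedDf.show()
    sortedRdd.collect().foreach(println)

    spark.stop()
  }
}
```

Option 2 returns an RDD[Row] rather than a DataFrame, so it would need to be converted back (for example with spark.createDataFrame and the original schema) if later steps expect a DataFrame.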
Hope this helps!
FF