Question:

I tried df.orderBy("col1").show(10) but it sorted in ascending order. df.sort("col1").show(10) also sorts in ascending order. I looked on Stack Overflow, and the answers I found were all outdated or referred to RDDs. I'd like to use the native DataFrame API in Spark.
Answer 1:
You can also sort the column by importing the Spark SQL functions:
import org.apache.spark.sql.functions._
df.orderBy(asc("col1"))
Or
import org.apache.spark.sql.functions._
df.sort(desc("col1"))
Or by importing sqlContext.implicits._:
import sqlContext.implicits._
df.orderBy($"col1".desc)
Or
import sqlContext.implicits._
df.sort($"col1".desc)
Answer 2:
It's in org.apache.spark.sql.DataFrame for the sort method:

df.sort($"col1", $"col2".desc)

Note the $ and the .desc inside sort, applied to the column you want to sort the results by.
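To illustrate how the mixed ordering behaves, a small sketch with made-up data (assumes an existing SparkSession named spark):

// assumes an existing SparkSession named spark
import spark.implicits._

val df = Seq((1, "b"), (1, "a"), (2, "c")).toDF("col1", "col2")

// col1 ascending (the default), col2 descending within ties on col1
df.sort($"col1", $"col2".desc).show()
// +----+----+
// |col1|col2|
// +----+----+
// |   1|   b|
// |   1|   a|
// |   2|   c|
// +----+----+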
Answer 3:
The easiest way is to just add the parameter ascending=False:

df.orderBy("col1", ascending=False).show(10)

Note that the ascending parameter belongs to the Python (PySpark) API; the Scala API expects Column expressions such as desc("col1") instead.

Reference: http://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.DataFrame.orderBy
Answer 4:
df.sort($"ColumnName".desc).show()
Answer 5:
import org.apache.spark.sql.functions.{asc, desc}
df.orderBy(desc("columnname1"), desc("columnname2"), asc("columnname3"))
Answer 6:
In the case of Java:

If we use DataFrames, while applying joins (here an inner join), we can sort (in ascending order) after selecting distinct elements in each DF:

Dataset<Row> d1 = e_data.distinct().join(s_data.distinct(), "e_id").orderBy("salary");

where e_id is the column on which the join is applied, and the result is sorted by salary in ascending order.
Also, we can use Spark SQL as:
SQLContext sqlCtx = spark.sqlContext();
sqlCtx.sql("select * from global_temp.salary order by salary desc").show();
where:
- spark is the SparkSession
- salary is a global temporary view
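Answer 6 assumes a global temporary view named salary already exists; a rough Scala sketch of that registration step (the DataFrame and its contents are made up) might look like:

// assumes an existing SparkSession named spark and import spark.implicits._
val salaries = Seq(("e1", 1000), ("e2", 3000)).toDF("e_id", "salary")

// Global temp views live in the reserved global_temp database
salaries.createGlobalTempView("salary")

spark.sql("select * from global_temp.salary order by salary desc").show()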