Row aggregations in Scala

Posted 2020-05-06 14:13

Question:

I am looking for a way to add a new column to a DataFrame in Scala that calculates the min/max of the values in col1, col2, ..., col10 for each row.

I know I can do it with a UDF but maybe there is an easier way.

Thanks!
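(For reference, the UDF route mentioned in the question might look roughly like the sketch below: pack the columns into an array and reduce them per row. The session setup and column names are illustrative, not from the original post.)

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.master("local[*]").appName("rowAggUdf").getOrCreate()
import spark.implicits._

// Hypothetical UDF: receives the row's values as a Seq and takes their max
val rowMaxUdf = udf((xs: Seq[Int]) => xs.max)

val df = Seq((1, 3, 0, 9)).toDF("col1", "col2", "col3", "col4")
val cols = Seq("col1", "col2", "col3", "col4")

// array(...) packs the selected columns into one array column per row,
// which the UDF then reduces to a single value
df.withColumn("max", rowMaxUdf(array(cols.map(col): _*))).show
```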

Answer 1:

Porting this Python answer by user6910411:

import org.apache.spark.sql.functions._
import spark.implicits._  // needed for toDF and $ when not in spark-shell

val df = Seq(
  (1, 3, 0, 9, "a", "b", "c")
).toDF("col1", "col2", "col3", "col4", "col5", "col6", "Col7")

val cols = Seq("col1", "col2", "col3", "col4")

val rowMax = greatest(cols.map(col): _*).alias("max")

val rowMin = least(cols.map(col): _*).alias("min")

df.select($"*", rowMin, rowMax).show

// With integer input columns, greatest/least return integers:
// +----+----+----+----+----+----+----+---+---+
// |col1|col2|col3|col4|col5|col6|Col7|min|max|
// +----+----+----+----+----+----+----+---+---+
// |   1|   3|   0|   9|   a|   b|   c|  0|  9|
// +----+----+----+----+----+----+----+---+---+
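The same column-list pattern extends to other row aggregations without a UDF, by reducing over `Column` expressions. A minimal self-contained sketch (the sum/avg column names are my own, not from the original answer):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.master("local[*]").appName("rowAgg").getOrCreate()
import spark.implicits._

val df = Seq((1, 3, 0, 9)).toDF("col1", "col2", "col3", "col4")
val cols = Seq("col1", "col2", "col3", "col4")

// Row-wise sum: fold the Column expressions with +
val rowSum = cols.map(col).reduce(_ + _).alias("sum")

// Row-wise average: divide the same sum by the number of columns
val rowAvg = (cols.map(col).reduce(_ + _) / cols.size).alias("avg")

df.select($"*", rowSum, rowAvg).show
```

Because the whole expression is built from native `Column` operations, Catalyst can optimize it, which is generally preferable to a UDF.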