How to use a constant value in a UDF of Spark SQL (DataFrame)

Posted 2019-02-04 12:59

Question:

I have a DataFrame that includes a timestamp column. To aggregate by time (minute, hour, or day), I have tried:

val toSegment = udf((timestamp: String) => {
  val asLong = timestamp.toLong
  asLong - asLong % 3600000 // period = 1 hour
})

val df: DataFrame = ??? // the dataframe
df.groupBy(toSegment($"timestamp")).count()

This works fine.

My question is how to generalize the UDF toSegment so that the period is a parameter:

val toSegmentGeneralized = udf((timestamp: String, period: Int) => {
  val asLong = timestamp.toLong
  asLong - asLong % period
})

I have tried the following, but it doesn't work:

df.groupBy(toSegmentGeneralized($"timestamp", $"3600000")).count()

It seems that Spark looks for a column named 3600000.

A possible solution is to use a constant column, but I couldn't find how to create one.

Answer 1:

You can use org.apache.spark.sql.functions.lit() to create a constant column:

import org.apache.spark.sql.functions._

df.groupBy(toSegmentGeneralized($"timestamp", lit(3600000))).count()
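
For completeness, here is a minimal, self-contained sketch of the whole flow. The SparkSession setup and the sample timestamps are illustrative assumptions, not part of the original question:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("constant-in-udf").master("local[*]").getOrCreate()
import spark.implicits._

// Sample data: epoch-millisecond timestamps stored as strings (illustrative values)
val df = Seq(
  "1549263600123", // 2019-02-04 07:00:00.123 UTC
  "1549263659999", // same hour bucket as above
  "1549267200000"  // next hour bucket
).toDF("timestamp")

// Generalized UDF: truncate a timestamp to the start of its period
val toSegmentGeneralized = udf((timestamp: String, period: Int) => {
  val asLong = timestamp.toLong
  asLong - asLong % period
})

// lit(3600000) wraps the constant in a Column, so every row sees the same value
df.groupBy(toSegmentGeneralized($"timestamp", lit(3600000))).count().show()

The key distinction is that lit(3600000) wraps the literal in a Column, whereas $"3600000" is parsed as a reference to a column named 3600000. For non-primitive constants such as Seq or Map values, Spark 2.2+ also provides typedLit().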