How do I select the item with the highest count in a DataFrame?

Published 2019-08-21 17:58

Question:

For example, given the dataframe below, I want to do something like val top_src_ip = "58.242.83.11", but I don't want to hard-code this value. I want it to be a variable derived from the dataframe. What is the command to do that?

+--------------+------------+
|        src_ip|src_ip_count|
+--------------+------------+
|  58.242.83.11|          52|
|58.218.198.160|          33|
|58.218.198.175|          22|
|221.194.47.221|           6|
+--------------+------------+

Answer 1:

As in my answer here, you can use an argmax-style aggregation (max over a struct) to get the relevant value:

import org.apache.spark.sql.functions._
val newDF = df.agg(max(struct('src_ip_count, 'src_ip)) as 'tmp).select($"tmp.src_ip")

The above produces the result as a dataframe. To use it as a variable, simply take the head (there will be exactly one row) and extract the relevant column (I assume src_ip is a string):

val top_src_ip = newDF.head.getAs[String](0)
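Putting it together, here is a minimal end-to-end sketch, assuming a local SparkSession and rebuilding the question's sample data in memory (the object and app names are illustrative, not from the original):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object TopSrcIp {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("top-src-ip")
      .getOrCreate()
    import spark.implicits._

    // Sample data from the question
    val df = Seq(
      ("58.242.83.11", 52),
      ("58.218.198.160", 33),
      ("58.218.198.175", 22),
      ("221.194.47.221", 6)
    ).toDF("src_ip", "src_ip_count")

    // argmax via max over a struct: structs compare field-by-field starting
    // with the first field, so max picks the row with the largest src_ip_count
    val newDF = df
      .agg(max(struct($"src_ip_count", $"src_ip")) as "tmp")
      .select($"tmp.src_ip")

    val top_src_ip = newDF.head.getAs[String](0)
    println(top_src_ip) // 58.242.83.11

    // Equivalent alternative: sort descending and take the first row
    // val topAlt = df.orderBy($"src_ip_count".desc).head.getAs[String]("src_ip")

    spark.stop()
  }
}
```

The struct trick avoids a full sort: the aggregation runs in a single pass, whereas orderBy would shuffle the whole dataframe just to read one row.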