Get the min and max of a specific column in Scala Spark

Posted 2020-05-19 03:55

I would like to get the min and max of a specific column in my DataFrame, but I don't have the column's header, only its number. How should I do this in Scala?

Maybe something like this:

val q = nextInt(ncol) //we pick a random value for a column number
col = df(q)
val minimum = col.min()

Sorry if this sounds like a silly question but I couldn't find any info on SO about this question :/

7 answers
不美不萌又怎样
#2 · 2020-05-19 04:32

You can use the column number to look up the column name first (by indexing df.columns), then aggregate using that column name:

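import org.apache.spark.sql.functions.{max, min} // already in scope in spark-shell
import spark.implicits._ // for toDF; assumes a SparkSession named spark
val df = Seq((2.0, 2.1), (1.2, 1.4)).toDF("A", "B")
// df: org.apache.spark.sql.DataFrame = [A: double, B: double]

// df.columns is zero-based, so index 1 selects the second column, "B"
df.agg(max(df(df.columns(1))), min(df(df.columns(1)))).show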
+------+------+
|max(B)|min(B)|
+------+------+
|   2.1|   1.4|
+------+------+
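
If you need the values themselves rather than a printed table, the same idea can be wrapped in a small helper. This is only a sketch: the object name MinMaxByIndex and the function minMaxAt are made up for illustration, and it assumes the chosen column is of type double and the DataFrame is non-empty (otherwise the aggregates are null and getDouble will fail).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{max, min}

object MinMaxByIndex {
  // Hypothetical helper: (min, max) of the column at zero-based position idx.
  // Assumes a DoubleType column and at least one row.
  def minMaxAt(df: DataFrame, idx: Int): (Double, Double) = {
    require(idx >= 0 && idx < df.columns.length, s"column index $idx out of range")
    val name = df.columns(idx)                          // position -> column name
    val row  = df.agg(min(df(name)), max(df(name))).head() // one row, two values
    (row.getDouble(0), row.getDouble(1))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("min-max-by-index")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((2.0, 2.1), (1.2, 1.4)).toDF("A", "B")

    // Pick a random column number, as in the question
    val q = scala.util.Random.nextInt(df.columns.length)
    val (lo, hi) = minMaxAt(df, q)
    println(s"column ${df.columns(q)}: min=$lo, max=$hi")

    spark.stop()
  }
}
```

Both aggregates are computed in a single agg call, so Spark scans the data only once. For columns of another numeric type, swap getDouble for the matching getter (for example getInt or getLong).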