I would like to access the min and max of a specific column from my dataframe, but I don't have the column's header, just its number. How should I do that using Scala?
Maybe something like this:

val q = nextInt(ncol) // we pick a random value for a column number
val col = df(q)
val minimum = col.min()
Sorry if this sounds like a silly question, but I couldn't find any info on SO about it :/
How about getting the column name from the metadata:
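A minimal sketch of that idea, assuming a DataFrame named df and the random index q from the question: the schema metadata (df.columns) maps the position back to a header, which can then be passed to an aggregation.

```scala
import org.apache.spark.sql.functions.{min, max}

// q is the randomly chosen column index from the question
val colName = df.columns(q) // schema metadata gives the header by position

// aggregate on the recovered name; head() returns the single result Row
val result = df.agg(min(colName), max(colName)).head()
val minimum = result.get(0)
val maximum = result.get(1)
```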
Here is a direct way to get the min and max from a dataframe with column names:
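For example, assuming a DataFrame df with a column named "age" (the name is hypothetical, substitute your own):

```scala
import org.apache.spark.sql.functions.{min, max}

// computes both aggregates in a single pass and prints a one-row result
df.agg(min("age"), max("age")).show()
```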
If you want to get the min and max values as separate variables, you can convert the result of agg() above into a Row and use Row.getInt(index) to get the column values of the Row. In Java, we have to explicitly import org.apache.spark.sql.functions, which has the implementations for min and max. Hope this will help.
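In Scala, that extraction step might look like the following sketch (again assuming a DataFrame df with a hypothetical integer column "age"):

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{min, max}

// head() collects the single aggregate Row from the driver
val row: Row = df.agg(min("age"), max("age")).head()

// pull each value out by its position in the Row
val minAge: Int = row.getInt(0)
val maxAge: Int = row.getInt(1)
```

Note that getInt only works if the column's type is actually an integer; for other types use the matching accessor (getDouble, getLong, and so on).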
Using the Spark functions min and max, you can find the min or max value for any column in a DataFrame.
You can use pattern matching while assigning the variable:
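A sketch of that approach, assuming a DataFrame df and a column reference q: the aggregate Row is destructured into two variables in one assignment.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{min, max}

// pattern-match the single aggregate Row to bind both values at once
val Row(minValue: Double, maxValue: Double) = df.agg(min(q), max(q)).head
```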
Where q is either a Column or the name of a column (String), assuming your data type is Double.