This is my partitionBy condition which i need to change based on the column value from the data frame .
val windowSpec = Window.partitionBy("col1", "clo2","clo3").orderBy($"Col5".desc)
Now if the value of the one of the column (col6) in data frame is I then above condition .
But when the value of the column(col6) changes O then below condition
val windowSpec = Window.partitionBy("col1","clo3").orderBy($"Col5".desc)
How can i implement it in the spark data frame .
So it is like for each record it will check whether col6 is I or O based on that partitionBy condition will be applied
Given the requirement to select the final window specification based on the values of col6
column, I'd do filter
first followed by the final window aggregation.
scala> dataset.show
+----+----+----+----+----+
|col1|col2|col3|col5|col6|
+----+----+----+----+----+
| 0| 0| 0| 0| I| // <-- triggers 3 columns to use
| 0| 0| 0| 0| O| // <-- the aggregation should use just 2 columns
+----+----+----+----+----+
With the above dataset, I'd filter
out to see if there's at least one I
in col6
and apply the window specification.
val windowSpecForIs = Window.partitionBy("col1", "clo2","clo3").orderBy($"Col5".desc)
val windowSpecForOs = Window.partitionBy("col1","clo3").orderBy($"Col5".desc)
val noIs = dataset.filter($"col6" === "I").take(1).isEmpty
val windowSpec = if (noIs) windowSpecForOs else windowSpecForIs