def getSequence(row: Row): Seq[String] = {
  // some code
}
I want to iterate over the DataFrame row by row and, for each row, set to 1 every column named in the sequence that getSequence returns for that row.
Input
+---+----+-----+
|sno|dept|color|
+---+----+-----+
|  1|   0|    0|
|  2|   0|    0|
|  3|   0|    0|
+---+----+-----+
getSequence returns Seq("dept") for row 1, Seq("color") for row 2, and Seq("dept","color") for row 3.
The output should look like this:
+---+----+-----+
|sno|dept|color|
+---+----+-----+
|  1|   1|    0|
|  2|   0|    1|
|  3|   1|    1|
+---+----+-----+
def lit(literal: Any): org.apache.spark.sql.Column
def monotonically_increasing_id(): org.apache.spark.sql.Column
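Use the lit function to update column values.
The code below updates specific columns. It adds a temporary id column with monotonically_increasing_id and picks rows by comparing against that id. Note that these ids are guaranteed to be unique and increasing but not consecutive, so comparing against 0 and 1 only works as shown when all rows sit in a single partition.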
scala> val df = Seq((1,0,0),(2,0,0),(3,0,0)).toDF("sno","dept","color").withColumn("id",monotonically_increasing_id)
df: org.apache.spark.sql.DataFrame = [sno: int, dept: int ... 2 more fields]
scala> df.withColumn("dept",when($"id" =!= 1,lit(1)).otherwise(lit(0))).withColumn("color",when($"id" =!= 0,lit(1)).otherwise(lit(0))).drop("id").show(false)
+---+----+-----+
|sno|dept|color|
+---+----+-----+
|1  |1   |0    |
|2  |0   |1    |
|3  |1   |1    |
+---+----+-----+
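The conditions above are hard-coded per row. If the columns to update really have to come from getSequence, one possible approach is to wrap it in a UDF, store its result in a temporary array column, and then set each target column to 1 where its name appears in that array. The following is only a sketch under assumptions: the body of getSequence is a hypothetical stand-in keyed on sno (the real one was not shown), and the names cols_to_set, seqUdf and targets are made up for illustration.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{array_contains, col, lit, struct, udf, when}

object UpdateFromSequence {

  // Hypothetical stand-in for the real getSequence, which is not shown in the question.
  def getSequence(row: Row): Seq[String] = row.getAs[Int]("sno") match {
    case 1 => Seq("dept")
    case 2 => Seq("color")
    case 3 => Seq("dept", "color")
    case _ => Seq.empty
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("update-from-sequence")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, 0, 0), (2, 0, 0), (3, 0, 0)).toDF("sno", "dept", "color")

    // Pass the whole row to the UDF as a struct; the UDF receives it as a Row.
    val seqUdf = udf((row: Row) => getSequence(row))
    val withSeq = df.withColumn("cols_to_set", seqUdf(struct(df.columns.map(col): _*)))

    // Columns that may be updated (assumed here to be dept and color).
    val targets = Seq("dept", "color")

    // For each target column, write 1 where its name is in the row's sequence,
    // otherwise keep the existing value.
    val result = targets.foldLeft(withSeq) { (acc, name) =>
      acc.withColumn(
        name,
        when(array_contains(col("cols_to_set"), name), lit(1)).otherwise(col(name)))
    }.drop("cols_to_set")

    result.show(false)
    spark.stop()
  }
}
```

With the stand-in getSequence above, this should produce the same dept and color values as the expected output in the question. It does not depend on generated ids or on how the rows are partitioned, since each row carries its own list of columns to update.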