How to update column value for a particular row ra

Published 2020-05-09 21:54

Question:

import org.apache.spark.sql.Row

def getSequence(row: Row): Seq[String] = {
  ??? // some code (body not shown in the question)
}

I want to iterate over the DataFrame row by row and, for each row, set to 1 every column named in the sequence that getSequence returns for that row.

Input

+---+----+-----+
|sno|dept|color|
+---+----+-----+
|  1|  0 |  0  |
|  2|  0 |  0  |
|  3|  0 |  0  |
+---+----+-----+

getSequence returns Seq("dept") for row 1, Seq("color") for row 2, and Seq("dept","color") for row 3.
The output should be:
+---+----+-----+
|sno|dept|color|
+---+----+-----+
|  1|  1 |  0  |
|  2|  0 |  1  |
|  3|  1 |  1  |
+---+----+-----+
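A minimal sketch of the general row-wise approach the question describes, assuming Spark 2.x/3.x and that every column getSequence can name holds integers. The getSequence body below is a hypothetical stand-in that just reproduces the example mapping; UpdateFlags and setFlags are illustrative names. The sketch rebuilds each Row through df.rdd and recreates the DataFrame with the original schema, which is a different technique from the answer below.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

object UpdateFlags {
  // Hypothetical stand-in for the question's getSequence (its real body is not shown).
  // It reproduces the example: sno 1 -> dept, sno 2 -> color, sno 3 -> both.
  def getSequence(row: Row): Seq[String] = row.getAs[Int]("sno") match {
    case 1 => Seq("dept")
    case 2 => Seq("color")
    case 3 => Seq("dept", "color")
    case _ => Seq.empty
  }

  // Rebuild each row, writing 1 into every column named by getSeq(row)
  // and copying all other values unchanged. Assumes the named columns are integers.
  def setFlags(df: DataFrame, getSeq: Row => Seq[String]): DataFrame = {
    val fieldNames = df.schema.fieldNames
    val updated = df.rdd.map { row =>
      val targets = getSeq(row).toSet
      val values = fieldNames.indices.map { i =>
        if (targets.contains(fieldNames(i))) 1 else row.get(i)
      }
      Row.fromSeq(values)
    }
    // Reuse the original schema so column names and types stay the same.
    df.sparkSession.createDataFrame(updated, df.schema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("update-flags").getOrCreate()
    import spark.implicits._

    val df = Seq((1, 0, 0), (2, 0, 0), (3, 0, 0)).toDF("sno", "dept", "color")
    setFlags(df, getSequence).show()
    spark.stop()
  }
}
```

Going through df.rdd bypasses Catalyst for this step, so when the rule can be written as column expressions instead, that is usually the cheaper option.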

Answer 1:

def lit(literal: Any): org.apache.spark.sql.Column

def monotonically_increasing_id(): org.apache.spark.sql.Column

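Use the lit function together with when/otherwise to set column values conditionally.

The code below updates specific columns. The conditions are hard-coded for this example rather than derived from getSequence. Note also that monotonically_increasing_id only guarantees unique, increasing IDs, not consecutive ones, so the comparisons against 0 and 1 assume the data sits in a single partition and the IDs come out as 0, 1, 2.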

scala> val df = Seq((1,0,0),(2,0,0),(3,0,0)).toDF("sno","dept","color").withColumn("id",monotonically_increasing_id)
df: org.apache.spark.sql.DataFrame = [sno: int, dept: int ... 2 more fields]

scala> df.withColumn("dept",when($"id" =!= 1,lit(1)).otherwise(lit(0))).withColumn("color",when($"id" =!= 0,lit(1)).otherwise(lit(0))).drop("id").show(false)
+---+----+-----+
|sno|dept|color|
+---+----+-----+
|1  |1   |0    |
|2  |0   |1    |
|3  |1   |1    |
+---+----+-----+
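The same when/lit technique can key on an existing column such as sno instead of a generated id, which avoids depending on how the data is partitioned. A minimal sketch, assuming the set of sno values for each column is known up front; the targets map, UpdateBySno and setFlags are hypothetical names that simply encode the question's example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}

object UpdateBySno {
  // Hypothetical mapping of column name -> sno values whose cell should become 1.
  // It encodes the question's example: dept for sno 1 and 3, color for sno 2 and 3.
  val targets: Map[String, Seq[Int]] = Map(
    "dept"  -> Seq(1, 3),
    "color" -> Seq(2, 3)
  )

  // For each target column, write lit(1) where sno is listed; otherwise keep the current value.
  def setFlags(df: DataFrame, targets: Map[String, Seq[Int]]): DataFrame =
    targets.foldLeft(df) { case (acc, (colName, snos)) =>
      acc.withColumn(colName, when(col("sno").isin(snos: _*), lit(1)).otherwise(col(colName)))
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("update-by-sno").getOrCreate()
    import spark.implicits._

    val df = Seq((1, 0, 0), (2, 0, 0), (3, 0, 0)).toDF("sno", "dept", "color")
    setFlags(df, targets).show()
    spark.stop()
  }
}
```

Because the condition reads sno rather than a generated id, the result does not change with the number of partitions.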