How to append an element to an array column of a S

Published 2020-07-03 07:58

Question:

Suppose I have the following DataFrame:

scala> val df1 = Seq("a", "b").toDF("id").withColumn("nums", array(lit(1)))
df1: org.apache.spark.sql.DataFrame = [id: string, nums: array<int>]

scala> df1.show()
+---+----+
| id|nums|
+---+----+
|  a| [1]|
|  b| [1]|
+---+----+

And I want to add elements to the array in the nums column, so that I get something like the following:

+---+-------+
| id|nums   |
+---+-------+
|  a| [1,5] |
|  b| [1,5] |
+---+-------+

Is there a way to do this using the .withColumn() method of the DataFrame? E.g.

val df2 = df1.withColumn("nums", append(col("nums"), lit(5))) 

I've looked through the API documentation for Spark, but can't find anything that would allow me to do this. I could probably use split and concat_ws to hack something together, but I would prefer a more elegant solution if one is possible. Thanks.

Answer 1:

import org.apache.spark.sql.functions.{lit, array, array_union}

val df1 = Seq("a", "b").toDF("id").withColumn("nums", array(lit(1)))
val df2 = df1.withColumn("nums", array_union($"nums", lit(Array(5))))
df2.show

+---+------+
| id|  nums|
+---+------+
|  a|[1, 5]|
|  b|[1, 5]|
+---+------+

array_union() was added in Spark 2.4.0, released on 11/2/2018, seven months after you asked the question :) See https://spark.apache.org/news/index.html
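
One caveat worth checking before relying on array_union: it computes a set union, so duplicate values are removed. If nums may already contain the value being added, the result is not a true append. The sketch below compares it with concat, which accepts array columns from Spark 2.4 and keeps duplicates. The local SparkSession, the app name, and the names unioned and appended are assumptions for illustration; in spark-shell the session already exists.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, array_union, col, concat, lit}

// Assumption: a local session for illustration; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[1]").appName("append-vs-union").getOrCreate()
import spark.implicits._

// nums already contains the value we are about to add.
val df = Seq("a", "b").toDF("id").withColumn("nums", array(lit(1), lit(5)))

// array_union treats both inputs as sets, so the second 5 is dropped.
val unioned = df.withColumn("nums", array_union(col("nums"), array(lit(5))))

// concat keeps every element of both arrays, so 5 appears twice.
val appended = df.withColumn("nums", concat(col("nums"), array(lit(5))))

unioned.show()
appended.show()
```

On Spark 3.4 and later, array_append(col("nums"), lit(5)) should express the same append more directly.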



Answer 2:

You can do it with a udf:

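import org.apache.spark.sql.functions.{col, udf}

// Append 5 to the end of each array; a val builds the udf once instead of on every reference
val addValue = udf((array: Seq[Int]) => array :+ 5)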

df1.withColumn("nums", addValue(col("nums")))
  .show(false)

You should get:

+---+------+
|id |nums  |
+---+------+
|a  |[1, 5]|
|b  |[1, 5]|
+---+------+
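
The udf above hard-codes the value 5. As a rough sketch of a more reusable variant, the value can be passed in as a second column argument. The name appendInt and the local SparkSession are hypothetical and chosen only for this illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, col, lit, udf}

// Assumption: a local session for illustration; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[1]").appName("append-udf").getOrCreate()
import spark.implicits._

val df1 = Seq("a", "b").toDF("id").withColumn("nums", array(lit(1)))

// appendInt is a hypothetical helper name: it returns xs with x added at the end.
// It does not guard against a null array in the nums column.
val appendInt = udf((xs: Seq[Int], x: Int) => xs :+ x)

// The value to append is supplied as a literal column.
val df2 = df1.withColumn("nums", appendInt(col("nums"), lit(5)))
df2.show(false)
```

Because the second argument is an ordinary column, another column of the DataFrame could be passed in place of lit(5).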

Update: an alternative is to use the typed Dataset API and map over the rows:

df1.map(row => add(row.getAs[String]("id"), row.getAs[Seq[Int]]("nums")++Seq(5)))
  .show(false)

where add is a case class, defined before the map call:

case class add(id: String, nums: Seq[Int])
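
A variation on the same idea, sketched under the assumption that the column names and types line up with the case class fields: convert the DataFrame to a typed Dataset first with as[...], then use copy inside map so the row fields do not have to be read by name. IdNums and the local SparkSession are hypothetical names for this illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, lit}

// IdNums is a hypothetical name; its fields must match the column names and types.
case class IdNums(id: String, nums: Seq[Int])

// Assumption: a local session for illustration; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[1]").appName("append-dataset").getOrCreate()
import spark.implicits._

val df1 = Seq("a", "b").toDF("id").withColumn("nums", array(lit(1)))

// as[IdNums] gives a typed view of the rows; copy replaces only the nums field.
val ds2 = df1.as[IdNums].map(r => r.copy(nums = r.nums :+ 5))
ds2.show(false)
```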

I hope the answer is helpful