Suppose I have the following DataFrame:
scala> val df1 = Seq("a", "b").toDF("id").withColumn("nums", array(lit(1)))
df1: org.apache.spark.sql.DataFrame = [id: string, nums: array<int>]
scala> df1.show()
+---+----+
| id|nums|
+---+----+
| a| [1]|
| b| [1]|
+---+----+
And I want to add elements to the array in the nums
column, so that I get something like the following:
+---+------+
| id|  nums|
+---+------+
|  a|[1, 5]|
|  b|[1, 5]|
+---+------+
Is there a way to do this using the .withColumn()
method of the DataFrame? E.g.
val df2 = df1.withColumn("nums", append(col("nums"), lit(5)))
I've looked through the API documentation for Spark, but can't find anything that would allow me to do this. I could probably use split and concat_ws to hack something together (roughly as sketched below), but I would prefer a more elegant solution if one is possible. Thanks.
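For reference, the kind of hack I had in mind, as a rough sketch: round-trip the array through a comma-separated string and cast back (split yields array&lt;string&gt;, hence the final cast):

import org.apache.spark.sql.functions.{col, concat_ws, lit, split}

// array<int> -> "1,5" -> array<string> -> array<int>
val hacked = df1.withColumn(
  "nums",
  split(concat_ws(",", col("nums").cast("array<string>"), lit("5")), ",")
    .cast("array<int>")
)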
The array_union() function was added in the Spark 2.4.0 release on November 2, 2018, seven months after you asked the question :) See https://spark.apache.org/news/index.html
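For example, a minimal sketch against the df1 above (array_union takes two array columns, so the scalar is wrapped in array(); note that it also removes duplicates, so appending an element that is already present is a no-op):

import org.apache.spark.sql.functions.{array, array_union, col, lit}

val df2 = df1.withColumn("nums", array_union(col("nums"), array(lit(5))))
df2.show()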
You can do it using a udf function as sketched below.
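A minimal sketch (assuming Spark 2.x; the array column is passed to the udf as a Seq, and the append name mirrors the one hypothesized in the question):

import org.apache.spark.sql.functions.{col, lit, udf}

// append one element to the array column
val append = udf((nums: Seq[Int], x: Int) => nums :+ x)

val df2 = df1.withColumn("nums", append(col("nums"), lit(5)))
df2.show()

and you should get

+---+------+
| id|  nums|
+---+------+
|  a|[1, 5]|
|  b|[1, 5]|
+---+------+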
Updated: An alternative way is to go with the Dataset API and use map as sketched below, where add is a case class.
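A sketch of that approach (assuming a SparkSession named spark is in scope, as in spark-shell; the lowercase add name follows the text above):

import spark.implicits._

// case class defining the output schema of the mapped Dataset
case class add(id: String, nums: Seq[Int])

val ds2 = df1.map(row =>
  add(row.getAs[String]("id"), row.getAs[Seq[Int]]("nums") :+ 5)
)
ds2.show()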
I hope the answer is helpful.