I am using the Apache Spark 2.0 DataFrame/Dataset API. I want to add a new column to my DataFrame from a List of values. My list has the same number of values as the given DataFrame has rows.
val list = List(4,5,10,7,2)
val df = List("a","b","c","d","e").toDF("row1")
I would like to do something like:
val appendedDF = df.withColumn("row2",somefunc(list))
appendedDF.show()
// +----+----+
// |row1|row2|
// +----+----+
// |   a|   4|
// |   b|   5|
// |   c|  10|
// |   d|   7|
// |   e|   2|
// +----+----+
I would be grateful for any ideas. In reality my DataFrame contains more columns.
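One possible way to get the result above, offered as a sketch under stated assumptions rather than as anyone's confirmed answer: number the rows with RDD.zipWithIndex, look up the list element at each index, and rebuild the DataFrame with the original schema plus one IntegerType column. The names AddColumnFromList and appendColumn are hypothetical. The sketch assumes the DataFrame's current row order is the order you want to pair with the list (Spark does not preserve order across shuffles), and it ships the list to the executors inside the closure, which is only reasonable for a modest list.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

object AddColumnFromList {
  // Hypothetical helper: appends values(i) to the i-th row of df as a new,
  // non-nullable IntegerType column named colName.
  def appendColumn(spark: SparkSession, df: DataFrame, values: Seq[Int], colName: String): DataFrame = {
    require(values.size.toLong == df.count(), "list size must match the row count")
    // Vector gives fast indexed lookup and is serializable into the closure.
    val lookup = values.toVector
    // zipWithIndex numbers rows 0..n-1 following partition order
    // (it runs an extra job when there is more than one partition).
    val withValues = df.rdd.zipWithIndex.map { case (row, idx) =>
      Row.fromSeq(row.toSeq :+ lookup(idx.toInt))
    }
    // Keep every existing column and add the new one at the end.
    val schema = StructType(df.schema.fields :+ StructField(colName, IntegerType, nullable = false))
    spark.createDataFrame(withValues, schema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("append-list").getOrCreate()
    import spark.implicits._

    val list = List(4, 5, 10, 7, 2)
    val df = List("a", "b", "c", "d", "e").toDF("row1")
    appendColumn(spark, df, list, "row2").show()
    spark.stop()
  }
}
```

Because the helper works on whole Rows, it does not care how many other columns the DataFrame has.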
Adding for completeness: the fact that the input list (which exists in driver memory) has the same size as the DataFrame suggests that this is a small DataFrame to begin with, so you might consider collect()-ing it, zipping it with the list, and converting the result back into a DataFrame if needed. That won't be faster, but if the data is really small the overhead might be negligible and the code is (arguably) clearer.
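A minimal sketch of that collect-and-zip idea, generalized so it keeps however many columns the DataFrame already has. CollectAndZip and appendByCollect are hypothetical names. The sketch assumes the appended values are Ints and that the order returned by collect() is the intended pairing. It rebuilds the DataFrame from a java.util.List[Row] plus an explicit schema via SparkSession.createDataFrame.

```scala
import scala.collection.JavaConverters._
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

object CollectAndZip {
  // Hypothetical helper: only sensible when df comfortably fits in driver memory.
  def appendByCollect(spark: SparkSession, df: DataFrame, values: Seq[Int], colName: String): DataFrame = {
    val rows = df.collect() // Array[Row] on the driver
    require(rows.length == values.length, "list size must match the row count")
    // Pair each collected row with the list element at the same position.
    val newRows = rows.zip(values).map { case (row, v) => Row.fromSeq(row.toSeq :+ v) }
    val schema = StructType(df.schema.fields :+ StructField(colName, IntegerType, nullable = false))
    spark.createDataFrame(newRows.toList.asJava, schema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("collect-zip").getOrCreate()
    import spark.implicits._

    val list = List(4, 5, 10, 7, 2)
    val df = List("a", "b", "c", "d", "e").toDF("row1")
    appendByCollect(spark, df, list, "row2").show()
    spark.stop()
  }
}
```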
You could do it like this: