I have a Spark DataFrame where I have a column with Vector values. The vector values are all n-dimensional, aka with the same length. I also have a list of column names Array("f1", "f2", "f3", ..., "fn")
, each corresponds to one element in the vector.
some_columns... | Features
... | [0,1,0,..., 0]
to
some_columns... | f1 | f2 | f3 | ... | fn
... | 0 | 1 | 0 | ... | 0
What is the best way to achieve this? I thought of one way which is to create a new DataFrame with createDataFrame(Row(Features), featureNameList)
and then join with the old one, but it requires spark context to use createDataFrame. I only want to transform the existing data frame. I also know .withColumn("fi", value)
but what do I do if n
is large?
I'm new to Scala and Spark and couldn't find any good examples for this. I think this can be a common task. My particular case is that I used the CountVectorizer
and wanted to recover each column individually for better readability instead of only having the vector result.
One way could be to convert the
vector
column to anarray<double>
and then usinggetItem
to extract individual elements.