I have a DataFrame column of type Seq[Seq[String]].
I built a UDF to transform that column into a column of Seq[String]; basically, a UDF wrapping Scala's flatten function.
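import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, udf}

def combineSentences(inCol: String, outCol: String): DataFrame => DataFrame = {
  // Flatten one level of nesting; treat a null column value as an empty sequence.
  def flatfunc(seqOfSeq: Seq[Seq[String]]): Seq[String] = seqOfSeq match {
    case null => Seq.empty[String]
    case _    => seqOfSeq.flatten
  }
  df: DataFrame => df.withColumn(outCol, udf(flatfunc _).apply(col(inCol)))
}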
My use case is strings, but obviously, this could be generic. You can use this function in a chain of DataFrame transforms like:
df.transform(combineSentences(inCol, outCol))
Is there a Spark built-in function that does the same thing? I have not been able to find one.
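There is a similar built-in function, flatten, available since Spark 2.4. According to the official documentation, it creates a single array from an array of arrays; if the nesting is deeper than two levels, only one level is removed.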
To get the exact equivalent of your UDF, you'll have to coalesce the result to replace NULL with an empty array, because flatten returns NULL for a NULL input.
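As a minimal sketch of the built-in approach, assuming Spark 2.4 or later: the transform below has the same shape as combineSentences from the question, but uses flatten plus coalesce instead of a UDF. The name combineSentencesBuiltin is hypothetical, and typedLit(Seq.empty[String]) is one way to supply the empty-array fallback; adjust it to your column's element type.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{coalesce, col, flatten, typedLit}

// Hypothetical name; same signature as combineSentences in the question.
def combineSentencesBuiltin(inCol: String, outCol: String): DataFrame => DataFrame = {
  (df: DataFrame) =>
    df.withColumn(
      outCol,
      // flatten returns NULL for a NULL input, so coalesce substitutes
      // an empty array<string> to match the UDF's `case null` branch.
      coalesce(flatten(col(inCol)), typedLit(Seq.empty[String]))
    )
}
```

It can be chained the same way as the original, for example df.transform(combineSentencesBuiltin(inCol, outCol)). Because it uses only built-in functions, it should also let the optimizer see through the expression, which a UDF does not.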