Return Seq[Row] from Spark-Scala UDF

Posted 2020-04-17 08:00

Question:

I am using Spark with Scala to do some data processing. I have XML data mapped to a DataFrame. I am passing a Row as a parameter to a UDF and trying to extract two complex-type objects as a list. Spark gives me the following error:

Exception in thread "main" java.lang.UnsupportedOperationException: Schema for type org.apache.spark.sql.Row is not supported

def testUdf = udf((testInput: Row) => {
  val firstObject = testInput.getAs[Row]("Object1")
  val secondObject = testInput.getAs[Row]("Object2")
  val returnObject = Seq(firstObject, secondObject)

  returnObject
})

Could you please tell me what I am doing wrong? Thanks.

Answer 1:

A UDF cannot return Row objects. The return type has to be one of the types listed in the "Value type in Scala" column of the Data Types table in the Spark SQL documentation.

The good news is that there should be no need for a UDF here. If Object1 and Object2 have the same schema (it wouldn't work otherwise anyway), you can use the array function:

import org.apache.spark.sql.functions._

df.select(array(col("Object1"), col("Object2")))

or

df.select(array(col("path.to.Object1"), col("path.to.Object2")))

if Object1 and Object2 are not top-level columns.
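
For illustration, here is a minimal self-contained sketch of the array approach. The sample values, the id/name fields, and the names ArrayOfStructsExample, sample and combine are made up for this example, and it assumes a local SparkSession rather than your XML-derived DataFrame:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, struct}

object ArrayOfStructsExample {
  // Hypothetical input: two struct columns sharing the schema (id: Int, name: String).
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq((1, "a", 2, "b"))
      .toDF("id1", "name1", "id2", "name2")
      .select(
        struct(col("id1").as("id"), col("name1").as("name")).as("Object1"),
        struct(col("id2").as("id"), col("name2").as("name")).as("Object2"))
  }

  // Combine both structs into a single array column, no UDF involved.
  def combine(df: DataFrame): DataFrame =
    df.select(array(col("Object1"), col("Object2")).as("objects"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("array-of-structs")
      .getOrCreate()
    val result = combine(sample(spark))
    result.printSchema()
    result.show(false)
    spark.stop()
  }
}
```

The resulting objects column is an array of structs, so it can be exploded or indexed like any other array column.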



Answer 2:

I would like to suggest an alternative that can be used if the schemas of Object1 and Object2 are different and you still want to return a row-like value. Instead of returning a Row, you return a case class whose fields mirror the schema of the Row objects, which in this case are Object1 and Object2 (themselves rows).

Define the case classes as follows:

case class Object1(<add the schema here>)

case class Object2(<add the schema here>)

case class Record(object1: Object1, object2: Object2)

Now, inside the UDF, you can create object1 and object2 from firstObject and secondObject.

Then build the record:

val record = Record(object1, object2)

and return it from the UDF.

This way you can return structured rows even if the schemas are not the same, or if some processing is required.

I know that this doesn't directly answer your question, but it seemed like a good opportunity to mention this concept.
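
A rough sketch of how this could look end to end. The field names (id, name, code, score), the sample values, and the names CaseClassUdfExample, toRecord and sample are assumptions made up for illustration, since the real schemas are not shown in the question:

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.{col, struct, udf}

// Hypothetical schemas; replace the fields with those of your own data.
case class Object1(id: Int, name: String)
case class Object2(code: String, score: Double)
case class Record(object1: Object1, object2: Object2)

object CaseClassUdfExample {
  // The struct column arrives as a Row; the returned case class is
  // encoded by Spark as a struct column with matching field names.
  val toRecord = udf((input: Row) => {
    val first = input.getAs[Row]("Object1")
    val second = input.getAs[Row]("Object2")
    Record(
      Object1(first.getAs[Int]("id"), first.getAs[String]("name")),
      Object2(second.getAs[String]("code"), second.getAs[Double]("score")))
  })

  // Hypothetical input: one struct column holding two differently shaped structs.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq((1, "a", "x", 0.5))
      .toDF("id", "name", "code", "score")
      .select(struct(
        struct(col("id"), col("name")).as("Object1"),
        struct(col("code"), col("score")).as("Object2")).as("input"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("case-class-udf")
      .getOrCreate()
    val result = sample(spark).select(toRecord(col("input")).as("record"))
    result.printSchema()
    result.show(false)
    spark.stop()
  }
}
```

The case classes are declared at the top level so that Spark can derive the return schema from them. Nested fields of the result can then be selected with dotted paths such as record.object1.name.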