Define a UDF with a Generic Type and an Extra Parameter

Published 2019-08-21 19:08

Question:

I want to define a UDF in Scala Spark like the pseudocode below:

def transformUDF(size: Int): UserDefinedFunction = udf((input: Seq[T]) => {
  if (input != null)
    Vectors.dense(input.map(_.toDouble).toArray)
  else
    Vectors.dense(Array.fill[Double](size)(0.0))
})

If the input is not null, cast every element to Double.
If the input is null, return an all-zero vector of the given size.

And I want T to be limited to numeric types, like java.lang.Number in Java. But it seems that Seq[java.lang.Number] does not work with toDouble.

Is there any appropriate way?

Answer 1:

As mentioned in my comment, the working version is:

import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.udf

def transformUDF: UserDefinedFunction = udf((size: Int, input: Seq[java.lang.Number]) => {
  if (input != null)
    Vectors.dense(input.map(_.doubleValue()).toArray)
  else
    Vectors.dense(Array.fill[Double](size)(0.0))
})

You don't need to create a new column for the size; you can just pass it to the udf function as a literal:

dataframe.withColumn("newCol", transformUDF(lit(the size you want), dataframe("the column you want to transform")))
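As a minimal end-to-end sketch of the approach above (the column name "features" and the size 3 are made up for illustration, and a local SparkSession is assumed):

```scala
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{lit, udf}

object TransformUDFExample {
  // Takes the target size as the first argument and the numeric sequence as
  // the second; nulls become an all-zero vector of the given size.
  def transformUDF: UserDefinedFunction = udf((size: Int, input: Seq[java.lang.Number]) =>
    if (input != null)
      Vectors.dense(input.map(_.doubleValue()).toArray)
    else
      Vectors.dense(Array.fill[Double](size)(0.0))
  )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("udf-example").getOrCreate()
    import spark.implicits._

    // One non-null row and one null row to exercise both branches.
    val df = Seq(Some(Seq(1, 2, 3)), None).toDF("features")

    // The size is passed as a literal column via lit(), so no extra column is needed.
    df.withColumn("vector", transformUDF(lit(3), $"features")).show(false)

    spark.stop()
  }
}
```

Because the UDF's input type is `Seq[java.lang.Number]`, the same definition covers integer, long, and double array columns without a separate UDF per element type.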