How to register UDF with no argument in Pyspark

Published 2020-04-16 18:30

I have tried a Spark UDF with a parameter using a lambda function and registered it. But how can I create a UDF with no argument and register it? I tried the sample code below, which I expected to show the current time:

from datetime import datetime
from pyspark.sql.functions import udf

def getTime():
    timevalue=datetime.now()
    return timevalue 

udfGateTime=udf(getTime,TimestampType())

But PySpark shows

NameError: name 'TimestampType' is not defined

which probably means my UDF is not registered. I was comfortable with this format:

spark.udf.register('GATE_TIME', lambda():getTime(), TimestampType())

But can a lambda function take an empty argument list? I haven't tried it, and I am a bit confused. How should I write the code to register this getTime() function?
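(As a quick sanity check outside Spark: Python does accept a parameterless lambda. A minimal standard-library sketch, with no Spark involved:)

```python
from datetime import datetime

def getTime():
    return datetime.now()

# A lambda with an empty parameter list is valid Python syntax:
nullary = lambda: getTime()

# Calling it with no arguments returns the same type as getTime() itself.
print(type(nullary()) is datetime)  # → True
```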

3 Answers
祖国的老花朵
#2 · 2020-04-16 19:17
  • A lambda expression can be nullary. You're just using incorrect syntax:

    spark.udf.register('GATE_TIME', lambda: getTime(), TimestampType())
    
  • There is nothing special about lambda expressions in the context of Spark. You can use getTime directly:

    spark.udf.register('GetTime', getTime, TimestampType())
    
  • There is no need for an inefficient udf at all. Spark provides the required function out of the box:

    spark.sql("SELECT current_timestamp()")
    

    or

    from pyspark.sql.functions import current_timestamp
    
    spark.range(0, 2).select(current_timestamp())
    
淡お忘
#3 · 2020-04-16 19:30

The error "NameError: name 'TimestampType' is not defined" is due to the missing import:

    from pyspark.sql.types import TimestampType

For more info regarding TimestampType, see this answer: https://stackoverflow.com/a/30992905/5088142

冷血范
#4 · 2020-04-16 19:33

I made a small tweak here and it is working well for now:

import datetime
from pyspark.sql.types import TimestampType

def getTime():
    timevalue = datetime.datetime.now()
    return timevalue

def GetVal(x):
    # The argument is ignored; we always return the current timestamp.
    timevalue = getTime()
    return timevalue

spark.udf.register('GetTime', lambda x: GetVal(x), TimestampType())
spark.sql("select GetTime('currenttime') as value").show()

Instead of 'currenttime', any value can be passed; it will return the current date and time either way.
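One detail worth noting: in Python 3, parentheses around a lone lambda parameter (`lambda(x): ...`, the old Python 2 tuple-parameter form) are a syntax error; `lambda x: ...` is the correct form. A quick pure-Python check, no Spark needed:

```python
# Python 3 rejects the Python 2 tuple-parameter lambda form.
bad = "f = lambda(x): x"
good = "f = lambda x: x"

def compiles(src):
    """Return True if src is valid Python 3 syntax."""
    try:
        compile(src, "<check>", "exec")
        return True
    except SyntaxError:
        return False

print(compiles(bad))   # → False
print(compiles(good))  # → True
```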
