I have tried a Spark UDF with a parameter, using a lambda function, and registered it. But how can I create a UDF with no arguments and register it? I have tried the following; my sample code is expected to show the current time:
from datetime import datetime
from pyspark.sql.functions import udf

def getTime():
    timevalue = datetime.now()
    return timevalue

udfGateTime = udf(getTime, TimestampType())
But PySpark is showing
NameError: name 'TimestampType' is not defined
which probably means my UDF is not registered. I was comfortable with this format:
spark.udf.register('GATE_TIME', lambda():getTime(), TimestampType())
But can a lambda function take no arguments? I haven't tried it, and I am a bit confused. How should I write the code to register this getTime() function?
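
A lambda expression can be nullary. You're just using incorrect syntax: write lambda: getTime() rather than lambda(): getTime().

There is nothing special about lambda expressions in the context of Spark. You can pass getTime directly to spark.udf.register in place of the lambda.

There is no need for an inefficient udf at all. Spark provides the required function out of the box: current_timestamp() in SQL, or current_timestamp from pyspark.sql.functions.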
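
As a rough sketch of the options above (not necessarily the exact snippets from the original answer): the plain-Python part shows the nullary lambda syntax, and the hypothetical helper demo assumes you pass it an existing SparkSession. The names nullary, demo, and GET_TIME are made up for illustration.

```python
from datetime import datetime


def getTime():
    # Nullary function: takes no arguments and returns the current time.
    return datetime.now()


# A lambda with no parameters is written "lambda: ...", not "lambda(): ...".
nullary = lambda: getTime()


def demo(spark):
    """Register and call the UDFs on an existing SparkSession (assumed: `spark`)."""
    # Imported here so the plain-Python part above runs without PySpark installed.
    from pyspark.sql.functions import current_timestamp
    from pyspark.sql.types import TimestampType  # the import missing in the question

    # Either form registers a zero-argument SQL function.
    spark.udf.register("GATE_TIME", nullary, TimestampType())
    spark.udf.register("GET_TIME", getTime, TimestampType())
    spark.sql("SELECT GATE_TIME(), GET_TIME()").show(truncate=False)

    # Built-in alternatives that avoid a Python UDF entirely.
    spark.sql("SELECT current_timestamp()").show(truncate=False)
    spark.range(0, 2).select(current_timestamp()).show(truncate=False)
```

Calling demo(spark) from a PySpark shell should print one row of timestamps per query. The built-in function is usually the better choice, since a Python UDF has to ship rows to a Python worker.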

The error "NameError: name 'TimestampType' is not defined" seems to be due to a missing import: from pyspark.sql.types import TimestampType

For more info regarding TimestampType, see this answer: https://stackoverflow.com/a/30992905/5088142

I have made a small tweak here and it is working well for now.
Instead of currenttime, any value can be passed, and it will return the current date and time.
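
The code for this tweak is not shown, so the following is only one hypothetical reading of it: a UDF that accepts a single placeholder argument and ignores it. The parameter name _ignored and the helper demo (which assumes an existing SparkSession) are assumptions, not the poster's code.

```python
from datetime import datetime


def getTime(_ignored=None):
    # The argument exists only so the SQL call can pass something; it is never used.
    return datetime.now()


def demo(spark):
    """Register the one-argument variant on an existing SparkSession (assumed: `spark`)."""
    # Imported here so getTime can be tried without PySpark installed.
    from pyspark.sql.types import TimestampType

    spark.udf.register("GATE_TIME", getTime, TimestampType())
    # Any literal or column can stand in for 'currenttime'; the result is the same.
    spark.sql("SELECT GATE_TIME('currenttime')").show(truncate=False)
```

A dummy argument works, but it is not required: a zero-argument function can be registered the same way.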