PySpark: creating a timestamp column

Posted 2020-07-08 06:39

I am using Spark 2.1.0. I am not able to create a timestamp column in PySpark with the code snippet below. Please help.

df=df.withColumn('Age',lit(datetime.now()))

I am getting:

AssertionError: col should be Column

Please help

3 answers
地球回转人心会变
#2 · 2020-07-08 07:02

I am not sure about 2.1.0, but on 2.2.1 at least you can just do:

from pyspark.sql import functions as F
df.withColumn('Age', F.current_timestamp())

Hope it helps!

冷血范
#3 · 2020-07-08 07:13

I'll assume you have the DataFrame from your code snippet and want the same timestamp in every row.

First, let me create a dummy DataFrame:

>>> data = [{'name': 'Alice', 'age': 1}, {'name': 'Again', 'age': 2}]
>>> df = spark.createDataFrame(data)

>>> import time
>>> import datetime
>>> timestamp = datetime.datetime.fromtimestamp(time.time()).strftime('%Y-%m-%d %H:%M:%S')
>>> type(timestamp)
<class 'str'>

>>> from pyspark.sql.functions import lit,unix_timestamp
>>> timestamp
'2017-08-02 16:16:14'
>>> new_df = df.withColumn('time',unix_timestamp(lit(timestamp),'yyyy-MM-dd HH:mm:ss').cast("timestamp"))
>>> new_df.show(truncate = False)
+---+-----+---------------------+
|age|name |time                 |
+---+-----+---------------------+
|1  |Alice|2017-08-02 16:16:14.0|
|2  |Again|2017-08-02 16:16:14.0|
+---+-----+---------------------+

>>> new_df.printSchema()
root
 |-- age: long (nullable = true)
 |-- name: string (nullable = true)
 |-- time: timestamp (nullable = true)
乱世女痞
#4 · 2020-07-08 07:18

Adding on to balalaika's answer: if, like me, you just want to add the date without the time, you can use the code below:

from pyspark.sql import functions as F
df.withColumn('Age', F.current_date())

Hope this helps
