Adding hours to a timestamp in PySpark dynamically

Published 2020-01-19 05:04

Question:

import pyspark.sql.functions as F
from datetime import datetime
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

data = [
  (1, datetime(2017, 3, 12, 3, 19, 58), 'Raising', 2),
  (2, datetime(2017, 3, 12, 3, 21, 30), 'sleeping', 1),
  (3, datetime(2017, 3, 12, 3, 29, 40), 'walking', 3),
  (4, datetime(2017, 3, 12, 3, 31, 23), 'talking', 5),
  (5, datetime(2017, 3, 12, 4, 19, 47), 'eating', 6),
  (6, datetime(2017, 3, 12, 4, 33, 51), 'working', 7),
]
df = spark.createDataFrame(data, ['id', 'testing_time', 'test_name', 'shift'])
df.show()

+---+-------------------+---------+-----+
| id|       testing_time|test_name|shift|
+---+-------------------+---------+-----+
|  1|2017-03-12 03:19:58|  Raising|    2|
|  2|2017-03-12 03:21:30| sleeping|    1|
|  3|2017-03-12 03:29:40|  walking|    3|
|  4|2017-03-12 03:31:23|  talking|    5|
|  5|2017-03-12 04:19:47|   eating|    6|
|  6|2017-03-12 04:33:51|  working|    7|
+---+-------------------+---------+-----+

Now I want to add the shift column (in hours) to testing_time. Can anybody help me out with a quick solution?

Answer 1:

You can use something like the snippet below. The shift field needs to be converted to seconds, so it is multiplied by 3600:

>>> df.withColumn("testing_time", (F.unix_timestamp("testing_time") + F.col("shift")*3600).cast('timestamp')).show()
+---+-------------------+---------+-----+
| id|       testing_time|test_name|shift|
+---+-------------------+---------+-----+
|  1|2017-03-12 05:19:58|  Raising|    2|
|  2|2017-03-12 04:21:30| sleeping|    1|
|  3|2017-03-12 06:29:40|  walking|    3|
|  4|2017-03-12 08:31:23|  talking|    5|
|  5|2017-03-12 10:19:47|   eating|    6|
|  6|2017-03-12 11:33:51|  working|    7|
+---+-------------------+---------+-----+
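If you are on a newer Spark release, an interval-based alternative avoids the round trip through Unix seconds (which drops any fractional seconds from the timestamp). This is a minimal sketch assuming Spark 3.2+, where the make_dt_interval(days, hours, mins, secs) SQL function is available:

>>> # assumes Spark 3.2+: build a day-time interval of `shift` hours and add it to the timestamp
>>> df.withColumn("testing_time", F.expr("testing_time + make_dt_interval(0, shift)")).show()

Here make_dt_interval(0, shift) constructs a day-time interval of shift hours, which Spark adds directly to the timestamp column, producing the same result as the unix_timestamp approach above.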


Tags: pyspark