Pandas: How to create a datetime object from Week

I have a dataframe that provides two integer columns with the Year and Week of the year:

import pandas as pd
import numpy as np
L1 = [43,44,51,2,5,12]
L2 = [2016,2016,2016,2017,2017,2017]
df = pd.DataFrame({"Week":L1,"Year":L2})

df
Out[72]: 
   Week  Year
0    43  2016
1    44  2016
2    51  2016
3     2  2017
4     5  2017
5    12  2017

I need to create a datetime-object from these two numbers.

I tried this, but it throws an error:

df["DT"] = df.apply(lambda x: np.datetime64(x.Year,'Y') + np.timedelta64(x.Week,'W'),axis=1)

Then I tried this, it works but gives the wrong result, that is it ignores the week completely:

df["S"] = df.Week.astype(str)+'-'+df.Year.astype(str)
df["DT"] = df["S"].apply(lambda x: pd.to_datetime(x,format='%W-%Y'))

df
Out[74]: 
   Week  Year        S         DT
0    43  2016  43-2016 2016-01-01
1    44  2016  44-2016 2016-01-01
2    51  2016  51-2016 2016-01-01
3     2  2017   2-2017 2017-01-01
4     5  2017   5-2017 2017-01-01
5    12  2017  12-2017 2017-01-01

I'm really getting lost between Python's datetime, Numpy's datetime64, and pandas Timestamp, can you tell me how it's done correctly?

I'm using Python 3, if that is relevant in any way.

标签： python pandas datetime numpy

2条回答

爷、活的狠高调

2楼-- · 2020-02-25 08:46

You need %w for specify which day is first in week:

df["DT"] = pd.to_datetime(df.Week.astype(str)+
                          df.Year.astype(str).add('-0') ,format='%W%Y-%w')
print (df)

  Week  Year         DT
0    43  2016 2016-10-30
1    44  2016 2016-11-06
2    51  2016 2016-12-25
3     2  2017 2017-01-15
4     5  2017 2017-02-05
5    12  2017 2017-03-26

df["DT"] = pd.to_datetime(df.Week.astype(str)+
                          df.Year.astype(str).add('-1') ,format='%W%Y-%w')
print (df)
   Week  Year         DT
0    43  2016 2016-10-24
1    44  2016 2016-10-31
2    51  2016 2016-12-19
3     2  2017 2017-01-09
4     5  2017 2017-01-30
5    12  2017 2017-03-20

df["DT"] = pd.to_datetime(df.Week.astype(str)+
                          df.Year.astype(str).add('-2') ,format='%W%Y-%w')
print (df)

   Week  Year         DT
0    43  2016 2016-10-25
1    44  2016 2016-11-01
2    51  2016 2016-12-20
3     2  2017 2017-01-10
4     5  2017 2017-01-31
5    12  2017 2017-03-21

0人赞添加讨论(0) 举报

够拽才男人

3楼-- · 2020-02-25 08:54

Try this:

In [19]: pd.to_datetime(df.Year.astype(str), format='%Y') + \
             pd.to_timedelta(df.Week.mul(7).astype(str) + ' days')
Out[19]:
0   2016-10-28
1   2016-11-04
2   2016-12-23
3   2017-01-15
4   2017-02-05
5   2017-03-26
dtype: datetime64[ns]

Initially I have timestamps in s

It's much easier to parse it from UNIX epoch timestamp:

df['Date'] = pd.to_datetime(df['UNIX_Time'], unit='s')

Timing for 10M rows DF:

Setup:

In [26]: df = pd.DataFrame(pd.date_range('1970-01-01', freq='1T', periods=10**7), columns=['date'])

In [27]: df.shape
Out[27]: (10000000, 1)

In [28]: df['unix_ts'] = df['date'].astype(np.int64)//10**9

In [30]: df
Out[30]:
                       date    unix_ts
0       1970-01-01 00:00:00          0
1       1970-01-01 00:01:00         60
2       1970-01-01 00:02:00        120
3       1970-01-01 00:03:00        180
4       1970-01-01 00:04:00        240
5       1970-01-01 00:05:00        300
6       1970-01-01 00:06:00        360
7       1970-01-01 00:07:00        420
8       1970-01-01 00:08:00        480
9       1970-01-01 00:09:00        540
...                     ...        ...
9999990 1989-01-05 10:30:00  599999400
9999991 1989-01-05 10:31:00  599999460
9999992 1989-01-05 10:32:00  599999520
9999993 1989-01-05 10:33:00  599999580
9999994 1989-01-05 10:34:00  599999640
9999995 1989-01-05 10:35:00  599999700
9999996 1989-01-05 10:36:00  599999760
9999997 1989-01-05 10:37:00  599999820
9999998 1989-01-05 10:38:00  599999880
9999999 1989-01-05 10:39:00  599999940

[10000000 rows x 2 columns]

Check:

In [31]: pd.to_datetime(df.unix_ts, unit='s')
Out[31]:
0         1970-01-01 00:00:00
1         1970-01-01 00:01:00
2         1970-01-01 00:02:00
3         1970-01-01 00:03:00
4         1970-01-01 00:04:00
5         1970-01-01 00:05:00
6         1970-01-01 00:06:00
7         1970-01-01 00:07:00
8         1970-01-01 00:08:00
9         1970-01-01 00:09:00
                  ...
9999990   1989-01-05 10:30:00
9999991   1989-01-05 10:31:00
9999992   1989-01-05 10:32:00
9999993   1989-01-05 10:33:00
9999994   1989-01-05 10:34:00
9999995   1989-01-05 10:35:00
9999996   1989-01-05 10:36:00
9999997   1989-01-05 10:37:00
9999998   1989-01-05 10:38:00
9999999   1989-01-05 10:39:00
Name: unix_ts, Length: 10000000, dtype: datetime64[ns]

Timing:

In [32]: %timeit pd.to_datetime(df.unix_ts, unit='s')
10 loops, best of 3: 156 ms per loop

Conclusion: I think 156 milliseconds for converting 10.000.000 rows is not that slow

0人赞添加讨论(0) 举报

Pandas: How to create a datetime object from Week

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间