I have a dataframe that provides two integer columns with the Year and Week of the year:
import pandas as pd
import numpy as np
L1 = [43,44,51,2,5,12]
L2 = [2016,2016,2016,2017,2017,2017]
df = pd.DataFrame({"Week":L1,"Year":L2})
df
Out[72]:
Week Year
0 43 2016
1 44 2016
2 51 2016
3 2 2017
4 5 2017
5 12 2017
I need to create a datetime-object from these two numbers.
I tried this, but it throws an error:
df["DT"] = df.apply(lambda x: np.datetime64(x.Year,'Y') + np.timedelta64(x.Week,'W'),axis=1)
Then I tried this, it works but gives the wrong result, that is it ignores the week completely:
df["S"] = df.Week.astype(str)+'-'+df.Year.astype(str)
df["DT"] = df["S"].apply(lambda x: pd.to_datetime(x,format='%W-%Y'))
df
Out[74]:
Week Year S DT
0 43 2016 43-2016 2016-01-01
1 44 2016 44-2016 2016-01-01
2 51 2016 51-2016 2016-01-01
3 2 2017 2-2017 2017-01-01
4 5 2017 5-2017 2017-01-01
5 12 2017 12-2017 2017-01-01
I'm really getting lost between Python's datetime
, Numpy's datetime64
, and pandas Timestamp
, can you tell me how it's done correctly?
I'm using Python 3, if that is relevant in any way.
You need
%w
for specify which day is first in week:Try this:
It's much easier to parse it from UNIX epoch timestamp:
Timing for 10M rows DF:
Setup:
Check:
Timing:
Conclusion: I think 156 milliseconds for converting 10.000.000 rows is not that slow