Datetime module and Pandas to_datetime give different results

Posted 2019-08-17 22:41

Question:

I have a string containing a UTC datetime

utc_str = '2017-11-21T23:00+0100'

which in my local time (Europe/Berlin) is:

local_time = '2017-11-22 00:00'

This is the value I would like to obtain from utc_str.

I can convert utc_str to local_time just fine using:

import datetime as dt
import pytz

utc_time = dt.datetime.strptime(utc_str, '%Y-%m-%dT%H:%M%z')
local_time = utc_time.replace(tzinfo=pytz.utc).astimezone(pytz.timezone('Europe/Berlin'))

print(local_time.strftime('%Y-%m-%d %H:%M'))
# Output: 2017-11-22 00:00

However, when I use Pandas, I get a different result. It doesn't seem to apply the UTC offset:

import pandas as pd
pd_date = pd.to_datetime(utc_str, utc=True)

print(pd_date.strftime('%Y-%m-%d %H:%M'))
# Output: 2017-11-21 22:00

And if I naively apply the same process as with the datetime module, the result is still off:

pd_date = pd.to_datetime(utc_str, utc=True)
pd_date = pd_date.replace(tzinfo=pytz.utc).astimezone(pytz.timezone('Europe/Berlin'))

print(pd_date.strftime('%Y-%m-%d %H:%M'))
# Output: 2017-11-21 23:00

Is there something I am not understanding? Am I using pd.to_datetime (or something else) incorrectly? I am on Python 3.6, Windows 7.

Answer 1:

As stated in the comments, I think your code for local_time is wrong:

utc_time
datetime.datetime(2017, 11, 21, 23, 0, tzinfo=datetime.timezone(datetime.timedelta(0, 3600)))
utc_time.replace(tzinfo=pytz.utc)
datetime.datetime(2017, 11, 21, 23, 0, tzinfo=<UTC>)

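So this replace drops the +0100 offset and labels the datetime as UTC, but keeps the wall-clock time (23:00) the same. The result, 23:00 UTC, is a different instant from the original 23:00+0100, which is 22:00 UTC.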

utc_time.replace(tzinfo=pytz.utc).astimezone(pytz.timezone('Europe/Berlin'))
datetime.datetime(2017, 11, 22, 0, 0, tzinfo=<DstTzInfo 'Europe/Berlin' CET+1:00:00 STD>)

astimezone then adds 1 hour to 23:00 UTC, so the result becomes midnight of the next day in Berlin. That is where your 2017-11-22 00:00 comes from.
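A minimal standard-library sketch of the difference between replace and astimezone, using datetime.timezone.utc in place of pytz.utc (for plain UTC the two behave the same here); the variable names are illustrative only:

```python
from datetime import datetime, timezone

utc_str = '2017-11-21T23:00+0100'
parsed = datetime.strptime(utc_str, '%Y-%m-%dT%H:%M%z')  # 23:00 at +01:00

# replace() swaps the tzinfo but keeps the wall-clock fields (23:00),
# so the result is a different instant, one hour later.
relabelled = parsed.replace(tzinfo=timezone.utc)

# astimezone() keeps the instant and recomputes the wall-clock fields.
converted = parsed.astimezone(timezone.utc)

print(parsed.isoformat())      # 2017-11-21T23:00:00+01:00
print(relabelled.isoformat())  # 2017-11-21T23:00:00+00:00
print(converted.isoformat())   # 2017-11-21T22:00:00+00:00

shift = relabelled - converted
print(shift)                   # 1:00:00
```

In the question's code this suggests dropping the replace call and calling astimezone on the parsed value directly, since it already carries the correct offset.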

pd.to_datetime(utc_str, utc=True)
Timestamp('2017-11-21 22:00:00+0000', tz='UTC')

The difference in behaviour comes from the constructor. With utc=True, pd.to_datetime converts 23:00+0100 to the equivalent UTC time, 22:00 UTC, instead of keeping the +0100 offset. So when you then replace the timezone info with UTC, nothing changes, and astimezone correctly gives 23:00 in Berlin.
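A short sketch of the pandas side, assuming a reasonably recent pandas where Timestamp.replace accepts a tzinfo argument: the Timestamp from pd.to_datetime and the datetime from strptime print differently but describe the same instant, and relabelling the already-UTC Timestamp as UTC is a no-op.

```python
from datetime import datetime, timezone

import pandas as pd

utc_str = '2017-11-21T23:00+0100'

# With utc=True the +0100 offset is applied and the value is stored in UTC.
ts = pd.to_datetime(utc_str, utc=True)
print(ts.strftime('%Y-%m-%d %H:%M'))  # 2017-11-21 22:00

# The datetime module keeps the original +01:00 offset instead...
parsed = datetime.strptime(utc_str, '%Y-%m-%dT%H:%M%z')
print(parsed.strftime('%Y-%m-%d %H:%M'))  # 2017-11-21 23:00

# ...but both objects refer to the same instant.
print(ts == parsed)  # True

# Since ts is already in UTC, relabelling it as UTC changes nothing.
print(ts.replace(tzinfo=timezone.utc) == ts)  # True
```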

Local time

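Your utc_time object already has the correct offset (+01:00, which is Berlin's offset in November), so if you want the local time you can just do utc_time.strftime('%Y-%m-%d %H:%M'), which gives 2017-11-21 23:00. That is the actual Berlin local time for this string, not 2017-11-22 00:00. In pandas you'll have to convert the UTC timestamp to Berlin time first: pd.to_datetime(utc_str, utc=True).astimezone(pytz.timezone('Europe/Berlin')).strftime('%Y-%m-%d %H:%M')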
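The same parse-then-convert approach can be sketched with tz_convert, the pandas-native counterpart of astimezone, and extended to a Series through the .dt accessor. The second string below is a hypothetical summer value, added only to show that the named zone 'Europe/Berlin' picks CET (+01:00) or CEST (+02:00) as appropriate, which a fixed +01:00 offset would not.

```python
import pandas as pd

utc_str = '2017-11-21T23:00+0100'

# Scalar: parse to UTC, then convert to the named zone before formatting.
single = pd.to_datetime(utc_str, utc=True).tz_convert('Europe/Berlin')
print(single.strftime('%Y-%m-%d %H:%M'))  # 2017-11-21 23:00

# Series: the question's string plus a hypothetical summer (CEST) value.
strings = pd.Series([utc_str, '2017-07-01T12:00+0200'])
berlin = pd.to_datetime(strings, utc=True).dt.tz_convert('Europe/Berlin')
local = berlin.dt.strftime('%Y-%m-%d %H:%M')
print(local.tolist())  # ['2017-11-21 23:00', '2017-07-01 12:00']
```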