Python pandas tz_localize throws NonExistentTimeEr

2019-05-19 14:16发布

问题:

In python pandas, I have a dataset that looks like this:

For data before 2007-04-26 17:00:00, the time zone is US/Eastern. For data after, the time zone is American/Chicago.

When I run this:

data.index = data[:'2007-04-26 16:59:59'].index.tz_localize('US/Eastern', ambiguous = 'NaT').tz_convert('Europe/London')

I get an error that says:

NonExistentTimeError: 2006-04-02 02:00:00

This is indeed because of daylight savings time. I have the same problem for 2007. I don't have the problem for subsequent years. Ideally, I'd like two commands - one that converts the first half of the dataset from Eastern to London, and another that converts the second half from Chicago to London.

Since that didn't work, I tried dropping these times (an hour I believe), eg. 02:00:00 to 03:00:00 where there was daylight savings time. However, when I run

data.drop(data.ix['2005-04-03 2:00:00':'2005-04-03 3:00:00'], inplace=True)

I get

ValueError: labels ['open' 'high' 'low' 'close' 'volume'] not contained in axis

Does anyone know how I can simply convert these times? Any help would be greatly appreciated.

Thanks, Alex

UPDATE to add more info:

Ok I've used the following code which has worked to drop the offending times:

UPDATE 2:

mask =  ((data.index<datetime.strptime("2006-04-02 02:00:00","%Y-%m-%d %H:%S:%M")) | (data.index>datetime.strptime("2006-04-02 03:00:00","%Y-%m-%d %H:%S:%M"))) & ((data.index<datetime.strptime("2005-04-03 02:00:00","%Y-%m-%d %H:%S:%M")) | (data.index>datetime.strptime("2005-04-03 03:00:00","%Y-%m-%d %H:%S:%M"))) & ((data.index<datetime.strptime("2005-10-30 01:00:00","%Y-%m-%d %H:%S:%M")) | (data.index>datetime.strptime("2005-10-30 02:00:00","%Y-%m-%d %H:%S:%M"))) & ((data.index<datetime.strptime("2006-10-29 01:00:00","%Y-%m-%d %H:%S:%M")) | (data.index>datetime.strptime("2006-10-29 02:00:00","%Y-%m-%d %H:%S:%M")))
data_filtered = data[mask]
data_filtered.ix = data_filtered.tz_localize('US/Eastern', infer_dst=True).tz_convert('Europe/London')

But now I get this error:

    data_filtered.ix = data_filtered.tz_localize('US/Eastern', infer_dst=True).tz_convert('Europe/London')
Traceback (most recent call last):

  File "<ipython-input-38-0fc8a9e68588>", line 1, in <module>
    data_filtered.ix = data_filtered.tz_localize('US/Eastern', infer_dst=True).tz_convert('Europe/London')

  File "C:\Anaconda\lib\site-packages\pandas\core\generic.py", line 1955, in __setattr__
    object.__setattr__(self, name, value)

AttributeError: can't set attribute

Any ideas on this? I did some Googling but couldn't find anything really related..

回答1:

Your drop command doesn't look like it should work based on the description in the docs. To get rid of the offending times, I would create a mask on the dataframe, ie:

from datetime import datetime
mask =  ((df.index<datetime.strptime("2006-04-02 02:00:00","%Y-%m-%d %H:%S:%M") | (df.index>datetime.strptime("2006-04-02 03:00:00","%Y-%m-%d %H:%S:%M")) # probably add some    more years here as or clauses

df_filtered = df[mask]

Probably there's a way to make drop work too. Check this related question: Deleting rows of daylight saving time from a time indexed pandas dataframe