Custom time series resampling in Pandas

I have a df with OHLC data in a 1m frequency:

                        Open     High      Low    Close
DateTime                                               
2005-09-06 18:00:00  1230.25  1231.50  1230.25  1230.25
2005-09-06 18:01:00  1230.50  1231.75  1229.25  1230.50
.
.
2005-09-07 15:59:00  1234.50  1235.50  1234.25  1234.50
2005-09-07 16:00:00  1234.25  1234.50  1234.25  1234.25

I need to do a "custom" resample that fits futures hours data where:

Every day starts at 18:00:00 of the previous day (Monday starts on Sunday 18:00:00)
Every day ends at 16:00:00 of the current day
The timestamp should be as of the the time of the close, not the open.

After doing the resample, the output should be:

                        Open     High      Low    Close
DateTime                                               
2005-09-07 16:00:00  1230.25  1235.50  1229.25  1234.25

Where:

Open = first (column Open) at 2005-09-06 18:00:00
High = max (column High) from 2005-09-06 18:00:00 to 2005-09-07 16:00:00
Low = min (column Low) from 2005-09-06 18:00:00 to 2005-09-07 16:00:00
Close = last (column Close) at 2005-09-07 16:00:00

I have tried:

Changing the parameters rule, and base, but it didn't work.
Playing with reindex with no success.

I used the following 'how':

conversion = {'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last'}

标签： python python-2.7 pandas time-series

1条回答

劫难

2楼-- · 2020-06-29 08:37

import pandas as pd
df = pd.read_table('data', sep='\s{2,}')
# Make sure the index is a DatetimeIndex
df.index = pd.DatetimeIndex(df.index)

# discard rows whose time falls between 16:00 and 18:00
df = df.between_time('18:00', '16:00', include_start=True, include_end=True)

proxy = df.index + pd.DateOffset(hours=6)
result = df.groupby(proxy.date).agg(
    {'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last'})
result = result.reindex(columns=['Open','High','Low','Close'])
print(result)

yields

               Open    High      Low    Close
2005-09-07  1230.25  1235.5  1229.25  1234.25

The code above creates a proxy date which is calculated by adding 6 hours to each datetime in the index. This proxy date is then used as the groupby value.

In [112]: proxy = pd.DatetimeIndex(df.index) + pd.DateOffset(hours=6)

To see how the proxy values correspond to the index:

In [116]: pd.Series(proxy.date, index=df.index)
Out[116]: 
DateTime
2005-09-06 18:00:00    2005-09-07
2005-09-06 18:01:00    2005-09-07
2005-09-07 15:59:00    2005-09-07
2005-09-07 16:00:00    2005-09-07
dtype: object

0人赞添加讨论(0) 举报

Custom time series resampling in Pandas

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间