Resample/Upsample Period Index and using both extr

I have the following DataFrame, a weekly price time series with a PeriodIndex. Let's call it df.

                            timestamp         open        high        low        close  volume
timestamp                       
2009-02-01/2009-02-07   733442.166309   830.540773  832.586910  828.788627  830.706009  48401.952790
2009-02-08/2009-02-14   733449.166309   839.945279  841.763948  837.812232  839.742489  53429.330472
2009-02-15/2009-02-21   733456.245777   790.733108  792.399775  788.897523  790.549550  50671.887387
2009-02-22/2009-02-28   733463.166309   760.586910  762.640558  758.234979  760.428112  60565.506438

If I try to resample it with df.resample('30min').mean() the data ends at 2009-02-22. I would like it to end at 2009-02-28, while still starting at 2009-02-01. How can I do that?
I suspect it has to do with the closed and label values of the resample function, but those are not very well explained in the doc.

Here is a snippet to reconstruct the DataFrame:

import pandas as pd
from pandas import Period
dikt={'volume': {Period('2009-02-01/2009-02-07', 'W-SAT'): 48401.952789699571, Period('2009-02-08/2009-02-14', 'W-SAT'): 53429.330472103007, Period('2009-02-15/2009-02-21', 'W-SAT'): 50671.887387387389, Period('2009-02-22/2009-02-28', 'W-SAT'): 60565.506437768243}, 'close': {Period('2009-02-01/2009-02-07', 'W-SAT'): 830.70600858369096, Period('2009-02-08/2009-02-14', 'W-SAT'): 839.74248927038627, Period('2009-02-15/2009-02-21', 'W-SAT'): 790.54954954954951, Period('2009-02-22/2009-02-28', 'W-SAT'): 760.42811158798281}, 'open': {Period('2009-02-01/2009-02-07', 'W-SAT'): 830.54077253218884, Period('2009-02-08/2009-02-14', 'W-SAT'): 839.94527896995703, Period('2009-02-15/2009-02-21', 'W-SAT'): 790.73310810810813, Period('2009-02-22/2009-02-28', 'W-SAT'): 760.58690987124464}, 'high': {Period('2009-02-01/2009-02-07', 'W-SAT'): 832.58690987124464, Period('2009-02-08/2009-02-14', 'W-SAT'): 841.76394849785413, Period('2009-02-15/2009-02-21', 'W-SAT'): 792.39977477477476, Period('2009-02-22/2009-02-28', 'W-SAT'): 762.64055793991417}, 'low': {Period('2009-02-01/2009-02-07', 'W-SAT'): 828.78862660944208, Period('2009-02-08/2009-02-14', 'W-SAT'): 837.8122317596567, Period('2009-02-15/2009-02-21', 'W-SAT'): 788.89752252252254, Period('2009-02-22/2009-02-28', 'W-SAT'): 758.23497854077254}, 'timestamp': {Period('2009-02-01/2009-02-07', 'W-SAT'): 733442.16630901292, Period('2009-02-08/2009-02-14', 'W-SAT'): 733449.16630901292, Period('2009-02-15/2009-02-21', 'W-SAT'): 733456.24577702698, Period('2009-02-22/2009-02-28', 'W-SAT'): 733463.16630901292}}
df = pd.DataFrame(dikt, columns=['timestamp', 'open', 'high', 'low', 'close', 'volume'])

Tags: python parsing pandas

1 Answer

别忘想泡老子

#2 · 2019-07-31 15:15

Since you want to include both the start_time of the first period in the PeriodIndex and the end_time of the last one, the keyword arguments of DataFrame.resample are of little help here: each setting anchors the result at one boundary or the other, so changing an argument affects either the start_time or the end_time, but not both.
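To see which two timestamps are at stake, the boundaries can be read directly off the PeriodIndex. This is only a minimal sketch: the index is rebuilt with pd.period_range (an assumption made to keep the example short) instead of the full snippet from the question.

```python
import pandas as pd

# Assumption: rebuild only the index of the question's frame, i.e. four
# W-SAT weeks starting with the week that contains 2009-02-01.
idx = pd.period_range("2009-02-01", periods=4, freq="W-SAT")

first_start = idx[0].start_time   # first instant of the first week
last_end = idx[-1].end_time       # last instant of the last week

print(first_start)
print(last_end)
```

The goal of the question is a 30-minute index that spans everything between these two timestamps.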

Instead, you could first upsample to daily frequency, "D", and then take the mean of each 30-minute group.

df.resample('D').asfreq().resample('30min').mean()

The convention argument would have been enough if you only needed to resample from the start_time or from the end_time specifically.
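The idea of anchoring at one boundary at a time can be illustrated with to_timestamp and its how argument. Note that this is a related but different call from resample's convention argument, and the values below are the question's close prices rounded to two decimals; treat it as a sketch, not as what resample does internally.

```python
import pandas as pd

# Assumption: same four W-SAT weeks as in the question, with the close
# prices rounded for brevity.
idx = pd.period_range("2009-02-01", periods=4, freq="W-SAT")
close = pd.Series([830.71, 839.74, 790.55, 760.43], index=idx, name="close")

at_start = close.to_timestamp(how="start")  # stamped at each week's first instant
at_end = close.to_timestamp(how="end")      # stamped at each week's last instant

print(at_start.index[0], at_start.index[-1])
print(at_end.index[0], at_end.index[-1])
```

With how="start" the last stamp falls on 2009-02-22, and with how="end" the first stamp falls on 2009-02-07, so neither choice alone covers both extremes.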

To check:

resamp_start = df.resample('30min').mean()
resamp_all = df.resample('D').asfreq().resample('30min').mean().head(resamp_start.shape[0])
resamp_start.equals(resamp_all)
True

If you need only the resampled index and not its aggregation, you can upsample from the current frequency to the finest whole-unit frequency underlying the target (here, 1 minute) and then take every 30th row to get one sample per 30 minutes.

df.resample('min').asfreq().iloc[::30]

This gives you samples covering the whole of 2009-02-28, whereas the earlier approach only covered dates up to but not including 2009-02-28, because the .resample('D') step normalizes times to midnight.
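As an alternative that does not depend on how resample treats a PeriodIndex, the 30-minute grid can also be built explicitly from the two boundary timestamps and the weekly rows placed onto it. This is a sketch under stated assumptions, not the method used above: the frame is reduced to a rounded close column, and the names grid, on_grid and filled are made up for the illustration.

```python
import pandas as pd

# Assumption: a reduced stand-in for the question's df (rounded close only).
idx = pd.period_range("2009-02-01", periods=4, freq="W-SAT")
df = pd.DataFrame({"close": [830.71, 839.74, 790.55, 760.43]}, index=idx)

# 30-minute grid from the first week's start to the last week's end.
grid = pd.date_range(df.index[0].start_time, df.index[-1].end_time, freq="30min")

# Stamp each weekly row at its start, then place it on the grid.
# Grid slots without a weekly stamp are left as NaN.
on_grid = df.to_timestamp(how="start").reindex(grid)

# Optional: carry each week's value forward across its week.
filled = on_grid.ffill()

print(on_grid.index[0], on_grid.index[-1], len(on_grid))
```

The grid starts at 2009-02-01 00:00 and ends at 2009-02-28 23:30, which covers both extremes. Whether the gaps should stay NaN or be forward-filled depends on what the resampled data is used for.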
