Resample/Upsample Period Index and using both extr

Posted 2019-07-31 15:00

I have the following DataFrame, a weekly price time series with a PeriodIndex. Let's call it df:

                            timestamp         open        high        low        close  volume
timestamp                       
2009-02-01/2009-02-07   733442.166309   830.540773  832.586910  828.788627  830.706009  48401.952790
2009-02-08/2009-02-14   733449.166309   839.945279  841.763948  837.812232  839.742489  53429.330472
2009-02-15/2009-02-21   733456.245777   790.733108  792.399775  788.897523  790.549550  50671.887387
2009-02-22/2009-02-28   733463.166309   760.586910  762.640558  758.234979  760.428112  60565.506438

If I try to resample it with df.resample('30min').mean(), the data ends at 2009-02-22. I would like it to end at 2009-02-28 while still starting at 2009-02-01. How can I do that?
I suspect it has to do with the closed and label arguments of the resample function, but those are not explained very well in the docs.

Here is a snippet to reconstruct the DataFrame:

import pandas as pd
from pandas import Period
dikt={'volume': {Period('2009-02-01/2009-02-07', 'W-SAT'): 48401.952789699571, Period('2009-02-08/2009-02-14', 'W-SAT'): 53429.330472103007, Period('2009-02-15/2009-02-21', 'W-SAT'): 50671.887387387389, Period('2009-02-22/2009-02-28', 'W-SAT'): 60565.506437768243}, 'close': {Period('2009-02-01/2009-02-07', 'W-SAT'): 830.70600858369096, Period('2009-02-08/2009-02-14', 'W-SAT'): 839.74248927038627, Period('2009-02-15/2009-02-21', 'W-SAT'): 790.54954954954951, Period('2009-02-22/2009-02-28', 'W-SAT'): 760.42811158798281}, 'open': {Period('2009-02-01/2009-02-07', 'W-SAT'): 830.54077253218884, Period('2009-02-08/2009-02-14', 'W-SAT'): 839.94527896995703, Period('2009-02-15/2009-02-21', 'W-SAT'): 790.73310810810813, Period('2009-02-22/2009-02-28', 'W-SAT'): 760.58690987124464}, 'high': {Period('2009-02-01/2009-02-07', 'W-SAT'): 832.58690987124464, Period('2009-02-08/2009-02-14', 'W-SAT'): 841.76394849785413, Period('2009-02-15/2009-02-21', 'W-SAT'): 792.39977477477476, Period('2009-02-22/2009-02-28', 'W-SAT'): 762.64055793991417}, 'low': {Period('2009-02-01/2009-02-07', 'W-SAT'): 828.78862660944208, Period('2009-02-08/2009-02-14', 'W-SAT'): 837.8122317596567, Period('2009-02-15/2009-02-21', 'W-SAT'): 788.89752252252254, Period('2009-02-22/2009-02-28', 'W-SAT'): 758.23497854077254}, 'timestamp': {Period('2009-02-01/2009-02-07', 'W-SAT'): 733442.16630901292, Period('2009-02-08/2009-02-14', 'W-SAT'): 733449.16630901292, Period('2009-02-15/2009-02-21', 'W-SAT'): 733456.24577702698, Period('2009-02-22/2009-02-28', 'W-SAT'): 733463.16630901292}}
df = pd.DataFrame(dikt, columns=['timestamp', 'open', 'high', 'low', 'close', 'volume'])

1 answer
别忘想泡老子
#2 · 2019-07-31 15:15

Since you want the result to span from the start_time of the first period to the end_time of the last one, the keyword arguments of DataFrame.resample are of little help here: they act on the index as a whole, so changing any of them affects either the start_time or the end_time, but not both.

Instead, you could first upsample to daily frequency, "D", and then take the mean of each 30-minute group.

df.resample('D').asfreq().resample('30min').mean()

The convention argument could be used instead if the resampling only had to be anchored at the start_time or only at the end_time.
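
As a rough illustration of what those two anchor points are, the sketch below rebuilds only the weekly index and converts it to timestamps at either end of each period with PeriodIndex.to_timestamp. The names idx, starts and ends are introduced here for illustration and are not part of the original snippet.

```python
import pandas as pd

# Illustrative stand-in for df.index: four weekly periods ending on Saturday.
idx = pd.period_range("2009-02-01", periods=4, freq="W-SAT")

# Timestamp at the start of each week (midnight on Sunday).
starts = idx.to_timestamp(how="start")

# Timestamp at the end of each week (last instant of Saturday).
ends = idx.to_timestamp(how="end")

print(starts[0])  # first instant covered by the data
print(ends[-1])   # last instant covered by the data
```

The desired output has to run from the first of these values to the last, which is why a single anchor is not enough.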


To check:

resamp_start = df.resample('30min').mean()
resamp_all = df.resample('D').asfreq().resample('30min').mean().head(resamp_start.shape[0])
resamp_start.equals(resamp_all)
True

If you only need the resampled index and not an aggregation, it makes sense to upsample from the current frequency to the finest integer frequency that divides the target frequency (here, 1 minute) and then take every 30th row, which gives one sample per 30 minutes.

df.resample('min').asfreq().iloc[::30]

This gives you samples for the whole of 2009-02-28, unlike the earlier case, which only covered dates up to but not including 2009-02-28 because of the normalization (times adjusted to midnight) imposed by the .resample('D') operation.
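
If chaining resamples feels fragile, one alternative (a different technique from the one above, and only a sketch) is to build the 30-minute grid explicitly from the first period's start_time and the last period's end_time, then reindex onto it. The small df with a single close column, the names grid, stamped and result, and the choice to forward-fill are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical miniature of the question's frame: four W-SAT weekly periods.
idx = pd.period_range("2009-02-01", periods=4, freq="W-SAT")
df = pd.DataFrame({"close": [830.7, 839.7, 790.5, 760.4]}, index=idx)

# Build the target grid from both extremes of the PeriodIndex.
start = idx[0].start_time                # start of the first week
end = idx[-1].end_time.floor("30min")    # last 30-minute slot of the last week
grid = pd.date_range(start, end, freq="30min")

# Stamp each weekly row at the start of its period.
stamped = df.copy()
stamped.index = idx.to_timestamp(how="start")

# Assumption: carry each weekly value forward across its 30-minute slots.
result = stamped.reindex(grid, method="ffill")

print(result.index[0], result.index[-1], len(result))
```

Dropping method="ffill" leaves the in-between rows as NaN, which may be preferable if the weekly values should not be repeated.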
