I have the following DataFrame, a weekly price time series with a PeriodIndex. Let's call it df:
timestamp open high low close volume
timestamp
2009-02-01/2009-02-07 733442.166309 830.540773 832.586910 828.788627 830.706009 48401.952790
2009-02-08/2009-02-14 733449.166309 839.945279 841.763948 837.812232 839.742489 53429.330472
2009-02-15/2009-02-21 733456.245777 790.733108 792.399775 788.897523 790.549550 50671.887387
2009-02-22/2009-02-28 733463.166309 760.586910 762.640558 758.234979 760.428112 60565.506438
If I try to resample it with df.resample('30min').mean(), the data ends at 2009-02-22. I would like it to end at 2009-02-28, while still starting at 2009-02-01. How can I do that?
I suspect it has to do with the closed and label values of the resample function, but those are not very well explained in the doc.
Here is a snippet to reconstruct the DataFrame:
import pandas as pd
from pandas import Period
dikt={'volume': {Period('2009-02-01/2009-02-07', 'W-SAT'): 48401.952789699571, Period('2009-02-08/2009-02-14', 'W-SAT'): 53429.330472103007, Period('2009-02-15/2009-02-21', 'W-SAT'): 50671.887387387389, Period('2009-02-22/2009-02-28', 'W-SAT'): 60565.506437768243}, 'close': {Period('2009-02-01/2009-02-07', 'W-SAT'): 830.70600858369096, Period('2009-02-08/2009-02-14', 'W-SAT'): 839.74248927038627, Period('2009-02-15/2009-02-21', 'W-SAT'): 790.54954954954951, Period('2009-02-22/2009-02-28', 'W-SAT'): 760.42811158798281}, 'open': {Period('2009-02-01/2009-02-07', 'W-SAT'): 830.54077253218884, Period('2009-02-08/2009-02-14', 'W-SAT'): 839.94527896995703, Period('2009-02-15/2009-02-21', 'W-SAT'): 790.73310810810813, Period('2009-02-22/2009-02-28', 'W-SAT'): 760.58690987124464}, 'high': {Period('2009-02-01/2009-02-07', 'W-SAT'): 832.58690987124464, Period('2009-02-08/2009-02-14', 'W-SAT'): 841.76394849785413, Period('2009-02-15/2009-02-21', 'W-SAT'): 792.39977477477476, Period('2009-02-22/2009-02-28', 'W-SAT'): 762.64055793991417}, 'low': {Period('2009-02-01/2009-02-07', 'W-SAT'): 828.78862660944208, Period('2009-02-08/2009-02-14', 'W-SAT'): 837.8122317596567, Period('2009-02-15/2009-02-21', 'W-SAT'): 788.89752252252254, Period('2009-02-22/2009-02-28', 'W-SAT'): 758.23497854077254}, 'timestamp': {Period('2009-02-01/2009-02-07', 'W-SAT'): 733442.16630901292, Period('2009-02-08/2009-02-14', 'W-SAT'): 733449.16630901292, Period('2009-02-15/2009-02-21', 'W-SAT'): 733456.24577702698, Period('2009-02-22/2009-02-28', 'W-SAT'): 733463.16630901292}}
df = pd.DataFrame(dikt, columns=['timestamp', 'open', 'high', 'low', 'close', 'volume'])
Since you want to include the start_time corresponding to the first entry of the PeriodIndex and the end_time corresponding to the last one, the keyword arguments of DataFrame.resample are of little help here: altering any of them affects either the start_time or the end_time, but not both.
Instead, you could first convert the data to daily frequency, "D", and then compute the mean for each 30-minute group.
The convention argument could have been used if the resampling only needed to be anchored at the start_time or at the end_time specifically.
To check:
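A minimal sketch of the daily-then-30-minute idea, assuming you are willing to work on a DatetimeIndex rather than call resample on the PeriodIndex directly (that substitution is an assumption, not necessarily the exact call chain meant above). The frame is trimmed to the close column, and the names days, daily and half_hourly are only illustrative:

```python
import pandas as pd

# Trimmed stand-in for df: same weekly W-SAT periods, only the close column.
idx = pd.period_range("2009-02-01", periods=4, freq="W-SAT")
df = pd.DataFrame({"close": [830.706009, 839.742489, 790.549550, 760.428112]},
                  index=idx)

# Daily timestamps from the first period's start_time to the last period's end_time.
days = pd.date_range(df.index[0].start_time, df.index[-1].end_time, freq="D")

# Give every day the values of the week that contains it.
daily = df.reindex(days.to_period("W-SAT"))
daily.index = days

# Aggregate into 30-minute bins; bins without a daily row come out as NaN.
half_hourly = daily.resample("30min").mean()

print(half_hourly.index[0])   # 2009-02-01 00:00:00
print(half_hourly.index[-1])  # 2009-02-28 00:00:00
```

Only the midnight bins carry values here; if you want every 30-minute row filled, one option is to call .ffill() on the result.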
If you only need the resampled index and not its aggregation, it makes sense to first convert the current frequency to the base unit of the target frequency (here, 1 minute) and then take every 30th row to get one sample per 30 minutes.
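As a rough sketch of that index-only route, assuming the minute grid is built with pd.date_range between the first start_time and the last end_time (the names minutes and half_hour_index are hypothetical):

```python
import pandas as pd

# Same weekly W-SAT periods as in the question.
idx = pd.period_range("2009-02-01", periods=4, freq="W-SAT")

# One timestamp per minute across the full span of the periods.
minutes = pd.date_range(idx[0].start_time, idx[-1].end_time, freq="min")

# Keep every 30th minute to get a 30-minute index.
half_hour_index = minutes[::30]

print(half_hour_index[0])   # 2009-02-01 00:00:00
print(half_hour_index[-1])  # 2009-02-28 23:30:00
```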
This gives you samples for the whole of 2009-02-28, whereas in the earlier case only the dates up to but not including 2009-02-28 were considered, because the .resample('D') operation normalizes the times (adjusts them to midnight).