df:
hour rev
datetime
2016-05-01 01:00:00 1 -0.02
2016-05-01 02:00:00 2 -0.01
2016-05-01 03:00:00 3 -0.02
2016-05-01 04:00:00 4 -0.02
2016-05-01 05:00:00 5 -0.01
2016-05-01 06:00:00 6 -0.03
2016-05-01 07:00:00 7 -0.10
2016-05-01 08:00:00 8 -0.09
2016-05-01 09:00:00 9 -0.08
2016-05-01 10:00:00 10 -0.10
2016-05-01 11:00:00 11 -0.12
2016-05-01 12:00:00 12 -0.14
2016-05-01 13:00:00 13 -0.17
2016-05-01 14:00:00 14 -0.16
2016-05-01 15:00:00 15 -0.15
2016-05-01 16:00:00 16 -0.15
2016-05-01 17:00:00 17 -0.17
2016-05-01 18:00:00 18 -0.16
2016-05-01 19:00:00 19 -0.18
2016-05-01 20:00:00 20 -0.17
2016-05-01 21:00:00 21 -0.14
2016-05-01 22:00:00 22 -0.16
2016-05-01 23:00:00 23 -0.08
2016-05-02 00:00:00 24 -0.06
df.reset_index().to_dict('rec'):
[{'datetime': Timestamp('2016-05-01 01:00:00'), 'hour': 1L, 'rev': -0.02},
{'datetime': Timestamp('2016-05-01 02:00:00'), 'hour': 2L, 'rev': -0.01},
{'datetime': Timestamp('2016-05-01 03:00:00'), 'hour': 3L, 'rev': -0.02},
{'datetime': Timestamp('2016-05-01 04:00:00'), 'hour': 4L, 'rev': -0.02},
{'datetime': Timestamp('2016-05-01 05:00:00'), 'hour': 5L, 'rev': -0.01},
{'datetime': Timestamp('2016-05-01 06:00:00'), 'hour': 6L, 'rev': -0.03},
{'datetime': Timestamp('2016-05-01 07:00:00'), 'hour': 7L, 'rev': -0.1},
{'datetime': Timestamp('2016-05-01 08:00:00'), 'hour': 8L, 'rev': -0.09},
{'datetime': Timestamp('2016-05-01 09:00:00'), 'hour': 9L, 'rev': -0.08},
{'datetime': Timestamp('2016-05-01 10:00:00'), 'hour': 10L, 'rev': -0.1},
{'datetime': Timestamp('2016-05-01 11:00:00'), 'hour': 11L, 'rev': -0.12},
{'datetime': Timestamp('2016-05-01 12:00:00'), 'hour': 12L, 'rev': -0.14},
{'datetime': Timestamp('2016-05-01 13:00:00'), 'hour': 13L, 'rev': -0.17},
{'datetime': Timestamp('2016-05-01 14:00:00'), 'hour': 14L, 'rev': -0.16},
{'datetime': Timestamp('2016-05-01 15:00:00'), 'hour': 15L, 'rev': -0.15},
{'datetime': Timestamp('2016-05-01 16:00:00'), 'hour': 16L, 'rev': -0.15},
{'datetime': Timestamp('2016-05-01 17:00:00'), 'hour': 17L, 'rev': -0.17},
{'datetime': Timestamp('2016-05-01 18:00:00'), 'hour': 18L, 'rev': -0.16},
{'datetime': Timestamp('2016-05-01 19:00:00'), 'hour': 19L, 'rev': -0.18},
{'datetime': Timestamp('2016-05-01 20:00:00'), 'hour': 20L, 'rev': -0.17},
{'datetime': Timestamp('2016-05-01 21:00:00'), 'hour': 21L, 'rev': -0.14},
{'datetime': Timestamp('2016-05-01 22:00:00'), 'hour': 22L, 'rev': -0.16},
{'datetime': Timestamp('2016-05-01 23:00:00'), 'hour': 23L, 'rev': -0.08},
{'datetime': Timestamp('2016-05-02 00:00:00'), 'hour': 24L, 'rev': -0.06}]
df.set_index('datetime', inplace=True)
I want to aggregate the data by DAY. So I do:
dfgrped = df.groupby([pd.TimeGrouper('D')])
I want to compute stats like the sum:
dfgrped.agg(sum)
hour rev
datetime
2016-05-01 276 -2.43
2016-05-02 24 -0.06
As you can see the aggregation occurs for 2016-05-01
and 2016-05-02
.
Notice, that the last hourly data entry in df occurs at 2016-05-02 00:00:00, which is meant to be the data for the last hour of the previous day i.e. 24 hourly data points for each day.
However, given the datetime stamp, things don't work out the way I intended. I want all 24
hours to be aggregated for 2016-05-01
.
I imagine this sort of issue must arise often in various applications when a measurement is taken at the end of the hour. This isn't a problem until the last hour, which occurs at the 00:00:00
timestamp of the following day.
How to address this issue in pandas?