I am working with non-uniformly sampled, timestamp-indexed data and will eventually be computing statistics on a per-minute and per-hour basis. I'm wondering what the best way to aggregate by time period is.
I currently define two lambda functions and use them to add two columns to the DataFrame, like so:
h = lambda i: pd.to_datetime(i.strftime('%Y-%m-%d %H:00:00'))
m = lambda i: pd.to_datetime(i.strftime('%Y-%m-%d %H:%M:00'))
df['hours'] = df.index.map(h)
df['minutes'] = df.index.map(m)
This allows me to aggregate easily with groupby, like so:
by_hour = df.groupby('hours')
I'm sure there is a better or more pythonic way to do this, but I haven't figured it out and would appreciate any help.
You have a couple of options with pandas. For simple statistics, you can use the resample method on a DataFrame or Series that has a datetime index.
In [35]: ts
Out[35]:
2012-01-01 00:00:00 127
2012-01-01 00:00:01 452
2012-01-01 00:00:02 231
2012-01-01 00:00:03 434
2012-01-01 00:00:04 139
2012-01-01 00:00:05 223
2012-01-01 00:00:06 409
2012-01-01 00:00:07 101
2012-01-01 00:00:08 3
2012-01-01 00:00:09 393
2012-01-01 00:00:10 208
2012-01-01 00:00:11 416
2012-01-01 00:00:12 136
2012-01-01 00:00:13 343
2012-01-01 00:00:14 387
...
2012-01-01 00:01:25 307
2012-01-01 00:01:26 267
2012-01-01 00:01:27 199
2012-01-01 00:01:28 479
2012-01-01 00:01:29 423
2012-01-01 00:01:30 334
2012-01-01 00:01:31 442
2012-01-01 00:01:32 282
2012-01-01 00:01:33 289
2012-01-01 00:01:34 166
2012-01-01 00:01:35 4
2012-01-01 00:01:36 306
2012-01-01 00:01:37 165
2012-01-01 00:01:38 415
2012-01-01 00:01:39 316
Freq: S, Length: 100
In [37]: ts.resample('min').mean()
Out[37]:
2012-01-01 00:00:00 270.166667
2012-01-01 00:01:00 221.400000
Freq: T, dtype: float64
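Here is a self-contained sketch of the same idea that can be pasted into a fresh session. It assumes a reasonably recent pandas, where the aggregation is called on the object returned by resample. The series is synthetic (random values on a one-second index, built only to resemble the ts shown above), and the names per_minute and per_hour are made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for `ts`: 100 one-second samples with made-up values.
rng = np.random.default_rng(0)
index = pd.date_range("2012-01-01", periods=100, freq="s")
ts = pd.Series(rng.integers(0, 500, size=100), index=index)

# Downsample to one-minute bins and compute several statistics at once.
per_minute = ts.resample("min").agg(["mean", "min", "max", "count"])

# Same idea with one-hour bins and a single statistic.
per_hour = ts.resample("h").mean()

print(per_minute)
print(per_hour)
```

resample bins on the timestamps themselves, so it should behave the same way when the samples are irregularly spaced; bins with no samples simply show up as empty (NaN mean, zero count).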
For more flexibility, you can group by the hour (or minute, second, etc.) attribute of the timestamp objects:
In [38]: g = ts.groupby(lambda x: x.minute)
In [39]: g
Out[39]: <pandas.core.groupby.SeriesGroupBy object at 0x107045150>
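The group object by itself does nothing until an aggregation is called on it. The sketch below, on synthetic data, shows two variants. Grouping on x.minute gives integer keys and merges the same minute from different hours into one group. Grouping on the index floored to the minute keeps full timestamps as keys, which is probably closer to what the strftime lambdas in the question were doing. The variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for `ts`: 100 one-second samples with made-up values.
rng = np.random.default_rng(0)
index = pd.date_range("2012-01-01", periods=100, freq="s")
ts = pd.Series(rng.integers(0, 500, size=100), index=index)

# Group by the minute-of-hour attribute; the keys are plain integers,
# so 00:01 and 13:01 would land in the same group.
by_minute_attr = ts.groupby(lambda x: x.minute).agg(["mean", "std"])

# Group by the timestamp floored to the minute; the keys stay full
# timestamps, so each calendar minute gets its own group.
by_minute_floor = ts.groupby(ts.index.floor("min")).mean()

print(by_minute_attr)
print(by_minute_floor)
```

The floor approach avoids the round trip through strings, and passing "h" instead of "min" gives hourly groups.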
Take a look at the docs on resampling: http://pandas.pydata.org/pandas-docs/dev/timeseries.html#up-and-downsampling