Better way to aggregate timestamped data?

I am working with non-uniformly collected, timestamp-indexed data and will eventually be computing statistics on a per-minute and per-hour basis. I'm wondering what the best way to aggregate by time period is.

I currently define two lambda functions and use them to add two columns to the DataFrame, like so:

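import pandas as pd

# Truncate each timestamp to the start of its hour / minute via a string round-trip
h = lambda i: pd.to_datetime(i.strftime('%Y-%m-%d %H:00:00'))
m = lambda i: pd.to_datetime(i.strftime('%Y-%m-%d %H:%M:00'))
df['hours'] = df.index.map(h)
df['minutes'] = df.index.map(m)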

This allows me to aggregate easily with groupby like so:

by_hour = df.groupby('hours')

I'm sure there is a better or more pythonic way to do this, but I haven't figured it out and would appreciate any help.
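For reference, here is a self-contained sketch of the column-based approach from the question, run on hypothetical, irregularly spaced data (the random sample below is made up for illustration). It also compares the result against DatetimeIndex.floor, a swapped-in vectorized alternative that truncates timestamps without formatting and reparsing each element; this assumes a pandas version where floor and the 'h' / 'min' aliases are available.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 200 readings at irregular (random) offsets
# spread over three hours, standing in for the asker's real DataFrame.
rng = np.random.default_rng(0)
offsets = np.sort(rng.integers(0, 3 * 3600, size=200))
idx = pd.Timestamp("2012-01-01") + pd.to_timedelta(offsets, unit="s")
df = pd.DataFrame({"value": rng.integers(0, 500, size=200)}, index=idx)

# The question's approach: format each timestamp as a string with the
# lower fields zeroed out, then parse it back into a Timestamp.
h = lambda i: pd.to_datetime(i.strftime('%Y-%m-%d %H:00:00'))
m = lambda i: pd.to_datetime(i.strftime('%Y-%m-%d %H:%M:00'))
df['hours'] = df.index.map(h)
df['minutes'] = df.index.map(m)

# Vectorized alternative: floor truncates every timestamp to the start
# of its hour / minute in one call.
hours_floor = df.index.floor('h')
minutes_floor = df.index.floor('min')

# Per-hour mean using the helper column, as in the question.
by_hour = df.groupby('hours')['value'].mean()
print(by_hour)
```

Both routes yield the same group keys; floor simply avoids a Python-level function call and a string parse per row.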

You have a couple of options with pandas. For simple statistics, you can use the resample method of a DataFrame or Series that has a DatetimeIndex.

In [35]: ts
Out[35]: 
2012-01-01 00:00:00    127
2012-01-01 00:00:01    452
2012-01-01 00:00:02    231
2012-01-01 00:00:03    434
2012-01-01 00:00:04    139
2012-01-01 00:00:05    223
2012-01-01 00:00:06    409
2012-01-01 00:00:07    101
2012-01-01 00:00:08      3
2012-01-01 00:00:09    393
2012-01-01 00:00:10    208
2012-01-01 00:00:11    416
2012-01-01 00:00:12    136
2012-01-01 00:00:13    343
2012-01-01 00:00:14    387
...
2012-01-01 00:01:25    307
2012-01-01 00:01:26    267
2012-01-01 00:01:27    199
2012-01-01 00:01:28    479
2012-01-01 00:01:29    423
2012-01-01 00:01:30    334
2012-01-01 00:01:31    442
2012-01-01 00:01:32    282
2012-01-01 00:01:33    289
2012-01-01 00:01:34    166
2012-01-01 00:01:35      4
2012-01-01 00:01:36    306
2012-01-01 00:01:37    165
2012-01-01 00:01:38    415
2012-01-01 00:01:39    316
Freq: S, Length: 100

In [37]: ts.resample('min').mean()
Out[37]: 
2012-01-01 00:00:00    270.166667
2012-01-01 00:01:00    221.400000
Freq: T, dtype: float64
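Since the question asks for per-minute and per-hour statistics on non-uniformly spaced data, here is a minimal sketch on a few hypothetical readings, assuming pandas 0.18 or later (where resample returns a Resampler object). It computes several statistics per bin in one pass with agg. Note that resample creates a bin for every period in the covered range, so minutes with no readings still appear as rows with a NaN mean.

```python
import pandas as pd

# Hypothetical irregularly spaced readings (the gaps are deliberately uneven).
idx = pd.to_datetime([
    "2012-01-01 00:00:05",
    "2012-01-01 00:00:40",
    "2012-01-01 00:01:10",
    "2012-01-01 00:03:59",
    "2012-01-01 01:15:00",
])
ts = pd.Series([10, 20, 30, 40, 50], index=idx)

# Per-minute statistics: each bin covers [HH:MM:00, HH:MM+1:00).
# Empty minutes (e.g. 00:02) are included as rows with a NaN mean.
per_minute = ts.resample("min").agg(["mean", "count", "max"])

# Per-hour statistics with the same pattern.
per_hour = ts.resample("h").agg(["mean", "count", "max"])

print(per_minute.head())
print(per_hour)
```

If you only want bins that actually contain data, you can drop the empty rows afterwards, for example with per_minute.dropna().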

For more flexibility, you can group by the hour (or minute, second, etc.) attribute of the timestamps:

In [38]: g = ts.groupby(lambda x: x.minute)

In [39]: g
Out[39]: <pandas.core.groupby.SeriesGroupBy object at 0x107045150>
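One caveat, shown here on hypothetical data: grouping on the minute attribute alone puts minute 5 of every hour (and every day) into the same group. That suits a "minute-of-hour" profile, but not per-minute statistics. Grouping on the index truncated with DatetimeIndex.floor (a swapped-in alternative, not part of the answer above) keeps each calendar minute separate.

```python
import pandas as pd

# Hypothetical readings that fall in minute 5 of two different hours.
idx = pd.to_datetime([
    "2012-01-01 00:05:10",
    "2012-01-01 00:05:50",
    "2012-01-01 01:05:30",
])
ts = pd.Series([1.0, 3.0, 8.0], index=idx)

# Grouping on the minute attribute: all three readings share minute == 5,
# so they collapse into a single group.
by_minute_attr = ts.groupby(lambda x: x.minute).mean()

# Grouping on the truncated timestamp keeps 00:05 and 01:05 apart.
by_minute_floor = ts.groupby(ts.index.floor("min")).mean()

print(by_minute_attr)
print(by_minute_floor)
```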

Take a look at the docs on resampling: http://pandas.pydata.org/pandas-docs/dev/timeseries.html#up-and-downsampling
