Resampling timeseries with a given timedelta

I am using pandas to structure and process data. My DataFrame (a sample is built below) has one row per ID, with a beginning time, an end time and a bitrate.

I want to resample this time-series data so that, for every ID (the column named "3" in my data), I get the bitrate for every second from beginning to end (beginning_time / end_time). For example, for the first row I want one row per second from 2016-07-08 02:17:42 to 2016-07-08 02:17:55, each with the same bitrate and, of course, the same ID.

For example, given:

df = pd.DataFrame(
{'Id' : ['CODI126640013.ts', 'CODI126622312.ts'],
 'beginning_time':['2016-07-08 02:17:42', '2016-07-08 02:05:35'], 
 'end_time' :['2016-07-08 02:17:55', '2016-07-08 02:26:11'],
 'bitrate': ['3750000', '3750000']})

For the first row, I want one row per second between the beginning and end times, and the same for the second row. So the objective is to expand the interval between beginning_time and end_time into one-second steps, with the bitrate unchanged on every row.

I'm trying this code:

df['new_beginning_time'] = pd.to_datetime(df['beginning_time'])
df.set_index('new_beginning_time').groupby('Id', group_keys=False).apply(lambda df: df.resample('S').ffill()).reset_index()

But in this context it did not work. Any ideas? Thank you very much!

Tags: python datetime pandas group-by resampling

2 Answers

贪生不怕死

Reply #2 · 2019-02-28 06:16

You can use melt together with groupby and resample (pandas 0.18.1):

df.beginning_time = pd.to_datetime(df.beginning_time)
df.end_time = pd.to_datetime(df.end_time)
df = pd.melt(df, id_vars=['Id','bitrate'], value_name='dates').drop('variable', axis=1)
df.set_index('dates', inplace=True)
print(df)
                                   Id  bitrate
dates                                         
2016-07-08 02:17:42  CODI126640013.ts  3750000
2016-07-08 02:05:35  CODI126622312.ts  3750000
2016-07-08 02:17:55  CODI126640013.ts  3750000
2016-07-08 02:26:11  CODI126622312.ts  3750000

print (df.groupby('Id').resample('1S').ffill())
                                                    Id  bitrate
Id               dates                                         
CODI126622312.ts 2016-07-08 02:05:35  CODI126622312.ts  3750000
                 2016-07-08 02:05:36  CODI126622312.ts  3750000
                 2016-07-08 02:05:37  CODI126622312.ts  3750000
                 2016-07-08 02:05:38  CODI126622312.ts  3750000
                 2016-07-08 02:05:39  CODI126622312.ts  3750000
                 2016-07-08 02:05:40  CODI126622312.ts  3750000
                 2016-07-08 02:05:41  CODI126622312.ts  3750000
                 2016-07-08 02:05:42  CODI126622312.ts  3750000
                 2016-07-08 02:05:43  CODI126622312.ts  3750000
                 2016-07-08 02:05:44  CODI126622312.ts  3750000
                 2016-07-08 02:05:45  CODI126622312.ts  3750000
                 2016-07-08 02:05:46  CODI126622312.ts  3750000
                 2016-07-08 02:05:47  CODI126622312.ts  3750000
                 ...
                 ...
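
For readers on a recent pandas, the same melt-then-resample idea can be sketched end to end as below. This is only an illustration under a few assumptions: the names `long` and `per_second` are made up here, the lowercase `'s'` alias is used because newer pandas releases warn about `'S'`, and the `bitrate` column is selected before resampling so that `Id` does not appear both as an index level and as a column.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Id': ['CODI126640013.ts', 'CODI126622312.ts'],
    'beginning_time': ['2016-07-08 02:17:42', '2016-07-08 02:05:35'],
    'end_time': ['2016-07-08 02:17:55', '2016-07-08 02:26:11'],
    'bitrate': ['3750000', '3750000'],
})

# Stack beginning_time and end_time into a single 'dates' column,
# so each Id has two rows: its start and its end
long = df.melt(id_vars=['Id', 'bitrate'], value_name='dates').drop(columns='variable')
long['dates'] = pd.to_datetime(long['dates'])

# Per Id, upsample to one row per second and forward-fill the bitrate
per_second = (
    long.set_index('dates')
        .groupby('Id')['bitrate']
        .resample('s')
        .ffill()
        .reset_index()
)

print(per_second.head())
```

The result has the columns Id, dates and bitrate, with one row per second between each Id's beginning and end time, both endpoints included.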


一纸荒年 Trace。

Reply #3 · 2019-02-28 06:37

This should do the trick:

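rows = []  # named 'rows' to avoid shadowing the built-in all()
for row in df.itertuples():
    # One timestamp per second from beginning_time to end_time (inclusive)
    time_range = pd.date_range(row.beginning_time, row.end_time, freq='1S')
    # Repeat the Id and bitrate for every timestamp in the range
    rows.extend(zip(time_range, [row.Id]*len(time_range), [row.bitrate]*len(time_range)))
pd.DataFrame(rows)

In[209]: pd.DataFrame(rows)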
Out[209]: 
                       0                 1        2
0    2016-07-08 02:17:42  CODI126640013.ts  3750000
1    2016-07-08 02:17:43  CODI126640013.ts  3750000
2    2016-07-08 02:17:44  CODI126640013.ts  3750000
3    2016-07-08 02:17:45  CODI126640013.ts  3750000
4    2016-07-08 02:17:46  CODI126640013.ts  3750000
5    2016-07-08 02:17:47  CODI126640013.ts  3750000
6    2016-07-08 02:17:48  CODI126640013.ts  3750000
7    2016-07-08 02:17:49  CODI126640013.ts  3750000

Edit: I am using Python 2.7; Python 3 has a different zip() (it returns an iterator instead of a list).
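
A variant of the same row-by-row idea that runs on Python 3 and yields named columns might look like the sketch below. The names `frames`, `expanded` and the column name `time` are illustrative choices, not part of the original answer, and the lowercase `'s'` alias is assumed because newer pandas releases warn about `'S'`.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Id': ['CODI126640013.ts', 'CODI126622312.ts'],
    'beginning_time': ['2016-07-08 02:17:42', '2016-07-08 02:05:35'],
    'end_time': ['2016-07-08 02:17:55', '2016-07-08 02:26:11'],
    'bitrate': ['3750000', '3750000'],
})

# Build one small frame per input row: a per-second range of timestamps,
# with the scalar Id and bitrate broadcast to every timestamp
frames = [
    pd.DataFrame({
        'time': pd.date_range(row.beginning_time, row.end_time, freq='s'),
        'Id': row.Id,
        'bitrate': row.bitrate,
    })
    for row in df.itertuples(index=False)
]

# Stack the per-row frames into a single result
expanded = pd.concat(frames, ignore_index=True)

print(expanded.head())
```

Building a frame per row and concatenating once avoids the positional column labels 0, 1, 2 seen in the output above, at the cost of creating one intermediate DataFrame per input row.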

