I am using pandas to structure and process data.
I have a DataFrame with dates as the index, plus Id and bitrate columns. I want to group the data by Id, resample the timestamps within each Id, and keep the bitrate value.
For example, given:
import pandas as pd

df = pd.DataFrame(
    {'Id': ['CODI126640013.ts', 'CODI126622312.ts'],
     'beginning_time': ['2016-07-08 02:17:42', '2016-07-08 02:05:35'],
     'end_time': ['2016-07-08 02:17:55', '2016-07-08 02:26:11'],
     'bitrate': ['3750000', '3750000'],
     'type': ['vod', 'catchup'],
     'unique_id': ['f2514f6b-ce7e-4e1a-8f6a-3ac5d524be30', 'f2514f6b-ce7e-4e1a-8f6a-3ac5d524bb22']})
which gives :
This is my code to get a single column of dates, with the Id and the bitrate repeated for each one:
df = df.drop(['type', 'unique_id'], axis=1)
df.beginning_time = pd.to_datetime(df.beginning_time)
df.end_time = pd.to_datetime(df.end_time)
df = pd.melt(df, id_vars=['Id','bitrate'], value_name='dates').drop('variable', axis=1)
df.set_index('dates', inplace=True)
which gives :
Now for the resampling. This is my code:
print (df.groupby('Id').resample('1S').ffill())
And this is the result :
This is exactly what I want. However, I have 38279 logs with the same columns, and I get an error message when I do the same thing on them. The first part works perfectly and gives this:
The call df.groupby('Id').resample('1S').ffill() raises this error:
ValueError: cannot reindex a non-unique index with a method or limit
Any ideas? Thanks!
It seems there is a problem with duplicates in the columns beginning_time and end_time; I tried to simulate it.
One possible solution is to add drop_duplicates and use the old way of calling resample with groupby.
You can also check for duplicates with boolean indexing.
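The steps above can be sketched roughly as follows. This is only an illustration under assumptions, not the answer's original code: the sample rows are hypothetical (the first row is deliberately repeated so that the same Id and timestamp occur twice), the names long_df, dupes, clean and resampled are made up for the example, and it uses the current groupby(...).resample(...) call on the bitrate column rather than the older syntax the answer refers to.

```python
import pandas as pd

# Hypothetical sample: the second row repeats the first, so after melting,
# the same (Id, timestamp) pair appears twice for that Id.
df = pd.DataFrame({
    'Id': ['CODI126640013.ts', 'CODI126640013.ts', 'CODI126622312.ts'],
    'beginning_time': ['2016-07-08 02:17:42', '2016-07-08 02:17:42', '2016-07-08 02:05:35'],
    'end_time': ['2016-07-08 02:17:55', '2016-07-08 02:17:55', '2016-07-08 02:26:11'],
    'bitrate': ['3750000', '3750000', '3750000'],
})
df['beginning_time'] = pd.to_datetime(df['beginning_time'])
df['end_time'] = pd.to_datetime(df['end_time'])

# Same reshaping as in the question: one 'dates' column per Id and bitrate.
long_df = pd.melt(df, id_vars=['Id', 'bitrate'], value_name='dates').drop('variable', axis=1)

# Boolean indexing: select every row whose (Id, dates) pair occurs more than once.
dupes = long_df[long_df.duplicated(subset=['Id', 'dates'], keep=False)]
print(dupes)

# Drop the repeats so each Id has unique timestamps, then index by date.
clean = (long_df.drop_duplicates(subset=['Id', 'dates'])
                .set_index('dates')
                .sort_index())

# Resample each Id to one-second steps and forward-fill the bitrate.
resampled = clean.groupby('Id')['bitrate'].resample('1s').ffill()
print(resampled.head())
```

Note that drop_duplicates keeps the first row of each repeated pair, so if two logs share an Id and timestamp but differ in bitrate, the later value is discarded; whether that is acceptable depends on the data.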