I have a df
similar to this:
print(df)
A B C
DATE_TIME
2016-10-08 13:57:00 in 5.61 0
2016-10-08 14:02:00 in 8.05 0
2016-10-08 14:07:00 out 7.92 0
2016-10-08 14:12:00 in 7.98 1
2016-10-08 14:17:00 out 8.18 0
2016-10-08 14:22:00 out 7.59 0
print (df.dtypes)
A object
B float64
C int64
dtype: object
I want to resample this df
to a 1S frecuency, so that I can concatenate it with another df
.
The problem I can't solve is that for the columns type object
and int64
I want the same value repeated for the newly created time rows, which could be done by this function:
df=df.resample('S', fill_method='pad')
whereas for the float64
columns I am looking for this:
df=df.interpolate()
I thought about applying an IF statement, but I also figuered that I first have to do the resample step before the interpolation step. When I resample just by df=df.resample('S')
I can interpolate afterwards, which works for the float64
columns but not for the object
and Int64
ones.
Could anybody help me please? Thanks.
Here is a method that use reindex
:
index = pd.date_range(df.index[0], df.index[-1], freq="s")
df2 = df.reindex(index)
for col, s in df2.iteritems():
if s.dtype == float:
s.interpolate(inplace=True)
else:
s.ffill(inplace=True)
UPDATE: I think you still can use vectorized approach (not looping through your data frame), even when you have multiple float
columns - it should be much faster:
assuming you have a following DF (columns: ['B','D']
are of float
dtype):
In [18]: df
Out[18]:
A B C D
DATE_TIME
2016-10-08 13:57:00 in 5.61 0 6.2271
2016-10-08 14:02:00 in 8.05 0 8.9355
2016-10-08 14:07:00 out 7.92 0 8.7912
2016-10-08 14:12:00 in 7.98 1 8.8578
2016-10-08 14:17:00 out 8.18 0 9.0798
2016-10-08 14:22:00 out 7.59 0 8.4249
In [19]: df.dtypes
Out[19]:
A object
B float64
C int64
D float64
dtype: object
you can do the following (it'll work for pandas versions: 0.18.0+):
rsmpl = df.resample('S')
pd.concat([rsmpl.pad()[df.select_dtypes(exclude=['float']).columns],
rsmpl.interpolate()[df.select_dtypes(include=['float']).columns]],
axis=1)
Example:
In [23]: pd.concat([rsmpl.pad()[df.select_dtypes(exclude=['float']).columns],
....: rsmpl.interpolate()[df.select_dtypes(include=['float']).columns]],
....: axis=1).head()
Out[23]:
A C B D
DATE_TIME
2016-10-08 13:57:00 in 0 5.610000 6.227100
2016-10-08 13:57:01 in 0 5.618133 6.236128
2016-10-08 13:57:02 in 0 5.626267 6.245156
2016-10-08 13:57:03 in 0 5.634400 6.254184
2016-10-08 13:57:04 in 0 5.642533 6.263212
OLD answer:
You can first resample('S')
and pad()
and then reassign your float64
column B
with df.resample('S').interpolate().B
:
In [96]: df.resample('S').pad().assign(B=df.resample('S').interpolate().B)
Out[96]:
A B C
DATE_TIME
2016-10-08 13:57:00 in 5.610000 0
2016-10-08 13:57:01 in 5.618133 0
2016-10-08 13:57:02 in 5.626267 0
2016-10-08 13:57:03 in 5.634400 0
2016-10-08 13:57:04 in 5.642533 0
2016-10-08 13:57:05 in 5.650667 0
2016-10-08 13:57:06 in 5.658800 0
2016-10-08 13:57:07 in 5.666933 0
2016-10-08 13:57:08 in 5.675067 0
2016-10-08 13:57:09 in 5.683200 0
2016-10-08 13:57:10 in 5.691333 0
2016-10-08 13:57:11 in 5.699467 0
2016-10-08 13:57:12 in 5.707600 0
2016-10-08 13:57:13 in 5.715733 0
2016-10-08 13:57:14 in 5.723867 0
2016-10-08 13:57:15 in 5.732000 0
2016-10-08 13:57:16 in 5.740133 0
2016-10-08 13:57:17 in 5.748267 0
2016-10-08 13:57:18 in 5.756400 0
2016-10-08 13:57:19 in 5.764533 0
2016-10-08 13:57:20 in 5.772667 0
2016-10-08 13:57:21 in 5.780800 0
2016-10-08 13:57:22 in 5.788933 0
2016-10-08 13:57:23 in 5.797067 0
2016-10-08 13:57:24 in 5.805200 0
2016-10-08 13:57:25 in 5.813333 0
2016-10-08 13:57:26 in 5.821467 0
2016-10-08 13:57:27 in 5.829600 0
... ... ... ..
Or a bit faster version (one resample()
call instead of two):
rsmpl = df.resample('S')
rsmpl.pad().assign(B=rsmpl.interpolate().B)