I have a function f that I would like to compute efficiently over a sliding window.
def efficient_f(x):
    # do stuff
    wSize = 50
    return another_f(rolling_window_using_strides(x, wSize), -1)
I have seen on SO that it is particularly efficient to do this using strides:
import numpy as np
from numpy.lib.stride_tricks import as_strided

def rolling_window_using_strides(a, window):
    # Split the last axis into (number of windows, window length).
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    # Reuse the last stride so that consecutive windows overlap.
    strides = a.strides + (a.strides[-1],)
    windows = as_strided(a, shape=shape, strides=strides)
    print(windows.shape)
    return windows
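To make the behaviour of the strided view concrete, here is a small sketch on a six-element array. It repeats the function above without the debugging print, and uses `mean` as a stand-in for `another_f`, which is purely an assumption for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def rolling_window_using_strides(a, window):
    # Same construction as in the question: the last axis becomes
    # (number of windows, window length), with overlapping windows.
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return as_strided(a, shape=shape, strides=strides)


a = np.arange(6.0)
windows = rolling_window_using_strides(a, 3)
print(windows.shape)  # (4, 3)
print(windows)        # rows: [0 1 2], [1 2 3], [2 3 4], [3 4 5]

# Stand-in for another_f (assumption): reduce over the window axis.
window_means = windows.mean(axis=-1)
print(window_means)   # [1. 2. 3. 4.]
```

The result is a view on the original buffer rather than a copy, which is where the speed-up comes from; it should be treated as read-only.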
Then I try to apply it to a DataFrame:
import pandas as pd

df = pd.DataFrame(data=np.random.rand(180000, 1), columns=['foo'])
df['bar'] = df[['foo']].apply(efficient_f, raw=True)
# Note the double [[ ]]: otherwise pd.Series.apply (which does not accept
# the raw and axis kwargs) is called instead of pd.DataFrame.apply.
The strided computation itself works nicely and does lead to significant performance gains. However, assigning the result back to the DataFrame raises the following error:
ValueError: Shape of passed values is (1, 179951), indices imply (1, 180000).
This is because I am using wSize=50, which yields
rolling_window_using_strides(df['foo'].values,50).shape
(1L, 179951L, 50L)
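The length mismatch follows directly from the window arithmetic: an axis of length n yields n - window + 1 windows. A quick sketch of that check is below; the 2-D input of shape (1, 180000) is an assumption chosen only to reproduce the leading axis of the shape shown above.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

n, window = 180000, 50
n_windows = n - window + 1
print(n_windows)  # 179951

# Assumption: a 2-D input with a single row, to mirror the leading axis.
a = np.zeros((1, n))
shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
strides = a.strides + (a.strides[-1],)
view = as_strided(a, shape=shape, strides=strides)
print(view.shape)  # (1, 179951, 50)
```

So the result is window - 1 = 49 rows short of the 180000 rows the DataFrame index expects.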
Is there a way, by padding the borders with zeros or np.nan, to get
(1L, 180000, 50L)
i.e. the same length as the original vector?
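To make the desired output concrete, here is a minimal sketch of the kind of padding I have in mind. The helper name `padded_rolling_window` is hypothetical, and padding only at the front of the last axis (so that window i ends at element i) is an assumption; padding both borders would be another option.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def padded_rolling_window(a, window, fill=np.nan):
    # Hypothetical helper: prepend window - 1 fill values along the last
    # axis so that the number of windows equals the original length.
    a = np.asarray(a, dtype=float)
    pad_width = [(0, 0)] * (a.ndim - 1) + [(window - 1, 0)]
    padded = np.pad(a, pad_width, mode="constant", constant_values=fill)
    shape = padded.shape[:-1] + (padded.shape[-1] - window + 1, window)
    strides = padded.strides + (padded.strides[-1],)
    return as_strided(padded, shape=shape, strides=strides)


x = np.arange(5.0)
w = padded_rolling_window(x, 3)
print(w.shape)  # (5, 3)

# Stand-in for another_f (assumption): windows touching the padding give NaN.
means = w.mean(axis=-1)
print(means)
```

With this layout the first window - 1 results are NaN and the output has one entry per input element, so it could be assigned back to a column of the same length. The cost is one padded copy of the input; the windows themselves are still a view.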