I'd like to be able to vectorize, for speed purposes, this piece of code. the purpose is to calculate a function, in this case a standard deviation, from a tuple of pair of dates that are cointained in two separate arrays.
import pandas as pd
import numpy as np
asd_1 = pd.Series(0.01 * np.random.randn(252), index=pd.date_range('2011-1-1', periods=252))
index_1 = pd.to_datetime(['2011-2-2', '2011-4-3', '2011-5-1',])
index_2 = pd.to_datetime(['2011-2-15', '2011-4-16', '2011-5-17',])
index_tot = list(zip(index_1,index_2))
aux_learning_std = pd.DataFrame([np.nanstd(asd_1.loc[i:j]) for i, j in index_tot], index=index_1)
the solution, that works, is performed through a loop but i'd rather be able to vectorize it through numpy/pandas, which is much faster. initially I though about using something like:
df_aux = pd.concat([asd_1 for _ in range(len(index_1))], axis=1)
results = df_aux.apply(lambda x: np.nanstd(x.loc[i,j]), axis = 0)
but here I fail to put together the vectors into one operation.
any and all advice is welcome.
p.s.: below there is an image for explanatory purposes
Vectorized standard deviation across ranges in an array
Sample run (verify against a loopy version)
Runtime test
Case #1 : Very small number of ranges
3
Case #2 : Decent number of ranges
1000
Case #3 : Large number of ranges
10000
Really amazing speedups of
200x+
!Using
ranged_std
to solve our caseSample run for verification -