I have a python DataFrame containing some financial data that I am trying to create some technical indicators for. I am trying to figure out how to use a moving window function to speed up the process rather than going element by element. For each index I would like to return the maximum index for the last 30 days. I had implemented an element by element solution but as you can imagine it is dreadfully slow.
for s_sym in ls_symbols:
for i in range(refresh, len(ldt_timestamps)):
#Aroon-Up = ((period - Days Since High)/period) x 100 Aroon-Down = ((period - Days Since Low)/peiod) x 100'''
whrmax = df_close[s_sym].ix[ldt_timestamps[i-uplen:i]].idxmax()
maxaway = (df_close[s_sym].ix[whrmax : ldt_timestamps[i-1]]).count()
aroonup = ((uplen - maxaway) / uplen ) * 100
whrmin = df_close[s_sym].ix[ldt_timestamps[i-dnlen:i]].idxmin()
minaway = df_close[s_sym].ix[whrmin : ldt_timestamps[i-1]].count()
aroondn = ((dnlen - minaway) / dnlen ) * 100
How do I create custom rolling window functions?
See the docs at:
http://pandas.pydata.org/pandas-docs/dev/computation.html#moving-rolling-statistics-moments
And some good examples at:
http://pandas.pydata.org/pandas-docs/dev/cookbook.html#grouping
In [18]: df = DataFrame(randn(1000,4),index=pd.date_range('20000101',periods=1000),
columns=list('ABCD'))
In [19]: pandas.stats.moments.rolling_apply(df,30,lambda x: Series(x).idxmax())
Out[19]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1000 entries, 2000-01-01 00:00:00 to 2002-09-26 00:00:00
Freq: D
Data columns (total 4 columns):
A 971 non-null values
B 971 non-null values
C 971 non-null values
D 971 non-null values
dtypes: float64(4)
In [47]: pandas.stats.moments.rolling_apply(df,30,lambda x: Series(x).idxmax()).tail(30)
Out[47]:
A B C D
2002-08-28 24 3 26 21
2002-08-29 23 2 25 20
2002-08-30 22 1 24 19
2002-08-31 21 0 23 18
2002-09-01 20 6 29 17
2002-09-02 19 5 28 16
2002-09-03 18 4 27 15
2002-09-04 17 3 26 14
2002-09-05 16 2 25 13
2002-09-06 15 1 24 12
2002-09-07 14 0 23 11
2002-09-08 13 13 22 10
2002-09-09 12 12 21 9
2002-09-10 11 11 20 8
2002-09-11 10 10 19 7
2002-09-12 9 9 18 6
2002-09-13 8 8 17 5
2002-09-14 7 7 16 4
2002-09-15 6 6 15 3
2002-09-16 5 5 14 2
2002-09-17 4 4 13 1
2002-09-18 3 3 12 0
2002-09-19 2 2 11 11
2002-09-20 1 1 10 10
2002-09-21 0 0 9 9
2002-09-22 27 25 8 8
2002-09-23 26 24 7 7
2002-09-24 25 23 6 6
2002-09-25 24 22 5 5
2002-09-26 23 29 4 4
FYI, you are probably almost certainly better off just using rolling_max(df,30)
to get the max values in a specifc range, which is what I gather you are after