Maximum drawdown is a common risk metric in quantitative finance that measures the largest peak-to-trough loss experienced over a period.
Recently, I became impatient with the time it took to calculate max drawdown using my looped approach:
def max_dd_loop(returns):
    """returns is assumed to be a pandas Series"""
    max_so_far = None
    start, end = None, None
    r = returns.add(1).cumprod()
    for r_start in r.index:
        for r_end in r.index:
            if r_start < r_end:
                # .ix was removed from pandas; .loc is the label-based equivalent
                current = r.loc[r_end] / r.loc[r_start] - 1
                if (max_so_far is None) or (current < max_so_far):
                    max_so_far = current
                    start, end = r_start, r_end
    return max_so_far, start, end
I'm familiar with the common perception that a vectorized solution would be better.
The questions are:
- Can I vectorize this problem?
- What does this solution look like?
- How beneficial is it?
Edit
I modified Alexander's answer into the following function:
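def max_dd(returns):
    """Assumes returns is a pandas Series"""
    r = returns.add(1).cumprod()      # return index
    dd = r.div(r.cummax()).sub(1)     # drawdown from the running peak
    mdd = dd.min()
    # idxmin/idxmax return index labels; argmin/argmax return positions in current pandas
    end = dd.idxmin()
    start = r.loc[:end].idxmax()
    return mdd, start, end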
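The running-maximum form works because, for any end point, the worst possible start is the highest value the return index reached up to that point. A minimal sketch of the intermediate steps on a toy series (the four return values are made up for illustration):

```python
import pandas as pd

# Hypothetical periodic returns, for illustration only.
returns = pd.Series([0.10, -0.05, 0.02, -0.10])

r = returns.add(1).cumprod()   # return index
running_peak = r.cummax()      # highest return index seen so far
dd = r / running_peak - 1      # drawdown from that peak at each point

mdd = dd.min()                 # maximum drawdown
end = dd.idxmin()              # label of the trough
start = r.loc[:end].idxmax()   # label of the peak that precedes the trough

print(pd.DataFrame({"r": r, "peak": running_peak, "dd": dd}))
print(mdd, start, end)
```

One difference from the double loop is worth keeping in mind: because each point is compared with a peak that may be itself, this form never reports a value above zero, whereas the loop reports the smallest gain when the series only ever rises.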
Given a time series of returns, we need to evaluate the aggregate return for every combination of starting point to ending point.
The first trick is to convert a time series of returns into a series of return indices. Given a series of return indices, I can calculate the return over any sub-period with the return index at the beginning ri_0 and at the end ri_1. The calculation is: ri_1 / ri_0 - 1.
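As a small illustration of the first trick, here is a sketch that builds a return index from a toy series and checks the sub-period formula against direct compounding. The return values and variable names are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical periodic returns, for illustration only.
returns = pd.Series([0.10, -0.05, 0.02, -0.10])

# Return index: cumulative growth of one unit invested before the first period.
ri = returns.add(1).cumprod()

# Return from the end of period 0 (ri_0) to the end of period 3 (ri_1).
ri_0, ri_1 = ri.iloc[0], ri.iloc[3]
sub_period_return = ri_1 / ri_0 - 1

# Cross-check: compound the returns of periods 1 through 3 directly.
direct = returns.iloc[1:].add(1).prod() - 1

print(sub_period_return, direct)
```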
The second trick is to produce a second series of inverses of return indices. If r is my series of return indices then 1 / r is my series of inverses.
The third trick is to take the matrix product of r * (1 / r).Transpose.
r is an n x 1 matrix. (1 / r).Transpose is a 1 x n matrix. The resulting product contains every combination of ri_j / ri_k. Just subtract 1 and I've actually got returns.
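In NumPy terms, the product of a column vector with a row vector is an outer product. The sketch below uses np.outer on a toy series (values assumed for illustration) and shows that entry [j, k] holds the return from point k to point j.

```python
import numpy as np
import pandas as pd

# Hypothetical periodic returns, for illustration only.
returns = pd.Series([0.10, -0.05, 0.02, -0.10])
r = returns.add(1).cumprod().to_numpy()

# ratios[j, k] = r[j] / r[k], i.e. every combination of ri_j / ri_k.
ratios = np.outer(r, 1.0 / r)

# Subtracting 1 turns the ratios into returns from point k to point j.
all_returns = ratios - 1

print(all_returns.round(4))
```

The diagonal is zero (each point compared with itself), and the matrix still contains pairs where the start comes after the end.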
The fourth trick is to ensure that I'm constraining my denominator to represent periods prior to those being represented by the numerator.
Below is my vectorized function.
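A minimal sketch of the four tricks combined, assuming returns is a pandas Series whose index is already in chronological order. The name max_dd_outer is hypothetical, and this is an illustration of the approach described above, not necessarily the exact function from the original answer.

```python
import numpy as np
import pandas as pd


def max_dd_outer(returns):
    """Outer-product max drawdown; returns is assumed to be a pandas Series."""
    r = returns.add(1).cumprod()          # trick 1: return index
    values = r.to_numpy()

    # Tricks 2 and 3: x[j, k] is the return from point k (start) to point j (end).
    x = np.outer(values, 1.0 / values) - 1

    # Trick 4: keep only pairs whose start comes strictly before the end,
    # i.e. the strict lower triangle; mask everything else with NaN.
    x[np.triu_indices_from(x)] = np.nan

    if np.isnan(x).all():                 # fewer than two observations
        return np.nan, None, None

    end_pos, start_pos = np.unravel_index(np.nanargmin(x), x.shape)
    return x[end_pos, start_pos], r.index[start_pos], r.index[end_pos]


# Toy data, for illustration only.
mdd, start, end = max_dd_outer(pd.Series([0.10, -0.05, 0.02, -0.10]))
print(mdd, start, end)
```

Note that the intermediate matrix has n x n entries, so memory use grows quadratically with the length of the series.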
How does this perform?
For the vectorized solution, I ran 10 iterations over time series of lengths [10, 50, 100, 150, 200]. The times are below:
The same test for the looped solution is below:
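A minimal sketch of how a timing test like this could be set up with timeit. The helper name time_over_lengths, the random-normal test data, and the seed are assumptions; any function that takes a Series of returns, including the looped one, can be passed in place of max_dd.

```python
import timeit

import numpy as np
import pandas as pd


def max_dd(returns):
    """Running-maximum drawdown; returns is assumed to be a pandas Series."""
    r = returns.add(1).cumprod()
    dd = r.div(r.cummax()).sub(1)
    end = dd.idxmin()
    start = r.loc[:end].idxmax()
    return dd.min(), start, end


def time_over_lengths(fn, lengths=(10, 50, 100, 150, 200), number=10, seed=0):
    """Total seconds for `number` calls of fn at each series length."""
    rng = np.random.default_rng(seed)
    timings = {}
    for n in lengths:
        # Hypothetical test data: normally distributed returns.
        returns = pd.Series(rng.normal(0.0, 0.01, n))
        timings[n] = timeit.timeit(lambda: fn(returns), number=number)
    return pd.Series(timings, name=fn.__name__)


timings = time_over_lengths(max_dd)
print(timings)
```

The absolute numbers depend on the machine and the pandas version, so only the relative scaling across lengths and across implementations is meaningful.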
Edit
Alexander's answer provides superior results. The same test using the modified code:
I modified his code into the following function:
I had first suggested using an .expanding() window, but that's obviously not necessary with the .cumprod() and .cummax() built-ins to calculate max drawdown up to any given point:
df_returns is assumed to be a DataFrame of returns, where each column is a separate strategy/manager/security, and each row is a new date (e.g. monthly or daily).
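A minimal sketch of the .cumprod() and .cummax() approach applied to a whole DataFrame at once. The random data, the column names A, B and C, and the daily date index are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical daily returns for three strategies, for illustration only.
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-01", periods=250, freq="D")
df_returns = pd.DataFrame(
    rng.normal(0.0005, 0.01, size=(250, 3)),
    index=dates,
    columns=["A", "B", "C"],
)

# Return index per column, then drawdown from each column's running peak.
wealth = df_returns.add(1).cumprod()
drawdown = wealth.div(wealth.cummax()).sub(1)

max_drawdown = drawdown.min()      # worst drawdown per column
trough_dates = drawdown.idxmin()   # date on which each worst drawdown occurred

print(max_drawdown)
print(trough_dates)
```

Every operation here is column-wise, so no explicit loop over strategies is needed.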