I'm trying to build a vectorized/parallel stock backtesting program. I implemented a sequential version with loops, but I'm stuck on vectorizing it. I'd like to use Pandas/NumPy for that. Here's a quick outline:
There are 2 given columns: the left one is the order quantity (to be added to the position), the right one is the stop flag (if stop is 1, the position gets reset to 0):
M = [[0.1, 0], # left column is order quantity, right is stop
[0.1, 0],
[0.5, 0],
[0.5, 0],
[0.3, 0],
[-0.3, 0], # negative order quantity means short or sell
[-0.1, 1]] # right column (stop) is 1, so position is reset to 0
And there are 2 columns which I want to calculate from the initial matrix M: the left column is the position (based on the order quantity, limited to the range -1 to 1) and the right column is the executed order quantity:
R = [[0.1, 0.1],
[0.2, 0.1],
[0.7, 0.5], # position (left column) is equal to cumsum of order quantity (from last stop trigger)
[1, 0.3], # executed quantity is < order quantity as it's the remainder to position's max of 1
[1, 0],
[0.7, -0.3],
[-0.1, -0.8]] # stop triggered: position is reset from 0.7 to 0, then the -0.1 order is applied, so -0.8 is executed in total
- Position is basically the cumsum of order quantity, but capped at 1 and -1, and restarted whenever a stop is triggered
- Executed order quantity is the order quantity if the position limits are not exceeded, otherwise the remainder up to the limit
- Stops (when 1) reset the position to 0
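To pin the three rules down, here is a minimal sequential sketch that should reproduce R from M. It assumes the limit behaves like a plain clip to [-1, 1]; the function name `reference` and the `limit` parameter are made up for illustration. It is only meant as a baseline for checking a vectorized version, not as a solution.

```python
import numpy as np

def reference(quantity, stop, limit=1.0):
    """Sequential baseline: returns an (n, 2) array of [position, executed]."""
    out = np.empty((len(quantity), 2))
    position = 0.0
    for i, (q, s) in enumerate(zip(quantity, stop)):
        start = position
        if s == 1:
            # A stop resets the position before the order is applied
            position = 0.0
        # Assumption: the limit is a plain clip to [-limit, limit]
        position = min(limit, max(-limit, position + q))
        # Executed quantity is the signed change in position for this row
        out[i] = position, position - start
    return out

M = np.array([[0.1, 0], [0.1, 0], [0.5, 0], [0.5, 0],
              [0.3, 0], [-0.3, 0], [-0.1, 1]])
R = reference(M[:, 0], M[:, 1])
print(R)
```

Each row needs the clipped position of the previous row, and that dependency is what a vectorized version has to get around.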
The problem is that each condition depends on the others. Does that mean this task can't be solved in parallel?
I can imagine an approach that takes the cumsum of the quantities and the indices where stops trigger, and applies those indices to the cumsum to calculate the executed quantity. I would appreciate any tips for elegant ways to solve this, for example which NumPy functions to look into besides cumsum.
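A rough sketch of that cumsum idea, assuming NumPy and the two columns of M as separate arrays (all variable names are illustrative). It handles the stop resets by subtracting the running total as it stood just before the most recent stop. It deliberately ignores the -1/1 limits, which is the part that does not reduce to a cumsum.

```python
import numpy as np

quantity = np.array([0.1, 0.1, 0.5, 0.5, 0.3, -0.3, -0.1])
stop = np.array([0, 0, 0, 0, 0, 0, 1])

# Running total of all order quantities, ignoring stops and limits
total = np.cumsum(quantity)

# Running total as it stood just before each row's order
before = total - quantity

# Index of the most recent stop row at or before each row (-1 if none yet)
stop_idx = np.where(stop == 1, np.arange(len(stop)), -1)
last_stop = np.maximum.accumulate(stop_idx)

# Amount to subtract so the sum restarts from 0 at each stop row
offset = np.where(last_stop >= 0, before[np.maximum(last_stop, 0)], 0.0)

# Position with stop resets applied, but without the -1/1 limits
unclipped_position = total - offset
print(unclipped_position)
```

For the example data this gives roughly [0.1, 0.2, 0.7, 1.2, 1.5, 1.2, -0.1]. That matches the position column of R only in the rows where the limit is never hit (the first three and the last one).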
Edit: A heavily simplified version of the sequential implementation:
orders = [
    {'quantity': 0.1, 'stop': 0},
    {'quantity': 0.1, 'stop': 0},
    {'quantity': 0.5, 'stop': 0},
    {'quantity': 0.5, 'stop': 0},
    {'quantity': 0.3, 'stop': 0},
    {'quantity': -0.3, 'stop': 0},
    {'quantity': -0.1, 'stop': 1},
]
position = 0
for order in orders:
    position_beginning = position
    # A stop resets the position before the order is applied
    if order['stop'] == 1:
        position = 0
    # Apply the full order if it stays within [-1, 1], otherwise move to the limit
    if -1 <= order['quantity'] + position <= 1:
        position += order['quantity']
    elif position < 0 and order['quantity'] < 0:
        position = -1
    elif position > 0 and order['quantity'] > 0:
        position = 1
    # Executed quantity is the signed change in position
    executed_quantity = position - position_beginning
    print(position, executed_quantity)
In the actual app, the order quantities are much more complex, e.g. divided into sub-quantities. Because the backtester has to run over millions of orders with sub-quantities, this loop approach is really slow.