I'm sorry for the poor phrasing of the question, but it was the best I could do. I know exactly what I want, but not exactly how to ask for it.
Here is the logic demonstrated by an example:
Two conditions that take on the values 1 or 0 trigger a signal that also takes on the values 1 or 0. Condition A triggers the signal (If A = 1 then signal = 1, else signal = 0) no matter what. Condition B does NOT trigger the signal, but the signal stays triggered if condition B stays equal to 1 after the signal previously has been triggered by condition A. The signal goes back to 0 only after both A and B have gone back to 0.
1. Input:
2. Desired output (signal_d) and confirmation that a for loop can solve it (signal_l):
3. My attempt using numpy.where():
4. Reproducible snippet:
# Settings
import numpy as np
import pandas as pd
import datetime
# Data frame with input and desired output i column signal_d
df = pd.DataFrame({'condition_A':list('00001100000110'),
'condition_B':list('01110011111000'),
'signal_d':list('00001111111110')})
colnames = list(df)
df[colnames] = df[colnames].apply(pd.to_numeric)
datelist = pd.date_range(pd.datetime.today().strftime('%Y-%m-%d'), periods=14).tolist()
df['dates'] = datelist
df = df.set_index(['dates'])
# Solution using a for loop with nested ifs in column signal_l
df['signal_l'] = df['condition_A'].copy(deep = True)
i=0
for observations in df['signal_l']:
if df.ix[i,'condition_A'] == 1:
df.ix[i,'signal_l'] = 1
else:
# Signal previously triggered by condition_A
# AND kept "alive" by condition_B:
if df.ix[i - 1,'signal_l'] & df.ix[i,'condition_B'] == 1:
df.ix[i,'signal_l'] = 1
else:
df.ix[i,'signal_l'] = 0
i = i + 1
# My attempt with np.where in column signal_v1
df['Signal_v1'] = df['condition_A'].copy()
df['Signal_v1'] = np.where(df.condition_A == 1, 1, np.where( (df.shift(1).Signal_v1 == 1) & (df.condition_B == 1), 1, 0))
print(df)
This is pretty straight forward using a for loop with lagged values and nested if sentences, but I can't figure it out using vectorized functions like numpy.where()
. And I know this would be much faster for bigger data frames.
Thank you for any suggestions!
I don't think there is a way to vectorize this operation that will be significantly faster than a Python loop. (At least, not if you want to stick with just Python, pandas and numpy.)
However, you can improve the performance of this operation by simplifying your code. Your implementation uses
if
statements and a lot of DataFrame indexing. These are relatively costly operations.Here's a modification of your script that includes two functions:
add_signal_l(df)
andadd_lagged(df)
. The first is your code, just wrapped up in a function. The second uses a simpler function to achieve the same result--still a Python loop, but it uses numpy arrays and bitwise operators.Here's a comparison of the timing of the two function, run in an IPython session:
As you can see,
add_lagged(df)
is much faster.