Here is my input:
import pandas as pd
import numpy as np
list1 = [10,79,6,38,4,557,12,220,46,22,45,22]
list2 = [4,3,23,6,234,47,312,2,426,42,435,23]
df = pd.DataFrame({'A' : list1, 'B' : list2}, columns = ['A', 'B'])
df['C'] = np.where (df['A'] > df['B'].shift(-2), 1, np.nan)
print (df)
that produces this output:
A B C
0 10 4 NaN
1 79 3 1.0
2 6 23 NaN
3 38 6 NaN
4 4 234 NaN
5 557 47 1.0
6 12 312 NaN
7 220 2 1.0
8 46 426 NaN
9 22 42 NaN
10 45 435 NaN
11 22 23 NaN
What I need to do is to change column 'C' to be a set of three 1's in a row, non-overlapping. The desired output is:
A B C
0 10 4 NaN
1 79 3 1.0
2 6 23 1.0
3 38 6 1.0
4 4 234 NaN
5 557 47 1.0
6 12 312 1.0
7 220 2 1.0
8 46 426 NaN
9 22 42 NaN
10 45 435 NaN
11 22 23 NaN
So, rows 2, 3, and 6 change from NaN to 1.0. Row 7 already has a 1.0 and it is ignored. Rows 8 and 9 need to stay NaN because row 7 is the last entry of the previous set.
I don't know if there is a better way to build column 'C' that would do this at creation.
I have tried several versions of fillna and ffill, none of them worked for me.
It seems very convoluted but I tried to isolate the row id's for each 1.0 with this line:
print (df.loc[df['C'] == 1])
Which correctly outputs this:
A B C
1 79 3 1.0
5 557 47 1.0
7 220 2 1.0
Even though I know that information, I don't know how to proceed from there.
Thank you so much for your help in advance, David
make an array from sequence:
This function will test segments of an array for matching patterns and will replace segments which match with another pattern. Previously matched segments will not be considered for future matches (the filler numbers are greater than one).
Patterns to be replaced:
And corresponding fill patterns:
make test_patterns and fill_patterns inputs
run function:
assign a to column C
df:
The patterns and fills may need to be adjusted/added to depending on the way the input column is initially populated and the specific result sequence rules.
EDIT:
Faster version (thanks to b2002):
First get the indices of all your starting points, i.e. all your points that are 1.0 and have two NaN following, by looking at the differences between the points that are not null in the
C
column (first index is included by default), then iterate over those indices and useloc
to change slices of yourC
column:Result: