create new column that compares across rows in pan

Posted 2019-07-18 02:47

Question:

I am looking to create a new column in a dataframe based on the values seen in the next 2 rows. Specifically, if any values in the next 2 rows are below 4, then I want the new value in the current row to be 0 (and if all values in the next 2 rows are above 4 then I want the new value in the current row to be 1).

>>> import pandas
>>> df = pandas.DataFrame({"A": [5,6,7,8,2]})
>>> df
   A
0  5
1  6
2  7
3  8
4  2
>>> desired_result = pandas.DataFrame({"A": [5,6,7,8,2], "new": [1,1,0,0,0]})
>>> desired_result
   A  new
0  5    1
1  6    1
2  7    0
3  8    0
4  2    0

In "desired_result", the first value is 1 because 6 and 7 are both > 4, and the same logic applies to the second row. In the third row the new value becomes 0 because the next two rows are (8, 2), and 2 is < 4.

I have been trying to use the apply function but I cannot figure out how to pass along the next 2 row values as inputs.

I have found lots of help on this site about comparing across columns, but cannot figure out how to "look ahead" like I described.

Thanks for the help!

Answer 1:

You can set the new value to one and then use loc together with shift and lt (less than) to set the appropriate values to zero.

import pandas as pd

df = pd.DataFrame({"A": [5, 6, 7, 8, 2]})
df['new'] = 1

df.loc[(df.A.shift(-1).lt(4)) | (df.A.shift(-2).lt(4)), 'new'] = 0

# The last row has no future observations, so set it to zero explicitly.
# A single .loc call avoids chained assignment, which may not write back to df.
df.loc[df.index[-1], 'new'] = 0

>>> df
   A  new
0  5    1
1  6    1
2  7    0
3  8    0
4  2    0
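
To see why the last row needs the explicit assignment, it may help to look at the shifted columns directly. The following is only a minimal sketch: the helper column names next1 and next2 are made up for illustration and are not part of the answer. Shifting by a negative offset pulls later rows upward and pads the tail with NaN, and NaN compared with lt(4) yields False, so a row with no future observations would otherwise keep the default value of 1.

```python
import pandas as pd

df = pd.DataFrame({"A": [5, 6, 7, 8, 2]})

# Hypothetical helper columns, for illustration only.
look = pd.DataFrame({
    "A": df["A"],
    "next1": df["A"].shift(-1),  # value one row ahead (NaN in the last row)
    "next2": df["A"].shift(-2),  # value two rows ahead (NaN in the last two rows)
})

# Element-wise "less than 4"; comparisons against NaN evaluate to False.
below = look[["next1", "next2"]].lt(4)

print(look)
print(below)
```

A row is set to 0 when either column of below is True, which is what the combined mask in the answer expresses.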

To expand to the next 8 rows instead of 2:

nrows = 8
# OR the look-ahead masks together directly instead of building a string for eval.
mask = pd.concat([df.A.shift(-n).lt(4) for n in range(1, nrows + 1)], axis=1).any(axis=1)
df.loc[mask, 'new'] = 0
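
As an alternative to chaining one shift per look-ahead row, the same result can probably be obtained from a forward-looking rolling minimum. This is a sketch under assumptions, not the answerer's method: the function name lookahead_flag and its threshold parameter are hypothetical, and it assumes column A contains no missing values. It reverses the series, takes a rolling minimum, reverses back, and shifts by one row so that the current row is excluded from its own window.

```python
import pandas as pd


def lookahead_flag(s: pd.Series, nrows: int, threshold: float = 4) -> pd.Series:
    """Return 1 where all of the next `nrows` values are >= threshold, else 0.

    Hypothetical helper for illustration. Rows with no future
    observations get 0, matching the answer above.
    """
    # Rolling min over the reversed series covers rows [i, i + nrows - 1]
    # once reversed back; min_periods=1 tolerates short windows near the end.
    forward_min = s[::-1].rolling(window=nrows, min_periods=1).min()[::-1]
    # Shift up by one so the window becomes rows [i + 1, i + nrows].
    future_min = forward_min.shift(-1)
    # NaN (no future rows) compares as False, which becomes 0.
    return future_min.ge(threshold).astype(int)


df = pd.DataFrame({"A": [5, 6, 7, 8, 2]})
df["new"] = lookahead_flag(df["A"], nrows=2)
print(df)
```

With nrows=2 this gives the same new column as the shift-based version for the sample data, and changing the window only requires changing the nrows argument.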