import pandas as pd

data = {'col1': [1, 3, 3, 1, 2, 3, 2, 2]}
df = pd.DataFrame(data, columns=['col1'])
print(df)
   col1
0     1
1     3
2     3
3     1
4     2
5     3
6     2
7     2
I have the Pandas DataFrame above and I want to create another column that compares each value of col1 with the value in the previous row to see if they are equal. What would be the best way to do this? The result should look like the following DataFrame. Thanks
   col1  match
0     1  False
1     3  False
2     3   True
3     1  False
4     2  False
5     3  False
6     2  False
7     2   True
Here's a NumPy array-based approach using slicing, which lets us use views into the input array for efficiency.
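The answer's original code is not shown here, so the following is only a minimal sketch of the slicing idea, assuming the df from the question. The variable names a and match are illustrative, not taken from the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 3, 3, 1, 2, 3, 2, 2]})

# Underlying NumPy array of the column.
a = df['col1'].to_numpy()

# The first row has no previous row, so it stays False.
match = np.zeros(len(a), dtype=bool)

# a[1:] and a[:-1] are views (no copies); compare each value with its predecessor.
match[1:] = a[1:] == a[:-1]

df['match'] = match
print(df)
```

Preallocating the boolean array and filling it from index 1 onward avoids concatenating a leading False onto the comparison result.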
Runtime test -
I'm surprised no one mentioned the rolling method here. rolling can easily be used to check whether the n previous values are all the same, or to perform any custom operation. It is certainly not as fast as using diff or shift, but it can easily be adapted for larger windows:
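One possible way to do this with rolling, as a sketch rather than the answer's own code: for a numeric column, all values in a window are equal exactly when the window's maximum equals its minimum. The window size n is an illustrative parameter; n = 2 compares each row with the one before it.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 3, 3, 1, 2, 3, 2, 2]})

# Window size: the current row plus the n - 1 rows before it.
n = 2

windows = df['col1'].rolling(window=n)

# Incomplete windows (the first n - 1 rows) give NaN for both max and min,
# and NaN == NaN is False, so those rows come out as False.
df['match'] = windows.max() == windows.min()
print(df)
```

Raising n checks whether the current value and the n - 1 values before it are all identical.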
You need eq with shift:
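A minimal sketch of the eq and shift combination, assuming the df from the question. The second column, match_op, is an illustrative name used here only to show the == spelling alongside it.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 3, 3, 1, 2, 3, 2, 2]})

# shift() moves every value down one row; the first row becomes NaN,
# and comparing anything with NaN yields False.
df['match'] = df['col1'].eq(df['col1'].shift())

# Equivalent comparison written with the == operator.
df['match_op'] = df['col1'] == df['col1'].shift()
print(df)
```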
Or use == instead of eq, but it is a bit slower on a large DataFrame:
Timings:
1) pandas approach: use diff.
2) NumPy approach: use np.ediff1d.
Both produce:
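Both approaches can be sketched as follows, assuming the df from the question and a numeric column. The exact expressions are an assumption, since the answer's code is not shown. The column name match_np and the to_begin=1 padding value are illustrative choices: any non-zero value makes the first row False.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 3, 3, 1, 2, 3, 2, 2]})

# 1) pandas: a difference of 0 means the value equals the previous one.
#    The first difference is NaN, which is not equal to 0, so row 0 is False.
df['match'] = df['col1'].diff().eq(0)

# 2) NumPy: ediff1d returns consecutive differences; to_begin prepends a
#    non-zero value so the result has one entry per row.
df['match_np'] = np.ediff1d(df['col1'].to_numpy(), to_begin=1) == 0
print(df)
```

Both variants rely on subtraction, so they apply to numeric data only; for strings or other objects a direct comparison with the shifted column is the safer choice.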
Timings (for the same DF used by @jezrael):