For example:
      0     1
0  87.0   NaN
1   NaN  99.0
2   NaN   NaN
3   NaN   NaN
4   NaN  66.0
5   NaN   NaN
6   NaN  77.0
7   NaN   NaN
8   NaN   NaN
9  88.0   NaN
My expected output is: [False, True], since 87 is the first non-NaN value in column 0 but not that column's maximum (88 is), while 99 is the first non-NaN value in column 1 and is indeed the max of that column.
Option a): Just do groupby with first (may not be 100% reliable).
Option b): Or use bfill.
(Fill each NaN with the next non-NaN value below it in its column; the first row after bfill is then the first non-NaN value.)
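A minimal sketch of the bfill approach on the example frame above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    0: [87, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 88],
    1: [np.nan, 99, np.nan, np.nan, 66, np.nan, 77, np.nan, np.nan, np.nan],
})

# After a backward fill, row 0 holds each column's first non-NaN value.
first_vals = df.bfill().iloc[0]
result = (first_vals == df.max()).tolist()
print(result)  # [False, True]
```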
Option c): Use stack (stacking drops NaN entries, so the first stacked value within each column group is that column's first non-NaN).
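A sketch of the stack approach, assuming the intended trick is that stack() drops NaNs by default:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    0: [87, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 88],
    1: [np.nan, 99, np.nan, np.nan, 66, np.nan, 77, np.nan, np.nan, np.nan],
})

# stack() drops NaN, leaving a Series indexed by (row, column);
# within each column group the first entry is the first non-NaN value.
s = df.stack()
first_vals = s.groupby(level=1).first()
result = (first_vals == df.max()).tolist()
print(result)  # [False, True]
```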
Option d): Use idxmax with first_valid_index.
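A sketch of the idxmax/first_valid_index comparison: for each column, the row label of the maximum (NaNs are ignored by idxmax) is compared to the row label of the first non-NaN value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    0: [87, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 88],
    1: [np.nan, 99, np.nan, np.nan, 66, np.nan, 77, np.nan, np.nan, np.nan],
})

# True exactly when the first non-NaN row is also the row of the max.
result = [df[c].idxmax() == df[c].first_valid_index() for c in df.columns]
print(result)  # [False, True]
```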
Option e) (from Pir): Use idxmax with isna.
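A sketch of the idxmax/isna idea: idxmax on a boolean mask returns the label of the first True, so (~df.isna()).idxmax() gives the row label of each column's first non-NaN value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    0: [87, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 88],
    1: [np.nan, 99, np.nan, np.nan, 66, np.nan, 77, np.nan, np.nan, np.nan],
})

# First non-NaN row label per column vs. row label of the column max.
result = ((~df.isna()).idxmax() == df.idxmax()).tolist()
print(result)  # [False, True]
```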
Using pure numpy (I think this is very fast): the idea is to check whether the index of the first non-NaN is also the index of the argmax.
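A sketch of that pure-NumPy comparison, using the example data above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    0: [87, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 88],
    1: [np.nan, 99, np.nan, np.nan, 66, np.nan, 77, np.nan, np.nan, np.nan],
})

a = df.values
first_idx = (~np.isnan(a)).argmax(axis=0)  # row of first non-NaN per column
max_idx = np.nanargmax(a, axis=0)          # row of column max, ignoring NaN
result = (first_idx == max_idx).tolist()
print(result)  # [False, True]
```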
We can use numpy's nanmax here for an efficient solution.
You can do something similar to Wens' answer with the underlying Numpy arrays:
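The code being described appears to be missing from this extract; a reconstruction consistent with the surrounding description (index the 2-D values array down to a 1-D array of first non-NaN values, then compare element-wise against the column maxima) might look like:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    0: [87, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 88],
    1: [np.nan, 99, np.nan, np.nan, 66, np.nan, 77, np.nan, np.nan, np.nan],
})

# Row position of the first non-NaN in each column.
first_rows = df.notna().values.argmax(axis=0)

# Index the 2-D array down to 1-D, then compare with the column maxima.
result = df.values[first_rows, np.arange(df.shape[1])] == df.max(axis=0).values
print(result.tolist())  # [False, True]

# Leaving .values off the right-hand side yields a Pandas Series instead.
result_series = df.values[first_rows, np.arange(df.shape[1])] == df.max(axis=0)
```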
df.max(axis=0) gives the column-wise maxima. The left-hand side indexes df.values, a 2-D array, down to a 1-D array of first non-NaN values, which is then compared element-wise with the per-column maxima. If you leave .values off the right-hand side, the result is a Pandas Series rather than a NumPy array.

After posting the question I came up with this:
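The original snippet was not preserved in this extract; one per-column check consistent with the question (compare each column's first non-NaN value, via dropna, with its max) would be:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    0: [87, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 88],
    1: [np.nan, 99, np.nan, np.nan, 66, np.nan, 77, np.nan, np.nan, np.nan],
})

# For each column: is the first non-NaN value also the column max?
result = df.apply(lambda col: col.dropna().iloc[0] == col.max()).tolist()
print(result)  # [False, True]
```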
which seems to work, but not sure yet!