Shift NaNs to the end of their respective rows

Published 2019-01-15 20:50

Question:

I have a DataFrame like this:

     0    1    2
0  0.0  1.0  2.0
1  NaN  1.0  2.0
2  NaN  NaN  2.0

What I want to get is:

Out[116]: 
     0    1    2
0  0.0  1.0  2.0
1  1.0  2.0  NaN
2  2.0  NaN  NaN

This is my approach as of now.

df.apply(lambda x: x[x.notnull()].values.tolist() + x[x.isnull()].values.tolist(), axis=1, result_type='expand')
Out[117]: 
     0    1    2
0  0.0  1.0  2.0
1  1.0  2.0  NaN
2  2.0  NaN  NaN

Is there a more efficient way to achieve this? apply is way too slow here. Thank you for your assistance! :)


My real data size:

df.shape
Out[117]: (54812040, 1522)

Answer 1:

Here's a NumPy solution using justify, a user-defined helper function -

In [455]: df
Out[455]: 
     0    1    2
0  0.0  1.0  2.0
1  NaN  1.0  2.0
2  NaN  NaN  2.0

In [456]: pd.DataFrame(justify(df.values, invalid_val=np.nan, axis=1, side='left'))
Out[456]: 
     0    1    2
0  0.0  1.0  2.0
1  1.0  2.0  NaN
2  2.0  NaN  NaN

If you want to save memory, assign it back instead -

df[:] = justify(df.values, invalid_val=np.nan, axis=1, side='left')
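
The calls above only run once justify has been defined. A minimal sketch of such a helper follows, assuming the signature used in this answer (invalid_val, axis, side). The body is written for illustration and is an assumption, not code quoted from the answer:

```python
import numpy as np
import pandas as pd


def justify(a, invalid_val=0, axis=1, side='left'):
    """Push the valid entries of a 2D array to one side along an axis.

    Illustrative sketch: the signature mirrors the calls in this answer,
    but the body is an assumption, not code quoted from the answer.
    """
    a = np.asarray(a)
    # True where the entry is valid (i.e. not the filler value).
    if isinstance(invalid_val, float) and np.isnan(invalid_val):
        mask = ~np.isnan(a)
    else:
        mask = a != invalid_val
    # Sorting booleans moves True to the end of each lane (right/down),
    # so flip the result for left/up justification.
    justified_mask = np.sort(mask, axis=axis)
    if side in ('left', 'up'):
        justified_mask = np.flip(justified_mask, axis=axis)
    out = np.full(a.shape, invalid_val, dtype=a.dtype)
    if axis == 1:
        # Row-major boolean indexing keeps each row's values in order.
        out[justified_mask] = a[mask]
    else:
        # Work on transposed views so values stay in column order.
        out.T[justified_mask.T] = a.T[mask.T]
    return out


df = pd.DataFrame([[0.0, 1.0, 2.0],
                   [np.nan, 1.0, 2.0],
                   [np.nan, np.nan, 2.0]])
result = pd.DataFrame(justify(df.values, invalid_val=np.nan, axis=1, side='left'))
```

The work is done by a boolean sort and two masked assignments over the whole array, so no Python-level function is called per row.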


Answer 2:

The easiest option is to use sorted inside df.apply/df.transform and sort by nullity. sorted is stable, so the non-null values keep their original order while the NaNs move to the end.

df = df.apply(lambda x: sorted(x, key=pd.isnull), axis=1, result_type='expand')
df
     0    1    2
0  0.0  1.0  2.0
1  1.0  2.0  NaN
2  2.0  NaN  NaN

You may also pass np.isnan to the key argument.
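
For a frame as large as the one in the question, calling a Python function once per row may still be slow. The same sort-by-nullity idea can be sketched with NumPy alone. This is an alternative that the answer above does not give, and it assumes an all-float frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0.0, 1.0, 2.0],
                   [np.nan, 1.0, 2.0],
                   [np.nan, np.nan, 2.0]])

values = df.to_numpy()
# Sort key per cell: False for valid values, True for NaN.
# A stable argsort keeps the valid values of each row in their original order.
order = np.argsort(np.isnan(values), axis=1, kind='stable')
# Reorder every row by its own permutation.
shifted = pd.DataFrame(np.take_along_axis(values, order, axis=1),
                       index=df.index, columns=df.columns)
```

Here shifted holds each row's non-null values first and its NaNs last, with the original index and column labels kept.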