pandas replace NaN to None exhibits counterintuiti

Posted 2020-08-10 07:38

Question:

Given a series

import numpy as np
import pandas as pd

s = pd.Series([1.1, 1.2, np.nan])
s
0    1.1
1    1.2
2    NaN
dtype: float64

If the need arises to convert the NaNs to None (for example, to work with Parquet files), then I would like to get

0     1.1
1     1.2
2    None
dtype: object

I would assume Series.replace would be the obvious way of doing this, but here's what the function returns:

s.replace(np.nan, None)

0    1.1
1    1.2
2    1.2
dtype: float64

The NaN was forward-filled instead of being replaced. Going through the docs, I see that if the second argument is None, then the first argument should be a dictionary. Based on this, I would expect replace to either replace as intended or throw an exception.

I believe the workaround here is

pd.Series([x if pd.notna(x) else None for x in s], dtype=object) 
0     1.1
1     1.2
2    None
dtype: object

Which is fine. But I would like to understand why this behaviour occurs, whether it is documented, or if it is just a bug and I have to dust off my git profile and log one on the issue tracker... any ideas?

Answer 1:

This behaviour is described in the documentation of the method parameter:

method : {‘pad’, ‘ffill’, ‘bfill’, None}

The method to use when for replacement, when to_replace is a scalar, list or tuple and value is None.

So in your example to_replace is a scalar and value is None, which means the method parameter decides what happens. The default method is pad, which the documentation of fillna describes as:

pad / ffill: propagate last valid observation forward to next valid
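To make the explanation concrete, here is a minimal sketch that reproduces the forward fill directly with Series.ffill and then shows one possible vectorized way to get None instead. The names padded and as_none are illustrative only. The astype(object).where(...) idiom is an alternative to the list comprehension in the question, not something the thread itself proposes, and the result of s.replace(np.nan, None) can differ between pandas versions, so it is deliberately not called here.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.1, 1.2, np.nan])

# "pad" / "ffill" copies the last valid value forward into the gap,
# which matches the output reported in the question.
padded = s.ffill()
print(padded.tolist())  # [1.1, 1.2, 1.2]

# Assumed alternative workaround: cast to object first so the Series
# can hold None, then keep values where they are valid and put None elsewhere.
as_none = s.astype(object).where(s.notna(), None)
print(as_none)
```

Casting to object before calling where matters in this sketch: a float64 Series cannot store None, so without the cast the missing value would remain NaN.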