pandas dataframe replace blanks with NaN

Posted 2020-07-09 08:46

I have a dataframe with empty cells and would like to replace these empty cells with NaN. A solution previously proposed on this forum works, but only if the cell contains a space:

df.replace(r'\s+',np.nan,regex=True)

This code does not work when the cell is completely empty. Does anyone have a suggestion for pandas code that replaces empty cells?

Wannes

Tags: string pandas na
4 answers
虎瘦雄心在
#2 · 2020-07-09 09:12

I think the easiest thing here is to do the replace twice:

In [117]:
df = pd.DataFrame({'a':['',' ','asasd']})
df

Out[117]:
       a
0       
1       
2  asasd

In [118]:
df.replace(r'\s+',np.nan,regex=True).replace('',np.nan)

Out[118]:
       a
0    NaN
1    NaN
2  asasd
ら.Afraid
#3 · 2020-07-09 09:13

As you've already seen, if you do the obvious thing and replace() with None it throws an error:

df.replace('', None)
TypeError: cannot replace [''] with method pad on a DataFrame

The solution seems to be to simply replace the empty string with numpy's NaN.

import numpy as np
df.replace('', np.nan)

While I'm not 100% sure that NaN is handled identically across all edge cases, I've not had any problems: fillna() works, persisting NULLs to a database in place of NaN works, and persisting NaN to CSV works.

(pandas version 0.18.1)
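
To make the claims above a bit more concrete, here is a minimal sketch of the empty-string replacement followed by the usual missing-value tools. The column names and the "unknown" fill value are made up for illustration, and na_rep="NaN" is only there so the missing cells show up in the CSV text (by default to_csv writes them as empty fields).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: some cells hold an empty string.
df = pd.DataFrame({"a": ["", "x", ""], "b": ["y", "", "z"]})

# Replace exact empty strings with NaN (no regex involved).
cleaned = df.replace("", np.nan)

# The result behaves like any other missing data in pandas.
missing_per_column = cleaned.isna().sum()
filled = cleaned.fillna("unknown")

# Write NaN explicitly into the CSV text instead of an empty field.
csv_text = cleaned.to_csv(index=False, na_rep="NaN")
```

Note that this only catches cells that are exactly '', not cells containing spaces.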

走好不送
#4 · 2020-07-09 09:18

Neither of the other answers takes all the characters in a string into account. This is better:

df.replace(r'\s+( +\.)|#',np.nan,regex=True).replace('',np.nan)

More docs on: Replacing blank values (white space) with NaN in pandas

chillily
#5 · 2020-07-09 09:29

How about this?

df.replace(r'\s+|^$', np.nan, regex=True)
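
One thing to watch with the pattern above: the \s+ part is not anchored. As far as I can tell, when the replacement is NaN pandas swaps out the whole cell as soon as the pattern matches anywhere in it, so a value such as 'two words' may end up as NaN as well. A sketch of an anchored variant, assuming you only want cells that are empty or whitespace-only to become NaN (the sample data is made up):

```python
import numpy as np
import pandas as pd

# Hypothetical sample: empty, whitespace-only, and real text (one with an inner space).
df = pd.DataFrame({"a": ["", " ", "   ", "asasd", "two words"]})

# ^\s*$ matches only cells made up of zero or more whitespace characters,
# so text that merely contains a space is left untouched.
cleaned = df.replace(r"^\s*$", np.nan, regex=True)

blank_mask = cleaned["a"].isna()
```

Here the first three rows become NaN, while 'asasd' and 'two words' are kept.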