I have a dataset in which there is a column called Native Country that contains around 30,000 records. Some values are missing, represented by NaN, so I thought to fill them with the mode() value. I wrote something like this:
data['Native Country'].fillna(data['Native Country'].mode(), inplace=True)
However, when I count the missing values:
for col_name in data.columns:
    print("column:", col_name, ". Missing:", sum(data[col_name].isnull()))
It is still coming up with the same number of NaN
values for the column Native Country.
If we fill in the missing values with
fillna(df['colX'].mode())
, since the result of mode() is a Series, it will only fill in the first couple of rows, for the matching indices (see the sketch after this answer). However, by simply taking the first value of the Series,
fillna(df['colX'].mode()[0])
, I think we risk introducing unintended bias in the data. If the sample is multimodal, taking just the first mode value makes the already biased imputation method worse. For example, taking only 0 if we have [0, 21, 99] as the equally most frequent values, or filling missing values with False when True and False values are equally frequent in a given column. I don't have a clear-cut solution here. Assigning a random value from all the local maxima could be one approach if using the mode is a necessity.
Be careful: NaN may be the mode of your dataframe, in which case you are replacing NaN with another NaN.
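A small sketch of that trap, assuming a pandas version (0.24+) where Series.mode accepts a dropna flag: with the default dropna=True, NaN never shows up in the result, but with dropna=False a mostly missing column returns NaN as its mode:

import numpy as np
import pandas as pd

s = pd.Series([np.nan, np.nan, 1.0])

print(s.mode())              # 1.0 -- NaN is ignored by default
print(s.mode(dropna=False))  # NaN -- NaN is the most frequent value

# Filling with this "mode" silently replaces NaN with NaN:
print(s.fillna(s.mode(dropna=False)[0]).isnull().sum())  # still 2 missing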
Just call the first element of the series:

data['Native Country'].fillna(data['Native Country'].mode()[0], inplace=True)

or you can do the same with assignment:

data['Native Country'] = data['Native Country'].fillna(data['Native Country'].mode()[0])
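Either way, re-running the count from the question should now report zero missing values for the column:

print(data['Native Country'].isnull().sum())  # expect 0 after the fill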