Can I replace Nans with the mode of a column in a

I have some data that looks like...

Year      Make   Model  Trim
2007     Acura  TL      Base
2010     Dodge  Avenger SXT
2009     Dodge  Caliber SXT
2008     Dodge  Caliber SXT
2008     Dodge  Avenger SXT

Trim has some missing values. What I would like to do is something like the following:

Group by year make and model
Impute Trim if there are missing valyes for that group

So for instance, I would look at all the 2007 Acura TL. That might look like

 Year      Make   Model Trim
2007     Acura  TL      Base
2007     Acura  TL      XLR
2007     Acura  TL      NaN
2007     Acura  TL      Base

Then, I would impute the Nan with Base (since Base is the Mode). It is important to remember here that I want to do this for every group of Year, Make, and Model.

标签： python pandas missing-data imputation

2条回答

再贱就再见

2楼-- · 2019-07-09 07:23

Use groupby then mode. Note that mode returns an array and you want to grab the first element of it. @John Galt deserves credit for this and gets my upvote.

I use assign to create a copy of df with an overwritten version of the Trim column.

df.assign(
    Trim=df.groupby(
        ['Year', 'Make', 'Model']
    ).Trim.apply(
        lambda x: x.fillna(x.mode()[0])
    )
)

   Year   Make Model  Trim
0  2007  Acura    TL  Base
1  2007  Acura    TL   XLR
2  2007  Acura    TL  Base
3  2007  Acura    TL  Base

You can overwrite the column directly with

df['Trim'] = df.groupby(
    ['Year', 'Make', 'Model']
).Trim.apply(
    lambda x: x.fillna(x.mode()[0])
)

0人赞添加讨论(0) 举报

我想做一个坏孩纸

3楼-- · 2019-07-09 07:38

Use mode

In [215]: df
Out[215]:
   Year   Make    Model  Trim
0  2007  Acura       TL  Base
1  2010  Dodge  Avenger   SXT
2  2009  Dodge  Caliber   NaN
3  2008  Dodge  Caliber   SXT
4  2008  Dodge  Avenger   SXT

In [216]: df.Trim.fillna(df.Trim.mode()[0])
Out[216]:
0    Base
1     SXT
2     SXT
3     SXT
4     SXT
Name: Trim, dtype: object

Use inplace=True to actually set

In [217]: df.Trim.fillna(df.Trim.mode()[0], inplace=True)

In [218]: df
Out[218]:
   Year   Make    Model  Trim
0  2007  Acura       TL  Base
1  2010  Dodge  Avenger   SXT
2  2009  Dodge  Caliber   SXT
3  2008  Dodge  Caliber   SXT
4  2008  Dodge  Avenger   SXT

If you're working on groups

In [227]: df
Out[227]:
   Year   Make Model  Trim
0  2007  Acura    TL  Base
1  2007  Acura    TL   XLR
2  2007  Acura    TL   NaN
3  2007  Acura    TL  Base

In [228]: (df.groupby(['Year', 'Make', 'Model'])['Trim']
             .apply(lambda x: x.fillna(x.mode()[0])))
     ...:
Out[228]:
0    Base
1     XLR
2    Base
3    Base
Name: Trim, dtype: object

0人赞添加讨论(0) 举报

Can I replace Nans with the mode of a column in a

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间