Adding new column to pandas DataFrame results in N

Posted 2019-06-25 13:22

I have a pandas DataFrame data with the following transaction data:

           A         date
0      M000833  2016-08-01
1      M000833  2016-08-01
2      M000833  2016-08-02
3      M000833  2016-08-02 
4      M000511  2016-08-05

I want a new column holding the number of visits per consumer, where multiple visits on the same day count as one.

So I tried this:

import pandas as pd
data['noofvisits'] = data.groupby(['A'])['date'].nunique()

When I run the statement on its own, without assigning it to the DataFrame, I get a pandas Series with the desired output. However, the statement above results in:

           A         date       noofvisits
0      M000833  2016-08-01         NaN         
1      M000833  2016-08-01         NaN
2      M000833  2016-08-02         NaN
3      M000833  2016-08-02         NaN
4      M000511  2016-08-05         NaN

The expected output is:

           A         date       noofvisits
0      M000833  2016-08-01         2         
1      M000833  2016-08-01         2
2      M000833  2016-08-02         2
3      M000833  2016-08-02         2
4      M000511  2016-08-05         1

What is wrong with this approach? Why does the column noofvisits contain NaN rather than the count values?

Tags: python pandas-groupby

1 answer

来，给爷笑一个

#2 · 2019-06-25 13:48

Use transform to generate a Series with its index aligned to the original df:

In[32]:
df['noofvisits'] = df.groupby(['A'])['date'].transform('nunique')
df

Out[32]: 
             A        date  noofvisits
index                                 
0      M000833  2016-08-01           2
1      M000833  2016-08-01           2
2      M000833  2016-08-02           2
3      M000833  2016-08-02           2
4      M000511  2016-08-05           1

The problem with assigning directly is that grouping on column 'A' makes 'A' the index of the groupby aggregation. When you then assign that result to your df, pandas aligns it on the index, and since the indices don't agree you get NaN column values.
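To make the index mismatch concrete, here is a minimal sketch that rebuilds the sample data from the question (the DataFrame construction is my assumption, since the original post only shows the printed frame). Assigning a Series to a column behaves like reindexing it to the frame's index first, so the explicit reindex below stands in for what the assignment does:

```python
import pandas as pd

# Sample data reconstructed from the question (assumed construction)
df = pd.DataFrame({
    "A": ["M000833", "M000833", "M000833", "M000833", "M000511"],
    "date": ["2016-08-01", "2016-08-01", "2016-08-02", "2016-08-02", "2016-08-05"],
})

# The aggregation is indexed by the values of 'A', not by row number
counts = df.groupby("A")["date"].nunique()

# Column assignment aligns on index labels; the labels 0..4 do not
# exist in counts, so every aligned value is missing
aligned = counts.reindex(df.index)

print(counts.index.tolist())
print(df.index.tolist())
print(aligned.isna().all())
```

None of the row labels of df appear in the aggregated index, which is why every cell of the new column comes out as NaN.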

Also, even if the index values did agree, the shape is different anyway:

In[33]:
df.groupby(['A'])['date'].nunique()

Out[33]: 
A
M000511    1
M000833    2
Name: date, dtype: int64
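If you already have the aggregated Series, one alternative (not part of the original answer, just a sketch) is to look up each row's 'A' value in it with Series.map, which broadcasts the per-group counts back to one value per row. The sample data is again reconstructed from the question, and the variable name counts is my own:

```python
import pandas as pd

# Sample data reconstructed from the question (assumed construction)
df = pd.DataFrame({
    "A": ["M000833", "M000833", "M000833", "M000833", "M000511"],
    "date": ["2016-08-01", "2016-08-01", "2016-08-02", "2016-08-02", "2016-08-05"],
})

# One row per consumer: number of distinct visit dates
counts = df.groupby("A")["date"].nunique()

# Look up each row's 'A' in the aggregated Series, giving one value per row
df["noofvisits"] = df["A"].map(counts)

# Same result via transform, as in the answer above
via_transform = df.groupby("A")["date"].transform("nunique")

print(df)
```

Both routes give the same column here. transform is the more direct choice when you only need the broadcast result, while map is handy when the aggregated Series is also used elsewhere.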
