Pandas: for all set of duplicate entries in a part

Posted 2019-05-03 07:39

I have a large Dataframe that looks similar to this:

     ID_Code    Status1    Status2
0      A          Done       Not
1      A          Done       Done
2      B          Not        Not
3      B          Not        Done
4      C          Not        Not
5      C          Not        Not
6      C          Done       Done

What I want to do is, for each set of duplicate ID codes, calculate the percentage of entries that are Not-Not, i.e. (number of Not-Not entries / total number of entries for that ID code) * 100.

I'm struggling to do so using groupby and can't seem to get the right syntax to perform this.
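
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample frame from the table above. The variable name `df` is an assumption chosen to match the answers, and `group_sizes` is a hypothetical helper that only shows the denominator of the requested percentage.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "ID_Code": ["A", "A", "B", "B", "C", "C", "C"],
    "Status1": ["Done", "Done", "Not", "Not", "Not", "Not", "Done"],
    "Status2": ["Not", "Done", "Not", "Done", "Not", "Not", "Done"],
})

# Number of rows per ID code: the "total entries" part of the formula.
group_sizes = df.groupby("ID_Code").size()
print(group_sizes)
```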

Tags: python pandas pandas-groupby

3 Answers

Ridiculous、

Reply #2 · 2019-05-03 08:09

Using mean and a boolean mask (the mean of a boolean Series is the fraction of True values):

df.filter(like='Status').eq('Not').all(axis=1).groupby(df.ID_Code).mean().mul(100)

ID_Code
A     0.000000
B    50.000000
C    66.666667
dtype: float64
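
The one-liner above can be unpacked into separate steps, which may make the logic easier to follow. This is only a sketch on the sample data from the question; the intermediate names `is_not`, `both_not` and `pct` are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID_Code": ["A", "A", "B", "B", "C", "C", "C"],
    "Status1": ["Done", "Done", "Not", "Not", "Not", "Not", "Done"],
    "Status2": ["Not", "Done", "Not", "Done", "Not", "Not", "Done"],
})

# Step 1: cell-wise comparison, True where a status equals 'Not'.
is_not = df[["Status1", "Status2"]].eq("Not")

# Step 2: reduce across columns, True only where both statuses are 'Not'.
both_not = is_not.all(axis=1)

# Step 3: group the boolean mask by ID code; the mean of booleans is the
# fraction of True values, so multiplying by 100 gives a percentage.
pct = both_not.groupby(df["ID_Code"]).mean().mul(100)
print(pct)
```

Selecting the two columns by name instead of `filter(like='Status')` is a deliberate choice here: it avoids accidentally picking up other columns whose names happen to contain "Status".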


成全新的幸福

Reply #3 · 2019-05-03 08:12

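If I understand correctly, you can use crosstab. With normalize='index' each row sums to 1, so the values are fractions rather than percentages: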

pd.crosstab(df['ID_Code'], df['Status1'].eq('Not') & df['Status2'].eq('Not'), normalize='index')
Out[713]: 
col_0       False     True 
ID_Code                    
A        1.000000  0.000000
B        0.500000  0.500000
C        0.333333  0.666667



To keep only the Not-Not fraction, select the True column (multiply by 100 for a percentage):

pd.crosstab(df['ID_Code'], df['Status1'].eq('Not') & df['Status2'].eq('Not'), normalize='index')[True]
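
As a hedged variation on the crosstab approach, the sketch below labels the two cases with strings before tabulating and then converts the Not-Not share to a percentage. The labels "Not-Not" and "other" and the names `pair`, `share` and `pct` are assumptions introduced for illustration. The `reindex` call is a guard for data where no row is Not-Not, in which case that column would otherwise be missing.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID_Code": ["A", "A", "B", "B", "C", "C", "C"],
    "Status1": ["Done", "Done", "Not", "Not", "Not", "Not", "Done"],
    "Status2": ["Not", "Done", "Not", "Done", "Not", "Not", "Done"],
})

# Label each row: 'Not-Not' when both statuses are 'Not', 'other' otherwise.
both_not = df["Status1"].eq("Not") & df["Status2"].eq("Not")
pair = both_not.map({True: "Not-Not", False: "other"}).rename("pair")

# Row-normalised contingency table: each row sums to 1.
share = pd.crosstab(df["ID_Code"], pair, normalize="index")

# Keep the Not-Not column (filled with 0 if absent) and scale to percent.
pct = share.reindex(columns=["Not-Not"], fill_value=0.0)["Not-Not"].mul(100)
print(pct)
```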


冷血范

Reply #4 · 2019-05-03 08:18

I may have misunderstood the question, but you appear to be referring to rows where Status1 and Status2 are both Not, correct? If that's the case, you can do something like:

df.groupby('ID_Code').apply(lambda x: (x[['Status1', 'Status2']] == 'Not').all(axis=1).sum() / len(x) * 100)

ID_Code
A     0.000000
B    50.000000
C    66.666667
dtype: float64
