Duplicate rows in pandas DF

Posted 2020-02-08 03:34

I have a DF in Pandas, which looks like:

Letters Numbers
A       1
A       3
A       2
A       1
B       1
B       2
B       3
C       2
C       2

I'm looking to count how many times each identical row occurs and save the result in a third column. For example, the output I'm looking for:

Letters Numbers Events
A       1       2
A       2       1
A       3       1
B       1       1
B       2       1
B       3       1
C       2       2

An example of what I'm looking to do is here. The best idea I've come up with is to use value_counts(), but I think this works on only one column. Another idea is to use duplicated(). Either way, I don't want to write a for loop; I'm pretty sure a Pythonic alternative exists.

Tags: pandas count duplicates row

2 answers

兄弟一词,经得起流年.

#2 · 2020-02-08 03:41

You can use a combination of groupby, transform, and then drop_duplicates:

In [84]:

# Count rows per (Letters, Numbers) pair and broadcast the count back to every row
df['Events'] = df.groupby(['Letters', 'Numbers'])['Numbers'].transform('size')
df.drop_duplicates()
Out[84]:
  Letters  Numbers  Events
0       A        1       2
1       A        3       1
2       A        2       1
4       B        1       1
5       B        2       1
6       B        3       1
7       C        2       2


一夜七次

#3 · 2020-02-08 03:50

You can group by these two columns and then calculate the sizes of the groups:

In [16]: df.groupby(['Letters', 'Numbers']).size()
Out[16]: 
Letters  Numbers
A        1          2
         2          1
         3          1
B        1          1
         2          1
         3          1
C        2          2
dtype: int64

To get a DataFrame like in your example output, you can reset the index with reset_index.
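
As a minimal sketch of that last step, the snippet below rebuilds the sample data from the question and chains reset_index onto size(). The variable name counts and the name="Events" argument are my own choices for illustration, not something the answer above specifies.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Letters": list("AAAABBBCC"),
    "Numbers": [1, 3, 2, 1, 1, 2, 3, 2, 2],
})

# size() returns a Series indexed by (Letters, Numbers).
# reset_index(name=...) turns those index levels back into columns
# and names the count column "Events" (name chosen to match the question).
counts = df.groupby(["Letters", "Numbers"]).size().reset_index(name="Events")
print(counts)
```

Because groupby sorts by the group keys by default, the rows should come out ordered by Letters and then Numbers, which is the layout the question asks for.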
