Drop duplicate rows if they contain the same values

Posted 2020-04-17 05:21

I have a DataFrame as below:

import pandas as pd

df = pd.DataFrame({'first': ['John', 'Mary', 'Peter'],
                   'last': ['Mary', 'John', 'Mary']})

df
Out[700]: 
   first  last
0   John  Mary
1   Mary  John
2  Peter  Mary

I want to drop a row as a duplicate when it contains the same values as another row, regardless of column order. In this case, the expected output is:

   first  last  
0   John  Mary  
2  Peter  Mary

Below is my approach so far:

df['DropKey']=df.apply(lambda x: ''.join(sorted(pd.Series(x))),axis=1)
df.drop_duplicates('DropKey')

Is there a more efficient way to achieve this?

My real data size:

df.shape
Out[709]: (10000, 607)

Tags: python pandas

1 answer

▲ chillily

#2 · 2020-04-17 05:57

In [13]: pd.DataFrame(np.sort(df.values, axis=1), columns=df.columns).drop_duplicates()
Out[13]:
  first   last
0  John   Mary
2  Mary  Peter

or:

In [18]: df.values.sort(axis=1)  # NOTE: this sorts df in place (relies on df.values being a view, as for a single-dtype frame)

In [19]: df
Out[19]:
  first   last
0  John   Mary
1  John   Mary
2  Mary  Peter

In [20]: df.drop_duplicates()
Out[20]:
  first   last
0  John   Mary
2  Mary  Peter
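
Note that both variants above return the row-sorted values, so row 2 comes back as Mary / Peter rather than the Peter / Mary shown in the question's expected output. If the original rows should be kept, one option is to use the sorted values only to build a boolean mask. This is a minimal sketch, not part of the original answer; the names `sorted_values`, `is_dup` and `result` are made up for illustration, and it assumes every value in a row can be compared with the others (for example, all strings and no NaN).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'first': ['John', 'Mary', 'Peter'],
                   'last': ['Mary', 'John', 'Mary']})

# np.sort returns a sorted copy, so df itself is not modified.
# Sorting within each row (axis=1) makes the column order irrelevant.
sorted_values = np.sort(df.to_numpy(), axis=1)

# Flag rows whose sorted values repeat an earlier row.
# Reusing df.index keeps the mask aligned with the original frame.
is_dup = pd.DataFrame(sorted_values, index=df.index).duplicated()

# Keep the first occurrence of each combination, with its original values.
result = df[~is_dup]
print(result)
```

Because the mask is applied to the untouched `df`, the original values and index labels are preserved, and no helper column such as `DropKey` is needed.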
