I have a DataFrame as below:
import pandas as pd
df = pd.DataFrame({'first': ['John', 'Mary', 'Peter'],
                   'last': ['Mary', 'John', 'Mary']})
df
Out[700]:
   first  last
0   John  Mary
1   Mary  John
2  Peter  Mary
I want to drop duplicate rows, where two rows count as duplicates if they contain the same values regardless of column order (rows 0 and 1 here). In this case, the expected output is:
   first  last
0   John  Mary
2  Peter  Mary
Below is my approach so far:
df['DropKey']=df.apply(lambda x: ''.join(sorted(pd.Series(x))),axis=1)
df.drop_duplicates('DropKey')
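A variant of the same idea that might scale better is sketched below: sort the values within each row using NumPy and call `duplicated` on the sorted array, instead of building a joined string key row by row with `apply`. This is only a sketch. It assumes every column holds mutually comparable values (for example, all strings) with no NaN, and `drop_unordered_duplicates` is a hypothetical helper name, not a pandas function.

```python
import numpy as np
import pandas as pd


def drop_unordered_duplicates(df):
    """Keep the first row of each group of rows sharing the same values in any column order.

    Hypothetical helper; assumes all values are mutually comparable (no NaN).
    """
    # Sort the values within each row so that column order no longer matters.
    sorted_vals = np.sort(df.to_numpy(), axis=1)
    # Flag rows whose sorted values already appeared in an earlier row.
    dup_mask = pd.DataFrame(sorted_vals, index=df.index).duplicated()
    # Keep only the rows that were not flagged, preserving the original columns.
    return df[~dup_mask.to_numpy()]


df = pd.DataFrame({'first': ['John', 'Mary', 'Peter'],
                   'last': ['Mary', 'John', 'Mary']})
result = drop_unordered_duplicates(df)
```

Unlike the `''.join(...)` key, this compares the sorted values column by column, so two different rows cannot collide just because their concatenated strings happen to match. It also avoids adding a helper column to `df`.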
Is there a more efficient way to achieve this?
My real data size:
df.shape
Out[709]: (10000, 607)
or: