I'm trying to remove duplicates based on unique values in column 'new'. I have tried two methods, but df.shape is the same before and after, which suggests that removing the duplicates fails.
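import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 4), columns=list('ABCD'))
df['new'] = [1, 1, 3, 4, 5, 1, 7, 8, 1, 10]
df['new2'] = [1, 1, 2, 4, 5, 3, 7, 8, 9, 5]
print(df.shape)
# Method 1: drop rows whose 'new' value was already seen
# (keep='first' replaces take_last=False, which newer pandas no longer accepts)
df.drop_duplicates('new', keep='first')
# Method 2: group by 'new'
df.groupby('new').max()
print(df.shape)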
# output
(10, 6)
(10, 6)
You need to assign the result of drop_duplicates. By default inplace=False, so it returns a modified copy of the df; since you don't pass inplace=True, your original df is unmodified. If you passed inplace=True, it would work.
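A minimal sketch of both fixes, reusing the question's data. It assumes a recent pandas, where take_last has been removed and keep='first' is its replacement. Variable names such as deduped, df2 and grouped are only illustrative. The groupby method from the question has the same issue: it also returns a new object instead of changing df.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 4), columns=list('ABCD'))
df['new'] = [1, 1, 3, 4, 5, 1, 7, 8, 1, 10]
df['new2'] = [1, 1, 2, 4, 5, 3, 7, 8, 9, 5]
print(df.shape)  # (10, 6)

# Fix 1: assign the returned copy to a name (keep='first' == old take_last=False)
deduped = df.drop_duplicates(subset='new', keep='first')
print(deduped.shape)  # (7, 6)
print(df.shape)       # the original is untouched: (10, 6)

# Fix 2: modify the frame in place; the method then returns None
df2 = df.copy()
result = df2.drop_duplicates(subset='new', keep='first', inplace=True)
print(result)     # None
print(df2.shape)  # (7, 6)

# groupby also returns a new object. It takes the per-column maximum within
# each group (not an original row) and moves 'new' into the index,
# so only 5 columns remain.
grouped = df.groupby('new').max()
print(grouped.shape)  # (7, 5)
```

Note that the two approaches are not equivalent: drop_duplicates keeps the first complete row for each value of 'new', whereas groupby('new').max() builds new rows from column-wise maxima, so its values for A to D and new2 may come from different original rows.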