Normally I anonymize my data with hashlib, using the .apply(hash) function.
Now I'm trying a new approach. Imagine I have the following df called 'data':
contributor -- amount payed
eric -- 10
frank -- 28
john -- 49
frank -- 77
barbara -- 31
I want to anonymize it by turning all the names into 'person1', 'person2', etc., like this:
contributor -- amount payed
person1 -- 10
person2 -- 28
person3 -- 49
person2 -- 77
person4 -- 31
So my first thought was to summarize the contributor column so that each name is attached to a unique index, and I can use that index as the number after 'person'.
Now I'm stuck on how to iterate through the data.contributor column, look up each name's index in the summarize dataframe, and replace the actual name with, for example, 'person3'.
My code so far:
counter = 0
for names in data.contributor:
    if names == summarize.contributor[counter]:
        print(summarize.contributor[counter])
        data.contributor.replace(summarize.contributor[counter], "Person %d" % counter)
    counter = counter + 1
My thought was to put the names in a list with their index, but I guess there's a faster way. Searching for 'Anthony' was just a test to see if my code was working.
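To make the goal concrete, here is a minimal sketch of the "unique name to index" idea described above, without an explicit loop. It assumes pandas and uses pd.factorize, which is one possible way to do this and not necessarily the fastest. The sample frame is rebuilt from the table above, and the names `codes` and `uniques` are only for illustration.

```python
import pandas as pd

# Sample frame rebuilt from the table in the question
data = pd.DataFrame({
    "contributor": ["eric", "frank", "john", "frank", "barbara"],
    "amount payed": [10, 28, 49, 77, 31],
})

# factorize() gives every distinct name an integer code,
# numbered in order of first appearance (starting at 0)
codes, uniques = pd.factorize(data["contributor"])

# Turn the codes into labels; +1 so numbering starts at 'person1'
data["contributor"] = ["person%d" % (code + 1) for code in codes]

print(data)
```

Here `uniques` holds the original names in code order, so it could be kept separately as a lookup table or simply discarded.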