I have this DataFrame, in which the Gender column is expected to contain only male or female.
from io import StringIO
import pandas as pd
audit_trail = StringIO('''
course_id AcademicYear_to months TotalFee Gender
260 2017 24 100 male
260 2018 12 140 male
274 2016 36 300 mail
274 2017 24 340 female
274 2018 12 200 animal
285 2017 24 300 bird
285 2018 12 200 maela
''')
df11 = pd.read_csv(audit_trail, sep=" " )
I can correct the spelling mistakes using a dictionary.
corrections = {'mail': 'male', 'maela': 'male', 'maae': 'male'}
df11.Gender.replace(corrections)
But I am looking for a way to keep only male and female, and to put every other value into an "other" category. Expected output:
0 male
1 male
2 male
3 female
4 other
5 other
6 male
Name: Gender, dtype: object
Add two more dummy entries to your corrections dict:
corrections = {'male' : 'male', # dummy entry for male
'female' : 'female', # dummy entry for female
'mail' : 'male',
'maela' : 'male',
'maae' : 'male'}
Now, use map and fillna:
df11.Gender = df11.Gender.map(corrections).fillna('other')
df11
course_id AcademicYear_to months TotalFee Gender
0 260 2017 24 100 male
1 260 2018 12 140 male
2 274 2016 36 300 male
3 274 2017 24 340 female
4 274 2018 12 200 other
5 285 2017 24 300 other
6 285 2018 12 200 male
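The dummy entries matter because map and replace treat unmatched values differently: replace leaves them unchanged, while map with a dict returns NaN for any value that is not a key, and fillna can then turn those NaN values into 'other'. A minimal sketch of the difference, using a small made-up Series (the names s, typo_fixes, replaced, mapped, and result are only for illustration):

```python
import pandas as pd

# Small made-up sample, not the full DataFrame from the question.
s = pd.Series(['male', 'mail', 'female', 'animal', 'maela'])

# Typo fixes only: replace() leaves unmatched values such as 'animal' as they are.
typo_fixes = {'mail': 'male', 'maela': 'male'}
replaced = s.replace(typo_fixes)

# With the dummy entries added: map() yields NaN for anything that is not a key.
corrections = {'male': 'male', 'female': 'female', **typo_fixes}
mapped = s.map(corrections)

# fillna() converts those NaN values into the catch-all category.
result = mapped.fillna('other')
print(result.tolist())
```

Without the 'male' and 'female' dummy entries, the valid values would also become NaN under map and end up as 'other'.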
You can use:
corrections={'mail':'male', 'maela':'male', 'maae':'male', 'male':'male', 'female':'female'}
df11[['Gender']] = df11[['Gender']].applymap(corrections.get).fillna('other')
print(df11)
course_id AcademicYear_to months TotalFee Gender
0 260 2017 24 100 male
1 260 2018 12 140 male
2 274 2016 36 300 male
3 274 2017 24 340 female
4 274 2018 12 200 other
5 285 2017 24 300 other
6 285 2018 12 200 male
EDIT:
For replacing values in only one column, cᴏʟᴅsᴘᴇᴇᴅ's answer is better. If you want to replace values in multiple columns, applymap is the better choice.
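As a rough sketch of the multi-column case, assume a hypothetical second column Gender2 (it is not in the question's data). Since applymap is deprecated in pandas 2.1 and later in favour of DataFrame.map, this version uses apply with Series.map on each column, which should behave the same on older and newer versions:

```python
import pandas as pd

# Made-up frame; 'Gender2' is a hypothetical second column for illustration.
df = pd.DataFrame({
    'Gender': ['male', 'mail', 'animal'],
    'Gender2': ['female', 'maela', 'bird'],
})

corrections = {'mail': 'male', 'maela': 'male', 'maae': 'male',
               'male': 'male', 'female': 'female'}

cols = ['Gender', 'Gender2']
# Map each selected column through the dict; values that are not keys
# become NaN and are then filled with 'other'.
df[cols] = df[cols].apply(lambda col: col.map(corrections)).fillna('other')
print(df)
```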