Replace column values using a dictionary

Posted 2019-02-26 06:45

Question:

I have this dataframe where gender is expected to be male or female.

from io import StringIO
import pandas as pd

audit_trail = StringIO('''
course_id AcademicYear_to months TotalFee Gender
260 2017 24 100 male
260 2018 12 140 male
274 2016 36 300 mail
274 2017 24 340 female
274 2018 12 200 animal
285 2017 24 300 bird
285 2018 12 200 maela
''')

df11 = pd.read_csv(audit_trail, sep=" ")

I can correct the spelling mistakes using a dictionary:

corrections = {'mail': 'male', 'maela': 'male', 'maae': 'male'}
df11.Gender.replace(corrections)

But I am looking for a way to keep only male and female, and put every other value into an "other" category. Expected output:

0      male
1      male
2      male
3    female
4    other
5    other
6      male
Name: Gender, dtype: object

Answer 1:

Add two more dummy entries to your corrections dict, so that the valid values map to themselves:

corrections = {'male'   : 'male',    # dummy entry for male
               'female' : 'female',  # dummy entry for female
               'mail'   : 'male', 
               'maela'  : 'male', 
               'maae'   : 'male'}

Now, use map and fillna:

df11.Gender = df11.Gender.map(corrections).fillna('other')
df11

   course_id  AcademicYear_to  months  TotalFee  Gender
0        260             2017      24       100    male
1        260             2018      12       140    male
2        274             2016      36       300    male
3        274             2017      24       340  female
4        274             2018      12       200   other
5        285             2017      24       300   other
6        285             2018      12       200    male
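
The reason the dummy entries matter can be seen by comparing replace and map on the same values. The following is a minimal sketch, assuming a standalone Series built from the Gender values in the question (the variable names are illustrative, not from the original post):

```python
import pandas as pd

# Same Gender values as in the question, as a standalone Series.
gender = pd.Series(["male", "male", "mail", "female", "animal", "bird", "maela"])

corrections = {"male": "male", "female": "female",
               "mail": "male", "maela": "male", "maae": "male"}

# replace() leaves values that are not dictionary keys unchanged,
# so 'animal' and 'bird' survive as they are.
replaced = gender.replace(corrections)

# map() turns values that are not dictionary keys into NaN,
# which fillna() can then convert into a single catch-all label.
mapped = gender.map(corrections)
cleaned = mapped.fillna("other")

print(replaced.tolist())
print(mapped.isna().tolist())
print(cleaned.tolist())
```

Without the 'male' and 'female' entries, map would turn the already-correct values into NaN as well, and fillna would then label them "other".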


Answer 2:

You can use:

corrections={'mail':'male', 'maela':'male', 'maae':'male', 'male':'male', 'female':'female'}
df11[['Gender']] = df11[['Gender']].applymap(corrections.get).fillna('other')
print(df11)
   course_id  AcademicYear_to  months  TotalFee  Gender
0        260             2017      24       100    male
1        260             2018      12       140    male
2        274             2016      36       300    male
3        274             2017      24       340  female
4        274             2018      12       200   other
5        285             2017      24       300   other
6        285             2018      12       200    male

EDIT:

To replace values in only one column, cᴏʟᴅsᴘᴇᴇᴅ's answer is better. If you want to replace values in multiple columns, applymap is the better choice.
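
For the multi-column case, here is a minimal sketch that applies Series.map column by column. The second column, "Gender2", is a hypothetical addition for illustration and is not in the original data. As far as I know, pandas 2.1 and later deprecate DataFrame.applymap in favour of DataFrame.map, so this per-column form may be the safer choice across versions:

```python
import pandas as pd

# Hypothetical frame: "Gender2" is an invented second column for illustration.
df = pd.DataFrame({
    "Gender": ["male", "mail", "animal"],
    "Gender2": ["female", "maela", "bird"],
})

corrections = {"male": "male", "female": "female",
               "mail": "male", "maela": "male", "maae": "male"}

cols = ["Gender", "Gender2"]

# Map each selected column through the dictionary (unmatched values become NaN),
# then label every unmatched value as "other".
df[cols] = df[cols].apply(lambda col: col.map(corrections)).fillna("other")

print(df)
```

Only the columns listed in cols are touched, so numeric columns elsewhere in the frame are left alone.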