Pandas Python Regex: error: nothing to repeat

Posted 2019-02-19 03:00

Question:

I have a dataframe with a couple of strange characters, "*" and "-".

import pandas as pd
import numpy as np

data = {'year': [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
        'team': ['Bears', 'Bears', 'Bears', 'Packers', 'Packers', 'Lions', 'Lions', 'Lions'],
        'wins': [11, '*', 10, '-', 11, 6, 10, 4],
        'losses': [5, 8, 6, 1, 5, 10, 6, 12]}
football = pd.DataFrame(data, columns=['year', 'team', 'wins', 'losses'])

I would like to replace the strange characters with '0.00', but I get an error:

error: nothing to repeat

I understand this is linked to regex, but I still don't know how to overcome the issue.

The code I use to replace the characters:

football.replace(['*','-'], ['0.00','0.00'], regex=True).astype(np.float64)

Answer 1:

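* is a special character in regex (a quantifier), so you have to escape it. Also convert only the wins column, because astype(np.float64) on the whole frame fails on the team names:

football['wins'] = football['wins'].replace([r'\*', '-'], '0.00', regex=True).astype(np.float64)

or use a character class:

football['wins'] = football['wins'].replace('[*-]', '0.00', regex=True).astype(np.float64)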
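To see where the error message comes from, here is a minimal sketch using only the standard `re` module, which is where pandas sends the pattern when `regex=True`. The variable names `message`, `patterns`, and `results` are illustrative and not part of the original question.

```python
import re

# A bare '*' is a quantifier with nothing in front of it,
# so the pattern cannot be compiled.
try:
    re.compile('*')
    message = None
except re.error as exc:
    message = str(exc)  # contains "nothing to repeat"

# Three equivalent ways to match a literal asterisk:
# a raw-string escape, re.escape, and a character class.
patterns = [r'\*', re.escape('*'), '[*]']
results = [re.sub(p, '0.00', '*') for p in patterns]
```

`re.escape` is handy when the characters to replace come from user input or a config file and you cannot escape them by hand.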


Answer 2:

Turn the regex off:

football.replace(['*','-'], ['0.00','0.00'], regex=False)

That is, there is no need for a regular expression in a simple case where a value is exactly one character or another.

If you do want a regular expression, note that * is a special character. To match only values that are exactly '*' or '-', anchor a character class:

football.replace('^[*-]$', '0.00', regex=True)
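The difference between exact matching, an unanchored regex, and the anchored pattern above can be sketched on a small Series. The sample value 'Green-Bay' is made up for illustration (it is not in the original data); it stands in for a legitimate string that happens to contain a hyphen.

```python
import pandas as pd

# Mixed column: a number, two placeholders, and a string containing '-'.
s = pd.Series([11, '*', '-', 'Green-Bay'], dtype=object)

# regex=False: only cells whose whole value equals '*' or '-' are replaced.
exact = s.replace(['*', '-'], '0.00', regex=False)

# Unanchored regex: every '*' or '-' inside any string is substituted.
unanchored = s.replace('[*-]', '0.00', regex=True)

# Anchored regex: only strings consisting of a single '*' or '-' match.
anchored = s.replace('^[*-]$', '0.00', regex=True)

print(exact.tolist())
print(unanchored.tolist())
print(anchored.tolist())
```

In this sketch the unanchored pattern also rewrites the hyphen inside 'Green-Bay', which is why anchoring (or `regex=False`) is the safer choice when other columns hold free text.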


Answer 3:

You could use a list comprehension within a dict comprehension to do this:

>>> {key: [i if i not in {'*','-'} else '0.00' for i in values] for key, values in data.items()}
{'year': [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
 'wins': [11, '0.00', 10, '0.00', 11, 6, 10, 4],
 'losses': [5, 8, 6, 1, 5, 10, 6, 12],
 'team': ['Bears', 'Bears', 'Bears', 'Packers', 'Packers', 'Lions', 'Lions', 'Lions']}

Do this to clean up the data before you build the DataFrame.
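Putting that together with the question's data, one possible end-to-end version might look like the sketch below. The names `placeholders` and `cleaned` are hypothetical, and the final conversion is applied to the wins column only, on the assumption that the team column should stay as text.

```python
import pandas as pd

data = {'year': [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
        'team': ['Bears', 'Bears', 'Bears', 'Packers', 'Packers', 'Lions', 'Lions', 'Lions'],
        'wins': [11, '*', 10, '-', 11, 6, 10, 4],
        'losses': [5, 8, 6, 1, 5, 10, 6, 12]}

# Hypothetical names for illustration: `placeholders` and `cleaned`.
placeholders = {'*', '-'}
cleaned = {key: ['0.00' if value in placeholders else value for value in values]
           for key, values in data.items()}

football = pd.DataFrame(cleaned, columns=['year', 'team', 'wins', 'losses'])

# Convert only the wins column; the team column holds names, not numbers.
football['wins'] = football['wins'].astype('float64')
print(football)
```

Because no regex is involved here, the "nothing to repeat" error cannot occur, at the cost of handling the placeholders before pandas ever sees them.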