pandas.factorize encodes input values as an enumerated type or categorical variable.
But how can I easily and efficiently convert many columns of a data frame? What about the reverse mapping step?
Example: This data frame contains columns with string values such as "type 2" which I would like to convert to numerical values - and possibly translate them back later.
I would like to redirect readers to my other answer: https://stackoverflow.com/a/32011969/1694714
Old answer
Another readable solution for this problem, when you want to keep the categories consistent across the resulting DataFrame, is to use replace:
This performs slightly worse than the example by @jezrael, but it is easier to read. It might also scale better for bigger datasets. I can do some proper testing if anyone is interested.
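The original snippet is not preserved here, so the following is only a minimal sketch of the replace idea. The sample frame, its column names and the helper names (values, mapping, encoded) are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data; dtype=object keeps the columns as plain Python strings.
df = pd.DataFrame(
    {"A": ["type 1", "type 2", "type 1"], "B": ["type 2", "type 3", "type 2"]},
    dtype=object,
)

# One shared mapping for the whole frame, numbered in order of first appearance
# (reading the frame row by row).
values = pd.unique(df.to_numpy().ravel())
mapping = {value: code for code, value in enumerate(values)}

# replace applies the same mapping to every column; astype makes the result integer.
encoded = df.replace(mapping).astype("int64")
```

Because the mapping is an ordinary dict, inverting it with {code: value for value, code in mapping.items()} gives the lookup needed to translate the codes back.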
You can use apply if you need to factorize each column separately:
If you need the same numeric value for the same string value:
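Both variants can be sketched roughly as follows. The sample frame is hypothetical, and the second variant flattens the values with NumPy instead of whatever the original snippet used, so treat it as one possible approach rather than the answer's exact code.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {"A": ["type 2", "type 1", "type 2"], "B": ["type 1", "type 3", "type 2"]}
)

# Variant 1: every column gets its own numbering, starting at 0.
per_column = df.apply(lambda col: pd.factorize(col)[0])

# Variant 2: one numbering shared by all columns.
# Flatten row by row, factorize once, then restore the original shape.
codes, uniques = pd.factorize(df.to_numpy().ravel())
consistent = pd.DataFrame(
    codes.reshape(df.shape), index=df.index, columns=df.columns
)
```

In the first variant "type 1" is 1 in column A but 0 in column B; in the second it is 1 everywhere.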
If you need to apply the function only for some columns, use a subset:
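A minimal sketch of the subset case, assuming a hypothetical frame in which only columns A and B should be encoded and column C must stay untouched:

```python
import pandas as pd

# Hypothetical sample data; C is a column we do not want to encode.
df = pd.DataFrame(
    {
        "A": ["type 2", "type 1", "type 2"],
        "B": ["type 1", "type 3", "type 2"],
        "C": ["x", "y", "z"],
    }
)

cols = ["A", "B"]  # assumed list of columns to convert

encoded = df.copy()
# Factorize each selected column separately and write the codes back.
encoded[cols] = df[cols].apply(lambda col: pd.factorize(col)[0])
```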
Solution with factorize:
Translating them back is possible via map with a dict, where you need to remove duplicates with drop_duplicates.
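The round trip described above can be sketched for a single column like this. The column names kind and code and the variable lookup are hypothetical, chosen only for the example.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"kind": ["type 2", "type 1", "type 2", "type 3"]})

# Forward step: factorize returns (codes, uniques); keep the integer codes.
df["code"] = pd.factorize(df["kind"])[0]

# Reverse step: one row per code after drop_duplicates, turned into a dict.
lookup = df.drop_duplicates("code").set_index("code")["kind"].to_dict()

# map translates every code back to its original string.
restored = df["code"].map(lookup)
```

The second return value of pd.factorize already holds the unique values in code order, so it can serve as the lookup directly when the encoded frame itself is not kept around.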
I also found this answer quite helpful: https://stackoverflow.com/a/20051631/4643212
I was trying to take values from an existing column in a Pandas DataFrame (a list of IP addresses named 'SrcIP') and map them to numerical values in a new column (named 'ID' in this example).
Solution:
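The snippet that belonged here is not preserved, so this is a minimal sketch assuming pd.factorize is used. The IP addresses are made-up sample values.

```python
import pandas as pd

# Hypothetical sample data: a column of source IP addresses with repeats.
df = pd.DataFrame(
    {
        "SrcIP": [
            "192.168.1.112",
            "192.168.1.112",
            "192.168.4.118",
            "192.168.1.112",
            "192.168.5.122",
        ]
    }
)

# Each distinct address gets one integer ID, numbered in order of first appearance.
df["ID"] = pd.factorize(df["SrcIP"])[0]
```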
Result:
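The original output is not preserved either. Under the assumption of the same made-up addresses, the sketch below shows what the ID column and the address-to-ID table (here called id_by_ip, a hypothetical name) would contain.

```python
import pandas as pd

# Same hypothetical sample addresses as assumed for the solution step.
df = pd.DataFrame(
    {
        "SrcIP": [
            "192.168.1.112",
            "192.168.1.112",
            "192.168.4.118",
            "192.168.1.112",
            "192.168.5.122",
        ]
    }
)
df["ID"] = pd.factorize(df["SrcIP"])[0]

# Repeated addresses share an ID.
ids = df["ID"].tolist()  # [0, 0, 1, 0, 2]

# One entry per distinct address.
id_by_ip = df.drop_duplicates("SrcIP").set_index("SrcIP")["ID"].to_dict()
```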