I will simplify as much as possible. I have a DataFrame with a list of businesses by state. Some States are abbreviated, some are not. I want to replace the full state name with the Abbreviation (ex: New Jersey to NJ).
I found a cool module "US" found here that lists all the states and their abbreviations in a dictionary. What I would like to do is replace the full name with the abbreviations.
code:
import pandas as pd
import numpy as np
import us
dfp = pd.DataFrame({'A' : [np.NaN,np.NaN,3,4,5,5,3,1,5,np.NaN],
'B' : [1,0,3,5,0,0,np.NaN,9,0,0],
'C' : ['Pharmacy of Oklahoma','NY Pharma','NJ Pharmacy','Idaho Rx','CA Herbals','Florida Pharma','AK RX','Ohio Drugs','PA Rx','USA Pharma'],
'D' : [123456,123456,1234567,12345678,12345,12345,12345678,123456789,1234567,np.NaN],
'E' : ['Assign','Unassign','Assign','Ugly','Appreciate','Undo','Assign','Unicycle','Assign','Unicorn',]})
print(dfp)
statez = us.states.mapping('abbr', 'name')
lst_of_abbrv = statez.keys()
lst_of_states = statez.values()
phrase = "Pharmacy of Oklahoma"
for x in phrase.split():
if x in lst_of_states:
x= x.replace(x, 'State')
print(phrase.split())
Right now the only thing I'm able to do is use a string and replace it with the word "State". How do i replace the name with the abbreviations from the dictionary? I've tried and want something like x= x.replace(x, lst_of_abbrv)
but it errors because you obviously can't replace with dict_keys.
Extra points if you are able to explain how to apply this to column "C" of the Dataframe
First I would define a function that would replace the full name of states in a string if any exist or return the original string.
then you can apply this function to the entire column of the dataframe
Here is the complete solution: