I am trying to convert a list of country name data to ISO 3166 country codes (alpha-3) using the pycountry library. My basic function is as follows:

    import pycountry as pc

    def guess_country(data, output='alpha3', verbose=False):
        # Check whether the data is already an alpha-3 code
        try:
            country = pc.countries.get(alpha3=data)
            return country
        except KeyError:
            pass  # KeyError raised, data doesn't directly match
        # Check whether the data is an actual country name
        try:
            country = pc.countries.get(name=data)
            return country
        except KeyError:
            pass  # KeyError raised, data doesn't directly match
        # Check a regular expression against 'data' in an attempt to match

The issue is that the country name data is rather dirty. A short sample:
GUATMAL, CHINA T, COLOMB, MEXICO, HG KONG
Does anyone know of a package that returns the best 'guess' match for a given cntry_name? I would be happy for some entries to be rejected as too difficult (e.g. China T -> Taiwan). It would also be nice if the best guess came with a measure of certainty.
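
For illustration, here is a rough sketch of the kind of behaviour I have in mind, using only `difflib` from the standard library. The `COUNTRIES` table, the `best_guess` name and the `cutoff` value are all made up for this example; in practice the candidate names would come from the country library rather than a hand-written dict, and the similarity ratio is only a crude stand-in for a real certainty measure.

```python
from difflib import SequenceMatcher

# Hypothetical sample table (name -> alpha-3); a real version would be
# built from the full country list instead of being typed by hand.
COUNTRIES = {
    "Guatemala": "GTM",
    "China": "CHN",
    "Colombia": "COL",
    "Mexico": "MEX",
    "Hong Kong": "HKG",
    "Taiwan": "TWN",
}


def best_guess(raw, candidates=COUNTRIES, cutoff=0.6):
    """Return (alpha3, name, score) for the closest name, or None.

    score is difflib's similarity ratio in [0, 1]; anything below
    `cutoff` (an arbitrary threshold) is rejected.
    """
    cleaned = raw.strip().lower()
    best_name, best_score = None, 0.0
    for name in candidates:
        score = SequenceMatcher(None, cleaned, name.lower()).ratio()
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None or best_score < cutoff:
        return None
    return candidates[best_name], best_name, round(best_score, 2)


if __name__ == "__main__":
    for dirty in ["GUATMAL", "CHINA T", "COLOMB", "MEXICO", "HG KONG"]:
        print(dirty, "->", best_guess(dirty))
```

Note that with plain string similarity, CHINA T comes out as China rather than Taiwan, which is exactly the sort of case I would be happy to have rejected or flagged for manual review.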