I just learnt from "Format numbers as currency in Python" that the Python module babel provides babel.numbers.format_currency to format numbers as currency. For instance,
from babel.numbers import format_currency
s = format_currency(123456.789, 'USD', locale='en_US') # u'$123,456.79'
s = format_currency(123456.789, 'EUR', locale='fr_FR') # u'123\xa0456,79\xa0\u20ac'
How about the reverse, from currency to numbers, such as $123,456,789.00 --> 123456789? babel provides babel.numbers.parse_number to parse localized numbers, but I didn't find anything like parse_currency. So, what is the ideal way to parse a localized currency string into a number?
I went through "Python: removing characters except digits from string":
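# Way 1 (Python 2 only: string.maketrans was removed in Python 3)
import string
all_chars = string.maketrans('', '')                        # identity table of all 256 byte values
non_digits = all_chars.translate(all_chars, string.digits)  # every byte except 0-9
s = '$123,456.79'
n = s.translate(all_chars, non_digits)  # '12345679', lost `.`
# Way 2
import re
n = re.sub(r"\D", "", s)  # '12345679'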
Neither way preserves the decimal separator (.).
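Way 1 depends on Python 2's string.maketrans, which no longer exists in Python 3. A minimal Python 3 equivalent is sketched below (the helper name strip_non_digits is only for illustration). It uses str.maketrans with its third argument, which lists the characters to delete. Because Python 3 strings are Unicode, it deletes only the non-digit characters that actually occur in the input, and it has the same limitation of dropping the decimal point:

```python
import string


def strip_non_digits(s):
    # str.maketrans('', '', chars) builds a table that deletes every char in chars.
    # Python 3 str is Unicode, so instead of a table of "all characters" we list
    # only the non-digit characters present in s.
    non_digits = ''.join(set(s) - set(string.digits))
    return s.translate(str.maketrans('', '', non_digits))


print(strip_non_digits('$123,456.79'))  # 12345679, the '.' is still lost
```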
Next, I tried removing all non-numeric characters except the decimal point (.) from the string (refer to here):
import re
# Way 1:
s = '$123,456.79'
n = re.sub(r"[^0-9.]", "", s)  # '123456.79' (a '|' inside [...] would be kept as a literal pipe)
# Way 2:
non_decimal = re.compile(r'[^\d.]+')
s = '$123,456.79'
n = non_decimal.sub('', s) # 123456.79
This does preserve the decimal separator (.).
But the above solutions don't work for other locales, for instance:
from babel.numbers import format_currency
s = format_currency(123456.789, 'EUR', locale='fr_FR') # u'123\xa0456,79\xa0\u20ac'
new_s = s.encode('utf-8') # 123 456,79 €
As you can see, the currency format varies by locale (grouping separator, decimal mark, position of the currency symbol). What is the ideal way to parse currency strings into numbers in general?
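To make the failure concrete, here is a small Python 3 demonstration using the two strings from above. The same regex that works for the US format silently returns a wrong value for the French one, because the comma decimal mark is stripped rather than rejected:

```python
import re
from decimal import Decimal

non_decimal = re.compile(r'[^\d.]+')

us = non_decimal.sub('', '$123,456.79')              # '123456.79'
fr = non_decimal.sub('', '123\xa0456,79\xa0\u20ac')  # '12345679' (comma removed)

print(Decimal(us))  # 123456.79, correct
print(Decimal(fr))  # 12345679, off by a factor of 100
```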
Using babel
The babel documentation notes that number parsing is not fully implemented yet, but a lot of work has gone into getting currency information into the library. You can use
get_currency_name() and get_currency_symbol() to get currency details, and all the other get_... functions to get the usual number details (decimal point, minus sign, etc.).
Using that information you can strip the currency details (name, symbol) and the grouping separators (e.g. , in the US) from a currency string. Then you change the decimal details into the ones used by the C locale (- for minus, and . for the decimal point).
This results in this code (I added an object to keep some of the data, which may come in handy in further processing):
The output looks promising (in the US locale):
And it still works in different locales (Brazil is notable for using the comma as a decimal mark):
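The strip-and-replace steps described above can be sketched in Python 3 as follows. This is an illustration of the technique, not the answer's original code. The function name parse_localized_currency and its parameters are hypothetical, and to keep the sketch standard-library only, the caller passes in the locale symbols instead of the function looking them up:

```python
from decimal import Decimal, InvalidOperation


def parse_localized_currency(text, decimal_symbol, group_symbol,
                             minus_sign='-', currency_markers=()):
    """Hypothetical helper: turn a locale-formatted currency string into a Decimal.

    The caller supplies the locale's symbols (e.g. looked up with babel).
    """
    # 1. Remove currency markers (symbol, ISO code, name), longest first,
    #    so that 'R$' is removed whole before a bare '$' could split it.
    for marker in sorted(currency_markers, key=len, reverse=True):
        text = text.replace(marker, '')
    # 2. Remove the grouping separator, plus the ordinary, no-break and narrow
    #    no-break spaces that locales put between the number and the symbol.
    text = text.replace(group_symbol, '')
    for space in (' ', '\xa0', '\u202f'):
        text = text.replace(space, '')
    # 3. Map the locale's minus sign and decimal mark to the C-locale '-' and '.'.
    text = text.replace(minus_sign, '-').replace(decimal_symbol, '.')
    try:
        return Decimal(text)
    except InvalidOperation:
        raise ValueError('cannot parse currency string: %r' % text) from None


print(parse_localized_currency('$123,456.79', '.', ',', currency_markers=('$', 'USD')))
print(parse_localized_currency('123\xa0456,79\xa0\u20ac', ',', '\xa0', currency_markers=('\u20ac', 'EUR')))
print(parse_localized_currency('R$\xa0123.456,79', ',', '.', currency_markers=('R$', 'BRL')))
```

All three calls print 123456.79. With babel installed, the symbol arguments could come from babel.numbers.get_decimal_symbol, get_group_symbol, get_minus_sign_symbol and get_currency_symbol. The exact characters they return (for example the French grouping space) depend on the CLDR data shipped with your babel version, which is why the sketch also strips the common space variants.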
It is worth pointing out that babel has some encoding problems. That is because the locale files (in locale-data) themselves use different encodings. If you're working with currencies you're familiar with, that should not be a problem. But if you try unfamiliar currencies you might run into problems (I just learned that Poland uses iso-8859-2, not iso-8859-1).
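To illustrate the kind of problem meant here, consider this generic Python 3 example (not babel code). The Polish złoty symbol zł contains ł, which ISO-8859-2 encodes as byte 0xB3. Decoding those same bytes as ISO-8859-1 silently yields a different character instead of raising an error:

```python
# 'zł' is the Polish złoty symbol; 'ł' exists in ISO-8859-2 but not in ISO-8859-1.
raw = 'zł'.encode('iso-8859-2')  # b'z\xb3'

print(raw.decode('iso-8859-2'))  # zł, correct codec
print(raw.decode('iso-8859-1'))  # z³, wrong codec: superscript three instead of ł
```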