Flexible numeric string parsing in Python

Posted 2019-04-24 17:59

Are there any Python libraries that help parse and validate numeric strings beyond what is supported by the built-in float() function? For example, in addition to simple numbers (1234.56) and scientific notation (3.2e15), I would like to be able to parse formats like:

  • Numbers with commas: 2,147,483,647
  • Named large numbers: 5.5 billion
  • Fractions: 1/4

I did a bit of searching and could not find anything, though I would be surprised if such a library did not already exist.

4 answers
兄弟一词,经得起流年.
Reply #2 · 2019-04-24 18:07

If you want to convert "localized" numbers such as the American "2,147,483,647" form, you can use the atof() function from the locale module. Example:

import locale
# The locale name is platform-dependent; many Linux systems need 'en_US.UTF-8'.
locale.setlocale(locale.LC_NUMERIC, 'en_US')
print(locale.atof('1,234,456.23'))  # Prints 1234456.23

As for fractions, Python now handles them directly (since version 2.6); they can even be built from a string:

from fractions import Fraction
x = Fraction('1/4')
print(float(x))  # 0.25

Thus, you can parse plain numbers, comma-grouped numbers, and fractions with nothing more than the two standard modules above:

try:
    num = float(num_str)
except ValueError:
    try:
        num = locale.atof(num_str)
    except ValueError:
        try:
            num = float(Fraction(num_str))
        except ValueError:
            # Or handle '42 billion' here
            raise ValueError("Cannot parse '%s'" % num_str)
# 'num' now holds the numerical value of 'num_str'.
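
The cascade above can also be wrapped in a reusable function. What follows is a minimal sketch rather than part of the original answer: the name parse_number is hypothetical, and it swaps locale.atof() for a regular expression that accepts only US-style comma grouping, so it does not depend on the en_US locale being installed or on changing process-wide locale settings. It also treats a zero denominator as a parse failure, which the nested version does not.

```python
import re
from fractions import Fraction

# Assumption: only US-style grouping is accepted (groups of three digits
# separated by commas, with an optional decimal part).
_GROUPED = re.compile(r'[+-]?\d{1,3}(,\d{3})+(\.\d*)?')

def parse_number(num_str):
    """Return num_str as a float, trying each supported format in turn."""
    text = num_str.strip()
    try:
        return float(text)  # plain and scientific notation
    except ValueError:
        pass
    if _GROUPED.fullmatch(text):
        return float(text.replace(',', ''))  # comma-grouped numbers
    try:
        return float(Fraction(text))  # fractions such as '1/4'
    except (ValueError, ZeroDivisionError):
        raise ValueError("Cannot parse %r" % num_str)
```

Because the regular expression requires exactly three digits after each comma, malformed input such as '1,23' is rejected instead of being silently read as 123.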
爷、活的狠高调
Reply #3 · 2019-04-24 18:12

Babel supports the first case (internationalized numbers with commas). Docs: http://babel.edgewall.org/wiki/ApiDocs/babel.numbers

Supporting simple named numbers should not be too hard to code up yourself, same with fractions.
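
As a rough illustration of how little code the named-number case needs, here is a minimal sketch. The function name parse_named_number and the multiplier table are hypothetical, and the table assumes short-scale (American) names, where a billion is 10**9.

```python
import re

# Assumption: short-scale (American) names; extend the table as needed.
_MULTIPLIERS = {
    'thousand': 10**3,
    'million': 10**6,
    'billion': 10**9,
    'trillion': 10**12,
}

# A decimal number, optional whitespace, then a single alphabetic word.
_NAMED = re.compile(r'([+-]?\d+(?:\.\d+)?)\s*([A-Za-z]+)')

def parse_named_number(text):
    """Parse strings such as '5.5 billion' into a float."""
    match = _NAMED.fullmatch(text.strip())
    if match is None:
        raise ValueError("Cannot parse %r" % text)
    value, name = match.groups()
    try:
        multiplier = _MULTIPLIERS[name.lower()]
    except KeyError:
        raise ValueError("Unknown number name %r" % name)
    return float(value) * multiplier
```

This only covers a numeral followed by one name; spelled-out quantities such as "seven billion" would need a real grammar.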

淡お忘
Reply #4 · 2019-04-24 18:25

I haven't heard of one. Do you know of any such library for any other languages? That way you could leverage their documentation and tests.

If you can't find one, write a bunch of test cases, then we can help you fill out the parsing code.
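
A starting set of test cases, drawn from the formats listed in the question, might look like the sketch below. The table layout and the helper name failing_cases are hypothetical; the expected values follow directly from the question's examples.

```python
# (input string, expected value) pairs taken from the question's examples.
CASES = [
    ('1234.56', 1234.56),
    ('3.2e15', 3.2e15),
    ('2,147,483,647', 2147483647.0),
    ('5.5 billion', 5.5e9),
    ('1/4', 0.25),
]

def failing_cases(parse):
    """Return the inputs that `parse` rejects or converts incorrectly."""
    failures = []
    for text, expected in CASES:
        try:
            ok = parse(text) == expected
        except (ValueError, ZeroDivisionError):
            ok = False
        if not ok:
            failures.append(text)
    return failures
```

Calling failing_cases(float) lists the three formats the built-in cannot handle, which gives a concrete target for any candidate parser.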

Google must have one (try searching for 5.5billion * 10), but I don't think they have open-sourced anything like that. Depending on how you need to use it, you might be able to use Google to do some of the work ;)

淡お忘
Reply #5 · 2019-04-24 18:30

It should be pretty straightforward to build one in pyparsing; in fact, one of the tutorial pyparsing projects (wordsToNum.py on this page) does some of it already. You're talking about things that don't really have standard representations (standard in the sense of ISO 8602, not standard in the sense of "what everybody knows"), so it could easily be that nobody's done just what you're looking for.
