How can I consistently convert strings like “3.71B

Posted 2019-06-21 04:05


I have some rather mangled code that almost produces the tangible price/book from Yahoo Finance for companies (a nice module called ystockquote gets the intangible price/book value already).

My problem is this:

For one of the variables in the calculation, shares outstanding, I'm getting strings like 10.89B and 4.9M, where B and M stand for billion and million respectively. I'm having trouble converting them to numbers. Here's where I'm at:

shares = [''.join(node.findAll(text=True)).strip()
          .replace('M', '000000').replace('B', '000000000').replace('.', '')
          for node in soup2.findAll('td')[110:112]]

Which is pretty messy, but I think it would work if, instead of the chained .replace() calls, I were using a regular expression with variables. I guess the question is simply which regular expression and which variables. Other suggestions are also welcome.


To be specific, I'm hoping for something that works for numbers with zero, one, or two decimal places, but these answers all look helpful.
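Since the question asks which regular expression to use, here is one possible sketch. The pattern and the names SUFFIX_EXPONENTS, PATTERN and parse_shares are illustrative assumptions, not taken from the original code. It accepts a number with or without decimals, followed by an optional M or B, and rejects anything else.

```python
import re
from decimal import Decimal

# Hypothetical suffix table: power of ten for each letter.
SUFFIX_EXPONENTS = {'M': 6, 'B': 9}

# Digits, an optional fractional part, then an optional M or B suffix.
PATTERN = re.compile(r'^\s*(\d+(?:\.\d+)?)\s*([MB])?\s*$')

def parse_shares(text):
    """Convert strings such as '10.89B' or '4.9M' to an int."""
    match = PATTERN.match(text)
    if match is None:
        raise ValueError('unrecognised number: %r' % text)
    number, suffix = match.groups()
    # suffix is None when no letter is present, so the exponent defaults to 0.
    exponent = SUFFIX_EXPONENTS.get(suffix, 0)
    # Decimal keeps the digits exact, avoiding float rounding.
    return int(Decimal(number) * 10 ** exponent)

print(parse_shares('10.89B'))
print(parse_shares('4.9M'))
print(parse_shares('371'))
```

Anchoring the pattern with ^ and $ means malformed input such as '5.2MB' raises ValueError instead of being silently converted.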


>>> from decimal import Decimal
>>> d = {            # power of ten for each suffix
        'M': 6,
        'B': 9
    }
>>> def text_to_num(text):
        if text[-1] in d:
            num, magnitude = text[:-1], text[-1]
            return Decimal(num) * 10 ** d[magnitude]
        else:
            return Decimal(text)

>>> text_to_num('3.17B')
Decimal('3170000000.00')
>>> text_to_num('4M')
Decimal('4000000')
>>> text_to_num('4.1234567891234B')
Decimal('4123456789.1234000000000')

You can also call int() on the result if you want an integer.
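For instance, converting the Decimal result to an integer might look like the sketch below. It repeats the function so the snippet runs on its own, and the name whole_shares is only illustrative. Note that int() drops any fractional part rather than rounding.

```python
from decimal import Decimal

d = {'M': 6, 'B': 9}

def text_to_num(text):
    if text[-1] in d:
        num, magnitude = text[:-1], text[-1]
        return Decimal(num) * 10 ** d[magnitude]
    return Decimal(text)

# int() truncates toward zero, discarding digits after the decimal point.
whole_shares = int(text_to_num('4.1234567891234B'))
print(whole_shares)
```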


Parse the numbers as floats, and use a multiplier mapping:

multipliers = dict(M=10**6, B=10**9)
def sharesNumber(nodeText):
    nodeText = nodeText.strip()
    mult = 1
    if nodeText[-1] in multipliers:
        mult = multipliers[nodeText[-1]]
        nodeText = nodeText[:-1]
    return float(nodeText) * mult


num_replace = {
    'B' : 1000000000,
    'M' : 1000000,
}

a = "4.9M" 
b = "10.89B" 

def pure_number(s):
    mult = 1.0
    while s[-1] in num_replace:
        mult *= num_replace[s[-1]]
        s = s[:-1]
    return float(s) * mult 

pure_number(a) # 4900000.0
pure_number(b) # 10890000000.0

This will work with idiocy like:

pure_number("5.2MB") # 5200000000000000.0

Because of the dictionary approach, you can add as many suffixes as you want in an easy-to-maintain way. You can also make it more lenient by writing the dict keys in one case and calling .lower() or .upper() on the input so that it matches.
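As a sketch of that suggestion, the keys below are stored in upper case and the input is upper-cased before the lookup. The extra K and T entries and the name lenient_number are illustrative additions, not part of the answer above.

```python
# Keys are kept in upper case; K and T are hypothetical extra suffixes.
num_replace = {
    'K': 10 ** 3,
    'M': 10 ** 6,
    'B': 10 ** 9,
    'T': 10 ** 12,
}

def lenient_number(s):
    """Like pure_number, but accepts 'm' as well as 'M'."""
    s = s.strip().upper()
    mult = 1.0
    # Check that s is non-empty before indexing s[-1].
    while s and s[-1] in num_replace:
        mult *= num_replace[s[-1]]
        s = s[:-1]
    return float(s) * mult

print(lenient_number('4.9m'))
print(lenient_number('2k'))
```

Adding a new suffix is then a one-line change to the dictionary, with no change to the function.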


num_replace = {
    'B' : 'e9',
    'M' : 'e6',
}

def str_to_num(s):
    if s[-1] in num_replace:
        s = s[:-1]+num_replace[s[-1]]
    return int(float(s))

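>>> str_to_num('3.71B')
3710000000
>>> str_to_num('4M')
4000000

So '3.71B' -> '3.71e9' -> 3710000000, etc. On Python 2 builds where this exceeds a plain int, it is displayed as 3710000000L.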


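This could be an opportunity to use eval!! :-) Just note that the empty dicts passed as globals and locals do not block access to builtins, so only feed it strings you trust.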

Consider the following fragment:

>>> d = { "B" :' * 1e9', "M" : '* 1e6'}
>>> s = "1.493B"
>>> ll = [d.get(c, c) for c in s]
>>> eval(''.join(ll), {}, {})
1493000000.0

Now put it all together into a neat one-liner:

d = { "B" :' * 1e9', "M" : '* 1e6'}

def human_to_int(s):
    return eval(''.join([d.get(c, c) for c in s]), {}, {})

print(human_to_int('1.439B'))
print(human_to_int('1.23456789M'))

Gives back:
