How to parse and simplify a string like '3cm/µ

Posted 2019-04-06 06:47

I'd like to split a string like 3cm/µs² + 4e-4 sqmiles/km/h**2 into its SI unit (in this case, m/s**2) and its magnitude (in multiples of that unit).

Since sympy provides both a parsing module and many physical units and SI prefixes, I guess using sympy would be a good idea. But what is a nice way to achieve this? I'd write an algorithm like the following, but I'd like to avoid reinventing a squared wheel:

  • Treat the transition between a number and a letter (except for the 4e-4 like syntax) and whitespace (unless it's next to an explicit operator) as multiplication, then tokenize
  • Replace each non-numeric token by its SI representation (also checking for SI-prefixes)
  • Simplify the new expression down to Magnitude * some SI units (giving a meaningful error message on inconsistent units, e.g. Cannot add m**2 to s)
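The last step of the list above (reduce to a magnitude times SI units, with a readable error on inconsistent units) can be sketched without any library. This is only an illustration, not how SymPy or any other package does it: the `Quantity` class and its `dims` mapping are hypothetical names made up for this example.

```python
class Quantity:
    """Hypothetical helper: a magnitude in SI base units plus {base unit: exponent}."""

    def __init__(self, magnitude, dims=None):
        self.magnitude = magnitude
        # Drop zero exponents so that equal dimensions compare equal.
        self.dims = {unit: exp for unit, exp in (dims or {}).items() if exp != 0}

    def unit_text(self):
        parts = [unit if exp == 1 else f"{unit}**{exp}"
                 for unit, exp in sorted(self.dims.items())]
        return "*".join(parts) or "1"

    def __add__(self, other):
        # Addition is only defined for identical dimensions.
        if self.dims != other.dims:
            raise TypeError(f"Cannot add {self.unit_text()} to {other.unit_text()}")
        return Quantity(self.magnitude + other.magnitude, self.dims)

    def __mul__(self, other):
        if not isinstance(other, Quantity):
            other = Quantity(other)  # plain numbers are dimensionless
        dims = dict(self.dims)
        for unit, exp in other.dims.items():
            dims[unit] = dims.get(unit, 0) + exp
        return Quantity(self.magnitude * other.magnitude, dims)

    __rmul__ = __mul__  # multiplication is commutative here

    def __pow__(self, power):
        return Quantity(self.magnitude ** power,
                        {unit: exp * power for unit, exp in self.dims.items()})

    def __truediv__(self, other):
        if not isinstance(other, Quantity):
            other = Quantity(other)
        return self * other ** -1

    def __rtruediv__(self, other):
        return Quantity(other) / self

    def __repr__(self):
        return f"{self.magnitude} {self.unit_text()}"


# Base units, and derived units expressed as multiples of them.
m = Quantity(1.0, {"m": 1})
s = Quantity(1.0, {"s": 1})
cm = 0.01 * m
us = 1e-6 * s

acceleration = 3 * cm / us ** 2   # dims become {"m": 1, "s": -2}
```

With these definitions, `m ** 2 + s` raises `TypeError: Cannot add m**2 to s`, which is the kind of message asked for in the list.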

Can this be easily achieved via existing means? Or how would this be best implemented?

2 Answers
ゆ 、 Hurt°
Reply #2 · 2019-04-06 07:16

Units

One solution is to gather all units from the SymPy units module and use them to substitute the symbols created by sympify:

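>>> from sympy import Expr, Symbol, sympify
>>> import sympy.physics.units as u
>>> # Written against an older SymPy whose units module has a `Unit` class;
>>> # newer releases appear to use `Quantity` instead.
>>> subs = {}
>>> for k, v in u.__dict__.items():
...     if isinstance(v, Expr) and v.has(u.Unit):
...         subs[Symbol(k)] = v  # Map the `Symbol` for a unit to the unit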

>>> # sympify returns `Symbol`s, `subs` maps them to `Unit`s
>>> print(sympify('yard*millimeter/ly').subs(subs))
127*m/1313990343414000000000

If a symbol is not in the units module, it is simply left as an unknown symbol (for instance barn):

>>> print(sympify('barn/meter**2').subs(subs))
barn/m**2 

But you can always add your own entries to the subs dictionary:

>>> subs[Symbol('almost_meter')] = 0.9*u.meter
>>> sympify('almost_meter').subs(subs)
0.9*m

SI prefixes don't work exactly the way you want. You will need to add a multiplication sign (or hope that it is a common unit like km, which is explicitly implemented). Moreover, as they are not Unit instances but rather Integer instances, you will have to add them to subs:

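>>> from sympy import Expr, Integer, Symbol, sympify
>>> import sympy.physics.units as u
>>> subs = {}
>>> for k, v in u.__dict__.items():
...     # Keep the units as before, plus the Integer-valued SI prefixes
...     if (isinstance(v, Expr) and v.has(u.Unit)) or isinstance(v, Integer):
...         subs[Symbol(k)] = v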

>>> print(sympify('mega*m').subs(subs))
1000000*m 
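If writing the multiplication sign is not acceptable, one option is to split a prefix off each token before substitution. The following is a minimal sketch in plain Python, not part of SymPy: `SI_PREFIXES`, `KNOWN_UNITS` and `split_prefix` are hypothetical names, and both tables are deliberately tiny.

```python
# Illustrative tables only; a real tool would cover all prefixes and units.
SI_PREFIXES = {"G": 1e9, "M": 1e6, "k": 1e3, "c": 1e-2, "m": 1e-3, "u": 1e-6, "n": 1e-9}
KNOWN_UNITS = {"m", "s", "g", "N", "J"}


def split_prefix(symbol):
    """Return (factor, unit) for a symbol such as 'km', 'us' or plain 'm'."""
    # Check the whole symbol first, so that 'm' is read as metre, not as milli.
    if symbol in KNOWN_UNITS:
        return 1.0, symbol
    prefix, unit = symbol[:1], symbol[1:]
    if prefix in SI_PREFIXES and unit in KNOWN_UNITS:
        return SI_PREFIXES[prefix], unit
    raise ValueError(f"Unknown unit: {symbol!r}")
```

Here `split_prefix('km')` gives the factor 1000.0 with the unit 'm'. Only one-letter prefixes are handled, so 'da' (deca) would need an extra case.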

For unicode you might need some preprocessing. I do not think SymPy makes any promises about unicode support.
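Such a preprocessing step could look roughly like this. It is a sketch under assumptions: `to_ascii_units` is a made-up helper, and it assumes the downstream parser accepts `u` as the ASCII spelling of the micro prefix (as in `us`) and `**` for powers.

```python
import re

_SUPERSCRIPTS = "⁰¹²³⁴⁵⁶⁷⁸⁹⁻"
_TO_ASCII = str.maketrans(_SUPERSCRIPTS, "0123456789-")
_SUPERSCRIPT_RUN = re.compile(f"[{_SUPERSCRIPTS}]+")
_MICRO_SIGNS = ("\u00b5", "\u03bc")  # MICRO SIGN and GREEK SMALL LETTER MU


def to_ascii_units(text, micro="u"):
    """Rewrite micro signs and superscript exponents in plain ASCII."""
    for sign in _MICRO_SIGNS:
        text = text.replace(sign, micro)
    # A run of superscript characters becomes an explicit power, e.g. ² -> **2.
    return _SUPERSCRIPT_RUN.sub(
        lambda match: "**" + match.group().translate(_TO_ASCII), text
    )
```

Applied to the string from the question, `to_ascii_units('3cm/µs²')` returns `'3cm/us**2'`.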

If you implement new units, please consider making a pull request with them on GitHub. The file to edit should be sympy/physics/units.py.

Whitespace and implicit multiplication

The dev version of SymPy contains a parser transformation that assumes implicit multiplication where terms are separated by whitespace or simply written next to each other:

>>> from sympy.parsing.sympy_parser import (parse_expr,
... standard_transformations, implicit_multiplication_application)

>>> parse_expr("10sin**2 x**2 + 3xyz + tan theta",
...            transformations=(standard_transformations + 
...                             (implicit_multiplication_application,)))
3*x*y*z + 10*sin(x**2)**2 + tan(theta) 
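Where that transformation is not available, or where unit strings need stricter rules, a small pre-pass can insert the operators before the string reaches any parser. The sketch below is an assumption-laden illustration, not SymPy code: `add_implicit_multiplication` is a hypothetical helper, it accepts ASCII input only, and it inserts `*` wherever one operand directly follows another.

```python
import re

_TOKEN = re.compile(
    r"(?P<number>(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)"  # 3, 0.5, 4e-4
    r"|(?P<name>[A-Za-z_]\w*)"                            # cm, sqmiles
    r"|(?P<op>\*\*|[-+*/()])"
    r"|(?P<space>\s+)",
    re.ASCII,
)


def add_implicit_multiplication(text):
    """Return text with '*' inserted between adjacent operands."""
    out = []
    previous_ends_operand = False
    pos = 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"Unexpected character {text[pos]!r} at position {pos}")
        pos = match.end()
        kind, token = match.lastgroup, match.group()
        if kind == "space":
            continue  # whitespace only matters through the operands around it
        starts_operand = kind in ("number", "name") or token == "("
        if previous_ends_operand and starts_operand:
            out.append("*")
        out.append(token)
        previous_ends_operand = kind in ("number", "name") or token == ")"
    return "".join(out)
```

For example, `add_implicit_multiplication('3cm/us**2 + 4e-4 sqmiles/km/h**2')` returns `'3*cm/us**2+4e-4*sqmiles/km/h**2'`. Note that the inserted `*` has ordinary precedence, so `J/kg K` becomes `J/kg*K`, which may not be what the writer meant.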

Security

sympify uses eval, which is exploitable if it ever sees untrusted input, for example in a web-facing app!
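For untrusted input, one alternative is to walk the syntax tree yourself and allow only arithmetic and known names. This is a minimal sketch, not a replacement for sympify: `safe_eval` is a hypothetical helper, and the `names` mapping (here plain SI factors) is supplied by the caller.

```python
import ast
import operator

_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expression, names):
    """Evaluate arithmetic over numbers and whitelisted names, nothing else."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.Name):
            if node.id not in names:
                raise NameError(f"Unknown unit or constant: {node.id!r}")
            return names[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        # Calls, attribute access, subscripts, etc. all end up here.
        raise ValueError(f"Disallowed syntax: {type(node).__name__}")

    return walk(ast.parse(expression, mode="eval"))


si_factors = {"m": 1.0, "cm": 0.01, "km": 1000.0}
length_in_metres = safe_eval("3*cm + 2*m", si_factors)
```

Something like `safe_eval("__import__('os')", si_factors)` is rejected with a `ValueError`. Very large powers can still be expensive to compute, so a real service would also want to limit exponents and input length.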

够拽才男人
Reply #3 · 2019-04-06 07:33

I've found astropy to have a good units module. After some preparation you can do:

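import astropy.units as u
u.Unit('MeV/fm').si  # 160.218 N
eval('1*MeV/fm+3*N', u.__dict__).si  # 163.21765649999998 N

# Make the imperial units (mile, ...) available in the same namespace
from astropy.units import imperial
u.__dict__.update(imperial.__dict__)
u.sqmiles = u.mile**2
# Note: Ys is the yottasecond, so the first term is negligible here;
# the microsecond from the question would be written us.
eval('3*cm/Ys**2 + 4e-4*sqmiles/km/h**2', u.__dict__).si  # 7.993790464000001e-08 m / s2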

The following function adds the scipy CODATA constants as quantities to astropy units:

def units_and_constants():
    """
    >>> u = units_and_constants()
    >>> u.hartree_joule_relationship
    <Quantity 4.35974434e-18 J>

    >>> eval('1*MeV/fm+3*N',u.__dict__).si
    <Quantity 163.21765649999998 N>

    """
    import astropy.units as u
    from astropy.units import imperial
    u.__dict__.update(imperial.__dict__)
    from scipy.constants import physical_constants, value, unit
    import string
    def qntty(x): 
        un = unit(x)
        va = value(x)
        if un:
            return va*eval(un.strip().replace(' ','*').replace('^','**'),u.__dict__)
        else:
            return va
    u.sr = u.radian**2
    u.E_h = qntty('hartree-joule relationship')
    u.c = qntty('speed of light in vacuum')
    u.C_90 = (1+4.6e-8)*u.C 
    codata = {}
    for n, t in physical_constants.items():
        v = qntty(n)
        for x in string.punctuation+' ':
            n = n.replace(x,'_')
        codata[n] = v
    u.__dict__.update(codata)
    return u
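The two string conversions that the function relies on can be tried in isolation, without astropy or scipy. The helper names below are made up for this sketch; the logic mirrors the `replace` calls in the function above.

```python
import string


def unit_to_expression(unit):
    """Turn a CODATA-style unit string such as 'm s^-1' into 'm*s**-1'."""
    return unit.strip().replace(" ", "*").replace("^", "**")


def name_to_identifier(name):
    """Replace punctuation and spaces so the name can be used as an attribute."""
    for char in string.punctuation + " ":
        name = name.replace(char, "_")
    return name
```

For example, `name_to_identifier('hartree-joule relationship')` gives `'hartree_joule_relationship'`, which is the attribute used in the docstring above.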

yt also tackles a problem similar to yours. Have a look at its test file to see how it is used.
