Parsing an equation with sub-formulas in python

2019-09-07 01:52发布

I'm trying to develop an equation parser using a compiler approach in Python. The main issue that I encounter is that it is more likely that I don't have all the variables and need therefore to look for sub-formulas. Let's show an example that is worth a thousand words ;)

I have four variables whom I know the values: vx, vy, vz and c:

list_know_var = ['vx', 'vy', 'vz', 'c']

and I want to compute the Mach number (M) defined as

equation = 'M = V / c'

I already know the c variable but I don't know V. However, I know that the velocity V that can be computed using the vx, vy and vz and this is stored in a dictionary with other formulas (here only one sub formula is shown)

know_equations = {'V': '(vx ** 2 + vy ** 2 + vz ** 2) ** 0.5'}

Therefore, what I need is to parse the equation and check if I have all the variables. If not, I shall look into the know_equations dictionary to see if an equation is defined for it and this recursively until my equation is well defined.

For now on, I have been able using the answer given here to parse my equation and know if I know all the variables. The issue is that I did not find a way to replace the unknown variable (here V) by its expression in know_equation:

parsed_equation = compiler.parse(equation)
list_var = re.findall("Name\(\'(\w*)\'\)", str(parsed_equation.getChildNodes()[0]))

list_unknow_var = list(set(list_var) - set(list_know_var))
for var in list_unknow_var:
    if var in know_equations:
        # replace var in equation by its expression given in know_equations
        # and repeate until all the variables are known or raise Error
        pass

Thank you in advance for your help, much appreciate! Adrien

1条回答
疯言疯语
2楼-- · 2019-09-07 02:29

So i'm spitballing a bit, but here goes.

The compiler.parse function returns an instance of compiler.ast.Module which contains an abstract syntax tree. You can traverse this instance using the getChildNodes method. By recursively examining the left and right attributes of the nodes as you traverse the tree you can isolate compiler.ast.Name instances and swap them out for your substitution expressions.

So a worked example might be:

import compiler

def recursive_parse(node,substitutions):

    # look for left hand side of equation and test
    # if it is a variable name

    if hasattr(node.left,"name"):
        if node.left.name in substitutions.keys():
            node.left = substitutions[node.left.name]
    else:

        # if not, go deeper
        recursive_parse(node.left,substitutions)

    # look for right hand side of equation and test
    # if it is a variable name

    if hasattr(node.right,"name"):
        if node.right.name in substitutions.keys():
            node.right = substitutions[node.right.name]
    else:

        # if not, go deeper
        recursive_parse(node.right,substitutions)

def main(input):        

    substitutions = {
        "r":"sqrt(x**2+y**2)"
    }    

    # each of the substitutions needs to be compiled/parsed
    for key,value in substitutions.items():

        # this is a quick ugly way of getting the data of interest
        # really this should be done in a programatically cleaner manner 
        substitutions[key] = compiler.parse(substitutions[key]).getChildNodes()[0].getChildNodes()[0].getChildNodes()[0]

    # compile the input expression.
    expression = compiler.parse(input)

    print "Input:        ",expression

    # traverse the selected input, here we only pass the branch of interest. 
    # again, as with above, this done quick and dirty.
    recursive_parse(expression.getChildNodes()[0].getChildNodes()[0].getChildNodes()[1],substitutions)

    print "Substituted:  ",expression

if __name__ == "__main__":
    input = "t = r*p"
    main(input)

I have admittedly only tested this on a handful of use cases, but I think the basis is there for a generic implementation that can handle a wide variety of inputs.

Running this, I get the output:

Input:         Module(None, Stmt([Assign([AssName('t', 'OP_ASSIGN')], Mul((Name('r'), Name('p'))))]))
Substituted:   Module(None, Stmt([Assign([AssName('t', 'OP_ASSIGN')], Mul((CallFunc(Name('sqrt'), [Add((Power((Name('x'), Const(2))), Power((Name('y'), Const(2)))))], None, None), Name('p'))))]))

EDIT:

So the compiler module is depreciated in Python 3.0, so a better (and cleaner) solution would be to use the ast module:

import ast
from math import sqrt

# same a previous recursion function but with looking for 'id' not 'name' attribute
def recursive_parse(node,substitutions):
    if hasattr(node.left,"id"):
        if node.left.id in substitutions.keys():
            node.left = substitutions[node.left.id]
    else:
        recursive_parse(node.left,substitutions)

    if hasattr(node.right,"id"):
        if node.right.id in substitutions.keys():
            node.right = substitutions[node.right.id]
    else:
        recursive_parse(node.right,substitutions)

def main(input):

    substitutions = {
        "r":"sqrt(x**2+y**2)"
        }

    for key,value in substitutions.items():
        substitutions[key] = ast.parse(substitutions[key], mode='eval').body

    # As this is an assignment operation, mode must be set to exec
    module = ast.parse(input, mode='exec')

    print "Input:        ",ast.dump(module)

    recursive_parse(module.body[0].value,substitutions)

    print "Substituted:  ",ast.dump(module)

    # give some values for the equation
    x = 3
    y = 2
    p = 1
    code = compile(module,filename='<string>',mode='exec')
    exec(code)

    print input
    print "t =",t


if __name__ == "__main__":
    input = "t = r*p"
    main(input)

This will compile the expression and execute it in the local space. The output should be:

Input:         Module(body=[Assign(targets=[Name(id='t', ctx=Store())], value=BinOp(left=Name(id='r', ctx=Load()), op=Mult(), right=Name(id='p', ctx=Load())))])
Substituted:   Module(body=[Assign(targets=[Name(id='t', ctx=Store())], value=BinOp(left=Call(func=Name(id='sqrt', ctx=Load()), args=[BinOp(left=BinOp(left=Name(id='x', ctx=Load()), op=Pow(), right=Num(n=2)), op=Add(), right=BinOp(left=Name(id='y', ctx=Load()), op=Pow(), right=Num(n=2)))], keywords=[], starargs=None, kwargs=None), op=Mult(), right=Name(id='p', ctx=Load())))])
t = r*p
t = 3.60555127546
查看更多
登录 后发表回答