pyparsing - How to parse string with comparison op

Posted 2020-04-26 03:01

Question:

So, I have a NumericStringParser class (extracted from here), defined as below:

from __future__ import division
from pyparsing import Literal, CaselessLiteral, Word, Combine, Group, Optional, ZeroOrMore, Forward, nums, alphas, oneOf, ParseException
import math
import operator

class NumericStringParser(object):

    def __push_first__(self, strg, loc, toks):
        self.exprStack.append(toks[0])

    def __push_minus__(self, strg, loc, toks):
        if toks and toks[0] == "-":
            self.exprStack.append("unary -")

    def __init__(self):
        point = Literal(".")
        e = CaselessLiteral("E")
        fnumber = Combine(Word("+-" + nums, nums) +
                          Optional(point + Optional(Word(nums))) +
                          Optional(e + Word("+-" + nums, nums)))
        ident = Word(alphas, alphas + nums + "_$")
        plus = Literal("+")
        minus = Literal("-")
        mult = Literal("*")
        floordiv = Literal("//")
        div = Literal("/")
        mod = Literal("%")
        lpar = Literal("(").suppress()
        rpar = Literal(")").suppress()
        addop = plus | minus
        multop = mult | floordiv | div | mod
        expop = Literal("^")
        pi = CaselessLiteral("PI")
        tau = CaselessLiteral("TAU")
        expr = Forward()
        atom = ((Optional(oneOf("- +")) +
                 (ident + lpar + expr + rpar | pi | e | tau | fnumber).setParseAction(self.__push_first__))
                | Optional(oneOf("- +")) + Group(lpar + expr + rpar)
                ).setParseAction(self.__push_minus__)

        factor = Forward()
        factor << atom + \
            ZeroOrMore((expop + factor).setParseAction(self.__push_first__))
        term = factor + \
            ZeroOrMore((multop + factor).setParseAction(self.__push_first__))
        expr << term + \
            ZeroOrMore((addop + term).setParseAction(self.__push_first__))

        self.bnf = expr

        self.opn = {
            "+": operator.add,
            "-": operator.sub,
            "*": operator.mul,
            "/": operator.truediv,
            "//": operator.floordiv,
            "%": operator.mod,
            "^": operator.pow,
            "=": operator.eq,
            "!=": operator.ne,
            "<=": operator.le,
            ">=": operator.ge,
            "<": operator.lt,
            ">": operator.gt
            }

        self.fn = {
            "sin": math.sin,
            "cos": math.cos,
            "tan": math.tan,
            "asin": math.asin,
            "acos": math.acos,
            "atan": math.atan,
            "exp": math.exp,
            "abs": abs,
            "sqrt": math.sqrt,
            "floor": math.floor,
            "ceil": math.ceil,
            "trunc": math.trunc,
            "round": round,
            "fact": math.factorial,
            "gamma": math.gamma
            }

    def __evaluate_stack__(self, s):
        op = s.pop()
        if op == "unary -":
            return -self.__evaluate_stack__(s)
        if op in ("+", "-", "*", "//", "/", "^", "%", "!=", "<=", ">=", "<", ">", "="):
            op2 = self.__evaluate_stack__(s)
            op1 = self.__evaluate_stack__(s)
            return self.opn[op](op1, op2)
        if op == "PI":
            return math.pi
        if op == "E":
            return math.e
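        if op == "PHI":
            # Assumption: PHI is the golden ratio; `phi` was not defined in the posted code.
            return (1 + math.sqrt(5)) / 2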
        if op == "TAU":
            return math.tau
        if op in self.fn:
            return self.fn[op](self.__evaluate_stack__(s))
        if op[0].isalpha():
            raise NameError(f"{op} is not defined.")
        return float(op)

I have an evaluate() function, defined as below:

def evaluate(expression, parse_all=True):
    nsp = NumericStringParser()
    nsp.exprStack = []
    try:
        nsp.bnf.parseString(expression, parse_all)
    except ParseException as error:
        raise SyntaxError(error)
    return nsp.__evaluate_stack__(nsp.exprStack[:])

evaluate() is a function that will parse a string to calculate a mathematical operation, for example:

>>> evaluate("5+5")
10.0

>>> evaluate("5^2+1")
26.0

The problem is that it cannot handle comparison operators (=, !=, <, >, <=, >=). When I try evaluate("5=5"), it raises SyntaxError: Expected end of text (at char 1), (line:1, col:2) instead of returning True. How can I make the function evaluate those six comparison operators?

Answer 1:

As pointed out by @rici, you have added the evaluation part, but not the parsing part.

The parser is defined in these lines:

    factor <<= atom + \
        ZeroOrMore((expop + factor).setParseAction(self.__push_first__))
    term = factor + \
        ZeroOrMore((multop + factor).setParseAction(self.__push_first__))
    expr <<= term + \
        ZeroOrMore((addop + term).setParseAction(self.__push_first__))

The order of these statements is important, because they cause the parser to recognize the precedence of operations, which you learned in high school math. That is, exponentiation is highest, then multiplication and division next, then addition and subtraction next.

You'll need to insert your relational operators into this parser definition, following the same pattern. After addition, the convention from C language operator precedence (I found this reference - https://www.tutorialspoint.com/cprogramming/c_operators_precedence.htm) is:

relational operations - <=, >=, >, <
equality operations - ==, !=

In your case, you chose to use '=' instead of '==', and that should be okay in this setting. I suggest you use pyparsing's oneOf helper to define these operator groups, as it will take care of the case where a short string might mask a longer string (as when '/' masked '//' in your earlier post).

Note that, by mixing these operations all into one expression parser, you will be able to parse things like 5 + 2 > 3. Since '>' has lower precedence, 5+2 will be evaluated first, giving 7, then 7 > 3 will be evaluated, and operator.gt will return True or False (which behave as 1 or 0 in arithmetic).
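To make that evaluation order concrete, here is a minimal standard-library sketch of the stack evaluation for 5 + 2 > 3. The names OPN and evaluate_stack are illustrative and only mirror the question's opn table and __evaluate_stack__ method; the token list is written by hand, assuming the parser pushes operands and operators in postfix order.

```python
import operator

# A subset of the question's operator table, including comparisons.
OPN = {
    "+": operator.add,
    "-": operator.sub,
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
}

def evaluate_stack(stack):
    """Pop and evaluate a postfix token list, as __evaluate_stack__ does."""
    op = stack.pop()
    if op in OPN:
        op2 = evaluate_stack(stack)  # the right operand was pushed last
        op1 = evaluate_stack(stack)
        return OPN[op](op1, op2)
    return float(op)

# Postfix order for "5 + 2 > 3" when '>' binds more loosely than '+'.
tokens = ["5", "2", "+", "3", ">"]
result = evaluate_stack(tokens[:])
print(result)  # True
```

The addition is reduced to 7.0 before the comparison runs, so the final value is a bool rather than a float.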

The difficulty in extending this example to other operators was what caused me to write the infixNotation helper method in pyparsing. You may want to give that a look.
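As a rough illustration of that alternative, the sketch below builds a small evaluator with infixNotation. It is not the question's parser: the OPS table, the eval_binary helper, and the choice of pyparsing_common.number as the operand are assumptions made for this example, and it evaluates in parse actions instead of using an expression stack.

```python
import operator
from pyparsing import infixNotation, oneOf, opAssoc, pyparsing_common

# Illustrative operator table (an assumption of this sketch).
OPS = {
    "*": operator.mul, "/": operator.truediv,
    "+": operator.add, "-": operator.sub,
    "<": operator.lt, ">": operator.gt,
    "<=": operator.le, ">=": operator.ge,
    "=": operator.eq, "!=": operator.ne,
}

def eval_binary(tokens):
    """Fold one grouped level [operand, op, operand, ...] left to right."""
    items = list(tokens[0])
    result = items[0]
    for op, rhs in zip(items[1::2], items[2::2]):
        result = OPS[op](result, rhs)
    return result

# Levels are listed from highest precedence to lowest.
expr = infixNotation(
    pyparsing_common.number,  # converts matched text to int or float
    [
        (oneOf("* /"), 2, opAssoc.LEFT, eval_binary),
        (oneOf("+ -"), 2, opAssoc.LEFT, eval_binary),
        (oneOf("< > <= >="), 2, opAssoc.LEFT, eval_binary),
        (oneOf("= !="), 2, opAssoc.LEFT, eval_binary),
    ],
)

print(expr.parseString("5 + 2 > 3", parseAll=True)[0])  # True
print(expr.parseString("2 * 3 + 1", parseAll=True)[0])  # 7
```

Adding a precedence level here means adding one tuple to the list, which is the convenience the helper was written for.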

EDIT:

You asked about using Literal('<=') | Literal('>=') | etc., and as you wrote it, that will work just fine. You just have to be careful to look for the longer operators ahead of the shorter ones. If you write Literal('>') | Literal('>=') | ... then matching '>=' would fail because the first alternative would match the '>' and then you would be left with '='. Using oneOf takes care of this for you.
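The masking effect is easy to check in isolation. This short sketch compares the two Literal orderings with oneOf; the variable names are only for illustration.

```python
from pyparsing import Literal, oneOf

# Shorter literal first: '>' matches and '=' is left unconsumed.
short_first = Literal(">") | Literal(">=")
# Longer literal first: '>=' is tried before '>'.
long_first = Literal(">=") | Literal(">")
# oneOf reorders its alternatives so longer strings are not masked.
relop = oneOf("< > <= >=")

print(short_first.parseString(">=").asList())  # ['>']
print(long_first.parseString(">=").asList())   # ['>=']
print(relop.parseString(">=").asList())        # ['>=']
```

With parseAll=True, the short_first version would raise a ParseException on '>=' because the trailing '=' is never consumed.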

To add the additional parser steps, you only want to do the expr <<= ... step for the last level. Look at the pattern of statements again. Change expr <<= term + etc. to arith_expr = term + etc., follow it with levels for relational_expr and equality_expr, and then finish with expr <<= equality_expr.

The pattern for this is based on:

factor := atom (^ factor)...
term := factor (mult_op factor)...
arith_expr := term (add_op term)...
relation_expr := arith_expr (relation_op arith_expr)...
equality_expr := relation_expr (equality_op relation_expr)...

Try doing that conversion to Python/pyparsing on your own.



Answer 2:

factor << atom + \
    ZeroOrMore((expop + factor).setParseAction(self.__push_first__))
term = factor + \
    ZeroOrMore((multop + factor).setParseAction(self.__push_first__))
arith_expr = term + \
    ZeroOrMore((addop + term).setParseAction(self.__push_first__))
relational = arith_expr + \
    ZeroOrMore((diffop + arith_expr).setParseAction(self.__push_first__))
expr <<= relational + \
    ZeroOrMore((compop + relational).setParseAction(self.__push_first__))

I tested that and it works! Thank you very much, PaulMcG! :)
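The snippet above does not show how diffop and compop are defined. The following self-contained sketch assumes they are the relational and equality operator groups built with oneOf, as suggested in the first answer. It is a cut-down version of the question's parser (integers and parentheses only, no functions or unary minus), and build_parser, evaluate_stack, and evaluate are hypothetical names for this example.

```python
import operator
from pyparsing import Forward, Literal, Word, ZeroOrMore, nums, oneOf

OPN = {
    "+": operator.add, "-": operator.sub,
    "*": operator.mul, "/": operator.truediv,
    "^": operator.pow,
    "<": operator.lt, ">": operator.gt,
    "<=": operator.le, ">=": operator.ge,
    "=": operator.eq, "!=": operator.ne,
}

def build_parser(stack):
    """Return a grammar that appends tokens to `stack` in postfix order."""
    def push_first(tokens):
        stack.append(tokens[0])

    number = Word(nums).setParseAction(push_first)
    lpar = Literal("(").suppress()
    rpar = Literal(")").suppress()
    expop = Literal("^")
    multop = oneOf("* /")
    addop = oneOf("+ -")
    diffop = oneOf("< > <= >=")  # assumed definition: relational operators
    compop = oneOf("= !=")       # assumed definition: equality operators

    expr = Forward()
    atom = number | (lpar + expr + rpar)
    factor = Forward()
    factor <<= atom + ZeroOrMore((expop + factor).setParseAction(push_first))
    term = factor + ZeroOrMore((multop + factor).setParseAction(push_first))
    arith_expr = term + ZeroOrMore((addop + term).setParseAction(push_first))
    relational = arith_expr + ZeroOrMore(
        (diffop + arith_expr).setParseAction(push_first))
    expr <<= relational + ZeroOrMore(
        (compop + relational).setParseAction(push_first))
    return expr

def evaluate_stack(stack):
    """Evaluate the postfix stack recursively, popping from the end."""
    op = stack.pop()
    if op in OPN:
        op2 = evaluate_stack(stack)
        op1 = evaluate_stack(stack)
        return OPN[op](op1, op2)
    return float(op)

def evaluate(expression):
    stack = []
    build_parser(stack).parseString(expression, parseAll=True)
    return evaluate_stack(stack)

print(evaluate("5=5"))           # True
print(evaluate("5 + 2 > 3"))     # True
print(evaluate("(1+2)*3 >= 9"))  # True
print(evaluate("2^3 != 8"))      # False
```

Only the last level is assigned with expr <<= ..., so the parenthesized atom can recurse into the full expression, comparisons included.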