Ambiguity in parsing comma as an operator using PLY

Posted 2019-08-20 07:27

I have the following tokens, among many others; to keep the question short I am not including the whole code.

tokens = (
'COMMA',
'OP',
'FUNC1',
'FUNC2'
)

def t_OP(t):
    r'&|-|\||,'
    return t

def t_FUNC1(t):
    r'FUNC1'
    return t

def t_FUNC2(t):
    r'FUNC2'
    return t

Other methods:

def FUNC1(param):
  return {'a','b','c','d'}

def FUNC2(param,expression_result):
  return {'a','b','c','d'}

My YACC grammar rules are below; there are a few more, but these are the important ones:

'expression : expression OP expression'
'expression : LPAREN expression RPAREN'
'expression : FUNC1 LPAREN PARAM RPAREN'
'expression : FUNC2 LPAREN PARAM COMMA expression RPAREN'
'expression : SET_ITEM'

In my yacc.py, below are the methods which are related to the issue:

def p_expr_op_expr(p):
    'expression : expression OP expression'
    if p[2] == '|' or p[2] == ',':
        p[0] = p[1] | p[3]
    elif p[2] == '&':
        p[0] = p[1] & p[3]
    elif p[2] == '-':
        p[0] = p[1] - p[3]

def p_expr_func1(p):
    'expression : FUNC1 LPAREN PARAM RPAREN'
    Param = p[3]
    Result = ANY(Param)
    p[0] = Result 

def p_expr_func2(p):
    'expression : FUNC2 LPAREN PARAM COMMA expression RPAREN'
    Param = p[3]
    expression_result = p[5]
    Result = EXPAND(Param,expression_result)
    p[0] = Result

def p_expr_set_item(p):
    'expression : SET_ITEM'
    p[0] = {p[1]}

So, the issue is this.

If I give this grammar the following input expression:

FUNC1("foo"),bar

it works properly and gives me the union of the set returned by FUNC1("foo") and bar: {a,b,c,d} | {bar}

But if I give it the input expression below, I get a syntax error at , and ). (My parentheses are defined as tokens, in case anyone suspects they are not.)

FUNC2("foo", FUNC1("foo"),bar)

As I understand it, this expression matches the production rule 'expression : FUNC2 LPAREN PARAM COMMA expression RPAREN',

so everything after the first comma should be treated as an expression, match 'expression : expression OP expression', and perform the union when the comma is encountered as an operator.

If that were not the case, FUNC1("foo"),bar should not work either.

I know I can fix this by removing ',' from t_OP(t) and adding one more production rule, 'expression : expression COMMA expression', with a method like this:

def p_expr_comma_expr(p):
    'expression : expression COMMA expression'
    p[0] = p[1] | p[3]

I'm reluctant to include this rule because it introduces 4 shift/reduce conflicts.

I really want to understand why it works in one case but not the other, and what the right way is to treat ',' as an operator.

Thanks

2 answers
Summer. ? 凉城
#2 · 2019-08-20 07:45

Adding one more rule solved my problem:

expression : expression COMMA expression

I added it because, as @rici pointed out, in an expression like FUNC2("hello",FUNC1("ghost")) the first comma is always taken as an operator (OP).

Adding a precedence declaration then removed the 4 shift/reduce conflicts:

precedence = (
    ('left','COMMA'),
    ('left','OP')
)
太酷不给撩
#3 · 2019-08-20 07:57

PLY has no way to know whether you want a given , to be the lexeme COMMA or the lexeme OP. Or rather, it has a way, but it will always choose the same one: OP. That's because patterns defined in token functions are tested before patterns defined in token string variables.

I'm assuming you have t_COMMA = r',' somewhere in the part of your program you did not provide. It is also possible that you have a token function to recognise COMMA, in which case whichever function is defined first wins. Either way, the order in which the regexes are tested is fixed, so , is either always COMMA or always OP. This is explained well in the PLY manual section on Specification of Tokens.
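
The ordering can be illustrated with a small stdlib-only model. This is a simplified sketch of the documented behaviour, not PLY's actual code: it assumes function-defined patterns are tried first in definition order, followed by string-defined patterns sorted by decreasing regex length. The names build_master and token_types are hypothetical, and the string rules (including t_COMMA) are assumed, since the question does not show them.

```python
import re

# Simplified model of PLY's pattern ordering (an assumption-labelled sketch,
# not PLY's real implementation).
function_rules = [("OP", r"&|-|\||,")]           # def t_OP(t): r'&|-|\||,'
string_rules = [("COMMA", r","),                 # t_COMMA = r','   (assumed)
                ("LPAREN", r"\("),               # t_LPAREN = r'\(' (assumed)
                ("RPAREN", r"\)")]               # t_RPAREN = r'\)' (assumed)


def build_master(function_rules, string_rules):
    """Join all patterns into one alternation, function rules first."""
    ordered = list(function_rules) + sorted(
        string_rules, key=lambda rule: len(rule[1]), reverse=True)
    return re.compile(
        "|".join(f"(?P<{name}>{pattern})" for name, pattern in ordered))


def token_types(text, master):
    """Return the name of the first alternative that matches at each step."""
    return [match.lastgroup for match in master.finditer(text)]


# With ',' inside OP, the comma can never become COMMA.
with_comma_in_op = build_master(function_rules, string_rules)
print(token_types("(,)", with_comma_in_op))

# Once ',' is removed from OP, the COMMA rule gets a chance to match.
without_comma_in_op = build_master([("OP", r"&|-|\|")], string_rules)
print(token_types("(,)", without_comma_in_op))
```

In the first case the comma is reported as OP, which is why the FUNC2 rule, which requires a COMMA token, can never match; in the second case it is reported as COMMA.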

Personally, I'd suggest removing the comma from OP and modifying the grammar to use COMMA in the definition of expression. As you observed, you will get shift/reduce conflicts, so you must include COMMA in your precedence declaration (which you also omitted from your question). In fact, you will likely want different precedences for different operators, so you will probably want to split the operators into separate tokens, since precedence is assigned per token. See the explanation in the PLY manual section on precedence declarations.
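
To show the parse this suggestion is aiming for, here is a hand-written precedence-climbing sketch. It is not PLY and not the asker's grammar: it is a stand-in technique that only illustrates a single comma token serving both as the FUNC2 argument separator and as a low-precedence union operator. The relative precedence of |, - and & is an assumption, as are the token patterns and the names tokenize, Parser and parse. It returns a nested tuple rather than evaluating sets, so the structure is visible.

```python
import re

# Hypothetical token patterns; PARAM is a double-quoted string.
TOKEN_RE = re.compile(
    r'\s*(?:(?P<FUNC>FUNC[12]\b)|(?P<PARAM>"[^"]*")'
    r'|(?P<SET_ITEM>\w+)|(?P<SYM>[(),|&-]))')

# Higher number binds tighter; ',' is lowest. The ordering is an assumption.
PRECEDENCE = {',': 1, '|': 2, '-': 3, '&': 4}


def tokenize(text):
    text, pos, tokens = text.rstrip(), 0, []
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return (None, None)

    def expect(self, kind, value=None):
        tok_kind, tok_value = self.peek()
        if tok_kind != kind or (value is not None and tok_value != value):
            raise SyntaxError(f"expected {value or kind}, got {tok_value!r}")
        self.pos += 1
        return tok_value

    def expression(self, min_prec=1):
        left = self.primary()
        while True:
            kind, value = self.peek()
            if kind != 'SYM' or PRECEDENCE.get(value, 0) < min_prec:
                return left
            self.pos += 1
            # Parsing the right side one level tighter gives left associativity.
            right = self.expression(PRECEDENCE[value] + 1)
            left = (value, left, right)

    def primary(self):
        kind, value = self.peek()
        if kind == 'SET_ITEM':
            self.pos += 1
            return value
        if kind == 'SYM' and value == '(':
            self.pos += 1
            inner = self.expression()
            self.expect('SYM', ')')
            return inner
        if kind == 'FUNC':
            self.pos += 1
            self.expect('SYM', '(')
            param = self.expect('PARAM')
            if value == 'FUNC1':
                self.expect('SYM', ')')
                return ('FUNC1', param)
            self.expect('SYM', ',')    # separator: the comma right after PARAM
            body = self.expression()   # any later comma is the union operator
            self.expect('SYM', ')')
            return ('FUNC2', param, body)
        raise SyntaxError(f"unexpected token {value!r}")


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expression()
    if parser.pos != len(parser.tokens):
        raise SyntaxError("trailing input")
    return tree


print(parse('FUNC2("foo", FUNC1("foo"),bar)'))
```

Here the first comma is consumed as the FUNC2 separator because only a PARAM, not an expression, can precede it, and the second comma becomes the union of FUNC1("foo") and bar inside the FUNC2 body.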
