
LOOKAHEADs for the JavaScript/ECMAScript array literal

Published 2019-06-02 18:59

Question:

I am currently implementing a JavaScript/ECMAScript 5.1 parser with JavaCC and am having problems with the ArrayLiteral production.

ArrayLiteral :
    [ Elision_opt ]
    [ ElementList ]
    [ ElementList , Elision_opt ]

ElementList :
    Elision_opt AssignmentExpression
    ElementList , Elision_opt AssignmentExpression

Elision :
    ,
    Elision ,
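As a concrete reading of these productions, here is a minimal Java sketch that follows them directly. It is not JavaCC output: the class and method names are hypothetical, and every AssignmentExpression is reduced to a single token. Elided elements (holes) come back as null, so `[ , , a , , b , ]` yields five elements and the trailing comma adds nothing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: the ES5.1 ArrayLiteral productions as a loop-based recursive-descent parser.
// Hypothetical simplification: an AssignmentExpression is a single token that is
// not "[", "]" or ",". Holes (elided elements) are returned as null.
final class SpecArrayLiteral {
    private final String[] toks;
    private int pos;

    SpecArrayLiteral(String... toks) { this.toks = toks; }

    private String peek() { return pos < toks.length ? toks[pos] : "<EOF>"; }

    private void expect(String t) {
        if (!peek().equals(t)) throw new IllegalStateException("expected " + t + " but got " + peek());
        pos++;
    }

    // Elision_opt: count the commas; each one stands for an undefined element.
    private int elisionOpt() {
        int n = 0;
        while (peek().equals(",")) { pos++; n++; }
        return n;
    }

    private String assignmentExpression() {
        String t = peek();
        if (t.equals("[") || t.equals("]") || t.equals(",") || t.equals("<EOF>")) {
            throw new IllegalStateException("expected expression but got " + t);
        }
        pos++;
        return t;
    }

    List<String> arrayLiteral() {
        List<String> elems = new ArrayList<>();
        expect("[");
        elems.addAll(Collections.nCopies(elisionOpt(), (String) null));
        if (peek().equals("]")) {              // [ Elision_opt ]
            pos++;
            return elems;
        }
        elems.add(assignmentExpression());     // ElementList : Elision_opt AssignmentExpression
        while (peek().equals(",")) {
            pos++;
            elems.addAll(Collections.nCopies(elisionOpt(), (String) null));
            if (peek().equals("]")) break;     // [ ElementList , Elision_opt ]
            elems.add(assignmentExpression()); // ElementList , Elision_opt AssignmentExpression
        }
        expect("]");                           // closes [ ElementList ] and [ ElementList , Elision_opt ]
        return elems;
    }
}
```

Written as a loop, each decision in this sketch looks at one token: after an Elision, "]" ends the literal and anything else must start an AssignmentExpression.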

I have three questions; I'll ask them one by one.

This is the second one.


I have simplified this production to the following form:

ArrayLiteral:
    "[" ("," | AssignmentExpression ",") * AssignmentExpression ? "]"

See the first question for whether this simplification is correct:

How to simplify JavaScript/ECMAScript array literal production?

Now I have tried to implement it in JavaCC as follows:

void ArrayLiteral() :
{
}
{
    "["
    (
        ","
    |   AssignmentExpression()
        ","
    ) *
    (
        AssignmentExpression()
    ) ?
    "]"
}

JavaCC complains about an ambiguity involving "," and AssignmentExpression (its contents). Obviously, a LOOKAHEAD specification is required. I have spent a lot of time trying to figure out the LOOKAHEADs and have tried different things, such as

  • LOOKAHEAD (AssignmentExpression() ",") in (...)*
  • LOOKAHEAD (AssignmentExpression() "]") in (...)?

and a few other variations, but I could not get rid of the JavaCC warning.

I fail to understand why this does not work:

void ArrayLiteral() :
{
}
{
    "["
    (
        LOOKAHEAD ("," | AssignmentExpression() ",")
        ","
    |   AssignmentExpression()
        ","
    ) *
    (
        LOOKAHEAD (AssignmentExpression() "]")
        AssignmentExpression()
    ) ?
    "]"
}

OK, AssignmentExpression() by itself is ambiguous, but the trailing "," or "]" in the LOOKAHEADs should make it clear which choice to take. Or am I mistaken here?

What would a correct LOOKAHEAD specification for this production look like?
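A syntactic LOOKAHEAD such as LOOKAHEAD(AssignmentExpression() ",") roughly amounts to a speculative parse: scan the whole expression, check the token after it, then rewind. The hand-written Java sketch below (not JavaCC-generated code; the names are hypothetical, and AssignmentExpression is reduced to IDENT ("+" IDENT)*) illustrates that the trailing "," or "]" does settle the choice, but only after the entire expression has been scanned.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a speculative lookahead for
//   "[" ( "," | LOOKAHEAD(AE ",") AE "," )* ( AE )? "]"
// Hypothetical simplification: AE (AssignmentExpression) ::= IDENT ( "+" IDENT )*
// Holes (a bare "," in the loop) are returned as null.
final class SpeculativeArrayParser {
    private final String[] toks;
    private int pos;

    SpeculativeArrayParser(String... toks) { this.toks = toks; }

    private String peek() { return pos < toks.length ? toks[pos] : "<EOF>"; }

    private static boolean isIdent(String t) {
        return !t.equals("[") && !t.equals("]") && !t.equals(",") && !t.equals("+") && !t.equals("<EOF>");
    }

    private void expect(String t) {
        if (!peek().equals(t)) throw new IllegalStateException("expected " + t + " but got " + peek());
        pos++;
    }

    // Consumes one AssignmentExpression and returns its text, e.g. "a+b".
    private String assignmentExpression() {
        if (!isIdent(peek())) throw new IllegalStateException("expected expression but got " + peek());
        StringBuilder sb = new StringBuilder(toks[pos++]);
        while (peek().equals("+")) {
            pos++;
            if (!isIdent(peek())) throw new IllegalStateException("expected identifier after +");
            sb.append('+').append(toks[pos++]);
        }
        return sb.toString();
    }

    // Does an AssignmentExpression followed by `follow` start here? Consumes nothing.
    private boolean lookaheadExpressionThen(String follow) {
        int saved = pos;
        try {
            assignmentExpression();
            return peek().equals(follow);
        } catch (IllegalStateException e) {
            return false;
        } finally {
            pos = saved; // always rewind, whether the speculative scan matched or not
        }
    }

    List<String> arrayLiteral() {
        List<String> elems = new ArrayList<>();
        expect("[");
        while (true) {
            if (peek().equals(",")) {
                pos++;
                elems.add(null);                    // bare comma: undefined element
            } else if (lookaheadExpressionThen(",")) {
                elems.add(assignmentExpression());  // AE "," : element plus separator
                expect(",");
            } else {
                break;                              // leave the loop: optional final AE, then "]"
            }
        }
        if (isIdent(peek())) elems.add(assignmentExpression());
        expect("]");
        return elems;
    }
}
```

The cost of this approach is that every element is scanned twice: once speculatively inside the lookahead and once for real.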

Update

This did not work, unfortunately:

void ArrayLiteral() :
{
}
{
    "["
    (
        ","
    |
        LOOKAHEAD (AssignmentExpression() ",")
        AssignmentExpression()
        ","
    ) *
    (
        AssignmentExpression()
    ) ?
    "]"
}

Warning:

Warning: Choice conflict in (...)* construct at line 6, column 5.
         Expansion nested within construct and expansion following construct
         have common prefixes, one of which is: "function"
         Consider using a lookahead of 2 or more for nested expansion.

Line 6 is the ( before the first LOOKAHEAD. The common prefix "function" is simply one of the possible starts of AssignmentExpression.

Answer 1:

Here is yet another approach. It has the advantage of identifying which commas indicate undefined elements without using any semantic actions.

void ArrayLiteral() : {} { "[" MoreArrayLiteral() }

void MoreArrayLiteral() : {} {
    "]"
|    "," /* undefined item */ MoreArrayLiteral()
|    AssignmentExpression() ( "]" |  "," MoreArrayLiteral() )
}
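This grammar translates almost line for line into recursive descent, and every choice is made on a single token. A minimal Java sketch, assuming a hypothetical token array and reducing AssignmentExpression to one identifier token; holes are recorded as null:

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written recursive descent for:
//   ArrayLiteral     ::= "[" MoreArrayLiteral
//   MoreArrayLiteral ::= "]"
//                      | "," MoreArrayLiteral                 (undefined item)
//                      | AssignmentExpression ( "]" | "," MoreArrayLiteral )
// Hypothetical simplification: an AssignmentExpression is one identifier token.
final class MoreArrayLiteralParser {
    private final String[] toks;
    private int pos;
    private final List<String> elems = new ArrayList<>();

    MoreArrayLiteralParser(String... toks) { this.toks = toks; }

    private String peek() { return pos < toks.length ? toks[pos] : "<EOF>"; }

    private void expect(String t) {
        if (!peek().equals(t)) throw new IllegalStateException("expected " + t + " but got " + peek());
        pos++;
    }

    List<String> arrayLiteral() {
        expect("[");
        moreArrayLiteral();
        return elems;
    }

    // Every branch below is chosen by looking at the next token only.
    private void moreArrayLiteral() {
        String t = peek();
        if (t.equals("]")) {
            pos++;
        } else if (t.equals(",")) {
            pos++;
            elems.add(null);            // this comma marks an undefined element
            moreArrayLiteral();
        } else {
            elems.add(assignmentExpression());
            if (peek().equals("]")) {
                pos++;
            } else {
                expect(",");            // separator after an element, not a hole
                moreArrayLiteral();
            }
        }
    }

    private String assignmentExpression() {
        String t = peek();
        if (t.equals("[") || t.equals("]") || t.equals(",") || t.equals("<EOF>")) {
            throw new IllegalStateException("expected expression but got " + t);
        }
        pos++;
        return t;
    }
}
```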


Answer 2:

JavaCC produces top-down parsers. I'll say off the top that I'm not a fan of top-down parser generators, so I'm not a JavaCC expert and I don't have it handy to test.

(Edit: I thought something else would work, but I realized afterwards that I don't understand how JavaCC attaches lookahead to the actual choices; in the case of ( A | B )* C, there are actually three possible choices: A, B and C. I thought it would consider all three of them, but it's possible that it considers them two at a time. So the following is yet another guess.)

Having said that, I think the following would work, but it involves parsing just about every AssignmentExpression() twice.

void ArrayLiteral() :
{
}
{
    "["
    (
        ","
    |
        AssignmentExpression()
        ","
    ) *
    (
        LOOKAHEAD (AssignmentExpression() "]")
        AssignmentExpression()
    ) ?
    "]"
}

As I indicated in the linked question, a better solution is to rewrite the production differently:

"[" AssignmentExpression ? ("," AssignmentExpression ?) * "]"

That leads to a one-token lookahead grammar, so you won't need the LOOKAHEAD declaration to handle it.
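To illustrate why the rewritten production needs only one token of lookahead, here is a Java sketch of a loop-based parser for it (hypothetical names; AssignmentExpression reduced to IDENT ("+" IDENT)*). Each optional AssignmentExpression is entered only if the next token can start one, and each comma closes the current slot, recording null when that slot was empty:

```java
import java.util.ArrayList;
import java.util.List;

// One-token-lookahead parser for
//   "[" AssignmentExpression? ( "," AssignmentExpression? )* "]"
// Hypothetical simplification: AssignmentExpression ::= IDENT ( "+" IDENT )*
// Holes (elided elements) are returned as null.
final class Ll1ArrayParser {
    private final String[] toks;
    private int pos;

    Ll1ArrayParser(String... toks) { this.toks = toks; }

    private String peek() { return pos < toks.length ? toks[pos] : "<EOF>"; }

    private static boolean startsExpression(String t) {
        return !t.equals("[") && !t.equals("]") && !t.equals(",") && !t.equals("+") && !t.equals("<EOF>");
    }

    private void expect(String t) {
        if (!peek().equals(t)) throw new IllegalStateException("expected " + t + " but got " + peek());
        pos++;
    }

    // Only called when startsExpression(peek()) is true.
    private String assignmentExpression() {
        StringBuilder sb = new StringBuilder(toks[pos++]);
        while (peek().equals("+")) {
            pos++;
            if (!startsExpression(peek())) throw new IllegalStateException("expected identifier after +");
            sb.append('+').append(toks[pos++]);
        }
        return sb.toString();
    }

    // AssignmentExpression? is decided by the next token alone.
    private String optionalExpression() {
        return startsExpression(peek()) ? assignmentExpression() : null;
    }

    List<String> arrayLiteral() {
        List<String> elems = new ArrayList<>();
        expect("[");
        String slot = optionalExpression();
        while (peek().equals(",")) {
            pos++;
            elems.add(slot);               // null: the slot before this comma was empty
            slot = optionalExpression();
        }
        if (slot != null) elems.add(slot); // a trailing comma adds no element
        expect("]");
        return elems;
    }
}
```

Unlike the speculative version, nothing here is scanned twice; the price is that the element/hole bookkeeping moves into the loop instead of being expressed by the grammar.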



Answer 3:

This is how I solved it (thanks to the answer by @rici):

JSArrayLiteral ArrayLiteral() : 
{
    boolean lastElementWasAssignmentExpression = false;
}
{
    "["
    (
        (
            AssignmentExpression()
            {
                // Do something with expression
                lastElementWasAssignmentExpression = true;
            }
        ) ?
        (
            ","
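            {
                if (!lastElementWasAssignmentExpression)
                {
                    // Do something with elision
                }
                // Each comma closes a slot; the next slot starts empty
                lastElementWasAssignmentExpression = false;
            }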
            (
                AssignmentExpression()
                {
                    // Do something with expression
                    lastElementWasAssignmentExpression = true;
                }
            ) ?
        ) *
    )
    "]"
    {
        // Do something with results
    }
}