I am currently implementing a JavaScript/ECMAScript 5.1 parser with JavaCC and am having problems with the ArrayLiteral production.
ArrayLiteral :
[ Elision_opt ]
[ ElementList ]
[ ElementList , Elision_opt ]
ElementList :
Elision_opt AssignmentExpression
ElementList , Elision_opt AssignmentExpression
Elision :
,
Elision ,
I have three questions, which I'll ask one at a time.
This is the second one.
I have simplified this production to the following form:
ArrayLiteral:
"[" ("," | AssignmentExpression ",") * AssignmentExpression ? "]"
Please see the first question on whether it is correct or not:
How to simplify JavaScript/ECMAScript array literal production?
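As a rough sanity check on the simplification, the two forms can be compared by brute force at the token level. The sketch below assumes AssignmentExpression can be abbreviated to a single token 'a' and that both grammars can be transcribed as regular expressions over 'a' and ','; the class name, the regex transcriptions and the length bound are illustrative assumptions, not part of the spec or of JavaCC.

```java
import java.util.regex.Pattern;

// Token-level comparison of the spec productions and the simplified production.
// Assumption: AssignmentExpression is treated as one opaque token, written 'a'.
final class ArrayLiteralEquivalence {
    // Spec: [ Elision_opt ] | [ ElementList ] | [ ElementList , Elision_opt ]
    // with Elision = ,+ and ElementList = ,* a ( ,+ a )*
    static final Pattern SPEC =
        Pattern.compile("\\[(,*|,*a(,+a)*|,*a(,+a)*,+)\\]");

    // Simplified: "[" ("," | AssignmentExpression ",")* AssignmentExpression? "]"
    static final Pattern SIMPLIFIED =
        Pattern.compile("\\[(,|a,)*a?\\]");

    // Enumerates every string of 'a' and ',' up to maxLen (wrapped in brackets)
    // and reports whether both patterns agree on all of them.
    static boolean equivalentUpTo(int maxLen) {
        for (int len = 0; len <= maxLen; len++) {
            for (int bits = 0; bits < (1 << len); bits++) {
                StringBuilder sb = new StringBuilder("[");
                for (int i = 0; i < len; i++) {
                    sb.append(((bits >> i) & 1) == 0 ? ',' : 'a');
                }
                sb.append(']');
                String s = sb.toString();
                if (SPEC.matcher(s).matches() != SIMPLIFIED.matcher(s).matches()) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("equivalent up to length 10: " + equivalentUpTo(10));
    }
}
```

This only compares the sets of accepted token sequences up to a bounded length; it says nothing about how the commas map to elided elements or about conflicts inside AssignmentExpression itself.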
Now I have tried to implement it in JavaCC as follows:
void ArrayLiteral() :
{
}
{
    "["
    (
        ","
    |   AssignmentExpression()
        ","
    ) *
    (
        AssignmentExpression()
    ) ?
    "]"
}
JavaCC complains about ambiguous "," or AssignmentExpression (its contents). Obviously, a LOOKAHEAD specification is required. I have spent a lot of time trying to figure the LOOKAHEADs out, tried different things like LOOKAHEAD (AssignmentExpression() ",") in (...)*, LOOKAHEAD (AssignmentExpression() "]") in (...)? and a few other variations, but I could not get rid of the JavaCC warning.
I fail to understand why this does not work:
void ArrayLiteral() :
{
}
{
    "["
    (
        LOOKAHEAD ("," | AssignmentExpression() ",")
        ","
    |   AssignmentExpression()
        ","
    ) *
    (
        LOOKAHEAD (AssignmentExpression() "]")
        AssignmentExpression()
    ) ?
    "]"
}
Ok, AssignmentExpression() per se is ambiguous, but the trailing "," or "]" in the LOOKAHEADs should make it clear which of the choices should be taken - or am I mistaken here?
What would a correct LOOKAHEAD specification for this production look like?
Update
This did not work, unfortunately:
void ArrayLiteral() :
{
}
{
    "["
    (
        ","
    |
        LOOKAHEAD (AssignmentExpression() ",")
        AssignmentExpression()
        ","
    ) *
    (
        AssignmentExpression()
    ) ?
    "]"
}
Warning:
Warning: Choice conflict in (...)* construct at line 6, column 5.
Expansion nested within construct and expansion following construct
have common prefixes, one of which is: "function"
Consider using a lookahead of 2 or more for nested expansion.
Line 6 is the ( before the first LOOKAHEAD. The common prefix "function" is simply one of the possible starts of AssignmentExpression.
This is how I solved it (thanks to the answer by @rici):
Here is yet another approach. It has the advantage of identifying which commas indicate undefined elements without using any semantic actions.
JavaCC produces top-down parsers. I'll say off the top that I'm not a fan of top-down parser generators, so I'm not a JavaCC expert and I don't have it handy to test.
(Edit: I thought something else would work, but I realized afterwards that I don't understand how JavaCC attaches lookahead to the actual choices. In the case of ( A | B )* C, there are actually three possible choices: A, B and C. I thought it would consider all three of them, but it's possible that it does them two at a time. So the following is yet another guess.)
Having said that, I think the following would work, but it involves parsing just about every AssignmentExpression() twice.
As I indicated in the linked question, a better solution is to rewrite the production differently:
That leads to a one-token lookahead grammar, so you won't need the LOOKAHEAD declaration to handle it.
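To illustrate what a one-token lookahead shape can look like, here is a small hand-written recursive-descent recognizer in plain Java. It is only a sketch under assumptions: the grammar shape in the comments is one possible LL(1) formulation and is not claimed to be the rewrite referred to above, AssignmentExpression is replaced by a stand-in that consumes a single "a" token, and all class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// One possible LL(1) shape (an assumption, not taken from the answer):
//   ArrayLiteral : "[" Elements "]"
//   Elements     : ( "," Elements | AssignmentExpression ( "," Elements )? )?
// Every decision below inspects only the next token.
final class ArrayLiteralLl1 {
    private final List<String> tokens;
    private int pos = 0;

    ArrayLiteralLl1(List<String> tokens) {
        this.tokens = tokens;
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    private void expect(String token) {
        if (!peek().equals(token)) {
            throw new IllegalStateException("expected " + token + " but found " + peek());
        }
        pos++;
    }

    // Hypothetical stand-in: a real parser would parse a full AssignmentExpression here.
    private void assignmentExpression() {
        expect("a");
    }

    void arrayLiteral() {
        expect("[");
        elements();
        expect("]");
    }

    private void elements() {
        if (peek().equals(",")) {
            pos++;                 // an elided element
            elements();
        } else if (peek().equals("a")) {
            assignmentExpression();
            if (peek().equals(",")) {
                pos++;             // separator after an element
                elements();
            }
        }
        // any other token: Elements is empty and the caller expects "]"
    }

    // Returns true if the whole token sequence is a valid array literal.
    static boolean accepts(String... input) {
        ArrayLiteralLl1 parser = new ArrayLiteralLl1(Arrays.asList(input));
        try {
            parser.arrayLiteral();
            return parser.pos == input.length;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("[", "a", ",", ",", "a", ",", "]"));
        System.out.println(accepts("[", "a", "a", "]"));
    }
}
```

Because a comma reached at the top of elements() always stands for an elided element, while a comma consumed right after an element is only a separator, the two roles are distinguished by position in the grammar rather than by extra lookahead.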