I am finishing my ECMAScript 5.1/JavaScript grammar for JavaCC. I've done all the tokens and productions according to the specification.
Now I'm facing a big question which I don't know how to solve.
JavaScript has this nice feature called automatic semicolon insertion:
What are the rules for JavaScript's automatic semicolon insertion (ASI)?
To quote the specifications, the rules are:
There are three basic rules of semicolon insertion:
When, as the program is parsed from left to right, a token (called the offending token) is encountered that is not allowed by any production of the grammar, then a semicolon is automatically inserted before the offending token if one or more of the following conditions is true:
- The offending token is separated from the previous token by at least one LineTerminator.
- The offending token is }.
When, as the program is parsed from left to right, the end of the input stream of tokens is encountered and the parser is unable to parse the input token stream as a single complete ECMAScript Program, then a semicolon is automatically inserted at the end of the input stream.
When, as the program is parsed from left to right, a token is encountered that is allowed by some production of the grammar, but the production is a restricted production and the token would be the first token for a terminal or nonterminal immediately following the annotation [no LineTerminator here] within the restricted production (and therefore such a token is called a restricted token), and the restricted token is separated from the previous token by at least one LineTerminator, then a semicolon is automatically inserted before the restricted token.
However, there is an additional overriding condition on the preceding rules: a semicolon is never inserted automatically if the semicolon would then be parsed as an empty statement or if that semicolon would become one of the two semicolons in the header of a for statement (see 12.6.3).
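To make the first two rules concrete, here is a minimal sketch of the decision they describe, written as a plain Java predicate. The `Tok` and `Kind` types are hypothetical stand-ins and not JavaCC's generated classes, and the sketch assumes that token line numbers reflect every LineTerminator between two tokens (including those inside multi-line comments). It does not cover the overriding condition about empty statements and `for` headers.

```java
// Minimal sketch of ASI rules 1 and 2. Tok and Kind are hypothetical stand-ins,
// not JavaCC's generated classes.
final class AsiSketch {
    enum Kind { RBRACE, EOF, OTHER }

    static final class Tok {
        final Kind kind;
        final int beginLine;
        final int endLine;

        Tok(Kind kind, int beginLine, int endLine) {
            this.kind = kind;
            this.beginLine = beginLine;
            this.endLine = endLine;
        }
    }

    // Assumption: this is called only at a point where the grammar requires ';'
    // and `offending` is the token that was actually found there instead.
    static boolean mayInsertSemicolon(Tok previous, Tok offending) {
        if (offending.kind == Kind.EOF) return true;     // rule 2: end of input
        if (offending.kind == Kind.RBRACE) return true;  // rule 1: offending token is }
        return offending.beginLine > previous.endLine;   // rule 1: line break in between
    }
}
```

In a JavaCC grammar, a check of this kind could plausibly be placed in a semantic lookahead (`LOOKAHEAD({ ... })`) on an alternative to the explicit `";"`, comparing `getToken(0)` and `getToken(1)`. That wiring is an assumption about one possible design, not something taken from the Dojo grammar.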
How could I implement this with JavaCC?
The closest thing to an answer I've found so far is this grammar from the Dojo toolkit, which has a JAVACODE part called insertSemiColon dedicated to the task. But I don't see this method being called anywhere (neither in the grammar nor in the rest of the jslinker code).
How could I approach this problem with JavaCC?
See also this question:
javascript grammar and automatic semocolon insertion
(No answer there.)
A question from the comments:
Is it correct to say that semicolons need only be inserted where semicolons are syntactically allowed?
I think it would be correct to say that semicolons need only be inserted where semicolons are syntactically required.
The relevant part here is §7.9:
7.9 Automatic Semicolon Insertion
Certain ECMAScript statements (empty statement, variable statement, expression statement, do-while statement, continue statement, break statement, return statement, and throw statement) must be terminated with semicolons. Such semicolons may always appear explicitly in the source text. For convenience, however, such semicolons may be omitted from the source text in certain situations. These situations are described by saying that semicolons are automatically inserted into the source code token stream in those situations.
Let's take the return statement for instance:
ReturnStatement :
return ;
return [no LineTerminator here] Expression ;
So (from my understanding) syntactically the semicolon is required, not just allowed (as in your question).
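The effect of the `[no LineTerminator here]` restriction on `return` can be illustrated with a small sketch. It rewrites a flat token list, which is a simplification for illustration only and not how a JavaCC parser would do it; in a grammar the same line comparison would more likely guard the optional Expression. The `Tok` class and `applyReturnRule` method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the restricted production for `return` (rule 3 only).
final class RestrictedReturnSketch {
    static final class Tok {
        final String image; // token text
        final int line;     // line on which the token appears

        Tok(String image, int line) {
            this.image = image;
            this.line = line;
        }
    }

    // Returns the token images, with ";" inserted after any `return` whose
    // following token starts on a later line.
    static List<String> applyReturnRule(List<Tok> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            Tok current = tokens.get(i);
            out.add(current.image);
            if (!current.image.equals("return") || i + 1 >= tokens.size()) {
                continue;
            }
            Tok next = tokens.get(i + 1);
            // An explicit ";" already ends the statement; inserting another
            // one would only create an empty statement.
            if (next.line > current.line && !next.image.equals(";")) {
                out.add(";");
            }
        }
        return out;
    }
}
```

With this rule, `return` followed by `a + b` on the next line is treated as `return;` followed by the expression statement `a + b`, whereas `return a + b` on a single line is left unchanged.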