In VHDL the ' character can be used to delimit a character literal, e.g. '.', or as the separator in attribute names and qualified expressions (somewhat similar to C++'s :: token), e.g. string'("hello").
The issue comes up when parsing a qualified expression (or attribute name) that contains character literals, e.g. string'('a','b','c'). In this case a naive lexer will incorrectly tokenize the first '(' as a character literal, and all of the following actual character literals will be mis-tokenized.
There is a thread in the comp.lang.vhdl Google group from 2007, titled "Lexing the ' char", which asks a similar question and has an answer by user diogratia:
```c
case '\'':                           /* IR1045 check */
    if ( last_token == DELIM_RIGHT_PAREN ||
         last_token == DELIM_RIGHT_BRACKET ||
         last_token == KEYWD_ALL ||
         last_token == IDENTIFIER_TOKEN ||
         last_token == STR_LIT_TOKEN ||
         last_token == CHAR_LIT_TOKEN ||
         ! (buff_ptr<BUFSIZ-2) )
        token_flag = DELIM_APOSTROPHE;
    else if (is_graphic_char(NEXT_CHAR) &&
             line_buff[buff_ptr+2] == '\'') {
CHARACTER_LITERAL:
        buff_ptr+= 3;                /* lead,trailing \' and char */
        last_token = CHAR_LIT_TOKEN;
        token_strlen = 3;
        return (last_token);
    }
    else
        token_flag = DELIM_APOSTROPHE;
    break;
```
See Issue Report IR1045: http://www.eda-twiki.org/isac/IRs-VHDL-93/IR1045.txt
As you can see from the code fragment above, the last token can be captured and used to disambiguate something like:
foo <= std_logic_vector'('a','b','c');
without a large lookahead or backtracking.
However, as far as I know, flex doesn't track the last token that was returned.
Without having to manually keep track of the last token, is there a better way to accomplish this lexing task?
I am using IntelliJ GrammarKit if that helps.
The idea behind IR1045 is to be able to tell whether a single quote/apostrophe is part of a character literal or not without looking ahead, or backtracking when you guess wrong. Try:
How far ahead are you willing to look?
There is, however, a practical example of flex disambiguating apostrophes and character literals for VHDL.
Nick Gasson's nvc uses flex, and its lexer implements an Issue Report 1045 solution.
See nvc/src/lexer.l, which is licensed under GPLv3.
Search for last_token:
and
An added function to check it:
which is:
The IR1045 location has changed since the comp.lang.vhdl post; it's now
You'll also want to search for resolve_ir1045 in lexer.l.
and
where we find that nvc uses the function to filter detection of the first single quote of a character literal.
This was originally an Ada issue. IR1045 was never adopted but is universally used. There are probably Ada flex lexers that also demonstrate the disambiguation.
The requirement to disambiguate is discussed in Ada User Journal volume 27 number 3 (September 2006), in an article titled "Lexical Analysis" on PDF pages 30 and 31 (volume 27 pages 159 and 160), where we see that the solution is not well known.
The comment that character literals do not precede a single quote is inaccurate:
The first use of an attribute whose prefix is a selected name with a character literal suffix demonstrates the inaccuracy; the second report statement shows that it can matter:
In addition to an attribute name prefix containing a selected name with a character literal suffix, there's a requirement that an attribute specification 'decorate' a declared entity (of an entity_class, see IEEE Std 1076-2008 7.2 Attribute specification) in the same declarative region in which the entity is declared.
This example is syntactically and semantically valid VHDL. Note that nvc doesn't allow decorating a named entity with the entity class literal, which is not in accordance with 7.2.
Enumeration literals are declared in type declarations, here type twovalue. An enumerated type that has at least one character literal as an enumeration literal is a character type (5.2.2.1).