Another simple question: is there any way to tell flex to prefer a rule that matches a short string over a rule that matches a longer one? I can't find any good documentation about that.
Here is why I need that: I parse a file written in a pseudo-language that contains some keywords corresponding to control instructions. I'd like them to have absolute priority so that they're not parsed as parts of an expression. I need this priority mechanism because I am not writing a full grammar for my project (that would be total overkill in my case: I only perform structural analysis on the parsed program and don't need to know the details), so I can't rely on fine grammar tuning to make sure that those blocks won't be parsed as part of an expression.
Any help will be appreciated.
Here is an example of a parsed file:
If a > 0 Then read(b); Endif
c := "If I were...";
While d > 5 Do d := d + 1 Endwhile
I just want to collect information on the Ifs, Thens, Endifs, etc. The rest doesn't matter to me. That's why I'd like the rules for If, Then, etc. to be prioritized without having to write a grammar.
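From the Dragon Book, 2nd edition, Section 3.5.3 "Conflict Resolution in Lex": Lex resolves conflicts between patterns with two rules. (1) Always prefer a longer prefix to a shorter prefix. (2) If the longest possible prefix matches two or more patterns, prefer the pattern listed first in the Lex program.
The rules above also apply to Flex. The Flex manual (Chapter 7, "How the input is matched") says the same thing: when more than one rule matches, the one matching the most text is taken, and when two or more matches have the same length, the rule listed first in the flex input file is chosen.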
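The two rules can be seen in a minimal sketch with just one keyword rule and one identifier rule. The file name demo.l and the printed labels are only illustrative assumptions.

```lex
%{
/* Minimal sketch: one keyword rule listed above one identifier rule.
   "%option main" asks flex for a default main() that simply calls
   yylex(); it also implies noyywrap. */
#include <stdio.h>
%}

%option main

%%
"If"                    { printf("keyword:    %s\n", yytext); }
[A-Za-z_][A-Za-z0-9_]*  { printf("identifier: %s\n", yytext); }
.|\n                    { /* ignore everything else */ }
%%
```

Built with `flex demo.l && cc lex.yy.c -o demo` and fed the input `If If_this_is_an_identifier`, this should report the standalone `If` as a keyword (both rules match two characters, so the rule listed first wins) and `If_this_is_an_identifier` as a single identifier (the identifier rule matches more text, so it wins regardless of order).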
If I understood correctly, your lexer treats keywords like Endif as identifiers, so they are later considered part of an expression. If this is your problem, simply put the keyword rules at the top of your specification, above the identifier rule (suppose each word in uppercase is a predefined enum constant corresponding to a token). The keywords will then always be matched before the identifier, thanks to Rule No. 2 (when two matches have the same length, the rule listed first wins).
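A rough sketch of such a specification, using the keywords from the question, might look like the following. The token codes, the `names` table, and the `main` function are assumptions made for illustration, not part of the original answer. A string rule is included so that the `If` inside `"If I were..."` is not reported as a keyword.

```lex
%{
#include <stdio.h>

/* Hypothetical token codes for illustration; a real project would
   usually take these from its parser's header instead. */
enum token { IF = 258, THEN, ENDIF, WHILE, DO, ENDWHILE, STRING, IDENTIFIER };
%}

%option noyywrap

%%
"If"                    { return IF; }
"Then"                  { return THEN; }
"Endif"                 { return ENDIF; }
"While"                 { return WHILE; }
"Do"                    { return DO; }
"Endwhile"              { return ENDWHILE; }
\"[^\"\n]*\"            { return STRING; }
[A-Za-z_][A-Za-z0-9_]*  { return IDENTIFIER; }
.|\n                    { /* everything else is irrelevant here: skip it */ }
%%

/* Print each token that the scanner reports for standard input. */
int main(void)
{
    static const char *names[] = { "IF", "THEN", "ENDIF", "WHILE", "DO",
                                   "ENDWHILE", "STRING", "IDENTIFIER" };
    int tok;

    while ((tok = yylex()) != 0)
        printf("%-10s %s\n", names[tok - IF], yytext);
    return 0;
}
```

Because the keyword rules come before the identifier rule, a standalone `Endif` is returned as ENDIF, while a longer word that merely starts with a keyword is still returned as IDENTIFIER. If only the control keywords matter, the STRING and IDENTIFIER rules could be given empty actions so that those lexemes are consumed silently.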
EDIT:
Thank you for your comment, kol. I forgot to add the rule for strings, but I don't think my solution is wrong. For example, given an identifier called If_this_is_an_identifier, Rule No. 1 applies, so the identifier rule takes effect (since it matches the longest string). I wrote a simple test case and saw no problem with my solution. Here is my lex.l file:
I tested my solution with the following test case:
It gives me the following output (output not relevant to the problem you mentioned is omitted):
The lex.l program is modified from an example in the flex manual, which uses the same method to distinguish keywords from identifiers.
Also have a look at the ANSI C grammar, Lex specification.
I have also used this approach in a personal project, and so far I haven't found any problems.