I'm writing a lexer.l file that works as intended except for one part. I have the rule:
[\(\*.*\*\)] {}
which is meant to make the scanner do nothing when it encounters (* this is a test *) in a file. However, when I run lex lexer.l, I get warnings on the lines with the rules \(, \*, and \) stating that they can never be matched. So my question is: why would [\(\*.*\*\)] {} interfere with \( and the others? How can I match (* this is a test *)?
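Square brackets in a (f)lex pattern delimit a character class, so [\(\*.*\*\)] matches any single (, *, . or ) rather than a whole comment. Because that rule comes before the rules for \(, \* and \), it always wins for those single characters, which is why lex reports that they can never be matched.
Languages with the comment syntax (*…*) typically allow nested comments, and nested comments cannot easily be recognized by (f)lex: nesting requires a context-free grammar, whereas a lexical scanner only recognizes regular languages.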
If your comments do not nest (so that (* something (* else *) is a complete comment, rather than the prefix of a longer comment), then you can use the regular expression:
[(][*][^*]*[*]+([^*)][^*]*[*]+)*[)]
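As a quick way to check what that pattern accepts outside of lex, here is a minimal C sketch that feeds the same expression to the POSIX regcomp/regexec functions. The function name match_flat_comment and the leading ^ anchor are assumptions added for the illustration; they are not part of the lex rule.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns the length of the non-nested comment at the
   start of `text`, or -1 if `text` does not begin with a complete comment. */
long match_flat_comment(const char *text)
{
    /* Same pattern as the lex rule, with ^ added so that the match
       must start at offset 0 (assumption for this sketch). */
    static const char *pattern = "^[(][*][^*]*[*]+([^*)][^*]*[*]+)*[)]";
    regex_t re;
    regmatch_t m[1];
    long len = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;                      /* pattern failed to compile */
    if (regexec(&re, text, 1, m, 0) == 0)
        len = (long)m[0].rm_eo;         /* end offset of the whole match */
    regfree(&re);
    return len;
}
```

Called on "(* something (* else *)", this should report the whole string as one comment, while on "(* a *) x (* b *)" it should stop after the first "*)".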
If you do require nested comments, you can use start conditions and a stack (or a simulated stack, as below):
%x SC_COMMENT
%%
  /* Indented, so lex copies it into yylex(): depth of nested comments. */
  int comment_nesting = 0;
"(*"             { BEGIN(SC_COMMENT); }
<SC_COMMENT>{
  "(*"           { ++comment_nesting; }
  "*"+")"        { if (comment_nesting) --comment_nesting;
                   else BEGIN(INITIAL); }
  "*"+           ;
  [^(*\n]+       ;
  [(]            ;
  \n             ;
}
That snippet was taken from this answer, with a small adjustment because that answer recognizes nested /*…*/ comments. A fuller explanation of the code appears there.
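To make the control flow of the start-condition rules easier to follow, here is a rough hand-written C equivalent of the nesting counter. It is only a sketch of the same idea, not what flex generates: the function name skip_nested_comment and its return convention (-1 for "not a comment" or "unterminated") are assumptions made for the illustration.

```c
#include <stddef.h>

/* Hypothetical helper: returns the number of characters in the (possibly
   nested) comment starting at `text`, or -1 if `text` does not start with
   "(*" or the comment is never closed. */
long skip_nested_comment(const char *text)
{
    size_t i = 2;     /* position just after the opening "(*" */
    int depth = 0;    /* plays the role of comment_nesting */

    if (text[0] != '(' || text[1] != '*')
        return -1;

    while (text[i] != '\0') {
        if (text[i] == '(' && text[i + 1] == '*') {
            ++depth;                  /* inner "(*": one level deeper */
            i += 2;
        } else if (text[i] == '*' && text[i + 1] == ')') {
            i += 2;
            if (depth == 0)
                return (long)i;       /* outermost "*)": comment is done */
            --depth;                  /* closes an inner comment */
        } else {
            ++i;                      /* any other character is ignored */
        }
    }
    return -1;                        /* input ended inside the comment */
}
```

With nesting, "(* something (* else *)" is no longer a complete comment, so this sketch returns -1 for it, whereas "(* a (* b *) c *) rest" yields the length of the text up to and including the final "*)".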