Checking unfinished comments in flex

Posted 2019-08-03 18:01

Question:

I am new to flex. I have written a sample flex program that detects multi-line comments, and now I want to improve it so that it also detects unfinished and ill-formed comments. For example, a comment that begins with /* but has no closing */ is an unfinished comment; by an ill-formed comment I mean one that is not properly formed, say, when EOF appears inside the comment. What do I have to add to my code to check for these things? My sample code is as follows:

%x COMMENT_MULTI_LINE
%{
#include <stdio.h>   /* printf */
#include <stdlib.h>  /* free */
#include <string.h>  /* strndup */
char* commentStart;
%}

%%

[\n\t\r ]+ { 
  /* ignore whitespace */ }


<INITIAL>"/*" { 
  commentStart = yytext; 
  BEGIN(COMMENT_MULTI_LINE); 
}

<COMMENT_MULTI_LINE>"*/" { 
  char* comment = strndup(commentStart, yytext + 2 - commentStart);
  printf("'%s': was a multi-line comment\n", comment);
  free(comment); 
  BEGIN(INITIAL); 
}

<COMMENT_MULTI_LINE>. { 

} 

<COMMENT_MULTI_LINE>\n { 

} 


%%

int main(int argc, char *argv[]){    
  yylex();         
}

Answer 1:

The section of the flex manual on <<EOF>> is quite helpful: it has an example that matches your case, and its code can be copied almost verbatim into your flex program.

As it explains, <<EOF>> cannot be combined with other patterns in a regular expression. It may only be preceded by the name of a state (a start condition). Your code already uses a state to indicate that you are inside a comment; it is called COMMENT_MULTI_LINE. All you have to do is put that state in front of the <<EOF>> marker and give the rule an action:

<COMMENT_MULTI_LINE><<EOF>> {printf("Unterminated Comment: %s\n", yytext); 
                             yyterminate();}

The special action function yyterminate() tells flex that you have recognised the <<EOF>> and that it marks the end-of-input for your program.

I have tested this, and it works in your code. (And with multi-line strings also).
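
For reference, here is a minimal sketch of how the whole scanner might look with the <<EOF>> rule in place. It is not the asker's exact program: the commentStartLine variable and the use of %option yylineno are my own assumptions, chosen so the message can say where the open comment began (a saved yytext pointer is not guaranteed to stay valid across matches, so the sketch records a line number instead). %option noyywrap is also an addition, so the result links without the flex library.

```lex
%option noyywrap yylineno
%x COMMENT_MULTI_LINE

%{
#include <stdio.h>

/* Hypothetical helper (not in the original code): the line on which
   the currently open comment started. */
static int commentStartLine;
%}

%%

[\n\t\r ]+ { /* ignore whitespace */ }

<INITIAL>"/*" {
  commentStartLine = yylineno;   /* remember where the comment opened */
  BEGIN(COMMENT_MULTI_LINE);
}

<COMMENT_MULTI_LINE>"*/" {
  printf("comment from line %d closed on line %d\n",
         commentStartLine, yylineno);
  BEGIN(INITIAL);
}

<COMMENT_MULTI_LINE>(.|\n) { /* skip the comment body */ }

<COMMENT_MULTI_LINE><<EOF>> {
  /* Input ended while still inside a comment. */
  fprintf(stderr, "unterminated comment starting at line %d\n",
          commentStartLine);
  yyterminate();
}

%%

int main(void) {
  yylex();
  return 0;
}
```

Assuming the file is saved as scanner.l (a made-up name), it can be built with "flex scanner.l" followed by "cc lex.yy.c -o scanner". Feeding it input that opens a comment with /* and never closes it should reach the <<EOF>> rule and print the unterminated-comment message to stderr.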