flex / bison : how can I switch two lexers on same

Posted 2019-03-04 23:12

How can I hand over an open file, e.g. one already being read by one scanner, to the next scanner, and pass the result to the parser?

1 answer
Emotional °昔
#2 · 2019-03-04 23:58

Flex buffers cannot easily be transferred from one scanner to another. Many details are private to the scanner and would need to be reverse-engineered, with the consequent loss of maintainability.

However, it is not difficult to combine two (or more) scanner definitions into a single scanner, provided that the semantic types are compatible. It is simply necessary to give them different start conditions. Since the start condition can be set even outside of a scanner action, it is trivial to switch from one scanner definition to the other.
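
To make the idea concrete without any generated code, here is a minimal hand-written sketch in plain C. It is only an analogy for start conditions, not what flex emits: all names (`struct scanner`, `scanner_set_mode`, `scanner_next`, the `MODE_*` and `TOK_*` constants) are made up for illustration. Two rule sets share one input position, and a mode field decides which set is active.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical names; the mode plays the role of a flex start condition. */
enum mode  { MODE_WORDS, MODE_NUMBERS };
enum token { TOK_EOF, TOK_WORD, TOK_NUMBER, TOK_OTHER };

struct scanner {
  const char *cur;   /* next unread character of the shared input */
  enum mode   mode;  /* which rule set is currently active */
};

/* Callable from anywhere (parser, driver), like a helper wrapping BEGIN. */
void scanner_set_mode(struct scanner *s, enum mode m) {
  assert(s != NULL);
  s->mode = m;
}

/* Single entry point: returns the next token and stores its length. */
enum token scanner_next(struct scanner *s, size_t *len) {
  assert(s != NULL && s->cur != NULL && len != NULL);
  while (*s->cur == ' ' || *s->cur == '\t') ++s->cur;   /* skip blanks */
  const char *start = s->cur;
  enum token tok;
  if (*s->cur == '\0') {
    tok = TOK_EOF;
  } else if (s->mode == MODE_WORDS && isalpha((unsigned char)*s->cur)) {
    while (isalpha((unsigned char)*s->cur)) ++s->cur;   /* word rule set */
    tok = TOK_WORD;
  } else if (s->mode == MODE_NUMBERS && isdigit((unsigned char)*s->cur)) {
    while (isdigit((unsigned char)*s->cur)) ++s->cur;   /* number rule set */
    tok = TOK_NUMBER;
  } else {
    ++s->cur;                                           /* fallback rule */
    tok = TOK_OTHER;
  }
  *len = (size_t)(s->cur - start);
  return tok;
}
```

Because both rule sets read through the same `cur` pointer, switching modes between two calls to `scanner_next` continues exactly where the previous token ended. Nothing has to be handed over from one scanner to another, which is the property the combined flex scanner gives you.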

Since Flex scanners are table-based, there is no real inefficiency in combining the two scanners; indeed, there may be some value in not duplicating the code. The combined table may be slightly larger than the sum of the individual tables, because there are likely to be more character equivalence classes, but on the other hand the larger table may allow better table compression. Neither of these effects is likely to be noticeable.


Here's a simple but possibly useful example. This parser reads a file and substitutes ${arithmetic expressions} with the evaluated expression. (Since it's just an example, only very basic expressions are allowed, but it should be easy to extend.)

Since the lexical scanner needs to start in start condition SC_ECHO, it needs to be initialized. Personally, I'd prefer to start in INITIAL to avoid this initialization in this simple case, but sometimes scanners need to be able to handle various start conditions, so I left the code in. The error handling could be improved, but it's functional.

The parser uses a very simple error rule to resynchronize and keep track of substitution errors. The semantic value of the non-terminals subst, file and start is the error count for the file; the semantic value for expr is the value of the expression. In this simple case, they are both just integers so the default type for yylval works.

Unterminated substitutions are not handled gracefully; in particular, if EOF is read during the lexical scan for a substitution, no indication is inserted into the output. I leave fixing that as an exercise. :)

Here's the lexer:

%{
#include <stdlib.h>   /* strtol */
#include "xsub.tab.h"
%}
%option noinput nounput noyywrap nodefault
%option yylineno
%x SC_ECHO
%%
   /* In a reentrant lexer, this would go into the state object */
   static int braces;

   /* This start condition just echoes until it finds ${... */
<SC_ECHO>{
  "${"        braces = 0; BEGIN(INITIAL);
  [^$\n]+     ECHO;
  "$"         ECHO;
  \n          ECHO;
}
 /* We need to figure out where the substitution ends, which is why we can't
  * just use a standard calculator. Here we deal with terminations.
  */
"{"           ++braces; return '{';
"}"           { if (braces) { --braces; return '}'; }
                else        { BEGIN(SC_ECHO); return FIN; }
              }

 /* The rest is just a normal calculator */
[0-9]+        yylval = strtol(yytext, NULL, 10); return NUMBER;
[[:blank:]]+  /* Ignore white space */
\n            /* Ignore newlines, too (but could also be an error) */
.             return yytext[0];

%%
void initialize_scanner(void) {
  BEGIN(SC_ECHO);
}
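
The brace counting is the only part of the lexer that is specific to finding the end of a substitution, so it may help to see it in isolation. The following is a rough plain-C sketch of the same logic, not part of the scanner itself; `find_substitution_end` is a hypothetical helper name, and unlike the lexer it works on a string that is already in memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper. Given the text immediately after "${", return a
 * pointer to the '}' that closes the substitution, or NULL if the input
 * ends before the substitution is closed. */
const char *find_substitution_end(const char *s) {
  assert(s != NULL);
  int braces = 0;                  /* same role as the lexer's counter */
  for (; *s != '\0'; ++s) {
    if (*s == '{') {
      ++braces;                    /* lexer: ++braces; return '{'; */
    } else if (*s == '}') {
      if (braces == 0) return s;   /* lexer: BEGIN(SC_ECHO); return FIN; */
      --braces;                    /* lexer: --braces; return '}'; */
    }
  }
  return NULL;                     /* unterminated substitution */
}
```

For example, on the text `{1}+2}x` the helper skips the inner pair and returns a pointer to the second `}`. The NULL return corresponds to the unterminated case mentioned earlier, which the lexer above does not report.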

The parser exports a single interface:

int parseFile(FILE *in, FILE *out);

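which returns 0 if all went well, and a non-zero value if any substitution was incorrect (modulo the issue mentioned above with unterminated substitutions). The parser counts the incorrect substitutions internally, but because the start rule ends with YYABORT in that case, yyparse() reports the failure as 1 rather than as the count itself. Here's the parser file: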

%{
#include <stdio.h>
int yylex(void);
void yyerror(const char* msg);
void initialize_scanner(void);

extern int yylineno;
extern FILE *yyin, *yyout;
%}
%token NUMBER FIN UNOP
%left '+' '-'
%left '*' '/' '%'
%nonassoc UNOP

%define parse.lac full
%define parse.error verbose
%%
start: file          { if ($1) YYABORT; else YYACCEPT; }
file :               { $$ = 0; }
     | file subst    { $$ = $1 + $2; }
subst: expr FIN      { fprintf(yyout, "%d", $1); $$ = 0; }
     | error FIN     { fputs("${ BAD SUBSTITUTION }", yyout); $$ = 1; }
expr : NUMBER
     | '-' expr %prec UNOP { $$ = -$2; }
     | '(' expr ')'  { $$ = $2; }
     | expr '+' expr { $$ = $1 + $3; }
     | expr '-' expr { $$ = $1 - $3; }
     | expr '*' expr { $$ = $1 * $3; }
     | expr '/' expr { $$ = $1 / $3; }
     | expr '%' expr { $$ = $1 % $3; }
%%
void yyerror(const char* msg) {
  fprintf(stderr, "%d: %s\n", yylineno, msg);
}

int parseFile(FILE* in, FILE* out) {
  initialize_scanner();
  yyin = in;
  yyout = out;
  return yyparse();
}

And a simple driver:

#include <stdio.h>
int parseFile(FILE* in, FILE* out);
int main() {
  return parseFile(stdin, stdout);
}