-->

lex & yacc get current position

Posted 2019-02-20 05:36

Question:

In lex & yacc there is a macro called YY_INPUT which can be redefined, for example like this:

#define YY_INPUT(buf,result,maxlen) do { \
    const int n = gzread(gz_yyin, buf, maxlen); \
    if (n < 0) { \
        int errNumber = 0; \
        reportError(gzerror(gz_yyin, &errNumber)); \
    } \
    result = n > 0 ? n : YY_NULL; \
} while (0)

I have a grammar rule that invokes the YYACCEPT macro. If I call gztell (or ftell) after YYACCEPT, I get the wrong number, because the parser has already read more data than it needed.

So how can I get the current position when a rule invokes YYACCEPT? (One bad solution would be to read the input character by character.)

(I have already done something like this:

#define YY_USER_ACTION do { \
        current_position += yyleng; \
} while (0)   

but it does not seem to work.)

Answer 1:

You have to keep track of the offset yourself. A simple but annoying solution is to put:

offset += yyleng;

in every flex action. Fortunately, you can do this implicitly by defining the YY_USER_ACTION macro, which is executed just before the token action.

That might still not be right for your grammar, because bison often reads one token ahead before it reduces a rule, so by the time the action runs, offset may already include the lookahead token. So you'll also need to attach the value of offset to each lexical token, most conveniently using the location facility (yylloc).
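To see why ftell or gztell overshoots while a per-token offset does not, here is a minimal simulation in plain C. It is not flex itself: the Scanner struct, next_token and the 8-byte CHUNK are hypothetical names and values chosen for illustration, and a real flex scanner refills a much larger buffer through YY_INPUT.

```c
#include <assert.h>
#include <stddef.h>

enum { CHUNK = 8 };  /* assumed refill size, for illustration only */

typedef struct {
    const char *src;    /* whole input, standing in for the file */
    size_t len;         /* total input length in bytes */
    size_t bytes_read;  /* what ftell()/gztell() would report */
    size_t offset;      /* end offset of the last token returned */
} Scanner;

/* Return the length of the next space-delimited token, or 0 at end of input. */
static size_t next_token(Scanner *s)
{
    assert(s != NULL && s->src != NULL);

    size_t pos = s->offset;
    while (pos < s->len && s->src[pos] == ' ')
        pos++;                      /* skipped blanks count as consumed too */
    size_t start = pos;
    while (pos < s->len && s->src[pos] != ' ')
        pos++;

    /* The scanner has to look one byte past the token to know it ended,
       and it fetches input a whole chunk at a time. */
    size_t needed = pos < s->len ? pos + 1 : s->len;
    while (s->bytes_read < needed) {
        size_t left = s->len - s->bytes_read;
        s->bytes_read += left < CHUNK ? left : CHUNK;
    }

    s->offset = pos;                /* same effect as adding yyleng per match */
    return pos - start;
}
```

With the 14-byte input "ab cd ef gh ij", the first call returns a 2-byte token and leaves offset at 2, while bytes_read is already 8. The gap between the two numbers is exactly what makes gztell unusable after YYACCEPT.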

Edit: added more details on location tracking.

The following has not been tested. You should read the sections in both the flex and the bison manual about location tracking.

The yylloc global variable and its default type are included in the generated bison code if you use the --locations command-line option or the %locations directive, or if you simply refer to a location value in some rule using the @ syntax, which is analogous to the $ syntax (that is, @n is the location value of the right-hand-side object whose semantic value is $n). Unfortunately, the default type for yylloc uses ints, which may not be wide enough to hold a file offset, although that only matters if you plan to parse very large files. In any event, it's easy enough to change: you merely have to #define the YYLTYPE macro in the prologue at the top of your bison file. The default YYLTYPE is:

typedef struct YYLTYPE
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} YYLTYPE;

For a minimal modification, I'd suggest keeping the member names unchanged; otherwise you'll also need to fix the YYLLOC_DEFAULT macro in your bison file. The default YYLLOC_DEFAULT ensures that non-terminals get a location value whose first_line and first_column members come from the first element in the non-terminal's RHS, and whose last_line and last_column members come from the last element. Since it is a macro, it will work with any assignable type for the various members, so it is sufficient to change the column members to long, size_t or an offset type such as off_t (called offset_t here), whichever suits your platform:

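typedef long offset_t;  /* assumption: or off_t from <sys/types.h>; any type wide enough for a file offset */

#define YYLTYPE yyltype  /* no trailing semicolon: the macro is used as a type name */
typedef struct yyltype {
  int first_line;
  offset_t first_column;
  int last_line;
  offset_t last_column;
} yyltype;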
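The default merging behaviour described above can be sketched as a plain C function. This only illustrates the rule and is not bison's actual macro text; Loc and merge_locations are hypothetical names, and the convention that rhs[0] is the symbol just before the rule mirrors how bison's YYRHSLOC indexes locations.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int  first_line;
    long first_column;   /* widened so it can hold a byte offset */
    int  last_line;
    long last_column;
} Loc;

/* rhs[0] is the symbol preceding the rule's right-hand side;
   rhs[1..n] are the right-hand-side symbols themselves. */
static Loc merge_locations(const Loc *rhs, int n)
{
    Loc cur;
    assert(rhs != NULL && n >= 0);

    if (n > 0) {
        /* Start of the first element, end of the last element. */
        cur.first_line   = rhs[1].first_line;
        cur.first_column = rhs[1].first_column;
        cur.last_line    = rhs[n].last_line;
        cur.last_column  = rhs[n].last_column;
    } else {
        /* Empty rule: an empty span at the end of the preceding symbol. */
        cur.first_line   = cur.last_line   = rhs[0].last_line;
        cur.first_column = cur.last_column = rhs[0].last_column;
    }
    return cur;
}
```

Because only assignments are involved, the same logic works unchanged whether the column members are int, long or another offset type, which is why widening the members does not require touching YYLLOC_DEFAULT.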

Then in your flex input, you could define the YY_USER_ACTION macro:

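offset_t offset = 0;   /* running byte count of everything matched so far */
extern YYLTYPE yylloc;

/* flex runs this before each rule's action; yylineno needs %option yylineno */
#define YY_USER_ACTION         \
  offset += yyleng;            \
  yylloc.last_line = yylineno; \
  yylloc.last_column = offset;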

With all that done and appropriate initialization, you should be able to use the appropriate @n.last_column in the action of the rule that invokes YYACCEPT to extract the offset of the end of the last token in the accepted input.