Lex regex gets some extra characters

Posted 2020-05-02 03:56

I have the following definition in my lex file:

L   [a-zA-Z_]                                           
A   [a-zA-Z_0-9] 
%%
{L}{A}*                 { yylval.id = yytext; return IDENTIFIER; }

And I do the following in my YACC file:

primary_expression
    : IDENTIFIER            { puts("IDENTIFIER: "); printf("%s", $1); }

My source code (the one I'm analyzing) has the following assignment:

ab= 10;

For some reason, that printf("%s", $1); part is printing ab= and not only ab.

I'm pretty sure that's the section that is printing ab= because when I delete the printf("%s", $1); the identifier is not printed at all.

I've run out of ideas. What am I doing wrong?

Let me know if I can be more clear.

Tags: c bison yacc flex-lexer lex

1 answer

孤傲高冷的网名

Reply #2 · 2020-05-02 04:25

What am I doing wrong?

You're assuming that the string pointed to by yytext is constant. It is not.

The lifetime of the string pointed to by yytext is the lexical action of the associated rule. If that rule ends up returning, yytext will survive until the next time yylex is called. And that's it.

bison-generated parsers have a one-symbol lookahead. So by the time the parser executes a semantic action, yylex has in most cases already been called again (for the lookahead); consequently, you can't rely on the saved value of yytext even for the last (or only) token in a rule.
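To see why the stale pointer shows ab= in particular, here is a toy model in C of roughly what a flex-style scanner does with its buffer: the token is NUL-terminated in place, and the overwritten character is put back on the next call. The names toy_scanner and toy_next_token are invented for this sketch and are not flex's real internals.

```c
#include <ctype.h>
#include <stddef.h>

/* Toy scanner state: a hypothetical stand-in for a scanner's input buffer. */
struct toy_scanner {
    char  *buf;      /* writable input buffer */
    size_t pos;      /* index where the next token starts */
    char   hold;     /* character overwritten by the temporary NUL */
    int    has_hold; /* nonzero once a temporary NUL has been planted */
};

/* Return a pointer to the next token inside buf, NUL-terminated in place.
   Returns NULL at end of input. */
char *toy_next_token(struct toy_scanner *s)
{
    /* Put back the character that the previous call overwrote. */
    if (s->has_hold) {
        s->buf[s->pos] = s->hold;
        s->has_hold = 0;
    }

    /* Skip whitespace between tokens. */
    while (isspace((unsigned char)s->buf[s->pos]))
        s->pos++;
    if (s->buf[s->pos] == '\0')
        return NULL;

    size_t start = s->pos;
    if (isalpha((unsigned char)s->buf[s->pos]) || s->buf[s->pos] == '_') {
        /* Identifier: letters, digits and underscores. */
        while (isalnum((unsigned char)s->buf[s->pos]) || s->buf[s->pos] == '_')
            s->pos++;
    } else {
        /* Anything else is a one-character token in this toy. */
        s->pos++;
    }

    /* Plant a temporary NUL right after the token. */
    s->hold = s->buf[s->pos];
    s->buf[s->pos] = '\0';
    s->has_hold = 1;
    return s->buf + start;
}
```

With the input ab= 10; the first call returns a pointer that reads as ab. After the second call (the lookahead, which returns =), that same pointer reads as ab=, because the NUL after ab has been replaced by the original = and a new NUL sits one character further on.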

Solution: copy the string. (I use strdup, but for whatever reason some people like to malloc and strcpy. If you do, don't forget about the NUL terminator.) And remember to free() the copy when you're done with it.
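A minimal sketch of the malloc-and-copy variant, for anyone who prefers it to strdup. The helper name copy_lexeme is hypothetical, and the lex and yacc actions shown in the trailing comment are only an illustration of how it might be wired into the rules from the question.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap copy of the first len bytes of text,
   NUL-terminated. Returns NULL if the allocation fails. The caller owns
   the result and must free() it. */
char *copy_lexeme(const char *text, size_t len)
{
    char *copy = malloc(len + 1);   /* +1 for the NUL terminator */
    if (copy == NULL)
        return NULL;
    memcpy(copy, text, len);
    copy[len] = '\0';
    return copy;
}

/* Possible use in the lex rule from the question (not compiled here):
 *
 *   {L}{A}*   { yylval.id = copy_lexeme(yytext, (size_t)yyleng); return IDENTIFIER; }
 *
 * and in the grammar action, once the value is no longer needed:
 *
 *   : IDENTIFIER   { printf("IDENTIFIER: %s\n", $1); free($1); }
 */
```

Because the copy lives in its own allocation, later calls to yylex cannot change it; the price is that every identifier must eventually be freed by whichever action consumes it.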

For reference: what the flex manual says.
