Distinguishing identifiers from common strings

Posted 2019-09-14 02:09

I want to write a parser using Bison/Yacc + Lex which can parse statements like:

VARIABLE_ID = 'STRING'

where:

ID       [a-zA-Z_][a-zA-Z0-9_]*

and:

STRING      [a-zA-Z0-9_]+

So, var1 = '123abc' is a valid statement while 1var = '123abc' isn't.

Therefore, every VARIABLE_ID is a STRING, but a STRING is not always a VARIABLE_ID.

What I would like to know is whether the only way to distinguish between the two is to write a check at a higher level (i.e. inside the Bison code), or whether I can handle it in the Lex code.

Tags: parsing bison yacc lex

1 answer

ら.Afraid

Reply #2 · 2019-09-14 02:29

Your abstract statement syntax is actually:

VARIABLE = STRING

and not

VARIABLE = 'STRING'

because the quote delimiters are a lexical detail that we generally want to keep out of the syntax. And so, the token patterns are actually this:

ID       [a-zA-Z_][a-zA-Z0-9_]*
STRING   '[a-zA-Z_0-9]*'

An ID is a letter or underscore, followed by any combination (including empty) of letters, digits and underscores.

A STRING is a single quote, followed by a sequence (possibly empty) of letters, digits and underscores, followed by another single quote.

So the ambiguity you are concerned about does not exist. An ID is not in fact a STRING, nor vice versa.

Somewhere in your Bison parser, or possibly in the lexer, you will probably want to strip the quotes from the yytext of a STRING match and keep only the text between them as the string value. This could be done in a Bison rule, along these lines:

string : STRING 
       {
          $$ = strip_quotes($1);
       }
       ;
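The rule above leaves strip_quotes undefined, so here is one possible sketch of it in C. It assumes that the lexer has copied the matched text into the token's semantic value (for example with strdup(yytext), since yytext is overwritten on the next match) and that the value type is char *; both are assumptions about your setup, not something the rule requires.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a newly allocated copy of `quoted` without its
 * first and last character (the quote delimiters). The caller owns the result
 * and must free() it. Returns NULL if the input is shorter than two
 * characters or if allocation fails.
 */
char *strip_quotes(const char *quoted)
{
    assert(quoted != NULL);

    size_t len = strlen(quoted);
    if (len < 2)
        return NULL;

    /* len - 2 payload characters plus the terminating NUL. */
    char *out = malloc(len - 1);
    if (out == NULL)
        return NULL;

    memcpy(out, quoted + 1, len - 2);
    out[len - 2] = '\0';
    return out;
}
```

Because the STRING pattern guarantees both quotes are present, the helper does not re-check them. If $1 was itself heap-allocated by the lexer, remember to free it in the action once the stripped copy has been made.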
