Matching trailing context in flex

2019-07-25 04:40发布

In the flex manual it mentions a "trailing context" pattern (r/s), which means r, but only if followed by s. However the following code doesn't compile (instead it gives an error of "unrecognized rule". Why?

LITERAL a/b
%%
{LITERAL} { }

2条回答
祖国的老花朵
2楼-- · 2019-07-25 05:40

The simple answer is that unless you use the -l option, which is not recommended, you cannot put trailing context into a name definition. That's because flex:

  • doesn't allow trailing context inside parentheses; and

  • automatically surrounds expansions of definitions with parentheses, except in a few situations (see below).

The reason flex surrounds expansions with parentheses is that otherwise weird things happen. For example:

prefix        milli|centi
%%
{prefix}pede  return BUG;

Without the automatic parentheses, the pattern would expand to:

milli|centipede

which would not match millipede. (There's a similar problem with the various postfix operators. Consider {prefix}?pede, for example.)

Flex doesn't allow trailing context inside parentheses because many such expressions are harder to compile. In effect, you can end up writing patterns which are the intersection of two regular expressions. (For example, ({base}/{a}){b} matches {base} followed by a {b} which is either a prefix or a projection of an {a}.) These are still regular expressions, but they aren't contemplated by the Thomson algorithm for turning regular expressions into finite state machines. Since the feature is rarely if ever needed, no attempt was ever made to implement it.

Unfortunately, banning trailing context inside parentheses also bans redundant parentheses around patterns which include trailing context, and this includes definition expansions because definitions are expanded with possibly redundant parentheses.

The original AT&T lex did not add the parentheses, which is why forcing lex-compatibility with -l allows your flex file to compile. However, it may result in all sorts of other problems, as indicated above, so I wouldn't recommend it.

Also, "trailing context" here means either a full pattern of the form r/s or of the form r$. Putting r/s inside parentheses (whether explicitly or implicitly) produces an error message, but putting r$ inside parentheses just makes the $ match a $ character, instead of forcing the pattern to match at the end of a line. No error or warning is emitted in this case.

That would make it impossible to use $ (or ^) inside a name definition. However, at some point prior to version 2.3.53, a hack was inserted which suppresses the parentheses if the definition starts with ^ or ends with $. And, for reasons I don't fully understand, it also suppresses the parentheses if the expansion occurs at the end of trailing context. This might be a bug, and indeed there is a bug report relating to it.

查看更多
放荡不羁爱自由
3楼-- · 2019-07-25 05:45

I found the answer to your problem in the FAQ of the info pages of flex: "Your problem is that some of the definitions in the scanner use the '/' trailing context operator, and have it enclosed in ()'s. Flex does not allow this operator to be enclosed in ()'s because doing so allows undefined regular expressions such as "(a/b)+". So the solution is to remove the parentheses. Note that you must also be building the scanner with the -l option for AT&T lex compatibility. Without this option, flex automatically encloses the definitions in parentheses." (quote from Vern Paxson). See also FAQ trailing context

The use of trailing contexts is better avoided when possible. As it is described above it is not allowed in nested expressions. Your example does work with the -l option.

查看更多
登录 后发表回答