How to explain the same structure expressions (?=\w{6,10})\d+ and (?=abc)ad?

Posted 2019-08-18 04:37

Question:

This question already has an answer here:

  • Reference - What does this regex mean?
debian@wifi:~$ echo "348dfgeccvdf" | grep -oP "\d+(?=\w{6,10})"
348
debian@wifi:~$ echo "348dfgeccvdf" | grep -oP "(?=\w{6,10})\d+"
348
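The two commands above can be reproduced with Python's re module as a stand-in for grep -P. This is an assumption: for these ASCII inputs, re treats \d, \w, {6,10} and (?=...) the same way PCRE does. The helper show_matches is hypothetical. The comments note where the lookahead is tested in each pattern.

```python
import re

TEXT = "348dfgeccvdf"


def show_matches(pattern, text) -> list:
    """Hypothetical helper: print each match with its span and return the matches, like grep -oP."""
    found = []
    for m in re.finditer(pattern, text):
        print(f"{pattern!r}: {m.group()!r} at {m.start()}..{m.end()}")
        found.append(m.group())
    return found


# Lookahead after \d+: tested at the END of the digits (index 3),
# where "dfgeccvdf" supplies 9 word characters.
trailing = show_matches(r"\d+(?=\w{6,10})", TEXT)

# Lookahead before \d+: tested at the START of the match (index 0),
# where the digits "348" themselves count as word characters.
leading = show_matches(r"(?=\w{6,10})\d+", TEXT)

print(trailing, leading)  # ['348'] ['348']
```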

\d+(?=\w{6,10}) is the standard positive lookahead expression.
As Wiktor Stribiżew says in the post "position and negative lookbehind":
The negative lookbehind syntax starts with (?<! and ends with the unescaped ). Whether it appears at the start, middle or end of the pattern, it does not stop from being a negative lookbehind.
So maybe the position of a lookahead (or lookbehind) has no relationship to how the whole expression behaves.

Applying the same logic to positive lookahead that Wiktor Stribiżew applied to negative lookbehind, I inferred:

"\d+(?=\w{6,10})" == "(?=\w{6,10})\d+"

Both of them are positive lookaheads.
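A quick check suggests the equality does not hold in general. In this sketch, Python's re again stands in for grep -P, and 12345678 is an input made up for illustration. The trailing lookahead forces \d+ to backtrack until 6 to 10 word characters remain after it, while the leading lookahead is checked only at the start position, after which \d+ takes every digit.

```python
import re

TEXT = "12345678"  # illustrative input, not from the original post

# \d+ must leave 6..10 word characters after it, so it backtracks to "12".
trailing = re.findall(r"\d+(?=\w{6,10})", TEXT)

# The lookahead only checks that 6..10 word characters start at index 0;
# after that, \d+ greedily consumes all eight digits.
leading = re.findall(r"(?=\w{6,10})\d+", TEXT)

print(trailing)  # ['12']
print(leading)   # ['12345678']
```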

Now here is another example:

echo  "adabc  adabb" |grep -oP  "ad(?=abc)"
ad
echo  "adabc  adabb" |grep -oP  "(?=abc)ad"
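The same two patterns in Python's re, used here as an assumed stand-in for grep -P, give the same results as the commands above:

```python
import re

TEXT = "adabc  adabb"

# "ad" is consumed first, then "abc" is asserted right after it.
after = re.findall(r"ad(?=abc)", TEXT)

# "abc" is asserted at the position where "ad" must then start; no position
# can begin with both "abc" and "ad", so this never matches.
before = re.findall(r"(?=abc)ad", TEXT)

print(after)   # ['ad']
print(before)  # []
```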

Why is "ad(?=abc)" not equal to "(?=abc)ad"?
How do I explain the different behavior of two expressions with the same structure, (?=\w{6,10})\d+ and (?=abc)ad?

Answer 1:

Why is "ad(?=abc)" not equal to "(?=abc)ad"?

ad(?=abc) means ad followed by abc, whereas (?=abc)ad will not match any input string: the lookahead (?=abc) requires abc at the current position, but the pattern then requires ad at that same position, and no text can start with both. A pattern such as (?=abc)\w+ would match an input string like abcad.
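A sketch checking the answer's claim and its suggested pattern against the input abcad, again using Python's re in place of grep -P:

```python
import re

TEXT = "abcad"

# Lookahead and consumed text conflict: "abc" vs "ad" at the same position.
conflicting = re.findall(r"(?=abc)ad", TEXT)

# The lookahead only requires "abc" at the start; \w+ then consumes the word.
suggested = re.findall(r"(?=abc)\w+", TEXT)

print(conflicting)  # []
print(suggested)    # ['abcad']
```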

How do I explain the different behavior of two expressions with the same structure, (?=\w{6,10})\d+ and (?=abc)ad?

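(?=\w{6,10})\d+ is a different case from (?=abc)ad: because \d is a subset of \w, the digits themselves can satisfy the lookahead. On 348dfgeccvdf the lookahead succeeds at position 0 (the whole string consists of word characters) and \d+ then consumes 348, which is why grep prints 348. It is still not equivalent to \d+(?=\w{6,10}), which requires 6 to 10 word characters after the digits. Patterns in which the consumed part agrees with the lookahead would be:

(?=\w{6,10})\w+ and (?=abc)\w+.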
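In the lookahead-first form, the lookahead only decides where a match may start; the consuming part decides how much is matched. A small sketch with inputs made up for illustration (Python's re standing in for grep -P) shows both effects, including that {6,10} inside the lookahead does not cap the length of the match:

```python
import re

PATTERN = r"(?=\w{6,10})\w+"

# Start positions inside "abc" have fewer than 6 word characters ahead, so the
# lookahead rejects them and the first match starts at "defghijk".
short_then_long = re.findall(PATTERN, "abc defghijk")

# The lookahead succeeds at index 0, and \w+ then consumes all 16 characters:
# {6,10} limits what the lookahead checks, not how much \w+ matches.
long_word = re.findall(PATTERN, "abcdefghijklmnop")

print(short_then_long)  # ['defghijk']
print(long_word)        # ['abcdefghijklmnop']
```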



Answer 2:

Wiktor Stribiżew said for negative lookbehind:

"\d+(?=\w{6,10})" == "(?=\w{6,10})\d+"

That's not what he said. His example was specifically with a word boundary, which is a zero-length assertion as well. Only

…(?lookaround)\b… == …\b(?lookaround)…

can hold.
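A rough way to see the difference, assuming Python's re behaves like PCRE here: compare two patterns on a few sample strings. The helper same_results is hypothetical, and agreement on a handful of samples is a sanity check, not a proof of equivalence.

```python
import re

SAMPLES = ["a1 22 b3c 4x", "348dfgeccvdf", "adabc  adabb"]


def same_results(p1, p2) -> bool:
    """Hypothetical helper: True if both patterns find the same matches in every sample."""
    return all(re.findall(p1, s) == re.findall(p2, s) for s in SAMPLES)


# \b and (?=\d) are both zero-width and test the same position, so order is free.
boundary_swap = same_results(r"\b(?=\d)\w+", r"(?=\d)\b\w+")

# Swapping a lookahead with a consuming token ("ad") changes what is tested.
consuming_swap = same_results(r"ad(?=abc)", r"(?=abc)ad")

# Words that start with a digit in the first sample.
digit_words = re.findall(r"\b(?=\d)\w+", SAMPLES[0])

print(boundary_swap, consuming_swap)  # True False
print(digit_words)                    # ['22', '4x']
```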

Why is "ad(?=abc)" not equal to "(?=abc)ad"?

Because one asserts the sequence abc after the matched ad, while the other asserts it at the start of the match, the same position where ad has to match (which will always fail).
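One way to put the assertion first, sketched in Python's re as a stand-in for grep -P: the leading lookahead must describe the text from the start of the match, so it has to include the ad that is consumed afterwards. This works here because ad has a fixed length; with a variable-length token, the analogous rewrite (?=\d+\w{6,10})\d+ is not equivalent to \d+(?=\w{6,10}) in general, since \d+ may then consume more digits than the lookahead assumed.

```python
import re

TEXT = "adabc  adabb"

# Original: consume "ad", then assert "abc" right after it.
trailing = re.findall(r"ad(?=abc)", TEXT)

# To put the assertion first, it must describe the text from the match
# start, so it has to include the "ad" that is consumed afterwards.
leading = re.findall(r"(?=adabc)ad", TEXT)

print(trailing, leading)  # ['ad'] ['ad']
```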