GNU sed, ^ and $ with | when first/last character

Posted 2019-07-13 18:33

Question:

When doing a substitution whose REGEXP includes something like ^|., sed doesn't match the null string at the beginning of the pattern space if the first character also matches. Likewise, it doesn't match the null string at the end if the last character matches. Why is that?

Here are some examples using 123 as input (with the -r option):

substitution    expected output     actual output   comments
s/^/x/g         x123                x123            works as expected
s/$/x/g         123x                123x            works as expected
s/^|$/x/g       x123x               x123x           works as expected
s/^|./x/g       xxxx                xxx             didn't match the very beginning
s/.|$/x/g       xxxx                xxx             didn't match the very end
s/^|1/x/g       xx23                x23             didn't match the very beginning
s/^|2/x/g       x1x3                x1x3            this time it did match the beginning

I get the same results when using \` instead of ^.
I've tried GNU sed version 4.2.1 and 4.2.2
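
The table can be regenerated with a short loop. This is only a sketch: it assumes GNU sed is installed as sed, and run_case is a hypothetical helper name that does not come from sed itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of sed): apply one sed expression ($1)
# to one line of input ($2), using GNU sed's -r (extended regex) flag.
run_case() {
    printf '%s\n' "$2" | sed -r "$1"
}

# Print each expression from the table next to sed's actual output for 123.
for expr in 's/^/x/g' 's/$/x/g' 's/^|$/x/g' 's/^|./x/g' \
            's/.|$/x/g' 's/^|1/x/g' 's/^|2/x/g'; do
    printf '%-12s %s\n' "$expr" "$(run_case "$expr" 123)"
done
```

The second column of the output should correspond to the "actual output" column above.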

Answer 1:

AFAIK sed tries to find the longest match among the branches of an alternation.

So when both the null string at the beginning of the pattern space and 1 can match at the same position, 1 is chosen because it is the longer match. The search then resumes after the 1, where ^ can no longer match.

Consider the following:

$ sed 's/12\|123/x/g' <<< 123
x
$ sed 's/123\|12/x/g' <<< 123
x
$ sed 's/^1\|12/x/g' <<< 123
x3
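
Applied to two of the cases from the question, the same rule might be checked like this. It is a sketch that assumes GNU sed with the -r option, as in the question; the comments give the outputs reported there.

```shell
# At offset 0 both ^ (empty) and 1 can match; the longer branch, 1, wins,
# so only a single x is produced for that position.
echo 123 | sed -r 's/^|1/x/g'   # x23

# 2 cannot match at offset 0, so ^ matches there on its own,
# and 2 is found later as a separate match.
echo 123 | sed -r 's/^|2/x/g'   # x1x3
```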

The same applies when reaching the end. Let's break sed 's/.\|$/x/g' <<< 123 down:

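123
^
. matches and is replaced with x
x23
 ^
 . matches and is replaced with x
xx3
  ^
  . matches and is replaced with x
xxx
   ^
   End of the pattern space. Only the empty match of $ is left here, and because it directly follows the previous match, sed apparently does not replace it.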
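
One way to probe the end-of-line case is to compare it with patterns where the last character is not consumed by a match. This sketch assumes GNU sed 4.x behaviour; the first two results are the ones reported in the question, and the third is what I would expect if an empty match is only skipped when it directly follows the previous match.

```shell
# $ alone: there is no earlier match, so the empty match at the end is replaced.
echo 123 | sed -r 's/$/x/g'     # 123x

# . consumes the 3; the empty match right after it is not replaced.
echo 123 | sed -r 's/.|$/x/g'   # xxx

# 2 is matched in the middle, so the end does not touch that match
# and $ is replaced as well.
echo 123 | sed -r 's/2|$/x/g'   # 1x3x
```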


Tags: bash sed gnu-sed