Emacs regular expressions: what can \< and \> do that \b cannot?

Posted 2019-04-05 13:08

The "Regexp Backslash" section of the GNU Emacs Manual says that \< matches at the beginning of a word, \> matches at the end of a word, and \b matches at a word boundary. \b behaves just as it does in other, non-Emacs regular expressions, but \< and \> seem to be particular to Emacs regular expressions. Are there cases where \< and \> are needed instead of \b? For instance, \bword\b would match the same text as \<word\>, and the only difference is that the latter is more readable.

Tags: regex emacs word
2 answers
做个烂人
#2 · 2019-04-05 13:50

You can get unexpected results if you assume they behave the same.
What can \< and \> do that \b can't?
The answer is that \< and \> are explicit: each matches one particular end of a word, and only that end.
\b is general: either end of a word will match.

GNU Operators * Word Operators

# Split each test line at a word anchor; the | marks in the output show where the split landed.
line="cat dog sky"
echo "$line" | sed -n "s/\(.*\)\b\(.*\)/# |\1|\2|/p"   # \b: either end of a word
echo "$line" | sed -n "s/\(.*\)\>\(.*\)/# |\1|\2|/p"   # \>: end of a word only
echo "$line" | sed -n "s/\(.*\)\<\(.*\)/# |\1|\2|/p"   # \<: start of a word only
echo
line="cat  dog  sky"   # two spaces between the words
echo "$line" | sed -n "s/\(.*\)\b\(.*\)/# |\1|\2|/p"
echo "$line" | sed -n "s/\(.*\)\>\(.*\)/# |\1|\2|/p"
echo "$line" | sed -n "s/\(.*\)\<\(.*\)/# |\1|\2|/p"
echo
line="cat  dog  sky  "   # trailing spaces as well
echo "$line" | sed -n "s/\(.*\)\b\(.*\)/# |\1|\2|/p"
echo "$line" | sed -n "s/\(.*\)\>\(.*\)/# |\1|\2|/p"
echo "$line" | sed -n "s/\(.*\)\<\(.*\)/# |\1|\2|/p"
echo

Output:

# |cat dog |sky|
# |cat dog| sky|
# |cat dog |sky|

# |cat  dog  |sky|
# |cat  dog|  sky|
# |cat  dog  |sky|

# |cat  dog  sky|  |
# |cat  dog  sky|  |
# |cat  dog  |sky  |
This account has been banned
#3 · 2019-04-05 13:52

It looks to me like \<.*?\> would match only a series of word characters, while \b.*?\b would match either a series of word characters or a series of non-word characters, since \b can also match at the end of one word and then at the beginning of the next. If you force the expression between the two anchors to be a word, they do indeed act the same.
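
One way to check this outside Emacs is with GNU grep, whose basic regular expressions also understand \<, \> and \b. This is only a sketch: it assumes GNU grep is available, and the helper name `matches` is made up for illustration. grep's basic syntax has no non-greedy `.*?`, so a single literal space stands in for the stretch between the two anchors.

```shell
# Hypothetical helper: print "yes" if the regex matches anywhere in the text, else "no".
# Assumes GNU grep (for \<, \> and \b in basic regular expressions).
matches() {
    if printf '%s\n' "$2" | grep -q -- "$1"; then
        echo yes
    else
        echo no
    fi
}

matches '\b \b' 'cat dog'      # \b ... \b can span the gap between two words
matches '\< \>' 'cat dog'      # \< must be followed by a word character, so a space cannot follow it
matches '\<cat\>' 'cat dog'    # with a word between the anchors, both forms match
matches '\bcat\b' 'cat dog'
```

With GNU grep this should print yes, no, yes, yes: the only case that differs is the one where the text between the anchors is not a word.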

Of course, you could replicate the behavior of \< and \> with \b\w and \w\b. So I guess the answer is that yes, it's mostly for readability. Then again, isn't that what most escape characters in regular expressions are for?
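
The substitution can be spot-checked with GNU grep, which supports the same anchors. A minimal sketch, assuming GNU grep; `hit` is a hypothetical helper, and a specific letter stands in for \w so that each check targets one position. One caveat: \b\w and \w\b consume a character while \< and \> are zero-width, so the two forms agree on whether a match exists but not necessarily on what the match contains.

```shell
# Hypothetical helper: report whether a regex matches somewhere in the text.
# Assumes GNU grep, which supports \<, \> and \b in basic regular expressions.
hit() {
    if printf '%s\n' "$2" | grep -q -- "$1"; then echo yes; else echo no; fi
}

text='cat dog'
# Start of a word: \< versus \b, each followed by a word character.
hit '\<d' "$text"    # "d" starts the word "dog"
hit '\bd' "$text"
hit '\<o' "$text"    # "o" sits in the middle of a word, so neither form matches
hit '\bo' "$text"
# End of a word: a word character followed by \> versus \b.
hit 't\>' "$text"    # "t" ends the word "cat"
hit 't\b' "$text"
# Without a word character next to it, \b alone is looser than \<.
hit '\b ' "$text"    # boundary at the end of "cat", then a space
hit '\< ' "$text"    # \< cannot be followed by a space
```

With GNU grep this should print yes, yes, no, no, yes, yes, yes, no, in that order.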
