How to make word boundary \b not match on dashes

2020-07-10 07:44发布

I simplified my code to the specific problem I am having.

import re
pattern = re.compile(r'\bword\b')
result = pattern.sub(lambda x: "match", "-word- word")

I am getting

'-match- match'

but I want

'-word- match'

edit:

Or for the string "word -word-"

I want

"match -word-"

标签: python regex
3条回答
神经病院院长
2楼-- · 2020-07-10 08:19

Instead of word boundaries, you could also match the character before and after the word with a (\s|^) and (\s|$) pattern.

Breakdown: \s matches every whitespace character, which seems to be what you are trying to achieve, as you are excluding the dashes. The ^ and $ ensure that if the word is either the first or last in the string(ie. no character before or after) those are matched too.

Your code would become something like this:

pattern = re.compile(r'(\s|^)(word)(\s|$)')
result = pattern.sub(r"\1match\3", "-word- word")

Because this solution uses character classes such as \s, it means that those could be easily replaced or extended. For example if you wanted your words to be delimited by spaces or commas, your pattern would become something like this: r'(,|\s|^)(word)(,|\s|$)'.

查看更多
地球回转人心会变
3楼-- · 2020-07-10 08:23

\b basically denotes a word boundary on characters other than [a-zA-Z0-9_] which includes spaces as well. Surround word with negative lookarounds to ensure there is no non-space character after and before it:

re.compile(r'(?<!\S)word(?!\S)')
查看更多
地球回转人心会变
4楼-- · 2020-07-10 08:33

What you need is a negative lookbehind.

pattern = re.compile(r'(?<!-)\bword\b')
result = pattern.sub(lambda x: "match", "-word- word")

To cite the documentation:

(?<!...) Matches if the current position in the string is not preceded by a match for ....

So this will only match, if the word-break \b is not preceded with a minus sign -.

If you need this for the end of the string you'll have to use a negative lookahead which will look like this: (?!-). The complete regular expression will then result in: (?<!-)\bword(?!-)\b

查看更多
登录 后发表回答