I simplified my code to the specific problem I am having.
import re
pattern = re.compile(r'\bword\b')
result = pattern.sub(lambda x: "match", "-word- word")
I am getting
'-match- match'
but I want
'-word- match'
edit:
Or for the string "word -word-"
I want
"match -word-"
Instead of word boundaries, you could also match the character before and after the word with a
(\s|^)
and(\s|$)
pattern.Breakdown:
\s
matches every whitespace character, which seems to be what you are trying to achieve, as you are excluding the dashes. The^
and$
ensure that if the word is either the first or last in the string(ie. no character before or after) those are matched too.Your code would become something like this:
Because this solution uses character classes such as
\s
, it means that those could be easily replaced or extended. For example if you wanted your words to be delimited by spaces or commas, your pattern would become something like this:r'(,|\s|^)(word)(,|\s|$)'
.\b
basically denotes a word boundary on characters other than[a-zA-Z0-9_]
which includes spaces as well. Surroundword
with negative lookarounds to ensure there is no non-space character after and before it:What you need is a negative lookbehind.
To cite the documentation:
So this will only match, if the word-break
\b
is not preceded with a minus sign-
.If you need this for the end of the string you'll have to use a negative lookahead which will look like this:
(?!-)
. The complete regular expression will then result in:(?<!-)\bword(?!-)\b