The R help file for regex says:
The symbols \< and \> respectively match the empty string at the
beginning and end of a word. The symbol \b matches the empty string at
the edge of a word
What is the difference between an end and an edge (of a word)?
The difference between \b and \< / \> is that \b can also be used in PCRE regex patterns (when you specify perl=TRUE) and ICU regex patterns (stringr package), whereas \< and \> only work with R's default (TRE) regex engine.
> s = "no where nowhere"
> sub("\\<no\\>", "", s)
[1] " where nowhere"
> sub("\\<no\\>", "", s, perl=TRUE) ## \> and \< do not work with PCRE
[1] "no where nowhere"
> sub("\\bno\\b", "", s, perl=TRUE) ## \b works with PCRE
[1] " where nowhere"
> library(stringr)
> str_replace(s, "\\bno\\b", "")
[1] " where nowhere"
> str_replace(s, "\\<no\\>", "")
[1] "no where nowhere"
The advantage of \< (which always matches the beginning of a word) and \> (which always matches the end of a word) is that they are unambiguous. \b can match at either position.
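A minimal sketch of that distinction, using made-up input strings in which "no" occurs both inside a word and at a word edge. Only sub() is used, so just the first match is replaced in each call:

```r
# Hypothetical inputs: "piano" ends in "no", "nowhere" starts with "no".

# \< anchors "no" to the beginning of a word (default TRE engine)
begin_tre <- sub("\\<no", "X", "piano nowhere")   # "piano Xwhere"

# \> anchors "no" to the end of a word (default TRE engine)
end_tre <- sub("no\\>", "X", "nowhere piano")     # "nowhere piaX"

# \b plays either role, depending on which side of "no" it is placed
begin_pcre <- sub("\\bno", "X", "piano nowhere", perl = TRUE)  # "piano Xwhere"
end_pcre   <- sub("no\\b", "X", "nowhere piano", perl = TRUE)  # "nowhere piaX"

print(c(begin_tre, end_tre, begin_pcre, end_pcre))
```

With \< and \> the intent is visible from the symbol itself; with \b it has to be inferred from where the symbol sits in the pattern.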
One more thing to consider (reference):
POSIX 1003.2 mode of gsub and gregexpr does not work correctly with repeated word-boundaries (e.g., pattern = "\b"). Use perl = TRUE for such matches (but that may not work as expected with non-ASCII inputs, as the meaning of ‘word’ is system-dependent).
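As a small illustration of that advice, the sketch below marks every word boundary in the example string from above with gsub() and perl = TRUE. The "|" marker is an arbitrary choice for this illustration, and the input is plain ASCII, so the non-ASCII caveat does not come into play:

```r
s <- "no where nowhere"

# Insert a "|" at every word boundary; the pattern is a bare \b,
# so every match is zero-length and perl = TRUE is used as recommended.
marked <- gsub("\\b", "|", s, perl = TRUE)
print(marked)   # "|no| |where| |nowhere|"
```

Each word gets a marker on both sides, which also shows \b matching at the beginning and at the end of a word.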