Difference between \b and \B in regex

Posted 2019-01-02 17:20

I am reading a book on regular expressions and I came across this example for \b:

The cat scattered his food all over the room.

Using regex - \bcat\b will match the word cat but not the cat in scattered.

For \B the author uses the following example:

Please enter the nine-digit id as it appears on your color - coded pass-key.

The regex \B-\B matches the - in color - coded. \b-\b, on the other hand, matches the - in nine-digit and pass-key.

How come in the first example we use \b to separate cat and in the second use \B to separate -? Using \b in the second example does the opposite of what it did earlier.

Please explain the difference to me.

EDIT: Also, can anyone please explain with a new example?

Tags: regex
7 answers
像晚风撩人
#2 · 2019-01-02 17:48

With a different example:

Consider the following string, in which the pattern to search for is 'cat':

const text = "catmania thiscat thiscatmaina";

Now the definitions:

'\b' anchors the pattern to a word boundary, that is, to the beginning or end of a word.

'\B' anchors the pattern to a position that is not a word boundary, that is, not the beginning or end of a word.

Different Cases:

Case 1: At the beginning of each word

let result = text.replace(/\bcat/g, "ct");

Now, result is "ctmania thiscat thiscatmaina"

Case 2: At the end of each word

result = text.replace(/cat\b/g, "ct");

Now, result is "catmania thisct thiscatmaina"

Case 3: Not at the beginning

result = text.replace(/\Bcat/g, "ct");

Now, result is "catmania thisct thisctmaina"

Case 4: Not at the end

result = text.replace(/cat\B/g, "ct");

Now, result is "ctmania thiscat thisctmaina"

Case 5: Neither beginning nor end

result = text.replace(/\Bcat\B/g, "ct");

Now, result is "catmania thiscat thisctmaina"

Hope this helps :)

浪荡孟婆
#3 · 2019-01-02 17:51

The metacharacter \b is an anchor like the caret and the dollar sign. It matches at a position that is called a "word boundary". This match is zero-length.

There are three different positions that qualify as word boundaries:

  • Before the first character in the string, if the first character is a word character.
  • After the last character in the string, if the last character is a word character.
  • Between two characters in the string, where one is a word character and the other is not a word character.

\B is the negated version of \b. \B matches at every position where \b does not. Effectively, \B matches at any position between two word characters as well as at any position between two non-word characters.

Source: http://www.regular-expressions.info/wordboundaries.html
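
As a rough illustration of the rules quoted above, the sketch below lists every position at which \b and \B match in a sample string. The helper name matchPositions and the sample string are made up for this example, and it assumes a JavaScript engine that supports String.prototype.matchAll (ES2020).

```javascript
// Hypothetical helper: indexes of every (zero-length) match of `re` in `s`.
// `re` must carry the g flag, which matchAll requires.
const matchPositions = (re, s) => [...s.matchAll(re)].map((m) => m.index);

// Sample string chosen for this sketch: it starts with a word character
// and ends with a non-word character.
const boundarySample = "cat-nap!";

const boundaries = matchPositions(/\b/g, boundarySample);
const nonBoundaries = matchPositions(/\B/g, boundarySample);

console.log(boundaries);    // [0, 3, 4, 7]
console.log(nonBoundaries); // [1, 2, 5, 6, 8]
```

Every position from 0 to the length of the string shows up in exactly one of the two lists. Position 8, the end of the string, belongs to \B because the last character (!) is not a word character.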

十年一品温如言
#4 · 2019-01-02 17:52

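\B is the negation of \b, i.e. it matches wherever \b does not.

In color - coded there is no word boundary on either side of the - (it sits between two spaces), so it matches \B-\B. In pass-key there is a word boundary on each side of the -, so it matches \b-\b, just as there is a word boundary on each side of cat in your first example.

Similar rules apply to other escapes too: \W is the negation of \w, and the same holds for the other shorthand classes (\D and \d, \S and \s). The uppercase escape negates the lowercase one.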
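
To see the uppercase/lowercase negation concretely, the sketch below checks that every character of a sample string matches exactly one escape of each pair. The variable names and the sample characters are invented for this illustration.

```javascript
// Pairs of shorthand classes: the uppercase one negates the lowercase one.
const escapePairs = [
  [/\w/, /\W/], // word character vs. non-word character
  [/\d/, /\D/], // digit vs. non-digit
  [/\s/, /\S/], // whitespace vs. non-whitespace
];

// Sample characters picked for this sketch: letter, digit, underscore,
// hyphen and space.
const sampleChars = [..."a7_- "];

// True when each character matches exactly one regex of every pair.
const complementary = sampleChars.every((ch) =>
  escapePairs.every(([lower, upper]) => lower.test(ch) !== upper.test(ch))
);

console.log(complementary); // true
```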

余生请多指教
#5 · 2019-01-02 17:53

The confusion stems from your thinking \b matches spaces (probably because "b" suggests "blank").

\b matches the empty string at the beginning or end of a word. \B matches the empty string not at the beginning or end of a word. The key here is that "-" is not part of a word. So <left>-<right> matches \b-\b because there are word boundaries on either side of the -. For <left> - <right> (note the spaces), on the other hand, there are no word boundaries on either side of the dash. The word boundaries are one space further left and right.

When searching for \bcat\b, word boundaries behave more intuitively: the pattern matches the cat in " cat " as expected, without consuming the surrounding spaces.
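
The two sentences from the question can be checked directly. This is only a sketch: the helper name mark and the bracket notation are choices made for this example. It wraps every match in square brackets so the matched hyphens are easy to spot.

```javascript
// The two sentences from the question.
const catSentence = "The cat scattered his food all over the room.";
const idSentence =
  "Please enter the nine-digit id as it appears on your color - coded pass-key.";

// Hypothetical helper: wrap every match in square brackets.
// "$&" in the replacement string stands for the matched text.
const mark = (re, s) => s.replace(re, "[$&]");

console.log(mark(/\bcat\b/g, catSentence));
// The [cat] scattered his food all over the room.

console.log(mark(/\b-\b/g, idSentence));
// Please enter the nine[-]digit id as it appears on your color - coded pass[-]key.

console.log(mark(/\B-\B/g, idSentence));
// Please enter the nine-digit id as it appears on your color [-] coded pass-key.
```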

与风俱净
#6 · 2019-01-02 17:55

Let's take a string like:

XIX IXI XX X I II IIXX XXII I-I X-X -X X- X-I I-X -X- -I-X -X-I I-X- X-I- X_X _X-

Note: Underscore ( _ ) is not considered a special character in this case.

  1. /\bX\b/g The X must be both preceded and followed by a special character, whitespace, or the edge of the string.

  2. /\bX/g The X must be preceded by a special character, whitespace, or the start of the string.

  3. /X\b/g The X must be followed by a special character, whitespace, or the end of the string.

  4. /\BX\B/g The X must have a word character on both sides, i.e. it neither begins nor ends at a special character or whitespace.

  5. /\BX/g The X must be preceded by a word character, i.e. it does not begin at a special character or whitespace.

  6. /X\B/g The X must be followed by a word character, i.e. it does not end at a special character or whitespace.

  7. /\bX\B/g The X must begin at a special character, whitespace, or the start of the string, and be followed by a word character.

  8. /\BX\b/g The X must be preceded by a word character and end at a special character, whitespace, or the end of the string.
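
To see which X each of these patterns picks out of the sample string, a small sketch like the following can be run. It prints the string once per pattern with every matched X wrapped in brackets; the names xSample, xPatterns and countMatches are invented for this example.

```javascript
// The sample string from this answer.
const xSample =
  "XIX IXI XX X I II IIXX XXII I-I X-X -X X- X-I I-X -X- -I-X -X-I I-X- X-I- X_X _X-";

// The eight patterns, in the order listed above.
const xPatterns = [
  /\bX\b/g, /\bX/g, /X\b/g, /\BX\B/g,
  /\BX/g, /X\B/g, /\bX\B/g, /\BX\b/g,
];

// Hypothetical helper: number of matches of `re` in the sample string.
const countMatches = (re) => [...xSample.matchAll(re)].length;

for (const re of xPatterns) {
  // Show the pattern, how many X it matches, and the string with matches marked.
  console.log(String(re).padEnd(10), countMatches(re), xSample.replace(re, "[X]"));
}
```

Because \B is the exact negation of \b, every X in the string is matched by exactly one of /\bX/g and /\BX/g, and likewise by exactly one of /X\b/g and /X\B/g.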

情到深处是孤独
#7 · 2019-01-02 18:01

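\b matches a word boundary. \B matches at every position that is not a word boundary, and is equivalent to (?!\b) (thanks to @Alan Moore for the correction!). It is not equivalent to [^\b]: inside a character class \b stands for the backspace character in most flavors, and a character class consumes a character. Both \b and \B are zero-width.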

See http://www.regular-expressions.info/wordboundaries.html for details. The site is extremely useful for many basic regex questions.
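
The relationship between \B and (?!\b) can be checked with a short sketch. The helper name zeroWidthPositions and the sample phrase are assumptions made for this example, and the backspace meaning of \b inside a character class is shown here for JavaScript only.

```javascript
// Hypothetical helper: indexes of every zero-length match of `re` in `s`.
const zeroWidthPositions = (re, s) => [...s.matchAll(re)].map((m) => m.index);

const phrase = "color - coded";

// \B and the negative lookahead (?!\b) match at the same positions.
const viaB = zeroWidthPositions(/\B/g, phrase);
const viaLookahead = zeroWidthPositions(/(?!\b)/g, phrase);
console.log(JSON.stringify(viaB) === JSON.stringify(viaLookahead)); // true

// Inside a character class, \b means the backspace character (U+0008),
// so [^\b] consumes one non-backspace character instead of asserting a position.
const consumed = "a".match(/[^\b]/)[0];
console.log(consumed); // a

// The one-letter string "a" has a word boundary on both sides, so \B never matches.
const noMatch = "a".match(/\B/);
console.log(noMatch); // null
```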
