Regex word boundary issue when angle brackets are

Posted 2019-01-27 04:40

Regex:

\b< low="" number="" low="">\b

Example string:

 <b22>Aquí se muestran algunos síntomas < low="" number="" low=""> tienen el siguiente aspecto.</b22> 

I'm not sure why the word boundary between síntomas and < is not being found. The same problem exists on the other side, between > and tienen.

Suggestions on how I might more properly match this boundary?

When I give it the following input, the Regex matches as expected:

Aquí se muestran algunos síntomas< low="" number="" low="">tienen el siguiente aspecto.

Removing the edge conditions (the \b on each side of \bPHRASE\b) is not an option, because the pattern must not match parts of words.

Update

This did the trick: (Thanks to Igor, Mosty, DK and NickC)

new Regex(String.Format(@"(?<=[\s\.\?\!]){0}(?=[\s\.\?\!])", innerStringToMatch));

I needed to widen my boundary matching to the character class [\s\.\?\!] and turn the edge matches into a positive lookbehind and a positive lookahead.
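For readers who want to try the lookaround approach outside .NET, here is a minimal sketch in Python's `re` module. The helper name `build_phrase_regex` is hypothetical, and escaping the phrase with `re.escape` is my own assumption (the original snippet inserts `innerStringToMatch` unescaped).

```python
import re

def build_phrase_regex(phrase):
    """Hypothetical helper: match `phrase` only when it is surrounded by
    whitespace or one of . ? ! on both sides."""
    edge = r"[\s.?!]"
    # Lookbehind and lookahead check the neighbours without consuming them.
    return re.compile(
        r"(?<=" + edge + r")" + re.escape(phrase) + r"(?=" + edge + r")"
    )

text = 'algunos síntomas < low="" number="" low=""> tienen el siguiente aspecto.'

tag_match = build_phrase_regex('< low="" number="" low="">').search(text)
print(tag_match.group(0))          # the tag itself, without the spaces

# A fragment inside a word is rejected, which was the original requirement.
print(build_phrase_regex("tom").search(text))      # None

# Caveat: the lookbehind needs a real character, so a phrase at the very
# start of the string is not matched by this pattern.
print(build_phrase_regex("algunos").search(text))  # None
```

Because the edges are lookarounds, the surrounding whitespace or punctuation is not part of the match. If phrases at the very start or end of the input also need to match, the edge checks would have to allow for the string boundaries as well.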

2 answers
老娘就宠你
#2 · 2019-01-27 05:25

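\b is a zero-length match that occurs between two characters in the string where one is a word character and the other is not. A word character is defined as [A-Za-z0-9_]*. Neither < nor the space before it is a word character, so there is no boundary between them and \b doesn't match; the same holds between > and the space that follows it. Your second input works because síntomas< puts a word character directly next to <.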
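The behaviour described above can be checked with a short sketch. It uses Python's `re` module rather than the .NET `Regex` class from the question (an assumption on my part; the `\b` rule is the same for these characters), and the variable names are only illustrative.

```python
import re

# The tag from the question, escaped so it is matched literally.
tag = re.escape('< low="" number="" low="">')
pattern = re.compile(r"\b" + tag + r"\b")

spaced = 'algunos síntomas < low="" number="" low=""> tienen el'
joined = 'algunos síntomas< low="" number="" low="">tienen el'

# A space and "<" are both non-word characters: no \b between them.
print(pattern.search(spaced))              # None

# "s" is a word character and "<" is not: \b matches between them.
print(pattern.search(joined) is not None)  # True
```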

You can try the following regex instead ((?: ) is a non-capturing parentheses group):

(?:\b|\s+)< low="" number="" low="">(?:\b|\s+)
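One thing worth knowing about this pattern is that the `\s+` branch consumes the surrounding whitespace, so it becomes part of the match. A small sketch in Python's `re` module (my choice of engine, not the asker's) shows the difference between the two inputs from the question:

```python
import re

tag = re.escape('< low="" number="" low="">')
pattern = re.compile(r"(?:\b|\s+)" + tag + r"(?:\b|\s+)")

spaced = 'algunos síntomas < low="" number="" low=""> tienen el'
joined = 'algunos síntomas< low="" number="" low="">tienen el'

# With spaces around the tag, the \s+ branch is used on both sides,
# so the match includes one space before and one after the tag.
spaced_match = pattern.search(spaced).group(0)
print(repr(spaced_match))

# Without spaces, the \b branch is used and only the tag is matched.
joined_match = pattern.search(joined).group(0)
print(repr(joined_match))
```

If only the tag itself is wanted, wrapping it in a capturing group and reading that group is one way to drop the whitespace.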

*) Actually, this is not correct for all regex engines. To be precise, \b matches between \w and \W, where \w matches any word character. As Tim Pietzcker pointed out in a comment on this answer, the meaning of "word character" differs between implementations, but I don't know of any where \w matches < or >.

萌系小妹纸
#3 · 2019-01-27 05:30

I think you're trying to do the following:

\s< low="" number="" low="">\s