Regex word boundary issue when angle brackets are

Posted 2019-01-27 04:56

Question:

Regex:

\b< low="" number="" low="">\b

Example string:

 <b22>Aquí se muestran algunos síntomas < low="" number="" low=""> tienen el siguiente aspecto.</b22> 

I'm not sure why the word boundary between síntomas and < is not being found. The same problem exists on the other side, between > and tienen.

Any suggestions on how to match this boundary properly?

When I give it the following input, the Regex matches as expected:

Aquí se muestran algunos síntomas< low="" number="" low="">tienen el siguiente aspecto.

Removing the \b edge conditions from \bPHRASE\b is not an option, because the pattern must not match parts of words.

Update

This did the trick: (Thanks to Igor, Mosty, DK and NickC)

new Regex(String.Format(@"(?<=[\s\.\?\!]){0}(?=[\s\.\?\!])", innerStringToMatch));

I needed to widen my boundary matching to the class [\s\.\?\!] and turn the edge matches into a positive lookbehind and a positive lookahead.
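For readers who want to try the lookaround approach outside .NET, here is a minimal sketch in Python's `re` module, which handles these lookarounds the same way for this case. The helper name `build_phrase_pattern` is hypothetical, and the call to `re.escape` is an assumption that the phrase is literal text; the `String.Format` call above inserts the phrase unescaped.

```python
import re

def build_phrase_pattern(phrase):
    """Compile a pattern that matches `phrase` only when it is delimited by
    whitespace or one of . ? ! on both sides (hypothetical helper)."""
    # re.escape is an assumption: it treats the phrase as literal text.
    # The original String.Format call inserts the phrase unescaped.
    delimiter = r"[\s.?!]"
    return re.compile(
        "(?<=" + delimiter + ")" + re.escape(phrase) + "(?=" + delimiter + ")"
    )

phrase = '< low="" number="" low="">'
text = 'Aquí se muestran algunos síntomas < low="" number="" low=""> tienen el siguiente aspecto.'

match = build_phrase_pattern(phrase).search(text)
found = match.group() if match else None

# A phrase embedded in a longer word is rejected by the lookbehind.
partial = build_phrase_pattern("tienen").search("no contienen el aspecto")

# At the very start or end of the input there is no delimiter character,
# so the lookarounds fail there as well.
at_edges = build_phrase_pattern(phrase).search(phrase)
```

Because the lookarounds are zero-width, `found` contains only the phrase itself, without the surrounding spaces. One consequence of this design is that a phrase sitting at the very start or end of the input is not matched; adding `^` and `$` alternatives to the lookarounds would be one way to cover that case if it matters.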

Answer 1:

\b is a zero-length match that occurs between two characters in the string where one is a word character and the other is not. A word character is defined as [A-Za-z0-9_]*. < is not a word character, and neither is the space before it in your example string, so there is no word boundary between them and \b doesn't match. In your second input, < directly follows the word character s, which is why the regex matches there.

You can try the following regex instead ((?:...) is a non-capturing group):
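The behaviour can be checked with a short sketch. It uses Python's `re` module rather than .NET, on the assumption that `\b` behaves the same way for these inputs; the variable names are illustrative only.

```python
import re

PHRASE = '< low="" number="" low="">'
# \b on both sides of the literal phrase, as in the question.
pattern = re.compile(r"\b" + re.escape(PHRASE) + r"\b")

# Space before "<" and after ">": both neighbours are non-word characters,
# so there is no word boundary at either edge.
spaced = 'algunos síntomas < low="" number="" low=""> tienen el'

# "s<" and ">t": a word character sits next to a non-word character,
# so \b succeeds at both edges.
adjacent = 'algunos síntomas< low="" number="" low="">tienen el'

spaced_match = pattern.search(spaced)
adjacent_match = pattern.search(adjacent)
```

Here `spaced_match` is `None`, while `adjacent_match` matches the phrase, which mirrors what the question reports.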

(?:\b|\s+)< low="" number="" low="">(?:\b|\s+)

*) Actually, this is not correct for all regex engines. To be precise, \b matches between \w and \W, where \w matches any word character. As Tim Pietzcker pointed out in a comment on this answer, the meaning of "word character" differs between implementations, but I don't know of any where \w matches < or >.



Answer 2:

I think you're trying to do the following:

\s< low="" number="" low="">\s
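One thing to keep in mind with this pattern is that \s consumes the whitespace, so the spaces become part of the match. The sketch below illustrates this with Python's `re` module (an assumption; the question itself uses .NET), using a made-up replacement string `[X]`.

```python
import re

PHRASE = '< low="" number="" low="">'
text = 'síntomas < low="" number="" low=""> tienen'

# \s on both sides, as in this answer; the phrase is escaped as a literal.
consuming = re.compile(r"\s" + re.escape(PHRASE) + r"\s")

match = consuming.search(text)
matched_text = match.group() if match else None

# Replacing the match also removes the surrounding spaces.
replaced = consuming.sub("[X]", text)

# Wrapping the phrase in a capturing group recovers it without the spaces.
capturing = re.compile(r"\s(" + re.escape(PHRASE) + r")\s")
captured = capturing.search(text).group(1)
```

If the match is only used to test for presence, the consumed spaces do not matter. For replacements, a capturing group as shown, or the lookaround form from the question's update, keeps the spaces out of the replaced span.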