Regexp to ignore hyphenated words during common wo

Posted 2019-09-03 23:09

Question:

I've got this regular expression, which removes common words ($commonWords) from a string ($input), and I would like to tweak it so that it ignores hyphenated words, as these sometimes contain common words.

return preg_replace('/\b('.implode('|',$commonWords).')\b/i','',$input);

thanks

Answer 1:

Try

return preg_replace('/(?<!-)\b('.implode('|',$commonWords).')\b(?!-)/i','',$input);

This adds negative lookaround expressions to the start and end of the regex so that a match is only allowed if there is no dash before or after the match.
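
To see the lookarounds at work, here is a minimal sketch. The function name removeCommonWords and the sample data are made up for illustration, and the preg_quote() call is an addition that is not in the snippet above; it only guards against words that contain regex metacharacters.

```php
<?php
// Hypothetical helper for illustration; not part of the original answer.
function removeCommonWords(string $input, array $commonWords): string
{
    // Assumption: escape each word so regex metacharacters are treated literally.
    $alternatives = implode('|', array_map(function ($word) {
        return preg_quote($word, '/');
    }, $commonWords));

    // (?<!-) rejects a match directly preceded by a hyphen,
    // (?!-) rejects a match directly followed by a hyphen.
    $pattern = '/(?<!-)\b(' . $alternatives . ')\b(?!-)/i';

    return preg_replace($pattern, '', $input);
}

// "a", "to" and "the" are removed where they stand alone;
// "state-of-the-art" is left untouched.
$result = removeCommonWords(
    'a state-of-the-art guide to the web',
    ['a', 'of', 'the', 'to']
);
```

The removed words leave their surrounding spaces behind, so you may want to collapse runs of whitespace afterwards.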



Answer 2:

preg_replace('/\b(\w+(?:-\w+)+|'.implode('|',$commonWords).')\b/i','',$input);

\w matches any word character (letter, digit, or underscore). This removes all the common words and also every word that contains a hyphen.
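
A minimal sketch of this approach, with a hypothetical function name and made-up sample data. It assumes the hyphenated alternative is written as \w+(?:-\w+)+ and listed before the common words, so that a whole hyphenated word is consumed before any common word inside it can match on its own.

```php
<?php
// Hypothetical helper for illustration; removes common words AND hyphenated words.
function removeCommonAndHyphenated(string $input, array $commonWords): string
{
    // Assumption: escape each word so regex metacharacters are treated literally.
    $alternatives = implode('|', array_map(function ($word) {
        return preg_quote($word, '/');
    }, $commonWords));

    // \w+(?:-\w+)+ matches two or more word parts joined by hyphens.
    // It is tried first, so "state-of-the-art" is removed as one unit.
    $pattern = '/\b(\w+(?:-\w+)+|' . $alternatives . ')\b/i';

    return preg_replace($pattern, '', $input);
}

// Both "the" and "state-of-the-art" are removed; "guide" remains.
$result = removeCommonAndHyphenated('the state-of-the-art guide', ['the', 'of']);
```

Note that this deletes hyphenated words rather than leaving them in place.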



Answer 3:

return preg_replace('/(?<![-\'"])\b('.implode('|',$commonWords).')\b(?![-\'"])/i','',$input);

The above also works when there are more symbols to exclude: add each one to the character classes in the lookbehind and the lookahead.
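
If the list of symbols grows, the character class can be built from an array rather than typed by hand. This is only a sketch: the function name, the $joiners parameter and the sample data are hypothetical, and it assumes $joiners is a non-empty list of single characters.

```php
<?php
// Hypothetical helper for illustration; $joiners must be non-empty.
function removeCommonWordsUnlessJoined(string $input, array $commonWords, array $joiners): string
{
    $quote = function ($s) {
        return preg_quote($s, '/');
    };

    $alternatives = implode('|', array_map($quote, $commonWords));

    // preg_quote() also escapes "-", "^" and "]", so the class stays valid
    // regardless of where those characters appear in the list.
    $class = '[' . implode('', array_map($quote, $joiners)) . ']';

    // Skip a common word if any of the listed symbols touches it on either side.
    $pattern = '/(?<!' . $class . ')\b(' . $alternatives . ')\b(?!' . $class . ')/i';

    return preg_replace($pattern, '', $input);
}

// The quoted 'the' and the "of" inside "end-of-line" are kept;
// only the standalone "the" is removed.
$result = removeCommonWordsUnlessJoined(
    "say 'the' word, not the end-of-line",
    ['the', 'of'],
    ['-', "'", '"']
);
```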