I want to match a list of words, which is easy enough when those words consist only of word characters. For example, /\b (pop|push) \b/gsx
when run against the string
pop gave the door a push but it popped back
will match the words pop and push but not popped.
I need similar functionality for words that contain non-word characters, which would normally create word boundaries. So I need /\b (reverse!|push) \b/gsx
when run against the string
push reverse! reverse!push
to match only reverse! and push, but not reverse!push. Obviously this regex isn't going to do that, so what do I need to use instead of \b to make my regex smart enough to handle these funky requirements?
Your first problem is that you need three (possibly four) cases in your alternation, not two.
/\breverse!(?:\s|$)/ for reverse! by itself
/\bpush\b/ for push by itself
/\breverse!push\b/ for the two together
/\bpushreverse!(?:\s|$)/ for the possible fourth case
Your second problem is that a \b won't match after a "!" (unless a word character follows) because "!" is not a \w. Here is what Perl 5 has to say about \b; you may want to consult your docs to see if they agree:
So, the regex that you need is something like
I left out the /s because there are no periods in this regex, so "treat as single line" makes no sense. If /s doesn't mean "treat as single line" in your engine, you should probably add it back. Also, you should read up on how your engine handles alternation. I know that in Perl 5, to get the right behaviour, you must arrange the items this way (otherwise reverse! would always win over reverse!push).
You can replace \b by something equivalent, but less strict:
This way the limiting factor of the \b (that it can only match before or after an actual \w word character) is removed. Now white space or the start/end of the string function as valid separators, and the inner expression can be easily built at run-time, from a list of search terms for example.
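The snippet this answer referred to is not preserved here, so the following is only a minimal sketch of the idea, written with Python's re module. The helper name build_pattern and the use of lookarounds (rather than consuming the surrounding whitespace) are assumptions made for illustration, not necessarily what the answer originally showed.

```python
import re

def build_pattern(terms):
    """Compile a pattern that matches any term delimited by whitespace
    or by the start/end of the string (hypothetical helper)."""
    # Longest terms first, so a term is never shadowed by one of its prefixes.
    ordered = sorted(terms, key=len, reverse=True)
    inner = "|".join(re.escape(t) for t in ordered)
    # (?<!\S): no non-space character directly before the term.
    # (?!\S):  no non-space character directly after the term.
    return re.compile(r"(?<!\S)(?:" + inner + r")(?!\S)")

text = "push reverse! reverse!push"
pattern = build_pattern(["reverse!", "push"])
matches = pattern.findall(text)
print(matches)  # ['push', 'reverse!']
```

Because the lookarounds consume nothing, two terms separated by a single space can both match. Whether punctuation such as a trailing comma should also count as a separator depends on the input, and this sketch does not treat it as one.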
At the end of a word, \b means "the previous character was a word character, and the next character (if there is a next character) is not a word character". You want to drop the first condition because there might be a non-word character at the end of the "word". That leaves you with a negative lookahead:
I'm pretty sure AS3 regexes support lookahead.
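The pattern from this answer is not preserved here either. As a rough sketch of the description above, the trailing \b can be swapped for (?!\w). The example uses Python's re module rather than AS3, and the variable names are only illustrative.

```python
import re

text = "push reverse! reverse!push"

# Plain \b on both sides: a boundary after "!" exists only when a word
# character follows, so the "reverse!" inside "reverse!push" is the one found.
strict = re.compile(r"\b(?:reverse!|push)\b")

# Trailing \b replaced by a negative lookahead: the next character, if
# there is one, must not be a word character.
relaxed = re.compile(r"\b(?:reverse!|push)(?!\w)")

strict_spans = [m.span() for m in strict.finditer(text)]
relaxed_spans = [m.span() for m in relaxed.finditer(text)]
print(strict_spans)   # [(0, 4), (14, 22), (22, 26)]
print(relaxed_spans)  # [(0, 4), (5, 13), (22, 26)]
```

Note that the final push, the tail of reverse!push, still matches in both cases, because the leading \b is satisfied between "!" and "p". If that match is unwanted, the leading side probably needs a similar check of its own.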