I have a list of words - "foo", "bar", "baz" - and I want to write a regexp which would match strings which contain at least 2 of them. E.g., "foo baz" should match while "ba foo z" should not.
The obvious solution "(foo|bar|baz).*(foo|bar|baz)" works, but I find it unsatisfactory because it lists the words twice. What if I have 25 words instead of just 3? What if I am looking for strings which contain at least 4 given words instead of just 2?
It didn't sound like you were looking for exact words, so Donkey's solution might not be what you want. This should do it:
((foo|bar|baz).*?){2}
It searches the text for any of those strings, then lazily matches any characters until one of the strings is found again. The {2} repeats the whole group twice; after the second word, the lazy .*? is satisfied by matching nothing, so the match is complete. To require at least 4 words instead of 2, change {2} to {4}.
If you want it to match over multiple lines, be sure to either turn on dot-all mode or use [\s\S] instead of the dot.
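As a rough sketch of how the quantified group scales to longer word lists and higher counts, the pattern can be built from the list in Python. The helper name at_least_n is made up for illustration, and the sketch uses [\s\S] rather than the dot so that gaps may span newlines.

```python
import re

def at_least_n(words, n):
    """Compile a regex matching text that contains at least n occurrences
    of the listed words (hypothetical helper, not part of any library)."""
    # Escape each word so regex metacharacters in it are taken literally.
    alternation = "|".join(re.escape(w) for w in words)
    # One listed word, then a lazy gap of any characters (newlines included),
    # with the whole group repeated n times.
    return re.compile("(?:(?:" + alternation + r")[\s\S]*?){" + str(n) + "}")

pattern = at_least_n(["foo", "bar", "baz"], 2)
print(bool(pattern.search("foo baz")))   # True
print(bool(pattern.search("ba foo z")))  # False
```

Like the original pattern, this counts occurrences rather than distinct words, so "foo foo" also matches.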
I think this solution should work:
"(foo|bar|baz).*\s+\1(\s+|$)"
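The \s means that a whitespace character is expected, to make sure you find the exact word and not just a prefix. For instance, "foo ... fooo" is not recognized. Note that \1 is a backreference to whatever the group captured, so this pattern only finds the same word repeated (e.g. "foo ... foo"); it does not match two different words such as "foo baz".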
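A quick check in Python shows how this pattern behaves. The second pattern is not from the answer above: it is an assumed alternative that uses the word-boundary assertion \b to get whole-word matching while still accepting any two words from the list.

```python
import re

# The pattern from the answer: \1 repeats whatever the group captured.
backref = re.compile(r"(foo|bar|baz).*\s+\1(\s+|$)")
print(bool(backref.search("foo and foo")))   # True  (same word twice)
print(bool(backref.search("foo ... fooo")))  # False (prefix is rejected)
print(bool(backref.search("foo baz")))       # False (different words)

# Assumed alternative: whole words only, any two words from the list.
whole_words = re.compile(r"(?:\b(?:foo|bar|baz)\b.*?){2}")
print(bool(whole_words.search("foo baz")))       # True
print(bool(whole_words.search("foo ... fooo")))  # False
```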