Basically, I have an array of keywords, and a piece of text. I am wondering what would be the best way to find out if any of those keywords are present in the text, bearing in mind performance issues.
I was thinking of just looping over the array and doing a strpos() for each keyword, but with well over ten thousand words in the array, it takes PHP a bit of time to do it, and so I was wondering if there is a more efficient way to do it.
Depending on the size of the string, you could use a hash to make it faster.
First, iterate over the text and store each word as a key in an array (called $string here).
Then iterate over the keywords, checking whether each one is set in $string.
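As a rough sketch of the two steps above (not the original snippet): the function name findKeywordsViaWordIndex and the choice to split on non-word characters are my own assumptions, and it assumes mostly ASCII text and single-word keywords. Unlike strpos(), it only matches whole words.

```php
<?php
// Hypothetical helper: returns the keywords that occur as whole words in $text.
function findKeywordsViaWordIndex(string $text, array $keywords): array
{
    // Step 1: index every word of the text as an array key (a hash lookup table).
    $index = [];
    $words = preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        $index[$word] = true;
    }

    // Step 2: check each keyword against the index; isset() is a constant-time lookup.
    $found = [];
    foreach ($keywords as $keyword) {
        if (isset($index[strtolower($keyword)])) {
            $found[] = $keyword;
        }
    }
    return $found;
}
```

The point of the design is that the text is scanned once, and each of the ten thousand keywords then costs a single hash lookup instead of a full strpos() pass over the text.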
EDIT: If your text is much smaller than your keyword list, do it the other way around: index the keywords and check each word of the text against them. This would likely be faster than the above if the text is pretty short.
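A possible sketch of that reversed variant, assuming you only need a yes/no answer. The name textContainsAnyKeyword is made up for illustration, and the same ASCII and whole-word assumptions apply.

```php
<?php
// Hypothetical helper: true as soon as any word of $text is a known keyword.
function textContainsAnyKeyword(string $text, array $keywords): bool
{
    // Flip the keyword list so each (lowercased) keyword becomes an array key.
    $lookup = array_flip(array_map('strtolower', $keywords));

    // Walk the (short) text and stop at the first hit.
    $words = preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        if (isset($lookup[$word])) {
            return true;
        }
    }
    return false;
}
```

If the same keyword list is reused for many texts, it would probably make sense to build $lookup once and pass it in, rather than flipping the array on every call.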
You could split the text into an array of words and run array_intersect_key on the two arrays (with the words as keys in both). I am not sure how well this performs, though...
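Something along these lines might work, as an untested-for-speed sketch. The name keywordsInText is hypothetical, and str_word_count() is just one way to split the text; it assumes plain English-style words.

```php
<?php
// Hypothetical helper: returns the (lowercased) keywords that appear in $text.
function keywordsInText(string $text, array $keywords): array
{
    // str_word_count(..., 1) returns the list of words; flipping makes them keys.
    $textWords = array_flip(str_word_count(strtolower($text), 1));
    $keywordKeys = array_flip(array_map('strtolower', $keywords));

    // Keep only the keyword entries whose key also exists among the text words.
    return array_keys(array_intersect_key($keywordKeys, $textWords));
}
```

array_intersect_key compares keys only, which is why both arrays are flipped first.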
I really don't know if it is more efficient, but you could try putting them all in one regex like this: (keyword1|keyword2|...). You can escape the keywords for the regex with the preg_quote function. PHP caches compiled patterns internally, so reusing the same pattern on multiple strings should avoid recompiling it each time.
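A minimal sketch of building such a pattern, assuming / as the delimiter and a case-insensitive match. buildKeywordPattern and the sample keywords are made up for illustration.

```php
<?php
// Hypothetical helper: joins the keywords into one alternation pattern.
function buildKeywordPattern(array $keywords): string
{
    // Escape regex metacharacters (and the "/" delimiter) in every keyword.
    $escaped = array_map(function ($keyword) {
        return preg_quote($keyword, '/');
    }, $keywords);

    return '/(' . implode('|', $escaped) . ')/i';
}

$pattern = buildKeywordPattern(['c++', 'php', 'node.js']);

// preg_match() returns 1 on the first match, so this only answers "any keyword?".
$hasKeyword = preg_match($pattern, 'I mostly write PHP these days') === 1;
```

Note that an empty keyword array would produce an empty group that matches any text, so you would want to guard against that case.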
Assuming the text is consistently formatted and that you only care whether any (not which) of the keywords exist, you could try something like:
Working off eWolf's idea...
You don't have to check for boundaries if you don't want to, but if you do, just surround the group (the () characters) with \b and it'll match only a given word. And you'll want to make sure all the array's members are preg_quoted, for safety.
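For what it's worth, here is how the boundary version could look. This is only a sketch: buildWholeWordPattern is a made-up name, and the / delimiter and i flag are my assumptions.

```php
<?php
// Hypothetical helper: like a plain alternation, but wrapped in \b word boundaries.
function buildWholeWordPattern(array $keywords): string
{
    // preg_quote every member so metacharacters in keywords are taken literally.
    $escaped = array_map(function ($keyword) {
        return preg_quote($keyword, '/');
    }, $keywords);

    return '/\b(' . implode('|', $escaped) . ')\b/i';
}

$pattern = buildWholeWordPattern(['cat', 'dog']);

$inWord  = preg_match($pattern, 'concatenate');   // "cat" is buried inside a longer word
$asWord  = preg_match($pattern, 'my cat sleeps'); // "cat" stands alone
```

Keep in mind that \b only behaves as expected when the keywords start and end with word characters; a keyword like c++ followed by a space would not match with the trailing \b.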