I have a set of words, say: apple, orange, pear, banana, kiwi.
I want to check whether a sentence contains any of these words and, if it does, find which word matched. How can I accomplish this with a regex?
I am currently calling String.indexOf() for each word in the set. I assume this is not as efficient as regex matching?
I don't think a regexp will do a better job in terms of performance, but you can use one as follows:
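A minimal sketch of what that might look like in Java, assuming the five words from the question are hard-coded into a single alternation pattern (the class and method names are made up for illustration). Matcher.find() scans for the next occurrence and Matcher.group() returns the text that matched, which tells you which word was found:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindMatchingWord {
    // Alternation of the words from the question; "|" means OR.
    static final Pattern WORDS = Pattern.compile("apple|orange|pear|banana|kiwi");

    // Returns the first (leftmost) listed word found in the sentence, or null if none is present.
    static String firstMatch(String sentence) {
        Matcher m = WORDS.matcher(sentence);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String sentence = "I had a banana and a kiwi for lunch";
        Matcher m = WORDS.matcher(sentence);
        // Each call to find() advances to the next occurrence; group() is the matched text.
        while (m.find()) {
            System.out.println("Matched: " + m.group() + " at index " + m.start());
        }
    }
}
```

If you only care about the first hit, firstMatch() stops after a single find(); the loop in main() reports every occurrence together with its position.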
The best way to see which method is more efficient is to test it.
You can use String.contains() instead of String.indexOf() to simplify your non-regexp code. To search for different words, the regular expression looks like this:
apple|orange|pear|banana|kiwi
The | works as an OR in regular expressions. My very simple test code looks like this:
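The original test code isn't shown here, so the following is only a rough sketch of how such a comparison could be set up, not the code that produced the timings. It assumes Java 9+ (for List.of), a single hypothetical sentence, and a hand-rolled System.nanoTime() loop; for trustworthy numbers a harness such as JMH is a better choice, because JIT warm-up skews naive loops like these.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContainsVsRegex {
    static final List<String> WORDS = List.of("apple", "orange", "pear", "banana", "kiwi");
    // Pattern built from the same list; safe only because these words have no regex metacharacters.
    static final Pattern PATTERN = Pattern.compile(String.join("|", WORDS));

    // Returns the first word (in list order) that appears anywhere in the sentence, or null.
    static String findWithContains(String sentence) {
        for (String word : WORDS) {
            if (sentence.contains(word)) {
                return word;
            }
        }
        return null;
    }

    // Returns the leftmost word occurrence in the sentence, or null.
    static String findWithRegex(String sentence) {
        Matcher m = PATTERN.matcher(sentence);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String sentence = "The quick brown fox jumps over the lazy dog and eats a kiwi";
        int iterations = 1_000_000;
        int hits = 0; // printed at the end so the JIT cannot drop the loops as dead code

        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            if (findWithContains(sentence) != null) hits++;
        }
        long containsNanos = System.nanoTime() - start;

        start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            if (findWithRegex(sentence) != null) hits++;
        }
        long regexNanos = System.nanoTime() - start;

        System.out.printf("contains(): %d ms%n", containsNanos / 1_000_000);
        System.out.printf("regex:      %d ms%n", regexNanos / 1_000_000);
        System.out.println("hits: " + hits);
    }
}
```

The two helpers are not exactly equivalent: the contains() loop returns the first word in list order, while the regex returns the leftmost occurrence in the sentence. If your word list could contain regex metacharacters, wrap each word in Pattern.quote() before joining.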
The results I got were as follows:
Obviously, timings will vary depending on the number of words being searched for and the strings being searched, but contains() does seem to be roughly 10 times faster than regular expressions for a simple search like this.
By using regular expressions to search for strings inside another string, you're using a sledgehammer to crack a nut, so I guess we shouldn't be surprised that it's slower. Save regular expressions for when the patterns you want to find are more complex.
One case where you may want to use regular expressions is if indexOf() and contains() won't do the job because you only want to match whole words and not just substrings, e.g. you want to match pear but not spears. Regular expressions handle this case well because they have the concept of word boundaries.
In this case we'd change our pattern to:
\b(apple|orange|pear|banana|kiwi)\b
The \b matches only at the beginning or end of a word (a word boundary), and the parentheses group the OR alternatives together.
Note that when defining this pattern in your code, you need to escape each backslash with another backslash:
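A small sketch of the word-boundary version in Java, using the same five words; the class and method names are hypothetical. Note the doubled backslashes in the string literal, and that group(1) returns the word captured by the parentheses:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeWordMatch {
    // In Java source each backslash is doubled, so "\\b" reaches the regex engine as \b.
    static final Pattern WHOLE_WORDS =
            Pattern.compile("\\b(apple|orange|pear|banana|kiwi)\\b");

    // Returns the first whole-word match, or null if no listed word appears as a whole word.
    static String findWholeWord(String sentence) {
        Matcher m = WHOLE_WORDS.matcher(sentence);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(findWholeWord("He threw two spears"));  // prints null
        System.out.println(findWholeWord("She ate a pear today")); // prints pear
    }
}
```

"spears" is rejected because the "s" before "pear" is also a word character, so there is no boundary for \b to match there.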
Here is the simplest solution I found (matching with wildcards):
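The original snippet isn't included, so this is only a guess at what "matching with wildcards" might look like in Java: String.matches() has to match the whole input, so .* on either side stands in for the rest of the sentence. The (?s) flag is an addition in this sketch so that . also matches line breaks; the class name is hypothetical.

```java
class WildcardMatch {
    // matches() must consume the entire string, so ".*" on both sides acts as a wildcard
    // for whatever surrounds the word. "(?s)" lets "." match line terminators too.
    static boolean containsAnyWord(String sentence) {
        return sentence.matches("(?s).*(apple|orange|pear|banana|kiwi).*");
    }

    public static void main(String[] args) {
        System.out.println(containsAnyWord("I like kiwi fruit")); // prints true
        System.out.println(containsAnyWord("I like grapes"));     // prints false
    }
}
```

This only tells you whether one of the words is present, not which one, and String.matches() compiles the pattern on every call; for repeated checks, a precompiled Pattern with Matcher.find() is cheaper.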