I would like to use a regular expression that matches if a sentence contains one of the words I am looking for.
All of the strings below match right now, which is not correct. I tried surrounding every word in words with spaces (like " seven "), but then it doesn't match if a word is at the end of the string.
import re

words = ('seven', 'eight')
regex = re.compile('|'.join(words))
print(regex.search('aaaaaasd seven asdfadsf'))  #1 - should match
print(regex.search('AAAsevenAAA'))  #2 - shouldn't match
print(regex.search('AAA eightaaa'))  #3 - shouldn't match
print(regex.search('eight aaa'))  #4 - should match
print(regex.search('aaaa eight'))  #5 - should match
How can I make my regular expression not match when a search word appears only as a substring of a longer word (like #2 and #3 above)?
As @CasimiretHippolyte pointed out, you want to add word boundaries (\b). If you don't want to add them manually to each word in your list, you need to modify your compiled regular expression so that the boundaries wrap the whole alternation.
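A minimal sketch of what that modified pattern could look like, run against the five strings from the question. The names pattern, samples and results are introduced here for illustration, and the re.escape call is an added assumption (not needed for 'seven' and 'eight') so that words containing regex metacharacters would still be matched literally.

```python
import re

words = ('seven', 'eight')

# Assumption: re.escape is added so that any metacharacters in a word
# are treated literally; it leaves 'seven' and 'eight' unchanged.
pattern = r'\b(?:%s)\b' % '|'.join(re.escape(w) for w in words)
regex = re.compile(pattern)

samples = [
    'aaaaaasd seven asdfadsf',  # 1 - should match
    'AAAsevenAAA',              # 2 - shouldn't match
    'AAA eightaaa',             # 3 - shouldn't match
    'eight aaa',                # 4 - should match
    'aaaa eight',               # 5 - should match
]

# True where the pattern finds one of the words as a whole word
results = [bool(regex.search(s)) for s in samples]
print(results)
```

Because \b matches at the start and end of the string as well as next to spaces or punctuation, cases #4 and #5 match without any padding around the words.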
Note: If you have escape sequences in your regex, it's best to use raw string notation. Wrapping the alternation in a non-capturing (?:...) group lets the word boundaries apply to every word in the list. Without the group, the pattern only gets one boundary at the very beginning and one at the very end, so the first word is not checked on its right side and the last word is not checked on its left side.
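To illustrate the effect of the grouping, here is a small comparison of the two patterns written out by hand. The names ungrouped, grouped and rows, and the two test strings, are hypothetical and chosen only for this sketch.

```python
import re

# Without a group: \b binds only to the start of 'seven' and the end of 'eight'
ungrouped = re.compile(r'\bseven|eight\b')

# With a non-capturing group: both boundaries apply to every alternative
grouped = re.compile(r'\b(?:seven|eight)\b')

rows = []
for text in ('sevenAAA', 'AAAeight'):
    # Record whether each pattern finds a match in the text
    rows.append((text, bool(ungrouped.search(text)), bool(grouped.search(text))))

for row in rows:
    print(row)
```

The ungrouped pattern finds a match in both strings even though neither contains a whole word from the list, while the grouped pattern rejects both.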