I am using BeautifulSoup to look for user-entered strings on a specific page. For example, I want to see if the string 'Python' is located on the page: http://python.org
When I used:
find_string = soup.body.findAll(text='Python')
find_string returned []
But when I used:
find_string = soup.body.findAll(text=re.compile('Python'), limit=1)
find_string returned [u'Python Jobs']
as expected
What is the difference between these two statements that makes the second one work when there is more than one instance of the word being searched for?
text='Python' searches for elements whose text is exactly the string you provided, so a longer string such as 'Python Jobs' does not match.
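A minimal sketch of that difference, assuming the bs4 package (BeautifulSoup 4) is installed. The sample markup is made up for illustration, and `find_all(string=...)` is the current spelling of the question's `findAll(text=...)`:

```python
import re

from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4 is installed

# Hypothetical sample markup standing in for a real page.
html = "<body><p>exact text</p><p>almost exact text</p></body>"
soup = BeautifulSoup(html, "html.parser")

# A plain string has to equal the whole text node.
exact = soup.body.find_all(string="exact text")

# A compiled pattern is searched for inside each text node,
# so a partial hit is enough.
partial = soup.body.find_all(string=re.compile("exact text"))

print(exact)
print(partial)
```

The first list holds only the node that is exactly 'exact text'; the second also includes 'almost exact text'.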
"To see if the string 'Python' is located on the page http://python.org", a plain substring test on the page's HTML is enough.
If you need to find the position of a substring within a string, you could do
html.find('Python')
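As a rough illustration using only built-in string operations: the `html` value below is a hypothetical stand-in for the downloaded page source, not the real python.org markup.

```python
# Hypothetical stand-in for the downloaded page source.
html = "<html><body><h1>Welcome to Python.org</h1></body></html>"

# Membership test: is the substring anywhere in the page source?
found = "Python" in html

# str.find returns the index of the first occurrence, or -1 if absent.
position = html.find("Python")
missing = html.find("Perl")

print(found, position, missing)
```

Keep in mind that this searches the raw markup, so it will also match inside tag names, attributes and scripts, not only the visible text.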
I have not used BeautifulSoup, but maybe the following can help in some tiny way.
I'm not suggesting this is a replacement but maybe you can glean some value in the concept until a direct answer comes along.
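The line soup.body.findAll(text='Python') is looking for the exact NavigableString 'Python'.
Note that the NavigableString returned by your second statement, u'Python Jobs', only contains 'Python'; it is not equal to it.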
Note this behaviour:
So your regexp is looking for an occurrence of 'Python' anywhere inside a NavigableString, not for an exact match to the NavigableString 'Python'.
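The same distinction can be seen with plain strings and the `re` module, no parser involved. This is only a sketch: the anchored pattern `^Python$` is an assumption added for illustration, to show how a regexp can be made to behave like the exact match.

```python
import re

# The text node reported in the question.
node_text = "Python Jobs"

# Exact comparison, which is what text='Python' amounts to.
is_equal = node_text == "Python"

# Unanchored pattern: succeeds if 'Python' occurs anywhere in the string.
unanchored = re.compile("Python").search(node_text)

# Anchored pattern (illustrative assumption): the whole string must be 'Python'.
anchored = re.compile("^Python$").search(node_text)

print(is_equal, unanchored is not None, anchored is not None)
```

Only the unanchored search succeeds, which is why the re.compile('Python') version returns a result while the plain string does not.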