BeautifulSoup - search by text inside a tag

Observe the following problem:

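import re
from bs4 import BeautifulSoup as BS

soup = BS("""
<a href="/customer-menu/1/accounts/1/update">
    Edit
</a>
""")

# This returns the <a> element
soup.find(
    'a',
    href="/customer-menu/1/accounts/1/update",
    text=re.compile(".*Edit.*")
)

soup = BS("""
<a href="/customer-menu/1/accounts/1/update">
    <i class="fa fa-edit"></i> Edit
</a>
""")

# This returns None
soup.find(
    'a',
    href="/customer-menu/1/accounts/1/update",
    text=re.compile(".*Edit.*")
)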

For some reason, BeautifulSoup will not match the text when the <i> tag is there as well. Finding the tag and showing its text produces:

>>> a2 = soup.find(
...     'a',
...     href="/customer-menu/1/accounts/1/update"
... )
>>> print(repr(a2.text))
'\n Edit\n'

Right. According to the Docs, soup uses the match function of the regular expression, not the search function. So I need to provide the DOTALL flag:

pattern = re.compile('.*Edit.*')
pattern.match('\n Edit\n')  # Returns None

pattern = re.compile('.*Edit.*', flags=re.DOTALL)
pattern.match('\n Edit\n')  # Returns MatchObject
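For comparison, here is a small standard-library sketch of how re.match and re.search behave on the same string. It only exercises Python's re module, not BeautifulSoup itself; the input is the '\n Edit\n' value printed above.

```python
import re

text = '\n Edit\n'

# match() is anchored at the start of the string, and '.' does not match '\n'
# unless DOTALL is set, so this fails on the leading newline.
no_dotall = re.compile('.*Edit.*').match(text)

# With DOTALL, '.*' can also consume the newlines on either side.
with_dotall = re.compile('.*Edit.*', flags=re.DOTALL).match(text)

# search() looks for the pattern anywhere in the string, so a plain 'Edit'
# is enough and needs neither wildcards nor flags.
plain_search = re.compile('Edit').search(text)

print(no_dotall)                # None
print(with_dotall is not None)  # True
print(plain_search.group())     # Edit
```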

Alright. Looks good. Let's try it with soup:

soup = BS("""
<a href="/customer-menu/1/accounts/1/update">
    <i class="fa fa-edit"></i> Edit
</a>
""")
soup.find(
    'a',
    href="/customer-menu/1/accounts/1/update",
    text=re.compile(".*Edit.*", flags=re.DOTALL)
)  # Still returns None... Why?!


My solution, based on geckon's answer: I implemented these helpers:

import re

MATCH_ALL = r'.*'


def like(string):
    """
    Return a compiled regular expression that matches the given
    string with any prefix and postfix, e.g. if string = "hello",
    the returned regex matches r".*hello.*"
    """
    string_ = string
    if not isinstance(string_, str):
        string_ = str(string_)
    regex = MATCH_ALL + re.escape(string_) + MATCH_ALL
    return re.compile(regex, flags=re.DOTALL)


def find_by_text(soup, text, tag, **kwargs):
    """
    Find the tag in soup that matches all provided kwargs, and contains the
    text.

    If no match is found, return None.
    If more than one match is found, raise ValueError.
    """
    elements = soup.find_all(tag, **kwargs)
    matches = []
    for element in elements:
        # Search the element's descendant strings rather than element.string,
        # so tags with mixed children (an <i> plus text) still match.
        if element.find(text=like(text)):
            matches.append(element)
    if len(matches) > 1:
        # Tags are not str, so convert them before joining.
        raise ValueError("Too many matches:\n" + "\n".join(str(m) for m in matches))
    elif len(matches) == 0:
        return None
    else:
        return matches[0]

Now, when I want to find the element above, I just run find_by_text(soup, 'Edit', 'a', href='/customer-menu/1/accounts/1/update')
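The re.escape call in like() is what stops characters such as + or . in the search text from being read as regex operators. A quick standard-library check of that, building the same kind of pattern as like() does; the search text '1+1' is a hypothetical value, not something from the original markup:

```python
import re

MATCH_ALL = r'.*'
needle = '1+1'  # hypothetical search text containing a regex metacharacter

# Same construction as like(): prefix/suffix wildcards plus DOTALL.
escaped = re.compile(MATCH_ALL + re.escape(needle) + MATCH_ALL, flags=re.DOTALL)
# Without escaping, '1+1' means "one or more '1' followed by another '1'".
unescaped = re.compile(MATCH_ALL + needle + MATCH_ALL, flags=re.DOTALL)

print(escaped.match('\n 1+1 \n') is not None)    # True: '+' is taken literally
print(unescaped.match('\n 1+1 \n') is not None)  # False: the literal text is missed
print(unescaped.match('\n 11 \n') is not None)   # True: matches unrelated text
```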


The problem is that your <a> tag, with the <i> tag inside, doesn't have the string attribute you expect it to have. First, let's take a look at what the text="" argument to find() does.

NOTE: The text argument is an old name; since BeautifulSoup 4.4.0 it's called string.

From the docs:

Although string is for finding strings, you can combine it with arguments that find tags: Beautiful Soup will find all tags whose .string matches your value for string. This code finds the tags whose .string is “Elsie”:

soup.find_all("a", string="Elsie")
# [<a href="" class="sister" id="link1">Elsie</a>]

Now let's take a look at what a Tag's string attribute is (from the docs again):

If a tag has only one child, and that child is a NavigableString, the child is made available as .string:

title_tag.string
# u'The Dormouse's story'


If a tag contains more than one thing, then it’s not clear what .string should refer to, so .string is defined to be None:

print(soup.html.string)
# None

This is exactly your case. Your <a> tag contains both text and an <i> tag, so its .string is None. find() therefore compares your regex against None, and it can't match.
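A minimal sketch of that behaviour, assuming bs4 with the built-in "html.parser" (chosen here only to avoid an lxml dependency); the href "/x" is a placeholder, not from the question:

```python
import re
from bs4 import BeautifulSoup

plain = BeautifulSoup('<a href="/x">Edit</a>', 'html.parser')
mixed = BeautifulSoup('<a href="/x"><i class="fa fa-edit"></i> Edit</a>', 'html.parser')

# One child, and it is a NavigableString: .string is that string.
print(plain.a.string)                # Edit
# Two children (<i> and ' Edit'): .string is None...
print(mixed.a.string)                # None
# ...even though the text is still reachable through get_text().
print(repr(mixed.a.get_text()))      # ' Edit'

# string= is checked against .string, so only the first document matches.
pattern = re.compile('Edit')
print(plain.find('a', string=pattern))  # <a href="/x">Edit</a>
print(mixed.find('a', string=pattern))  # None
```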

How to solve this?

Maybe there is a better solution, but I would probably go with something like this:

import re
from bs4 import BeautifulSoup as BS

soup = BS("""
<a href="/customer-menu/1/accounts/1/update">
    <i class="fa fa-edit"></i> Edit
</a>
""")

links = soup.find_all('a', href="/customer-menu/1/accounts/1/update")

thelink = None
for link in links:
    # link.find() searches the descendant strings, including " Edit"
    if link.find(text=re.compile("Edit")):
        thelink = link
        break


I doubt there are many links pointing to /customer-menu/1/accounts/1/update, so it should be fast enough.
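Another option, not taken from the answer above, is to search the text nodes themselves (which do have a matching string) and then walk up to the enclosing <a> with find_parent(). A sketch assuming bs4 with "html.parser"; the helper name find_link_by_text is hypothetical:

```python
import re
from bs4 import BeautifulSoup

soup = BeautifulSoup("""
<a href="/customer-menu/1/accounts/1/update">
    <i class="fa fa-edit"></i> Edit
</a>
""", 'html.parser')


def find_link_by_text(soup, text, href):
    """Hypothetical helper: first <a href=href> that has a descendant
    text node containing `text`, or None."""
    # With no tag name, string= matches NavigableStrings such as ' Edit\n'.
    for node in soup.find_all(string=re.compile(re.escape(text))):
        # Climb from the text node to the nearest matching <a> ancestor.
        link = node.find_parent('a', href=href)
        if link is not None:
            return link
    return None


link = find_link_by_text(soup, 'Edit', '/customer-menu/1/accounts/1/update')
print(link['href'])  # /customer-menu/1/accounts/1/update
```

Searching strings first means only text nodes containing "Edit" are inspected, at the cost of one parent walk per hit.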


You can pass a function that returns True if a tag's text contains "Edit" to .find:

In [51]: def Edit_in_text(tag):
   ....:     return tag.name == 'a' and 'Edit' in tag.text
   ....:

In [52]: soup.find(Edit_in_text, href="/customer-menu/1/accounts/1/update")
Out[52]:
<a href="/customer-menu/1/accounts/1/update">
<i class="fa fa-edit"></i> Edit
</a>


You can use the .get_text() method instead of the text attribute in your function, which gives the same result:

def Edit_in_text(tag):
    return tag.name == 'a' and 'Edit' in tag.get_text()
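If the same check is needed for different tags, texts or attributes, the filter function can be produced by a small factory. The name contains_text and the second link's href are hypothetical; the sketch only relies on find()/find_all() accepting a function and on Tag.get_text(), and assumes bs4 with "html.parser":

```python
from bs4 import BeautifulSoup


def contains_text(name, text, **attrs):
    """Hypothetical factory: returns a filter accepting tags with the given
    name, exact attribute values, and `text` anywhere in get_text().
    Exact comparison suits single-valued attributes such as href; multi-valued
    ones such as class come back as lists and would not compare equal."""
    def predicate(tag):
        return (
            tag.name == name
            and all(tag.get(key) == value for key, value in attrs.items())
            and text in tag.get_text()
        )
    return predicate


soup = BeautifulSoup("""
<a href="/customer-menu/1/accounts/1/update">
    <i class="fa fa-edit"></i> Edit
</a>
<a href="/customer-menu/1/accounts/2/update">Edit</a>
""", 'html.parser')

edit_link = soup.find(contains_text('a', 'Edit', href='/customer-menu/1/accounts/1/update'))
all_edits = soup.find_all(contains_text('a', 'Edit'))
print(edit_link['href'])  # /customer-menu/1/accounts/1/update
print(len(all_edits))     # 2
```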


In one line, using a lambda:

soup.find(lambda tag: tag.name == "a" and "Edit" in tag.text)