I'm playing with BeautifulSoup 4 and I have this html code:
</tr>
<tr>
<td id="freistoesse">Giraffe</td>
<td>14</td>
<td>7</td>
</tr>
I want to match both values between the <td> tags, so here 14 and 7.
I tried this:
giraffe = soup.find(text='Giraffe').findNext('td').text
but this only matches 14. How can I match both values with this function?
Use find_all instead of findNext:
import bs4 as bs
content = '''\
<tr>
<td id="freistoesse">Giraffe</td>
<td>14</td>
<td>7</td>
</tr>'''
soup = bs.BeautifulSoup(content, 'html.parser')  # name the parser explicitly to avoid a warning
for td in soup.find('td', text='Giraffe').parent.find_all('td'):
    print(td.text)
yields
Giraffe
14
7
Or, you could use find_next_siblings (also known as fetchNextSiblings):
for td in soup.find(text='Giraffe').parent.find_next_siblings():
    print(td.text)
yields
14
7
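If you want the values as a list rather than printed output, the sibling approach can be wrapped in a small helper. This is only a sketch: values_after is a made-up name, and it assumes a Beautiful Soup release recent enough to accept string=, the newer spelling of the text= argument used above.

```python
from bs4 import BeautifulSoup

HTML = """\
<tr>
<td id="freistoesse">Giraffe</td>
<td>14</td>
<td>7</td>
</tr>"""


def values_after(soup, label):
    """Return the text of every <td> that follows the cell containing `label`.

    Hypothetical helper for illustration; returns [] when no cell matches.
    """
    cell = soup.find('td', string=label)
    if cell is None:
        return []
    # find_next_siblings('td') only looks at later siblings under the same parent
    return [td.get_text(strip=True) for td in cell.find_next_siblings('td')]


soup = BeautifulSoup(HTML, 'html.parser')
values = values_after(soup, 'Giraffe')
print(values)  # ['14', '7']
```

Checking for None first avoids an AttributeError when the label is not present in the document.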
Explanation:
Note that soup.find(text='Giraffe') returns a NavigableString.
In [30]: soup.find(text='Giraffe')
Out[30]: u'Giraffe'
To get the associated td tag, use
In [31]: soup.find('td', text='Giraffe')
Out[31]: <td id="freistoesse">Giraffe</td>
or
In [32]: soup.find(text='Giraffe').parent
Out[32]: <td id="freistoesse">Giraffe</td>
Once you have the td tag, you could use find_next_siblings:
In [35]: soup.find(text='Giraffe').parent.find_next_siblings()
Out[35]: [<td>14</td>, <td>7</td>]
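The distinction between the text node and its tag, and the fact that find_next_siblings stays inside the current row, may be easier to see with a second row in the markup. In the sketch below the Zebra row is invented for illustration and string= is assumed to be available. find_all_next is shown only for contrast, since it keeps walking through the rest of the document.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# The second row is hypothetical data, added only to show the difference.
HTML = """\
<table>
<tr><td>Giraffe</td><td>14</td><td>7</td></tr>
<tr><td>Zebra</td><td>3</td><td>9</td></tr>
</table>"""

soup = BeautifulSoup(HTML, 'html.parser')

text_node = soup.find(string='Giraffe')  # the text inside the cell
cell = text_node.parent                  # the enclosing <td> tag

print(isinstance(text_node, NavigableString))  # True
print(isinstance(cell, Tag))                   # True

# Siblings share the same parent <tr>, so only this row is returned.
same_row = [td.get_text() for td in cell.find_next_siblings('td')]

# find_all_next walks forward through the whole document instead.
rest_of_doc = [td.get_text() for td in cell.find_all_next('td')]

print(same_row)     # ['14', '7']
print(rest_of_doc)  # ['14', '7', 'Zebra', '3', '9']
```

For a table with several rows, the sibling-based call is usually the one you want, because it cannot spill over into the next row.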
PS. BeautifulSoup has added method names that use underscores instead of CamelCase. They do the same thing, but conform to the PEP8 style guide recommendations. Thus, prefer find_next_siblings over fetchNextSiblings.