BeautifulSoup 4, findNext() function

Posted 2020-06-19 06:03

Question:

I'm playing with BeautifulSoup 4 and I have this html code:

</tr>
          <tr>
<td id="freistoesse">Giraffe</td>
<td>14</td>
<td>7</td>
</tr>

I want to match both values inside the <td> tags that follow, here 14 and 7.

I tried this:

giraffe = soup.find(text='Giraffe').findNext('td').text

but this only matches 14. How can I match both values with this function?

Answer 1:

Use find_all instead of findNext:

import bs4 as bs
content = '''\
<tr>
<td id="freistoesse">Giraffe</td>
<td>14</td>
<td>7</td>
</tr>'''
soup = bs.BeautifulSoup(content, 'html.parser')  # name the parser explicitly to avoid a warning

for td in soup.find('td', text='Giraffe').parent.find_all('td'):
    print(td.text)

yields

Giraffe
14
7

Or, you could use find_next_siblings (also known as fetchNextSiblings):

for td in soup.find(text='Giraffe').parent.find_next_siblings():
    print(td.text)

yields

14
7
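
If the goal is to collect the numbers rather than print them, the same sibling lookup can feed a list comprehension. What follows is only a minimal sketch: it assumes every cell after the label holds an integer, and the helper name values_after is made up for this illustration (it is not part of BeautifulSoup).

```python
from bs4 import BeautifulSoup

content = '''\
<tr>
<td id="freistoesse">Giraffe</td>
<td>14</td>
<td>7</td>
</tr>'''


def values_after(soup, label):
    """Return the integers in the <td> cells that follow the cell whose text is `label`.

    Hypothetical helper for illustration; assumes each following cell holds an integer.
    """
    # Find the <td> whose single child is the string `label`.
    label_cell = soup.find(lambda tag: tag.name == 'td' and tag.string == label)
    if label_cell is None:
        return []
    # Restrict to <td> siblings so any other element in the row is ignored.
    return [int(td.get_text(strip=True)) for td in label_cell.find_next_siblings('td')]


soup = BeautifulSoup(content, 'html.parser')
giraffe = values_after(soup, 'Giraffe')
print(giraffe)  # [14, 7]
```

Returning an empty list when the label is missing avoids the AttributeError that a chained call on None would raise.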

Explanation:

Note that soup.find(text='Giraffe') returns a NavigableString.

In [30]: soup.find(text='Giraffe')
Out[30]: u'Giraffe'

To get the associated td tag, use

In [31]: soup.find('td', text='Giraffe')
Out[31]: <td id="freistoesse">Giraffe</td>

or

In [32]: soup.find(text='Giraffe').parent
Out[32]: <td id="freistoesse">Giraffe</td>

Once you have the td tag, you could use find_next_siblings:

In [35]: soup.find(text='Giraffe').parent.find_next_siblings()
Out[35]: [<td>14</td>, <td>7</td>]
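
To tie this back to the original attempt: findNext (find_next in the underscore naming) stops at the first match, which is why only 14 came back. The sketch below contrasts it with find_next_siblings, reusing the row from the question. The variable names are illustrative only, and string= is the newer name (Beautiful Soup 4.4.0 and later) for the text= argument used above.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

content = '''\
<tr>
<td id="freistoesse">Giraffe</td>
<td>14</td>
<td>7</td>
</tr>'''
soup = BeautifulSoup(content, 'html.parser')

# Searching by string alone returns the text node, not the enclosing tag.
text_node = soup.find(string='Giraffe')
print(isinstance(text_node, NavigableString))  # True

# .parent steps up from the text node to the <td> that contains it.
label_cell = text_node.parent
print(isinstance(label_cell, Tag), label_cell.name)  # True td

# find_next returns only the first matching element after the starting point.
first = label_cell.find_next('td')
print(first.text)  # 14

# find_next_siblings returns every matching sibling that follows.
rest = [td.text for td in label_cell.find_next_siblings('td')]
print(rest)  # ['14', '7']
```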

PS. BeautifulSoup 4 added method names that use underscores instead of camelCase. They do the same thing, but conform to the PEP 8 style guide recommendations. Thus, prefer find_next_siblings over fetchNextSiblings.
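
A quick way to check that the two spellings are interchangeable is to call both on the same tag and compare the results. This is a small sketch using findNext from the question and its underscore counterpart find_next. The warnings filter is a precaution, on the assumption that newer releases may flag the camelCase alias as deprecated.

```python
import warnings

from bs4 import BeautifulSoup

content = '''\
<tr>
<td id="freistoesse">Giraffe</td>
<td>14</td>
<td>7</td>
</tr>'''
soup = BeautifulSoup(content, 'html.parser')
cell = soup.find('td', id='freistoesse')

with warnings.catch_warnings():
    # Newer releases may emit a DeprecationWarning for the camelCase alias.
    warnings.simplefilter('ignore')
    old_style = cell.findNext('td')

new_style = cell.find_next('td')

# Both calls walk the same tree, so they return the very same node.
same_node = old_style is new_style
print(same_node)  # True
```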