How to get tbody from a table with Python BeautifulSoup

Posted 2020-02-05 10:40


I'm trying to scrape Year & Winners (first & second columns) from the "List of finals matches" table (the second table) on the linked page. I'm using the code below:

import urllib2
from BeautifulSoup import BeautifulSoup

url = ""
soup = BeautifulSoup(urllib2.urlopen(url).read())
for row in soup.findAll('table')[0].tbody.findAll('tr'):
    first_column = row.findAll('th')[0].contents
    third_column = row.findAll('td')[2].contents
    print first_column, third_column

With the above code, I was able to get the first & third columns just fine. But when I use the same code with a different page, it cannot find tbody as an element, even though I can see the tbody when I inspect the element.

url = ""
soup = BeautifulSoup(urllib2.urlopen(url).read())

print soup.findAll('table')[2]

for row in soup.findAll('table')[0].tbody.findAll('tr'):
    first_column = row.findAll('th')[0].contents
    third_column = row.findAll('td')[2].contents
    print first_column, third_column

Here's the error I got:

AttributeError                            Traceback (most recent call last)
<ipython-input-150-fedd08c6da16> in <module>()
      7 # print soup.findAll('table')[2]
----> 9 soup.findAll('table')[2].tbody.findAll('tr')
     10 for row in soup.findAll('table')[0].tbody.findAll('tr'):
     11     first_column = row.findAll('th')[0].contents

AttributeError: 'NoneType' object has no attribute 'findAll'



If you are inspecting the page with the browser's inspect tool, the browser inserts tbody tags into the DOM it shows you.

The raw HTML source may or may not contain them. If you really want to know, look at the page's source view instead.
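A minimal sketch of why the traceback happens, assuming BeautifulSoup 4 (`bs4`) with Python's built-in `html.parser`; the HTML string is a made-up stand-in for page source that has no `<tbody>`. The parser keeps the markup as written, so `.tbody` returns `None`, and calling `findAll` on `None` raises `'NoneType' object has no attribute 'findAll'`. Other parsers (for example a spec-following one such as html5lib) may build the tree differently.

```python
from bs4 import BeautifulSoup

# Hypothetical page source (placeholder data): no <tbody>, as a server may send it.
raw_html = """
<table>
  <tr><th>Year</th><td>Winner</td></tr>
  <tr><th>1900</th><td>Team A</td></tr>
</table>
"""

soup = BeautifulSoup(raw_html, "html.parser")
table = soup.find("table")

# html.parser does not invent a <tbody>, unlike the DOM a browser shows you.
print(table.tbody)  # None

# Guard instead of chaining .tbody.find_all(...) directly.
body = table.find("tbody")
rows = body.find_all("tr") if body is not None else table.find_all("tr")
print(len(rows))  # 2
```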

Either way, you do not need to go through the tbody. Simply use:

soup.findAll('table')[0].findAll('tr')

and it should work.
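A small sketch of why skipping the tbody is safe, assuming `bs4` and `html.parser` with two hypothetical tables (one with an explicit `<tbody>`, one without). `find_all` (the bs4 name for BS3's `findAll`) searches all descendants by default, so it finds the rows through a `<tbody>` when one exists and directly under `<table>` when it doesn't.

```python
from bs4 import BeautifulSoup

# Two hypothetical tables: one with an explicit <tbody>, one without.
html = """
<table id="with"><tbody><tr><td>a</td></tr><tr><td>b</td></tr></tbody></table>
<table id="without"><tr><td>a</td></tr><tr><td>b</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all is recursive by default, so it reaches <tr> with or without <tbody>.
counts = {}
for table in soup.find_all("table"):
    counts[table["id"]] = len(table.find_all("tr"))

print(counts)  # {'with': 2, 'without': 2}
```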


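import urllib2
from BeautifulSoup import BeautifulSoup

url = ""
soup = BeautifulSoup(urllib2.urlopen(url).read())
for tr in soup.findAll('table')[2].findAll('tr'):
    # get data from this row's header and data cells
    cells = tr.findAll(['th', 'td'])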

Then pick out what you need from each row. :)
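A sketch of pulling Year and Winner out of each row, assuming `bs4` and a hypothetical table layout where each data row starts with a `<th>` for the year followed by `<td>` cells, and the winner is the first `<td>`. The real page's layout and table index may differ, and the team names are placeholders.

```python
from bs4 import BeautifulSoup

# Hypothetical layout (an assumption, not the real page).
html = """
<table>
  <tr><th>Year</th><th>Winner</th><th>Runner-up</th></tr>
  <tr><th>1900</th><td>Team A</td><td>Team B</td></tr>
  <tr><th>1901</th><td>Team C</td><td>Team D</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find_all("table")[0]  # on the real page this index may differ

results = []
for tr in table.find_all("tr"):
    th = tr.find("th")
    tds = tr.find_all("td")
    if th is None or not tds:
        continue  # skip the header row (it has no <td>) and malformed rows
    year = th.get_text(strip=True)
    winner = tds[0].get_text(strip=True)
    results.append((year, winner))

print(results)  # [('1900', 'Team A'), ('1901', 'Team C')]
```

Checking for missing cells before indexing avoids the `IndexError` that `row.findAll('td')[2]` would raise on the header row.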


You can get the rows directly with the code below:

tr_elements = soup.find_all('table')[2].find_all('tr')

This gives you all the <tr> elements. You then iterate over them with a for loop (there are other ways to iterate too). Don't try to find the tbody: the browser adds it by default, so it may not exist in the HTML you downloaded.
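A sketch of the two iteration styles, assuming `bs4` and three hypothetical tables so that index `[2]` matches the line above. The for loop prints each row's cell texts; the nested list comprehension builds the same data in one expression.

```python
from bs4 import BeautifulSoup

# Hypothetical page with three tables; the third is the one we want.
html = """
<table><tr><td>x</td></tr></table>
<table><tr><td>y</td></tr></table>
<table>
  <tr><td>1</td><td>2</td></tr>
  <tr><td>3</td><td>4</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
tr_elements = soup.find_all("table")[2].find_all("tr")

# Option 1: a plain for loop over the <tr> tags
for tr in tr_elements:
    print([td.get_text(strip=True) for td in tr.find_all("td")])

# Option 2: a nested list comprehension that builds all rows at once
rows = [[td.get_text(strip=True) for td in tr.find_all("td")] for tr in tr_elements]
print(rows)  # [['1', '2'], ['3', '4']]
```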


If you have trouble reaching the desired tag, remove the tags that come before it with the .decompose() method.
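A minimal sketch of the `.decompose()` approach, assuming `bs4` and a hypothetical page where two unwanted tables precede the one you want (the `id` values are made up). `decompose()` removes a tag and its contents from the tree, so afterwards `soup.find("table")` returns the target table.

```python
from bs4 import BeautifulSoup

# Hypothetical page: two unwanted tables before the target one.
html = """
<table id="nav"><tr><td>menu</td></tr></table>
<table id="summary"><tr><td>info</td></tr></table>
<table id="finals"><tr><th>1900</th><td>Team A</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns a plain list, so removing tags while looping over it is safe.
for table in soup.find_all("table")[:2]:
    table.decompose()

target = soup.find("table")
print(target["id"])  # finals
```

Indexing into `find_all("table")` as in the answers above is usually simpler; decomposing helps mainly when later code expects the target to be the first match.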