How to find elements by class

2019-01-01 11:57发布

I'm having trouble parsing html elements with "class" attribute using Beautifulsoup. The code looks like this

soup = BeautifulSoup(sdata)
mydivs = soup.findAll('div')
for div in mydivs: 
    if (div["class"]=="stylelistrow"):
        print div

I get an error on the same line "after" the script finishes.

File "./beautifulcoding.py", line 130, in getlanguage
  if (div["class"]=="stylelistrow"):
File "/usr/local/lib/python2.6/dist-packages/BeautifulSoup.py", line 599, in __getitem__
   return self._getAttrMap()[key]
KeyError: 'class'

How do I get rid or this error?

9条回答
情到深处是孤独
2楼-- · 2019-01-01 12:33

How to find elements by class

I'm having trouble parsing html elements with "class" attribute using Beautifulsoup.

You can easily find by one class, but if you want to find by the intersection of two classes, it's a little more difficult,

From the documentation (emphasis added):

If you want to search for tags that match two or more CSS classes, you should use a CSS selector:

css_soup.select("p.strikeout.body")
# [<p class="body strikeout"></p>]

To be clear, this selects only the p tags that are both strikeout and body class.

To find for the intersection of any in a set of classes (not the intersection, but the union), you can give a list to the class_ keyword argument (as of 4.1.2):

soup = BeautifulSoup(sdata)
class_list = ["stylelistrow"] # can add any other classes to this list.
# will find any divs with any names in class_list:
mydivs = soup.find_all('div', class_=class_list) 

Also note that findAll has been renamed from the camelCase to the more Pythonic find_all.

查看更多
何处买醉
3楼-- · 2019-01-01 12:48

Update: 2016 In the latest version of beautifulsoup, the method 'findAll' has been renamed to 'find_all'. Link to official documentation

List of method names changed

Hence the answer will be

soup.find_all("html_element", class_="your_class_name")
查看更多
还给你的自由
4楼-- · 2019-01-01 12:48

Specific to BeautifulSoup 3:

soup.findAll('div',
             {'class': lambda x: x 
                       and 'stylelistrow' in x.split()
             }
            )

Will find all of these:

<div class="stylelistrow">
<div class="stylelistrow button">
<div class="button stylelistrow">
查看更多
登录 后发表回答