BeautifulSoup raises AttributeError when XML tag name contains capital letters

Posted 2020-04-21 02:49

I'm trying to get all the XML attributes of the Name tag, but I get this error:

AttributeError: 'NoneType' object has no attribute 'attrs'

when I run the following code:

import BeautifulSoup as bs

xml = '''
<Product Code="1" HighPic="http://upload.wikimedia.org/wikipedia/commons/thumb/5/5f/Linksys48portswitch.jpg/220px-Linksys48portswitch.jpg" HighPicHeight="320" HighPicSize="37217" HighPicWidth="400" ID="35" Title="Demo Product">
<Category ID="23">
<Name ID="57" Value="Switches" langid="1"/>
</Category>
</Product>'''

doc = bs.BeautifulSoup(xml)
div = doc.find("Name")

for attr, val in div.attrs:
    print "%s:%s" % (attr, val)

If I change the tag "Name" to "name", it works.

Why am I getting this error when the tag name contains capital letters?

2 answers
一夜七次
#2 · 2020-04-21 03:47

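In BeautifulSoup 4, you can tell it to parse the document as XML (this uses the lxml library, so lxml must be installed):

import bs4 as bs  # BeautifulSoup 4

doc = bs.BeautifulSoup(xml, "xml")
div = doc.find("Name")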

The XML parser keeps tag names case-sensitive, so find("Name") now returns the tag instead of None.

Root(大扎)
#3 · 2020-04-21 03:51

BeautifulSoup is primarily an HTML-parsing library. It can handle XML too, but its HTML parsers lowercase all tag names, because HTML tag names are case-insensitive. Quoting the BeautifulSoup documentation:

Because HTML tags and attributes are case-insensitive, all three HTML parsers convert tag and attribute names to lowercase. That is, the markup <TAG></TAG> is converted to <tag></tag>. If you want to preserve mixed-case or uppercase tags and attributes, you’ll need to parse the document as XML.
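To see the lowercasing in isolation, here is a minimal sketch using the standard library's html.parser.HTMLParser, which is the parser behind BeautifulSoup 4's "html.parser" backend. TagRecorder is a hypothetical helper for illustration; it only records the tag and attribute names the parser reports.

```python
from html.parser import HTMLParser


class TagRecorder(HTMLParser):
    """Hypothetical helper: records (tag, attrs) pairs as the HTML parser reports them."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser has already lowercased the tag and attribute names here.
        self.seen.append((tag, attrs))

    # Self-closing tags such as <Name .../> go through handle_startendtag,
    # whose default implementation calls handle_starttag, so they are recorded too.


recorder = TagRecorder()
recorder.feed('<Category ID="23"><Name ID="57" Value="Switches" langid="1"/></Category>')
recorder.close()
print(recorder.seen)
```

Both the tag names ("category", "name") and the attribute names ("id", "value") come back lowercased, while attribute values such as "Switches" keep their case. That is why a search for "Name" finds nothing in an HTML-parsed tree.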

There is an XML mode in which tags are matched case-sensitively and are not lowercased, but it requires the lxml library to be installed. Because lxml is a C-extension library, it is not supported on Google App Engine.

Use the ElementTree API instead:

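import xml.etree.ElementTree as ET

# `xml` is the string from the question; ElementTree keeps tag names as written
root = ET.fromstring(xml)
div = root.find('.//Name')  # './/' searches all descendants of <Product>

for attr, val in div.items():
    print("%s:%s" % (attr, val))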
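A quick way to confirm that ElementTree matches tag names case-sensitively is to search for both spellings. This is a minimal sketch on a trimmed copy of the question's XML; the names doc, upper and lower are illustrative only.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's <Product> document (illustrative sample).
doc = '''<Product ID="35"><Category ID="23">
<Name ID="57" Value="Switches" langid="1"/>
</Category></Product>'''

root = ET.fromstring(doc)

# Tag matching is case-sensitive, so only the exact spelling finds the element.
upper = root.find('.//Name')
lower = root.find('.//name')
print(upper is not None, lower is None)  # True True

# Element.attrib is a plain dict of attribute name -> value, with case preserved.
print(upper.attrib)
```

Because attrib is a dict, iterating over it directly yields only the attribute names; use .items() (as in the answer above) to get name/value pairs.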