Python lxml iterfind w/ namespace but prefix=None

Posted 2020-06-18 09:02

Question:

I want to call iterfind() for elements that are in a namespace but have no prefix. Ideally I'd call

iterfind([tagname]) or iterfind([tagname], [namespace dict])

I'd rather not have to write the tag like this every time:

"{%s}tagname" % tree.nsmap[None]

Details

I'm iterating over an XML response from a Google API. The root node declares several namespaces, including one with no prefix: xmlns="http://www.w3.org/2005/Atom"

When I search my etree, everything behaves as I would expect for elements that have a prefix, e.g.:

>>> for x in root.iterfind('dxp:segment'): print x
...
<Element {http://schemas.google.com/analytics/2009}segment at 0x1211b98>
<Element {http://schemas.google.com/analytics/2009}segment at 0x1211d78>
<Element {http://schemas.google.com/analytics/2009}segment at 0x1211a08>
>>>

But when I search for an element without a prefix, the search doesn't automatically apply the namespace from root.nsmap[None], e.g.:

>>> for x in root.iterfind('entry'): print x
...
>>>

Even if I pass the namespace map as the optional argument to iterfind, it won't attach the namespace.

Answer 1:

Try this:

for x in root.iterfind('{http://www.w3.org/2005/Atom}entry'):
    print(x)

For more information, see the namespaces section of the lxml tutorial: http://lxml.de/tutorial.html#namespaces
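As a minimal sketch of the curly-bracket form, a small helper can build the qualified name so the "{%s}tagname" formatting is not repeated at every call site. The helper name atom(), the constant ATOM_NS, and the sample feed are made up for illustration. The import falls back to the standard library's ElementTree (an assumption for environments without lxml), which accepts the same calls used here.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: without lxml, the stdlib API is close enough for this sketch.
    import xml.etree.ElementTree as etree

ATOM_NS = "http://www.w3.org/2005/Atom"


def atom(tag):
    """Return tag qualified with the Atom namespace (hypothetical helper)."""
    return "{%s}%s" % (ATOM_NS, tag)


# Made-up sample document whose default namespace has no prefix.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>first</title></entry>
  <entry><title>second</title></entry>
</feed>"""

root = etree.fromstring(FEED)

# Qualified names match the elements in the default namespace.
titles = [entry.find(atom("title")).text for entry in root.iterfind(atom("entry"))]
print(titles)

# The bare tag name matches nothing, as described in the question.
unqualified = list(root.iterfind("entry"))
print(unqualified)
```

The URI still has to be spelled out once, in ATOM_NS.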

If you would rather not type the full URI each time and want to pass a namespace map instead, you always have to use a prefix in the path, for example:

nsmap = {'atom': 'http://www.w3.org/2005/Atom'}
for x in root.iterfind('atom:entry', namespaces=nsmap):
    print(x)

(The same applies if you want to use xpath.)
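A self-contained version of the prefix-map approach might look like the sketch below. The sample feed and the attribute values are invented, and the prefixes "atom" and "dxp" in NSMAP are chosen by the caller. The import falls back to the standard library's ElementTree (an assumption for environments without lxml), whose iterfind() takes the same namespaces argument.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: without lxml, the stdlib API is close enough for this sketch.
    import xml.etree.ElementTree as etree

# Caller-chosen prefixes mapped to the namespace URIs from the question.
NSMAP = {
    "atom": "http://www.w3.org/2005/Atom",
    "dxp": "http://schemas.google.com/analytics/2009",
}

# Made-up sample: Atom is the default namespace, dxp has a prefix.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:dxp="http://schemas.google.com/analytics/2009">
  <dxp:segment name="sample"/>
  <entry><title>first</title></entry>
  <entry><title>second</title></entry>
</feed>"""

root = etree.fromstring(FEED)

# Every step of the path carries a prefix from NSMAP.
entry_titles = [
    title.text
    for title in root.iterfind("atom:entry/atom:title", namespaces=NSMAP)
]
segment_names = [
    segment.get("name")
    for segment in root.iterfind("dxp:segment", namespaces=NSMAP)
]

print(entry_titles)
print(segment_names)
```

Child steps such as atom:title need the prefix as well, because descendants of the root inherit the default namespace.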

Which prefix the document itself uses, if any, does not matter. What matters is that you specify the fully qualified name of the element, either by writing it out with the URI in curly-bracket notation or by using a prefix that you have mapped to that URI.
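To illustrate that last point, the sketch below runs one query against two made-up documents that put the same elements in the Atom namespace, one through a default namespace and one through a prefix "a". The query prefix "anything" is deliberately different from both. The import falls back to the standard library's ElementTree (an assumption for environments without lxml).

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: without lxml, the stdlib API is close enough for this sketch.
    import xml.etree.ElementTree as etree

ATOM_NS = "http://www.w3.org/2005/Atom"

# Same namespace, declared two different ways (made-up documents).
DEFAULT_NS_DOC = '<feed xmlns="http://www.w3.org/2005/Atom"><entry/></feed>'
PREFIXED_DOC = '<a:feed xmlns:a="http://www.w3.org/2005/Atom"><a:entry/></a:feed>'

# The prefix used for searching is independent of the documents' prefixes.
NSMAP = {"anything": ATOM_NS}

found_tags = []
for doc in (DEFAULT_NS_DOC, PREFIXED_DOC):
    root = etree.fromstring(doc)
    matches = list(root.iterfind("anything:entry", namespaces=NSMAP))
    found_tags.append([element.tag for element in matches])

# Both documents yield one match with the same fully qualified tag.
print(found_tags)
```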