Is there a way to ignore the XML namespace in tag names in elementtree.ElementTree?
I am trying to print all technicalContact tags:
for item in root.getiterator(tag='{http://www.example.com}technicalContact'):
    print item.tag, item.text
And I get something like:
{http://www.example.com}technicalContact blah@example.com
But what I really want is:
technicalContact blah@example.com
Is there a way to display only the suffix (sans xmlns), or better - iterate over the elements without explicitly stating xmlns?
You can define a generator to recursively search through your element tree in order to find tags which end with the appropriate tag name. For example, something like this:
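def get_element_by_tag(element, tag):
    # Yield this element if its tag ends with the requested name
    if element.tag.endswith(tag):
        yield element
    # Recurse into the children and pass their matches through
    for child in element:
        for g in get_element_by_tag(child, tag):
            yield g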
This just checks for tags which end with tag, i.e. it ignores any leading namespace (a suffix match will also accept any longer tag name that ends with the same text). You can then iterate over any tag you want as follows:
for item in get_element_by_tag(elementtree, 'technicalContact'):
    ...
This generator in action:
>>> xml_str = """<root xmlns="http://www.example.com">
... <technicalContact>Test1</technicalContact>
... <technicalContact>Test2</technicalContact>
... </root>
... """
>>> import xml.etree.ElementTree as etree
>>> xml_etree = etree.fromstring(xml_str)
>>> for item in get_element_by_tag(xml_etree, 'technicalContact'):
...     print item.tag, item.text
...
{http://www.example.com}technicalContact Test1
{http://www.example.com}technicalContact Test2
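For anyone on Python 3, here is a rough sketch of the same idea that compares the local name exactly instead of using a suffix match, so a tag like nontechnicalContact is not picked up by accident. The name iter_by_local_name and the extra nontechnicalContact element are made up for illustration. The last lines assume Python 3.8 or newer, where ElementTree path expressions accept a {*} namespace wildcard.

```python
import xml.etree.ElementTree as ET


def iter_by_local_name(element, name):
    """Yield element and every descendant whose local tag name equals name."""
    for node in element.iter():
        # Everything after the last "}" is the local name; a tag without
        # a namespace has no "}" and is returned unchanged by rpartition.
        if isinstance(node.tag, str) and node.tag.rpartition("}")[2] == name:
            yield node


# Sample document; nontechnicalContact is a hypothetical element added
# only to show the difference from a plain endswith() check.
XML = """<root xmlns="http://www.example.com">
  <technicalContact>Test1</technicalContact>
  <technicalContact>Test2</technicalContact>
  <nontechnicalContact>Test3</nontechnicalContact>
</root>"""

root = ET.fromstring(XML)

for node in iter_by_local_name(root, "technicalContact"):
    print(node.tag.rpartition("}")[2], node.text)
# technicalContact Test1
# technicalContact Test2

# Assumes Python 3.8+: "{*}" matches the name in any (or no) namespace.
wildcard_texts = [e.text for e in root.findall(".//{*}technicalContact")]
print(wildcard_texts)
# ['Test1', 'Test2']
```

If you only need the lookup and are on a recent Python, the findall line alone may be enough. The generator is there for older versions or when you want more control over the matching.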
I always end up using something like:
item.tag.split("}")[1]
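If you want plain tag names everywhere, one option is to apply that split once to the whole tree. The sketch below is in Python 3 and strip_namespaces is a made-up helper name, not part of ElementTree. It uses rpartition rather than split("}")[1] because the latter raises IndexError on a tag that has no namespace. Two assumptions to keep in mind: attribute names are left untouched, and elements from different namespaces that share a local name become indistinguishable afterwards.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(root):
    """Rewrite every element tag in place so only the local name remains."""
    for node in root.iter():
        if isinstance(node.tag, str):
            # "{ns}name" -> "name"; a tag without "}" is left as it is.
            node.tag = node.tag.rpartition("}")[2]
    return root


doc = ET.fromstring(
    '<root xmlns="http://www.example.com">'
    "<technicalContact>blah@example.com</technicalContact>"
    "</root>"
)
strip_namespaces(doc)

# After stripping, the plain name works as a filter.
for item in doc.iter("technicalContact"):
    print(item.tag, item.text)
# technicalContact blah@example.com
```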