My file contains the following data:
Original:
<?xml version="1.0" encoding="UTF-8"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> <url> <changefreq>daily</changefreq> <loc>http://www.example.com</loc></url></urlset>
Expected:
<?xml version="1.0" encoding="UTF-8"?><urlset> <url> <changefreq>daily</changefreq> <loc>http://www.example.com</loc></url></urlset>
I use etree to parse the file, and I want to remove the attribute from the root element 'urlset':
import xml.etree.ElementTree as ET
tree = ET.parse("/Users/hsyang/Downloads/VI-0-11-14-2016_20.xml")
root = tree.getroot()
print root.attrib
>> {}
root.attrib.pop("xmlns", None)
print root.attrib
>> {}
ET.tostring(root)
I thought I was supposed to get {xmlns:"http://www.sitemaps.org/schemas/sitemap/0.9"} when I print root.attrib the first time, but I got an empty dictionary. Can someone help?
Appreciate it!
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
looks like a regular attribute but it is a special case, namely a namespace declaration.
Removing, adding, or modifying namespaces can be quite hard. "Normal" attributes are stored in an element's writable attrib property. Namespace mappings, on the other hand, are not readily available via the API (in the lxml library, elements do have an nsmap property, but it is read-only).
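To make this concrete, here is a small sketch (using an inline copy of the question's XML rather than the file) that shows where the namespace ends up after parsing with the standard library. The variable name SITEMAP_XML is just for illustration.

```python
import xml.etree.ElementTree as ET

# Inline copy of the document from the question (illustrative).
SITEMAP_XML = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><changefreq>daily</changefreq>"
    "<loc>http://www.example.com</loc></url></urlset>"
)

root = ET.fromstring(SITEMAP_XML)

# The xmlns declaration is not stored as an attribute...
print(root.attrib)  # {}

# ...it is folded into the tag name in "{uri}localname" form.
print(root.tag)  # {http://www.sitemaps.org/schemas/sitemap/0.9}urlset
```

This is why root.attrib.pop("xmlns", None) has nothing to remove.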
I suggest a simple textual search-and-replace operation, similar to the answer to Modify namespaces in a given xml document with lxml. Something like this:
with open("input.xml", "r") as infile, open("output.xml", "w") as outfile:
    data = infile.read()
    data = data.replace(' xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"', '')
    outfile.write(data)
See also How to insert namespace and prefixes into an XML string with Python?.
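If a textual replace feels too brittle, one commonly used alternative is to parse the document and strip the "{uri}" prefix from every tag before serializing. The following is only a sketch under that assumption, not part of the approach above: the helper name strip_namespaces is hypothetical, and it handles element tags only, not namespaced attributes.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(root):
    """Remove the '{uri}' prefix from every element tag, in place."""
    for elem in root.iter():
        # Skip anything whose tag is not a plain string (defensive check).
        if isinstance(elem.tag, str) and elem.tag.startswith("{"):
            elem.tag = elem.tag.split("}", 1)[1]
    return root


# Inline copy of the document from the question (illustrative).
xml_text = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><changefreq>daily</changefreq>"
    "<loc>http://www.example.com</loc></url></urlset>"
)

root = strip_namespaces(ET.fromstring(xml_text))
result = ET.tostring(root, encoding="unicode")
print(result)
# <urlset><url><changefreq>daily</changefreq><loc>http://www.example.com</loc></url></urlset>
```

Because no tag carries a namespace any more, the serializer has no xmlns declaration to write.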
The standard library's xml.etree.ElementTree has no dedicated method for removing an attribute, but all of an element's ordinary attributes are stored in attrib, which is a dict, so any attribute can be removed from attrib like a key from a dict:
import xml.etree.ElementTree as ET
tree = ET.parse(file_path)
root = tree.getroot()
print(root.attrib) # {'xyz': '123'}
root.attrib.pop("xyz", None) # None is to not raise an exception if xyz does not exist
print(root.attrib) # {}
ET.tostring(root)
'<urlset> <url> <changefreq>daily</changefreq> <loc>http://www.example.com</loc></url></urlset>'