I am trying to check whether an xml file contains the necessary xml declaration ("header"), let's say:
<?xml version="1.0" encoding="UTF-8"?>
...rest of xml file...
I am using xml ElementTree for reading and getting info out of the file, but it seems to load a file just fine even if it does not have the header.
What I tried so far is this:
import xml.etree.ElementTree as ET
tree = ET.parse(someXmlFile)
try:
xmlFile = ET.tostring(tree.getroot(), encoding='utf8').decode('utf8')
except:
sys.stderr.write("Wrong xml2 header\n")
exit(31)
if re.match(r"^\s*<\?xml version=\'1\.0\' encoding=\'utf8\'\?>\s+", xmlFile) is None:
sys.stderr.write("Wrong xml1 header\n")
exit(31)
But the ET.tostring() function just "makes up" a header if it is not present in the file.
Is there any way to check for a xml header with ET? Or somehow throw an error while loading the file with ET.parse, if a file does not contain the xml header?
tl;dr
From Wikipedia's XML declaration
...
So even if the XML declaration is omitted in an XML document, the code-snippet:
will find "the" default XML declaration in this XML document. Please note, that I have used xmlFile.decode('utf-8') instead of xmlFile. If you don't worry to use
minidom
, you can use the following code-snippet:Here is a working fiddle Int bookstore-001.xml an XML declaration ist present, in bookstore-002.xml no XML declaration ist present and in bookstore-003.xml a different XML declaration than in the first example ist present. The
print
instruction prints accordingly the version and the encoding: