How do I get properly escaped XML in Python etree?

Posted 2019-02-23 08:42

Question:

I'm using Python 2.7.3.

test.txt:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <test>The tag &lt;StackOverflow&gt; is good to bring up at parties.</test>
</root>

Result:

>>> import xml.etree.ElementTree as ET
>>> e = ET.parse('test.txt')
>>> root = e.getroot()
>>> print root.find('test').text
The tag <StackOverflow> is good to bring up at parties.

As you can see, the parser has converted the &lt; and &gt; entity references back into < and >.
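A quick Python 3 check, with the contents of test.txt inlined as a bytes literal instead of read from disk, shows that the decoding happens at parse time: .text already holds the literal < and > characters, and the original escaped form is not kept anywhere on the element.

```python
import xml.etree.ElementTree as ET

# The same document as test.txt, inlined as bytes so the sketch is self-contained.
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<root>
    <test>The tag &lt;StackOverflow&gt; is good to bring up at parties.</test>
</root>
"""

root = ET.fromstring(XML)
text = root.find("test").text

# The parser has already decoded &lt; and &gt;; .text holds the literal characters.
print(text)
```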

What I'd like to see:

The tag &lt;StackOverflow&gt; is good to bring up at parties.

Untouched, raw text. Sometimes I really like it raw. Uncooked.

I'd like to use this text as-is for display within HTML, so I don't want the XML parser to change it.

Do I have to re-escape each string, or is there another way?

Answer 1:

import xml.etree.ElementTree as ET
e = ET.parse('test.txt')
root = e.getroot()
# On Python 3, tostring() returns bytes unless you pass encoding='unicode'
print(ET.tostring(root.find('test')))

yields

<test>The tag &lt;StackOverflow&gt; is good to bring up at parties.</test>
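The snippet above works as-is on the asker's Python 2.7. A minimal Python 3 sketch of the same approach (again inlining test.txt as bytes) passes encoding="unicode", because ET.tostring() returns bytes by default there. tostring() also writes the element's tail text, here the newline before </root>, so the result is stripped before use.

```python
import xml.etree.ElementTree as ET

XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<root>
    <test>The tag &lt;StackOverflow&gt; is good to bring up at parties.</test>
</root>
"""

root = ET.fromstring(XML)
test = root.find("test")

# encoding="unicode" makes tostring() return str instead of bytes (Python 3).
serialized = ET.tostring(test, encoding="unicode")

# tostring() includes the element's tail (the newline before </root>); drop it.
snippet = serialized.rstrip()
print(snippet)
```"
</test>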

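ElementTree does not keep the original escaped form of the text, so you either re-serialize the element as above or re-escape the string yourself. For the latter, use xml.sax.saxutils.escape: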

import xml.sax.saxutils as saxutils
print(saxutils.escape(root.find('test').text))

yields

The tag &lt;StackOverflow&gt; is good to bring up at parties.
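By default saxutils.escape() replaces only &, < and >. The sketch below shows that default, the optional entities mapping for quotes (useful if the text ends up inside an HTML attribute value), and saxutils.unescape() undoing the default substitutions. The sample strings are made up for illustration.

```python
from xml.sax.saxutils import escape, unescape

raw = "The tag <StackOverflow> is good to bring up at parties."

# & is replaced first, so the &lt; and &gt; produced afterwards are not escaped twice.
escaped = escape(raw)
print(escaped)

# Quotes are left alone unless you pass an extra mapping, e.g. for an attribute value.
quoted = escape('say "hi" & <wave>', {'"': "&quot;"})
print(quoted)

# unescape() reverses the default &amp;, &lt; and &gt; substitutions.
print(unescape(escaped) == raw)
```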