I'm using jsoup to do some XML processing. The problem is that it replaces XML entities (numeric character references, e.g. the one for ») with HTML named entities (&raquo;).
How can I keep the original XML entities?
Groovy script:
import org.jsoup.Jsoup
import org.jsoup.nodes.Document
import org.jsoup.nodes.Entities
import org.jsoup.parser.Parser
String HTML_STRING = '''
<html>
<div></div>
<div>Some text »</div>
</html>
'''
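// Parse the input with the XML parser rather than the HTML parser
Document doc = Jsoup.parse(new ByteArrayInputStream(HTML_STRING.getBytes("UTF-8")), "UTF-8", "", Parser.xmlParser())
// Serialize as UTF-8 using the base HTML entity set
doc.outputSettings().charset("UTF-8")
doc.outputSettings().escapeMode(Entities.EscapeMode.base)
println doc.toString()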
Result:
<html>
<div></div>
<div>
Some text &raquo;
</div>
</html>
If I use Entities.EscapeMode.xhtml instead, the result is:
<html>
<div></div>
<div>
Some text »
</div>
</html>
Thanks.