JAXB outputting invalid XML when data contains non-displayable characters

Posted 2020-06-17 16:29

Question:

I'm using JAXB 2.2.5 to output XML from a JAXB model. The data is populated from the database, and occasionally the database contains non-displayable characters that it should not, such as

0x1a 

If it does, JAXB outputs invalid XML by writing the character as is. Shouldn't it escape it or something?

Update

I wonder whether any implementations fix this problem. Maybe EclipseLink MOXy does?

EDIT

I tried the workaround. It fixes the illegal character issue, but it changes the output in an undesirable way, from

<?xml version="1.0" encoding="UTF-8" standalone="yes"?><metadata created="2013-02-27T11:40:04.009Z" xmlns="http://musicbrainz.org/ns/mmd-2.0#" xmlns:ext="http://musicbrainz.org/ns/ext#-2.0"><cdstub-list count="1" offset="0"><cdstub id="w237dKURKperVfmckD5b_xo8BO8-" ext:score="100"><title>fred</title><artist></artist><track-list count="5"/></cdstub></cdstub-list></metadata>

to

<?xml version="1.0" ?><metadata xmlns:ext="http://musicbrainz.org/ns/ext#-2.0" xmlns="http://musicbrainz.org/ns/mmd-2.0#" created="2013-02-27T11:39:15.394Z"><cdstub-list count="1" offset="0"><cdstub id="w237dKURKperVfmckD5b_xo8BO8-" ext:score="100"><title>fred</title><artist></artist><track-list count="5"></track-list></cdstub></cdstub-list></metadata>

i.e. <track-list count="5"/> has become <track-list count="5"></track-list>, which is undesirable. I'm not sure why it is doing this.

Answer 1:

This is apparently a common problem, and it is filed as a bug: "JAXB generates illegal XML characters".

You can find a workaround at "Escape illegal characters".



Answer 2:

Another solution is to use Apache Commons Lang to remove the invalid XML characters:

import org.apache.commons.lang3.StringEscapeUtils;

String xml = "<root>content with some invalid characters...</root>";
xml = StringEscapeUtils.unescapeXml(StringEscapeUtils.escapeXml10(xml));

The escapeXml10 method escapes the string and drops the characters that are invalid in XML 1.0. The unescapeXml method then undoes the escaping. The end result is the same XML with the invalid characters removed.
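
If pulling in Commons Lang is not an option, a similar filter can be sketched with the JDK alone. The class and method names below (XmlCharFilter, isValidXml10Char, stripInvalidXml10) are made up for illustration and are not part of JAXB or Commons Lang. The sketch keeps only code points allowed by the XML 1.0 Char production and drops everything else, so it could be applied either to field values before they go into the model or to the serialized XML string:

```java
final class XmlCharFilter {

    // True if the code point is allowed by the XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isValidXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns a copy of the input with every disallowed code point dropped.
    // Unpaired surrogates fall outside the allowed ranges and are dropped too.
    static String stripInvalidXml10(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints()
             .filter(XmlCharFilter::isValidXml10Char)
             .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // 0x1A is the character mentioned in the question.
        String dirty = "fred" + (char) 0x1A;
        System.out.println(stripInvalidXml10(dirty));
    }
}
```

Unlike the escape/unescape round trip, this never touches markup characters such as < or &, so it does not depend on the two Commons Lang methods being exact inverses of each other.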



Answer 3:

Simply replace the illegal characters in the message content with another character or a space. If you don't want to use an extra jar or third-party library, you can try the method below:

String msgContent = "......"; // string with some illegal characters
msgContent = msgContent.replaceAll("\\P{Print}", "_");

In this example, the replaceAll method replaces every unprintable character with an underscore, so msgContent contains only printable characters and JAXB has no illegal characters left to output.
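
One caveat worth checking before using this: in java.util.regex, \p{Print} is a POSIX class that covers only printable US-ASCII unless the UNICODE_CHARACTER_CLASS flag is set, so \P{Print} also matches tabs, line breaks, and every non-ASCII character. The sketch below compares it with a narrower pattern that matches only what XML 1.0 disallows. The class and method names (IllegalXmlCharReplacer, replaceNonPrintable, replaceIllegalXml10) are hypothetical and exist only for this comparison:

```java
import java.util.regex.Pattern;

final class IllegalXmlCharReplacer {

    // Broad pattern: everything that is not printable ASCII.
    private static final Pattern NON_PRINTABLE = Pattern.compile("\\P{Print}");

    // Narrow pattern: everything outside the XML 1.0 "Char" production.
    private static final Pattern ILLEGAL_XML10 = Pattern.compile(
            "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    // Replaces each non-printable-ASCII character with an underscore.
    static String replaceNonPrintable(String input) {
        return NON_PRINTABLE.matcher(input).replaceAll("_");
    }

    // Replaces only characters that XML 1.0 does not allow.
    static String replaceIllegalXml10(String input) {
        return ILLEGAL_XML10.matcher(input).replaceAll("_");
    }

    public static void main(String[] args) {
        // Contains an accented letter, a tab, and the 0x1A control character.
        String msgContent = "caf\u00e9\tx" + (char) 0x1A;
        System.out.println(replaceNonPrintable(msgContent));
        System.out.println(replaceIllegalXml10(msgContent));
    }
}
```

With the narrow pattern, accented letters and whitespace survive and only the control character is replaced, which may matter if the database holds non-English text.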