I have an XML file that contains non-standard characters (such as an unusual "quote" character).
I read the XML using UTF-8, ISO-8859-1, and ASCII in turn, and unmarshalled it:
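    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.StringReader;
    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.JAXBException;
    import javax.xml.bind.Unmarshaller;
    import org.xml.sax.InputSource;
    // Read the response body, decoding it as ISO-8859-1
    // (readLine() strips the line terminators)
    StringBuilder sb = new StringBuilder();
    try (BufferedReader br = new BufferedReader(new InputStreamReader(
            conn.getInputStream(), "ISO-8859-1"))) {
        String output;
        while ((output = br.readLine()) != null) {
            sb.append(output); // fetch XML
        }
    }
    try {
        jc = JAXBContext.newInstance(ServiceResponse.class);
        Unmarshaller unmarshaller = jc.createUnmarshaller();
        ServiceResponse OWrsp = (ServiceResponse) unmarshaller
                .unmarshal(new InputSource(new StringReader(sb.toString())));
    } catch (JAXBException e) {
        e.printStackTrace();
    }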
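A quick way to see why the choice of charset matters: if the server actually sends UTF-8 (an assumption, since the post does not say which encoding the feed uses), decoding its bytes as ISO-8859-1 turns each multi-byte character into several Latin-1 characters. The class name `DecodeDemo` is made up for this standard-library-only sketch. Separately, passing `conn.getInputStream()` straight to `unmarshaller.unmarshal(...)` instead of a pre-decoded String would let the XML parser take the encoding from the document's own XML declaration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Decodes the same bytes with a given charset, to compare the results.
    public static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        // "10–11" encoded as UTF-8: the en dash U+2013 becomes three bytes
        byte[] utf8 = "10\u201311".getBytes(StandardCharsets.UTF_8);
        String right = decode(utf8, StandardCharsets.UTF_8);
        String wrong = decode(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(right.length()); // 5
        System.out.println(wrong.length()); // 7: the dash became 3 Latin-1 chars
    }
}
```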
I have an Oracle function that takes ISO-8859-1 codes and converts/maps them to "literal" symbol names, e.g. "’" => "left single quote".
Unmarshalling with JAXB using ISO-8859-1 displays the characters with the ISO conversion fine, i.e. all the unusual single quotes are encoded to "’".
So suppose my string is: class of 10–11‐year‐olds (note the unusual hyphen between "11" and "year").
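To check which characters in that string actually fall outside ISO-8859-1, one can list their code points. ISO-8859-1 covers only U+0000 to U+00FF, so characters such as ’ (U+2019) cannot be represented in it at all. The sketch below assumes the dashes in the example are U+2013 (EN DASH) and U+2010 (HYPHEN); the post does not say which code points they are. `CodePointInspector` and `nonLatin1` are hypothetical names.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class CodePointInspector {
    // Returns a "U+XXXX" label for every code point ISO-8859-1 cannot encode.
    public static List<String> nonLatin1(String s) {
        CharsetEncoder latin1 = StandardCharsets.ISO_8859_1.newEncoder();
        List<String> result = new ArrayList<>();
        s.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            if (!latin1.canEncode(ch)) {
                result.add(String.format("U+%04X", cp));
            }
        });
        return result;
    }

    public static void main(String[] args) {
        // Assumed code points: U+2013 en dash, U+2010 hyphen
        String s = "class of 10\u201311\u2010year\u2010olds";
        System.out.println(nonLatin1(s)); // [U+2013, U+2010, U+2010]
    }
}
```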
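    jc = JAXBContext.newInstance(ScienceProductBuilderInfoType.class);
    Marshaller m = jc.createMarshaller();
    m.setProperty(Marshaller.JAXB_ENCODING, "ISO-8859-1");
    // save to a temp file; "info" is a placeholder name for the
    // ScienceProductBuilderInfoType object being written
    File file2 = new File("tmp.xml");
    m.marshal(info, file2);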
This saves the following in the file:
class of 10–11‐year‐olds (which is what I want, so saving to the file works!)
[Side note: I have read the file using Java's FileReader, and it outputs the above string fine.]
The issue is that the String produced by the JAXB unmarshaller has strange output; for some reason I cannot get the string to represent –.
When I check:
1. the unmarshalled XML output, I get:
   class of 10?11?year?olds
2. the file output, I get:
   class of 10–11‐year‐olds
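One plausible reading of the `?` characters, offered as an assumption rather than a diagnosis: the unmarshalled String may already hold the correct dashes, and the `?` only appears when it is written out through ISO-8859-1 (or another Latin-only encoding), because `String.getBytes` substitutes `?` for characters the charset cannot map. An XML serializer in the same position typically writes such characters as numeric character references like `&#8211;` instead, which would explain why the file looks right. `QuestionMarkDemo` is a made-up name.

```java
import java.nio.charset.StandardCharsets;

public class QuestionMarkDemo {
    // Round-trips a string through ISO-8859-1 bytes, as a Latin-1 console or
    // writer would: each unmappable character is replaced with '?'.
    public static String throughLatin1(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String s = "class of 10\u201311\u2010year\u2010olds";
        System.out.println(throughLatin1(s)); // class of 10?11?year?olds
    }
}
```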
I even tried reading the saved XML file and then unmarshalling that, in the hope of getting the – into my string:
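    // Read the saved file back (FileReader uses the platform default charset)
    StringBuilder sb = new StringBuilder();
    try (BufferedReader br = new BufferedReader(new FileReader("tmp.xml"))) {
        String sCurrentLine;
        while ((sCurrentLine = br.readLine()) != null) {
            sb.append(sCurrentLine);
        }
    }
    // unmarshal the file contents from a String
    Unmarshaller unm = jc.createUnmarshaller();
    ScienceProductBuilderInfoType rsp = (ScienceProductBuilderInfoType) unm
            .unmarshal(new InputSource(new StringReader(sb.toString())));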
To no avail.
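Reading the saved file back cannot keep an escape like `&#8211;` in the String: any conforming XML parser, which JAXB unmarshalling sits on top of, replaces a numeric character reference with the character it denotes. The sketch below shows this with the JDK's DOM parser rather than JAXB, so it runs without JAXB on the classpath; `CharRefDemo` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class CharRefDemo {
    // Parses a small XML document and returns the root element's text content.
    public static String rootText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><t>10&#8211;11</t>";
        String text = rootText(xml);
        System.out.println(text.equals("10\u201311")); // true: reference resolved
        System.out.println(text.contains("&#"));      // false: escape is gone
    }
}
```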
Any ideas how to get the ISO-8859-1-encoded character out of JAXB?
Solved: using this tidbit of code found on Stack Overflow:
    HtmlEncoder.escapeNonLatin(MYSTRING)
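The call above presumably turns non-Latin characters back into escapes after unmarshalling. A rough standard-library equivalent, assuming the goal is to replace every code point above U+00FF with a decimal numeric character reference, might look like this. `NonLatinEscaper.escapeNonLatin1` is a hypothetical name, not the `HtmlEncoder` class used above, whose exact behavior (which characters it escapes, decimal vs. hex references) may differ.

```java
public class NonLatinEscaper {
    // Hypothetical stand-in: replaces each code point outside ISO-8859-1
    // (above U+00FF) with a decimal reference such as &#8211;.
    public static String escapeNonLatin1(String s) {
        StringBuilder out = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            if (cp > 0xFF) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "class of 10\u201311\u2010year\u2010olds";
        // prints: class of 10&#8211;11&#8208;year&#8208;olds
        System.out.println(escapeNonLatin1(s));
    }
}
```

The result stays pure Latin-1 (in fact ASCII here), so it survives any ISO-8859-1 output path unchanged.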