I have a string that comes from an XML document, and it contains German text. The German-specific characters are encoded in UTF-8. Before displaying the string I need to decode it.
I have tried the following:
try {
    BufferedReader in = new BufferedReader(
            new InputStreamReader(
                    new ByteArrayInputStream(nodevalue.getBytes()), "UTF8"));
    event.attributes.put("title", in.readLine());
} catch (UnsupportedEncodingException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
I have also tried this:
try {
    event.attributes.put("title", URLDecoder.decode(nodevalue, "UTF-8"));
} catch (UnsupportedEncodingException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
Neither of them works. How do I decode the German string?
Thank you in advance.
UPDATE:
@Override
public void characters(char[] ch, int start, int length)
        throws SAXException {
    // TODO Auto-generated method stub
    super.characters(ch, start, length);
    if (nodename != null) {
        String nodevalue = String.copyValueOf(ch, 0, length);
        if (nodename.equals("startdat")) {
            if (event.attributes.get("eventid").equals("187")) {
            }
        }
        if (nodename.equals("startscreen")) {
            imageaddress = nodevalue;
        }
        else {
            if (nodename.equals("title")) {
                // try {
                //     BufferedReader in = new BufferedReader(
                //             new InputStreamReader(
                //                     new ByteArrayInputStream(nodevalue.getBytes()), "UTF8"));
                //     event.attributes.put("title", in.readLine());
                // } catch (UnsupportedEncodingException e) {
                //     // TODO Auto-generated catch block
                //     e.printStackTrace();
                // } catch (IOException e) {
                //     // TODO Auto-generated catch block
                //     e.printStackTrace();
                // }
                // try {
                //     event.attributes.put("title",
                //             URLDecoder.decode(nodevalue, "UTF-8"));
                // } catch (UnsupportedEncodingException e) {
                //     // TODO Auto-generated catch block
                //     e.printStackTrace();
                // }
                event.attributes.put("title", StringEscapeUtils
                        .unescapeHtml(new String(ch, start, length).trim()));
            } else
                event.attributes.put(nodename, nodevalue);
        }
    }
}
You could use the String constructor with the charset parameter:
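A minimal sketch of that constructor, assuming the raw bytes of the text are still available. The class and method names (Utf8Decode, decode) are illustrative and not taken from the question's code:

```java
import java.nio.charset.StandardCharsets;

class Utf8Decode {
    // Decodes raw UTF-8 bytes into a Java String using the
    // String(byte[], Charset) constructor.
    static String decode(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // UTF-8 bytes of a German word: u-umlaut is C3 BC, sharp s is C3 9F.
        byte[] raw = {0x47, 0x72, (byte) 0xC3, (byte) 0xBC,
                      (byte) 0xC3, (byte) 0x9F, 0x65};
        System.out.println(decode(raw));
    }
}
```

On platforms without StandardCharsets, new String(bytes, "UTF-8") should behave the same way. Note that this only helps while you still hold the original bytes; calling getBytes() on a String that was already decoded with the wrong charset generally cannot recover the lost characters.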
Also, since you get the data from an XML document, which I assume is UTF-8 encoded, the problem is probably in how it is parsed.
You should use InputStream/InputSource instead of an XMLReader implementation, because it comes with the encoding. So if you're getting this data from an HTTP response, you could either use both InputStream and InputSource or just the InputStream.
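Both variants can be sketched roughly as follows, assuming a standard SAX parser obtained from SAXParserFactory. StreamParsing and TextCollector are hypothetical names used only for illustration:

```java
import java.io.InputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class StreamParsing {
    // Hypothetical handler that simply collects all character data.
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Variant 1: hand the raw byte stream to the parser, which can then
    // detect the encoding from the XML declaration.
    static String parseFromStream(InputStream in) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TextCollector handler = new TextCollector();
        parser.parse(in, handler);
        return handler.text.toString();
    }

    // Variant 2: wrap the stream in an InputSource and state the
    // encoding explicitly.
    static String parseFromSource(InputStream in) throws Exception {
        InputSource source = new InputSource(in);
        source.setEncoding("UTF-8");
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TextCollector handler = new TextCollector();
        parser.parse(source, handler);
        return handler.text.toString();
    }
}
```

The point of both variants is that the parser receives bytes rather than already decoded characters, so no manual re-decoding of the node values should be needed afterwards.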
Update 1
Here is a sample of a complete request and response handling:
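A minimal sketch of such handling, assuming plain HttpURLConnection and a SAX DefaultHandler. EventFetcher, EventHandler, and the way element text is stored are hypothetical and only loosely modelled on the question's event.attributes map:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class EventFetcher {
    // Hypothetical handler: stores the trimmed text of each element
    // under its tag name.
    static class EventHandler extends DefaultHandler {
        final Map<String, String> attributes = new HashMap<>();
        private final StringBuilder buffer = new StringBuilder();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            buffer.setLength(0); // start collecting text for this element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // SAX may deliver one text node in several chunks, so append.
            buffer.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String text = buffer.toString().trim();
            if (!text.isEmpty()) {
                attributes.put(qName, text);
            }
            buffer.setLength(0);
        }
    }

    // Parses the response body directly from its byte stream.
    static Map<String, String> parse(InputStream in)
            throws IOException, SAXException, ParserConfigurationException {
        EventHandler handler = new EventHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return handler.attributes;
    }

    // Opens the connection and passes the response stream to the parser.
    static Map<String, String> fetch(String address)
            throws IOException, SAXException, ParserConfigurationException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        try (InputStream in = conn.getInputStream()) {
            return parse(in);
        } finally {
            conn.disconnect();
        }
    }
}
```

Calling EventFetcher.fetch with the address of the XML feed would return the collected element texts; the response is never converted to a String before parsing, which is what keeps the encoding intact.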
Update 2
As the problem is not the encoding but the source XML being escaped to HTML entities, the best solution (besides correcting the PHP so that it does not escape the response) is to use the very handy static StringEscapeUtils class from the apache.commons.lang library. After importing the library, call its unescapeHtml method on the text in your XML handler's characters method.
Update 3
In your last code the problem is with the initialization of the nodevalue variable. It should be:
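A sketch of the presumable fix, assuming the intent is to honor the start offset that SAX passes to characters. String.copyValueOf(ch, 0, length) always reads from index 0 of the parser's shared buffer, which is usually not where the current text begins. NodeValueDemo is an illustrative name only:

```java
class NodeValueDemo {
    // Builds the node value from exactly the slice that SAX reports.
    static String nodeValue(char[] ch, int start, int length) {
        return new String(ch, start, length);
    }

    public static void main(String[] args) {
        // The parser's buffer typically holds more than the current text node.
        char[] buffer = "<title>Gr\u00fc\u00dfe</title>".toCharArray();
        int start = 7;   // index just after "<title>"
        int length = 5;  // length of the text node

        // Reads from index 0 and therefore returns the wrong slice, "<titl".
        System.out.println(String.copyValueOf(buffer, 0, length));
        // Reads from the reported offset and returns the actual text.
        System.out.println(nodeValue(buffer, start, length));
    }
}
```

In the handler from the question this would correspond to initializing nodevalue with new String(ch, start, length), which is also what the title branch already does.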