I have the following value in a String variable in Java, in which a non-ASCII character appears as an escape sequence, like this:
Dodd\u2013Frank
instead of
Dodd–Frank
(Assume that I don't have control over how this value is assigned to this string variable)
How do I convert it properly and store it back in a String variable?
I found the following code
Charset.forName("UTF-8").encode(str);
But this returns a ByteBuffer, and I want a String back.
Edit:
Some additional information: when I call System.out.println(str); I get
Dodd\u2013Frank
I am not sure what the correct terminology is (UTF-8 or Unicode). Pardon me for that.
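Try unescapeJava from Apache Commons Lang:
str = org.apache.commons.lang3.StringEscapeUtils.unescapeJava(str);
In Commons Lang 3.6 and later this class is deprecated in favor of org.apache.commons.text.StringEscapeUtils from Apache Commons Text.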
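If adding a dependency is not an option, the same idea can be sketched with java.util.regex. This is a minimal sketch, not the library's implementation: the class and method names are hypothetical, it only handles escapes with exactly four hex digits, and it does not treat a doubled backslash specially.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class UnicodeUnescape {
    // Matches a literal backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    static String unescape(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Parse the hex digits into the UTF-16 code unit they name.
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps a decoded backslash or dollar sign literal.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The doubled backslash puts a real backslash into the string,
        // which mimics the value described in the question.
        String raw = "Dodd\\u2013Frank";
        System.out.println(unescape(raw));
    }
}
```

Because each escape is decoded as one UTF-16 code unit, a surrogate pair written as two consecutive escapes should still come out as a single character.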
You can take advantage of the fact that java.util.Properties supports strings with '\uXXXX' escape sequences and do something like this:
Properties p = new Properties();
p.load(new StringReader("key=" + yourInputString)); // load() throws IOException
System.out.println("Unescaped value: " + p.getProperty("key"));
Inelegant, but functional.
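For a version that compiles on its own, the snippet above can be wrapped in a small helper. The class and method names here are hypothetical, and rethrowing the IOException unchecked is my assumption about what a caller would want. Keep in mind that Properties.load also interprets other backslash sequences and treats a line break in the input as the end of the entry, so this only suits single-line values.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, for illustration only.
class PropertiesUnescape {
    static String unescape(String input) {
        Properties p = new Properties();
        try {
            // The value is parsed as the right-hand side of a properties entry.
            p.load(new StringReader("key=" + input));
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrown unchecked for convenience.
            throw new UncheckedIOException(e);
        }
        return p.getProperty("key");
    }
}
```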
Suppose you have a Unicode value, such as 00B0 (the degree symbol; the superscript 'o' in the Spanish abbreviation for 'primero' is the similar-looking 00BA).
Here is a function that converts such a char value to a String:
public static String unicodeToString( char charValue )
{
    // String.valueOf avoids the deprecated Character constructor.
    return String.valueOf( charValue );
}
I used StringEscapeUtils.unescapeXml to unescape a string loaded from an API that returns XML.
You can convert that byte buffer to a String like this:
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;
public static String byteBufferToString(ByteBuffer buffer)
{
    // ByteBuffer inherits position() and position(int) from java.nio.Buffer.
    int oldPosition = buffer.position();
    try
    {
        // A CharsetDecoder is stateful and not thread-safe, so create one per call.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        return decoder.decode(buffer).toString();
    }
    catch (CharacterCodingException e)
    {
        e.printStackTrace();
        return "";
    }
    finally
    {
        // Reset the buffer's position to its original value so it is not altered.
        buffer.position(oldPosition);
    }
}
Perhaps the following solution, which decodes the string without any additional dependencies, will help.
It works in a Scala REPL and should work just as well in a Java-only solution.
import java.nio.charset.StandardCharsets
import java.nio.charset.Charset
> StandardCharsets.UTF_8.decode(Charset.forName("UTF-8").encode("Dodd\u2013Frank"))
res: java.nio.CharBuffer = Dodd–Frank
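In plain Java the REPL line above might look like the following sketch (the class and method names are made up for illustration). One thing worth checking when trying it: an escape written inside a string literal in source code is translated by the compiler before the program runs, so the literal already contains the en dash. A string that really holds a backslash followed by u2013, as in the question, goes through the UTF-8 round trip unchanged.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class Utf8RoundTrip {
    // Encodes to UTF-8 bytes and decodes them again.
    // For well-formed text the result equals the input.
    static String roundTrip(String s) {
        ByteBuffer bytes = StandardCharsets.UTF_8.encode(s);
        return StandardCharsets.UTF_8.decode(bytes).toString();
    }

    public static void main(String[] args) {
        // The compiler has already turned the escape in this literal into an en dash.
        System.out.println(roundTrip("Dodd\u2013Frank"));
        // The doubled backslash keeps the escape as literal text in the string.
        System.out.println(roundTrip("Dodd\\u2013Frank"));
    }
}
```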