I have the following value in a String variable in Java, in which a non-ASCII character appears in escaped form, like below:
Dodd\u2013Frank
instead of
Dodd–Frank
(Assume that I don't have control over how this value is assigned to this string variable)
How do I convert it properly and store the result back in a String variable?
I found the following code
Charset.forName("UTF-8").encode(str);
But this returns a ByteBuffer, and I want a String back.
Edit:
Some additional information.
When I use System.out.println(str);
I get
Dodd\u2013Frank
I am not sure what the correct terminology is (UTF-8 or Unicode). Pardon me for that.
You can take advantage of the fact that java.util.Properties supports strings with '\uXXXX' escape sequences and do something like this:
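A minimal sketch of that idea, assuming the input is a single line of text. The class name `PropertiesUnescaper` and the key `value` are illustrative only. Note that `Properties.load` also interprets other backslash escapes (such as `\t`), drops the backslash before unrecognized characters, and throws `IllegalArgumentException` on a malformed `\uXXXX` sequence.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper; the class name and the key "value" are illustrative.
class PropertiesUnescaper {

    static String unescape(String escaped) {
        Properties props = new Properties();
        try {
            // Properties.load decodes backslash-u escapes while parsing the value.
            props.load(new StringReader("value=" + escaped));
        } catch (IOException e) {
            // Reading from a StringReader should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return props.getProperty("value");
    }
}
```

Calling `PropertiesUnescaper.unescape(str)` on the string from the question should give back the text with the en dash in place of the escape.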
Inelegant, but functional.
I used StringEscapeUtils.unescapeXml to unescape a string loaded from an API that returns XML.
Try StringEscapeUtils.unescapeJava from Apache Commons Lang, which handles \uXXXX escape sequences.
Suppose you have a Unicode value, such as 00B0 (the degree symbol, which resembles the superscript 'o' in the Spanish abbreviation for 'primero', although that character is strictly U+00BA).
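One way to turn such a hexadecimal value into a String is sketched below. The class and method names are hypothetical; the calls used (`Integer.parseInt` with radix 16 and `Character.toChars`) are standard JDK methods.

```java
// Hypothetical helper; names are illustrative only.
class CodePointDemo {

    // Converts a hexadecimal code point such as "00B0" to its String form.
    static String fromHex(String hex) {
        int codePoint = Integer.parseInt(hex, 16);
        // Character.toChars also covers supplementary code points,
        // which need two chars (a surrogate pair) in a Java String.
        return new String(Character.toChars(codePoint));
    }
}
```

For example, `CodePointDemo.fromHex("00B0")` yields a one-character string containing the degree symbol.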
Here is a function that does just what you want:
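The original function is not shown here, so what follows is only a sketch of what such a function might look like, assuming each escape is a backslash, the letter `u`, and exactly four hex digits. The names `UnicodeEscapes` and `unescape` are made up for illustration. Anything that does not match that shape is copied through unchanged.

```java
// Hypothetical helper; not the original answer's code.
class UnicodeEscapes {

    static String unescape(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            // Match a backslash, the letter u, and four hex digits.
            if (c == '\\' && i + 6 <= input.length()
                    && input.charAt(i + 1) == 'u'
                    && isHex(input, i + 2, i + 6)) {
                out.append((char) Integer.parseInt(input.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                // Not an escape (or a truncated one): copy the char as-is.
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    // True if every char in [from, to) is a hexadecimal digit.
    private static boolean isHex(String s, int from, int to) {
        for (int k = from; k < to; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }
}
```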
Perhaps the following solution will help; it decodes the string correctly without any additional dependencies.
This works in a Scala REPL, and it should work just as well in a Java-only solution.
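The original snippet is missing, so this is not that answer's code. As one possible dependency-free approach in plain Java, a regular expression from `java.util.regex` can find each `\uXXXX` sequence and replace it with the decoded character. The class name `RegexUnescaper` is an assumption for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; a regex-based sketch using only the JDK.
class RegexUnescaper {

    // A literal backslash, the letter u, then exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    static String unescape(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps a decoded '$' or backslash literal.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

`StringBuffer` is used because the `StringBuilder` overload of `appendReplacement` only exists from Java 9 onward.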
You can convert that byte buffer to a String like this:
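A minimal sketch of that conversion, using `StandardCharsets.UTF_8.decode`. The class name is illustrative. Keep in mind that encoding to UTF-8 and decoding again only gives back the same characters that went in, so on its own this does not turn a literal `\u2013` sequence into an en dash.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the class name is illustrative only.
class ByteBufferToString {

    static String decode(ByteBuffer buffer) {
        // duplicate() leaves the caller's buffer position untouched.
        return StandardCharsets.UTF_8.decode(buffer.duplicate()).toString();
    }
}
```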