I need to convert a Unicode string into a string in which non-ASCII characters are written as Unicode escape sequences. For example, the string "漢字 Max" should be represented as "\u6F22\u5B57 Max".
What I have tried:
Different combinations of
new String(sourceString.getBytes(encoding1), encoding2)
Apache StringEscapeUtils, which also escapes ASCII characters such as the double quote
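A quick illustration of why the first attempt cannot work. This is a sketch, and the UTF-8 / ISO-8859-1 pair is just an assumed example of `encoding1` / `encoding2`; the class and method names are hypothetical. Re-decoding bytes in a different charset only yields different characters (mojibake); it never produces the literal text `\u6F22`.

```java
import java.io.UnsupportedEncodingException;

public class CharsetRoundTrip {
    // Hypothetical helper: decodes the UTF-8 bytes of s as ISO-8859-1,
    // i.e. new String(s.getBytes(encoding1), encoding2) with one assumed pair.
    static String reinterpret(String s) {
        try {
            return new String(s.getBytes("UTF-8"), "ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            // every Java platform is required to support both charsets
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // the question's example string, written with Java source escapes
        String t = reinterpret("\u6F22\u5B57 Max");
        System.out.println(t.length());      // 10 characters, one per UTF-8 byte
        System.out.println(t.indexOf('\\')); // -1: no backslash anywhere
    }
}
```

Each of the two CJK characters is three bytes in UTF-8, and ISO-8859-1 maps every byte to one character, so the result is ten characters of mojibake with no backslash in it.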
Is there an easy way to encode such a string? Ideally, only Java SE 6 or Apache Commons should be used to achieve the desired result.
This is the kind of simple code Jon Skeet had in mind in his comment:
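final String in = "šđčćasdf";
final StringBuilder out = new StringBuilder();
for (int i = 0; i < in.length(); i++) {
    final char ch = in.charAt(i);
    // ASCII characters (0-127) are copied unchanged
    if (ch <= 127) out.append(ch);
    // everything else becomes a backslash-u escape with 4 lowercase hex digits
    else out.append("\\u").append(String.format("%04x", (int) ch));
}
System.out.println(out);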
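The same loop, wrapped in a reusable static method (class and method names are my own, hypothetical). One tweak: `%04X` instead of `%04x`, so the hex digits come out uppercase as in the question's "\u6F22\u5B57 Max" example. Java source and JSON accept either case in these escapes.

```java
public class UnicodeEscapes {
    // Returns s with every char above 127 replaced by a backslash-u escape
    // using four uppercase hex digits; ASCII chars (0-127) are kept as-is.
    // Only String, StringBuilder and String.format are used (Java SE 6).
    public static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch <= 127) {
                out.append(ch);
            } else {
                // %04X (capital X) gives uppercase digits, e.g. 6F22
                out.append("\\u").append(String.format("%04X", (int) ch));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // the question's example string, written with Java source escapes
        System.out.println(escapeNonAscii("\u6F22\u5B57 Max"));
    }
}
```

Running `main` prints `\u6F22\u5B57 Max`, matching the expected output in the question.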
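As Jon said, characters outside the Basic Multilingual Plane (stored as surrogate pairs) will be represented as a pair of \u escapes, because the loop works on UTF-16 chars rather than code points.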
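To make the surrogate-pair behaviour concrete, here is a small demo (class name hypothetical; U+1F600 is just an arbitrary character outside the Basic Multilingual Plane). `charAt` walks UTF-16 code units, so that single code point is stored as two chars and comes out as two escapes.

```java
public class SurrogateDemo {
    // Same char-by-char rule as the loop above: each UTF-16 code unit
    // above 127 becomes its own escape, so one supplementary character
    // (one code point, two chars) yields two consecutive escapes.
    static String escapeChars(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch <= 127) out.append(ch);
            else out.append("\\u").append(String.format("%04x", (int) ch));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+1F600 built from its code point at runtime
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(emoji.length());                          // 2 chars
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1 code point
        System.out.println(escapeChars(emoji));
    }
}
```

The last line prints `\ud83d\ude00`. Java source and JSON both spell supplementary characters as exactly such a surrogate-pair escape, so for those targets the pair is the correct output.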
Guava Escaper-based solution:
This escapes non-ASCII characters, as well as ASCII control characters below 32, into Unicode escape sequences.
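import static java.lang.String.format;
import com.google.common.escape.CharEscaper;

public class NonAsciiUnicodeEscaper extends CharEscaper {
    @Override
    protected char[] escape(final char c) {
        // chars 32-127 (printable ASCII plus DEL) pass through unchanged
        if (c >= 32 && c <= 127) { return new char[]{c}; }
        // control characters and non-ASCII chars become backslash-u escapes
        else { return format("\\u%04x", (int) c).toCharArray(); }
    }
}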
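If Guava isn't available, the escaper's per-character rule can be mirrored with plain JDK code. A minimal sketch (hypothetical class name) that also shows where it differs from the first loop: because of the `c >= 32` check, ASCII control characters such as tab and newline are escaped too.

```java
public class ControlCharEscapes {
    // Mirrors the rule in NonAsciiUnicodeEscaper.escape(char): keep chars
    // 32..127 and escape everything else (control chars and non-ASCII).
    // Plain JDK, no Guava dependency.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 32 && c <= 127) out.append(c);
            else out.append(String.format("\\u%04x", (int) c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // tab (9) and newline (10) are escaped as well
        System.out.println(escape("a\tb\n"));
    }
}
```

This prints `a\u0009b\u000a`, whereas the `ch <= 127` loop would leave the tab and newline untouched.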