I'm searching for a library (Apache / BSD / EPL licensed) that converts native text to ASCII, using \uXXXX escapes for characters not available in ASCII (basically what java.util.Properties does).
I had a look and there don't seem to be any readily available libraries. I found:
- JDK, tools.jar, native2ascii
- Properties.saveConvert() (private method)
- http://www.koders.com/java/fidD26ED81BEBE41932C405904AD53AEE8459BB8509.aspx (GPL)
Is anyone aware of a library under the above stated licenses?
You can do this with a CharsetEncoder. First read the 'native' text with its correct encoding, so that it is decoded to Unicode. Then use a US-ASCII encoder to detect which characters have to be translated into Unicode escapes.
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import org.junit.Test;

public class EncodeToEscapes {
    @Test
    public void testEncoding() {
        final String src = "Hallo äöü"; // this has to be read with the right encoding
        final CharsetEncoder asciiEncoder = Charset.forName("US-ASCII").newEncoder();
        final StringBuilder result = new StringBuilder();
        for (final char character : src.toCharArray()) {
            if (asciiEncoder.canEncode(character)) {
                result.append(character);
            } else {
                result.append("\\u");
                result.append(Integer.toHexString(0x10000 | character).substring(1).toUpperCase());
            }
        }
        System.out.println(result);
    }
}
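For use outside a unit test, the same idea can be wrapped in a small utility class that needs nothing beyond the JDK. This is only a minimal sketch: the class name AsciiEscaper and the method name escape are made up for illustration, and backslashes already present in the input are left untouched.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
public class AsciiEscaper {

    /** Replaces every char that US-ASCII cannot encode with a four-digit Unicode escape. */
    public static String escape(String src) {
        // Encoders are not thread-safe, so a fresh one is created per call.
        CharsetEncoder ascii = StandardCharsets.US_ASCII.newEncoder();
        StringBuilder out = new StringBuilder(src.length());
        for (int i = 0; i < src.length(); i++) {
            char c = src.charAt(i);
            if (ascii.canEncode(c)) {
                out.append(c);
            } else {
                // "\\u" is a literal backslash followed by 'u'; %04X pads the code unit to four hex digits.
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Inputs are written with escapes so the source file encoding does not matter.
        System.out.println(escape("Hallo \u00E4\u00F6\u00FC"));
        // A character outside the BMP is stored as a surrogate pair, so it yields two escapes.
        System.out.println(escape("smile \uD83D\uDE00"));
    }
}
```

Running main should print Hallo \u00E4\u00F6\u00FC and smile \uD83D\uDE00. Because the loop works on UTF-16 code units, supplementary characters come out as two escapes, one per surrogate.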
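Additionally, org.apache.commons:commons-lang contains StringEscapeUtils.escapeJava(), which escapes native strings, and StringEscapeUtils.unescapeJava(), which reverses the escaping. Note that escapeJava() also escapes Java special characters such as double quotes and backslashes.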
Try this piece of code from Apache commons-lang:
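StringEscapeUtils.escapeJava("ایران زیبای من");
// The backslashes are doubled here; otherwise the Java compiler would resolve the escapes before unescapeJava() ever sees them.
StringEscapeUtils.unescapeJava("\\u0627\\u06CC\\u0631\\u0627\\u0646 \\u0632\\u06CC\\u0628\\u0627\\u06CC \\u0645\\u0646");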