How do I convert strings representing code points to the appropriate character?
For example, I want to have a function which gets U+00E4 and returns ä.
I know that the Character class has a function toChars(int codePoint) which takes an integer, but there is no function which takes a string of this type.
Is there a built-in function, or do I have to do some transformation on the string to get the integer which I can pass to that function?
Code points are written as hexadecimal numbers prefixed by U+.
So you can do this:
int codepoint = Integer.parseInt(yourString.substring(2), 16);
char[] ch = Character.toChars(codepoint);
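Wrapped into a helper, the two lines above might look like the following sketch. The class name CodePointParser and the method name fromUPlus are made up for illustration, and the input check (a "U+" prefix followed by at least one hex digit) is an assumption about what the caller wants rejected.

```java
final class CodePointParser {
    private CodePointParser() {}

    /** Converts a string such as "U+00E4" to the text it denotes. */
    static String fromUPlus(String s) {
        // Accept "U+" or "u+" followed by at least one more character.
        if (s == null || s.length() < 3 || !s.regionMatches(true, 0, "U+", 0, 2)) {
            throw new IllegalArgumentException("Expected U+ followed by hex digits: " + s);
        }
        // parseInt throws NumberFormatException (an IllegalArgumentException) on non-hex input.
        int codePoint = Integer.parseInt(s.substring(2), 16);
        // toChars throws IllegalArgumentException if the value is not a valid code point.
        // It returns one char for BMP code points and a surrogate pair otherwise.
        return new String(Character.toChars(codePoint));
    }
}
```

With this, CodePointParser.fromUPlus("U+00E4") gives the one-character string ä, while a code point above U+FFFF such as U+1F600 gives a string of two chars.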
"\u00E4"
new String(new int[] { 0x00E4 }, 0, 1);
Converted from Kotlin:
public String codepointToString(int cp) {
    StringBuilder sb = new StringBuilder();
    if (Character.isBmpCodePoint(cp)) {
        sb.append((char) cp);
    } else if (Character.isValidCodePoint(cp)) {
        sb.append(Character.highSurrogate(cp));
        sb.append(Character.lowSurrogate(cp));
    } else {
        sb.append('?');
    }
    return sb.toString();
}
This example does not use char[].
// This code is Kotlin, but you can write the same thing in Java
val sb = StringBuilder()
val cp: Int = 0x00E4 // code point to convert (example value)
when {
    Character.isBmpCodePoint(cp) -> sb.append(cp.toChar())
    Character.isValidCodePoint(cp) -> {
        sb.append(Character.highSurrogate(cp))
        sb.append(Character.lowSurrogate(cp))
    }
    else -> sb.append('?')
}
The question asked for a function to convert a string value representing a Unicode code point (i.e. "U+nnnn" rather than the Java formats of "\unnnn" or "0xnnnn"). However, newer releases of Java have enhancements which simplify the processing of a string containing multiple code points in Unicode format:
- The introduction of Streams in Java 8.
- The method public static String toString(int codePoint), which was added to the Character class in Java 11. It returns a String rather than a char[], so Character.toString(0x00E4) returns "ä".
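As a small sketch of the second point, assuming Java 11 or later: Character.toString(int) also handles code points outside the BMP, in which case the returned String holds a surrogate pair. The class name ToStringDemo and both method names are illustrative only.

```java
final class ToStringDemo {
    private ToStringDemo() {}

    /** Returns the text for one code point; requires Java 11 or later. */
    static String fromCodePoint(int codePoint) {
        // Throws IllegalArgumentException if codePoint is not a valid Unicode code point.
        return Character.toString(codePoint);
    }

    /** Number of UTF-16 chars in the result: 1 inside the BMP, 2 above it. */
    static int utf16Length(int codePoint) {
        return fromCodePoint(codePoint).length();
    }
}
```

Here ToStringDemo.utf16Length(0x00E4) is 1 and ToStringDemo.utf16Length(0x1F600) is 2, so callers should not assume one char per code point.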
Those enhancements allow a different approach to solving the issue raised in the OP. This method transforms a set of code points in Unicode format to a readable String in a single statement:
void processUnicode() {
// Create a test string containing "Hello World